\section{Concepts review}

This section reviews the following concepts:
\begin{itemize}
\item basic concepts;
\item parallel regions: execution by a team of threads;
\item work sharing;
\item data scope;
\item synchronization;
\item tasks.
\end{itemize}
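As a quick reminder of how these concepts fit together, here is a minimal sketch (variable names are illustrative): a parallel region executed by a team of threads, a thread-private variable, a worksharing loop with a reduction, and a synchronization construct.

#include <stdio.h>
#include <omp.h>
int main() {
  int sum = 0;                       // shared: declared outside the region
#pragma omp parallel
  {                                  // parallel region: executed by the whole team
    int tid = omp_get_thread_num();  // private: declared inside the region
#pragma omp for reduction(+:sum)     // work sharing across the team, with a reduction
    for (int i=0; i<100; i++)
      sum += i;
#pragma omp single                   // synchronization: one thread executes this
    printf("thread %d sees sum=%d\n",tid,sum);
  }
  return 0;
}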
\section{Review questions}

\subsection{Directives}
What do the following programs output? (The answers may depend on how many threads the OpenMP runtime uses, for instance as set with the OMP_NUM_THREADS environment variable.)
\small
// program 1
#include <stdio.h>
#include <omp.h>
int main() {
  printf("procs %d\n", omp_get_num_procs());
  printf("threads %d\n", omp_get_num_threads());
  printf("num %d\n", omp_get_thread_num());
  return 0;
}

// program 2
#include <stdio.h>
#include <omp.h>
int main() {
#pragma omp parallel
  {
    printf("procs %d\n", omp_get_num_procs());
    printf("threads %d\n", omp_get_num_threads());
    printf("num %d\n", omp_get_thread_num());
  }
  return 0;
}
\small
! program 1
Program main
  use omp_lib
  print *,"Procs:",omp_get_num_procs()
  print *,"Threads:",omp_get_num_threads()
  print *,"Num:",omp_get_thread_num()
End Program

! program 2
Program main
  use omp_lib
  !$OMP parallel
  print *,"Procs:",omp_get_num_procs()
  print *,"Threads:",omp_get_num_threads()
  print *,"Num:",omp_get_thread_num()
  !$OMP end parallel
End Program
\vfill\pagebreak
\subsection{Parallelism}
Can the following loops be parallelized? If so, how? (Assume that all arrays are already filled in, and that there are no out-of-bounds errors.)
\small
// variant #1
for (i=0; i<N; i++) {
  x[i] = a[i]+b[i+1];
  a[i] = 2*x[i] + c[i+1];
}

// variant #2
for (i=0; i<N; i++) {
  x[i] = a[i]+b[i+1];
  a[i] = 2*x[i+1] + c[i+1];
}

// variant #3
for (i=1; i<N; i++) {
  x[i] = a[i]+b[i+1];
  a[i] = 2*x[i-1] + c[i+1];
}

// variant #4
for (i=1; i<N; i++) {
  x[i] = a[i]+b[i+1];
  a[i+1] = 2*x[i-1] + c[i+1];
}
\small
! variant #1
do i=1,N
  x(i) = a(i)+b(i+1)
  a(i) = 2*x(i) + c(i+1)
end do

! variant #2
do i=1,N
  x(i) = a(i)+b(i+1)
  a(i) = 2*x(i+1) + c(i+1)
end do

! variant #3
do i=2,N
  x(i) = a(i)+b(i+1)
  a(i) = 2*x(i-1) + c(i+1)
end do

! variant #4
do i=2,N
  x(i) = a(i)+b(i+1)
  a(i+1) = 2*x(i-1) + c(i+1)
end do
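As a reminder of the mechanism involved (this is not an answer key): a loop whose iterations are completely independent can be parallelized with a single worksharing directive. A minimal sketch, assuming arrays x, a, b of length at least N:

// no element is written in one iteration and read in another,
// so the iterations can execute concurrently in any order
#pragma omp parallel for
for (int i=0; i<N; i++)
  x[i] = a[i] + b[i];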
\vfill\pagebreak
\subsection{Data and synchronization}
What is the output of the following fragments? Assume that there are four threads.
\small
// variant #1
int nt;
#pragma omp parallel
{
  nt = omp_get_thread_num();
  printf("thread number: %d\n",nt);
}

// variant #2
int nt;
#pragma omp parallel private(nt)
{
  nt = omp_get_thread_num();
  printf("thread number: %d\n",nt);
}

// variant #3
int nt;
#pragma omp parallel
{
#pragma omp single
  {
    nt = omp_get_thread_num();
    printf("thread number: %d\n",nt);
  }
}

// variant #4
int nt;
#pragma omp parallel
{
#pragma omp master
  {
    nt = omp_get_thread_num();
    printf("thread number: %d\n",nt);
  }
}

// variant #5
int nt;
#pragma omp parallel
{
#pragma omp critical
  {
    nt = omp_get_thread_num();
    printf("thread number: %d\n",nt);
  }
}
\small
! variant #1
integer nt
!$OMP parallel
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end parallel

! variant #2
integer nt
!$OMP parallel private(nt)
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end parallel

! variant #3
integer nt
!$OMP parallel
!$OMP single
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end single
!$OMP end parallel

! variant #4
integer nt
!$OMP parallel
!$OMP master
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end master
!$OMP end parallel

! variant #5
integer nt
!$OMP parallel
!$OMP critical
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end critical
!$OMP end parallel
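A related point worth recalling alongside these variants: a private copy is uninitialized on entry to the region, whereas a firstprivate copy is initialized from the value of the variable outside it. A small illustrative sketch (variable name as above):

int nt = -1;
#pragma omp parallel firstprivate(nt)
{
  // each thread starts with its own copy of nt, initialized to -1;
  // the assignment below does not affect the other threads' copies
  nt = omp_get_thread_num();
  printf("thread number: %d\n",nt);
}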
The following is an attempt to parallelize a serial code. Assume that all variables and arrays are defined. What errors and potential problems do you see in this code? How would you fix them?
\small
#pragma omp parallel
{
  x = f();
#pragma omp for
  for (i=0; i<N; i++)
    y[i] = g(x,i);
  z = h(y);
}

!$OMP parallel
x = f()
!$OMP do
do i=1,N
  y(i) = g(x,i)
end do
!$OMP end do
z = h(y)
!$OMP end parallel
\vfill\pagebreak
Assume two threads. What does the following program output?
int a;
#pragma omp parallel private(a)
{
  ...
  a = 0;
#pragma omp for
  for (int i = 0; i < 10; i++) {
#pragma omp atomic
    a++;
  }
#pragma omp single
  printf("a=%d\n",a);
}
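For comparison, here is a minimal sketch in which the counter is shared rather than private, so the atomic updates of all threads accumulate in a single variable:

int a = 0;
#pragma omp parallel shared(a)
{
#pragma omp for
  for (int i = 0; i < 10; i++) {
#pragma omp atomic
    a++;              // all threads increment the one shared copy
  }
#pragma omp single    // the loop's implied barrier makes all updates visible here
  printf("a=%d\n",a); // prints a=10, whatever the number of threads
}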
\subsection{Reductions}
Is the following code correct? Is it efficient? If not, can you improve it?
#pragma omp parallel shared(r)
{
  int x;
  x = f(omp_get_thread_num());
#pragma omp critical
  r += f(x);
}
Compare two fragments:
// variant 1
#pragma omp parallel reduction(+:s)
#pragma omp for
for (i=0; i<N; i++)
  s += f(i);

// variant 2
#pragma omp parallel
#pragma omp for reduction(+:s)
for (i=0; i<N; i++)
  s += f(i);
! variant 1
!$OMP parallel reduction(+:s)
!$OMP do
do i=1,N
  s = s + f(i)
end do
!$OMP end do
!$OMP end parallel

! variant 2
!$OMP parallel
!$OMP do reduction(+:s)
do i=1,N
  s = s + f(i)
end do
!$OMP end do
!$OMP end parallel
Do they compute the same thing?
\vfill\pagebreak
\subsection{Barriers}
Are the following two code fragments well defined?
#pragma omp parallel
{
#pragma omp for
  for (mytid=0; mytid<nthreads; mytid++)
    x[mytid] = some_calculation();
#pragma omp for
  for (mytid=0; mytid<nthreads-1; mytid++)
    y[mytid] = x[mytid]+x[mytid+1];
}

#pragma omp parallel
{
#pragma omp for
  for (mytid=0; mytid<nthreads; mytid++)
    x[mytid] = some_calculation();
#pragma omp for nowait
  for (mytid=0; mytid<nthreads-1; mytid++)
    y[mytid] = x[mytid]+x[mytid+1];
}
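Recall the semantics at play here: a worksharing construct ends in an implied barrier unless a nowait clause removes it, in which case a thread may run past the construct while other threads are still executing iterations. A minimal sketch of the clause in isolation (names are illustrative):

#pragma omp parallel
{
#pragma omp for nowait  // no barrier: each thread falls through when its share is done
  for (int i=0; i<n; i++)
    x[i] = f(i);
  // at this point, elements of x assigned to other threads
  // are not guaranteed to have been written yet
}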
\subsection{Data scope}
The following program is supposed to initialize as many rows of the array as there are threads.
\small
int main() {
  int i,icount,iarray[100][100];
  icount = -1;
#pragma omp parallel private(i)
  {
#pragma omp critical
    { icount++; }
    for (i=0; i<100; i++)
      iarray[icount][i] = 1;
  }
  return 0;
}
Program main
  integer :: i,icount,iarray(100,100)
  icount = 0
!$OMP parallel private(i)
!$OMP critical
  icount = icount + 1
!$OMP end critical
  do i=1,100
    iarray(icount,i) = 1
  end do
!$OMP end parallel
End program
Describe the behavior of the program, and argue why it behaves that way.
What do you think of this solution:
\small
int main() {
  int i,icount,iarray[100][100];
  icount = -1;
#pragma omp parallel private(i) shared(icount)
  {
#pragma omp critical
    {
      icount++;
      for (i=0; i<100; i++)
        iarray[icount][i] = 1;
    }
  }
  return 0;
}
!$OMP parallel private(i) shared(icount)
!$OMP critical
icount = icount+1
do i=1,100
  iarray(icount,i) = 1
end do
!$OMP end critical
!$OMP end parallel
\subsection{Tasks}
Fix two things in the following example:
\small
#pragma omp parallel
#pragma omp single
{
  int x,y,z;
#pragma omp task
  x = f();
#pragma omp task
  y = g();
#pragma omp task
  z = h();
  printf("sum=%d\n",x+y+z);
}

integer :: x,y,z
!$OMP parallel
!$OMP single
!$OMP task
x = f()
!$OMP end task
!$OMP task
y = g()
!$OMP end task
!$OMP task
z = h()
!$OMP end task
print *,"sum=",x+y+z
!$OMP end single
!$OMP end parallel
\subsection{Scheduling}
Compare these two fragments. Do they compute the same result? What can you say about their efficiency?
#pragma omp parallel
#pragma omp single
{
  for (i=0; i<N; i++) {
#pragma omp task
    x[i] = f(i);
  }
#pragma omp taskwait
}

#pragma omp parallel
#pragma omp for schedule(dynamic)
for (i=0; i<N; i++) {
  x[i] = f(i);
}
How would you make the second loop more efficient? Can you do something similar for the first loop?
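Two constructs that may be relevant here, sketched under the assumption that a block size of 100 iterations is reasonable for f: the schedule clause accepts a chunk size, so that each dynamically assigned unit of work is a block of iterations rather than a single one, and the taskloop construct plays the analogous role in the tasking model through its grainsize clause.

// dynamic schedule with a chunk size: fewer, larger scheduling units
#pragma omp parallel for schedule(dynamic,100)
for (int i=0; i<N; i++)
  x[i] = f(i);

// taskloop: the loop is chunked into tasks of roughly 100 iterations each
#pragma omp parallel
#pragma omp single
#pragma omp taskloop grainsize(100)
for (int i=0; i<N; i++)
  x[i] = f(i);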