\section{Concepts review}

\subsection{Basic concepts}
\subsection{Parallel regions}

A parallel region is a block of code that is executed by a team of threads.
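For instance, in the following minimal sketch every thread of the team executes the print statement; how many lines appear, and in what order, depends on the runtime.
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

int main() {
  // every thread of the team executes the body of the region
#pragma omp parallel
  printf("hello from thread %d\n",omp_get_thread_num());
  return 0;
}
\end{verbatim}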
\subsection{Work sharing}
\subsection{Data scope}
\subsection{Synchronization}
\subsection{Tasks}
\section{Review questions}

\subsection{Directives}
What do the following programs output?
\small
\begin{verbatim}
#include <stdio.h>
#include <omp.h>
int main() {
  printf("procs %d\n",
         omp_get_num_procs());
  printf("threads %d\n",
         omp_get_num_threads());
  printf("num %d\n",
         omp_get_thread_num());
  return 0;
}
\end{verbatim}
\begin{verbatim}
#include <stdio.h>
#include <omp.h>
int main() {
#pragma omp parallel
  {
    printf("procs %d\n",
           omp_get_num_procs());
    printf("threads %d\n",
           omp_get_num_threads());
    printf("num %d\n",
           omp_get_thread_num());
  }
  return 0;
}
\end{verbatim}
\small
\begin{verbatim}
Program main
  use omp_lib
  print *,"Procs:",&
       omp_get_num_procs()
  print *,"Threads:",&
       omp_get_num_threads()
  print *,"Num:",&
       omp_get_thread_num()
End Program
\end{verbatim}
\begin{verbatim}
Program main
  use omp_lib
  !$OMP parallel
  print *,"Procs:",&
       omp_get_num_procs()
  print *,"Threads:",&
       omp_get_num_threads()
  print *,"Num:",&
       omp_get_thread_num()
  !$OMP end parallel
End Program
\end{verbatim}
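To experiment with these programs, compile with OpenMP enabled and run with an explicit thread count, for instance with the GNU compilers (the file names here are made up):
\begin{verbatim}
gcc -fopenmp question.c -o question
gfortran -fopenmp question.f90 -o question
OMP_NUM_THREADS=4 ./question
\end{verbatim}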
\vfill\pagebreak
\subsection{Parallelism}
Can the following loops be parallelized? If so, how? (Assume that all arrays are already filled in, and that there are no out-of-bounds errors.)
\small
\begin{verbatim}
// variant #1
for (i=0; i<N; i++) {
  x[i] = a[i]+b[i+1];
  a[i] = 2*x[i] + c[i+1];
}
// variant #2
for (i=0; i<N; i++) {
  x[i] = a[i]+b[i+1];
  a[i] = 2*x[i+1] + c[i+1];
}
// variant #3
for (i=1; i<N; i++) {
  x[i] = a[i]+b[i+1];
  a[i] = 2*x[i-1] + c[i+1];
}
// variant #4
for (i=1; i<N; i++) {
  x[i] = a[i]+b[i+1];
  a[i+1] = 2*x[i-1] + c[i+1];
}
\end{verbatim}
\small
\begin{verbatim}
! variant #1
do i=1,N
  x(i) = a(i)+b(i+1)
  a(i) = 2*x(i) + c(i+1)
end do
! variant #2
do i=1,N
  x(i) = a(i)+b(i+1)
  a(i) = 2*x(i+1) + c(i+1)
end do
! variant #3
do i=2,N
  x(i) = a(i)+b(i+1)
  a(i) = 2*x(i-1) + c(i+1)
end do
! variant #4
do i=2,N
  x(i) = a(i)+b(i+1)
  a(i+1) = 2*x(i-1) + c(i+1)
end do
\end{verbatim}
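As a reminder of the mechanics only, not as an answer for any particular variant: a loop whose iterations are fully independent can be parallelized with a single worksharing directive, sketched here in C.
\begin{verbatim}
// sketch: independent iterations admit a worksharing directive
#pragma omp parallel for
for (i=0; i<N; i++)
  x[i] = a[i] + b[i+1];
\end{verbatim}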
\vfill\pagebreak
\subsection{Data and synchronization}
What is the output of the following fragments? Assume that there are four threads.
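To actually get four threads when trying these fragments, request the team size explicitly before the parallel region, or set \texttt{OMP\_NUM\_THREADS=4} in the environment:
\begin{verbatim}
omp_set_num_threads(4);  // request teams of four threads
\end{verbatim}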
\small
\begin{verbatim}
// variant #1
int nt;
#pragma omp parallel
{
  nt = omp_get_thread_num();
  printf("thread number: %d\n",nt);
}
// variant #2
int nt;
#pragma omp parallel private(nt)
{
  nt = omp_get_thread_num();
  printf("thread number: %d\n",nt);
}
// variant #3
int nt;
#pragma omp parallel
{
#pragma omp single
  {
    nt = omp_get_thread_num();
    printf("thread number: %d\n",nt);
  }
}
// variant #4
int nt;
#pragma omp parallel
{
#pragma omp master
  {
    nt = omp_get_thread_num();
    printf("thread number: %d\n",nt);
  }
}
// variant #5
int nt;
#pragma omp parallel
{
#pragma omp critical
  {
    nt = omp_get_thread_num();
    printf("thread number: %d\n",nt);
  }
}
\end{verbatim}
\small
\begin{verbatim}
! variant #1
integer nt
!$OMP parallel
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end parallel
! variant #2
integer nt
!$OMP parallel private(nt)
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end parallel
! variant #3
integer nt
!$OMP parallel
!$OMP single
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end single
!$OMP end parallel
! variant #4
integer nt
!$OMP parallel
!$OMP master
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end master
!$OMP end parallel
! variant #5
integer nt
!$OMP parallel
!$OMP critical
nt = omp_get_thread_num()
print *,"thread number:",nt
!$OMP end critical
!$OMP end parallel
\end{verbatim}
The following is an attempt to parallelize a serial code. Assume that all variables and arrays are defined. What errors and potential problems do you see in this code? How would you fix them?
\small
\begin{verbatim}
#pragma omp parallel
{
  x = f();
#pragma omp for
  for (i=0; i<N; i++)
    y[i] = g(x,i);
  z = h(y);
}
\end{verbatim}
\begin{verbatim}
!$OMP parallel
x = f()
!$OMP do
do i=1,N
  y(i) = g(x,i)
end do
!$OMP end do
z = h(y)
!$OMP end parallel
\end{verbatim}
\vfill\pagebreak
Assume two threads. What does the following program output?
\begin{verbatim}
int a;
#pragma omp parallel private(a)
{
  ...
  a = 0;
#pragma omp for
  for (int i = 0; i < 10; i++)
  {
#pragma omp atomic
    a++;
  }
#pragma omp single
  printf("a=%d\n",a);
}
\end{verbatim}
\subsection{Reductions}
Is the following code correct? Is it efficient? If not, can you improve it?
\begin{verbatim}
#pragma omp parallel shared(r)
{
  int x;
  x = f(omp_get_thread_num());
#pragma omp critical
  r += f(x);
}
\end{verbatim}
Compare two fragments:
\begin{verbatim}
// variant 1
#pragma omp parallel reduction(+:s)
#pragma omp for
for (i=0; i<N; i++)
  s += f(i);
// variant 2
#pragma omp parallel
#pragma omp for reduction(+:s)
for (i=0; i<N; i++)
  s += f(i);
\end{verbatim}
\begin{verbatim}
! variant 1
!$OMP parallel reduction(+:s)
!$OMP do
do i=1,N
  s = s + f(i)
end do
!$OMP end do
!$OMP end parallel
! variant 2
!$OMP parallel
!$OMP do reduction(+:s)
do i=1,N
  s = s + f(i)
end do
!$OMP end do
!$OMP end parallel
\end{verbatim}
Do they compute the same thing?
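One way to convince yourself is to run both variants on a known input and compare. The following harness is only a sketch, with \texttt{f(i)=i} standing in for the real \texttt{f}:
\begin{verbatim}
#include <stdio.h>
#include <omp.h>

// stand-in for the real f, so the result is checkable
int f(int i) { return i; }

int main() {
  int N=1000, i, s1=0, s2=0;
#pragma omp parallel reduction(+:s1)   // variant 1
#pragma omp for
  for (i=0; i<N; i++)
    s1 += f(i);
#pragma omp parallel                   // variant 2
#pragma omp for reduction(+:s2)
  for (i=0; i<N; i++)
    s2 += f(i);
  printf("variant 1: %d, variant 2: %d, expected: %d\n",
         s1,s2,N*(N-1)/2);
  return 0;
}
\end{verbatim}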
\vfill\pagebreak
\subsection{Barriers}
Are the following two code fragments well defined?
\begin{verbatim}
#pragma omp parallel
{
#pragma omp for
  for (mytid=0; mytid<nthreads; mytid++)
    x[mytid] = some_calculation();
#pragma omp for
  for (mytid=0; mytid<nthreads-1; mytid++)
    y[mytid] = x[mytid]+x[mytid+1];
}
\end{verbatim}
\begin{verbatim}
#pragma omp parallel
{
#pragma omp for
  for (mytid=0; mytid<nthreads; mytid++)
    x[mytid] = some_calculation();
#pragma omp for nowait
  for (mytid=0; mytid<nthreads-1; mytid++)
    y[mytid] = x[mytid]+x[mytid+1];
}
\end{verbatim}
\subsection{Data scope}
The following program is supposed to initialize as many rows of the array as there are threads.
\small
\begin{verbatim}
int main() {
  int i,icount,iarray[100][100];
  icount = -1;
#pragma omp parallel private(i)
  {
#pragma omp critical
    { icount++; }
    for (i=0; i<100; i++)
      iarray[icount][i] = 1;
  }
  return 0;
}
\end{verbatim}
\begin{verbatim}
Program main
  integer :: i,icount,iarray(100,100)
  icount = 0
  !$OMP parallel private(i)
  !$OMP critical
  icount = icount + 1
  !$OMP end critical
  do i=1,100
    iarray(icount,i) = 1
  end do
  !$OMP end parallel
End program
\end{verbatim}
Describe the behavior of the program, with argumentation.
What do you think of this solution:
\small
\begin{verbatim}
#pragma omp parallel private(i) shared(icount)
{
#pragma omp critical
  { icount++;
    for (i=0; i<100; i++)
      iarray[icount][i] = 1;
  }
}
\end{verbatim}
\begin{verbatim}
!$OMP parallel private(i) shared(icount)
!$OMP critical
icount = icount+1
do i=1,100
  iarray(icount,i) = 1
end do
!$OMP end critical
!$OMP end parallel
\end{verbatim}
\subsection{Tasks}
Fix two things in the following example:
\small
\begin{verbatim}
#pragma omp parallel
#pragma omp single
{
  int x,y,z;
#pragma omp task
  x = f();
#pragma omp task
  y = g();
#pragma omp task
  z = h();
  printf("sum=%d\n",x+y+z);
}
\end{verbatim}
\begin{verbatim}
integer :: x,y,z
!$OMP parallel
!$OMP single
!$OMP task
x = f()
!$OMP end task
!$OMP task
y = g()
!$OMP end task
!$OMP task
z = h()
!$OMP end task
print *,"sum=",x+y+z
!$OMP end single
!$OMP end parallel
\end{verbatim}
\subsection{Scheduling}
Compare these two fragments. Do they compute the same result? What can you say about their efficiency?
\begin{verbatim}
#pragma omp parallel
#pragma omp single
{
  for (i=0; i<N; i++) {
#pragma omp task
    x[i] = f(i);
  }
#pragma omp taskwait
}
\end{verbatim}
\begin{verbatim}
#pragma omp parallel
#pragma omp for schedule(dynamic)
for (i=0; i<N; i++) {
  x[i] = f(i);
}
\end{verbatim}
How would you make the second loop more efficient? Can you do something similar for the first loop?
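To make the efficiency comparison concrete, you can time the fragments with \texttt{omp\_get\_wtime()}. A sketch, reusing the names from the fragments above; how much the scheduling overhead matters depends on the cost of \texttt{f}:
\begin{verbatim}
double t = omp_get_wtime();   // wall-clock time before the loop
#pragma omp parallel
#pragma omp for schedule(dynamic)
for (i=0; i<N; i++) {
  x[i] = f(i);
}
t = omp_get_wtime() - t;      // elapsed wall-clock time
printf("loop took %6.3f seconds\n",t);
\end{verbatim}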