OpenMP topic: Parallel regions

Experimental HTML version of Parallel Programming in MPI, OpenMP, and PETSc by Victor Eijkhout. Download the textbook at https://theartofhpc.com/pcse
18.1 : Creating parallelism with parallel regions
18.2 : Nested parallelism
18.2.1 : Subprograms with parallel regions
18.2.2 : Fine control
18.3 : Cancel parallel construct
18.4 : Review questions

18 OpenMP topic: Parallel regions

18.1 Creating parallelism with parallel regions


In OpenMP you need to indicate explicitly which passages are parallel. Creating parallelism, which here means creating a team of threads, is done with the parallel pragma. A block preceded by the omp parallel pragma is called a parallel region; it is executed by a newly created team of threads. This is an instance of the SPMD model: all threads execute the same segment of code.

#pragma omp parallel
{
  // this is executed by a team of threads
}

It would be pointless to have the block be executed identically by all threads. One way to get a meaningful parallel code is to use the function omp_get_thread_num to find out which thread you are, and execute work that is individual to that thread. This function gives a number relative to the current team; recall from figure 17.3 that new teams can be created recursively.

There is also a function omp_get_num_threads to find out the total number of threads.

The first thing we want to do is create a team of threads. This is done with a parallel region. Here is a very simple example:

// hello.c
#pragma omp parallel
  {
    int t = omp_get_thread_num();
    printf("Hello world from %d!\n",t);
  }
or in Fortran
!! hellocount.F90
!$omp parallel
  nthreads = omp_get_num_threads()
  mythread = omp_get_thread_num()
  write(*,'("Hello from",i3," out of",i3)') mythread,nthreads
!$omp end parallel
or in C++
// hello.cxx
#pragma omp parallel
  {
    int t = omp_get_thread_num();
    stringstream proctext;
    proctext << "Hello world from " << t << endl;
    cerr << proctext.str();
  }
(Note the use of stringstream: without that the output lines from the various threads may get mixed up.)

The following example uses parallelism for an actual calculation:

result = f(x)+g(x)+h(x)

You could parallelize this as:

double result,fresult,gresult,hresult;
#pragma omp parallel
{ int num = omp_get_thread_num();
  // each branch is taken by a different thread;
  // this assumes the team has at least three threads
  if (num==0)      fresult = f(x);
  else if (num==1) gresult = g(x);
  else if (num==2) hresult = h(x);
}
result = fresult + gresult + hresult;

This code corresponds to the model we just discussed.
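Note that this fragment silently assumes the team has at least three threads; with fewer, some of the result variables remain uninitialized. A variant (a sketch, not the book's code) that makes the requirement explicit with the standard num_threads clause:

double result, fresult=0, gresult=0, hresult=0;
#pragma omp parallel num_threads(3)
{ int num = omp_get_thread_num();
  if (num==0)      fresult = f(x);
  else if (num==1) gresult = g(x);
  else if (num==2) hresult = h(x);
}
result = fresult + gresult + hresult;

Even so, num_threads is only a request; initializing the partial results to zero guards against a smaller team.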

Remark In future versions of OpenMP, the master thread will be called the primary thread. In OpenMP 5.1 the master construct will be deprecated, and masked (with added functionality) will take its place. In OpenMP 6.0, master will disappear from the specification altogether, including the master value of the proc_bind clause and the combined master constructs (master taskloop, etc.).
End of remark

Exercise What happens if you call omp_get_thread_num and omp_get_num_threads outside a parallel region?
End of exercise

18.2 Nested parallelism

crumb trail: > omp-parallel > Nested parallelism

What happens if you call a function from inside a parallel region, and that function itself contains a parallel region?

void func(...);

int main() {
  ...
#pragma omp parallel
  {
  ...
  func(...);
  ...
  }
} // end of main

void func(...) {
  // nested parallel region: a second level of parallelism
#pragma omp parallel
  {
  ...
  }
}

By default, the nested parallel region will have only one thread. To allow nested thread creation, use the environment variable OMP_MAX_ACTIVE_LEVELS (default: 1) to set the number of levels of parallel nesting. Equivalently, there are functions omp_set_max_active_levels and omp_get_max_active_levels:

OMP_MAX_ACTIVE_LEVELS=3

or

void omp_set_max_active_levels(int);
int omp_get_max_active_levels(void);
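For instance, here is a minimal sketch (not from the book) of two-level nesting; the inner region only gets a team of its own if the active-levels limit allows it:

#include <stdio.h>
#include <omp.h>

int main() {
  omp_set_max_active_levels(2);   // allow two levels of active parallelism
#pragma omp parallel num_threads(2)
  {
    int outer = omp_get_thread_num();
#pragma omp parallel num_threads(3)
    {
      // with the default max_active_levels=1 this inner team has one thread
      printf("outer thread %d, inner thread %d of %d\n",
             outer, omp_get_thread_num(), omp_get_num_threads());
    }
  }
  return 0;
}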

Remark A deprecated mechanism is to set the environment variable OMP_NESTED (default: false):

OMP_NESTED=true

or, equivalently, to call

omp_set_nested(1)

End of remark

Nested parallelism can happen with nested loops, but it's also possible to have a sections construct and a loop nested.
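Here is a minimal sketch of the idea (an illustration, not the book's ompsectionnest snippet): each section is executed by one thread of the outer team, and one section contains a parallel loop with its own inner team.

#include <stdio.h>
#include <omp.h>

int main() {
  omp_set_max_active_levels(2);   // enable the nested level
#pragma omp parallel sections
  {
#pragma omp section
    { // this section spawns a nested parallel loop
#pragma omp parallel for num_threads(2)
      for (int i=0; i<4; i++)
        printf("section 0, iteration %d on inner thread %d\n",
               i, omp_get_thread_num());
    }
#pragma omp section
    printf("section 1 on thread %d\n", omp_get_thread_num());
  }
  return 0;
}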

18.2.1 Subprograms with parallel regions


A common application of nested parallelism is the case where you have a subprogram with a parallel region, which itself gets called from a parallel region.

Exercise Test nested parallelism by writing an OpenMP program as follows:

  1. Write a subprogram that contains a parallel region.

  2. Write a main program with a parallel region; call the subprogram both inside and outside the parallel region.

  3. Insert print statements

    1. in the main program outside the parallel region,

    2. in the parallel region in the main program,

    3. in the subprogram outside the parallel region,

    4. in the parallel region inside the subprogram.

Run your program and count how many print statements of each type you get.
End of exercise

Writing subprograms that are called in a parallel region illustrates the following point: directives are evaluated with respect to the dynamic scope of the parallel region, not just the lexical scope. In the following example:

void f();

#pragma omp parallel
{
  f();
}

void f() {
#pragma omp for
  for ( .... ) {
    ...
  }
}

the body of the function f falls in the dynamic scope of the parallel region, so the for loop will be parallelized.

If the function may be called both from inside and outside parallel regions, you can test which is the case with omp_in_parallel.
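For instance (a minimal sketch; the function name is arbitrary):

#include <stdio.h>
#include <omp.h>

// f adapts its behavior to where it is called from
void f() {
  if (omp_in_parallel())
    printf("thread %d: called from a parallel region\n",
           omp_get_thread_num());
  else
    printf("called sequentially\n");
}

int main() {
  f();                  // prints once, sequentially
#pragma omp parallel
  f();                  // every thread of the team reports
  return 0;
}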

18.2.2 Fine control


The amount of nested parallelism can be set:

OMP_NUM_THREADS=4,2

means that initially a parallel region will have four threads, and each of those threads can itself create a team of two threads.

The maximum number of active levels can be set and queried with an environment variable or with functions:

OMP_MAX_ACTIVE_LEVELS=123

  omp_set_max_active_levels( n )
  n = omp_get_max_active_levels()

The total number of threads in the program can be limited and queried:

  OMP_THREAD_LIMIT=123

  n = omp_get_thread_limit()

Further query functions:

  omp_get_level : number of enclosing parallel regions, whether active or not;
  omp_get_active_level : number of enclosing active parallel regions;
  omp_get_ancestor_thread_num(level) : thread number of the calling thread's ancestor at the given level;
  omp_get_team_size(level) : size of the thread team at the given level.
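A minimal sketch (not from the book) that exercises these query functions from inside a nested region:

#include <stdio.h>
#include <omp.h>

int main() {
  omp_set_max_active_levels(2);
#pragma omp parallel num_threads(2)
  {
#pragma omp parallel num_threads(2)
    {
      if (omp_get_thread_num()==0)  // report once per inner team
        printf("level=%d, active level=%d: inner team of %d,"
               " created by outer thread %d\n",
               omp_get_level(), omp_get_active_level(),
               omp_get_team_size(2), omp_get_ancestor_thread_num(1));
    }
  }
  return 0;
}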

18.3 Cancel parallel construct


It is possible to terminate a parallel construct early with the cancel directive:

!$omp cancel construct [if (expr)]

where construct is one of parallel, sections, do, or taskgroup.

See section  30.3 for an example.

Cancelling is disabled by default for performance reasons. To activate it, set the OMP_CANCELLATION variable to true.

The state of cancellation can be queried with omp_get_cancellation, but there is no function to set it.

Cancellation can only happen at the most obvious places where OpenMP is active, such as implicit barriers, but additional cancellation points can be set with

#pragma omp cancellation point <construct>

where the construct is parallel, sections, for, do, or taskgroup.
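As an illustration, here is a minimal C sketch of a parallel search that cancels the loop as soon as the target is found; it assumes OMP_CANCELLATION=true in the environment, otherwise the cancel directive is a no-op:

#include <stdio.h>
#include <omp.h>

int main() {
  int n = 1000000, target = 123456, found = -1;
#pragma omp parallel for
  for (int i=0; i<n; i++) {
    if (i==target) {
#pragma omp atomic write
      found = i;
#pragma omp cancel for              // request cancellation of this loop
    }
#pragma omp cancellation point for  // other threads notice the request here
  }
  printf("cancellation active: %d, found: %d\n",
         omp_get_cancellation(), found);
  return 0;
}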

18.4 Review questions


Exercise T/F? The function omp_get_num_threads() returns a number that is equal to the number of cores.
End of exercise

Exercise T/F? The argument of omp_set_num_threads() can not be higher than the number of cores.
End of exercise

Exercise What function can be used to detect the number of cores?
End of exercise
