OpenMP topic: Parallel regions

Experimental html version of Parallel Programming in MPI, OpenMP, and PETSc by Victor Eijkhout. Download the textbook at https:/
18.1 : Creating parallelism with parallel regions
18.2 : Nested parallelism
18.2.1 : Subprograms with parallel regions
18.3 : Cancel parallel construct
18.4 : Review questions
Back to Table of Contents

18 OpenMP topic: Parallel regions

18.1 Creating parallelism with parallel regions


In OpenMP you need to indicate explicitly which passages are parallel. Creating parallelism, which here means creating a team of threads, is done with the parallel pragma. A block preceded by the omp parallel pragma is called a parallel region; it is executed by a newly created team of threads. This is an instance of the SPMD model: all threads execute the same segment of code.

#pragma omp parallel
{
  // this block is executed by a team of threads
}

It would be pointless to have the block be executed identically by all threads. One way to get a meaningful parallel code is to use the function omp_get_thread_num to find out which thread you are, and execute work that is individual to that thread. This function gives a number relative to the current team; recall from figure  17.3 that new teams can be created recursively.

There is also a function omp_get_num_threads to find out the total number of threads.

The first thing we want to do is create a team of threads. This is done with a parallel region. Here is a very simple example where each thread outputs its number; see the hello example in examples/omp/c, or the Fortran version in examples/omp/f.

C++ note The use of cout may give jumbled output: lines can break at each << . Use a stringstream to form a single unit of output.

// hello.cxx
#pragma omp parallel
{
  int t = omp_get_thread_num();
  stringstream proctext;
  proctext << "Hello world from " << t << endl;
  cerr << proctext.str();
}
End of C++ note

C++ note OpenMP parallel regions can be in functions, including lambda expressions.

const int s = [] () {
  int s;
#   pragma omp parallel
#   pragma omp master
  s = 2 * omp_get_num_threads();
  return s; }();
(an `Immediately Invoked Function Expression'.) End of C++ note

The following example uses parallelism for an actual calculation:

result = f(x)+g(x)+h(x)
you could parallelize this as
double result,fresult,gresult,hresult;
#pragma omp parallel
{ int num = omp_get_thread_num();
  if (num==0)      fresult = f(x);
  else if (num==1) gresult = g(x);
  else if (num==2) hresult = h(x);
}
result = fresult + gresult + hresult;
Note that the final sum happens after the closing brace of the parallel region: the implicit barrier at the end of the region guarantees that all three partial results have been computed.

This code corresponds to the model we just discussed.

Remark In future versions of OpenMP, the master thread will be called the primary thread. In OpenMP 5.1 the master construct is deprecated, and masked (with added functionality) takes its place. In OpenMP 6.0, master will disappear from the specification altogether, including the proc_bind master value and the combined master constructs (master taskloop, etc.).
End of remark

Exercise What happens if you call omp_get_thread_num and omp_get_num_threads outside a parallel region?
End of exercise

18.2 Nested parallelism


What happens if you call a function from inside a parallel region, and that function itself contains a parallel region?

int main() {
#pragma omp parallel
  {
    func(...);
  }
} // end of main

void func(...) {
#pragma omp parallel
  {
    // nested parallel region
  }
}
By default, the nested parallel region will have only one thread. To allow nested thread creation, use the environment variable OMP_MAX_ACTIVE_LEVELS (default: 1) to set the number of levels of parallel nesting. Equivalently, there are functions omp_set_max_active_levels and omp_get_max_active_levels :

void omp_set_max_active_levels(int);
int omp_get_max_active_levels(void);

Remark A deprecated mechanism is to set the environment variable OMP_NESTED (default: false) or the corresponding (likewise deprecated) function:

void omp_set_nested(int);

End of remark

Nested parallelism can happen with nested loops, but it's also possible to have a sections construct and a loop nested inside each other; see the sectionnest example in code/omp/c.

The amount of nested parallelism can be set with a comma-separated list of team sizes in the OMP_NUM_THREADS variable:

OMP_NUM_THREADS=4,2

means that initially a parallel region will have four threads, and each thread can create two more threads. It is still necessary to set the number of active levels.


More functions: omp_get_level  , omp_get_active_level  , omp_get_ancestor_thread_num  , omp_get_team_size(level)  .

18.2.1 Subprograms with parallel regions


A common application of nested parallelism is the case where you have a subprogram with a parallel region, which itself gets called from a parallel region.

Exercise Test nested parallelism by writing an OpenMP program as follows:

  1. Write a subprogram that contains a parallel region.
  2. Write a main program with a parallel region; call the subprogram both inside and outside the parallel region.
  3. Insert print statements
    1. in the main program outside the parallel region,
    2. in the parallel region in the main program,
    3. in the subprogram outside the parallel region,
    4. in the parallel region inside the subprogram.
Run your program and count how many print statements of each type you get.
End of exercise

Writing subprograms that are called in a parallel region illustrates the following point: directives are evaluated with respect to the dynamic scope of the parallel region, not just its lexical scope. In the following example:

#pragma omp parallel
  f();
// ... and elsewhere:
void f() {
#pragma omp for
  for ( .... ) {
    // loop iterations are distributed over the team
  }
}
the body of the function f falls in the dynamic scope of the parallel region, so the for loop will be parallelized.

If the function may be called both from inside and outside parallel regions, you can test which is the case with omp_in_parallel  .

C++ note Dynamic scope holds for class methods as for any other function; see the nestclass example in examples/omp/cxx. End of C++ note

18.3 Cancel parallel construct


It is possible to terminate a parallel construct early with the cancel directive:

!$omp cancel construct [if (expr)]
where construct is parallel  , sections  , for (do in Fortran), or taskgroup  .

See section  30.3 for an example.

Cancelling is disabled by default for performance reasons. To activate it, set the OMP_CANCELLATION variable to true.

The state of cancellation can be queried with omp_get_cancellation  , but there is no function to set it.

Cancellation can happen at the most obvious places where OpenMP is active, but additional cancellation points can be set with

#pragma omp cancellation point <construct>
where the construct is parallel  , sections  , for (do in Fortran), or taskgroup  .

18.4 Review questions


Exercise T/F? The function omp_get_num_threads() returns a number that is equal to the number of cores.
End of exercise

Exercise T/F? With omp_set_num_threads() you cannot request more threads than there are cores.
End of exercise

Exercise What function can be used to detect the number of cores?
End of exercise
