Creating parallelism with parallel regions
In OpenMP you need to indicate explicitly which passages are parallel. Creating parallelism, which here means creating a team of threads, is done with the parallel pragma. A block preceded by the omp parallel pragma is called a parallel region; it is executed by a newly created team of threads. This is an instance of the SPMD model: all threads execute the same segment of code.
#pragma omp parallel
{
  // this is executed by a team of threads
}
It would be pointless to have the block be executed identically by all threads. One way to get meaningful parallel code is to use the function omp_get_thread_num to find out which thread you are, and execute work that is individual to that thread. This function gives a number relative to the current team; recall from figure 17.3 that new teams can be created recursively.
There is also a function omp_get_num_threads to find out the total number of threads.
The first thing we want to do is create a team of threads. This is done with a parallel region. Here is a very simple example where each thread outputs its number: \csnippetwithoutput{hello-omp}{examples/omp/c}{hello} or in Fortran: \fsnippetwithoutput{hello-omp-f}{examples/omp/f}{hellof}
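For reference, a minimal sketch of what such a hello-world program looks like in C (the actual snippet in the repository may differ in details):

#include <stdio.h>
#include <omp.h>

int main() {
#pragma omp parallel
  {
    // each thread queries its own number within the team
    int t = omp_get_thread_num();
    printf("Hello world from thread %d!\n",t);
  }
  return 0;
}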
C++ note The use of cout may give jumbled output: lines can break at each << . Use a stringstream to form a single string to output.
// hello.cxx
#pragma omp parallel
{
  int t = omp_get_thread_num();
  stringstream proctext;
  proctext << "Hello world from " << t << endl;
  cerr << proctext.str();
}
C++ note OpenMP parallel regions can be in functions, including lambda expressions.
const int s = [] () {
  int s;
#pragma omp parallel
#pragma omp master
  s = 2 * omp_get_num_threads();
  return s;
}();
The following example uses parallelism for an actual calculation:
result = f(x) + g(x) + h(x)

you could parallelize this as:
double result, fresult, gresult, hresult;
#pragma omp parallel
{
  int num = omp_get_thread_num();
  if      (num==0) fresult = f(x);
  else if (num==1) gresult = g(x);
  else if (num==2) hresult = h(x);
}
result = fresult + gresult + hresult;
This code corresponds to the SPMD model we just discussed.
Remark In future versions of OpenMP, the master thread will be called the primary thread. In OpenMP 5.1 the master construct will be deprecated, and masked (with added functionality) will take its place. In OpenMP 6.0, master will disappear from the specification entirely, including the proc_bind master "variable" and the combined master constructs (master taskloop, etc.). End of remark
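As a sketch of the replacement (assuming a compiler with OpenMP 5.1 support), the substitution is one-for-one:

#include <stdio.h>
#include <omp.h>

int main() {
#pragma omp parallel
  {
    // deprecated in OpenMP 5.1:
    //   #pragma omp master
    // replacement, with the same default behavior:
#pragma omp masked
    printf("only the primary thread executes this\n");
  }
  return 0;
}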
Exercise What happens if you call omp_get_thread_num and omp_get_num_threads outside a parallel region? End of exercise
Nested parallelism
What happens if you call a function from inside a parallel region, and that function itself contains a parallel region?
int main() {
  ...
#pragma omp parallel
  {
    ...
    func(...)
    ...
  }
} // end of main

void func(...) {
#pragma omp parallel
  {
    ...
  }
}
By default, the nested parallel region will have only one thread. To allow nested thread creation, use the environment variable OMP_MAX_ACTIVE_LEVELS (default: 1) to set the number of levels of parallel nesting. Equivalently, there are functions omp_set_max_active_levels and omp_get_max_active_levels:
OMP_MAX_ACTIVE_LEVELS=3

or

void omp_set_max_active_levels(int);
int omp_get_max_active_levels(void);
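A minimal sketch, with arbitrarily chosen thread counts, of how enabling nesting changes behavior:

#include <stdio.h>
#include <omp.h>

int main() {
  omp_set_max_active_levels(2); // allow two levels of parallelism
#pragma omp parallel num_threads(2)
  {
    int outer = omp_get_thread_num();
#pragma omp parallel num_threads(2)
    {
      // without the call above, each inner team would have one thread
      printf("outer thread %d, inner thread %d\n",
             outer, omp_get_thread_num());
    }
  }
  return 0;
}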
Remark A deprecated mechanism is to set the environment variable \indexompdepr{OMP_NESTED} (default: false) or its corresponding function:
OMP_NESTED=true or omp_set_nested(1)
Nested parallelism can happen with nested loops, but it is also possible to have a parallel loop nested inside a sections construct. Example: \csnippetwithoutput{ompsectionnest}{code/omp/c}{sectionnest}
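A possible shape of such a nesting; the function names here are hypothetical placeholders, not taken from the snippet:

omp_set_max_active_levels(2);
#pragma omp parallel sections num_threads(2)
{
#pragma omp section
  { // this section spawns a nested parallel loop
#pragma omp parallel for num_threads(2)
    for (int i=0; i<100; i++)
      process(i);   // hypothetical work function
  }
#pragma omp section
  other_work();     // hypothetical
}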
The amount of nested parallelism can be set:
OMP_NUM_THREADS=4,2

means that initially a parallel region will have four threads, and each thread can create two more threads. It is still necessary to set the number of active levels.
The total number of threads over all levels can be limited with:

OMP_THREAD_LIMIT=123
More functions for querying nesting: omp_get_level, omp_get_active_level, omp_get_ancestor_thread_num(level), omp_get_team_size(level).
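A sketch of what these report inside a doubly nested region:

#pragma omp parallel num_threads(2)
#pragma omp parallel num_threads(2)
{
  int level = omp_get_level(); // nesting depth: here 2
  printf("level %d, active level %d, parent thread %d, team size %d\n",
         level, omp_get_active_level(),
         omp_get_ancestor_thread_num(level-1),
         omp_get_team_size(level));
}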
Subprograms with parallel regions
A common application of nested parallelism is the case where you have a subprogram with a parallel region, which itself gets called from a parallel region.
Exercise Test nested parallelism by writing an OpenMP program as follows:
Writing subprograms that are called in a parallel region illustrates the following point: directives are evaluated with respect to the dynamic scope of the parallel region, not just its lexical scope. In the following example:
#pragma omp parallel
{
  f();
}
void f() {
#pragma omp for
  for ( .... ) { ... }
}

the body of the function f falls in the dynamic scope of the parallel region, so the for loop will be parallelized.
If the function may be called both from inside and outside parallel regions, you can test which is the case with omp_in_parallel .
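For instance, a sketch of such a dual-use function:

void f() {
  if (omp_in_parallel())
    printf("f called from inside a parallel region\n");
  else
    printf("f called sequentially\n");
}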
C++ note Dynamic scope holds for class methods just as for any other function: \cxxsnippetwithoutput{nestclass}{examples/omp/cxx}{nested} End of C++ note
Cancel parallel construct
It is possible to terminate a parallel construct early with the cancel directive:
!$omp cancel construct [if (expr)]

where construct is parallel, sections, do, or taskgroup.
See section 30.3 for an example.
Cancelling is disabled by default for performance reasons. To activate it, set the environment variable OMP_CANCELLATION to true.
The state of cancellation can be queried with omp_get_cancellation , but there is no function to set it.
Cancellation can happen at the most obvious places where OpenMP is active, but additional cancellation points can be set with:
#pragma omp cancellation point <construct>

where the construct is parallel, sections, for, do, or taskgroup.
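A sketch of a parallel search that cancels once an element is found; run with OMP_CANCELLATION=true. The names data, N, and satisfies are hypothetical:

int found = -1;
#pragma omp parallel
{
#pragma omp for
  for (int i=0; i<N; i++) {
    if (satisfies(data[i])) { // hypothetical test
#pragma omp critical
      found = i;
#pragma omp cancel for
    }
    // let other threads notice the cancellation request:
#pragma omp cancellation point for
  }
}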
Review questions
Exercise T/F? The function omp_get_num_threads() returns a number that is equal to the number of cores. End of exercise
Exercise T/F? With omp_set_num_threads() you can not set the number of threads higher than the number of cores. End of exercise
Exercise What function can be used to detect the number of cores? End of exercise