In a parallel region there are two types of data: private and shared. In this section we will see the various ways you can control which category your data falls under; for private data items we also discuss how their values relate to shared data.
crumb trail: > omp-data > Shared data
In a parallel region, any data declared outside it will be shared: any thread using a variable x will access the same memory location associated with that variable.
Example:
int x = 5;
#pragma omp parallel
{
  x = x+1;
  printf("shared: x is %d\n",x);
}
All threads increment the same variable, so after the parallel region it will have a value of five plus the number of threads; or maybe less because of the data races involved. This issue is discussed in Eijkhout:IntroHPC; see 23.2.2 for a solution to data races in OpenMP.
crumb trail: > omp-data > Private data
In the C/C++ language it is possible to declare variables inside a lexical scope; roughly: inside curly braces. This concept extends to OpenMP parallel regions and directives: any variable declared in a block following an OpenMP directive will be local to the executing thread.
In the following example, each thread creates a private variable x and sets it to a unique value:
\csnippetwithoutput{privatex}{examples/omp/c}{private}
After the parallel region the outer variable x will still have the value 5: there is no storage association between the private variable and the global one.
Fortran note The Fortran language does not have this concept of scope, so you have to use a private clause: \fsnippetwithoutput{privatexf}{examples/omp/f}{private} End of Fortran note
The private clause declares data to have a separate copy in the memory of each thread. Such private variables are initialized as they would be in a main program; for a plain local variable that means not at all. Any computed value goes away at the end of the parallel region. (However, see lastprivate below.) Thus, you should not rely on any initial value, or on the value of the outer variable after the region.
int x = 5;
#pragma omp parallel private(x)
{
  x = x+1; // dangerous
  printf("private: x is %d\n",x);
}
printf("after: x is %d\n",x);
Data that is declared private with the private clause is put on a separate stack per thread. The OpenMP standard does not dictate the size of these stacks, but beware of stack overflow. A typical default is a few megabytes; you can control it with the environment variable OMP_STACKSIZE. (You can find the current value by setting OMP_DISPLAY_ENV to true.) Its value is a number, optionally followed by a suffix for kilobytes, megabytes, or gigabytes in either case; without a suffix the unit is kilobytes:
123 456k 567K 678m 789M 246g 357G
A normal Unix process also has a stack, but this is independent of the OpenMP stacks for private data. You can query or set the Unix stack with ulimit:
[] ulimit -s
64000
[] ulimit -s 8192
[] ulimit -s
8192
The Unix stack can grow dynamically as space is needed. This does not hold for the OpenMP stacks: they are immediately allocated at their requested size. Thus it is important not to make them too large.
crumb trail: > omp-data > Data in dynamic scope
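Functions that are called from a parallel region fall in the dynamic scope of that parallel region. The rules for variables in such a function are as follows: variables defined locally in the function are private; static variables in C and save variables in Fortran are shared; function arguments inherit their status from the calling environment.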
Fortran note Variables in subprograms are private, as in C, except if they have the Save attribute. This attribute is implicitly given to any variable that is value-initialized in its declaration.
In the following example we have two almost identical routines, except that the first does value-initialization on the local variable, thereby in effect making it shared. The second routine does not have that problem.
\fsnippetwithoutput{hellosavef}{examples/omp/f}{save} End of Fortran note
crumb trail: > omp-data > Temporary variables in a loop
It is common to have a variable that is set and used in each loop iteration:
#pragma omp parallel for
for ( ... i ... ) {
  x = i*h;
  s = sin(x); c = cos(x);
  a[i] = s+c;
  b[i] = s-c;
}
By the above rules, the variables x,s,c are all shared variables. However, the values they receive in one iteration are not used in the next iteration, so they behave in fact like private variables to each iteration.
Sometimes, even if you forget to declare these temporaries as private, the code may still give the correct output. That is because the compiler can sometimes eliminate them from the loop body, since it detects that their values are not otherwise used.
crumb trail: > omp-data > Default
There are default rules for whether data in OpenMP constructs is private or shared, and you can control this explicitly.
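First the default behavior: variables declared outside the parallel region are shared, as described above; variables declared inside the region are private; and the loop variable of an omp for is private.
You can alter this default behavior with the default clause:
#pragma omp parallel default(shared) private(x)
{ ... }
#pragma omp parallel default(private) shared(matrix)
{ ... }
(In C and C++, default(private) is only accepted as of OpenMP 5.1; before that it was available in Fortran only.) And if you want to play it safe:
#pragma omp parallel default(none) private(x) shared(matrix)
{ ... }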
crumb trail: > omp-data > First and last private
Above, you saw that private variables are completely separate from any variables by the same name in the surrounding scope. However, there are two cases where you may want some storage association between a private variable and a global counterpart.
First of all, private variables are created with an undefined value. You can force their initialization with firstprivate.
int t=2;
#pragma omp parallel firstprivate(t)
{
  t += f( omp_get_thread_num() );
  g(t);
}
The variable t behaves like a private variable, except that it is initialized to the outside value.
Remark Variables are firstprivate by default in tasks; see chapter OpenMP topic: Tasks. End of remark
Secondly, you may want a private value to be preserved to the environment outside the parallel region. This really only makes sense in one case, where you preserve a private variable from the last iteration of a parallel loop, or the last section in a sections construct. This is done with lastprivate:
#pragma omp parallel for \
        lastprivate(tmp)
for (int i=0; i<N; i++) {
  tmp = ......
  x[i] = .... tmp ....
}
..... tmp ....
crumb trail: > omp-data > Array data
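The rules for arrays are slightly different from those for scalar data. Statically allocated data, that is, with a syntax like
int array[100];
integer,dimension(:) :: array(100)
can be shared or private, depending on the clause you use. Dynamically allocated data, that is, created with malloc or allocate, can only be shared.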
On the other hand, in the following example each thread gets a private pointer, but all pointers point to the same object: \csnippetwithoutput{privatepointer}{examples/omp/c}{pointarray}
C++ note Compare \csnippetwithoutput{privatepointer}{examples/omp/c}{pointarray} and \cxxsnippetwithoutput{privatevector}{examples/omp/c}{privvector} End of C++ note
crumb trail: > omp-data > Persistent data through threadprivate
Most data in OpenMP parallel regions is either inherited from the master thread and therefore shared, or temporary within the scope of the region and fully private. There is also a mechanism for thread-private data, which is not limited in lifetime to one parallel region. The threadprivate pragma is used to declare that each thread is to have a private copy of a variable:
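#pragma omp threadprivate(var)
The variable needs to be a file-scope or static variable in C, a static class member in C++, or a program variable or common block in Fortran.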
crumb trail: > omp-data > Persistent data through threadprivate > Thread private initialization
If each thread needs a different value in its threadprivate variable, the initialization needs to happen in a parallel region.
In the following example a team of 7 threads is created, all of which set their thread-private variable. Later, this variable is read by a larger team: the variables that have not been set are undefined, though often simply zero:
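// threadprivate.c
#include <stdio.h>
#include <omp.h>

static int tp;
#pragma omp threadprivate(tp)

int main(int argc,char **argv) {

  // a team of 7 threads sets the thread-private variable
#pragma omp parallel num_threads(7)
  tp = omp_get_thread_num();

  // a larger team reads it
#pragma omp parallel num_threads(9)
  printf("Thread %d has %d\n",omp_get_thread_num(),tp);

  return 0;
}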
Fortran note Named common blocks can be made thread-private with the syntax
!$OMP threadprivate( /blockname/ )
Example: \fsnippetwithoutput{threadprivf}{examples/omp/f}{private} End of Fortran note
On the other hand, if the thread private data starts out identical in all threads, the copyin clause can be used:
#pragma omp threadprivate(private_var)
private_var = 1;
#pragma omp parallel copyin(private_var)
  private_var += omp_get_thread_num();
If one thread needs to set all thread private data to its value, the copyprivate clause can be used:
#pragma omp parallel
{
  ...
#pragma omp single copyprivate(private_var)
  private_var = read_data();
  ...
}
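The values of threadprivate variables are only guaranteed to persist from one parallel region to the next if, among other conditions, dynamic adjustment of the number of threads is switched off (OMP_DYNAMIC set to false) and both regions use the same number of threads.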
crumb trail: > omp-data > Persistent data through threadprivate > Thread private example
The typical application for thread-private variables is in random number generators. A random number generator needs saved state, since it computes each next value from the current one. To have a parallel generator, each thread will create and initialize a private `current value' variable. This will persist even when the execution is not in a parallel region; it gets updated only in a parallel region.
Exercise Calculate the area of the Mandelbrot set by random sampling. Initialize the random number generator separately for each thread; then use a parallel loop to evaluate the points. Explore performance implications of the different loop scheduling strategies. End of exercise
C++ note The new C++ random header has a threadsafe generator, by virtue of the statement in the standard that no STL object can rely on global state. The usual idiom cannot be made threadsafe because of the initialization:
static random_device rd;
static mt19937 rng(rd());
However, the following works:
// privaterandom.cxx
#include <iostream>
#include <random>
#include <sstream>
#include <omp.h>
using namespace std;

static random_device rd;
static mt19937 rng;
#pragma omp threadprivate(rd)
#pragma omp threadprivate(rng)

int main() {

  // each thread seeds its own generator
#pragma omp parallel
  rng = mt19937(rd());

#pragma omp parallel
  {
    stringstream res;
    uniform_int_distribution<int> percent(1, 100);
    res << "Thread " << omp_get_thread_num() << ": " << percent(rng) << "\n";
    cout << res.str();
  }

  return 0;
}
End of C++ note
crumb trail: > omp-data > Allocators
OpenMP was initially designed for shared memory. With accelerators (see chapter OpenMP topic: Offloading), non-coherent memory was added to this. In the OpenMP 5.0 standard, the story is further complicated, to account for new memory types such as high-bandwidth memory and non-volatile memory.
There are several ways of using the OpenMP memory allocators. First, in a directive on a static array:
float A[N], B[N];
#pragma omp allocate(A) \
        allocator(omp_large_cap_mem_alloc)
Second, as a clause on private variables:
#pragma omp task private(B) allocate(omp_const_mem_alloc: B)
Next, there are memory spaces. The binding between OpenMP identifiers and hardware is implementation defined.
crumb trail: > omp-data > Allocators > Pre-defined types
Allocators: omp_default_mem_alloc, omp_large_cap_mem_alloc, omp_const_mem_alloc, omp_high_bw_mem_alloc, omp_low_lat_mem_alloc, omp_cgroup_mem_alloc, omp_pteam_mem_alloc, omp_thread_mem_alloc.
Memory spaces: omp_default_mem_space, omp_large_cap_mem_space, omp_const_mem_space, omp_high_bw_mem_space, omp_low_lat_mem_space.