OpenMP topic: Controlling thread data

Experimental html version of Parallel Programming in MPI, OpenMP, and PETSc by Victor Eijkhout. Download the textbook at https://theartofhpc.com/pcse
22.1 : Shared data
22.2 : Private data
22.3 : Data in dynamic scope
22.4 : Temporary variables in a loop
22.5 : Default
22.6 : Array data
22.7 : First and last private
22.8 : Persistent data through threadprivate
22.8.1 : Thread private initialization
22.8.2 : Thread private example
22.9 : Allocators
22.9.1 : Pre-defined types

22 OpenMP topic: Controlling thread data

In a parallel region there are two types of data: private and shared. In this section we will see the various ways you can control which category your data falls under; for private data items we also discuss how their values relate to shared data.

22.1 Shared data

In a parallel region, any data declared outside it will be shared: any thread using a variable x will access the same memory location associated with that variable.

Example:

  int x = 5;
#pragma omp parallel
  {
    x = x+1;
    printf("shared: x is %d\n",x);
  }

All threads increment the same variable, so after the parallel region it will have a value of five plus the number of threads; or maybe less because of the data races involved. This issue is discussed in Eijkhout:IntroHPC; see 23.2.3 for a solution to data races in OpenMP.
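One possible way to make the increment well-defined is sketched here, under the assumption that an atomic update is acceptable. The function name is hypothetical, and the section referenced above may use a different mechanism.

```c
/* Hypothetical helper: every thread increments the shared x once,
   with the update protected by an atomic directive. */
int increment_shared_atomically(void) {
  int x = 5;
#pragma omp parallel
  {
#pragma omp atomic
    x = x + 1;   // the read-modify-write is now indivisible
  }
  return x;      // five plus the number of threads in the team
}
```

With the atomic directive no increment can be lost, so the result depends only on the team size.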

Sometimes this global update is what you want; in other cases the variable is intended only for intermediate results in a computation. In that case there are various ways of creating data that is local to a thread, and therefore invisible to other threads.

22.2 Private data

In the C/C++ language it is possible to declare variables inside a lexical scope; roughly: inside curly braces. This concept extends to OpenMP parallel regions and directives: any variable declared in a block following an OpenMP directive will be local to the executing thread.

In the following example, each thread creates a private variable x and sets it to a unique value:

\csnippetwithoutput{privatex}{examples/omp/c}{private}

After the parallel region the outer variable x will still have the value 5: there is no storage association between the private variable and the global one.
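A minimal sketch of a block-scoped private variable follows. This is not the book's own snippet; the function name and the value 10 are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns the outer x after a parallel region
   in which each thread declares and modifies its own x. */
int outer_x_after_region(void) {
  int x = 5;
#pragma omp parallel
  {
    int x = 10;   // declared inside the region: one instance per thread
    x = x + 1;
    printf("local: x is %d\n", x);
  }
  return x;       // the outer x was never touched
}
```

The inner declaration hides the outer variable, so nothing done inside the region can reach the outer x.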

Fortran note {Private variables in parallel region} The Fortran language does not have this concept of scope, so you have to use a private clause:

\fsnippetwithoutput{privatexf}{examples/omp/f}{private}

The private clause declares data to have a separate copy in the memory of each thread. Such private variables are initialized as they would be in a main program. Any computed value goes away at the end of the parallel region. (However, see below.) Thus, you should not rely on any initial value, or on the value of the outer variable after the region.

  int x = 5;
#pragma omp parallel private(x)
  {
    x = x+1; // dangerous
    printf("private: x is %d\n",x);
  }
  printf("after: x is %d\n",x); // also dangerous

Data that is declared private with the private clause is put on a separate stack per thread. The OpenMP standard does not dictate the size of these stacks, but beware of stack overflow. A typical default is a few megabytes; you can control it with the environment variable OMP_STACKSIZE. (You can find the current value by setting OMP_DISPLAY_ENV to true.)

The value of OMP_STACKSIZE can be a plain number, which is interpreted as kilobytes, or a number with a size suffix:

123 456k 567K 678m 789M 246g 357G

A normal Unix process also has a stack, but this is independent of the OpenMP stacks for private data. You can query or set the Unix stack with ulimit:

[] ulimit -s
64000
[] ulimit -s 8192
[] ulimit -s
8192

The Unix stack can grow dynamically as space is needed. This does not hold for the OpenMP stacks: they are immediately allocated at their requested size. Thus it is important not to make them too large.

22.3 Data in dynamic scope

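Functions that are called from a parallel region fall in the dynamic scope of that parallel region. The rules for variables in such a function are as follows: variables that are local to the function are private to the calling thread; static variables are shared; data that is reached through pointer or reference arguments keeps the status it has in the calling environment.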

Fortran note {Saved variables} Variables in subprograms are private, as in C, except if they have the Save attribute. This attribute is implicitly given to any variable that is value-initialized in its declaration.

In the following example we have two almost identical routines, except that the first does value-initialization on the local variable, thereby in effect making it shared. The second routine does not have that problem.

\fsnippetwithoutput{hellosavef}{examples/omp/f}{save}
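The same distinction can be sketched in C, where a static local variable plays the role of a saved Fortran variable. All function names are hypothetical; the atomic directive is there only to keep the update of the shared counter free of data races.

```c
/* Hypothetical routine, meant to be called from inside a parallel region. */
int count_call(void) {
  static int count = 0;  // static: a single instance, shared by all threads
  int mine;              // automatic: private to the calling thread
#pragma omp atomic capture
  mine = ++count;        // the shared counter needs protection
  return mine;
}

/* Automatic variables only: every calling thread has its own tmp. */
int twice_plus_one(int i) {
  int tmp = 2 * i;
  return tmp + 1;
}

/* Driver: both routines are in the dynamic scope of the parallel loop. */
void fill(int n, int *out) {
#pragma omp parallel for
  for (int i = 0; i < n; i++) {
    out[i] = twice_plus_one(i);
    count_call();
  }
}
```

After fill has run over n iterations, the shared counter has been incremented exactly n times, no matter how the iterations were divided over threads.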

22.4 Temporary variables in a loop

It is common to have a variable that is set and used in each loop iteration:

#pragma omp parallel for
for ( ... i ... ) {
  x = i*h;
  s = sin(x); c = cos(x);
  a[i] = s+c;
  b[i] = s-c;
}

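By the above rules, the variables x,s,c are all shared variables. However, the values they receive in one iteration are not used in the next iteration, so they behave in fact like private variables to each iteration. As written, the loop has a data race on them; you should make them private, either by declaring them inside the loop body or by listing them in a private clause.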

Sometimes, even if you forget to declare these temporaries as private, the code may still give the correct output. That is because the compiler can sometimes eliminate them from the loop body, since it detects that their values are not otherwise used.
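A sketch of the loop with its temporaries made private by declaring them in the loop body. The function name is hypothetical, and simple arithmetic stands in for the sine and cosine so that the example needs no math library.

```c
/* Hypothetical routine; x*x and 2*x stand in for sin(x) and cos(x). */
void fill_sum_and_difference(int n, double h, double *a, double *b) {
#pragma omp parallel for
  for (int i = 0; i < n; i++) {
    // declared in the loop body, so private to the executing thread
    double x = i * h;
    double s = x * x, c = 2 * x;
    a[i] = s + c;
    b[i] = s - c;
  }
}
```

If the temporaries have to stay declared outside the loop, adding private(x,s,c) to the directive has the same effect.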

22.5 Default

You can alter this default behavior with the default clause:

#pragma omp parallel default(shared) private(x)
{ ... }
#pragma omp parallel default(private) shared(matrix)
{ ... }

and if you want to play it safe:

#pragma omp parallel default(none) private(x) shared(matrix)
{ ... }
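As an illustration of the safe variant, here is a sketch in which every variable used in the construct has to be listed explicitly. The function and variable names are hypothetical.

```c
/* Hypothetical routine: scale an array in place.
   With default(none), any variable used in the construct that is not
   listed in a clause is reported as an error by the compiler. */
void scale(int n, double factor, double *data) {
  double tmp;
#pragma omp parallel for default(none) shared(n, factor, data) private(tmp)
  for (int i = 0; i < n; i++) {
    tmp = factor * data[i];   // tmp is explicitly private
    data[i] = tmp;
  }
}
```

The loop variable i is declared inside the construct and is therefore private without being listed.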

22.6 Array data

The rules for arrays are slightly different from those for scalar data:

  1. Statically allocated data, that is with a syntax like

    int array[100];
    integer,dimension(100) :: array

    can be shared or private, depending on the clause you use.
  2. Dynamically allocated data, that is, created with malloc or allocate, can only be shared.

Example of the first type: each thread gets a private copy of the array, properly initialized.

\csnippetwithoutput{privatearray}{examples/omp/c}{privarray}

Of course, since only the private copy is altered, the original array is unaffected.

On the other hand, in the following example each thread gets a private pointer, but all pointers point to the same object:

\csnippetwithoutput{privatepointer}{examples/omp/c}{pointarray}
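The two cases can be contrasted in a short sketch. This is not the book's own code: the function names are hypothetical, and the use of firstprivate to obtain initialized copies is an assumption.

```c
#include <stdlib.h>

/* Statically allocated array: every thread works on its own copy. */
int modify_private_array(void) {
  int array[4] = {0, 0, 0, 0};
#pragma omp parallel firstprivate(array)
  {
    array[0] = 1;        // changes only this thread's copy
  }
  return array[0];       // the original array is untouched
}

/* Dynamically allocated data: only the pointer is copied per thread. */
int modify_through_private_pointer(void) {
  int *p = malloc(sizeof(int));
  if (p == NULL) return -1;
  *p = 0;
#pragma omp parallel firstprivate(p)
  {
#pragma omp atomic write
    *p = 1;              // every private pointer refers to the same int
  }
  int result = *p;
  free(p);
  return result;
}
```

In the second function the write is visible after the region, because making the pointer private does not duplicate the memory it points to.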

22.7 First and last private

Above, you saw that private variables are completely separate from any variables by the same name in the surrounding scope. However, there are two cases where you may want some storage association between a private variable and a global counterpart.

First of all, private variables are created with an undefined value. You can force their initialization with firstprivate.

  int t=2;
#pragma omp parallel firstprivate(t)
  {
    t += f( omp_get_thread_num() );
    g(t);
  }

The variable t behaves like a private variable, except that it is initialized to the outside value.
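A self-contained variant of this pattern, sketched with hypothetical names. The reduction clause is used here only to collect what each thread computed, so that the effect can be checked.

```c
/* Hypothetical helper: returns the sum of the per-thread values of t,
   and stores the value of the outer t after the region. */
int firstprivate_total(int *outer_after) {
  int t = 2, total = 0;
#pragma omp parallel firstprivate(t) reduction(+ : total)
  {
    t += 1;        // starts from the outside value 2 in every thread
    total += t;    // so each thread contributes 3
  }
  *outer_after = t;  // the outer t is not changed by the region
  return total;
}
```

The returned total is three times the number of threads, which shows that every private copy started from the outside value.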

Remark Variables are firstprivate by default in tasks; see chapter OpenMP topic: Tasks.
End of remark

Secondly, you may want a private value to be preserved to the environment outside the parallel region. This really only makes sense in one case, where you preserve a private variable from the last iteration of a parallel loop, or the last section in a sections construct. This is done with lastprivate:

#pragma omp parallel for \
        lastprivate(tmp)
for (i=0; i<N; i++) {
  tmp = ......
  x[i] = .... tmp ....
}
 ..... tmp ....
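A runnable sketch of the same pattern, with a hypothetical function name and made-up loop contents; it assumes n is at least 1.

```c
/* Hypothetical routine: tmp is private inside the loop, and the value
   from the sequentially last iteration is copied out afterwards. */
int last_tmp(int n, int *x) {
  int tmp = -1;
#pragma omp parallel for lastprivate(tmp)
  for (int i = 0; i < n; i++) {
    tmp = 2 * i;
    x[i] = tmp + 1;
  }
  return tmp;   // value from iteration n-1, whichever thread executed it
}
```

The value that survives is the one from the last iteration in sequential order, not from whichever thread happened to finish last.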

22.8 Persistent data through threadprivate

Most data in OpenMP parallel regions is either inherited from the master thread and therefore shared, or temporary within the scope of the region and fully private. There is also thread-private data, which is not limited in lifetime to one parallel region. The threadprivate pragma is used to declare that each thread is to have a private copy of a variable:

#pragma omp threadprivate(var)

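The variable needs to have static storage: in C, a file-scope or static variable; in C++, also a static class member; in Fortran, a saved variable or a common block.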

22.8.1 Thread private initialization

If each thread needs a different value in its threadprivate variable, the initialization needs to happen in a parallel region.

In the following example a team of 7 threads is created, all of which set their thread-private variable. Later, this variable is read by a larger team: the variables that have not been set are undefined, though often simply zero:

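// threadprivate.c
#include <stdio.h>
#include <omp.h>

static int tp;
#pragma omp threadprivate(tp)

int main(int argc,char **argv) {

  // a team of 7 threads sets its thread-private copies
#pragma omp parallel num_threads(7)
  tp = omp_get_thread_num();

  // a larger team reads them; threads 7 and 8 never set theirs
#pragma omp parallel num_threads(9)
  printf("Thread %d has %d\n",omp_get_thread_num(),tp);

  return 0;
}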

Fortran note {Private common blocks} Named common blocks can be made thread-private with the syntax

!$OMP threadprivate( /blockname/ )

Example: \fsnippetwithoutput{threadprivf}{examples/omp/f}{priv}

On the other hand, if the thread private data starts out identical in all threads, the copyin clause can be used:

#pragma omp threadprivate(private_var)

private_var = 1;
#pragma omp parallel copyin(private_var)
  private_var += omp_get_thread_num();
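The effect of copyin can be checked with a small sketch; the function name is hypothetical, and the reduction is used only to count threads that did not receive the master's value.

```c
static int private_var;
#pragma omp threadprivate(private_var)

/* Hypothetical check: returns the number of threads whose copy of
   private_var differs from the master's value on entry to the region. */
int count_copyin_mismatches(int value) {
  int mismatches = 0;
  private_var = value;            // set on the initial thread only
#pragma omp parallel copyin(private_var) reduction(+ : mismatches)
  {
    if (private_var != value) mismatches++;
    private_var += 1;             // later changes stay local to the thread
  }
  return mismatches;
}
```

Without the copyin clause, the copies of the other threads would keep whatever value they had before the region.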

If one thread needs to set all thread private data to its value, the copyprivate clause can be used:

#pragma omp parallel
{
  ...
#pragma omp single copyprivate(private_var)
  private_var = read_data();
  ...
}
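A self-contained sketch of this broadcast follows. The function name is hypothetical, and a plain argument takes the place of the read_data call; the reduction only counts threads that did not receive the value.

```c
static int private_var;
#pragma omp threadprivate(private_var)

/* Hypothetical check: one thread sets private_var inside a single
   construct, and copyprivate broadcasts it to the other threads. */
int count_copyprivate_mismatches(int value) {
  int mismatches = 0;
#pragma omp parallel reduction(+ : mismatches)
  {
#pragma omp single copyprivate(private_var)
    private_var = value;          // executed by one thread only
    // after the single construct every thread's copy holds the value
    if (private_var != value) mismatches++;
  }
  return mismatches;
}
```

The broadcast happens at the end of the single construct, before any thread continues past it.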

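For threadprivate values to persist from one parallel region to the next, dynamic adjustment of the number of threads has to be switched off (OMP_DYNAMIC set to false), and the regions should use the same number of threads.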

22.8.2 Thread private example

The typical application for thread-private variables is in random number generators. A random number generator needs saved state, since it computes each next value from the current one. To have a parallel generator, each thread will create and initialize a private `current value' variable. This will persist even when the execution is not in a parallel region; it gets updated only in a parallel region.
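A minimal sketch of such a generator in C follows. All names are hypothetical, the linear congruential recurrence is a deliberately simple stand-in for a real generator, and unsigned int is assumed to be 32 bits wide. The preprocessor guard supplies a serial fallback when the code is compiled without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }  // serial fallback
#endif

/* The saved state of the generator: one copy per thread. */
static unsigned int rng_state = 1u;
#pragma omp threadprivate(rng_state)

/* Give every thread its own seed, derived from a common base. */
void rng_seed_all(unsigned int base) {
#pragma omp parallel
  rng_state = base + (unsigned int)omp_get_thread_num();
}

/* Advance the calling thread's state; no other thread is affected. */
unsigned int rng_next(void) {
  rng_state = 1664525u * rng_state + 1013904223u;
  return rng_state;
}
```

Since rng_next touches only the calling thread's copy of the state, it can be called from a parallel loop without synchronization. Seeds that differ only by the thread number give statistically weak streams with a generator this simple; the sketch is about the storage mechanism, not about the quality of the random numbers.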

Exercise

Calculate the area of the Mandelbrot set by random sampling. Initialize the random number generator separately for each thread; then use a parallel loop to evaluate the points. Explore performance implications of the different loop scheduling strategies.
End of exercise

C++ note The C++ random header has a threadsafe generator, by virtue of the statement in the standard that no STL object can rely on global state. The usual idiom cannot be made threadsafe because of the initialization:

static random_device rd;
static mt19937 rng(rd());

However, the following works:

// privaterandom.cxx
static random_device rd;
static mt19937 rng;
#pragma omp threadprivate(rd)
#pragma omp threadprivate(rng)

int main() {

#pragma omp parallel
  rng = mt19937(rd());

You can then use the generator safely and independently:
#pragma omp parallel
  {
    stringstream res;
    uniform_int_distribution<int> uni(1, 100);
    res << "Thread " << omp_get_thread_num() << ": " << uni(rng) << "\n";
    cout << res.str();
  }
End of C++ note

22.9 Allocators

OpenMP was initially designed for shared memory. With accelerators (see chapter OpenMP topic: Offloading), non-coherent memory was added to this. In recent versions of the OpenMP standard, the story is further complicated, to account for new memory types such as high-bandwidth memory and non-volatile memory.

There are several ways of using the OpenMP memory allocators.

Next, there are memory spaces. The binding between OpenMP identifiers and hardware is implementation defined.

22.9.1 Pre-defined types

Allocators: omp_default_mem_alloc, omp_large_cap_mem_alloc, omp_const_mem_alloc, omp_high_bw_mem_alloc, omp_low_lat_mem_alloc, omp_cgroup_mem_alloc, omp_pteam_mem_alloc, omp_thread_mem_alloc.

Memory spaces: omp_default_mem_space, omp_large_cap_mem_space, omp_const_mem_space, omp_high_bw_mem_space, omp_low_lat_mem_space.
