26 OpenMP topic: Memory model

26.1 Thread synchronization


Let's consider a producer-consumer model. (This example is from Intel's excellent OpenMP course by Tim Mattson.) It can be implemented with sections, where one section, the producer, sets a flag when data is available, and the other, the consumer, waits until the flag is set.

#pragma omp parallel sections
{
  // the producer
  #pragma omp section
  {
    ... do some producing work ...
    flag = 1;
  }
  // the consumer
  #pragma omp section
  {
    while (flag==0) { }
    ... do some consuming work ...
  }
}

One reason this doesn't work is that the compiler sees that the flag, once set, is never used in the producing section, and that it is never changed in the consuming section, so it may optimize these statements, to the point of optimizing them away.
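For concreteness, here is a sketch (hypothetical transformed code, not from the book's sources) of what an optimizing compiler may turn the consumer's spin loop into, since nothing inside the loop modifies the flag:

// what was written:
while (flag==0) { }
// what may effectively be executed: the load of flag is hoisted out
int tmp = flag;
while (tmp==0) { }   // spins forever if the single read saw 0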

The producer then needs to do:

... do some producing work ...
#pragma omp flush
#pragma omp atomic write
  flag = 1;
#pragma omp flush(flag)

and the consumer does:

#pragma omp flush(flag)
while (flag==0) {
  #pragma omp flush(flag)
}
#pragma omp flush

Strictly speaking, this code still has a race condition on the flag variable.

The solution is to make the accesses to the flag atomic, using the atomic pragma: the producer has

#pragma omp atomic write
  flag = 1;

and the consumer:

int flag_read;
while (1) {
  #pragma omp flush(flag)
  #pragma omp atomic read
    flag_read = flag;
  if (flag_read==1) break;
}
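
For reference, here is a minimal self-contained sketch that combines the producer and consumer fragments above; the payload variable data and the value 42 are made up for illustration.

#include <stdio.h>
#include <omp.h>

int main() {
  int flag=0, data=0;
#pragma omp parallel sections shared(flag,data)
  {
    // the producer
#pragma omp section
    {
      data = 42;                        // stand-in for the producing work
#pragma omp flush                       // commit the data before signaling
#pragma omp atomic write
      flag = 1;
#pragma omp flush(flag)
    }
    // the consumer
#pragma omp section
    {
      int flag_read;
      while (1) {
#pragma omp flush(flag)
#pragma omp atomic read
        flag_read = flag;
        if (flag_read==1) break;
      }
#pragma omp flush                       // make the produced data visible
      printf("consumed: %d\n",data);    // stand-in for the consuming work
    }
  }
  return 0;
}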

26.2 Data races


OpenMP, being based on shared memory, has a potential for race conditions. These happen when two threads access the same data item without synchronization, and at least one of the accesses is a write. The problem with race conditions is that programmer convenience runs counter to efficient execution. For this reason, OpenMP simply does not allow some things that would be desirable.

For a simple example:

// race.c
int counter = 0;
#pragma omp parallel for shared(counter)
for (int i=0; i<count; i++)
  counter++;                 // unsynchronized update: a data race
printf("Counter should be %d, is %d\n",
       count,counter);
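
One way to get the intended result, sketched here rather than taken from race.c, is to let OpenMP combine per-thread partial results with a reduction clause; an atomic update of counter would also work, at a higher cost per iteration.

int counter = 0;
#pragma omp parallel for reduction(+:counter)
for (int i=0; i<count; i++)
  counter++;   // each thread updates a private copy; copies are summed at the end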

The basic rule about multiple-thread access of a single data item is:

Any memory location that is written by one thread cannot be read by another thread in the same parallel region if no synchronization is done.

To start with that last clause: any workshare construct ends with an implicit barrier, so data written before that barrier can safely be read after it.
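
For instance (a made-up sketch, not from the book's sources; the arrays x,y, the size n, and the function f are placeholders), the implicit barrier at the end of the first loop below makes it safe for the second loop to read values that other threads wrote:

#pragma omp parallel
{
#pragma omp for              // implicit barrier at the end of this loop
  for (int i=0; i<n; i++)
    x[i] = f(i);
#pragma omp for              // safe: all of x has been written before this loop starts
  for (int i=0; i<n; i++)
    y[i] = x[i] + x[n-1-i];  // x[n-1-i] was likely written by another thread
}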

26.2.1 Dekker's algorithm


A standard illustration of the weak memory model is Dekker's algorithm. We model that in OpenMP as follows:

// weak1.c
int a=0,b=0,r1,r2;
#pragma omp parallel sections shared(a, b, r1, r2)
{
#pragma omp section
  {
    a = 1;
    r1 = b;
  }
#pragma omp section
  {
    b = 1;
    r2 = a;
  }
}

Under any reasonable interpretation of parallel execution, the possible values for r1,r2 are $1,1$; $0,1$; or $1,0$. This is known as sequential consistency: the parallel outcome is consistent with a sequential execution that interleaves the parallel computations, respecting their local statement orderings. (See also Eijkhout, Introduction to High Performance Scientific Computing.)

However, running this, we get a small number of cases where $r_1=r_2=0$. There are two possible explanations:

  1. The compiler is allowed to interchange the first and second statements, since there is no dependence between them; or
  2. The thread is allowed to have a local copy of the variable that is not coherent with the value in memory.

We fix this by flushing both a,b:

// weak2.c
int a=0,b=0,r1,r2;
#pragma omp parallel sections shared(a, b, r1, r2)
{
#pragma omp section
  {
    a = 1;
#pragma omp flush (a,b)
    r1 = b;
  }
#pragma omp section
  {
    b = 1;
#pragma omp flush (a,b)
    r2 = a;
  }
}
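
As an aside (an alternative sketch, not from the book's sources): since OpenMP 4.0 the same intent can be expressed with sequentially consistent atomic operations, which imply the necessary flushes, instead of explicit flush directives. The first section would then read as follows; the second is symmetric.

#pragma omp section
  {
#pragma omp atomic write seq_cst
    a = 1;
#pragma omp atomic read seq_cst
    r1 = b;
  }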

26.3 Relaxed memory model


OpenMP has a relaxed memory model: each thread may keep a temporary view of shared variables, and a flush operation makes that view consistent with memory. There is an implicit flush of all variables:

  • at the start and end of a parallel region;
  • at each barrier, whether explicit or implicit, such as at the end of a worksharing construct;
  • at entry and exit of a critical section;
  • when a lock is set or unset (illustrated in the sketch after this list).
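
To illustrate the last point, here is a sketch (the shared counter shared_total and the use of the thread number are made up for illustration) where the lock routines from omp.h provide both mutual exclusion and the flushes, so no explicit flush directive is needed:

int shared_total = 0;
omp_lock_t lock;
omp_init_lock(&lock);
#pragma omp parallel shared(shared_total)
{
  omp_set_lock(&lock);                   // setting the lock implies a flush
  shared_total += omp_get_thread_num();  // protected update of shared data
  omp_unset_lock(&lock);                 // unsetting the lock implies a flush
}
omp_destroy_lock(&lock);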
