Thread synchronization
Let's do a producer-consumer model\footnote{This example is from Intel's excellent OMP course by Tim Mattson}. This can be implemented with sections, where one section, the producer, sets a flag when data is available, and the other, the consumer, waits until the flag is set.
#pragma omp parallel sections
{
  // the producer
  #pragma omp section
  {
    ... do some producing work ...
    flag = 1;
  }
  // the consumer
  #pragma omp section
  {
    while (flag==0) { }
    ... do some consuming work ...
  }
}
One reason this doesn't work is that the compiler sees that the flag is never used in the producing section, and that it is never changed in the consuming section, so it may optimize these statements, to the point of optimizing them away entirely.
The producer then needs to do:
... do some producing work ...
#pragma omp flush
#pragma omp atomic write
flag = 1;
#pragma omp flush(flag)
and the consumer does:
#pragma omp flush(flag)
while (flag==0) {
  #pragma omp flush(flag)
}
#pragma omp flush
Strictly speaking, this code has a race condition on the flag variable. The solution is to make the accesses to the flag atomic, using the atomic pragma: the producer has
#pragma omp atomic write
flag = 1;
and the consumer:
while (1) {
  #pragma omp flush(flag)
  #pragma omp atomic read
  flag_read = flag;
  if (flag_read==1) break;
}
Data races
OpenMP, being based on shared memory, has a potential for race conditions. These happen when two threads access the same data item, at least one of them writing it. The problem with race conditions is that programmer convenience runs counter to efficient execution. For this reason, OpenMP simply does not allow some things that would be desirable.
For a simple example:
// race.c
#pragma omp parallel for shared(counter)
for (int i=0; i<count; i++)
  counter++;
printf("Counter should be %d, is %d\n",
       count,counter);
The basic rule about multiple-thread access of a single data item is:
Any memory location that is written by one thread cannot be read by another thread in the same parallel region, if no synchronization is done.
To start with that last clause: any workshare construct ends with an implicit barrier, so data written before that barrier can safely be read after it.
Dekker's algorithm
A standard illustration of the weak memory model is Dekker's algorithm. We model that in OpenMP as follows:
// weak1.c
int a=0,b=0,r1,r2;
#pragma omp parallel sections shared(a, b, r1, r2)
{
  #pragma omp section
  {
    a = 1;
    r1 = b;
    tasks++;
  }
  #pragma omp section
  {
    b = 1;
    r2 = a;
    tasks++;
  }
}
Under any reasonable interpretation of parallel execution, the possible values for r1,r2 are $1,1$; $0,1$; or $1,0$. This is known as sequential consistency: the parallel outcome is consistent with a sequential execution that interleaves the parallel computations, respecting their local statement orderings. (See also Eijkhout:IntroHPC.)
However, running this, we get a small number of cases where $r_1=r_2=0$. There are two possible explanations:
- the compiler is allowed to interchange the two independent statements in each section, so a read can be executed before the corresponding write; or
- the hardware can delay the visibility of a write, for instance in a store buffer, so the other thread reads a stale value.
We fix this by flushing both a,b:
// weak2.c
int a=0,b=0,r1,r2;
#pragma omp parallel sections shared(a, b, r1, r2)
{
  #pragma omp section
  {
    a = 1;
    #pragma omp flush (a,b)
    r1 = b;
    tasks++;
  }
  #pragma omp section
  {
    b = 1;
    #pragma omp flush (a,b)
    r2 = a;
    tasks++;
  }
}
Relaxed memory model