OpenMP remaining topics

28.1 : Runtime functions, environment variables, internal control variables
28.2 : Timing
28.3 : Thread safety
28.4 : Performance and tuning
28.5 : Accelerators
28.6 : Tools interface
28.7 : OpenMP standards
28.8 : Memory model
28.8.1 : Dekker's algorithm

28 OpenMP remaining topics

28.1 Runtime functions, environment variables, internal control variables


OpenMP has a number of settings that can be set through environment variables, and both queried and set through library routines. These settings are called ICVs: an OpenMP implementation behaves as if there is an internal variable storing each setting.
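For instance (a minimal sketch, not a complete enumeration of routines or variables): the nthreads-var ICV can be set before the run with the OMP_NUM_THREADS environment variable, set during the run with omp_set_num_threads, and queried with omp_get_max_threads.

#include <stdio.h>
#include <omp.h>

int main(void) {
  // nthreads-var may have been set through OMP_NUM_THREADS
  printf("nthreads-var at startup: %d\n", omp_get_max_threads());

  // set the same ICV through a library routine, then query it again
  omp_set_num_threads(4);
  printf("nthreads-var after omp_set_num_threads: %d\n", omp_get_max_threads());
  return 0;
}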

The runtime functions are:

Here are the OpenMP environment variables:

There are 4 ICVs that behave as if each thread has its own copy of them. The default is implementation-defined unless otherwise noted.

Nonobvious syntax for the OMP_SCHEDULE variable, which combines a schedule kind with an optional chunk size:

export OMP_SCHEDULE="static,100"
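This variable only takes effect for loops that are declared with the runtime schedule; a minimal sketch (the reduction loop is purely illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
  double sum = 0.;
  // the actual schedule comes from the run-sched-var ICV,
  // which OMP_SCHEDULE sets, for instance to "static,100"
#pragma omp parallel for schedule(runtime) reduction(+:sum)
  for (int i=0; i<10000; i++)
    sum += (double)i;
  printf("sum: %.1f\n", sum);
  return 0;
}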

Other settings:

Other environment variables:

28.2 Timing


OpenMP has a wall clock timer routine omp_get_wtime:

double omp_get_wtime(void);
The starting point is arbitrary and is different for each program run; however, in one run it is identical for all threads. This timer has a resolution given by omp_get_wtick.
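A minimal sketch of timing a parallel region with these two routines:

#include <stdio.h>
#include <omp.h>

int main(void) {
  printf("timer resolution: %e sec\n", omp_get_wtick());

  double tstart = omp_get_wtime();
#pragma omp parallel
  {
    // the work to be timed goes here
  }
  double elapsed = omp_get_wtime() - tstart;
  printf("elapsed time: %e sec\n", elapsed);
  return 0;
}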

Exercise Use the timing routines to demonstrate speedup from using multiple threads.


End of exercise

28.3 Thread safety


With OpenMP it is relatively easy to take existing code and make it parallel by introducing parallel sections. If you're careful to declare the appropriate variables shared and private, this may work fine. However, your code may include calls to library routines that include a race condition; such code is said not to be thread-safe.

For example, a routine

static int isave;
int next_one() {
 int i = isave;
 isave += 1;
 return i;
}

 ...
#pragma omp parallel for
for ( .... ) {
  int ivalue = next_one();
}
has a clear race condition: the iterations of the loop may get different next_one values, as they are supposed to, or not. This can be solved by using a critical pragma around the next_one call; another solution is to use a threadprivate declaration for isave. This is for instance the right solution if the next_one routine implements a random number generator.
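A minimal sketch of the threadprivate solution, assuming it is acceptable that every thread keeps its own independent counter:

// every thread gets its own persistent copy of isave,
// so there is no race on a shared counter
static int isave = 0;
#pragma omp threadprivate(isave)

int next_one() {
  int i = isave;
  isave += 1;
  return i;
}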

28.4 Performance and tuning


A standard tool for measuring the overhead of OpenMP constructs is the microbenchmark suite of [epcc-ompbench].

The performance of an OpenMP code can be influenced by the following.

28.5 Accelerators


As of the OpenMP-4.0 standard there is support for offloading work to an accelerator or co-processor:

#pragma omp target [clauses]
with clauses such as map, which moves data between host and device memory.
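A minimal sketch of an offloaded loop, assuming an offload-capable compiler and an available device; if no device is present, implementations typically execute the region on the host:

#include <stdio.h>

int main(void) {
  const int n = 1000;
  double x[1000];

  // copy x to the device, run the loop there, copy the result back
#pragma omp target teams distribute parallel for map(tofrom: x[0:n])
  for (int i=0; i<n; i++)
    x[i] = 2.0 * i;

  printf("x[%d] = %f\n", n-1, x[n-1]);
  return 0;
}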

28.6 Tools interface


The OpenMP-5.0 standard defines a tools interface (OMPT). This means that routines can be defined that get called by the OpenMP runtime. For instance, the following example defines callbacks that are evaluated when OpenMP is initialized and finalized, thereby reporting the total runtime of the application.

int ompt_initialize(ompt_function_lookup_t lookup, int initial_device_num,
                    ompt_data_t *tool_data) {
  printf("libomp init time: %f\n",
         omp_get_wtime() - *(double *)(tool_data->ptr));
  *(double *)(tool_data->ptr) = omp_get_wtime();
  return 1; // success: activates tool
}

void ompt_finalize(ompt_data_t *tool_data) {
  printf("application runtime: %f\n",
         omp_get_wtime() - *(double *)(tool_data->ptr));
}

ompt_start_tool_result_t *ompt_start_tool(unsigned int omp_version,
                                          const char *runtime_version) {
  static double time = 0; // static definition needs constant assignment
  time = omp_get_wtime();
  static ompt_start_tool_result_t ompt_start_tool_result = {
      &ompt_initialize, &ompt_finalize, {.ptr = &time}};
  return &ompt_start_tool_result; // success: registers tool
}  
(Example courtesy of https://git.rwth-aachen.de/OpenMPTools/OMPT-Examples.)

28.7 OpenMP standards


Here is the correspondence between the value of the _OPENMP macro, which encodes the year and month of the standard's release, and the OpenMP standard versions:

// version.c
int standard = _OPENMP;
printf("Supported OpenMP standard: %d\n",standard);
switch (standard) {
case 201511: printf("4.5\n");
  break;
case 201611: printf("Technical report 4: information about 5.0 but not yet mandated.\n");
  break;
case 201811: printf("5.0\n");
  break;
case 202011: printf("5.1\n");
  break;
case 202111: printf("5.2\n");
  break;
default:
  printf("Unrecognized version\n");
  break;
}

The openmp.org website maintains a record of which compilers support which standards: https://www.openmp.org/resources/openmp-compilers-tools/.

28.8 Memory model


28.8.1 Dekker's algorithm


A standard illustration of the weak memory model is Dekker's algorithm. We model that in OpenMP as follows:

// weak1.c
int a=0,b=0,r1,r2;
#pragma omp parallel sections shared(a, b, r1, r2)
{
#pragma omp section
  {
	a = 1;
	r1 = b;
	tasks++;
  }
#pragma omp section
  {
	b = 1;
	r2 = a;
	tasks++;
  }
}

Under any reasonable interpretation of parallel execution, the possible values for r1,r2 are $1,1$; $0,1$; or $1,0$. This is known as sequential consistency: the parallel outcome is consistent with a sequential execution that interleaves the parallel computations, respecting their local statement orderings. (See also Eijkhout:IntroHPC.)

However, running this, we get a small number of cases where $r_1=r_2=0$. There are two possible explanations:

  1. The compiler is allowed to interchange the first and second statements, since there is no dependence between them; or
  2. The thread is allowed to have a local copy of the variable that is not coherent with the value in memory.

We fix this by flushing both a and b:

// weak2.c
int a=0,b=0,r1,r2;
#pragma omp parallel sections shared(a, b, r1, r2)
{
#pragma omp section
  {
	a = 1;
#pragma omp flush (a,b)
	r1 = b;
	tasks++;
  }
#pragma omp section
  {
	b = 1;
#pragma omp flush (a,b)
	r2 = a;
	tasks++;
  }
}
