Co-array Fortran

Experimental html version of Parallel Programming in MPI, OpenMP, and PETSc by Victor Eijkhout. download the textbook at https:/
40.1 : History and design
40.2 : Compiling and running
40.3 : Basics
40.3.1 : Image identification
40.3.2 : Remote operations
40.3.3 : Synchronization
40.3.4 : Collectives
Back to Table of Contents

40 Co-array Fortran

This chapter explains the basic concepts of Co-array Fortran (CAF), and helps you get started on running your first program.

40.1 History and design


40.2 Compiling and running


CAF is built on the same SPMD design as MPI. Where MPI talks about processes or ranks, CAF calls the running instances of your program images.

The Intel compiler uses the flag -coarray=xxx, where xxx takes the values single, shared, distributed gpu.

It is possible to bake the number of images into the executable, but by default this is not done, and the number is determined at runtime by the environment variable FOR_COARRAY_NUM_IMAGES.

CAF cannot be mixed with OpenMP.

40.3 Basics


Co-arrays are defined by giving them, in addition to the Dimension attribute, a Codimension attribute, which is written with square brackets:

Complex,codimension[*] :: number
Integer,dimension(10,20,30),codimension[-1:1,*] :: grid

This means we are respectively declaring an array with a single number on each image, or a three-dimensional grid spread over a two-dimensional processor grid.

The traditional declaration syntax, with the codimension in square brackets after the variable name, can also be used:

Complex :: number[*]
Integer :: grid(10,20,30)[-1:1,*]

Unlike MPI, which normally only supports a linear process numbering, CAF allows for multi-dimensional process grids. The last codimension is always specified as *, meaning it is determined at runtime.

40.3.1 Image identification


As in other models, in CAF one can ask how many images/processes there are, and what the number of the current one is, with num_images and this_image respectively.

!! hello.F90
  write(*,*) "Hello from image ", this_image(), &
       "out of ", num_images()," total images"

If you call this_image with a co-array as argument, it will return the image index as a tuple of cosubscripts, rather than a linear index. Given such a set of subscripts, image_index will return the linear index.

The functions lcobound and ucobound take a co-array and give the lower and upper bounds on its cosubscripts: as an array with one entry per codimension, or as a single integer if the optional dim argument is given.
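The inquiry functions above can be tried out in a few lines. The following is a minimal sketch, not taken from the book's example set: the program name, the co-array cell, and the 2 x * image grid are assumptions chosen for illustration.

```fortran
program cobounds_sketch
  implicit none
  ! Hypothetical co-array: one integer per image,
  ! with the images arranged in a 2 x * grid.
  integer :: cell[2,*]
  integer :: sub(2)

  ! Cosubscripts of this image in the image grid of `cell`.
  sub = this_image(cell)

  write(*,*) "image", this_image(), &
       " has cosubscripts", sub, &
       " which map back to", image_index(cell,sub)

  ! The cobounds are the same on every image; print them once.
  if (this_image()==1) then
     write(*,*) "lower cobounds:", lcobound(cell)
     write(*,*) "upper cobounds:", ucobound(cell)
     write(*,*) "upper cobound of codimension 1:", ucobound(cell,dim=1)
  end if
end program cobounds_sketch
```

The upper bound of the last codimension depends on the number of images the program is started with, so its value will differ from run to run.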

40.3.2 Remote operations


The appeal of CAF is that moving data between images looks (almost) like an ordinary copy operation:

real :: x(2)[*]
integer :: p
p = this_image()
! the last image has no right neighbor to write to
if (p<num_images()) x(1)[ p+1 ] = x(2)[ p ]

Exchanging grid boundaries is elegantly done with array syntax:

Real,Dimension( 0:N+1,0:N+1 ) :: grid[*]
! rows 0 and N+1 are ghost rows, filled from the neighbors' interior rows
grid( N+1,: )[p] = grid( 1,: )[p+1]
grid(   0,: )[p] = grid( N,: )[p-1]

40.3.3 Synchronization


The Fortran standard forbids race conditions:

If a variable is defined on an image in a segment, it shall not be referenced, defined or become undefined in a segment on another image unless the segments are ordered.

That is, you should not cause them to happen. The language and runtime are certainly not going to help you with that.

Well, a little. After remote updates you can synchronize images with the sync statement. The easiest variant is a global synchronization:

sync all

Compare this to a wait call after MPI nonblocking calls.
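To see where the global synchronization goes in practice, here is a minimal sketch of a complete program that shifts one value to the right neighbor. The program name and the initial values are assumptions made for illustration; the co-array x follows the remote-copy example earlier in this chapter.

```fortran
program shift_sketch
  implicit none
  real :: x(2)[*]
  integer :: p

  p = this_image()
  x(1) = 0.0
  x(2) = real(p)   ! hypothetical data: each image sends its own number

  ! Every image must have initialized x before anyone writes remotely.
  sync all

  ! Write my x(2) into x(1) of the right neighbor, if there is one.
  if (p < num_images()) x(1)[p+1] = x(2)

  ! Wait until all remote writes are done before reading x(1) locally.
  sync all

  write(*,*) "image", p, " received", x(1)
end program shift_sketch
```

Without the first sync all, the local initialization of x(1) on one image and the remote write to it from its left neighbor would be unordered, which is exactly the situation the standard forbids.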

More fine-grained, one can synchronize with specific images:

sync images( (/ p-1,p,p+1 /) )

While remote operations in CAF are nicely one-sided, synchronization is not: if image p issues a call

sync images( q )

then q also needs to issue a mirroring call to synchronize with p.

As an illustration, the following code is not a correct implementation of a ping-pong, because nothing orders the write by image 1 before the read of that same value on image 2:

!! pingpong.F90
  sync all
  if (procid==1) then
     number[procid+1] = number[procid]
  else if (procid==2) then
     number[procid-1] = 2*number[procid]
  end if
  sync all

We can solve this with a global synchronization:

sync all
if (procid==1) &
     number[procid+1] = number[procid]
sync all
if (procid==2) &
     number[procid-1] = 2*number[procid]
sync all

or a local one:

if (procid==1) &
     number[procid+1] = number[procid]
if (procid<=2) sync images( (/1,2/) )
if (procid==2) &
     number[procid-1] = 2*number[procid]
if (procid<=2) sync images( (/2,1/) )

Note that the local sync call is done on both images involved.
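For reference, the locally synchronized ping-pong can be written out as a complete program. This is a sketch under assumptions: the program name, the integer type of number, and the starting values are made up for illustration, and each partner names only the other image in its sync images call.

```fortran
program pingpong_sketch
  implicit none
  integer :: number[*]
  integer :: procid

  procid = this_image()
  if (num_images() < 2) error stop "this sketch needs at least two images"

  ! Hypothetical starting values: image 1 holds 1, all others hold 0.
  number = 0
  if (procid==1) number = 1
  sync all   ! initialization is complete on every image

  ! Ping: image 1 writes its value into image 2.
  if (procid==1) number[2] = number
  ! Both partners name each other, so the two calls mirror.
  if (procid==1) sync images( 2 )
  if (procid==2) sync images( 1 )

  ! Pong: image 2 sends back twice the value it received.
  if (procid==2) number[1] = 2*number
  if (procid==1) sync images( 2 )
  if (procid==2) sync images( 1 )

  if (procid==1) write(*,*) "image 1 now holds", number
end program pingpong_sketch
```

Images other than 1 and 2 take no part in the exchange and never wait on it, which is the advantage over the sync all version.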

Example of how you would synchronize a collective:

if ( this_image() .eq. 1 ) sync images( * )
if ( this_image() .ne. 1 ) sync images( 1 )

Here image 1 synchronizes with all others, but the others don't synchronize with each other.
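As a sketch of how this pattern could be used, the following program lets image 1 distribute a value to all other images by hand. The program name, the co-array payload, and the value 42 are assumptions chosen for illustration.

```fortran
program manual_broadcast_sketch
  implicit none
  integer :: payload[*]
  integer :: i

  payload = 0
  if (this_image()==1) payload = 42   ! hypothetical value to distribute
  sync all   ! initialization is complete on every image

  if (this_image()==1) then
     ! Image 1 writes the value into every other image ...
     do i=2,num_images()
        payload[i] = payload
     end do
     ! ... and then synchronizes with all of them.
     sync images( * )
  else
     ! The other images only wait for image 1.
     sync images( 1 )
  end if

  write(*,*) "image", this_image(), " has payload", payload
end program manual_broadcast_sketch
```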

Similarly, in a linear ordering each image can synchronize with only its neighbors:

if (procid==1) then
   sync images( (/procid+1/) )
else if (procid==nprocs) then
   sync images( (/procid-1/) )
else
   sync images( (/procid-1,procid+1/) )
end if

40.3.4 Collectives


Collectives are not part of CAF as of the 2008 Fortran standard.
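The Fortran 2018 standard did add collective subroutines: co_sum, co_min, co_max, co_broadcast, and co_reduce. The following is a minimal sketch of a sum over all images, assuming a compiler that supports these routines; the program and variable names are made up for illustration.

```fortran
program cosum_sketch
  implicit none
  integer :: contribution

  ! Each image contributes its own image number.
  contribution = this_image()

  ! Without a result_image argument, every image
  ! receives the sum over all images.
  call co_sum(contribution)

  if (this_image()==1) &
       write(*,*) "sum of image numbers:", contribution
end program cosum_sketch
```

Note that the argument of co_sum does not have to be a co-array; an ordinary variable suffices.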
