Getting started with MPI


1.1 : Distributed memory and message passing
1.2 : History
1.3 : Basic model
1.4 : Making and running an MPI program
1.5 : Language bindings
1.5.1 : C
1.5.2 : C++, including MPL
1.5.3 : Fortran
1.5.4 : Python
1.5.5 : How to read routine signatures
1.5.5.1 : C
1.5.5.2 : Fortran
1.5.5.3 : Python
1.6 : Review

1 Getting started with MPI

In this chapter you will learn the use of the main tool for distributed memory programming: the MPI library. The MPI library has about 250 routines, many of which you may never need. Since this is a textbook, not a reference manual, we will focus on the important concepts and the essential routines for each. What you learn here should be enough for most common purposes. You are advised to keep a reference document handy, in case you need a specialized routine, or to look up subtleties about the routines you use.

1.1 Distributed memory and message passing


In its simplest form, a distributed memory machine is a collection of single computers hooked up with network cables. In fact, this has a name: a Beowulf cluster  . As you recognize from that setup, each processor can run an independent program, and has its own memory without direct access to other processors' memory. MPI is the magic that makes multiple instantiations of the same executable run so that they know about each other and can exchange data through the network.

One of the reasons that MPI is so successful as a tool for high performance on clusters is that it is very explicit: the programmer controls many details of the data motion between the processors. Consequently, a capable programmer can write very efficient code with MPI. Unfortunately, that programmer will have to spell things out in considerable detail. For this reason, people sometimes call MPI `the assembly language of parallel programming'. If that sounds scary, be assured that things are not that bad. You can get started fairly quickly with MPI, using just the basics, and coming to the more sophisticated tools only when necessary.

Another reason that MPI was a big hit with programmers is that it does not ask you to learn a new language: it is a library that can be interfaced to C/C++ or Fortran; there are even bindings to Python and Java (not described in this course). This does not mean, however, that it is simple to `add parallelism' to an existing sequential program. An MPI version of a serial program takes considerable rewriting; certainly more than shared memory parallelism through OpenMP, discussed later in this book.

MPI is also easy to install: there are free implementations that you can download and install on any computer that has a Unix-like operating system, even if that is not a parallel machine. However, if you are working on a supercomputer cluster, likely there will already be an MPI installation, tuned to that machine's network.

1.2 History


Before the MPI standard was developed in 1993-4, there were many libraries for distributed memory computing, often proprietary to a vendor platform. MPI standardized the inter-process communication mechanisms. Other features, such as the process management found in PVM, or parallel I/O, were omitted. Later versions of the standard have included many of these features.

Since MPI was designed by a large number of academic and commercial participants, it quickly became a standard. A few packages from the pre-MPI era, such as Charm++ [charmpp], are still in use since they support mechanisms that do not exist in MPI.

1.3 Basic model


Here we sketch the two most common scenarios for using MPI. In the first, the user is working on an interactive machine, which has network access to a number of hosts, typically a network of workstations; see figure  1.1  .

FIGURE 1.1: Interactive MPI setup

The user types the command mpiexec (a common variant is mpirun; your local cluster may have a different mechanism) and supplies the number of processes to start, the name of a hostfile listing the hosts to run on, and the name of the executable.

The mpiexec program then makes an ssh connection to each of the hosts, giving them sufficient information that they can find each other. All the output of the processors is piped through the mpiexec program, and appears on the interactive console.

In the second scenario (figure  1.2  ) the user prepares a batch job script with commands, and these will be run when the batch scheduler gives a number of hosts to the job.

FIGURE 1.2: Batch MPI setup

Now the batch script contains the mpiexec command, or some variant such as ibrun, and the hostfile is dynamically generated when the job starts. Since the job now runs at a time when the user may not be logged in, any screen output goes into an output file.

You see that in both scenarios the parallel program is started by the mpiexec command using an SPMD mode of execution: all hosts execute the same program. It is possible for different hosts to execute different programs, but we will not consider that in this book.
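
To see what SPMD execution looks like in practice, here is a minimal sketch of an MPI program in C; the routines used (MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize) are discussed in later chapters. Every process runs this same code, but each reports a different rank number.

#include <stdio.h>
#include <mpi.h>

int main(int argc,char **argv) {
  // set up the MPI environment
  MPI_Init(&argc,&argv);
  // how many processes are there, and which one am I?
  int nprocs,procno;
  MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD,&procno);
  printf("Hello from process %d out of %d\n",procno,nprocs);
  // clean up
  MPI_Finalize();
  return 0;
}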

There can be options and environment variables that are specific to some MPI installations, or to the network.

1.4 Making and running an MPI program


MPI is a library, called from programs in ordinary programming languages such as C/C++ or Fortran. To compile such a program you use your regular compiler:

gcc -c my_mpi_prog.c -I/path/to/mpi.h
gcc -o my_mpi_prog my_mpi_prog.o -L/path/to/mpi -lmpich
However, MPI libraries may have different names on different architectures, making it hard to have a portable makefile. Therefore, MPI installations typically supply shell script wrappers around your compiler call, named mpicc, mpicxx, and mpif90 for C, C++, and Fortran respectively.
mpicc -c my_mpi_prog.c
mpicc -o my_mpi_prog my_mpi_prog.o

If you want to know what mpicc does, there is usually an option that prints out its definition. On a Mac with the clang compiler:

mpicc -show
clang -fPIC -fstack-protector -fno-stack-check -Qunused-arguments -g3 -O0 -Wno-implicit-function-declaration -Wl,-flat_namespace -Wl,-commons,use_dylibs -I/Users/eijkhout/Installation/petsc/petsc-3.16.1/macx-clang-debug/include -L/Users/eijkhout/Installation/petsc/petsc-3.16.1/macx-clang-debug/lib -lmpi -lpmpi

Remark In OpenMPI, these commands are binary executables by default, but you can make them shell scripts by passing the --enable-script-wrapper-compilers option at configure time.
End of remark

MPI programs can be run on many different architectures. Obviously it is your ambition (or at least your dream) to run your code on a cluster with a hundred thousand processors and a fast network. But maybe you only have a small cluster with plain ethernet  . Or maybe you're sitting in a plane, with just your laptop. An MPI program can be run in all these circumstances -- within the limits of your available memory of course.

The way this works is that you do not start your executable directly, but you use a program, typically called mpiexec or something similar, which makes a connection to all available processors and starts a run of your executable there. So if you have a thousand nodes in your cluster, mpiexec can start your program once on each, and if you only have your laptop it can start a few instances there. In the latter case you will of course not get great performance, but at least you can test your code for correctness.
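
For instance, on a typical installation the following sketch would start four instances of the my_mpi_prog executable compiled above; the exact name of the starter and its flags may differ on your system.

mpiexec -n 4 ./my_mpi_prog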

1.5 Language bindings


1.5.1 C


The MPI library is written in C. However, the standard is careful to distinguish between MPI routines, versus their C bindings. In fact, as of MPI-4, for a number of routines there are two bindings, depending on whether you want 4-byte integers, or larger. See section 6.4, in particular 6.4.1.
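
For instance, a routine such as MPI_Send has, in addition to its traditional binding with int counts, an MPI-4 variant with an _c suffix that takes an MPI_Count argument for larger counts; a sketch of the two prototypes:

int MPI_Send(const void *buf,int count,MPI_Datatype datatype,
    int dest,int tag,MPI_Comm comm);
int MPI_Send_c(const void *buf,MPI_Count count,MPI_Datatype datatype,
    int dest,int tag,MPI_Comm comm);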

1.5.2 C++, including MPL


Earlier versions of the MPI standard had C++ bindings, but they were declared deprecated, and have been officially removed in the MPI-3 standard. Thus, MPI can be used from C++ by including

#include <mpi.h>
and using the C API.

The boost library has its own version of MPI, but it seems not to be under further development. A recent effort at idiomatic C++ support is MPL ( https://github.com/rabauke/mpl ). This book has an index of MPL notes and commands: section sec:idx:mpl.

MPL note MPL is a C++ header-only library. Notes on MPI usage from MPL will be indicated like this. End of MPL note

1.5.3 Fortran


Fortran note Fortran-specific notes will be indicated with a note like this. End of Fortran note

Traditionally, Fortran bindings for MPI look very much like the C ones, except that each routine has a final error return parameter. You will find that a lot of MPI code in Fortran conforms to this.

However, as of the MPI-3 standard it is recommended that an MPI implementation providing a Fortran interface supply a module named mpi_f08 that can be used in a Fortran program. This incorporates the following improvements:

  1. MPI handles, such as communicators, now have their own derived types, for instance Type(MPI_Comm), rather than being plain integers.
  2. The final error return parameter is now optional.

1.5.4 Python


Python note Python-specific notes will be indicated with a note like this. End of Python note

The mpi4py package [Dalcin:m4py12,mpi4py:homepage] of Python bindings is not defined by the MPI standards committee. Instead, it is the work of an individual, Lisandro Dalcin.

In a way, the Python interface is the most elegant. It uses OO techniques such as methods on objects, and many default arguments.

Notable about the Python bindings is that many communication routines exist in two variants:

  1. a version with an all-lowercase name, such as send, that can communicate arbitrary Python objects;
  2. a version with a capitalized name, such as Send, that operates on buffers such as numpy arrays.

The first version looks more `pythonic', is easier to write, and can do things like sending python objects, but it is also decidedly less efficient since data is packed and unpacked with pickle  . As a common sense guideline, use the numpy interface in the performance-critical parts of your code, and the pythonic interface only for complicated actions in a setup phase.

Codes with mpi4py can be interfaced to other languages through Swig or conversion routines.

A numpy data buffer can be specified as the object itself, or as a list [data, (count,displ), datatype].

1.5.5 How to read routine signatures


Throughout the MPI part of this book we will give the reference syntax of the routines. This typically comprises:

  1. the C prototype;
  2. the Fortran 2008 interface;
  3. the Python (mpi4py) syntax.

These `routine signatures' look like code but they are not! Here is how you translate them.

1.5.5.1 C


The typical C routine specification in MPI looks like:

int MPI_Comm_size(MPI_Comm comm,int *nprocs)
This means that

  1. the routine returns an integer error code;
  2. the comm argument, of type MPI_Comm, is an input parameter, passed by value;
  3. the nprocs argument is an output parameter: you pass the address of an int variable, into which the routine writes the number of processes.
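
A minimal sketch of how this signature translates into an actual call, inside a program where MPI has been initialized; the variable names are arbitrary:

MPI_Comm comm = MPI_COMM_WORLD;
int nprocs;
int errorcode = MPI_Comm_size( comm, &nprocs );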

1.5.5.2 Fortran


The Fortran specification looks like:

MPI_Comm_size(comm, size, ierror)
Type(MPI_Comm), Intent(In) :: comm
Integer, Intent(Out) :: size
Integer, Optional, Intent(Out) :: ierror
or for the Fortran90 legacy mode:
MPI_Comm_size(comm, size, ierror)
INTEGER, INTENT(IN) :: comm
INTEGER, INTENT(OUT) :: size
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
The syntax of using this routine is close to this specification: you write
Type(MPI_Comm) :: comm = MPI_COMM_WORLD
! legacy: Integer :: comm = MPI_COMM_WORLD
Integer :: size,ierr
CALL MPI_Comm_size( comm, size ) ! without the optional ierr

1.5.5.3 Python


The Python interface to MPI uses classes and objects. Thus, a specification like:

MPI.Comm.Send(self, buf, int dest, int tag=0)
should be parsed as follows: the self argument indicates that this is a method of a communicator object, so you invoke it as comm.Send(buf,dest); the buf argument can be any object describing a buffer; dest is an integer; and tag is an optional integer with default value zero. Some python routines are `class methods', and their specification lacks the self keyword. For instance:
MPI.Request.Waitall(type cls, requests, statuses=None)
would be used as
MPI.Request.Waitall(requests)

1.6 Review


Review What determines the parallelism of an MPI job?

  1. The size of the cluster you run on.
  2. The number of cores per cluster node.
  3. The parameters of the MPI starter (mpiexec, ibrun, ...)

End of review

Review T/F: the number of cores of your laptop is the limit of how many MPI processes you can start up.
End of review

Review Do the following languages have an object-oriented interface to MPI? In what sense?

  1. C
  2. C++
  3. Fortran2008
  4. Python

End of review
