PETSc GPU support

37.1 : Installation with GPUs
37.2 : Setup for GPU
37.3 : Distributed objects
37.3.1 : Vectors
37.3.2 : Matrices
37.3.3 : Array access
37.4 : Other

37 PETSc GPU support

37.1 Installation with GPUs

PETSc can be configured with the options

--with-cuda --with-cudac=nvcc
You can test the presence of CUDA with:

// cudainstalled.c
#include <petsc.h>
#ifndef PETSC_HAVE_CUDA
#error "CUDA is not installed in this version of PETSC"
#endif

Some GPUs can accommodate MPI by being directly connected to the network through GPUDirect RDMA. If not, use this runtime option:

-use_gpu_aware_mpi 0
More conveniently, add this option to your .petscrc file; see section 38.3.3.
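
As a minimal sketch (assuming the default location of the options file in your home directory), the file simply lists the option on a line of its own:

# contents of ~/.petscrc: disable GPU-aware MPI at runtime
-use_gpu_aware_mpi 0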

37.2 Setup for GPU

GPUs need to be initialized. This can be done implicitly when a GPU object is created, or explicitly through PetscDeviceInitialize. (PETSc versions before 3.17 had an explicit routine PetscCUDAInitialize.)

// cudainit.c
PetscDeviceType cuda = PETSC_DEVICE_CUDA;
// explicitly initialize the CUDA device
ierr = PetscDeviceInitialize(cuda); 
// query whether that device type has been initialized
PetscBool has_cuda;
has_cuda = PetscDeviceInitialized(cuda); 

37.3 Distributed objects

Objects such as matrices and vectors need to be created explicitly with a CUDA type. After that, most PETSc calls are independent of the presence of GPUs.

Should you need to test for GPU support in your code, there is the CPP macro PETSC_HAVE_CUDA.

37.3.1 Vectors

Analogous to ordinary vector creation, there are specific creation calls VecCreateSeqCUDA and VecCreateMPICUDAWithArray, or the type can be set with VecSetType:

// kspcu.c
#ifdef PETSC_HAVE_CUDA
  ierr = VecCreateMPICUDA(comm,localsize,PETSC_DECIDE,&Rhs); 
#else
  ierr = VecCreateMPI(comm,localsize,PETSC_DECIDE,&Rhs); 
#endif

The type VECCUDA is sequential or parallel depending on how the program is run; the specific types are VECSEQCUDA and VECMPICUDA.
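
As a sketch of the VecSetType route (assuming a communicator comm and a local size localsize as in the fragment above):

// sketch: create a vector and set a CUDA type explicitly
Vec x;
ierr = VecCreate(comm,&x); 
ierr = VecSetSizes(x,localsize,PETSC_DECIDE); 
#ifdef PETSC_HAVE_CUDA
  ierr = VecSetType(x,VECCUDA); 
#else
  ierr = VecSetType(x,VECSTANDARD); 
#endif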

37.3.2 Matrices

Matrices can similarly be given a CUDA type with MatSetType:

ierr = MatCreate(comm,&A); 
#ifdef PETSC_HAVE_CUDA
ierr = MatSetType(A,MATMPIAIJCUSPARSE); 
#else
ierr = MatSetType(A,MATMPIAIJ); 
#endif

Dense matrices can be created with the specific calls MatCreateDenseCUDA and MatCreateSeqDenseCUDA, or by setting the types MATDENSECUDA, MATSEQDENSECUDA, MATMPIDENSECUDA.
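
A sketch of the type-setting route for a dense matrix, assuming a global size N:

// sketch: dense matrix with a CUDA type set after ordinary creation
Mat D;
ierr = MatCreate(comm,&D); 
ierr = MatSetSizes(D,PETSC_DECIDE,PETSC_DECIDE,N,N); 
#ifdef PETSC_HAVE_CUDA
  ierr = MatSetType(D,MATDENSECUDA); 
#else
  ierr = MatSetType(D,MATDENSE); 
#endif
ierr = MatSetUp(D); 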

Sparse matrices have the type MATAIJCUSPARSE, which is sequential or distributed depending on how the program is started; the specific types are MATSEQAIJCUSPARSE and MATMPIAIJCUSPARSE.

37.3.3 Array access

All sorts of `array' operations have CUDA versions, such as MatDenseCUDAGetArray and VecCUDAGetArray.
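
As a sketch (assuming a vector x of a CUDA type), the array obtained this way is a device pointer that can be handed to a CUDA kernel:

// sketch: access the device data of a CUDA vector
PetscScalar *d_x;
ierr = VecCUDAGetArray(x,&d_x); 
/* ... launch a CUDA kernel on d_x ... */
ierr = VecCUDARestoreArray(x,&d_x); 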

PetscMalloc can be directed to the GPU with PetscMallocSetCUDAHost, and switched back with PetscMallocResetCUDAHost.

37.4 Other

The memories of a CPU and GPU are not coherent. This means that routines such as PetscMalloc1 cannot immediately be used for GPU allocation. Use the routines PetscMallocSetCUDAHost and PetscMallocResetCUDAHost to switch the allocator to GPU memory and back.

// cudamatself.c
Mat cuda_matrix;
PetscScalar *matdata;
// switch the PetscMalloc allocator for GPU use
ierr = PetscMallocSetCUDAHost(); 
// allocate the matrix data
ierr = PetscMalloc1(global_size*global_size,&matdata); 
// switch the allocator back
ierr = PetscMallocResetCUDAHost(); 
// create a dense CUDA matrix that uses the allocated data
ierr = MatCreateDenseCUDA
  (comm,
   global_size,global_size,global_size,global_size,
   matdata,
   &cuda_matrix); 
