PETSc can be configured with the options --with-cuda --with-cudac=nvcc. You can test for the presence of CUDA with:
// cudainstalled.c
#ifndef PETSC_HAVE_CUDA
#error "CUDA is not installed in this version of PETSc"
#endif
Some GPUs can accommodate MPI by being directly connected to the network through GPUDirect RMA. If not, use this runtime option:

-use_gpu_aware_mpi 0

More conveniently, add this option to your .petscrc file; see section 38.3.3.
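A minimal sketch of such a .petscrc file, containing just the runtime option discussed here:

```shell
# .petscrc: options that PETSc reads at startup
-use_gpu_aware_mpi 0
```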
GPUs need to be initialized. This can be done implicitly when a GPU object is created, or explicitly through PetscCUDAInitialize:
// cudainit.c
ierr = PetscCUDAInitialize(comm,PETSC_DECIDE);
ierr = PetscCUDAInitializeCheck();
Objects such as matrices and vectors need to be created explicitly with a CUDA type. After that, most PETSc calls are independent of the presence of GPUs. Should you need to test for it, there is the CPP macro PETSC_HAVE_CUDA.
Analogous to the vector creation calls seen before, there are specific creation calls VecCreateSeqCUDA and VecCreateMPICUDAWithArray, or the type can be set with VecSetType:
// kspcu.c
#ifdef PETSC_HAVE_CUDA
ierr = VecCreateMPICUDA(comm,localsize,PETSC_DECIDE,&Rhs);
#else
ierr = VecCreateMPI(comm,localsize,PETSC_DECIDE,&Rhs);
#endif
The type VECCUDA is sequential or parallel depending on how the program is run; the specific types are VECSEQCUDA and VECMPICUDA.
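As a sketch of the VecSetType route (assuming a communicator comm and a local size localsize as in the example above):

```c
// sketch: select the CUDA vector type at runtime with VecSetType
Vec x;
ierr = VecCreate(comm,&x);
ierr = VecSetSizes(x,localsize,PETSC_DECIDE);
#ifdef PETSC_HAVE_CUDA
ierr = VecSetType(x,VECCUDA);
#else
ierr = VecSetType(x,VECSTANDARD);
#endif
```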
ierr = MatCreate(comm,&A);
#ifdef PETSC_HAVE_CUDA
ierr = MatSetType(A,MATMPIAIJCUSPARSE);
#else
ierr = MatSetType(A,MATMPIAIJ);
#endif
Dense matrices can be created with the specific calls MatCreateDenseCUDA and MatCreateSeqDenseCUDA, or by setting the types MATDENSECUDA, MATSEQDENSECUDA, MATMPIDENSECUDA.
For sparse matrices there is the type MATAIJCUSPARSE, which is sequential or distributed depending on how the program is started; the specific types are MATSEQAIJCUSPARSE and MATMPIAIJCUSPARSE.
There are `array' operations such as MatDenseCUDAGetArray and VecCUDAGetArray that give access to the data on the device.
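As a sketch (assuming a CUDA vector x created as above), the get/restore pair yields a device pointer that can be handed to a CUDA kernel:

```c
#ifdef PETSC_HAVE_CUDA
PetscScalar *d_array;
ierr = VecCUDAGetArray(x,&d_array);
// d_array points to device memory; pass it to a CUDA kernel here
ierr = VecCUDARestoreArray(x,&d_array);
#endif
```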
The memories of a CPU and a GPU are not coherent, so routines such as PetscMalloc1 cannot immediately be used for GPU-related allocation. Use PetscMallocSetCUDAHost to make PetscMalloc allocate page-locked host memory that the GPU can access efficiently, and PetscMallocResetCUDAHost to switch back:
// cudamatself.c
Mat cuda_matrix;
PetscScalar *matdata;
ierr = PetscMallocSetCUDAHost();
ierr = PetscMalloc1(global_size*global_size,&matdata);
ierr = PetscMallocResetCUDAHost();
ierr = MatCreateDenseCUDA(comm,
   global_size,global_size,global_size,global_size,
   matdata,&cuda_matrix);