Installation with GPUs
PETSc can be configured with the options
--with-cuda --with-cudac=nvcc
You can test for the presence of CUDA with:
// cudainstalled.c
#ifndef PETSC_HAVE_CUDA
#error "CUDA is not installed in this version of PETSC"
#endif
Some GPUs can accommodate MPI by being directly connected to the network through GPUDirect RDMA. If not, use this runtime option:
-use_gpu_aware_mpi 0
More conveniently, add this option to your .petscrc file; see section 38.3.3.
Setup for GPU
GPUs need to be initialized. This can be done implicitly when a GPU object is created, or explicitly through PetscDeviceInitialize. (PETSc versions before 3.17 had an explicit routine PetscCUDAInitialize.)
// cudainit.c
PetscDeviceType cuda = PETSC_DEVICE_CUDA;
ierr = PetscDeviceInitialize(cuda);
PetscBool has_cuda;
has_cuda = PetscDeviceInitialized(cuda);
Distributed objects
Objects such as matrices and vectors need to be created explicitly with a CUDA type. After that, most PETSc calls are independent of the presence of GPUs.
Should you need to test, there is a CPP macro PETSC_HAVE_CUDA .
Vectors
Analogous to vector creation as before, there are specific create calls VecCreateSeqCUDA, VecCreateMPICUDAWithArray, or the type can be set in VecSetType:
// kspcu.c
#ifdef PETSC_HAVE_CUDA
ierr = VecCreateMPICUDA(comm,localsize,PETSC_DECIDE,&Rhs);
#else
ierr = VecCreateMPI(comm,localsize,PETSC_DECIDE,&Rhs);
#endif
The type VECCUDA is sequential or parallel depending on how the program is run; the specific types are VECSEQCUDA and VECMPICUDA.
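As a sketch of the VecSetType route (the variable names comm, localsize, rhs here are illustrative, not from the examples above):

// sketch: select a CUDA vector type at runtime via VecSetType
Vec rhs;
ierr = VecCreate(comm,&rhs);
ierr = VecSetSizes(rhs,localsize,PETSC_DECIDE);
#ifdef PETSC_HAVE_CUDA
ierr = VecSetType(rhs,VECCUDA);     // becomes VECSEQCUDA or VECMPICUDA
#else
ierr = VecSetType(rhs,VECSTANDARD);
#endif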
Matrices
ierr = MatCreate(comm,&A);
#ifdef PETSC_HAVE_CUDA
ierr = MatSetType(A,MATMPIAIJCUSPARSE);
#else
ierr = MatSetType(A,MATMPIAIJ);
#endif
Dense matrices can be created with the specific calls MatCreateDenseCUDA and MatCreateSeqDenseCUDA, or by setting the types MATDENSECUDA, MATSEQDENSECUDA, MATMPIDENSECUDA.
Sparse matrices use the type MATAIJCUSPARSE, which is sequential or distributed depending on how the program is started; the specific types are MATMPIAIJCUSPARSE and MATSEQAIJCUSPARSE.
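The type choice can also be left to the command line. A sketch, assuming an options-driven code path: call MatSetFromOptions instead of MatSetType, then run with -mat_type aijcusparse to switch to the GPU type without recompiling.

// sketch: let the matrix type be chosen at runtime;
// run with  -mat_type aijcusparse  to use the GPU
Mat A;
ierr = MatCreate(comm,&A);
ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,global_size,global_size);
ierr = MatSetFromOptions(A);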
Array access
All sorts of 'array' operations are available, such as MatDenseCUDAGetArray and VecCUDAGetArray.
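For instance, a sketch of accessing the raw array of a CUDA vector x (the returned pointer is a device pointer, so it is meant to be handed to GPU code, not dereferenced on the host):

// sketch: obtain the device pointer of a CUDA vector
PetscScalar *d_array;
ierr = VecCUDAGetArray(x,&d_array);
// ... d_array can be passed to a CUDA kernel here ...
ierr = VecCUDARestoreArray(x,&d_array);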
Set PetscMalloc to allocate memory usable by the GPU with PetscMallocSetCUDAHost, and switch back with PetscMallocResetCUDAHost.
Other
The memories of a CPU and GPU are not coherent, so routines such as PetscMalloc1 cannot immediately be used for GPU allocation. Use the routines PetscMallocSetCUDAHost and PetscMallocResetCUDAHost to switch the allocator to CUDA host (page-locked) memory and back.
// cudamatself.c
Mat cuda_matrix;
PetscScalar *matdata;
ierr = PetscMallocSetCUDAHost();
ierr = PetscMalloc1(global_size*global_size,&matdata);
ierr = PetscMallocResetCUDAHost();
ierr = MatCreateDenseCUDA
  (comm,
   global_size,global_size,global_size,global_size,
   matdata,&cuda_matrix);