Today’s guest post is from Rolf vandeVaart, a Senior CUDA Engineer with NVIDIA.
GPUs are becoming quite popular as accelerators in High Performance Computing clusters. For example, check out Titan; a recent entry into the Top 500 list from Oak Ridge Laboratories. Titan has 18,688 nodes (299,008 CPU cores) coupled with 18,688 NVIDIA Tesla K20 GPUs.
To help ease the programming burden working with GPU memory in MPI applications, support has been added to several MPI libraries such that the MPI library can directly send and receive the GPU buffers without the user having to stage them in host memory first. This has sometimes been referred to as “CUDA-aware MPI.”