<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Modern GPU Integration in MPI</title>
	<atom:link href="http://blogs.cisco.com/performance/modern-gpu-integration-in-mpi/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.cisco.com/performance/modern-gpu-integration-in-mpi/</link>
	<description></description>
	<lastBuildDate>Wed, 19 Jun 2013 19:06:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: Jeff Squyres</title>
		<link>http://blogs.cisco.com/performance/modern-gpu-integration-in-mpi/#comment-698366</link>
		<dc:creator>Jeff Squyres</dc:creator>
		<pubDate>Sat, 09 Feb 2013 13:21:03 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.cisco.com/?p=100543#comment-698366</guid>
		<description><![CDATA[Here&#039;s a reply from Fab Tillier, Microsoft HPC MPI developer:

You state that GPU buffers are similar to RDMA buffers.  I wholeheartedly disagree. The former is a conscious choice by the user, while the latter is an internal implementation issue in the MPI library.  A user of MPI should never know that they&#039;re using RDMA under the covers, and memory registration is a burden that the MPI implementation takes on in exchange for better performance.

Said another way, a program using a GPU *knows* it&#039;s using a GPU, and usually exactly which GPU vendor to boot.  Such a program could just as easily call

MPI_Send(gpuPtr, NUM_ELEMENTS, OMPI_GPU_DOUBLE, dest, tag, comm);

You might argue that an implementation-specific datatype handle makes the program no longer portable. I&#039;ll counter that passing a GPU pointer in the first place requires knowing that the MPI library supports GPU pointers, so portability is lost just the same, but with errors that are potentially harder to diagnose (imagine treating a GPU pointer as a host pointer!).

Anyway, I think the end results will be better if the application is honest with the MPI library that it is using GPU buffers, rather than having the library try to deduce it from the pointer value.  The portability issue could be resolved by the MPI Forum via standardizing the GPU datatypes, or defining a way of tagging a datatype&#039;s buffer type.

Another optimization is GPUDirect RDMA (http://docs.nvidia.com/cuda/gpudirect-rdma/index.html), where the HCA could do peer-to-peer data transfers directly to/from GPU memory.  This is independent of how the MPI library discovers that a buffer is a GPU buffer (whether by querying CUDA or by being told by the user).]]></description>
		<content:encoded><![CDATA[<p>Here&#8217;s a reply from Fab Tillier, Microsoft HPC MPI developer:</p>
<p>You state that GPU buffers are similar to RDMA buffers.  I wholeheartedly disagree. The former is a conscious choice by the user, while the latter is an internal implementation issue in the MPI library.  A user of MPI should never know that they&#8217;re using RDMA under the covers, and memory registration is a burden that the MPI implementation takes on in exchange for better performance.</p>
<p>Said another way, a program using a GPU *knows* it&#8217;s using a GPU, and usually exactly which GPU vendor to boot.  Such a program could just as easily call</p>
<p>MPI_Send(gpuPtr, NUM_ELEMENTS, OMPI_GPU_DOUBLE, dest, tag, comm);</p>
<p>You might argue that an implementation-specific datatype handle makes the program no longer portable. I&#8217;ll counter that passing a GPU pointer in the first place requires knowing that the MPI library supports GPU pointers, so portability is lost just the same, but with errors that are potentially harder to diagnose (imagine treating a GPU pointer as a host pointer!).</p>
<p>Anyway, I think the end results will be better if the application is honest with the MPI library that it is using GPU buffers, rather than having the library try to deduce it from the pointer value.  The portability issue could be resolved by the MPI Forum via standardizing the GPU datatypes, or defining a way of tagging a datatype&#8217;s buffer type.</p>
<p>Another optimization is GPUDirect RDMA (<a href="http://docs.nvidia.com/cuda/gpudirect-rdma/index.html" rel="nofollow">http://docs.nvidia.com/cuda/gpudirect-rdma/index.html</a>), where the HCA could do peer-to-peer data transfers directly to/from GPU memory.  This is independent of how the MPI library discovers that a buffer is a GPU buffer (whether by querying CUDA or by being told by the user).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
