<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Followup to &#8220;The Common Communications Interface (CCI)&#8221;</title>
	<atom:link href="http://blogs.cisco.com/performance/followup-to-the-common-communications-interface-cci/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.cisco.com/performance/followup-to-the-common-communications-interface-cci/</link>
	<description></description>
	<lastBuildDate>Tue, 21 May 2013 13:57:03 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: Jeff Squyres</title>
		<link>http://blogs.cisco.com/performance/followup-to-the-common-communications-interface-cci/#comment-597323</link>
		<dc:creator>Jeff Squyres</dc:creator>
		<pubDate>Fri, 01 Jun 2012 18:37:44 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.cisco.com/?p=70961#comment-597323</guid>
		<description><![CDATA[Just to follow up: a CCI beta 1 tarball has been publicly posted on the CCI Forum web page: http://cci-forum.com/ (under the &quot;Getting Started&quot; page).]]></description>
		<content:encoded><![CDATA[<p>Just to follow up: a CCI beta 1 tarball has been publicly posted on the CCI Forum web page: <a href="http://cci-forum.com/" rel="nofollow">http://cci-forum.com/</a> (under the &#8220;Getting Started&#8221; page).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fab Tillier</title>
		<link>http://blogs.cisco.com/performance/followup-to-the-common-communications-interface-cci/#comment-597024</link>
		<dc:creator>Fab Tillier</dc:creator>
		<pubDate>Thu, 31 May 2012 16:25:01 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.cisco.com/?p=70961#comment-597024</guid>
		<description><![CDATA[Hi Scott,

&gt; currently one major API works across all O/Ses and that is Sockets

Just to be clear, this is not just Sockets, but synchronous Sockets (blocking or non-blocking, but not aio/overlapped).  There is a lot to be gained by moving away from synchronous I/O, though the learning curve for async I/O can be steep.  Throw in the concept of scatter/gather and it gets even steeper.

High-performance asynchronous I/O applications in Windows can benefit from using I/O completion ports, rather than using per-I/O event objects and WaitForMultipleObjects (which has a limit on how many objects can be waited on concurrently).  I/O completion ports allow aggregating completions from multiple files or sockets, and the application can poll the completion port, or block on it waiting for any I/O completion to be added.  Multiple threads can poll events from an I/O completion port.  MSMPI today uses I/O completion ports internally, so that we can block for completions if we exceed our polling limit. MSMPI supports blocking for completions for all of our communication channels: SHM, NetworkDirect, and Sockets.

It would be great to allow CCI endpoints to be associated with a user&#039;s I/O completion port, allowing users to get completions for CCI events side by side with their file completions, all through one function (GetQueuedCompletionStatus).  There are design issues that you&#039;ll need to work out, though (who provides the OVERLAPPED structure that identifies the I/O operation and is returned by GetQueuedCompletionStatus, for example).

Cheers,
-Fab]]></description>
		<content:encoded><![CDATA[<p>Hi Scott,</p>
<p>&gt; currently one major API works across all O/Ses and that is Sockets</p>
<p>Just to be clear, this is not just Sockets, but synchronous Sockets (blocking or non-blocking, but not aio/overlapped).  There is a lot to be gained by moving away from synchronous I/O, though the learning curve for async I/O can be steep.  Throw in the concept of scatter/gather and it gets even steeper.</p>
<p>High-performance asynchronous I/O applications in Windows can benefit from using I/O completion ports, rather than using per-I/O event objects and WaitForMultipleObjects (which has a limit on how many objects can be waited on concurrently).  I/O completion ports allow aggregating completions from multiple files or sockets, and the application can poll the completion port, or block on it waiting for any I/O completion to be added.  Multiple threads can poll events from an I/O completion port.  MSMPI today uses I/O completion ports internally, so that we can block for completions if we exceed our polling limit. MSMPI supports blocking for completions for all of our communication channels: SHM, NetworkDirect, and Sockets.</p>
<p>It would be great to allow CCI endpoints to be associated with a user&#8217;s I/O completion port, allowing users to get completions for CCI events side by side with their file completions, all through one function (GetQueuedCompletionStatus).  There are design issues that you&#8217;ll need to work out, though (who provides the OVERLAPPED structure that identifies the I/O operation and is returned by GetQueuedCompletionStatus, for example).</p>
<p>Cheers,<br />
-Fab</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scott Atchley</title>
		<link>http://blogs.cisco.com/performance/followup-to-the-common-communications-interface-cci/#comment-596997</link>
		<dc:creator>Scott Atchley</dc:creator>
		<pubDate>Thu, 31 May 2012 13:21:28 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.cisco.com/?p=70961#comment-596997</guid>
		<description><![CDATA[Fab, currently one major API works across all O/Ses and that is Sockets. No one would argue that Sockets exposes the capabilities of today&#039;s networking hardware (no zero-copy, no OS bypass). If an application will only ever use Sockets and only run on standard Ethernet NICs, then there is little to gain by using CCI. If the application could run on more capable hardware (whether over Ethernet or another fabric), then CCI might make sense.

Verbs works on most O/Ses and provides access to modern networking features, yet no one would argue that it is a simple API. We believe there is a middle ground.

Forgive me for not knowing the optimal I/O model for Windows; perhaps you could give me an example. CCI is inherently an asynchronous API (similar to MPI). You initiate a send (small message or RMA) and poll for completion. If the app prefers to block via a native O/S method (e.g. select(), poll(), epoll(), kqueue(), WSA*(), etc.), CCI can provide a native O/S handle to the application. If CCI does not allow for a high-performance implementation in Windows, we would be very interested in what changes CCI would need in order to provide one.

CCI gives the choice to the app. If the app does not want additional threads, it must poll, both to reap completions and to ensure progress when the underlying hardware does not provide progress on its own. If the app does not want to burn the cycles to poll, it can block, which requires someone to check for completions and signal the blocker. Whether that someone is a progress thread or the kernel is an implementation detail.

There are trade-offs among simplicity, portability, and performance, and we hope that CCI provides the ability for the app to choose the best combination given its needs.]]></description>
		<content:encoded><![CDATA[<p>Fab, currently one major API works across all O/Ses and that is Sockets. No one would argue that Sockets exposes the capabilities of today&#8217;s networking hardware (no zero-copy, no OS bypass). If an application will only ever use Sockets and only run on standard Ethernet NICs, then there is little to gain by using CCI. If the application could run on more capable hardware (whether over Ethernet or another fabric), then CCI might make sense.</p>
<p>Verbs works on most O/Ses and provides access to modern networking features, yet no one would argue that it is a simple API. We believe there is a middle ground.</p>
<p>Forgive me for not knowing the optimal I/O model for Windows; perhaps you could give me an example. CCI is inherently an asynchronous API (similar to MPI). You initiate a send (small message or RMA) and poll for completion. If the app prefers to block via a native O/S method (e.g. select(), poll(), epoll(), kqueue(), WSA*(), etc.), CCI can provide a native O/S handle to the application. If CCI does not allow for a high-performance implementation in Windows, we would be very interested in what changes CCI would need in order to provide one.</p>
<p>CCI gives the choice to the app. If the app does not want additional threads, it must poll, both to reap completions and to ensure progress when the underlying hardware does not provide progress on its own. If the app does not want to burn the cycles to poll, it can block, which requires someone to check for completions and signal the blocker. Whether that someone is a progress thread or the kernel is an implementation detail.</p>
<p>There are trade-offs among simplicity, portability, and performance, and we hope that CCI provides the ability for the app to choose the best combination given its needs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Squyres</title>
		<link>http://blogs.cisco.com/performance/followup-to-the-common-communications-interface-cci/#comment-596965</link>
		<dc:creator>Jeff Squyres</dc:creator>
		<pubDate>Thu, 31 May 2012 10:34:28 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.cisco.com/?p=70961#comment-596965</guid>
		<description><![CDATA[Sure; I&#039;ve got your email address because you posted a comment, so I&#039;ll follow up with you in email.]]></description>
		<content:encoded><![CDATA[<p>Sure; I&#8217;ve got your email address because you posted a comment, so I&#8217;ll follow up with you in email.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fab Tillier</title>
		<link>http://blogs.cisco.com/performance/followup-to-the-common-communications-interface-cci/#comment-596923</link>
		<dc:creator>Fab Tillier</dc:creator>
		<pubDate>Thu, 31 May 2012 03:53:25 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.cisco.com/?p=70961#comment-596923</guid>
		<description><![CDATA[I&#039;m not sure that a single abstraction layer makes sense for portable (cross O/S) code, especially if your application does things outside of networking (say file I/O).  For example, the I/O model for highest performance is going to be different between Linux and Windows.

I also have doubts about whether an API that requires the underlying library to have threads (in order to signal/deliver events) is a good approach, as it requires surfacing a bunch of &#039;knobs&#039; for the application to control the threading policy (number of threads, affinity, priority, etc).

Point being that simplicity, portability, and performance often conflict with one another.

-Fab]]></description>
		<content:encoded><![CDATA[<p>I&#8217;m not sure that a single abstraction layer makes sense for portable (cross O/S) code, especially if your application does things outside of networking (say file I/O).  For example, the I/O model for highest performance is going to be different between Linux and Windows.</p>
<p>I also have doubts about whether an API that requires the underlying library to have threads (in order to signal/deliver events) is a good approach, as it requires surfacing a bunch of &#8216;knobs&#8217; for the application to control the threading policy (number of threads, affinity, priority, etc).</p>
<p>Point being that simplicity, portability, and performance often conflict with one another.</p>
<p>-Fab</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Phil Miller</title>
		<link>http://blogs.cisco.com/performance/followup-to-the-common-communications-interface-cci/#comment-596893</link>
		<dc:creator>Phil Miller</dc:creator>
		<pubDate>Thu, 31 May 2012 00:13:16 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.cisco.com/?p=70961#comment-596893</guid>
		<description><![CDATA[I&#039;m one of the developers of the Charm++ runtime system for HPC applications, among which NAMD is the most widely used. For the last 15 years, we&#039;ve maintained our own native machine layers (Elan, Myrinet, SHMEM, InfiniBand, Blue Gene DCMF/PAMI, LAPI, uGNI) because of a huge impedance mismatch between our execution model and what MPI provides. We&#039;ve been hoping that various past proposals and projects would get some traction (e.g. GASNet), but none seems to have taken off. Could we get access to the CCI beta to do a port, see how it performs, and possibly contribute our expertise to this shared foundation?]]></description>
		<content:encoded><![CDATA[<p>I&#8217;m one of the developers of the Charm++ runtime system for HPC applications, among which NAMD is the most widely used. For the last 15 years, we&#8217;ve maintained our own native machine layers (Elan, Myrinet, SHMEM, InfiniBand, Blue Gene DCMF/PAMI, LAPI, uGNI) because of a huge impedance mismatch between our execution model and what MPI provides. We&#8217;ve been hoping that various past proposals and projects would get some traction (e.g. GASNet), but none seems to have taken off. Could we get access to the CCI beta to do a port, see how it performs, and possibly contribute our expertise to this shared foundation?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
