The Guild of St Luke in Florence, where Leonardo da Vinci qualified as a master sculptor when he was only 20 years old, had in attendance both artists and doctors of medicine. While someone today might wonder what those two vocations have in common, someone from the Italian Renaissance would not. Michelangelo and Leonardo da Vinci were the most famous artists of their day, but they were also remarkably skilled engineers, designers, architects, and experts on human anatomy.
Like the guilds of Renaissance Europe, NVIDIA graphics cards serve multiple disciplines. They can deliver 2D/3D graphics performance to CAD/CAM engineers doing design work, medical technicians and doctors examining MRI/CT scans or tumor reconstructions, scientists performing data modeling, and a variety of graphics professionals.
Although their huge thirst for computing power and their immense appetite for data have kept them at the leading edge of computer design, graphics applications have been unable to take advantage of the revolution in virtualization. Narrow network bandwidths and localized rendering engines made it impractical. And companies were sensitive about the security of their intellectual property.
However, Cisco and Citrix have developed a solution in which graphics applications can run in virtual environments with as much performance and security as if they were running locally on high-powered graphics workstations. The solution allows graphics professionals to reap the benefits of virtualization: data remains protected in the data center, desktops are centrally provisioned, and users in different locations can remotely access the same large graphics files on a variety of devices—even on remote workstations, laptops, and tablets.
This blog describes the solution, its major components, and what’s involved in configuring it inside your data center. You can get the details about the full solution in this Cisco White Paper.
The Cisco and Citrix Solution for Virtualizing Graphics Applications
Four elements are key to the Cisco-Citrix solution:
- A combination of Citrix XenDesktop 7.5 and Cisco UCS C240 M3 rack servers that enables up to 64 VMs per server to run rich 2D/3D applications accelerated by NVIDIA GRID technology.
- Compute, network, and storage efficiency that gives each desktop (or other device) virtual GPU performance comparable to locally executing applications.
- The flexibility of the Citrix XenDesktop to run the NVIDIA GRID cards in both pass-through and vGPU modes, to configure different vGPU types, and to balance the number of vGPUs to match requirements.
- Comprehensive and centralized management of the entire system and its components via the Cisco UCS Management suite.
Major System Components
The major components of this system are:
- Cisco UCS C240 M3 Rack Servers
- Citrix XenDesktop 7.5
- NVIDIA GRID K1 or K2 Graphics Cards
- Citrix XenServer 6.2 Service Pack 1
Cisco UCS C240 M3 Rack Servers
The Cisco UCS C240 M3 rack server is part of the Cisco Unified Computing System (UCS) family, a data center platform that unites compute, network, and storage access. The platform is optimized for virtual environments and uses open industry-standard technologies to reduce total cost of ownership. It integrates a 10 Gigabit Ethernet network fabric with enterprise-class, x86-architecture servers.
The Cisco UCS C240 M3 servers feature breakthrough compute power for demanding workloads and are rack-mountable with a compact 2RU form factor. These servers use the same stateless, streamlined provisioning and operations model as their blade server counterparts, the Cisco UCS B-Series Servers. The Cisco UCS C240 M3 servers can support either SAS, SATA, or SSD drives internally, or they can interface with third-party shared storage to meet cost, performance, and capacity requirements.
The Cisco UCS C240 M3 servers also include:
- Cisco UCS 6248UP 48-port Fabric Interconnects that supply 10-Gigabit Ethernet, Fibre Channel, and FCoE (Fibre Channel over Ethernet) connectivity
- The Cisco UCS Virtual Interface Card, a PCI Express (PCIe) adapter optimized to handle virtualization workloads of the Cisco UCS C-Series rack servers
- Cisco UCS Manager, which can be accessed through a GUI, a CLI, or an XML API to control multiple chassis and thousands of virtual machines. Administrators can use the same interfaces to manage these servers along with all other Cisco servers in the enterprise.
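As a rough illustration of the XML API mentioned above, the sketch below builds the body of a login request and reads the session cookie from a reply. It assumes the `aaaLogin` method with `inName`/`inPassword` attributes, the `outCookie` and `errorDescr` reply attributes, and the `/nuova` endpoint path. Check these against the UCS Manager XML API documentation for your release. The helper names are hypothetical, and nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET

# Assumption: UCS Manager accepts XML API requests as HTTP POSTs to this path.
UCSM_API_PATH = "/nuova"


def build_login_request(username: str, password: str) -> bytes:
    """Build the XML body of an aaaLogin request (hypothetical helper)."""
    element = ET.Element("aaaLogin", {"inName": username, "inPassword": password})
    return ET.tostring(element)


def parse_login_cookie(response_body: bytes) -> str:
    """Return the session cookie (outCookie) from an aaaLogin reply."""
    root = ET.fromstring(response_body)
    cookie = root.get("outCookie", "")
    if not cookie:
        # Failed logins are assumed to carry an errorDescr attribute.
        raise ValueError("login failed: " + root.get("errorDescr", "no cookie returned"))
    return cookie
```

The cookie returned by a successful login would then accompany later requests in the same session. Posting the body over HTTPS is left out so the sketch stays free of environment-specific details.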
Citrix XenDesktop 7.5
Citrix XenDesktop 7.5 delivers Windows operating systems and high performance applications to a variety of device types with a native user experience. This XenDesktop release includes HDX enhancements (including HDX 3D Pro) to optimize virtualized application delivery on mobile devices and across limited network bandwidths. HDX 3D Pro provides GPU acceleration for Windows Desktop OS machines (provisioned as VDI desktops), and Windows Server OS machines (that use RDS). It enables an optimal user experience on wide area network (WAN) connections as low as 1.5 Mbps as well as local area network (LAN) connections.
NVIDIA GRID K1 and K2 Cards
The NVIDIA GRID K1 and K2 cards let multiple users simultaneously share GPUs that provide ultra-fast graphics displays with no lag, making a remote data center feel like it’s next door. Because the cards use the same graphics drivers that are deployed in non-virtualized environments, you can run the exact same application both locally and virtualized. The software stack—including GPU virtualization, remoting, and session-management libraries—enables efficient compression, fast streaming, and low-latency display of high-performance 2D and 3D enterprise applications.
Citrix XenServer 6.2 Service Pack 1
Citrix XenServer is an open-source virtualization platform for managing server and desktop virtualization environments. XenServer 6.2 enables GPU sharing between multiple virtual machines. As a result, each physical GPU on the NVIDIA card can support multiple virtual GPU devices (vGPUs).
As shown in the illustration below, the NVIDIA Virtual GPU Manager running in XenServer dom0 controls the vGPUs, which are assigned directly to guest VMs:
Guest VMs use NVIDIA GRID virtual GPUs in the same manner as a physical GPU that has been passed through by the hypervisor. An NVIDIA driver loaded in the guest VM provides direct access to the GPU for performance-critical operations. Lower-performance management operations use a paravirtualized interface to the NVIDIA GRID Virtual GPU Manager.
Because resource requirements can vary, the maximum number of vGPUs that can be created on a physical GPU depends on the vGPU type, as shown in this table:
|Card||Physical GPUs||Virtual GPU||Intended Use Case||Frame Buffer (MB)||Virtual Display Heads||Max Resolution per Display Head||Max vGPUs per GPU||Max vGPUs per Board|
|GRID K1||4||GRID K140Q||Power User||1024||2||2560×1600||4||16|
|GRID K1||4||GRID K120Q||Power User||512||2||2560×1600||8||32|
|GRID K1||4||GRID K100||Knowledge Worker||256||2||1920×1200||8||32|
|GRID K2||2||GRID K260Q||Power User, Designer||2048||4||2560×1600||2||4|
|GRID K2||2||GRID K240Q||Power User, Designer||1024||2||2560×1600||4||8|
|GRID K2||2||GRID K220Q||Power User, Designer||512||2||2560×1600||8||16|
|GRID K2||2||GRID K200||Knowledge Worker||256||2||1920×1200||8||16|
For example, an NVIDIA GRID K2 card can support up to four K240Q vGPUs on each of its two physical GPUs, for a total of eight vGPUs. However, the same card can support only two K260Q vGPUs per physical GPU, for a total of four vGPUs.
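The per-board limits follow from multiplying the per-GPU limit by the number of physical GPUs on the card. The sketch below encodes the figures from the table and adds a helper for estimating how many cards a given number of VMs would need. The function and variable names are illustrative only, and the estimate ignores slot, power, and CPU constraints.

```python
import math

# Physical GPUs per card, from the table above.
PHYSICAL_GPUS = {"GRID K1": 4, "GRID K2": 2}

# vGPU type -> (card, maximum vGPUs per physical GPU), from the table above.
VGPU_TYPES = {
    "GRID K140Q": ("GRID K1", 4),
    "GRID K120Q": ("GRID K1", 8),
    "GRID K100": ("GRID K1", 8),
    "GRID K260Q": ("GRID K2", 2),
    "GRID K240Q": ("GRID K2", 4),
    "GRID K220Q": ("GRID K2", 8),
    "GRID K200": ("GRID K2", 8),
}


def max_vgpus_per_board(vgpu_type: str) -> int:
    """Per-board limit = per-GPU limit x physical GPUs on the card."""
    card, per_gpu = VGPU_TYPES[vgpu_type]
    return per_gpu * PHYSICAL_GPUS[card]


def boards_needed(vgpu_type: str, vm_count: int) -> int:
    """Smallest number of cards that gives every VM one vGPU of this type."""
    return math.ceil(vm_count / max_vgpus_per_board(vgpu_type))


print(max_vgpus_per_board("GRID K240Q"))  # 8
print(max_vgpus_per_board("GRID K260Q"))  # 4
print(boards_needed("GRID K120Q", 64))    # 2
```

The last line is consistent with the 64-VMs-per-server figure quoted earlier: two GRID K1 cards at 32 vGPUs each.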
Configuring the Cisco-Citrix System – An Overview
These are the major steps required to configure a single VM to use the NVIDIA GRID vGPU:
- Install an NVIDIA GRID GPU card in a Cisco UCS C240 M3 server.
- Perform the base Cisco UCS configuration and, if required, upgrade the GPU firmware.
- Enable virtual machines for pass-through support by installing the pass-through GPU driver and the Citrix XenDesktop HDX 3D Pro Virtual Desktop Agent.
- Install XenServer 6.2.0 and Service Pack 1, and install the NVIDIA GRID vGPU Manager.
- Create a virtual machine and configure it with the NVIDIA vGPU type. For graphics-intensive applications, be sure to configure virtual machines running Citrix HDX 3D Pro Graphics with at least four virtual CPUs.
- Install and configure the vGPU driver on the VM guest operating system.
- Verify that the graphics applications are ready to use the vGPU.
The detailed configuration steps are provided in the full white paper.
For advanced configurations, note that the C240 M3 riser 1 is associated with the first CPU socket and riser 2 with the second CPU socket. Refer to this white paper for information regarding vCPU pinning and GPU locality configurations.
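To make the locality idea concrete, here is a small sketch that maps a riser to its CPU socket and lists the cores on that socket, which are the natural candidates for pinning a VM that uses a GPU in that riser. It assumes physical cores are numbered contiguously per socket, which is not guaranteed on a real host, and the helper names are hypothetical. The white paper remains the reference for the actual pinning procedure.

```python
# Riser-to-socket association described above (sockets indexed from 0).
RISER_TO_SOCKET = {1: 0, 2: 1}


def local_cores(riser: int, cores_per_socket: int) -> list[int]:
    """List core indices on the socket attached to the given riser.

    Assumption: cores are numbered contiguously, socket 0 first.
    """
    socket = RISER_TO_SOCKET[riser]
    start = socket * cores_per_socket
    return list(range(start, start + cores_per_socket))


def is_local(riser: int, pinned_cores: list[int], cores_per_socket: int) -> bool:
    """True if every pinned core sits on the GPU's own socket."""
    return set(pinned_cores) <= set(local_cores(riser, cores_per_socket))
```

For a 12-core-per-socket server, `local_cores(2, 12)` yields cores 12 through 23 under the numbering assumption above.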
Cisco, Citrix, and NVIDIA have teamed up to bring the benefits of virtualization to the users of graphics-intensive applications and the IT organizations that deploy and manage them. Combined breakthrough technologies allow graphics professionals to benefit from the remote access, data sharing, and low overhead of virtualization while experiencing the performance they demand for their graphics-intensive workloads.
For more information, see:
- The full white paper that describes the entire solution
- Cisco Unified Computing System
- Citrix XenServer and Citrix XenDesktop
- NVIDIA GRID Technology (K1 and K2 graphics cards)
With the World Cup games recently finished, I’m reminded of how rampantly soccer has swept across the U.S. in the last few years. Kids often start quite young — there are leagues for even five and six year olds! One element that helps younger kids enjoy their first soccer experience is that the balls are sized smaller in line with their height, making it easier for them to kick and control the ball. It’s an everyday example of how there can be better results when a tool is well matched with “entry-level” requirements.
Deploying an entry-level desktop virtualization solution follows similar logic. For a deployment to be successful, there must be a balance between the solution, its cost, and its ease of implementation, especially when the number of users is small. For large corporate environments with a few thousand users, it’s much easier to defray CAPEX costs across a large number of users, realize a low cost-per-seat, and rely on IT administrative staff to deploy and manage the solution. For smaller environments like branch offices or SMBs, deploying and managing a comprehensive desktop virtualization solution has generally been too complex and cost-prohibitive — until now.
Cisco and Citrix have collaborated on a new reference architecture that removes the barriers to smaller deployments, making it easy to deliver Microsoft Windows apps and desktops to a variety of client and mobile devices. Based on Cisco and Citrix technologies, the architecture creates a self-contained, easy-to-deploy, and centrally managed solution that supports 500 seats cost-effectively. It is designed for fault-tolerant deployments of fewer than 1000 users, opening the door to new desktop virtualization opportunities in branch offices, SMBs, pilot projects, and test and development environments.
Citrix and Cisco test engineers validated the reference architecture and conducted a series of sizing tests using Login VSI. The testing demonstrated how the architecture can support up to 500 Medium/Knowledge Workers or 600 Light/Task Workers while delivering an outstanding end user experience. This blog gives a brief synopsis of the architecture, its benefits, the testing we conducted, and the test results. For more details, you can read the full reference architecture paper and test report here.
Figure 1 shows key solution components. Three Cisco UCS C240 M3 Rack Servers combine industry-standard x86 servers with networking and storage access into a single converged system. The C-Series servers are part of the Cisco Unified Computing System (UCS) family of products. They have a compact 2RU form factor and use the same stateless, streamlined provisioning and operations model as Cisco UCS B-Series Blade Servers. Cisco UCS 6248UP 48-Port Fabric Interconnects supply the 10 Gigabit Ethernet, Cisco Data Center Ethernet, Fibre Channel, and FCoE connectivity needed for the solution.
Figure 1. 500-User Architecture for Citrix XenApp 7.5 on Cisco UCS C240 M3 Rack Servers
The Citrix XenApp 7.5 release delivers a Windows OS and applications to mobile devices (including laptops, tablets, and smartphones) with a native-touch experience and high performance. In this architecture, the XenApp software delivers 500 Hosted Shared Desktop (HSD) sessions using Remote Desktop Services (RDS). Citrix XenServer 6.2 is the hypervisor that supports virtual machines (VMs) running Microsoft Windows Server 2012 for XenApp and infrastructure services.
Using local storage is essential to achieving an entry-level price point. To make that possible with just twelve 10,000RPM SAS drives, each server includes an LSI Nytro MegaRAID card containing two 100GB flash memory cards for caching I/O operations. Using the LSI Nytro flash cache in conjunction with local storage is a key differentiator for this solution, allowing it to deliver responsive performance while conserving cost.
Why the Buzz?
The reference architecture is an exciting breakthrough for these reasons:
- Self-contained, all-in-one solution. The architecture defines an entirely self-contained “in-a-box” solution with all of the infrastructure elements required for a XenApp 7.5 deployment, including Active Directory, DNS, SQL Server, and more. This takes the complexity out of deploying a desktop virtualization solution, especially for small standalone environments.
- Fault-tolerant architecture. The architecture locates redundant infrastructure virtual machines across two Cisco UCS C-Series servers to optimize availability. The solution also configures N+1 XenApp servers to maintain service levels even if a XenApp server failure occurs. In addition, Microsoft Distributed File System services are used across multiple servers to protect user data on local storage.
- Easy to build, deploy, grow, and maintain. The compact design of Cisco UCS C-Series Rack Servers keeps the footprint small, making the solution easy to deploy in a small business or branch office setting. Since the C-Series servers are part of the Cisco UCS product family, they can be managed as standalone systems or alongside existing blade and rack servers using Cisco UCS Manager.
By adding Cisco UCS Central Software to the solution, companies can extend Cisco UCS Manager capabilities, allowing administrators to manage multiple Cisco UCS domains (such as domains for satellite offices) in conjunction with centrally defined policies. Both the C-Series Rack Servers and B-Series Blade Servers can be managed using the same set of management tools.
- Low cost per seat. The architecture avoids expensive flash drives, instead caching I/O operations in flash memory on the LSI Nytro cards. The choice of less expensive SAS drives helps to rein in solution costs while providing an excellent end user experience.
Figure 2 shows the virtual machines deployed across the three physical servers in the test configuration. Infrastructure VMs were hosted on two of the Cisco UCS C240 M3 Servers, and each server also hosted eight XenApp 7.5 HSD VMs. The redundancy across physical servers yields a highly available design.
Figure 2. Test Configuration
Table 1 lists specific components in the test configuration.
|Hardware||Software|
|3 x Cisco UCS C240-M3 Rack Servers (dual Intel Xeon E5-2697v2 Processors @ 2.7 GHz, 256GB of memory, one Cisco VIC1225 network adapter)||Cisco UCS Manager 2.2(1d)|
|1 x LSI Nytro MegaRAID Controller NMR 8110-4i card per server||Citrix XenApp 7.5|
|12 x 600-GB 10,000 RPM hot-swappable hard disk drives||XenServer 6.2 Hypervisors and XenCenter 6.2|
|2 x Cisco 6248UP 48-port Fabric Interconnects||Microsoft Windows Server 2012 R2, 64-bit Remote Desktop Services (5 vCPUs, 24GB of memory per VM)|
Local storage was organized into drive groups to create RAID 5 and 10 volumes for the hypervisor, infrastructure services, and XenApp VMs. The XenApp 7.5 VMs were provisioned with Machine Creation Services (MCS) differencing disks. MCS differencing disks are virtual hard disks that store desktop changes during Hosted Shared Desktop sessions, and they incur a high number of IOPS. The LSI Nytro cards are specifically configured to accelerate I/O for the I/O-intensive volumes that contain the MCS differencing disks.
To generate load, we used the Login VSI 3.7 software to simulate multiple users accessing the XenApp 7.5 environment and executing a typical end user workflow. Login VSI 3.7 tracks user experience statistics, looping through specific operations and measuring response times at regular intervals. Collected response times determine VSImax, the maximum number of users the test environment can support before performance degrades consistently. Because baseline response times can vary depending on the virtualization technology used, using a dynamically calculated threshold provides greater accuracy for cross-vendor comparisons. For this reason, Login VSI also reports VSImax Dynamic.
At the start of the testing, we executed performance monitoring scripts to record resource consumption for the hypervisor, virtual desktop, storage, and load generation software. At the beginning of each test run, we took the desktops out of maintenance mode, started the virtual machines, and waited for them to register. The Login VSI launchers then initiated the desktop sessions and began user logins (the ramp-up phase). Once all users were logged in, the steady state portion of the test began in which Login VSI executed the application workload, running applications like Microsoft Office, Internet Explorer (including a Flash video applet), printing, and Adobe Acrobat Reader.
The testing captured resource metrics during the entire workload lifecycle: XenApp virtual machine boot, user logon and desktop acquisition (ramp-up), user workload execution (steady state), and user logoff. A test cycle was considered passing only if all test users completed the ramp-up and steady state phases and all metrics stayed within permissible thresholds.
Two test phases were conducted:
- Finding the recommended maximum density for a single physical server. This phase validated single-server scalability under a maximum recommended density with the RDS load. The maximum recommended load for a single server occurs when CPU or memory utilization peaks at 90-95% and the end user response times remain below 4000ms. This phase was used to determine the server N+1 count for the solution.
- Validating the solution at full scale. This phase validated multiple server scalability using the full test configuration.
The first phase was executed under the Login VSI Medium workload and then the Light workload to identify VSImax for each workload type. The validation phase was executed using the Medium workload only.
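The pass criteria and the density search in phase one can be sketched as a simple filter over test runs. The thresholds below come from the description above (utilization peaking no higher than 95% and response times below 4000 ms). The `RunResult` record, its field names, and the two functions are hypothetical and are not part of Login VSI or the test harness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunResult:
    """Summary of one test run (hypothetical record layout)."""
    sessions_launched: int
    sessions_completed: int   # users that finished ramp-up and steady state
    peak_cpu_pct: float
    peak_memory_pct: float
    max_response_ms: float


def run_passes(run: RunResult,
               utilization_ceiling_pct: float = 95.0,
               response_limit_ms: float = 4000.0) -> bool:
    """Apply the pass criteria described in the text to one run."""
    all_completed = run.sessions_completed == run.sessions_launched
    within_utilization = max(run.peak_cpu_pct, run.peak_memory_pct) <= utilization_ceiling_pct
    responsive = run.max_response_ms < response_limit_ms
    return all_completed and within_utilization and responsive


def recommended_density(runs: list[RunResult]) -> Optional[int]:
    """Largest session count among passing runs, or None if none pass."""
    passing = [run.sessions_launched for run in runs if run_passes(run)]
    return max(passing) if passing else None
```

Feeding the function one record per session count tested would return the highest count that still met every criterion.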
Phase 1: Single Server Recommended Maximum Density
We first tested different combinations of XenApp 7.5 server VM counts and virtual CPU (vCPU) assignments, finding that the best performance was achieved when the total number of vCPUs assigned to the VMs did not exceed the number of hyper-threaded cores available on the server. (In other words, not overcommitting CPU resources provides the best user experience.) For the dual Intel E5-2697v2 processors, 24 cores with hyper-threading equate to 48 logical processors available as vCPUs. The highest density was observed at eight XenApp VMs per physical server, with each VM configured with five vCPUs and 24GB RAM.
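The no-overcommit rule reduces to simple arithmetic, shown below with the figures from the test configuration (two 12-core processors with hyper-threading, 256GB of memory, eight VMs with five vCPUs and 24GB each). The function names are illustrative. The check covers only the XenApp VMs and does not account for the hypervisor or infrastructure VMs.

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int = 2) -> int:
    """Hyper-threaded logical processors available on the host."""
    return sockets * cores_per_socket * threads_per_core


def is_overcommitted(vm_count: int, vcpus_per_vm: int, host_logical_cpus: int) -> bool:
    """True if the VMs are assigned more vCPUs than the host has logical CPUs."""
    return vm_count * vcpus_per_vm > host_logical_cpus


host_cpus = logical_cpus(sockets=2, cores_per_socket=12)   # 48
assigned_vcpus = 8 * 5                                     # 40
assigned_memory_gb = 8 * 24                                # 192

print(host_cpus, assigned_vcpus, is_overcommitted(8, 5, host_cpus))  # 48 40 False
print(assigned_memory_gb <= 256)                                     # True
```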
The first test sequence determined VSImax for each workload on a single server, indicating the density that a single server can support before the end user experience degrades. Based on this value, we added one server to the number of physical servers needed, so that the full-scale configuration achieves optimal performance under normal operating conditions and enables N+1 server fault tolerance.
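The N+1 sizing step can be written out as a short calculation. The sketch uses the per-server densities reported in the sections that follow (250 Medium or 325 Light sessions) and the 500- and 600-user targets quoted for the architecture. The function names are illustrative.

```python
import math


def servers_required(target_users: int, users_per_server: int, spares: int = 1) -> int:
    """Servers needed to carry the load, plus spare servers for fault tolerance."""
    return math.ceil(target_users / users_per_server) + spares


def survives_one_failure(target_users: int, users_per_server: int, servers: int) -> bool:
    """True if the remaining servers can still carry every user at the recommended density."""
    return (servers - 1) * users_per_server >= target_users


print(servers_required(500, 250))         # 3 (Medium workload)
print(servers_required(600, 325))         # 3 (Light workload)
print(survives_one_failure(500, 250, 3))  # True
```

With three servers, losing one leaves two servers at 250 Medium sessions each, which matches the single-server recommended maximum.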
Medium Workload: Single Server Recommended Maximum Density
For the single server Medium Workload, guided by VSImax scores, we determined that 250 user sessions per host gave us optimal end user experience and good resource utilization. Figures 3 and 4 show end user response times and CPU utilization metrics for the Medium workload.
Figure 3. Single Server, Medium Workload, End User Response Times at 250 Sessions
Figure 4. Single Server, Medium Workload, CPU Utilization
Light Workload: Single Server Recommended Maximum Density
For the single server Light Workload, we determined that 325 user sessions per host gave us optimal end user experience and good server utilization metrics. Figures 5 and 6 show end user response times and CPU utilization metrics for the Light workload.
Figure 5. Single Server, Light Workload, End User Response Times at 325 Sessions
Figure 6. Single Server, Light Workload, CPU Utilization
Phase 2: Full-Scale Configuration Testing
Using all three Cisco UCS C240 M3 Rack Servers, we performed 500-session Login VSI Medium Workload tests to validate the solution at scale, which provided excellent results. The Login VSI Index Average and Average Response times tracked well below 2 seconds throughout the run (Figure 7), indicating an outstanding end user experience throughout the test.
Figure 7. Full-Scale Configuration, Medium Workload, End User Response Times at 500 Sessions
Figures 8 through 13 show performance data for one of the three Cisco UCS C240 M3 servers in the full configuration test. The graphs are representative of data collected for all servers in the three-server test.
Figure 8. Full-Scale Configuration, Medium Workload, CPU Utilization
Figure 9. Full-Scale Configuration, Medium Workload, IOPS
Figure 10. Full-Scale Configuration, Medium Workload, IO Throughput (Mbps)
Figure 11. Full-Scale Configuration, Medium Workload, IO Wait
Figure 12. Full-Scale Configuration, Medium Workload, IO Latency
Figure 13. Full-Scale Configuration, Medium Workload, IO Ave. Queue Length
What about XenDesktop?
Given the same hardware configuration, are you curious how well XenDesktop with Windows 7 virtual desktops performs? Or perhaps a 500-seat deployment is initially too much and you just want to “kick some tires” with a single UCS server. In either case, here’s a 200-seat XenDesktop reference architecture that provides the same server specifications and configuration as the 500-seat XenApp configuration discussed above: Deploy 200 Citrix XenDesktop 7.1 Hosted Virtual Desktops on Cisco UCS C240 M3 Rack Server with LSI Nytro MegaRAID and SAS Drives.
Desktop virtualization is an efficient way to deliver the latest Microsoft Windows OS and applications not only to traditional client PCs, but also to the user’s choice of mobile device types. At the same time, desktop virtualization centralizes and protects corporate data and intellectual property, simplifying desktop and OS management. Until now, it’s been difficult for small to medium-sized organizations to realize these advantages because of the complexity and up-front costs associated with building out a pilot or entry-level configuration.
Because this low-cost configuration enables a 100% self-contained solution, it overcomes previous obstacles to deploying desktop virtualization in small business or branch office settings. The architecture provides an extremely easy-to-deploy, fault tolerant, Cisco UCS-managed infrastructure for Citrix XenApp 7.5 hosted shared desktops. For many, the solution greatly simplifies the entry point into desktop virtualization, making it easier to build out and manage a 500-seat standalone deployment.
To read more about the 500-seat XenApp 7.5 reference architecture and the validation testing, see the full white paper: Reference Architecture for 500-Seat Citrix XenApp 7.5 Deployment on Cisco UCS C240-M3 Rack Servers with On-Board SAS Storage and LSI Nytro MegaRAID Controller.
— Frank Anderson, Senior Solutions Architect, Cisco Systems, Inc.
You’re probably thinking I have the best job in Silicon Valley. Last month I was running the Cisco UCS with Citrix XenDesktop demos at Citrix Synergy in Anaheim. This week it’s time for Cisco Live 2013 at the Orlando Convention Center. So you’re right, I do have an enviable job: bringing together the best of Cisco and Citrix technologies that help customers work better, with more flexibility and greater security, and having some fun in the process. If, like me, you are fortunate enough to be attending, I am sure you are looking forward to John Chambers’s keynote, the Super Sessions, and the Cisco Party.
In addition, you will have another great opportunity to check out and experience the latest innovations in our Cisco Desktop Virtualization Solutions portfolio for Citrix XenDesktop. I want to spend the next few minutes taking you on a virtual tour of the Cisco and Citrix presence at the event. Let’s pick up the action in the Cisco Unified Data Center (UDC) Booth 758.
The Desktop Virtualization demos in the Cisco UCS booth are at the center of all the action. We have two cool demos. One of them features the Cisco UCS Storage Accelerator with Citrix XenDesktop, which showcases how the performance-intensive Citrix write cache can be placed locally on the UCS blade servers. You guessed it: this eliminates the need for expensive SAN storage for the write cache. Come learn and experience how you can achieve a 50% reduction in SAN costs, increased IOPS, and all the jazz that comes with the Cisco UCS Storage Accelerator and Citrix XenDesktop.
Also catch my good colleague Ashok Rajagopalan presenting on “Deploy VDI with Higher Performance and Lower TCO with Cisco UCS”. Do not miss this presentation, particularly as the outline covers the architectural approach to Cisco Desktop Virtualization, performance optimization with Cisco UCS VM-FEX, manageability simplification, and UCS-NVIDIA GPU integration, among other topics. Ashok has a busy schedule at the event. His breakout session BRKVIR – 2022, titled “Transformation of server caching in Desktop Virtualization, Big Data and Database workloads”, is filling up fast, and I recommend you register quickly.
As a teaser, here is a quick snapshot that I made at Citrix Synergy on these topics.
For additional insights, check out Ashok’s video blog on the benefits of Cisco Desktop Virtualization Solutions.