With quite a few VMware Horizon View environments under our belts, we've all seen the beauty of the PCoIP protocol.
Imagine trying to push all of these users back to the old school Terminal Server without the ability to render video and reproduce audio in real time. Today, the remote experience offers stunning high definition video at or near thirty frames per second with high quality audio to match. The Terminal Server was barely capable of offering a comfortable experience while scrolling down a static webpage. How things have changed.
The typical Horizon View environment in the business world sees a lot of slow-moving business apps that don't really apply to the statements above. While there might be a little YouTube here and there, it's unlikely that the whole environment will be rendering video simultaneously. This leaves the small number of users who might render video with ample CPU cycles to get the job done - but what if the whole environment uses video-intensive applications all day long? A school, for example, that sits each student in front of a virtual desktop to watch multimedia lessons and presentations all day will put your host CPUs to work. That's a big difference from the call center or insurance agency that has the majority of its pixel change coming from a blinking cursor.
The thing to understand about View is that the CPU load during audio/video-intensive applications can be roughly double that of a physical session. While Flash rendering is happening on the View VM, the PCoIP server has to render everything again for delivery to the client. It's likely that a dual-vCPU View VM will consume nearly 100% of its CPU while rendering even a 640x480 video.
The screen shots above are from a VM in a slow-moving business View environment while playing a 1080p video on YouTube. There are ample CPU cycles available so it's able to get through the task easily. Imagine multiplying this by sixty users on a single host. We would see the CPU graph pushed up to the top as the CPU ready time climbs, video becomes choppy and audio stutters.
Envision Technology Advisors was contracted to build such an environment recently for The Village Green Virtual Public Charter School (see more on this project). We were given the ability to do it from the ground up using our preferred technologies from top to bottom. The environment was relatively small, with three hosts dedicated to serving View desktops to approximately 160 users. At around 53 users per host, we were well under the real capabilities of each host, leaving an easy N+1. Testing looked great while the environment was being built, but we didn't fully know what we were up against until go-live day. Every user in the environment logged in and opened a multimedia-based presentation app that uses a combination of rendering technologies including Flash, Java and QuickTime. We were immediately facing a huge challenge as all three hosts' CPUs went to 100% and users complained of choppy video and stuttering audio. We had to provide a solution, and we had to do it quickly.
The solution to our problem needed to be capable of offloading at least one of the CPU-intensive services shown in the above screen shot. We had two options to consider, as NVIDIA offers the GRID card for GPU offload and Teradici offers the APEX 2800 for PCoIP offload. The situation was complicated a bit because we were using a Dell M1000e chassis, so the standard PCIe cards weren't going to fit - and in any case, the GRID cards are built for DirectX 9 and OpenGL improvements. Fortunately, we quickly found out that the Apex 2800 had just become available in a mezzanine-based format for the Dell M-series blades…although distribution hadn't started in the US yet. Through the help of our friends at Dell we were able to get them shipped over from the UK, so we were the first to put them into production stateside.
Apparently the NVIDIA card is available in mezzanine format also, but we didn't have room for it as we were already sacrificing a network card for the Apex. Having both would have been pretty slick, but it was one or the other. The image below shows the Apex 2800 (blue) installed in the M series blade.
Once the cards were physically installed, the hosts were booted and the VLANs were reconfigured (we lost four NICs per blade by giving up the network mezzanine slot). After that, the driver installation tasks were all that remained.
Installation was quick and simple. On the host, after copying the installers up to the /tmp directory, the install command is:
esxcli software vib install -d /tmp/apex2800-2.3.1-rel-esxi-5.1.0-024870.zip
After rebooting the host, a successful installation can be confirmed by listing the VIBs installed on the host.
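One way to do this - assuming the package name contains "apex2800", which may vary between driver releases - is to list the installed VIBs and filter the output:

esxcli software vib list | grep -i apex

If the installation succeeded, the Apex VIB should appear in the filtered list.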
The next step is to install the driver to the VDI master image and recompose. The Windows driver installer at the time of this writing is: apex2800-2.3.0-rel-23839.exe.
The only other setting, which is on by default in Horizon View 5.2, is under Policies/Global Policies in the Horizon View Administrator. Simply edit the policy and ensure that "PCoIP hardware acceleration" is set to "Allow".
There are a number of command line utilities to control and monitor the offload card, including a function that puts a nifty little red square at the upper left corner of the virtual desktop's screen indicating that hardware acceleration is being used; a blue square indicates that acceleration is not being used. To enable this indicator, run the following command on each host with an offload card.
pcoip-ctrl -P "offload_indicator 1"
To disable the indicator:
pcoip-ctrl -P "offload_indicator 0"
To see general information about the card:
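Per Teradici's documentation for the card (worth verifying against the pcoip-ctrl release you have installed, as the flags are an assumption from the admin guide rather than something shown earlier in this article), the -I flag queries device information:

pcoip-ctrl -I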
When all was said and done, the results were a success. While the CPU graphs didn't show a major improvement, the user experience went from close-to-unusable desktops to near perfect. Prior to installing the cards, the system was so CPU-bound that the virtual desktops simply couldn't deliver more than a few frames per second, at best, while audio was inaudible. After the cards were installed, the available CPU resources could be dedicated to Flash, Java and QuickTime rendering. For our environment, we had more than enough to get the job done, despite the fact that the graphs show high utilization.
The following graph shows CPU over the timeline from initial testing through the installation of the Apex card.
Prior to the Apex installation, we had implemented a number of tweaks in order to let the client continue functioning in a usable (but reduced) capacity. We had played with changing the VM CPU count, modifying registry settings for frame rate and audio bandwidth, changing screen resolution on the zero clients to reduce the PCoIP rendering load, etc. Ultimately, the Apex allowed us to run everything wide open, without limitations, despite the continued high CPU utilization. Note the final segment where the Apex was installed, resolution was returned to normal, and no frame-rate limiting was being employed. The CPU usage during idle time is actually lower.
The Apex 2800 is capable of rendering PCoIP for 65 screens per host. It's not the number of clients, but the number of screens - so an environment with dual monitors would see the benefit for approximately half that many users. Of course you can always put more than one Apex card in a host, depending on your needs - but what's nice is that it doesn't force your host to a hard limit. If you had more than 65 screens, some users would simply go without the benefit of offload, which might be fine in mixed environments.
Moving forward, I expect that our Solutions Engineers will give consideration to the Apex 2800 for almost every environment they spec out, given the possibility that people will be using graphics-intensive applications. Given the overall cost of a deployment, the Apex 2800 is unlikely to be a deal breaker. Because we have clients that use CAD and other architectural and rendering applications, we are excited to play with the Apex and NVIDIA cards side by side in the future.