A new Reference Architecture for NVIDIA vComputeServer on Dell EMC infrastructure provides a solution to enable server GPU virtualization.
A recent study that analyzed GPU utilization metrics across customer sites running AI workloads revealed that GPU resources were underutilized in most cases. Here we present the study’s two key findings, along with recommendations for addressing them.
- Nearly a third of users average less than 15% GPU utilization, and average GPU memory usage follows a similar pattern. This is surprising, given that the users are experienced deep learning practitioners. GPUs keep getting faster, but that added performance is wasted if applications don’t fully use them.
Recommendation: Improve utilization by sharing each GPU across multiple users through virtualization. Users who tune batch sizes, learning rates, and other hyperparameters to fully exploit GPU memory and compute capacity can be allocated a dedicated virtualized GPU instance, or multiple GPUs inside a single virtual machine (VM).
- There’s another, probably larger, waste of resources — GPUs that sit unused. It’s hard to queue up work efficiently for GPUs. In a typical workflow, a data scientist will set up many experiments, wait for them to finish, and then spend quite a lot of time digesting the results while the GPUs sit idle.
Recommendation: GPU pooling and disaggregation solve this problem by making it possible to dynamically reassign and spin up resources, so idle GPUs can serve other data scientists’ applications. Using VMware® vSphere® vMotion™ to dynamically transfer GPU-accelerated VMs and workloads reduces the amount of GPU capacity left sitting idle.
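A quick way to spot the underutilization described in the first finding is to sample GPU metrics with `nvidia-smi`. The sketch below is illustrative: the query fields (`utilization.gpu`, `memory.used`, `memory.total`) are standard `nvidia-smi` options, but the helper names and the 15% threshold (taken from the study's figure) are our own.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"

def parse_gpu_stats(csv_text):
    """Parse output of
    'nvidia-smi --query-gpu=... --format=csv,noheader,nounits'.

    Returns one (gpu_util_pct, mem_used_mib, mem_total_mib)
    tuple per GPU line.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append((util, used, total))
    return stats

def underutilized(stats, util_threshold=15.0):
    """Return indices of GPUs sampled below the utilization threshold."""
    return [i for i, (util, _, _) in enumerate(stats) if util < util_threshold]

def sample_gpus():
    """Query live GPUs; requires the nvidia-smi CLI on the PATH."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_gpu_stats(out)
```

In practice you would sample repeatedly over a training run and average, since instantaneous utilization is noisy; GPUs that stay below the threshold are candidates for sharing via virtualization.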
New NVIDIA A100 offers GPU partitioning
NVIDIA® recently announced hardware partitioning with the NVIDIA A100 Tensor Core GPU as a complement to virtualization. In multi-instance GPU (MIG) mode, the A100 can simultaneously run any mix of up to seven AI or HPC workloads of different sizes. GPU partitioning is especially useful for AI inferencing jobs, as well as early-stage AI development work, which typically do not consume all the performance a modern GPU delivers. With GPU virtualization software, a VM can run on each of these MIG instances, giving organizations the management, monitoring, and operational benefits of hypervisor-based server virtualization.
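To make the "mix of up to seven workloads" concrete, the sketch below models the A100's compute slices as a simple budget. The profile names match NVIDIA's published MIG profiles for the 40 GB A100, but this is a first-order check only: it counts compute slices and ignores the separate memory-slice constraints that real MIG placement also enforces.

```python
# The A100 exposes seven compute slices in MIG mode. Each profile name
# encodes its compute-slice count ("1g", "2g", ...) and memory share.
# Illustrative model only; real placement also constrains memory slices.
A100_COMPUTE_SLICES = 7
PROFILE_SLICES = {
    "1g.5gb": 1,
    "2g.10gb": 2,
    "3g.20gb": 3,
    "4g.20gb": 4,
    "7g.40gb": 7,
}

def fits_on_a100(profiles):
    """Return True if the requested mix of MIG profiles fits within
    one A100's compute-slice budget."""
    needed = sum(PROFILE_SLICES[p] for p in profiles)
    return needed <= A100_COMPUTE_SLICES
```

For example, seven `1g.5gb` instances fill the GPU exactly, while a full-GPU `7g.40gb` instance leaves no room for anything else.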
For many years, data centers have used server CPU virtualization to increase IT agility and improve the utilization of their compute hardware. Today, this focus on virtualization is expanding to encompass the GPUs that accelerate many compute-intensive workloads, such as AI training and inferencing as well as data analytics. With virtualization, data centers can make GPUs available to more users, while increasing the overall utilization of these valuable assets.
Virtualizing GPUs inside Dell EMC servers
At Dell Technologies, we’ve worked closely with our technology partners to make GPU virtualization available in our line of GPU-accelerated Dell EMC PowerEdge servers. We took a big step in this direction in August 2019 when we rolled out support for NVIDIA vComputeServer software to enable hypervisor-based virtualization on GPU-accelerated servers equipped with NVIDIA Mellanox® ConnectX-5 or newer network interface cards (NICs). NVIDIA vComputeServer allows data centers to accelerate server virtualization with the latest GPUs so that the most compute-intensive workloads can run in virtual machines.
Today, we’re taking another big step forward with a new Dell EMC reference architecture for NVIDIA vComputeServer. With this solution, your IT administrators can allocate partitions of GPU resources within VMware vSphere, as well as support the live migration of virtual machines running NVIDIA CUDA™ workloads.
There are many valuable benefits in the move to GPU virtualization with vComputeServer with Dell EMC PowerEdge servers. For example, virtualization helps your IT administrators:
- Democratize GPU access by providing partitions of GPUs on demand
- Scale GPU resource assignments up and down as needed
- Support live migration of GPU memory
If your IT organization is considering GPU virtualization in your data center, the Dell EMC reference architecture for NVIDIA vComputeServer is a great place to get started. It walks you through the use cases for vComputeServer and your options for NVIDIA GPUs in Dell EMC PowerEdge servers.
Putting vComputeServer to the Test
Dell Technologies engineers investigated how GPU virtualization with vComputeServer impacts overall performance. These tests initially compared an NVIDIA GPU running on bare-metal Linux to a virtualized GPU. After establishing that baseline of performance, the team conducted additional testing with multiple virtual GPUs and virtual GPU partitions.
Test results show that in most cases, users can expect only a small performance difference, in the range of two to five percent, relative to bare metal when using virtual GPU profiles for machine learning and deep learning workloads. And in an interesting twist, there are scenarios where the difference favors virtualization. For example, when VMs are running a mix of workloads, you might see a faster time to result using multiple fractional GPUs in parallel than using a full GPU with the tasks scheduled serially. This can occur when the workloads across virtual machines don’t execute at the same time, or aren’t always GPU-bound. The choice of GPU scheduling policy also affects performance, and the team compared the performance of the different scheduling policies.
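A toy back-of-the-envelope model shows why fractional GPUs can win. It assumes each workload alternates between a GPU-bound phase and a GPU-idle (CPU-bound) phase, and that GPU-bound phases slow down linearly with the GPU fraction; the numbers are illustrative, not measured results from the tests above.

```python
def serial_full_gpu(workloads):
    """Run workloads one after another on a full GPU.

    Each workload is (gpu_time, cpu_time): time spent GPU-bound
    and GPU-idle at full-GPU speed. Total time is the plain sum.
    """
    return sum(g + c for g, c in workloads)

def parallel_fractional(workloads, fraction=0.5):
    """Run all workloads concurrently, each on a fractional GPU.

    GPU-bound phases stretch by 1/fraction; GPU-idle phases do not.
    The makespan is set by the slowest workload.
    """
    return max(g / fraction + c for g, c in workloads)

# Two workloads, each 10 time units GPU-bound and 10 units GPU-idle:
jobs = [(10, 10), (10, 10)]
# serial_full_gpu(jobs) == 40, parallel_fractional(jobs) == 30
```

The idle phases overlap when the jobs run in parallel, so the fractional configuration finishes sooner even though each GPU phase runs at half speed. If the jobs were purely GPU-bound (no idle time), the two configurations would tie, which matches the observation that the benefit appears when workloads aren’t always GPU-bound.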
For full details on the performance tests conducted in the Dell EMC Server CTO lab, along with detailed configuration information, see Virtualizing GPUs in VMware vSphere using NVIDIA vComputeServer on Dell EMC infrastructure. Visit here to learn more about Dell EMC PowerEdge server accelerators.