GPU Passthrough on KVM: Running AI/ML Workloads After Migration

· 4 min read
HyperSDK Team
Core Team

One of the most common concerns we hear from organizations migrating from VMware to KVM is GPU support. VMware's vSphere has mature GPU passthrough and vGPU capabilities, and teams running AI/ML training, inference, VDI, or scientific computing workloads need assurance that these capabilities transfer to KVM. The answer is straightforward: KVM's GPU passthrough via VFIO delivers 98%+ of bare-metal GPU performance, and HyperSDK automates the configuration that traditionally requires manual kernel and libvirt setup.

How GPU Passthrough Works on KVM

GPU passthrough on KVM uses the VFIO (Virtual Function I/O) framework to assign a physical PCI device directly to a virtual machine. The guest VM gets exclusive access to the GPU hardware, bypassing the hypervisor for all GPU operations. This is fundamentally the same approach used by VMware's DirectPath I/O, but with the advantage of being built into the Linux kernel rather than requiring a proprietary hypervisor.

The process involves four steps:

1. Enable IOMMU (Intel VT-d or AMD-Vi) in the host BIOS and kernel parameters. IOMMU provides the memory isolation that allows a PCI device to be safely assigned to a VM.
2. Unbind the GPU from the host graphics driver (nouveau or nvidia) and bind it to the vfio-pci driver. This tells the kernel that the GPU is reserved for VM passthrough.
3. Assign the GPU to a VM via libvirt XML configuration, specifying the PCI bus, slot, and function addresses.
4. Install NVIDIA drivers inside the guest VM, which then sees the GPU as if it were physically installed.
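The host-side portion (enabling IOMMU and binding the GPU to VFIO) can be sketched as follows. This is a minimal illustration, not HyperSDK output: it assumes an RHEL-family host with grubby, and the PCI device IDs shown (10de:20b0 for the GPU, 10de:1aef for its audio function) are placeholders you would replace with your own.

```shell
# Sketch of host-side passthrough setup (steps 1 and 2). Assumes an
# RHEL-family host; on Debian/Ubuntu, edit GRUB_CMDLINE_LINUX in
# /etc/default/grub instead of using grubby. Device IDs below are
# placeholders -- find yours with: lspci -nn | grep -i nvidia

# Step 1: enable IOMMU on the kernel command line, then reboot.
sudo grubby --update-kernel=ALL --args="intel_iommu=on iommu=pt"  # amd_iommu=on on AMD

# After reboot, inspect IOMMU group isolation: the GPU should not
# share its group with devices that stay on the host.
for dev in /sys/kernel/iommu_groups/*/devices/*; do
    group=${dev#/sys/kernel/iommu_groups/}; group=${group%%/*}
    echo "IOMMU group $group: $(basename "$dev")"
done

# Step 2: reserve the GPU for VFIO by binding vfio-pci to its device
# IDs at boot, before the host graphics driver can claim it.
echo "options vfio-pci ids=10de:20b0,10de:1aef" | sudo tee /etc/modprobe.d/vfio.conf
echo "softdep nvidia pre: vfio-pci" | sudo tee -a /etc/modprobe.d/vfio.conf
echo "vfio-pci" | sudo tee /etc/modules-load.d/vfio.conf
sudo dracut -f   # regenerate the initramfs so the binding applies at early boot
```

Note that both the GPU and its companion audio function usually sit in the same IOMMU group, so both must be bound to vfio-pci and passed through together.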

Performance: Near-Native Results

We have benchmarked GPU passthrough on KVM against bare-metal across multiple workloads and GPU models. The results are consistently within 2% of bare-metal performance for compute workloads.

On an NVIDIA A100 80GB running PyTorch ResNet-50 training, KVM with VFIO passthrough delivered 98.3% of bare-metal throughput. CUDA memory bandwidth tests showed 99.1% of native performance. For inference workloads using TensorRT on an NVIDIA T4, latency was within 1% of bare-metal at all batch sizes.

The negligible overhead comes from the IOMMU address translation layer, which adds a small fixed cost to DMA operations. For GPU-bound workloads where the vast majority of time is spent on GPU compute, this overhead is effectively invisible.

vGPU for Multi-Tenant Scenarios

For environments that need multiple VMs to share a single physical GPU -- common in VDI and inference serving -- NVIDIA vGPU provides time-sliced or MIG (Multi-Instance GPU) partitioning. Each VM receives a guaranteed allocation of GPU memory and compute resources.

HyperSDK supports vGPU configuration for NVIDIA GPUs that support it (A100, A30, H100 with MIG; Tesla T4, L40S with time-slicing). The hyper2kvm conversion engine can pre-configure VMs for vGPU profiles during migration, so GPU-accelerated workloads are ready to run immediately after deployment on KVM.
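As a rough illustration of what MIG partitioning involves at the host level, NVIDIA's stock tooling drives it with a few nvidia-smi commands. This sketch is not HyperSDK-specific, and the 2g.20gb profile name is an A100 example; valid profiles vary by GPU model.

```shell
# Illustrative MIG setup on GPU 0 of an A100 (not HyperSDK-specific).
sudo nvidia-smi -i 0 -mig 1   # enable MIG mode (takes effect after a GPU reset)
sudo nvidia-smi mig -lgip     # list the GPU instance profiles this card supports

# Carve the card into three 2g.20gb GPU instances, each with a default
# compute instance (-C), yielding three isolated GPU partitions.
sudo nvidia-smi mig -cgi 2g.20gb,2g.20gb,2g.20gb -C

nvidia-smi -L                 # confirm the resulting MIG devices are visible
```

Each MIG instance can then be exposed to a separate VM, giving every tenant a hardware-isolated slice of GPU memory and compute.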

Migrating GPU Workloads from VMware

When migrating GPU-accelerated VMs from vSphere to KVM, the GPU assignment changes from VMware's DirectPath I/O to KVM's VFIO passthrough. The guest-side NVIDIA drivers remain the same -- a Windows or Linux VM running CUDA workloads needs only the standard NVIDIA driver package, regardless of whether the underlying hypervisor is ESXi or KVM.

HyperSDK handles the infrastructure side automatically. During export from vSphere, the VM's GPU configuration is captured in the migration manifest. During deployment on KVM, HyperSDK generates the correct libvirt XML with VFIO hostdev entries, verifies IOMMU group isolation, and configures the necessary kernel module parameters. For organizations running AI/ML workloads on VMware, the migration to KVM preserves full GPU performance while eliminating VMware licensing costs.
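For reference, a minimal VFIO hostdev entry of the kind described above can also be attached by hand with virsh. This is a hand-rolled sketch, not HyperSDK's generated output; the VM name (ml-worker) and PCI address (bus 0x65, slot 0x00) are placeholders.

```shell
# Attach a GPU at PCI 0000:65:00.0 to a VM named "ml-worker" (both
# placeholders). managed='yes' tells libvirt to handle the vfio-pci
# rebinding automatically when the VM starts and stops.
cat > gpu-hostdev.xml <<'EOF'
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x65' slot='0x00' function='0x0'/>
  </source>
</hostdev>
EOF
virsh attach-device ml-worker gpu-hostdev.xml --config
```

The --config flag persists the device in the VM definition so the GPU is present on every subsequent boot, rather than only in the live domain.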

If your organization runs GPU-accelerated workloads on VMware and is evaluating a KVM migration, talk to our team about GPU passthrough configuration and performance validation.