When you backup a VMware vSphere virtual machine using vStorage APIs for Data Protection (VADP), one of the common ways to transmit data from VMware data store to backup server is through Network Block Device (NBD) transport. NBD is a Linux-like module that attaches to VMkernel and makes the snapshot of the virtual machine visible to backup server as if the snapshot is a block device on network. While NBD is quite popular and easy to implement, it is also the least understood transport mechanisms in VADP based backups.
NBD is based on VMware’s Network File Copy (NFC) protocol. NFC uses VMkernel port for network traffic. As you already know, VMkernel ports may also be used by other services like host management, vMotion, Fault Tolerance logging, vSphere Replication, NFS, iSCSI an so on. It is recommended to create specific VMkernel ports that attach to dedicated network adapters if you are using a bandwidth intensive service. For example, it is highly recommended to dedicate an adapter for Fault Tolerance logging.
Naturally, the first logical solution to drive high throughput from NBD backups would be to dedicate a bigger pipe for VADP NBD transport. Many vendors put this as the best practice but that alone won’t give you performance and scale.
Let me explain this using an example. Let us assume that you have a backup server streaming six virtual machines from an ESXi host using NBD transport sessions. The host and backup server are equipped with 10Gb adapters. In general a single 10Gb pipe can deliver around 600 MB/sec. So you would expect that each virtual machine would be backed up at around 100 MB/sec (600 MB/sec divided into 6 streams for each virtual machine), right? However, in reality each stream would have access to much lower share of bandwidth because VMkernel automatically caps each session for stability. Let me show you the actual results from a benchmark that we conducted where we measured performance as we increased the number of streams.
As you can see, by the time the number of streams has reached 4 (in other words, four virtual machines were simultaneously getting backed up), each stream is able to deliver just 55 MB/sec and the overall throughput is 220 MB/sec. This is nowhere near the available bandwidth of 600 MB/sec.
The reasoning behind this type of bandwidth throttling is straightforward. You don’t want VMkernel to be strained by serving this type of copy operations while it has better things to do. VMkernel’s primary function is to orchestrate VM processes. VMware engineering (VMware was also a partner in this benchmark, we submitted the full story as a paper for VMworld 2012) confirmed this behavior as normal.
This naturally puts NBD as a second-class citizen in backup transport world, doesn’t it? The good news is that there is a way to solve this problem! Instead of backing up too many virtual machines from the same host, just make your backup policy/job configuration to distribute the load over multiple hosts. Unfortunately, in environments with 100s of hosts and 1000s of virtual machines, it may be difficult to do it manually. Veritas NetBackup provides VMware Resource Limits as part of its Intelligent Policies for VMware backup where you can limit the number of jobs at VMware vSphere object levels, which is quite handy in this type of situations. For example, I ask customers to limit number of jobs per ESXi host to 4 or less using such intelligent policies and resource limit setting. Thus NetBackup can scale-out its throughput by tapping NBD connections from multiple hosts to keep its available pipe fully utilized while limiting the impact of NBD backups on production ESXi hosts.
Thus Veritas NetBackup moves NBD to first class status in protecting large environments even when the backend storage isn’t on Fiber Channel SAN. For example, NetBackup’s NBD has proven its scale in NetApp FlexPod, VCE VBLOCK, Nutanix and VMware EVO (VSAN). Customers could enjoy the simplicity of NBD and scale-out performance of NetBackup in these converged platforms.
Taking VMware vSphere Storage APIs for Data Protection to the Limit: Pushing the Backup Performance Envelope; Rasheed, Winter et al. VMworld 2012
Full presentation on Pushing the Backup Performance Envelope