Getting to know the Network Block Device Transport in VMware vStroage APIs for Data Protection

When you backup a VMware vSphere virtual machine using vStorage APIs for Data Protection (VADP), one of the common ways to transmit data from VMware data store to backup server is through Network Block Device (NBD) transport. NBD is a Linux-like module that attaches to VMkernel and makes the snapshot of the virtual machine visible to backup server as if the snapshot is a block device on network. While NBD is quite popular and easy to implement, it is also the least understood transport mechanisms in VADP based backups.

NBD is based on VMware’s Network File Copy (NFC) protocol. NFC uses VMkernel port for network traffic. As you already know, VMkernel ports may also be used by other services like host management, vMotion, Fault Tolerance logging, vSphere Replication, NFS, iSCSI an so on. It is recommended to create specific VMkernel ports that attach to dedicated network adapters if you are using a bandwidth intensive service. For example, it is highly recommended to dedicate an adapter for Fault Tolerance logging.

Naturally, the first logical solution to drive high throughput from NBD backups would be to dedicate a bigger pipe for VADP NBD transport. Many vendors put this as the best practice but that alone won’t give you performance and scale.

Let me explain this using an example. Let us assume that you have a backup server streaming six virtual machines from an ESXi host using NBD transport sessions. The host and backup server are equipped with 10Gb adapters. In general a single 10Gb pipe can deliver around 600 MB/sec. So you would expect that each virtual machine would be backed up at around 100 MB/sec (600 MB/sec divided into 6 streams for each virtual machine), right? However, in reality each stream would have access to much lower share of bandwidth because VMkernel automatically caps each session for stability. Let me show you the actual results from a benchmark that we conducted where we measured performance as we increased the number of streams.

NBD Transport and number of backup streams
NBD Transport and number of backup streams

As you can see, by the time the number of streams has reached 4 (in other words, four virtual machines were simultaneously getting backed up), each stream is able to deliver just 55 MB/sec and the overall throughput is 220 MB/sec. This is nowhere near the available bandwidth of 600 MB/sec.

The reasoning behind this type of bandwidth throttling is straightforward. You don’t want VMkernel to be strained by serving this type of copy operations while it has better things to do. VMkernel’s primary function is to orchestrate VM processes. VMware engineering (VMware was also a partner in this benchmark, we submitted the full story as a paper for VMworld 2012) confirmed this behavior as normal.

This naturally puts NBD as a second-class citizen in backup transport world, doesn’t it? The good news is that there is a way to solve this problem! Instead of backing up too many virtual machines from the same host, just make your backup policy/job configuration to distribute the load over multiple hosts. Unfortunately, in environments with 100s of hosts and 1000s of virtual machines, it may be difficult to do it manually. Veritas NetBackup provides VMware Resource Limits as part of its Intelligent Policies for VMware backup where you can limit the number of jobs at VMware vSphere object levels, which is quite handy in this type of situations. For example, I ask customers to limit number of jobs per ESXi host to 4 or less using such intelligent policies and resource limit setting. Thus NetBackup can scale-out its throughput by tapping NBD connections from multiple hosts to keep its available pipe fully utilized while limiting the impact of NBD backups on production ESXi hosts.

Thus Veritas NetBackup moves NBD to first class status in protecting large environments even when the backend storage isn’t on Fiber Channel SAN. For example, NetBackup’s NBD has proven its scale in NetApp FlexPod, VCE VBLOCK, Nutanix and VMware EVO (VSAN). Customers could enjoy the simplicity of NBD and scale-out performance of NetBackup in these converged platforms.

References:

Taking VMware vSphere Storage APIs for Data Protection to the Limit: Pushing the Backup Performance Envelope; Rasheed, Winter et al. VMworld 2012

Full presentation on Pushing the Backup Performance Envelope

Dear EMC Avamar, please stop leeching from enterprise vSphere environments

VMware introduced vStorage APIs for Data Protection (VADP) so that backup products can do centralized, efficient, off-host LAN free backup of vSphere virtual machines.

In the physical world, most systems have plenty of resources, often underutilized. Running backup agent in such a system wasn’t a primary concern for most workloads. The era of virtualization changed things drastically. Server consolidation via virtualization allowed organizations to get the most out of their hardware investment. That means backup agents do not have the luxury to simply take up resources from production workloads anymore as the underlying ESXi infrastructure is optimized and right-sized to get line of business applications running smoothly.

VMware solved the backup agent problem from the early days of ESX/ESXi hosts. The SAN transport method for virtual machine backup was born during the old VCB (VMware Consolidated Backup) days and further enhanced in VADP (vStorage APIs for Data Protection). The idea is simple. Let the snapshots of virtual machine be presented to a workhorse backup host and allow that system do the heavy lifting of processing and moving data to backup storage. The CPU, memory and I/O resources on ESX/ESXi hosts are not used during backups. Thus the production virtual machines are not starved for hypervisor resources during backups.

For non-SAN environments like NFS based datastores, the same dedicated host can use Network Block Device (NBD) transport to stream data through management network. Although it is not as efficient as SAN transport, it still offloaded most of the backup processing to the dedicated physical host.

Dedicating one or more workhorse backup systems to do backups was not practical for small business environments and remote offices. To accommodate that business need, VMware allowed virtual machines to act as backup proxy hosts for smaller deployments. This is how hotadd transport was introduced.

Thus your backup strategy is to use a dedicated physical workhorse backup system to offload all or part of backup processing using SAN or NBD transports. For really small environments, a virtual machine with NBD or hotadd transport would suffice.

Somehow EMC missed this memo. Ironically, EMC had been the proponent of running Avamar agent inside the guest instead of adopting VMware’s VADP. The argument was that the source side deduplication at Avamar agent minimizes the amount of data to be moved across the wire. While that is indeed true, EMC conveniently forgot to mention that CPU intensive deduplication within the backup agent would indeed leech ESXi resources away from production workloads!

Then EMC conceded and announced VADP support. But the saga continues. What EMC had provided is hotadd support for VADP. That means you allocate multiple proxy virtual machines even in the case of enterprise vSphere environments. Some of the best practice documents for Avamar suggest deploying a backup proxy host for every 20 virtual machines. Typical vSphere environment in an enterprise would have 1000 to 3000 virtual machines. That translates to 50 to 150 proxy hosts! These systems are literally the leach worms in vSphere environment draining resources that belong to production applications.

The giant tower of energy consuming nodes in Avamar grid is not even lifting a finger in processing backups! It is merely a storage system. The real workhorses are ESXi hosts giving in CPU, memory and I/O resources to Avamar proxy hosts to generate and deduplicate backup stream.

The story does not change even if you replace Avamar Datastore with a Data Domain device. In that case, the DD Boost agent running on Avamar proxy hosts are draining resources from ESXi to reduce data at source and send deduplicated data to Data Domain system.

EMC BRS should seriously look at the way Avamar proxy hosts with or without DD Boost are leaching resources from precious production workloads. The method used by Avamar is recommended only for SMB and remote office environments. Take the hint from VMware engineering as to why Avamar technology was borrowed to provide a solution for SMB customers in VMware Data Protection (VDP) product. You can’t chop a tree with a penknife!

The best example for effectively using VADP for enterprise vSphere is NetBackup 5220. EMC BRS could learn a lesson or two from how Symantec integrates with VMware in a much better way. This appliance is a complete backup system with intelligent deduplication and VADP support built right in for VMware backups.  This appliance does the heavy lifting so that production workloads are unaffected by backups.

How about recovery? For thick provisioned disks SAN transport is indeed the fastest. For thin provisioned disks, NBD performs much better. The good news on Symantec NetBackup 5220 is that the user could control the transport method for restores as well. You might have done the backup using SAN transport, however you can do the restore using NBD if you are restoring thin provisioned virtual machines. For Avamar, hot-add is the end-all for all approaches. NBD on a virtual proxy isn’t useful, hence using that is a moot point when the product offers just virtual machine proxy for VADP.

The question is…

Dear EMC Avamar, when will you offer an enterprise grade VADP based backup for your customers? They deserve enterprise grade protection for the investment they had done for large Avamar  Datastores and Data Domain devices.