Dear EMC Avamar, please stop leeching from enterprise vSphere environments

VMware introduced vStorage APIs for Data Protection (VADP) so that backup products can do centralized, efficient, off-host, LAN-free backup of vSphere virtual machines.

In the physical world, most systems had plenty of resources, often underutilized. Running a backup agent on such a system wasn’t a primary concern for most workloads. The era of virtualization changed things drastically. Server consolidation via virtualization allowed organizations to get the most out of their hardware investment. That means backup agents no longer have the luxury of simply taking resources from production workloads, as the underlying ESXi infrastructure is optimized and right-sized to keep line-of-business applications running smoothly.

VMware solved the backup agent problem in the early days of ESX/ESXi hosts. The SAN transport method for virtual machine backup was born during the old VCB (VMware Consolidated Backup) days and further enhanced in VADP. The idea is simple. Let the snapshots of virtual machines be presented to a workhorse backup host and allow that system to do the heavy lifting of processing and moving data to backup storage. The CPU, memory and I/O resources on ESX/ESXi hosts are not used during backups. Thus the production virtual machines are not starved for hypervisor resources during backups.

For non-SAN environments like NFS-based datastores, the same dedicated host can use Network Block Device (NBD) transport to stream data through the management network. Although it is not as efficient as SAN transport, it still offloads most of the backup processing to the dedicated physical host.

Dedicating one or more workhorse backup systems to do backups was not practical for small business environments and remote offices. To accommodate that business need, VMware allowed virtual machines to act as backup proxy hosts for smaller deployments. This is how hotadd transport was introduced.

Thus your backup strategy should be to use a dedicated physical workhorse backup system to offload all or part of the backup processing using SAN or NBD transport. For really small environments, a virtual machine with NBD or hotadd transport would suffice.
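To make the strategy above concrete, here is a minimal sketch of the decision in Python. It is only an illustration of the reasoning in this post, not code from any backup product. The function name and the `hotadd_possible` flag are hypothetical; the flag stands in for whatever checks a real product would do before attaching disks to a proxy virtual machine.

```python
from enum import Enum


class Transport(Enum):
    SAN = "san"
    NBD = "nbd"
    HOTADD = "hotadd"


def choose_backup_transport(proxy_is_virtual: bool,
                            datastore_on_san: bool,
                            hotadd_possible: bool = True) -> Transport:
    """Pick a transport following the strategy described in the post.

    All parameter names are illustrative assumptions.
    """
    if not proxy_is_virtual:
        # A dedicated physical backup host reads snapshots off the SAN when
        # the datastore lives there; otherwise (for example NFS) it streams
        # the data over the management network with NBD.
        return Transport.SAN if datastore_on_san else Transport.NBD
    # A virtual proxy is the small-environment option: hotadd when the proxy
    # can attach the disks, NBD as the fallback.
    return Transport.HOTADD if hotadd_possible else Transport.NBD


if __name__ == "__main__":
    print(choose_backup_transport(proxy_is_virtual=False, datastore_on_san=True))
    print(choose_backup_transport(proxy_is_virtual=False, datastore_on_san=False))
    print(choose_backup_transport(proxy_is_virtual=True, datastore_on_san=True))
```

Note that only the virtual proxy branch spends ESXi resources on backup processing, which is the whole point of the argument that follows.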

Somehow EMC missed this memo. Ironically, EMC had been the proponent of running the Avamar agent inside the guest instead of adopting VMware’s VADP. The argument was that source-side deduplication in the Avamar agent minimizes the amount of data moved across the wire. While that is indeed true, EMC conveniently forgot to mention that CPU-intensive deduplication within the backup agent leeches ESXi resources away from production workloads!

Then EMC conceded and announced VADP support. But the saga continues. What EMC provided is hotadd support for VADP. That means you allocate multiple proxy virtual machines even in enterprise vSphere environments. Some of the best practice documents for Avamar suggest deploying a backup proxy host for every 20 virtual machines. A typical vSphere environment in an enterprise would have 1000 to 3000 virtual machines. That translates to 50 to 150 proxy hosts! These systems are the leeches of the vSphere environment, draining resources that belong to production applications.
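The arithmetic behind those numbers is easy to check. The sketch below uses only the one-proxy-per-20-virtual-machines ratio quoted above; the function name is mine.

```python
import math

# Ratio quoted above from Avamar best practice documents.
VMS_PER_PROXY = 20


def proxies_needed(vm_count: int, vms_per_proxy: int = VMS_PER_PROXY) -> int:
    """Number of proxy virtual machines needed, rounding up partial groups."""
    return math.ceil(vm_count / vms_per_proxy)


if __name__ == "__main__":
    for vm_count in (1000, 3000):
        print(f"{vm_count} virtual machines -> {proxies_needed(vm_count)} proxy hosts")
```

Every one of those proxies is a virtual machine that takes its CPU and memory from the same ESXi hosts as the production workloads.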

The giant tower of energy-consuming nodes in the Avamar grid is not even lifting a finger in processing backups! It is merely a storage system. The real workhorses are the ESXi hosts, giving up CPU, memory and I/O resources to Avamar proxy hosts to generate and deduplicate the backup stream.

The story does not change even if you replace the Avamar Datastore with a Data Domain device. In that case, the DD Boost agents running on the Avamar proxy hosts are draining resources from ESXi to reduce data at the source and send deduplicated data to the Data Domain system.

EMC BRS should seriously look at the way Avamar proxy hosts, with or without DD Boost, are leeching resources from precious production workloads. The method used by Avamar is recommended only for SMB and remote office environments. Take the hint from VMware engineering as to why Avamar technology was borrowed to provide a solution for SMB customers in the vSphere Data Protection (VDP) product. You can’t chop a tree with a penknife!

The best example of effectively using VADP for enterprise vSphere is NetBackup 5220. EMC BRS could learn a lesson or two from how Symantec integrates with VMware in a much better way. This appliance is a complete backup system with intelligent deduplication and VADP support built right in for VMware backups. It does the heavy lifting so that production workloads are unaffected by backups.

How about recovery? For thick provisioned disks, SAN transport is indeed the fastest. For thin provisioned disks, NBD performs much better. The good news with Symantec NetBackup 5220 is that the user can control the transport method for restores as well. You might have done the backup using SAN transport; however, you can do the restore using NBD if you are restoring thin provisioned virtual machines. For Avamar, hotadd is the be-all and end-all. NBD on a virtual proxy isn’t useful, so it is a moot point when the product offers only virtual machine proxies for VADP.
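The restore rule of thumb above can be written down in a few lines. This is a sketch of the reasoning only, assuming the only input is whether a disk is thin provisioned; the `VirtualDisk` class and function names are hypothetical and do not correspond to any NetBackup or VADP interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VirtualDisk:
    name: str
    thin_provisioned: bool


def restore_transport(disk: VirtualDisk) -> str:
    """Rule of thumb from the post: NBD for thin disks, SAN for thick disks."""
    return "nbd" if disk.thin_provisioned else "san"


def plan_restore(disks: List[VirtualDisk]) -> Dict[str, str]:
    """Map each disk name to the transport to use for its restore."""
    return {disk.name: restore_transport(disk) for disk in disks}


if __name__ == "__main__":
    disks = [
        VirtualDisk("db01.vmdk", thin_provisioned=False),
        VirtualDisk("web01.vmdk", thin_provisioned=True),
    ]
    print(plan_restore(disks))
```

The choice is made per restore, independently of the transport that was used for the backup.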

The question is…

Dear EMC Avamar, when will you offer an enterprise-grade VADP-based backup for your customers? They deserve enterprise-grade protection for the investment they have made in large Avamar Datastores and Data Domain devices.



VMware announces vSphere Data Protection (VDP), what is in it for you?

vSphere Data Protection (VDP) is VMware’s new virtual backup appliance for SMB, available in VMware vSphere 5.1. It replaces the older VMware Data Recovery (vDR) product. There has been some confusion around this announcement, partly due to the way EMC, VMware’s parent company, worded some press releases.

Is VDP the same as EMC’s Avamar Virtual Edition (Avamar VE)?

No, it is not. VDP is a product from VMware. The only technology VMware has used from Avamar is its deduplication engine. The older vDR had limited dedupe capabilities, as its data reduction came mainly from Changed Block Tracking (CBT) in vStorage APIs for Data Protection (VADP). With Avamar’s technology, VDP now provides variable-block deduplication.

I heard that I can upgrade from VDP to EMC Avamar if I need to grow beyond 2TB, is that true?

No, VDP is not a ‘lite’ version of Avamar. It is a different product altogether.

What are my options if I need to grow beyond 2TB?

You could add additional VDP appliances. Up to 10 VDP appliances are supported under one vCenter server. However, these are separate islands of storage. These appliances do not provide global deduplication among these storage pools.
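As a rough sizing aid, the two limits quoted above (2TB per appliance, 10 appliances per vCenter server) can be turned into a quick calculation. This is a sketch under a simplifying assumption of mine: `required_tb` is the deduplicated data you expect to store, and it splits evenly across appliances. Since each appliance deduplicates only within its own pool, real requirements could be higher.

```python
import math

# Limits quoted in the post.
VDP_APPLIANCE_CAPACITY_TB = 2
MAX_APPLIANCES_PER_VCENTER = 10


def appliances_needed(required_tb: float) -> int:
    """VDP appliances needed for the given stored capacity.

    Raises ValueError when the need exceeds what one vCenter server supports.
    """
    count = max(1, math.ceil(required_tb / VDP_APPLIANCE_CAPACITY_TB))
    if count > MAX_APPLIANCES_PER_VCENTER:
        raise ValueError(
            f"{required_tb} TB needs {count} appliances; "
            f"only {MAX_APPLIANCES_PER_VCENTER} are supported per vCenter server"
        )
    return count


if __name__ == "__main__":
    print(appliances_needed(5))   # three separate islands of storage
    print(appliances_needed(20))  # the ceiling for one vCenter server
```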

Having said that, you are more likely to hit other limitations in VDP before hitting the 2TB limit. Note that the Avamar-based deduplication engine is suitable only for SMBs who can afford blackout windows and maintenance windows in their backup solution. These are the periods when the dedupe engine does its housekeeping work, and the system is not available for running backup jobs.

Only 8 virtual machines can be backed up concurrently, which might lengthen backup windows. There is no SAN transport capability to offload backup tasks from production ESXi hosts. There is no good way to make additional copies for redundancy or extended retention, such as replication to a remote location or the cloud. VMware has made it clear that VDP is truly for SMBs and encourages customers to look at enterprise-class backup solutions from partners for larger environments.
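To see how the concurrency limit stretches a backup window, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every virtual machine takes the same time to back up and that jobs run in strict waves of 8. Real job durations vary, so treat the result as a rough estimate; `hours_per_vm` is a hypothetical input.

```python
import math

# Concurrency limit quoted in the post.
MAX_CONCURRENT_BACKUPS = 8


def backup_waves(vm_count: int) -> int:
    """Number of sequential waves needed when only 8 jobs run at once."""
    return math.ceil(vm_count / MAX_CONCURRENT_BACKUPS)


def estimated_window_hours(vm_count: int, hours_per_vm: float) -> float:
    """Rough window length, assuming uniform job duration per wave."""
    return backup_waves(vm_count) * hours_per_vm


if __name__ == "__main__":
    # 100 virtual machines at half an hour each: 13 waves, 6.5 hours.
    print(backup_waves(100), estimated_window_hours(100, 0.5))
```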

Why would EMC let VMware use its Avamar technology at no additional cost to customers? Is EMC trying to promote its products?

Just as Windows/UNIX/Linux operating environments provide basic utilities for backups, VMware has always provided a basic backup solution with its offerings. In the days of the ESX service console, the Linux-based console provided tools like tar and cpio. With ESXi, where the service console is no more, vDR was brought to the table. vDR had its limitations. The choice then was to keep innovating on vDR or license a relatively mature technology. As the parent company had a solution, VMware went the route of taking the Avamar dedupe engine for storage and building its own capabilities for scheduling backups and managing recovery points.

EMC’s Avamar is a popular product in small environments. Although EMC has been trying hard to make Avamar enterprise ready, its deduplication engine has significant limitations. It requires blackout and maintenance windows. With larger capacities, the duration of these windows also increases. With the acquisition of Data Domain, EMC is now focusing more on using its DD Boost technology for distributing the deduplication workload. In fact, EMC recommends the use of Data Domain Boost with Avamar (instead of using Avamar’s dedupe engine) for larger workloads. I believe it was a good decision to support VMware’s SMB market with a technology that was meant for SMB in the first place. I think the Avamar dedupe engine’s days are numbered as a technology that can make money. See my earlier blog post on EMC’s backup portfolio.

Stay tuned. More on VDP coming soon!