Did Rubrik make Veeam’s Modern Data Protection a bit antiquated?

Veeam Antiquated?

Modern Data Protection™ got a trademark from Veeam. No, I am not joking. It is true! Veeam started with a focused strategy: it would do nothing but VMware VM backups. Thankfully VMware had done most of the heavy lifting with vStorage APIs for Data Protection (VADP), so developing a VM-only backup solution was as simple as creating a software plugin for those APIs and developing a storage platform for keeping the VM copies. With a good marketing engine Veeam won the hearts of virtual machine administrators and it paid off.

As the opportunity to reap the benefits as a niche VM-only backup started to erode (intense competition, low barrier to entry on account of VADP), Veeam is attempting to re-invent its image by exploring broader use cases like physical systems protection, availability etc. Some of these efforts make it look like its investors are hoping for Microsoft to buy Veeam. The earlier wish to sell itself to VMware shattered when VMware adopted EMC Avamar’s storage to build its data protection solution.

Now Rubrik is coming to market and attacking the very heart of Veeam’s little playground while making Veeam’s modern data protection a thing of the past. Rubrik’s market entry is also through VMware backups using vStorage APIs, but with a better storage backend that can scale out.

Both Veeam and Rubrik have two high level tiers. The frontend tier connects to vSphere through VMware APIs. It discovers and streams virtual machine data. Then there is a backend storage tier where virtual machine data is stored.

For Veeam the frontend is a standalone backup server and its optional backup proxies. The proxies (thanks to VMware hot-add) enable a limited level of scale-out for the frontend, but this approach leeches resources from production and increases complexity. The backend is one or more backup repositories. There is nothing special about the repository; it is a plain file system. Although Veeam claims to have deduplication built-in, it is perhaps the most primitive in the industry and works only across virtual machines from the same backup job.

Rubrik is a scale-out solution where the frontend and backend are fused together from the user’s perspective. You buy Rubrik bricks where each brick consists of four nodes. These are the compute and storage components that cater to both the frontend, which streams virtual machines from vSphere via NBD or SAN transport (kudos to Rubrik for ditching hot-add!), and the backend, which is a cluster file system that spans nodes and bricks. Rubrik claims to have global deduplication across its entire cluster file system namespace.

Historically, the real innovation from Veeam was the commercial success of powering on virtual machines directly from the backup storage. Veeam may list several other innovations (e.g. they may claim that they ‘invented’ agentless backups, but it was actually done by VMware in its APIs) in their belt but exporting VMs directly from backup is something every other vendor followed afterwards and hence kudos go to Veeam on that one. But this innovation may backfire and may help Veeam customers to transition to Rubrik seamlessly.

Veeam customers are easy targets for Rubrik for a few reasons.

  • One of the cornerstones of Veeam’s foundation is its dependency on vStorage APIs from VMware; it is not a differentiator because all VMware partners have access to those APIs. Unlike other backup vendors, Veeam didn’t focus on building application awareness and granular quiescence until late in the game.
  • Veeam is popular in smaller IT shops and shadow projects within large IT environments. It is a handy backup tool, but it is not perceived as a critical piece in meeting regulatory specs and compliance needs. It has been marketed towards virtual machine administrators; hence higher-level buying centers do not have much visibility. That adversely affects Veeam’s ‘stickiness’ in an account.
  • Switching from one backup application to another has historically been a major undertaking. But that is not the case if customers want to switch from Veeam to something else. In earlier days, IT shops needed to stand up both solutions until all the backup images from the old solution hit their expiration dates. Otherwise you had to develop strategies to migrate old backups into the new system, a costly affair. When the source is Veeam with 14 recovery points per VM by default, you could build workflows that spin up each VM backup in a sandbox and let the new solution back it up as if it were a production copy. (Rubrik may want to work on building a small migration tool for this.)
  • Unlike Veeam that started stitching support for other hypervisors and physical systems afterwards, Rubrik has architected its platform to accommodate future needs. That design may intrigue customers when VMware customers are looking to diversify into other hypervisors and containers.

The fine print is that Rubrik is yet to be proven. If the actual product delivers on the promises, it may have antiquated Veeam. The latter may become a good case study for business schools on not building a product that is too dependent on someone else’s technology.

Thanks to #VFD5 TechFieldDay for sharing Rubrik’s story. You can watch it here: Rubrik Technology Deep Dive

Disclaimer: I work for Veritas/Symantec, opinions here are my own.

Getting to know the Network Block Device Transport in VMware vStorage APIs for Data Protection

When you back up a VMware vSphere virtual machine using vStorage APIs for Data Protection (VADP), one of the common ways to transmit data from the VMware datastore to the backup server is through Network Block Device (NBD) transport. NBD is a Linux-like module that attaches to VMkernel and makes the snapshot of the virtual machine visible to the backup server as if the snapshot were a block device on the network. While NBD is quite popular and easy to implement, it is also one of the least understood transport mechanisms in VADP based backups.

NBD is based on VMware’s Network File Copy (NFC) protocol. NFC uses a VMkernel port for network traffic. As you already know, VMkernel ports may also be used by other services like host management, vMotion, Fault Tolerance logging, vSphere Replication, NFS, iSCSI and so on. It is recommended to create specific VMkernel ports that attach to dedicated network adapters if you are using a bandwidth intensive service. For example, it is highly recommended to dedicate an adapter for Fault Tolerance logging.

Naturally, the first logical solution to drive high throughput from NBD backups would be to dedicate a bigger pipe for VADP NBD transport. Many vendors put this as the best practice but that alone won’t give you performance and scale.

Let me explain this using an example. Let us assume that you have a backup server streaming six virtual machines from an ESXi host using NBD transport sessions. The host and backup server are equipped with 10Gb adapters. In general a single 10Gb pipe can deliver around 600 MB/sec. So you would expect that each virtual machine would be backed up at around 100 MB/sec (600 MB/sec divided into 6 streams for each virtual machine), right? However, in reality each stream would have access to much lower share of bandwidth because VMkernel automatically caps each session for stability. Let me show you the actual results from a benchmark that we conducted where we measured performance as we increased the number of streams.

NBD Transport and number of backup streams

As you can see, by the time the number of streams has reached 4 (in other words, four virtual machines were simultaneously getting backed up), each stream is able to deliver just 55 MB/sec and the overall throughput is 220 MB/sec. This is nowhere near the available bandwidth of 600 MB/sec.

The reasoning behind this type of bandwidth throttling is straightforward. You don’t want VMkernel to be strained by serving this type of copy operations while it has better things to do. VMkernel’s primary function is to orchestrate VM processes. VMware engineering (VMware was also a partner in this benchmark, we submitted the full story as a paper for VMworld 2012) confirmed this behavior as normal.

This naturally puts NBD as a second-class citizen in backup transport world, doesn’t it? The good news is that there is a way to solve this problem! Instead of backing up too many virtual machines from the same host, just make your backup policy/job configuration to distribute the load over multiple hosts. Unfortunately, in environments with 100s of hosts and 1000s of virtual machines, it may be difficult to do it manually. Veritas NetBackup provides VMware Resource Limits as part of its Intelligent Policies for VMware backup where you can limit the number of jobs at VMware vSphere object levels, which is quite handy in this type of situations. For example, I ask customers to limit number of jobs per ESXi host to 4 or less using such intelligent policies and resource limit setting. Thus NetBackup can scale-out its throughput by tapping NBD connections from multiple hosts to keep its available pipe fully utilized while limiting the impact of NBD backups on production ESXi hosts.
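To make the idea of spreading NBD streams across hosts concrete, here is a minimal sketch. It is not NetBackup's resource-limit implementation; the function names, the VM-to-host mapping and the wave-based scheduling are all hypothetical. The only numbers taken from this post are the limit of 4 jobs per ESXi host, the roughly 220 MB/sec observed at 4 streams and the roughly 600 MB/sec pipe. The second helper also assumes that throughput adds up linearly across hosts, which the benchmark above does not prove.

```python
import math
from collections import defaultdict, deque


def schedule_waves(vm_to_host, max_per_host=4):
    """Group VMs into backup waves so that no ESXi host serves
    more than max_per_host NBD streams at the same time."""
    queues = defaultdict(deque)
    for vm, host in vm_to_host.items():
        queues[host].append(vm)

    waves = []
    while any(queues.values()):
        wave = []
        for host in sorted(queues):
            # Take at most max_per_host VMs from this host per wave.
            for _ in range(min(max_per_host, len(queues[host]))):
                wave.append((host, queues[host].popleft()))
        waves.append(wave)
    return waves


def hosts_to_fill_pipe(pipe_mb_s=600, per_host_mb_s=220):
    """Hosts needed to saturate the backup server's pipe,
    assuming (hypothetically) linear scaling across hosts."""
    return math.ceil(pipe_mb_s / per_host_mb_s)


if __name__ == "__main__":
    # Hypothetical inventory: 18 VMs spread evenly over 3 hosts.
    inventory = {f"vm{i}": f"esx{i % 3}" for i in range(18)}
    for number, wave in enumerate(schedule_waves(inventory), start=1):
        print(f"wave {number}: {len(wave)} streams")
    print("hosts needed:", hosts_to_fill_pipe())
```

With the figures quoted above, about three hosts streaming four VMs each would be needed to keep a 600 MB/sec pipe busy, which is why distributing jobs matters more than simply buying a bigger pipe for one host.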

Thus Veritas NetBackup moves NBD to first class status in protecting large environments even when the backend storage isn’t on Fiber Channel SAN. For example, NetBackup’s NBD has proven its scale in NetApp FlexPod, VCE VBLOCK, Nutanix and VMware EVO (VSAN). Customers could enjoy the simplicity of NBD and scale-out performance of NetBackup in these converged platforms.


Taking VMware vSphere Storage APIs for Data Protection to the Limit: Pushing the Backup Performance Envelope; Rasheed, Winter et al. VMworld 2012

Full presentation on Pushing the Backup Performance Envelope

NetBackup Accelerator vs. Simpana DASH Full

I want to start this blog with a note.

I mean no disrespect to CommVault as a company or its engineers innovating its products. Being an engineer myself by trade, I do understand that innovations are triggered by market demands and there is always room for improvements in any product. This blog is entirely my own opinions.

As most of you reading this blog know, I also write for official Symantec blogs. I recently got an opportunity to take readers of Symantec Connect on a deep dive into one of the major features in NetBackup 7.6 for VMware vSphere and vCloud environments. It is primarily targeted at users of NetBackup who know its nuts and bolts. A couple of employees from CommVault read the blog. It is natural in the competitive intelligence world to look for weak spots or things that can be selectively pointed out to show parity. It is part of their job and I respect it. However, it appeared that they wanted to claim parity for Simpana with NetBackup Accelerator for VMware based on two statements (tweets, to be precise!). When I asked them to elaborate, the discussion went down a rat hole with statements that were out of context and downright unprofessional. Hence here I go with an attempt to compare Simpana 10 with NetBackup 7.6 on the very topic discussed in the official blog.

Claims to equate parity with NetBackup Accelerator for VMware

  1. (Not explicitly stated) Simpana supports CBT
  2. Simpana had ‘block detection’ for over a year
  3. Simpana does synthetics

The attempt here is to check all the boxes to claim parity, while at times people miss the big picture! At times they were comparing apples to oranges. Hence I am going to attempt to clarify this as much as possible using Simpana language for the benefit of those two employees.

Simpana supports CBT: Of course, every major vendor supports it. It is an innovation from VMware. The willingness to support a feature from vStorage APIs is important to protect VMware virtual machines.

What sets NetBackup 7.6 apart from Simpana 10 in this case is that Simpana’s implementation of CBT is limited to recovering an entire VM or individual files from the VM. If you have enterprise applications (e.g. Microsoft Exchange, Microsoft SQL Server etc.), you must stream data through an agent inside the guest to protect those applications and perform granular recovery. The value of CBT is to minimize data processing and movement load on production VMs while performing backups. A virtual machine’s operating system binaries and related files are typically static and CBT won’t add much value there. The real value comes from daily changes to disk blocks by applications! That means ZERO value in Simpana to protect enterprise applications with its implementation of vSphere CBT.

Simpana had block detection for over a year,  Simpana does synthetics: The employee is trying to add a check box for Simpana next to NetBackup’s capability to make use of Symantec V-Ray to detect deleted blocks. Nice try!

First and foremost, the block optimization technique described in my blog has been present in NetBackup since 2007, with version 6.5.1, when Symantec announced support for VMware Virtual Infrastructure 3. Congratulations on trying to claim that Simpana had this capability after half a decade! But wait…. We are talking about apples and oranges here.

This technique had been available for both full and incremental backup schedules. It works no matter where backups are going to, disk, deduplicated disk, tape or cloud. NetBackup’s block optimization happens closer to the data source. Thus it detects deleted blocks at the backup host so that the deleted blocks never appear in SAN or LAN traffic to the backup storage. That is optimization for processing-power, interconnect-bandwidth and storage!

The CommVault employee was in a hurry to equate this to something Simpana caught up on recently. This is what I believe he is referring to. (I am asking him to tweet back if there is anything else.) Quoted from Simpana 10 online documentation:

DASH Full is a read optimized Synthetic Full operation which does not require traditional full backups to be performed. Once the first full backup is completed, changed blocks are protected during incremental or differential backups. A DASH Full will run in place of traditional full or synthetic full. This operation does not require movement of data. It will simply update indexing information and the deduplication database signifying that a full backup has been completed. This will significantly reduce the time it takes to perform full backups.
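As a purely conceptual sketch of what "updating indexing information" without moving data can look like, the snippet below builds a synthetic full by overlaying incremental block maps on the last full's block map. It is not CommVault's or Symantec's implementation; the index layout (block offset mapped to a chunk fingerprint) and the function name are assumptions made for illustration only.

```python
def synthesize_full(last_full_index, incremental_indexes):
    """Build a new full-backup index from the last full plus incrementals.

    Each index maps a block offset to the fingerprint of a chunk that is
    already in the deduplicated store, so only references are merged and
    no backup data is read or rewritten.
    """
    new_full = dict(last_full_index)
    for incremental in incremental_indexes:  # oldest first
        new_full.update(incremental)         # newer blocks win
    return new_full


if __name__ == "__main__":
    # Hypothetical fingerprints for a four-block virtual disk.
    full = {0: "fp-a", 1: "fp-b", 2: "fp-c"}
    incrementals = [{1: "fp-b2"}, {2: "fp-c2", 3: "fp-d"}]
    print(synthesize_full(full, incrementals))
```

Note that this sketch still needs the incrementals to exist as separate backups before the synthesis step can run.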

There are so many things I want to say about this, but I am trying to be concise here with bullet points.

  • What Simpana has here is an equivalent of NetBackup OpenStorage Optimized Synthetics that was introduced in NetBackup 6.5.4 (in 2009). While NetBackup still supports this capability, Symantec had taken this to the next level with NetBackup Accelerator. For the record, NetBackup Accelerator is also backed by Optimized Synthetics and hence the so-called ‘block detection’ is there in NetBackup since 2009.
  • The optimization I was talking about was the capability to detect deleted blocks from the CBT data stream while CommVault is touting about data movement within backup storage!
  • The DASH full requires incremental backups and separate schedules for synthetic backups. NetBackup Accelerator eliminates this operational inefficiency by synthesizing full image inline using the resources needed for an incremental backup.
  • If you are curious about how NetBackup Accelerator in general is different from Optimized Synthetics (or DASH Full), this blog would help.
  • Last but not least, did I say that NetBackup Accelerator for VMware works with enterprise applications as well? Thus both CBT and deleted block detection (both relevant to applications that do the real work inside the VM) add real value for NetBackup Accelerator.

High Availability for Business Critical Applications on VMware vSphere

In the last blog we talked about VMware vSphere HA and FT. As we discussed, vSphere HA is quite impressive in protecting against infrastructure failures at a reasonable cost. vSphere FT, on the other hand, has very limited use cases. However, neither of these solutions is sufficient to meet high availability requirements for business critical applications with demanding service level agreements.

  1. Neither vSphere HA nor FT has application awareness. These technologies monitor just the container (the virtual machine). If an application or a resource that it depends on goes down, these technologies cannot detect and remediate the issue.
  2. Both technologies cannot provide availability during planned downtimes. If the application or operating system needs to be patched, the application will not be available to users.
  3. The remediation in vSphere HA requires restarting guest operating system that can be time consuming. This poor RTO may not be suited for enterprise applications.

This is where Symantec comes to rescue VMware vSphere administrators. Symantec has two products to fill these gaps so that organizations can confidently virtualize business critical applications. Thus you get to enjoy the agility and cost efficiency of VMware vSphere without compromising enterprise availability.

Symantec ApplicationHA: This solution solves problem 1 given above. Symantec ApplicationHA monitors designated applications and resources (e.g. disk, volume, file system, network…). If a failure is detected, Symantec ApplicationHA can restart the application and its resources in a pre-defined order. The application monitoring is quite efficient and foolproof. For example, if you are monitoring MS SQL Server, you can configure the ApplicationHA agent to log in and log out from the database as if it were a regular database user. If the application restart fails (it can attempt application restarts for a configured number of times), Symantec ApplicationHA will send a trigger to vSphere HA (if available) to restart the VM on the same or on a different host.

Symantec ApplicationHA has support for over 21 business critical applications. Moreover, it provides a framework to create custom agents for homegrown applications as well. Symantec has been in the business of HA and DR for a long time, with a superb reputation for HA agents.
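The monitor, restart and escalate flow described for ApplicationHA can be sketched roughly as follows. This is only an illustration of the control flow, not Symantec's agent code; the callback names and the default of three restart attempts are hypothetical.

```python
def remediate(is_healthy, restart_app, escalate_to_vm_restart, max_restarts=3):
    """Try to heal an application in place, then escalate.

    is_healthy: callable returning True when the application check passes
        (for example, a test login to the database succeeds).
    restart_app: callable that restarts the application and its resources.
    escalate_to_vm_restart: callable that asks the VM-level HA layer
        to restart the virtual machine.
    """
    if is_healthy():
        return "healthy"

    for attempt in range(1, max_restarts + 1):
        restart_app()
        if is_healthy():
            return f"recovered after {attempt} restart(s)"

    # In-guest restarts were not enough; hand over to VM-level HA.
    escalate_to_vm_restart()
    return "escalated"


if __name__ == "__main__":
    # Hypothetical application that comes back on the second restart.
    state = {"up": False, "restarts": 0}

    def check():
        return state["up"]

    def restart():
        state["restarts"] += 1
        state["up"] = state["restarts"] >= 2

    print(remediate(check, restart, lambda: print("restart VM")))
```

The point of the ordering is that the cheap remedy (restarting the application) is tried first, and the expensive one (rebooting the guest) is kept as the last resort.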

Symantec Cluster Server, powered by Veritas: This product solves all the three problems stated earlier. This can work with or without vSphere HA. The application monitoring and remediation workflow is similar to that of Symantec ApplicationHA. In fact, the agents for both the products are the same. What is different about Symantec Cluster Server is its ability to migrate just the application and its resources to a standby VM as part of remediation.  There is no need to wait for VM to restart thereby significantly reducing the downtime and improving RTO.

The ability to migrate application to another VM also mitigates the downtime normally incurred for planned activities like applying maintenance updates. You can update patches on standby node and migrate the application. Considering the long patching processes for modern operating systems, you definitely don’t want to deploy business critical applications on vSphere without the availability from Symantec Cluster Server.

Symantec Cluster Server for VMware is purpose built for virtual environments. When compared to traditional clusters like Windows Failover Cluster (formerly known as Microsoft Cluster Server), Symantec Cluster Server gives you high availability without compromising the perks of virtualization. For example, Windows Failover Cluster requires you to create VMs with physical RDM (raw device mapping) disks. If you use RDM, the flagship vSphere capabilities like vMotion, DRS, vStorage API based backups etc. are lost. Symantec Cluster Server has hot-plug APIs to work in VMFS and NFS based datastores.

Note: VMware has released a product called vSphere App HA with the vSphere 5.5 release. Lorenzo has written a great blog where Symantec ApplicationHA and vSphere App HA are compared in detail. Check it out here.

Run baby run! High Availability for business critical applications in virtualized environments

Most of you are on a journey to a software defined data center. Some of you used virtualization to consolidate infrastructure to reduce capital expenses. Some of you may be virtualizing (or starting to think about virtualizing) business applications to take advantage of the agility and flexibility that virtualization brings. Naturally, one thing you may worry about a lot once you have reached that part of the journey is system and application availability.

The good news is that VMware is not a stranger to HA. VMware vSphere includes a feature named vSphere HA (formerly VMware HA) that protects VMs against hardware failures.  Two or more ESXi hosts can form an HA cluster. vSphere HA provides the following values.

  1. Decent protection against hardware failures (ESXi host failures). When a host fails, the virtual machines on that host can be restarted on another ESXi host sharing the same data store.
  2. Limited protection against guest OS failures. The VMware Tools running on the guest operating system send heartbeats to vSphere HA. If the heartbeat stops (e.g. the guest operating system is hung), vSphere HA can restart the VM on the same or on a different ESXi host.

The use of vSphere HA depends on the service level agreement (SLA) between IT department and business unit. In most development/test workloads, vSphere HA is good enough as the services can be resumed in less than 10 minutes. The main bottleneck here is the time it takes to reboot the guest operating system.

Another solution is vSphere Fault Tolerance (vSphere FT). It creates and maintains an additional copy of the VM being protected. It provides continuous availability by ensuring that the states of the primary and secondary VMs are identical at any point in the instruction execution of the virtual machine. However, vSphere FT is not for everyone. Although its protection against hardware failures is impeccable, its protection against OS and application misbehavior is extremely limited. The cost of operating two virtual machines (and related storage) and other limitations like lack of support for vStorage APIs make vSphere FT suitable for very limited use cases.

Both vSphere HA and vSphere FT lack something quite important when it comes to protecting business critical workloads, viz. application awareness. Let us say that you are running an instance of Oracle with a few databases inside a virtual machine. What happens if an Oracle instance fails? What happens if an instance loses access to underlying storage? Neither vSphere HA nor FT detects it and hence downtime will be incurred. Downtime = Lost revenue.

There is another weakness in the vSphere HA and vSphere FT solutions. They do not protect applications against planned downtime. When you need to patch, upgrade or perform any other maintenance task related to components within the guest (operating system binaries, application binaries etc.), you must shut down the application, which may be costly for tier 1 business critical applications.

| Scenario | vSphere HA | vSphere FT |
|---|---|---|
| Detect host failure | VMs are restarted on another host (Recovery time = restart time) | The VM executing instructions in lockstep on surviving host takes over (Recovery time is near zero) |
| Detect VM failure (VM not sending heart beats, OS hung) | VM is restarted | No protection likely as both VMs are in lockstep |
| Detect Application Failure | No Protection | No Protection |
| Compatibility with vMotion | Yes | Yes |
| Compatibility with vStorage APIs for Data Protection (VADP) | Yes | No (in guest backup agent required) |
| Avoiding Planned Downtime (patching, upgrades etc.) | Planned downtime cannot be avoided | Planned downtime cannot be avoided |

Symantec has solutions to tackle these types of scenarios. One was jointly developed with VMware. The second one comes from a time-tested solution that was ported to support vSphere platform. Let us look at each of them in another blog.

What’s up with VADP backups and VDDK on vSphere 5.1?

VMware vSphere 5.1 has been in the market for more than a few months now and the interest in the new capabilities is high. Because of this the market saw many backup vendors rush to announce support for vSphere 5.1 in their VADP (vStorage APIs for Data Protection) integration. Everything looked clean and shiny and new.

On November 21, Symantec made an interesting announcement [1]. In a nutshell, the statement was that support for vSphere 5.1 would be delayed in its NetBackup and Backup Exec products. It was because they discovered issues while testing the VADP 5.1 API for integration. The API in the current form may introduce risk in performing consistent backups and ensuring reliable restores. All vendors receive the same API, not all vendors perform the same level of testing.

In order to explain the intricacies, first we need to take a quick look at how a backup product is integrated with VMware vSphere. With each release of vSphere, VMware publishes a set of APIs known as VMware APIs for Data Protection or VADP. One of the key components of VADP is Virtual Disk Development kit aka VDDK. This is the component through which third party code receives authenticated access to vSphere Datastores and virtual machine disk files. VMware makes this component available to its technology partners. Partners (backup product vendors in this case) ship this along with their product that has calls to vStorage APIs.

With each version of vSphere, an equivalent version of VDDK is released. The VDDK is generally backward compatible to one or more earlier versions of vSphere. For example, VDDK 5.1 supports [2] vSphere 5.1, 5.0 and 4.1. VDDK 5.0 supports [3] vSphere 5.0, 4.1, 4.0 and VI 3.5. Since the updated VDDK is required to understand the modified data structures in a new version of vSphere, lower versions of VDDK are in general not supported for accessing a higher version of vSphere. For example, VMware historically and currently (as of today) does not support the use of VDDK 5.0 to access datastores in vSphere 5.1. VMware documents supported versions of vSphere for each of its VDDK versions in release notes.
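The support rule above boils down to a lookup table. The sketch below encodes only the two VDDK releases named in this post; the dictionary and function names are made up for illustration, and the release notes remain the authoritative source.

```python
# Supported vSphere releases per VDDK version, as quoted in this post.
VDDK_SUPPORT = {
    "5.1": {"5.1", "5.0", "4.1"},
    "5.0": {"5.0", "4.1", "4.0", "VI 3.5"},
}


def is_supported(vddk_version, vsphere_version):
    """Return True if the VDDK release lists this vSphere release as supported."""
    return vsphere_version in VDDK_SUPPORT.get(vddk_version, set())


if __name__ == "__main__":
    for vddk, vsphere in [("5.1", "5.1"), ("5.1", "4.1"), ("5.0", "5.1")]:
        verdict = "supported" if is_supported(vddk, vsphere) else "NOT supported"
        print(f"VDDK {vddk} against vSphere {vsphere}: {verdict}")
```

The last pair, VDDK 5.0 against vSphere 5.1, is exactly the unsupported combination discussed in this post.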

The key to remember is the statement in bold face above. VMware does not support any violated combinations because of the risks and uncertainties. The partners are expected to ship the correct version of VDDK when they announce the availability of support for a given vSphere release.

What Symantec announced and VMware confirmed [4] is that VDDK 5.1 has issues and hence the support for vSphere 5.1 in its products will be delayed. This makes sense since VDDK 5.1 is the only version currently allowed to access vSphere 5.1. The face-saving reactions from other vendors to this announcement revealed some of the dirty games and ugly truths in the area of VADP/VDDK integration.


  1. Vendors were claiming support for vSphere 5.1 but still shipping VDDK 5.0 with their products. This is currently not supported by VMware because of the uncertainties. This may change, but at the time vendors were claiming support, they were taking risks that typically are not acceptable in the data protection business.
  2. Vendors were mucking with API calls and silently killing hung processes. That may work for an isolated or random hang. But will not work when there are repeatable hang situations like those observed in VDDK 5.1. Plus, there are performance and reliability concerns in abruptly ending sessions with vSphere.
  3. Most vendors weren’t testing all the edge cases and never realized the problems in VDDK 5.1, thus prematurely announcing support for 5.1.


If your backup vendor currently supports vSphere 5.1, be sure to ask what their situation is.

Sources and references:

1. Quality wins every time: vSphere 5.1 support update, Symantec official blog.

2. VDDK 5.1 Release Notes, VMware Support resources

3. VDDK 5.0 Release Notes, VMware Support resources

4. Third-party backup software using VDDK 5.1 may encounter backup/restore failures, VMware Support KB

Dear EMC Avamar, please stop leeching from enterprise vSphere environments

VMware introduced vStorage APIs for Data Protection (VADP) so that backup products can do centralized, efficient, off-host LAN free backup of vSphere virtual machines.

In the physical world, most systems have plenty of resources, often underutilized. Running backup agent in such a system wasn’t a primary concern for most workloads. The era of virtualization changed things drastically. Server consolidation via virtualization allowed organizations to get the most out of their hardware investment. That means backup agents do not have the luxury to simply take up resources from production workloads anymore as the underlying ESXi infrastructure is optimized and right-sized to get line of business applications running smoothly.

VMware solved the backup agent problem from the early days of ESX/ESXi hosts. The SAN transport method for virtual machine backup was born during the old VCB (VMware Consolidated Backup) days and further enhanced in VADP (vStorage APIs for Data Protection). The idea is simple. Let the snapshots of virtual machine be presented to a workhorse backup host and allow that system do the heavy lifting of processing and moving data to backup storage. The CPU, memory and I/O resources on ESX/ESXi hosts are not used during backups. Thus the production virtual machines are not starved for hypervisor resources during backups.

For non-SAN environments like NFS based datastores, the same dedicated host can use Network Block Device (NBD) transport to stream data through the management network. Although it is not as efficient as SAN transport, it still offloads most of the backup processing to the dedicated physical host.

Dedicating one or more workhorse backup systems to do backups was not practical for small business environments and remote offices. To accommodate that business need, VMware allowed virtual machines to act as backup proxy hosts for smaller deployments. This is how hotadd transport was introduced.

Thus your backup strategy is to use a dedicated physical workhorse backup system to offload all or part of backup processing using SAN or NBD transports. For really small environments, a virtual machine with NBD or hotadd transport would suffice.

Somehow EMC missed this memo. Ironically, EMC had been the proponent of running Avamar agent inside the guest instead of adopting VMware’s VADP. The argument was that the source side deduplication at Avamar agent minimizes the amount of data to be moved across the wire. While that is indeed true, EMC conveniently forgot to mention that CPU intensive deduplication within the backup agent would indeed leech ESXi resources away from production workloads!

Then EMC conceded and announced VADP support. But the saga continues. What EMC had provided is hotadd support for VADP. That means you allocate multiple proxy virtual machines even in the case of enterprise vSphere environments. Some of the best practice documents for Avamar suggest deploying a backup proxy host for every 20 virtual machines. A typical vSphere environment in an enterprise would have 1000 to 3000 virtual machines. That translates to 50 to 150 proxy hosts! These systems are literally leeches in the vSphere environment, draining resources that belong to production applications.
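The proxy count is simple arithmetic, shown here as a small helper. The ratio of one proxy per 20 virtual machines and the 1000 to 3000 VM range come from the paragraph above; the function name is hypothetical.

```python
import math


def proxies_needed(vm_count, vms_per_proxy=20):
    """Number of hotadd proxy VMs needed at one proxy per vms_per_proxy VMs."""
    return math.ceil(vm_count / vms_per_proxy)


if __name__ == "__main__":
    for vms in (1000, 3000):
        print(f"{vms} VMs need {proxies_needed(vms)} proxy hosts")
```

Every one of those proxies is itself a virtual machine consuming CPU, memory and I/O on the production ESXi hosts.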

The giant tower of energy consuming nodes in Avamar grid is not even lifting a finger in processing backups! It is merely a storage system. The real workhorses are ESXi hosts giving in CPU, memory and I/O resources to Avamar proxy hosts to generate and deduplicate backup stream.

The story does not change even if you replace Avamar Datastore with a Data Domain device. In that case, the DD Boost agents running on Avamar proxy hosts are draining resources from ESXi to reduce data at source and send deduplicated data to the Data Domain system.

EMC BRS should seriously look at the way Avamar proxy hosts, with or without DD Boost, are leeching resources from precious production workloads. The method used by Avamar is recommended only for SMB and remote office environments. Take the hint from VMware engineering as to why Avamar technology was borrowed to provide a solution for SMB customers in the vSphere Data Protection (VDP) product. You can’t chop a tree with a penknife!

The best example for effectively using VADP for enterprise vSphere is NetBackup 5220. EMC BRS could learn a lesson or two from how Symantec integrates with VMware in a much better way. This appliance is a complete backup system with intelligent deduplication and VADP support built right in for VMware backups.  This appliance does the heavy lifting so that production workloads are unaffected by backups.

How about recovery? For thick provisioned disks, SAN transport is indeed the fastest. For thin provisioned disks, NBD performs much better. The good news on Symantec NetBackup 5220 is that the user can control the transport method for restores as well. You might have done the backup using SAN transport, but you can do the restore using NBD if you are restoring thin provisioned virtual machines. For Avamar, hot-add is the be-all and end-all. NBD on a virtual proxy isn’t useful, so it is a moot point when the product offers only a virtual machine proxy for VADP.
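The restore rule of thumb above can be written down as a tiny lookup. This is only a sketch of the guidance in this post; the function name and the string labels are hypothetical and do not correspond to any NetBackup setting.

```python
def restore_transport(provisioning):
    """Pick a restore transport from the disk provisioning type.

    Encodes the rule of thumb from the text: SAN for thick disks,
    NBD for thin disks. The labels are illustrative only.
    """
    if provisioning == "thick":
        return "san"
    if provisioning == "thin":
        return "nbd"
    raise ValueError("unknown provisioning type: " + repr(provisioning))


for kind in ("thick", "thin"):
    print(kind, "->", restore_transport(kind))
```

The point is that the choice is made per restore, independently of the transport that was used for the backup.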

The question is…

Dear EMC Avamar, when will you offer an enterprise grade VADP based backup for your customers? They deserve enterprise grade protection for the investment they have made in large Avamar Datastores and Data Domain devices.



VMware announces vSphere Data Protection (VDP), what is in it for you?

vSphere Data Protection (VDP) is VMware’s new virtual backup appliance for SMB, available in VMware vSphere 5.1. It replaces the older VMware Data Recovery (vDR) product. There has been some confusion around this announcement, partly due to the way EMC, VMware’s parent company, worded some press releases.

Is VDP the same as EMC’s Avamar Virtual Edition (Avamar VE)?

No, it is not. VDP is a product from VMware. The only technology VMware had used from Avamar is its deduplication engine. The older vDR had limited dedupe capabilities as it was mainly coming from change block tracking (CBT) in vStorage APIs for Data Protection (VADP). With Avamar’s technology, VDP now provides variable block based deduplication.

I heard that I can upgrade from VDP to EMC Avamar if I need to grow beyond 2TB, is that true?

No, VDP is not a ‘lite’ version of Avamar. It is a different product altogether.

What are my options if I need to grow beyond 2TB?

You could add additional VDP appliances. Up to 10 VDP appliances are supported under one vCenter server. However, these are separate islands of storage. These appliances do not provide global deduplication among these storage pools.

Having said that, you are more likely to hit other limitations in VDP before hitting the 2TB limit. Note that the Avamar based deduplication engine is suitable only for SMBs who can afford to have blackout windows and maintenance windows in their backup solution. These are the periods of time when housekeeping work is being done by the dedupe engine. The system is not available for running backup jobs during these windows.

Only 8 virtual machines can be backed up concurrently, which might lengthen backup windows. There is no SAN transport capability to offload backup tasks from production ESXi hosts. There is no good way to make additional copies for redundancy or extended retention, such as replication to a remote location or the cloud. VMware has made it clear that VDP is truly for SMBs and encourages customers to look at enterprise class backup solutions from partners for larger environments.
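A quick back-of-the-envelope sketch shows how these limits bite. It uses only the figures quoted above (8 concurrent backups, 2TB per appliance, 10 appliances per vCenter server). The function names are hypothetical, and the model assumes a single appliance for the concurrency figure and treats every VM as taking one equal "wave" of the backup window.

```python
import math

MAX_CONCURRENT_BACKUPS = 8   # concurrency limit quoted above
APPLIANCE_CAPACITY_TB = 2    # per-appliance limit quoted above
MAX_APPLIANCES = 10          # appliances supported per vCenter server


def backup_waves(vm_count):
    """Number of sequential batches when only 8 VMs run at a time."""
    return math.ceil(vm_count / MAX_CONCURRENT_BACKUPS)


def appliances_needed(backup_tb):
    """Appliances needed for a backup footprint, or None if over the limit."""
    count = math.ceil(backup_tb / APPLIANCE_CAPACITY_TB)
    return count if count <= MAX_APPLIANCES else None


print(backup_waves(100))       # 100 VMs on one appliance
print(appliances_needed(7))    # 7TB of backup data
print(appliances_needed(25))   # more than 10 appliances can hold
```

Keep in mind that the appliance count is optimistic: since the appliances are separate islands without global deduplication, the same data can be stored more than once across them.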

Why would EMC let VMware use its Avamar technology at no additional cost to customers? Is EMC trying to promote its products?

Just as Windows/UNIX/Linux operating environments provide basic utilities for backups, VMware has always provided a basic backup solution with its offerings. In the days of the ESX service console, the Linux based console provided tools like tar and cpio. With ESXi, where the service console is no more, vDR was brought to the table. vDR had its limitations. Now the choice was to innovate on vDR or license a relatively mature technology. As the parent company had a solution, VMware went the route of taking the Avamar dedupe engine for storage and building its own capabilities for scheduling backups and managing recovery points.

EMC’s Avamar is a popular product in small environments. Although EMC had been trying hard to make Avamar enterprise ready, its deduplication engine has significant limitations. It requires blackout and maintenance windows. With larger capacities, the duration of these windows also increases. With the acquisition of Data Domain, EMC is now focusing more on using its DD Boost technology for distributing the deduplication workload. In fact, EMC recommends the use of Data Domain Boost with Avamar (instead of using Avamar’s dedupe engine) for larger workloads. I believe it was a good decision to support VMware’s SMB market with a technology that was meant for SMB in the first place. I think Avamar dedupe engine is counting its days as a technology that can make money. See my earlier blog on EMC’s backup portfolio.

Stay tuned. More on VDP coming soon!

vSphere changed block tracking: A powerful weapon for backup applications to shrink backup window

Changed block tracking is not a new technology. Those who have used Storage Foundation for Oracle would know that the VERITAS file system (VxFS) provides no-data checkpoints, which can be used by backup applications to identify and back up just the changed blocks from the file systems where database files are housed. This integration has been in NetBackup since version 4.5, which was released 10 years ago! It is still used by Fortune 500 companies to protect mission critical Oracle databases that would otherwise require a large backup window with traditional RMAN streaming backups.

VMware introduced changed block tracking (CBT) in vSphere 4.0, and it is available for virtual machines with virtual hardware version 7 or higher. NetBackup 7.0 added support for CBT right away. Backing up VMware vSphere environments got faster. When a VM has CBT turned on, it can track changes to the sectors of its virtual machine disks (VMDKs). Its impact on VM performance is marginal. Backup applications with VADP (vStorage APIs for Data Protection) support can use an API (named QueryChangedDiskAreas) to identify and copy the blocks changed since a particular point in time. This time point is identified using an argument named ChangeId in the API call.
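To illustrate the idea, here is a toy model of changed block tracking in plain Python. It is not the vSphere API: the `TrackedDisk` class and its methods are hypothetical stand-ins that only mimic the concept of asking "which areas changed since this change ID?" and getting back a list of extents.

```python
class TrackedDisk:
    """Toy disk that stamps every written block with the current epoch."""

    def __init__(self, block_count):
        self.data = [0] * block_count
        self.stamp = [0] * block_count  # epoch of the last write per block
        self.epoch = 1

    def write(self, block, value):
        self.data[block] = value
        self.stamp[block] = self.epoch

    def snapshot(self):
        """Return a change ID for this point in time and start a new epoch."""
        change_id = self.epoch
        self.epoch += 1
        return change_id

    def query_changed_areas(self, change_id):
        """Return (start, length) extents written after the given change ID."""
        extents = []
        for block, stamp in enumerate(self.stamp):
            if stamp <= change_id:
                continue
            if extents and extents[-1][0] + extents[-1][1] == block:
                # Block is adjacent to the previous extent, so grow it.
                extents[-1] = (extents[-1][0], extents[-1][1] + 1)
            else:
                extents.append((block, 1))
        return extents


disk = TrackedDisk(block_count=8)
for block in range(8):
    disk.write(block, 100 + block)
full_id = disk.snapshot()      # full backup is taken at this point

disk.write(2, 999)
disk.write(3, 998)
disk.write(6, 997)
disk.snapshot()                # incremental backup is taken at this point

changed = disk.query_changed_areas(full_id)
print(changed)                 # only these extents need to be copied
```

An incremental backup then copies three blocks instead of eight, which is where the shorter backup window comes from.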

VMware has made this quite easy for backup vendors to implement. But powerful weapons can be dangerous when not used with utmost care. An unfortunate problem in Avamar’s implementation of CBT came to light recently. I am not picking on Avamar developers here; it is not possible to predict all the edge cases during development, and they are working hard to fix this data loss situation. As an engineer myself, I truly empathize with Avamar developers for getting themselves into this unfortunate situation. This blog is a humble attempt to explain what happened, as I got a few questions from the field seeking input on the use of CBT after EMC reported issues in Avamar.

As we know, VADP lets you query the changed disk areas to get all the changes in a VMDK since a point in time corresponding to a previous snapshot. Once the changed blocks are identified, those blocks are transferred to the backup storage. The way the changed blocks are used by the backup application to create the recovery point (i.e. backup image) varies from vendor to vendor.

No matter how the recovery point is synthesized, the backup application must make sure that the changed blocks are associated with the correct VMDK, because a VM can have many disks. As you can imagine, if the blocks are associated with the wrong disk in the backup image, the image is not an accurate representation of the source. A recovery from this backup image will fail or will result in corrupt data on the source.

The correct way to identify VMDKs is by their UUIDs, which are always unique. Positional identifiers like controller-target-LUN at the VM level are not reliable, as those numbers can change when some of the VMDKs are removed or new ones are added to a VM. This is an example of the disk re-order problem. The re-order can also happen through operations that are not user-initiated. In Avamar’s case, the problem was that the changed blocks belonging to one VMDK were getting associated with a different VMDK in backup storage on account of VMDK re-ordering. Thus the resulting backup image (recovery point) did not represent the actual state of the VMDK being protected.
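The difference between positional and UUID based association is easy to see in a small simulation. This is a simplified sketch, not Avamar's or anyone else's actual implementation; the dictionary layout, the `slot` field and the function names are assumptions made up for the illustration.

```python
def apply_backup(image, disks, key_of):
    """Merge each disk's changed blocks into the image under key_of(disk)."""
    for disk in disks:
        image.setdefault(key_of(disk), {}).update(disk["changed"])
    return image


def by_slot(disk):
    return disk["slot"]   # positional identifier, can change on re-order


def by_uuid(disk):
    return disk["uuid"]   # stable identifier


# Full backup: every block of both disks counts as changed.
full = [
    {"uuid": "uuid-os", "slot": 0, "changed": {0: "os-v1"}},
    {"uuid": "uuid-db", "slot": 1, "changed": {0: "db-v1"}},
]
# After a re-order the disks swap slots; only the database disk changed.
incremental = [
    {"uuid": "uuid-db", "slot": 0, "changed": {0: "db-v2"}},
    {"uuid": "uuid-os", "slot": 1, "changed": {}},
]

slot_image = apply_backup(apply_backup({}, full, by_slot), incremental, by_slot)
uuid_image = apply_backup(apply_backup({}, full, by_uuid), incremental, by_uuid)

print(slot_image)   # database blocks land on what used to be the OS disk
print(uuid_image)   # each disk keeps its own blocks
```

With positional keys the OS disk image silently picks up database blocks while the database image stays stale. With UUID keys the re-order makes no difference.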

To make the unfortunate matter worse, there was a cascading effect. It appears that Avamar’s implementation generates a recovery point by using the previous backup as the base. If a disk re-order happened after the nth backup, all backups after the nth backup are affected on account of the cascading effect, because new backups inherit their base from the corrupted image.

This sounds scary. That is how I started getting questions on reliability of CBT for backups from the field. Symantec supports CBT in both Backup Exec and NetBackup. Are Symantec customers safe?

Yes, Symantec customers using NetBackup and Backup Exec are safe.

How do Symantec NetBackup and Backup Exec handle re-ordering? Block level tracking and associated risks were well thought out during the implementation. Implementation for block level tracking is not something new for Symantec engineering because such situations were accounted for in the design for implementing VxFS’s no-data check point block level tracking several years ago.

There are multiple layers of resiliency built into Symantec’s implementation of CBT support. I shall share oversimplified explanations of the two that are most relevant to ensuring data integrity here.

Using UUID to accurately associate ChangeId to correct VMDK: We already touched on this. UUID is always unique and using it to associate the previous point in time for VMDK is safe. Even when VMDKs get re-ordered in a VM, UUID stays the same. Thus both NetBackup and Backup Exec always associate the changed blocks to the correct VM disk.

Superior architecture that eliminates the ‘cascading-effect’: Generating a corrupted recovery point is bad. What is worse is to use it as the base for newer recovery points. The corruption carries forward and hurts the business if left unnoticed for a long time. NetBackup and Backup Exec never directly inject changed blocks into an existing backup to create a new recovery point. The changed blocks are referenced separately in the backup storage. During a restore, NetBackup recreates the point in time at run-time. This is the reason NetBackup and Backup Exec are able to support block level incremental backups even to tape media! Thus a corrupted backup (should that ever happen) never ‘propagates’ corruption to future backups.

Introduction to VMware vStorage APIs for Data Protection aka VADP

6. Getting to know NetBackup for VMware vSphere

Note: This is an extended version of my blog in VMware Communities: Where do I download VADP? 

Now that we talked about NetBackup master servers and media servers, it is time to get into learning how NetBackup Client on VMware backup host (sometimes known as VMware proxy host) protects the entire vSphere infrastructure. In order to get there, we first need a primer on vStorage APIs for Data Protection (VADP) from VMware.  We will use two blogs to cover this topic.

Believe it or not, this question of what VADP really is comes up quite often in VMware Communities, especially in Backup & Recovery Discussions.

Backup is like an insurance policy. You don’t want to pay for it, but not having it is the recipe for sleepless nights. You need to protect data on your virtual machines to guard against hardware failures and user errors. You may also have regulatory and compliance requirements to protect data for longer term.

With modern day hardware and cutting edge hypervisors like the one from VMware, you can protect data just by running a backup agent within the guest operating system. In fact, for certain workloads, this is still the recommended way.

VMware has made data protection easy for administrators. That is what vStorage APIs for Data Protection (VADP) is. It has been available since the vSphere 4.0 release. It is a set of APIs (Application Programming Interfaces) made available by VMware for independent backup software vendors. These APIs make it possible for backup software vendors to embed the intelligence needed to protect virtual machines without the need to install a backup agent within the guest operating system. Through these APIs, the backup software can create snapshots of virtual machines and copy those to backup storage.

Okay, now let us come to the point. As a VMware administrator what do I need to do to make use of VADP? Where do I download VADP? The answer comes down to two things:

  1. Ensure that you are NOT using hosts with free ESXi licenses
  2. Choose a backup product that has support for VADP

The first one is easy to determine. If you are not paying anything to VMware, the chances are that you are using free ESXi. In that case, the only way to protect data in VMs is to run a backup agent within the VM. No VADP benefits.

Choosing a backup product that supports VADP can be tricky. If your organization is migrating to a virtualized environment, see what backup product is currently in use for protecting physical infrastructure. Most of the leading backup vendors have added support for VADP. Symantec NetBackup, Symantec Backup Exec, IBM TSM, EMC NetWorker, CommVault Simpana are examples.

If you are not currently invested in a backup product (say, you are working for a start-up), there are a number of things you need to consider. VMware has a free product called VMware Data Recovery (VDR) that supports VADP. It is an easy to use virtual appliance with which you can schedule backups and store them in deduplicated storage. There are also point products (Quest vRanger, Veeam Backup & Replication etc.) which provide additional features. All these products are good for managing and storing backups of virtual machines on disk for shorter retention periods. However, if your business requirements call for long term retention, you would need another backup product to protect the backup repositories of these VM-only solutions, which can be a challenge. Moreover, it is unlikely to see businesses that are 100% virtualized. You are likely to have NAS devices for file serving, desktops and laptops for end users, and so on. Hence a backup product that supports both physical systems and VADP is ideal in most situations.

Although VADP support is available from many backup vendors, the devil is in the details. Not all solutions use VADP the same way. Furthermore, many vendors add innovative features on top of VADP to make things better.  We will cover this next.
