Did Rubrik make Veeam’s Modern Data Protection a bit antiquated?

Veeam Antiquated?
Veeam trademarked Modern Data Protection™. No, I am not joking. It is true! Veeam started with a focused strategy: it would do nothing but VMware VM backups. Thankfully, VMware had done most of the heavy lifting with vStorage APIs for Data Protection (VADP), so developing a VM-only backup solution was as simple as creating a software plugin for those APIs and developing a storage platform for keeping the VM copies. With a good marketing engine, Veeam won the hearts of virtual machine administrators, and it paid off.

As the opportunity to reap the benefits as a niche VM-only backup vendor started to erode (intense competition, low barrier to entry on account of VADP), Veeam is attempting to re-invent its image by exploring broader use cases like physical systems protection, availability, etc. Some of these efforts make it look like its investors are hoping for Microsoft to buy Veeam. The earlier wish to sell itself to VMware was shattered when VMware adopted EMC Avamar’s storage to build its data protection solution.

Now Rubrik is coming to market and attacking the very heart of Veeam’s little playground while making Veeam’s modern data protection a thing of the past. Rubrik’s market entry is also through VMware backups using vStorage APIs, but with a better storage backend that can scale out.

Both Veeam and Rubrik have two high level tiers. The frontend tier connects to vSphere through VMware APIs. It discovers and streams virtual machine data. Then there is a backend storage tier where virtual machine data is stored.

For Veeam, the frontend is a standalone backup server and its optional backup proxies. The proxies (thanks to VMware hot-add) enable a limited level of scale-out for the frontend, but this approach leeches resources from production and increases complexity. The backend is one or more backup repositories. There is nothing special about the repository; it is a plain file system. Although Veeam claims to have deduplication built in, it is perhaps the most primitive in the industry and works only across virtual machines from the same backup job.

Rubrik is a scale-out solution where the frontend and backend are fused together from the user’s perspective. You buy Rubrik bricks, where each brick consists of four nodes. These are the compute and storage components that serve both as the frontend, streaming virtual machines from vSphere via NBD or SAN transport (kudos to Rubrik for ditching hot-add!), and as the backend, which is a cluster file system that spans nodes and bricks. Rubrik claims to have global deduplication across its entire cluster file system namespace.
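To see why the scope of deduplication matters, here is a toy sketch that counts how many bytes land in storage when duplicate blocks are detected only within a backup job versus across all jobs. It is not either vendor's implementation; the function name, the 4 KiB block size, and the job names are all made up for illustration.

```python
import hashlib

def stored_bytes(jobs, scope):
    """Return the bytes written to storage for a given dedup scope.

    jobs:  dict mapping job name -> list of data blocks (bytes)
    scope: "per_job" keeps one fingerprint index per job,
           "global" shares a single index across every job
    """
    indexes = {}
    total = 0
    for job, blocks in jobs.items():
        key = job if scope == "per_job" else "global"
        seen = indexes.setdefault(key, set())
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in seen:
                # First time this fingerprint shows up in this index: store it.
                seen.add(digest)
                total += len(block)
    return total

# Two jobs whose VMs share an identical (hypothetical) OS block.
os_block = b"A" * 4096
jobs = {
    "job1": [os_block, b"B" * 4096],
    "job2": [os_block, b"C" * 4096],
}
per_job_total = stored_bytes(jobs, "per_job")  # shared block stored once per job
global_total = stored_bytes(jobs, "global")    # shared block stored once overall
```

With per-job scope the shared block is written twice; with global scope it is written once. The gap grows with the number of jobs that contain similar virtual machines.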

Historically, the real innovation from Veeam was the commercial success of powering on virtual machines directly from the backup storage. Veeam may list several other innovations under its belt (e.g. it may claim that it ‘invented’ agentless backups, but that was actually done by VMware in its APIs), but exporting VMs directly from backup is something every other vendor followed afterwards, and hence kudos go to Veeam on that one. But this innovation may backfire and may help Veeam customers transition to Rubrik seamlessly.

Veeam customers are easy targets for Rubrik for a few reasons.

  • One of the cornerstones of Veeam’s foundation is its dependency on vStorage APIs from VMware; it is not a differentiator because all VMware partners have access to those APIs. Unlike other backup vendors, Veeam didn’t focus on building application awareness and granular quiescence until late in the game.
  • Veeam is popular in smaller IT shops and shadow projects within large IT environments. It is a handy backup tool, but it is not perceived as a critical piece in meeting regulatory specs and compliance needs. It has been marketed towards virtual machine administrators; hence higher-level buying centers do not have much visibility. That adversely affects Veeam’s ‘stickiness’ in an account.
  • Switching from one backup application to another has historically been a major undertaking. But that is not the case if customers want to switch from Veeam to something else. In earlier days, IT shops needed to stand up both solutions until all the backup images from the old solution hit their expiration dates. Otherwise, you had to develop strategies to migrate old backups into the new system, a costly affair. When the source is Veeam, with 14 recovery points per VM by default, you could build workflows that spin up each VM backup in a sandbox and let the new solution back it up as if it were a production copy. (Rubrik may want to work on building a small migration tool for this.)
  • Unlike Veeam, which started stitching in support for other hypervisors and physical systems afterwards, Rubrik has architected its platform to accommodate future needs. That design may intrigue VMware customers who are looking to diversify into other hypervisors and containers.

The fine print is that Rubrik is yet to be proven. If the actual product delivers on the promises, it may have antiquated Veeam. The latter may become a good case study for business schools on not building a product that is too dependent on someone else’s technology.

Thanks to #VFD5 TechFieldDay for sharing Rubrik’s story. You can watch it here: Rubrik Technology Deep Dive

Disclaimer: I work for Veritas/Symantec, opinions here are my own.

What’s up with VADP backups and VDDK on vSphere 5.1?

VMware vSphere 5.1 has been in the market for more than a few months now and the interest in the new capabilities is high. Because of this the market saw many backup vendors rush to announce support for vSphere 5.1 in their VADP (vStorage APIs for Data Protection) integration. Everything looked clean and shiny and new.

On November 21, Symantec made an interesting announcement [1]. In a nutshell, the statement was that support for vSphere 5.1 would be delayed in its NetBackup and Backup Exec products because it had discovered issues while testing the VADP 5.1 API for integration. The API in its current form may introduce risk in performing consistent backups and ensuring reliable restores. All vendors receive the same API, but not all vendors perform the same level of testing.

In order to explain the intricacies, we first need to take a quick look at how a backup product is integrated with VMware vSphere. With each release of vSphere, VMware publishes a set of APIs known as vStorage APIs for Data Protection, or VADP. One of the key components of VADP is the Virtual Disk Development Kit, aka VDDK. This is the component through which third-party code receives authenticated access to vSphere datastores and virtual machine disk files. VMware makes this component available to its technology partners. Partners (backup product vendors in this case) ship it along with their products, which make calls to the vStorage APIs.

With each version of vSphere, an equivalent version of VDDK is released. The VDDK is generally backward compatible with one or more earlier versions of vSphere. For example, VDDK 5.1 supports [2] vSphere 5.1, 5.0 and 4.1. VDDK 5.0 supports [3] vSphere 5.0, 4.1, 4.0 and VI 3.5. Since the updated VDDK is required to understand the modified data structures in a new version of vSphere, lower versions of VDDK are in general not supported for accessing a higher version of vSphere. For example, VMware historically and currently (as of today) does not support the use of VDDK 5.0 to access datastores in vSphere 5.1. VMware documents supported versions of vSphere for each of its VDDK versions in release notes.

The key point to remember is the restriction above: a lower version of VDDK is not supported for accessing a higher version of vSphere. VMware does not support combinations that violate this rule because of the risks and uncertainties. Partners are expected to ship the correct version of VDDK when they announce support for a given vSphere release.
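The version rules above can be written down as a small lookup table. This is only a sketch: the table holds just the two VDDK releases quoted in this post, and the function name is made up for illustration. Always check the release notes for the actual support statement.

```python
# Support matrix as quoted in this post from the VDDK release notes.
VDDK_SUPPORT = {
    "5.1": {"5.1", "5.0", "4.1"},
    "5.0": {"5.0", "4.1", "4.0", "VI 3.5"},
}

def is_supported(vddk_version, vsphere_version):
    """True if this VDDK release lists the vSphere version as supported."""
    return vsphere_version in VDDK_SUPPORT.get(vddk_version, set())

# A product that ships VDDK 5.0 but claims vSphere 5.1 support fails the check.
claim_ok = is_supported("5.0", "5.1")
```

Here `claim_ok` is False, which is exactly the unsupported combination described above.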

What Symantec announced and VMware confirmed [4] is that VDDK 5.1 has issues, and hence the support for vSphere 5.1 in its products will be delayed. This makes sense since VDDK 5.1 is the only version currently allowed to access vSphere 5.1. The face-saving reactions from other vendors to this announcement brought out some of the dirty games and ugly truths in the area of VADP/VDDK integration.

 

  1. Vendors were claiming support for vSphere 5.1 but still shipping VDDK 5.0 with their products. This is currently not supported by VMware because of the uncertainties. This may change, but at the time vendors were claiming support, they were taking risks that typically are not acceptable in the data protection business.
  2. Vendors were mucking with API calls and silently killing hung processes. That may work for an isolated or random hang, but it will not work when there are repeatable hang situations like those observed in VDDK 5.1. Plus, there are performance and reliability concerns in abruptly ending sessions with vSphere.
  3. Most vendors weren’t testing all the edge cases and never realized the problems in VDDK 5.1, thus prematurely announcing support for 5.1.

 

If your backup vendor currently supports vSphere 5.1, be sure to ask what their situation is.

Sources and references:

1. Quality wins every time: vSphere 5.1 support update, Symantec official blog.

2. VDDK 5.1 Release Notes, VMware Support resources.

3. VDDK 5.0 Release Notes, VMware Support resources.

4. Third-party backup software using VDDK 5.1 may encounter backup/restore failures, VMware Support KB.

vSphere changed block tracking: A powerful weapon for backup applications to shrink backup window

Changed block tracking is not a new technology. Those who have used Storage Foundation for Oracle would know that VERITAS file system (VxFS) provides no-data check points, which can be used by backup applications to identify and back up just the changed blocks from the file systems where database files are housed. This integration has been in NetBackup since version 4.5, which was released 10 years ago! It is still used by Fortune 500 companies to protect mission critical Oracle databases that would otherwise require a large backup window with traditional RMAN streaming backups.

VMware introduced changed block tracking (CBT) in vSphere 4.0, and it is available for virtual machines at version 7 or higher. NetBackup 7.0 added support for CBT right away. Backing up VMware vSphere environments got faster. When a VM has CBT turned on, it can track changes to the sectors of its virtual machine disks (VMDKs). The impact on VM performance is marginal. Backup applications with VADP (vStorage APIs for Data Protection) support can use an API (named QueryChangedDiskAreas) to identify and copy the blocks changed since a particular point in time. This point in time is identified using an argument named ChangeId in the API call.

VMware has made this quite easy for backup vendors to implement. But powerful weapons can be dangerous when not used with utmost care. An unfortunate problem in Avamar’s implementation of CBT came to light recently. I am not picking on Avamar developers here; it is not possible to predict all the edge cases during development, and they are working hard to fix this data loss situation. As an engineer myself, I truly empathize with Avamar developers for getting themselves into this unfortunate situation. This blog is a humble attempt to explain what happened, as I got a few questions from the field seeking input on the use of CBT after EMC reported the issues in Avamar.

As we know, VADP lets you query the changed disk areas to get all the changes in a VMDK since a point in time corresponding to a previous snapshot. Once the changed blocks are identified, those blocks are transferred to the backup storage. The way the changed blocks are used by the backup application to create the recovery point (i.e. backup image) varies from vendor to vendor.
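The query step can be sketched as a loop. A single answer may cover only part of the disk, so the caller keeps asking from where the last answer ended until the whole disk has been examined. In the minimal Python sketch below, `ChangeInfo` and `fake_query` are stand-ins invented for illustration; they only mimic the general shape of a QueryChangedDiskAreas answer (a range examined plus a list of changed extents) and are not the real vSphere SDK.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ChangeInfo:
    """Stand-in for one query answer: the range examined and the changed extents in it."""
    start_offset: int
    length: int
    changed_areas: List[Tuple[int, int]]  # (start, length) pairs in bytes

def collect_changed_extents(query: Callable[[int, str], ChangeInfo],
                            capacity: int, change_id: str) -> List[Tuple[int, int]]:
    """Walk the whole disk and gather every extent changed since change_id."""
    extents = []
    offset = 0
    while offset < capacity:
        info = query(offset, change_id)
        extents.extend(info.changed_areas)
        if info.length <= 0:
            break  # guard against an answer that makes no progress
        offset = info.start_offset + info.length
    return extents

def fake_query(start_offset: int, change_id: str) -> ChangeInfo:
    """Hypothetical hypervisor: answers in 1 MiB windows, two extents changed."""
    window = 1 << 20
    changed = [(4096, 8192), (window + 65536, 4096)]
    in_window = [(s, n) for s, n in changed
                 if start_offset <= s < start_offset + window]
    return ChangeInfo(start_offset, window, in_window)

extents = collect_changed_extents(fake_query, capacity=2 << 20,
                                  change_id="id-from-previous-backup")
```

Only the extents in the returned list need to be read from the snapshot and sent to backup storage, which is where the shorter backup window comes from.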

No matter how the recovery point is synthesized, the backup application must make sure that the changed blocks are accurately associated with the correct VMDK, because a VM can have many disks. As you can imagine, if the blocks were associated with the wrong disk in the backup image, the image is not an accurate representation of the source. The recovery from this backup image will fail or will result in corrupt data on the source.

The correct way to identify VMDKs is by their UUIDs, which are always unique. Positional identifiers like controller-target-LUN at the VM level are not reliable, as those numbers could change when some of the VMDKs are removed or new ones are added to a VM. This is an example of the disk re-order problem. This re-order can also happen for non-user initiated operations. In Avamar’s case, the problem was that the changed blocks belonging to one VMDK were getting associated with a different VMDK in backup storage on account of VMDK re-ordering. Thus the resulting backup image (recovery point) did not represent the actual state of the VMDK being protected.
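A toy example makes the re-order problem concrete. The sketch below is not any vendor's code; the UUID strings and function names are invented. It matches the disks seen in the current backup to the disks recorded in the previous backup, first by position and then by UUID, after a data disk has been removed from the VM.

```python
def match_by_position(previous, current):
    """Pair each current disk with whatever sat in the same slot last time."""
    return dict(zip(current, previous))

def match_by_uuid(previous, current):
    """Pair each current disk with the previous disk that has the same UUID."""
    known = set(previous)
    return {uuid: uuid for uuid in current if uuid in known}

# Disks listed in slot order. The data disk is removed between backups,
# so the logs disk slides from slot 2 to slot 1.
previous = ["uuid-os", "uuid-data", "uuid-logs"]
current = ["uuid-os", "uuid-logs"]

by_position = match_by_position(previous, current)
by_uuid = match_by_uuid(previous, current)
```

Matching by position ties the logs disk to the old data disk, so its changed blocks would be applied on top of the wrong base. Matching by UUID keeps every disk tied to its own history.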

To make the unfortunate matter worse, there was a cascading effect. It appears that Avamar’s implementation of generating a recovery point is to use the previous backup as the base. If a disk re-order happened after the nth backup, all backups after the nth backup are affected on account of the cascading effect, because new backups inherit the base from a corrupted image.

This sounds scary. That is how I started getting questions on reliability of CBT for backups from the field. Symantec supports CBT in both Backup Exec and NetBackup. Are Symantec customers safe?

Yes, Symantec customers using NetBackup and Backup Exec are safe.

How do Symantec NetBackup and Backup Exec handle re-ordering? Block level tracking and associated risks were well thought out during the implementation. Implementation for block level tracking is not something new for Symantec engineering because such situations were accounted for in the design for implementing VxFS’s no-data check point block level tracking several years ago.

There are multiple layers of resiliency built into Symantec’s implementation of CBT support. I shall share oversimplified explanations for two of those that are relevant here in ensuring data integrity.

Using UUID to accurately associate ChangeId to correct VMDK: We already touched on this. UUID is always unique and using it to associate the previous point in time for VMDK is safe. Even when VMDKs get re-ordered in a VM, UUID stays the same. Thus both NetBackup and Backup Exec always associate the changed blocks to the correct VM disk.

Superior architecture that eliminates the ‘cascading-effect’: Generating a corrupted recovery point is bad. What is worse is to use it as the base for newer recovery points. The corruption goes on and hurts the business if left unnoticed for a long time. NetBackup and Backup Exec never directly inject changed blocks into an existing backup to create a new recovery point. The changed blocks are referenced separately in the backup storage. During a restore, NetBackup recreates the point in time at run-time. This is the reason NetBackup and Backup Exec are able to support block level incremental backups even to tape media! Thus a corrupted backup (should that ever happen) never ‘propagates’ corruption to future backups.
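The idea of keeping changed blocks separate and assembling the point in time only at restore can be sketched in a few lines. This is a deliberately oversimplified model, not NetBackup's or Backup Exec's actual on-disk format; the class name and the block-number-to-data dictionaries are assumptions made for illustration.

```python
class BackupStore:
    """Toy store: one full image plus separately kept sets of changed blocks."""

    def __init__(self, full):
        self._full = dict(full)        # block number -> data
        self._incrementals = []        # one dict of changed blocks per backup

    def add_incremental(self, changed_blocks):
        # Stored alongside earlier backups; nothing already written is modified.
        self._incrementals.append(dict(changed_blocks))

    def restore(self, point):
        """Rebuild recovery point `point` (0 = the full) at restore time."""
        image = dict(self._full)
        for changed in self._incrementals[:point]:
            image.update(changed)
        return image

store = BackupStore({0: "os-v1", 1: "data-v1"})
store.add_incremental({1: "data-v2"})
store.add_incremental({0: "os-v2"})

first = store.restore(0)
latest = store.restore(2)
```

Because each set of changed blocks is written once and never merged into an older image, taking a new backup cannot alter what is already in storage, and every earlier recovery point can still be rebuilt as it was.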

Introduction to VMware vStorage APIs for Data Protection aka VADP

6. Getting to know NetBackup for VMware vSphere

Note: This is an extended version of my blog in VMware Communities: Where do I download VADP? 

Now that we have talked about NetBackup master servers and media servers, it is time to learn how the NetBackup Client on the VMware backup host (sometimes known as the VMware proxy host) protects the entire vSphere infrastructure. In order to get there, we first need a primer on vStorage APIs for Data Protection (VADP) from VMware. We will use two blogs to cover this topic.

Believe it or not, this question of what VADP really is comes up quite often in VMware Communities, especially in Backup & Recovery Discussions.

Backup is like an insurance policy. You don’t want to pay for it, but not having it is a recipe for sleepless nights. You need to protect data on your virtual machines to guard against hardware failures and user errors. You may also have regulatory and compliance requirements to protect data for the longer term.

With modern day hardware and cutting edge hypervisors like those from VMware, you can protect data just by running a backup agent within the guest operating system. In fact, for certain workloads, this is still the recommended way.

VMware has made data protection easy for administrators. That is what vStorage APIs for Data Protection (VADP) is. It has been available since the vSphere 4.0 release. It is a set of APIs (Application Programming Interfaces) made available by VMware for independent backup software vendors. These APIs make it possible for backup software vendors to embed the intelligence needed to protect virtual machines without the need to install a backup agent within the guest operating system. Through these APIs, the backup software can create snapshots of virtual machines and copy those to backup storage.

Okay, now let us come to the point. As a VMware administrator, what do I need to do to make use of VADP? Where do I download VADP? The answer is that you need to do just two things:

  1. Ensure that you are NOT using hosts with free ESXi licenses
  2. Choose a backup product that has support for VADP

The first one is easy to determine. If you are not paying anything to VMware, the chances are that you are using free ESXi. In that case, the only way to protect data in VMs is to run a backup agent within the VM. No VADP benefits.

Choosing a backup product that supports VADP can be tricky. If your organization is migrating to a virtualized environment, see what backup product is currently in use for protecting physical infrastructure. Most of the leading backup vendors have added support for VADP. Symantec NetBackup, Symantec Backup Exec, IBM TSM, EMC NetWorker, and CommVault Simpana are examples.

If you are not currently invested in a backup product (say, you are working for a start-up), there are a number of things you need to consider. VMware has a free product called VMware Data Recovery (VDR) that supports VADP. It is an easy to use virtual appliance with which you can schedule backups and store them in deduplicated storage. There are also point products (Quest vRanger, Veeam Backup & Replication etc.) which provide additional features. All these products are good for managing and storing backups of virtual machines on disk for shorter retention periods. However, if your business requirements call for long term retention, you would need another backup product to protect the backup repositories of these VM-only solutions, which can be a challenge. Moreover, it is unlikely to see businesses that are 100% virtualized. You are likely to have those NAS devices for file serving, desktops and laptops for end users, and so on. Hence a backup product that supports both physical systems and VADP is ideal in most situations.

Although VADP support is available from many backup vendors, the devil is in the details. Not all solutions use VADP the same way. Furthermore, many vendors add innovative features on top of VADP to make things better.  We will cover this next.
