Deduplication for dollar zero?

One of the data protection experts asked me a question after reading my blog on Deduplication Dilemma: Veeam or Data Domain.

I am paraphrasing his question, as our conversation was limited to 140 characters at a time through Twitter.

“Have you seen this best practice blog on Veeam with Exagrid? Here is the blog. It says not to do reverse incremental backups. The test Mr. Attila ran was incomplete. The Veeam deduplication at the first pass is poor, but after that it is worth it, right?”

These are all great questions. I thought I would dissect each aspect and share it here. Before I do that, I want to make it clear that deduplication devices are fantastic for use in backups. They work great with backup applications that really offer the ability to restore individual objects. If the backup application ‘knows’ how to retrieve specific objects from backup storage, target deduplication adds a lot of value. That is why NetBackup, Backup Exec, TSM, NetWorker and the like play well with target deduplication appliances. Veeam, on the other hand, simply mounts the VMDK file from the backup store and asks the application administrator to fish for the item he/she is looking for. This is where Veeam falls apart if you try to deploy it in medium to large environments. Although target deduplication appliances are disk based, they are optimized for sequential access, because backup jobs mostly follow a sequential I/O pattern. When you perform random I/O on these devices (as happens when a VM is run directly from the appliance), there is a limit to how well they can perform.

Exagrid: a great company helping out customers

Exagrid has an advantage here. It has the flexibility to keep the most recent backup in hydrated form (Exagrid uses post-process deduplication), which works well with Veeam if you employ reverse incremental backups. In reverse incremental backups, the most recent backup is always a full backup. When the image is served in hydrated form, you eliminate the performance issues inherent in mounting it on an ESX host. This is good from the recovery performance perspective. However, Exagrid recommends not turning on the reverse incremental method because it burdens the appliance during backups. This is another dilemma: you have to pick backup performance or recovery performance (RTO), not both.
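
To make the trade-off concrete, here is a toy model of a reverse incremental pass. It is only a sketch: the function names are hypothetical, blocks are plain dictionary entries, and the count of three target I/O operations per changed block (read the old block, write it to a rollback, write the new block into the full) is an assumption of this model, not a description of Veeam's on-disk format or of Exagrid's internals.

```python
def reverse_incremental(full, changed):
    """Inject changed blocks into the full backup in place.

    Returns the rollback (old contents of the overwritten blocks) and the
    number of I/O operations this toy model charges to the backup target.
    """
    rollback = {}
    target_io = 0
    for idx, new_data in changed.items():
        rollback[idx] = full.get(idx)  # read the old block from the full backup
        target_io += 1
        target_io += 1                 # write the old block to the rollback file
        full[idx] = new_data           # write the new block into the full backup
        target_io += 1
    return rollback, target_io


def restore_previous(full, rollback):
    """Rebuild the previous restore point from the current full plus one rollback."""
    previous = dict(full)
    for idx, old_data in rollback.items():
        if old_data is None:
            previous.pop(idx, None)    # block did not exist at the earlier point
        else:
            previous[idx] = old_data
    return previous


# Hypothetical three-block VM image; two blocks change before the next backup.
full = {0: b"os", 1: b"app-v1", 2: b"data-v1"}
day1 = dict(full)
rollback, io_count = reverse_incremental(full, {1: b"app-v2", 2: b"data-v2"})
```

In this model the newest restore point is always a ready-to-mount full (good for recovery), while every changed block costs the target a mix of reads and writes during the backup window (bad for an appliance tuned for sequential ingest).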

Let me reiterate this. The problem is not with Exagrid in this case. They are sincerely trying to help customers who happened to choose Veeam. Exagrid is doing the right thing; you want to find ways to help customers achieve ROI no matter what backup solution they ended up choosing. I take my hat off to Exagrid in respect.

Now let us take a closer look at other recommendations from Exagrid to alleviate the pain points with Veeam.

Turn off compression in Veeam and Optimize for Local target: Note that Exagrid suggested turning off compression and choosing the Optimize for Local target option. These settings have the effect of eliminating most of what Veeam’s deduplication offers. By choosing those options, you let the real guy (the Exagrid appliance) do the work.

Weren’t Mr. Attila’s tests incomplete?

Mr. Attila stopped his tests after the initial backup. The advantage of deduplication is visible only on subsequent backups. Hence his tests weren’t complete. However, as I stated in the blog, that test simply triggered my own research. I wasn’t basing my opinions just on Mr. Attila’s tests. I should have mentioned this in the earlier blog, but it was already becoming too big.

As I mentioned in the blog earlier, Veeam deduplication capabilities are limited. Quoting Exagrid this time: “Once the ExaGrid has all the data, it can look at the entire job at a more granular level and compress and dedupe within jobs and also across jobs! Veeam can’t do this because it has data constantly streaming into it from the SAN or ESX host, so it’s much harder to get a “big picture” of all the data.”   

If Veeam’s deduplication is the only thing you have, the problem is not just limited to the initial backup. Here are a few other reasons why a target deduplication is important when using Veeam.

  1. The deduplication is limited to a job. Veeam’s manual recommends putting VMs created from the same template into a single job to achieve that dedupe rate. It is true that VMs created from the same template have a lot of redundant OS files and whitespace, so the dedupe rate will be good at the beginning. But these are just the skins or shells of your enterprise production data. The real meat is the actual data, which is less likely to be the same across multiple VMs. We are better off giving that task to the real deduplication engines!
  2. Let us say you have a job with 20 production VMs. You are going to install something new on one of the VMs, so you prefer to do a one-time backup before making any changes. Veeam requires you to create a new job to do this. This is not only inconvenient, but now you lose the advantage of incremental backup. You have to stream the entire VM again. Can we afford this in a production environment?
  3. Veeam incremental backups are heavily dependent on vCenter server. If you move a VM from one vCenter to another or if you had to rebuild your vCenter (Veeam cannot protect an enterprise grade vCenter running on a physical system, but let us not go there for now), you need to start seeding full backups for all your VMs. For example, if you want to migrate from a traditional vCenter server running 4.x to a vCSA 5.0, expect to reseed all the backups again.
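
The first point, deduplication scoped to a job versus across jobs, can be illustrated with a small sketch. This is not either vendor's algorithm: the fixed blocks, the SHA-256 fingerprints, and the job names and contents are all assumptions chosen for illustration.

```python
import hashlib


def unique_blocks(blocks):
    """Return the set of fingerprints for a list of data blocks."""
    return {hashlib.sha256(block).hexdigest() for block in blocks}


def stored_per_job(jobs):
    """Blocks kept when each job deduplicates only against itself."""
    return sum(len(unique_blocks(blocks)) for blocks in jobs.values())


def stored_global(jobs):
    """Blocks kept when one engine deduplicates across all jobs."""
    seen = set()
    for blocks in jobs.values():
        seen |= unique_blocks(blocks)
    return len(seen)


# Hypothetical jobs: job_a holds two VMs from the same template plus their
# own data; job_b holds a third VM from that template in a separate job.
jobs = {
    "job_a": [b"os-1", b"os-2", b"db-rows-1", b"os-1", b"os-2", b"db-rows-2"],
    "job_b": [b"os-1", b"os-2", b"mail-1"],
}

per_job = stored_per_job(jobs)
global_count = stored_global(jobs)
```

With job-scoped deduplication the template blocks in job_b are stored a second time; a global engine stores them once. The moment a VM lands in a separate job, as in the one-time backup from the second point, the job-scoped engine has nothing to compare it against.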

My point is that Veeam deduplication is not something you can count on to protect a medium to large environment with these limitations. It has the price of $0 for a reason.

NetBackup and Backup Exec let you take advantage of target deduplication appliances to the fullest potential. As these platforms track which image holds the objects the application administrator is looking for, they can retrieve just those objects from backup storage. Application administrators can self-serve their needs, with no need for a 20th-century ticket system! The journey to the Cloud starts with empowering users to self-serve their needs from the Cloud.