Deduplication Dilemma: Veeam or Data Domain?

Recently I came across a blog post by Szamos Attila, who ran a deduplication contest between Data Domain and Veeam. His test environment was very small, just 12 virtual machines with 133GB of data, but his observations were significant, so I thought I would share them here.

Veeam vs. DataDomain Deduplication Contest run by Szamos Attila

You can read more about Mr. Attila’s tests at his blog here

What does this tell you right off the bat? Well, Veeam’s deduplication is slow. But Veeam does not charge for deduplication separately, so that is not a big deal, right?

Not exactly; there is much more to this story if you take a look at the big picture.

First of all, note that this is a very small data set (just 133GB; even my laptop has more data!). Veeam’s deduplication is not a true deduplication engine that fingerprints data segments and stores only one copy of each. It is basically a data reduction technique that works only within a predefined set of backup files, which Veeam refers to as a backup repository. You can run only one backup job to a given repository at any given time. Hence, if you want to back up two virtual machines concurrently, you need to send them to two different backup repositories, and your backup data is then not deduplicated across those two jobs. In other words, data reduction and concurrent job processing work against each other in Veeam: the more jobs you run in parallel, the less redundancy its deduplication can remove. This is a major drawback, as VMs generally contain a lot of redundant data. In fact, Veeam recommends running deduplication mainly for a backup set where all the VMs come from the same template.
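To make the scope problem concrete, here is a minimal sketch of fingerprint-based deduplication. It is not Veeam's or Data Domain's actual implementation; the `DedupStore` class, the tiny 4-byte chunk size, and the sample VM contents are all assumptions chosen for illustration. It compares one shared store against two isolated stores, one per concurrent job.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small fixed-size segments, for illustration only


def chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Hypothetical store that keeps one copy of each unique segment."""

    def __init__(self):
        self.segments = {}  # SHA-256 fingerprint -> segment bytes

    def write(self, data: bytes) -> list:
        """Store data and return the list of fingerprints needed to rebuild it."""
        recipe = []
        for seg in chunks(data):
            fp = hashlib.sha256(seg).hexdigest()
            self.segments.setdefault(fp, seg)  # store only if not seen before
            recipe.append(fp)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its fingerprints."""
        return b"".join(self.segments[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(seg) for seg in self.segments.values())


# Two made-up VMs cloned from the same template: identical OS blocks, different app data.
vm_a = b"OS__OS__OS__APP1"
vm_b = b"OS__OS__OS__APP2"

# One shared store: the common OS segment is kept once for both VMs.
shared = DedupStore()
recipe_a = shared.write(vm_a)
shared.write(vm_b)

# One store per concurrent job: the OS segment is kept once in EACH store.
repo_a, repo_b = DedupStore(), DedupStore()
repo_a.write(vm_a)
repo_b.write(vm_b)

shared_total = shared.stored_bytes()
split_total = repo_a.stored_bytes() + repo_b.stored_bytes()
print("shared store:", shared_total, "bytes; separate stores:", split_total, "bytes")
```

In this toy case the separate stores hold the template data twice, and the gap widens with every additional job that writes to its own repository.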

Secondly, note that even with a single backup repository, this tiny data set (of just 133GB) took twice as long to back up with Veeam’s deduplication as with Data Domain’s. Now think about a small business environment with a few terabytes of data, and imagine the time it would take to protect that data. When it comes to an enterprise data center (100s of terabytes), you must depend on a target-based deduplication solution like Data Domain to get the job done.
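A back-of-the-envelope calculation shows why a 2x slowdown matters as data grows. The sketch below assumes backup time scales linearly with data size. The 100 MB/s baseline is a hypothetical figure picked for the arithmetic, not a number measured in the contest, and `backup_hours` is a helper invented for this example.

```python
def backup_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move data_gb at a sustained rate, assuming linear scaling."""
    seconds = (data_gb * 1024) / throughput_mb_s
    return seconds / 3600


BASELINE_MB_S = 100.0  # hypothetical sustained rate, NOT a measured result

# The contest data set, a small business, and an enterprise data center.
for size_gb in (133, 5 * 1024, 200 * 1024):
    fast = backup_hours(size_gb, BASELINE_MB_S)
    slow = backup_hours(size_gb, BASELINE_MB_S / 2)  # "twice as long"
    print(f"{size_gb:>7} GB: {fast:8.1f} h at baseline, {slow:8.1f} h at half the rate")
```

Whatever the real throughput is, halving it doubles the backup window, which is barely noticeable at 133GB and decisive at hundreds of terabytes.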

So, can I simply let Veeam do the data movement and count on Data Domain to do the deduplication? That is one way to solve this problem, but that approach brings a multitude of other issues because of the way Veeam does restores.

Veeam does not have a good way to let application administrators in the guest operating system (e.g. an Exchange administrator on a VM running Microsoft Exchange) self-serve their restore needs. First, the application administrator submits a ticket for the restore. Then the backup administrator mounts the VMDK files from backup using a temporary VM that starts up on a production ESX host. Even to restore a small object, you have to allocate resources for the entire VM on the ESX host (the marketing name for this multi-step restore is U-AIR). As this VM needs to ‘run’ from backup storage, it is not recommended to keep the backup image on deduplicated storage served over the network. Target deduplication devices are designed for streaming data sequentially, so the random I/O pattern caused by running a VM from such storage makes it painfully slow. This is stated even by the partners who offer deduplication storage for Veeam. HP ran tests with Veeam using the HP StoreOnce target deduplication appliance and published a white paper on the results; see this whitepaper in Business Week, in particular the section on Performance Considerations.

Note further that in Veeam’s reverse incremental backup strategy, typically only the most recent backup is kept as a single image. If you are unfortunate enough to need a restore from a copy that is not the most recent one, performance degrades further while running the temporary VM from backup storage, as a lot of random I/O needs to happen at the back end.
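The cost of reaching back in time can be illustrated with a simplified model of a reverse incremental chain. This is a sketch of the general technique, not Veeam's on-disk format; the `ReverseIncrementalChain` class and the three sample images are hypothetical. The newest restore point is a full image, and each older point is a set of rollback blocks that must be patched in.

```python
class ReverseIncrementalChain:
    """Toy model: newest point is a full image, older points are rollback deltas."""

    def __init__(self, first_image: dict):
        self.full = dict(first_image)  # block number -> content (newest state)
        self.rollbacks = []            # newest first; None means "block did not exist"

    def backup(self, new_image: dict) -> None:
        """Fold a new backup into the full image, saving overwritten blocks."""
        rollback = {}
        for blk in set(self.full) | set(new_image):
            if self.full.get(blk) != new_image.get(blk):
                rollback[blk] = self.full.get(blk)
        self.rollbacks.insert(0, rollback)
        self.full = dict(new_image)

    def restore(self, points_back: int):
        """Return (image, patched_blocks) for a point that many backups old."""
        image = dict(self.full)
        patched = 0
        for rollback in self.rollbacks[:points_back]:
            for blk, old in rollback.items():
                if old is None:
                    image.pop(blk, None)
                else:
                    image[blk] = old
                patched += 1  # each patch is a scattered read on the backup target
        return image, patched


# Three made-up nightly images of the same small disk.
day1 = {0: "a", 1: "b", 2: "c"}
day2 = {0: "a", 1: "B", 2: "c"}
day3 = {0: "A", 1: "B", 2: "c", 3: "d"}

chain = ReverseIncrementalChain(day1)
chain.backup(day2)
chain.backup(day3)

for age in range(3):
    image, patched = chain.restore(age)
    print(f"{age} point(s) back: {patched} block(s) patched in -> {image}")
```

The latest point needs no patching, while every step further back adds more scattered block lookups, which is exactly the access pattern a streaming-oriented deduplication target handles worst.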

Even after you have patiently waited for the VM to start up from backup storage, the application administrator still needs to figure out how to restore the required objects. If the object is not in the currently mounted backup image, he or she has to send another ticket to the Veeam administrator to mount a different backup image on a temporary VM. This saga continues until the application administrator finds the correct object. What a pain!

There you have it. On one side you have scalability and backup performance issues if using Veeam’s deduplication. On the other side, you have poor recovery performance and usability issues when using a target deduplication appliance with Veeam. This is the deduplication dilemma!

The good news is that target deduplication devices work well with NetBackup and Backup Exec. Both products provide user interfaces for application administrators so that they can self-serve their recovery needs. At the same time, VM backup and recovery remain agentless. The V-Ray powered NetBackup and Backup Exec can stream the actual object from the backup, with no need to mount it using a temporary VM.

18 thoughts on “Deduplication Dilemma: Veeam or Data Domain?”

  • Pingback: Veeam Deduplication for dollar zero! Is it worth it?

  • November 16, 2011 at 1:58 am
    Reached your web blog through Delicious. You already know I am signing up to your rss feed.

    • December 10, 2011 at 7:30 pm
      There’s a terrific amount of knowledge in this article!

  • November 21, 2011 at 4:57 pm
    You recognize thus significantly on the subject of this matter, made me in my view believe it from so many various angles. Its like women and men are not fascinated unless it’s something to accomplish with Lady gaga! Your individual stuffs excellent. Always care for it up!

    • December 10, 2011 at 7:47 pm
      Articles like these put the consumer in the driver’s seat, which is very important.

  • December 10, 2011 at 9:08 am
    Definitely consider that that you said. Your favorite reason appeared to be on the internet the simplest factor to take note of. I say to you, I certainly get annoyed at the same time as people consider issues that they just don’t understand about. You controlled to hit the nail upon the highest and defined out the entire thing with no need side-effects , other people could take a signal. Will likely be back to get more. Thank you

  • December 10, 2011 at 4:30 pm
    Woah this blog is excellent i love studying your posts. Keep up the good paintings! You know, many people are searching around for this info, you can help them greatly.

  • December 10, 2011 at 7:32 pm
    And to think I was going to talk to someone in person about this.

  • December 11, 2011 at 1:31 am
    Normally I don’t learn article on blogs, but I would like to say that this write-up very pressured me to try and do it! Your writing style has been surprised me. Thank you, very great article.

  • December 11, 2011 at 7:30 pm
    Pretty nice post. I simply stumbled upon your weblog and wished to mention that I have truly enjoyed surfing around your weblog posts. After all I’ll be subscribing on your feed and I am hoping you write once more very soon!

  • December 14, 2011 at 12:37 am
    Great post. I was checking constantly this blog and I am impressed! Extremely helpful information specifically the last part 🙂 I care for such information much. I was looking for this particular information for a very long time. Thank you and best of luck.

