Recently I came across a blog post from Szamos Attila. He ran a deduplication contest between Data Domain and Veeam. His was a very small test environment, just 12 virtual machines with 133GB of data, but his observations were significant, so I thought I would share them here.

You can read more about Mr. Attila’s tests at his blog here.
What does this tell you right off the bat? Well, Veeam’s deduplication is slow, but they do not charge for deduplication separately. Not a big deal, right?
Not exactly; there is much more to this story if you take a look at the big picture.
First of all, note that this is a very small data set (just 133GB; even my laptop has more data!). Veeam’s deduplication is not a true deduplication engine that fingerprints data segments and stores only one copy of each. It is basically a data reduction technique that works only within a predefined set of backup files. Veeam refers to this set of backup files as a backup repository. You can run only one backup job to a given repository at any given time. Hence, if you want to back up two virtual machines concurrently, you need to send them to two different backup repositories. If you do that, your backup data is not deduplicated across those two jobs. Thus data reduction with Veeam’s deduplication and concurrent processing of jobs work against each other: the more jobs you run in parallel, the less redundancy is removed. This is a major drawback, as VMs generally contain a lot of redundant data. In fact, Veeam recommends running deduplication mainly for a backup set where all the VMs come from the same template.
Secondly, note that even with a single backup repository, this tiny data set (of just 133GB) took twice as long to back up as it did with Data Domain’s deduplication. Now think about a small business environment with a few terabytes of data, and imagine the time it would take to protect that data. When it comes to an enterprise data center (100s of terabytes), you must depend on a target-based deduplication solution like Data Domain to get the job done.
So, can I simply let Veeam do the data movement and count on Data Domain to do the deduplication? That is one way to solve this problem. But that approach brings a multitude of other issues because of the way Veeam does restores.
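To make the cross-job point concrete, here is a minimal sketch of fingerprint-based deduplication. It is a toy model, not Veeam's or Data Domain's actual implementation: the `DedupStore` class, the fixed 4KB segment size, and the two synthetic VM images are all assumptions made for illustration. It compares one shared store against two independent stores, one per concurrent job.

```python
import hashlib

SEGMENT_SIZE = 4096  # hypothetical fixed segment size, chosen for illustration


def segments(data: bytes, size: int = SEGMENT_SIZE):
    """Split a byte stream into fixed-size segments."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


class DedupStore:
    """Toy fingerprint store: keeps one copy of each unique segment."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> segment bytes

    def write(self, data: bytes) -> None:
        for seg in segments(data):
            fingerprint = hashlib.sha256(seg).hexdigest()
            # Store the segment only if its fingerprint has not been seen.
            self.chunks.setdefault(fingerprint, seg)

    def stored_bytes(self) -> int:
        return sum(len(seg) for seg in self.chunks.values())


# Two hypothetical VMs cloned from the same template: 50 shared OS
# segments plus one unique segment each.
os_image = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(50))
vm_a = os_image + b"A" * SEGMENT_SIZE
vm_b = os_image + b"B" * SEGMENT_SIZE

# One shared store: segments common to both VMs are kept once.
shared = DedupStore()
shared.write(vm_a)
shared.write(vm_b)

# Two separate stores (one per concurrent job): no cross-job deduplication.
store_a, store_b = DedupStore(), DedupStore()
store_a.write(vm_a)
store_b.write(vm_b)

shared_total = shared.stored_bytes()
separate_total = store_a.stored_bytes() + store_b.stored_bytes()
print(f"shared store:    {shared_total} bytes")
print(f"separate stores: {separate_total} bytes")
```

In this toy setup the separate stores hold nearly twice as much data as the shared one, because the template's segments are stored once per job instead of once overall.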
Veeam does not have a good way to let application administrators in the guest operating system (e.g. the Exchange administrator on a VM running Microsoft Exchange) self-serve their restore needs. First the application administrator submits a ticket for a restore. Then the backup administrator mounts the VMDK files from backup using a temporary VM that starts up on a production ESX host. Even to restore a small object, you have to allocate resources for the entire VM on the ESX host (the marketing name for this multi-step restore is U-AIR). As this VM needs to ‘run’ from backup storage, it is not recommended to keep the backup image on deduplicated storage served over the network. Target deduplication devices are designed for streaming data sequentially, so the random I/O pattern caused by running a VM from such storage is painfully slow. This is stated even by the partners who offer deduplication storage for Veeam. HP ran tests with Veeam using the HP StoreOnce target deduplication appliance and published a white paper on the results; please see this white paper in Business Week, in particular the section on Performance Considerations.
Note also that in Veeam’s reverse incremental backup strategy, typically only the most recent backup is kept as a single image. If you are unfortunate enough to need a restore from a copy that is not the most recent one, performance degrades further while running the temporary VM from backup storage, as a lot of random I/O needs to happen at the back end.
Even after you have patiently waited for the VM to start up from backup storage, the application administrator still needs to figure out how to restore the required objects. If the object is not in the currently mounted backup image, he or she has to send another ticket to the Veeam administrator to mount a different backup image on a temporary VM. This saga continues until the application administrator finds the correct object. What a pain!
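The extra work for older restore points can be sketched with a toy model of a reverse incremental chain. This is an illustration under stated assumptions, not Veeam's on-disk format: the functions `make_reverse_incremental` and `restore` are hypothetical, each disk version is a small dictionary of block number to contents, and every version has the same set of blocks.

```python
def make_reverse_incremental(versions):
    """Build a toy reverse incremental chain from a list of disk versions.

    Each version maps block number -> block contents. Returns
    (latest_full, rollbacks), where rollbacks[i] holds the old blocks
    needed to step back from version i + 1 to version i.
    """
    latest_full = dict(versions[-1])
    rollbacks = []
    for older, newer in zip(versions, versions[1:]):
        # Keep only the blocks whose contents changed between the two versions.
        delta = {blk: older[blk] for blk in older if older[blk] != newer.get(blk)}
        rollbacks.append(delta)
    return latest_full, rollbacks


def restore(latest_full, rollbacks, index):
    """Rebuild version `index` and count how many backup files were read."""
    image = dict(latest_full)
    files_read = 1  # the full image of the most recent backup
    # Apply rollback files from newest to oldest until `index` is reached.
    for delta in reversed(rollbacks[index:]):
        image.update(delta)
        files_read += 1
    return image, files_read


# Three hypothetical daily versions of a three-block disk.
days = [
    {0: "a", 1: "b", 2: "c"},
    {0: "a", 1: "B", 2: "c"},
    {0: "A", 1: "B", 2: "c"},
]
full, rollbacks = make_reverse_incremental(days)

newest_image, newest_reads = restore(full, rollbacks, 2)
oldest_image, oldest_reads = restore(full, rollbacks, 0)
print(newest_reads, oldest_reads)
```

In this model the newest point is served from one file, while each step further back adds another file whose blocks are scattered across the image, which is the kind of access pattern a sequentially tuned appliance handles poorly.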
There you have it. On one side, you have scalability and backup performance issues if you use Veeam’s deduplication. On the other side, you have poor recovery performance and usability issues if you use a target deduplication appliance with Veeam. This is the deduplication dilemma!
The good news is that target deduplication devices work well with NetBackup and Backup Exec. Both of these products provide user interfaces for application administrators so that they can self-serve their recovery needs. At the same time, VM backup and recovery remains agentless. The V-Ray powered NetBackup and Backup Exec have the capability to stream the actual object from the backup; there is no need to mount it using a temporary VM.