EMC or HP: Who is stretching the truth on deduplication system performance?

EMC proudly announced the availability of Data Domain 990 during EMC World 2012 on May 21st. The claim in the news release was that the system could back up as much as 248 TB in an 8-hour backup window with 31 TB/hr throughput. Further, it claimed that the system is 6x faster than its closer competitor.
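As a quick sanity check, the 248 TB figure is simply the quoted throughput multiplied by the backup window. The sketch below only restates that arithmetic; the function name is made up for illustration, and it assumes the peak rate is sustained for the whole window, which real backups rarely manage.

```python
def window_capacity_tb(throughput_tb_per_hr: float, window_hours: float) -> float:
    """Data protected in one backup window, assuming the rate is sustained throughout."""
    return throughput_tb_per_hr * window_hours


# Figures from the EMC news release quoted above.
emc_capacity_tb = window_capacity_tb(31, 8)
print(f"EMC DD990: {emc_capacity_tb} TB in an 8-hour window")
```

So the 248 TB headline adds no new information beyond the 31 TB/hr rate.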

The pride was shattered within 2 weeks. Even Kardashian’s marriage lasted longer than the claim. HP announced that it could protect up to 100 TB/hr using its StoreOnce family of products. EMC looked at it with tears and finally responded as given here:

EMC said HP’s decision was “puzzling”, and argued the comparison was not fair because HP’s claim was for four hardware systems working on four storage pools compared to EMC’s figures for one system and one pool. Deduplication, which removes copies of data from storage to improve usage, only works within pools of data.

Now it is time for a reality check.

Number of systems involved in deduplication processing: EMC’s claim is that Data Domain 990 is a single-head unit while HP StoreOnce B6200 is a multi-node system. At first look, it sounds like a legitimate argument; but the reality is that EMC has no reason to shed crocodile tears about this. Here is why.

The 31 TB/hr rate for Data Domain 990 comes from Data Domain Boost, the software component that offloads most of the processor-intensive deduplication processing to backup servers and/or application servers. The unit by itself is not doing all the work. The story is no different for HP B6200; it makes use of StoreOnce Catalyst software, which does something similar to what Data Domain Boost does for Data Domain 990.

The absolute number of processing heads shouldn’t matter in this case as the actual performance numbers are skewed on account of distributed processing. I would even give credit to HP, as their solution is highly available with two nodes serving one storage pool. Backups are the last line of defense in an enterprise. High Availability brings additional customer value.

Number of name spaces: A single name space provides deduplication across all the workloads ingested into the storage pool. Data Domain 990 is a single name space device with one processing head. You buy HP B6200 in units known as couplets, each made up of two nodes and storage. It is not crystal clear from HP’s documentation whether multiple couplets can share the same name space or whether each uses a dedicated name space. I am giving EMC the benefit of the doubt that it did the research before making its statement on this. Some of the defensive comments HP made after EMC’s reaction tend to indicate that HP stretched the truth a little here.

HP marketing veep Craig Nunes says an 8-node B6200 is a single system because it is managed as one and has a single namespace. The single namespace is segmented into four individual namespaces, one per couplet, and, he says, “next year I could do a firmware update and change that”.

So, I am inclined to support EMC on this point unless someone can confirm from HP’s documentation that a four-couplet unit uses a single name space.
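To see why the name space count matters for the headline number, here is a rough sketch that divides an aggregate rate by the number of independent deduplication pools. It assumes the load is spread evenly across HP’s four couplets, which neither vendor’s statement confirms; the function name is hypothetical.

```python
def per_namespace_rate(aggregate_tb_per_hr: float, namespaces: int) -> float:
    """Throughput per deduplication pool, ASSUMING an even split across pools."""
    if namespaces < 1:
        raise ValueError("namespaces must be at least 1")
    return aggregate_tb_per_hr / namespaces


# HP: 100 TB/hr quoted for four couplets, one name space per couplet.
hp_per_couplet = per_namespace_rate(100, 4)
# EMC: 31 TB/hr quoted for one system with one name space.
emc_single = per_namespace_rate(31, 1)

print(f"HP per name space (even split assumed): {hp_per_couplet} TB/hr")
print(f"EMC per name space: {emc_single} TB/hr")
```

Under that even-split assumption, the rate within a single deduplication pool looks much closer between the two vendors than the 100 versus 31 headline suggests.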

Truth in comparisons: 

EMC’s claim: 6x faster than closer competitor. HP’s claim: 3 times faster (backups) than closest competitor.

The statements won’t actually tell you how the ‘closer/closest’ competitor is decided. EMC is defining its closer competitor based on IDC’s report on market share for Purpose-Built Backup Appliances (PBBA), and it is referring to IBM. EMC chose to compare against IBM because IBM has the poorest number. The other vendors in the list, HP at 25 TB/hr without Catalyst and Symantec at 23.7 TB/hr for its NetBackup 5220, have solutions superior to IBM’s! EMC could not even claim 2x (let alone 6x) if the closest competitor were chosen based on performance itself.

HP defined closest competitor in terms of the actual performance. They compared against EMC’s 31 TB/hr to make the 3 times faster claim with 100 TB/hr.
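The ratios behind both claims are easy to check from the throughput figures quoted in this post. The sketch below is only that arithmetic; the dictionary keys are my own labels, and the "implied baseline" is just what a 6x claim would require of a competitor, not a number published by any vendor.

```python
# Throughput figures quoted in this post, in TB/hr.
RATES_TB_PER_HR = {
    "EMC DD990 with Boost": 31.0,
    "HP B6200 with Catalyst": 100.0,
    "HP without Catalyst": 25.0,
    "Symantec NetBackup 5220": 23.7,
}


def speedup(faster: float, slower: float) -> float:
    """How many times faster one rate is than another."""
    return faster / slower


emc = RATES_TB_PER_HR["EMC DD990 with Boost"]

# EMC against the performance-based competitors named above.
for name in ("HP without Catalyst", "Symantec NetBackup 5220"):
    print(f"EMC vs {name}: {speedup(emc, RATES_TB_PER_HR[name]):.2f}x")

# HP's claim against EMC.
hp_vs_emc = speedup(RATES_TB_PER_HR["HP B6200 with Catalyst"], emc)
print(f"HP vs EMC: {hp_vs_emc:.2f}x")

# Rate a competitor would need for a 6x claim to hold (implied, not quoted).
implied_6x_baseline = emc / 6
print(f"Implied baseline for a 6x claim: {implied_6x_baseline:.2f} TB/hr")
```

Against HP without Catalyst or Symantec, EMC’s advantage works out to well under 2x, while HP’s 100 TB/hr against EMC’s 31 TB/hr is a little over 3x.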

Verdict: Always ask questions on metrics! It is easy to make a claim while staying vague on details.
