EMC or HP: Who is stretching the truth on deduplication system performance?

EMC proudly announced the availability of Data Domain 990 during EMC World 2012 on May 21st. The claim in the news release was that the system could back up as much as 248 TB in an 8-hour backup window at 31 TB/hr throughput. Further, it claimed that the system is 6x faster than the closer competitor.

The pride was shattered within 2 weeks. Even Kardashian’s marriage lasted longer than the claim. HP announced that it could protect up to 100 TB/hr using its StoreOnce family of products. EMC looked at it with tears and finally responded as follows:

EMC said HP’s decision was “puzzling”, and argued the comparison was not fair because HP’s claim was for four hardware systems working on four storage pools compared to EMC’s figures for one system and one pool. Deduplication, which removes copies of data from storage to improve usage, only works within pools of data.

Now it is time for a reality check.

Number of systems involved in deduplication processing: EMC’s claim is that Data Domain 990 is a single head unit while HP StoreOnce B6200 is a multi-node system. At first glance, it sounds like a legitimate argument; but the reality is that EMC has no reason to shed crocodile tears about this. Here is why.

The 31 TB/hr rate for Data Domain 990 comes from Data Domain Boost, the software component that offloads most of the processor-intensive deduplication processing to backup servers and/or application servers. The unit by itself is not doing all the work. The story is no different for HP B6200; it makes use of StoreOnce Catalyst software, which does something similar to what Data Domain Boost does for Data Domain 990.

The absolute number of processing heads shouldn’t matter in this case as the actual performance numbers are skewed on account of distributed processing. I would even give credit to HP, as their solution is highly available with two nodes serving one storage pool. Backups are the last line of defense in an enterprise. High Availability brings additional customer value.

Number of name spaces: A single name space provides deduplication across all the workloads ingested into the storage pool. Data Domain 990 is a single name space device with one processing head. You buy HP B6200 in the form of couplets, each consisting of two nodes and storage. It is not crystal clear from HP’s documentation whether multiple couplets can share the same name space or whether each uses a dedicated name space. I am giving EMC the benefit of the doubt that it did the research before making its statement on this. Some of the defensive comments HP made after EMC’s reaction tend to indicate that HP stretched the truth a little here.

HP marketing veep Craig Nunes says an 8-node B6200 is a single system because it is managed as one and has a single namespace. The single namespace is segmented into four individual namespaces, one per couplet, and, he says, “next year I could do a firmware update and change that”.

So, I am inclined to support EMC on this point unless someone can confirm from HP’s documentation that a four-couplet unit uses a single name space.
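To put rough numbers on the name space question, here is a minimal Python sketch that uses only the figures quoted in this post. It assumes, purely for illustration, that HP’s 100 TB/hr is spread evenly across the four couplets; this post does not cite a per-couplet figure, and the function names are my own.

```python
# Figures quoted in this post. The even split across couplets is an
# assumption made for illustration, not a published HP number.
HP_AGGREGATE_TB_PER_HR = 100.0  # HP's claim for a four-couplet B6200 with Catalyst
HP_COUPLETS = 4                 # one dedupe pool (name space) per couplet
EMC_DD990_TB_PER_HR = 31.0      # EMC's claim for one DD990 with Boost
BACKUP_WINDOW_HOURS = 8


def per_pool_rate(aggregate_tb_per_hr: float, pools: int) -> float:
    """Throughput of a single dedupe pool, assuming load is spread evenly."""
    return aggregate_tb_per_hr / pools


def window_capacity(tb_per_hr: float, hours: float) -> float:
    """Amount of data ingested in a backup window at a steady rate."""
    return tb_per_hr * hours


hp_pool = per_pool_rate(HP_AGGREGATE_TB_PER_HR, HP_COUPLETS)
hp_window = window_capacity(hp_pool, BACKUP_WINDOW_HOURS)
emc_window = window_capacity(EMC_DD990_TB_PER_HR, BACKUP_WINDOW_HOURS)

print(f"HP, per pool: {hp_pool:.1f} TB/hr, {hp_window:.0f} TB per window")
print(f"EMC, one pool: {EMC_DD990_TB_PER_HR:.1f} TB/hr, {emc_window:.0f} TB per window")
```

Under that assumption, the pool-for-pool comparison looks very different from the headline 100 TB/hr versus 31 TB/hr, which is why the name space question matters.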

Truth in comparisons: 

EMC’s claim: 6x faster than closer competitor. HP’s claim: 3 times faster (backups) than closest competitor

The statements won’t actually tell you how the ‘closer/closest’ competitor is decided. EMC is defining its closer competitor based on IDC’s report on market share in Purpose-Built Backup Appliances (PBBA), and it is referring to IBM. EMC chose to compare against IBM because IBM has the poorest number. The other vendors in the list, HP at 25 TB/hr without Catalyst and Symantec at 23.7 TB/hr for its NetBackup 5220, have solutions superior to IBM’s! EMC cannot even claim 2x (let alone 6x) if the closest competitor were chosen on performance itself.

HP defined closest competitor in terms of the actual performance. They compared against EMC’s 31 TB/hr to make the 3 times faster claim with 100 TB/hr.
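The ratios behind these claims are easy to check. The following is a small Python sketch using only the TB/hr figures quoted above; the dictionary labels and the `speedup` helper are my own naming, and the numbers are vendor claims rather than independent measurements.

```python
# Vendor-claimed ingest rates in TB/hr, as quoted in this post.
rates_tb_per_hr = {
    "EMC DD990 (with Boost)": 31.0,
    "HP B6200 (with Catalyst)": 100.0,
    "HP (without Catalyst)": 25.0,
    "Symantec NetBackup 5220": 23.7,
}


def speedup(faster: float, slower: float) -> float:
    """How many times faster one claimed rate is than another."""
    return faster / slower


emc = rates_tb_per_hr["EMC DD990 (with Boost)"]

# HP's "3 times faster" claim: B6200 with Catalyst against the DD990.
hp_vs_emc = speedup(rates_tb_per_hr["HP B6200 (with Catalyst)"], emc)
print(f"HP B6200 vs EMC DD990: {hp_vs_emc:.2f}x")

# EMC against the other vendors, if 'closest' were decided on performance.
for name in ("HP (without Catalyst)", "Symantec NetBackup 5220"):
    print(f"EMC DD990 vs {name}: {speedup(emc, rates_tb_per_hr[name]):.2f}x")
```

Against either of those two vendors the DD990’s advantage stays well under 2x, while HP’s ratio against the DD990 comes out a little above 3x.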

Verdict: Always ask questions on metrics! It is easy to make a claim while staying vague on details.


10 thoughts on “EMC or HP: Who is stretching the truth on deduplication system performance?”

  • June 8, 2012 at 12:42 am

    There is categorically no lie in the HP claims. HP StoreOnce B6200 is part of HP’s Converged Storage strategy and thus is built on a scale-out architecture, unlike other systems in this arena. Global Deduplication is a distinct concept from a single system or a single namespace. The StoreOnce B6200 is a single logical system built on an underlying clustering engine that is capable of scaling to over 1000 nodes and a 16PB single namespace. The current configuration is split into 4 couplets to deliver High-Availability and exposed as 4 different pools in order to balance ingest performance and deduplication efficiency. HP contends that equating global deduplication technology with either a single system or a single namespace definition is the underlying confusion.

    • June 9, 2012 at 4:24 pm

      Appreciate the note, Craig. Among the three things I listed above (number of systems, number of name spaces and truth in comparisons), I couldn’t support HP on the number of name spaces. From your comment it looks like the underlying engine has the potential to offer a 16 PB single name space. That is indeed exciting. But what about the current offering? If the solution provides 4 different name spaces, adding up the TB/hr for each pool is not a fair comparison to DD990 from a purely absolute performance perspective. I would certainly say that HP provides customer value by offering HA and single-pane-of-glass management.

  • June 10, 2012 at 5:41 pm

    Everyone is chasing dedupe speeds; after 20TB/hr ingest it’s almost pointless. Being able to pull data that fast during backups these days is nearly impossible. They need to concentrate on restore speeds and functionality. It’s hard to overrun a dedupe appliance these days.

    Show me the features now.

    • July 2, 2012 at 7:49 pm

      I couldn’t agree more! Appreciate your insight!

      • October 25, 2015 at 1:13 am

        Hi Sarah,

        I believe the earlier comment was about restore speeds being important. During backup, 100TB/hr from HP (or the 31TB/hr number provided by EMC) is all about effective backup throughput at a specific dedupe level. When it comes to restore, you would be moving the entire data set, especially in the case of a full recovery.

        Let me know if I misunderstood the question.

        Warm regards,

        Rasheed

  • June 15, 2012 at 6:38 pm

    Hi Sean, You’ve missed the point. Both EMC Boost with DD990 and HP Catalyst with B6200 deduplicate before dragging the data across the network – so both solve the very issue you are highlighting.

    My understanding is that the B6200 has a single namespace but splits the dedupe by couplet for performance reasons, and also because a VM doesn’t dedupe well against your spreadsheets anyway. Granted, you CAN’T stick the dedupe into one giant multi-petabyte-of-user-data pool to dedupe everything against everything else… but that’s not a particularly sensible thing to do in the first place.

    HP’s argument is that a system is what you buy, deploy and manage. You buy a B6200 as a system, install it and manage it all as a single system; ergo it’s a system, and that system, with Catalyst, set a benchmark of 100TB/hr.

    They don’t seem to be claiming it is the same architecture as EMC (EMC = scale-up / HP = scale-out), but it’s still a valid comparison. All customers should care about is how quickly it backs up (or, more importantly, restores!).

  • June 15, 2012 at 7:01 pm

    Further to that, it would be interesting to see how EMC measures Isilon performance, as it’s a scale-out architecture, versus a scale-up system….

    I assume they take one individual “node” vs an entire system then 🙂

    • July 2, 2012 at 7:45 pm

      Hi Neil,

      In my humble opinion, both Sean and you are coming to the same point. Symantec deduplication provides dedupe everywhere, with the flexibility to change the point where deduplication occurs at any time without losing the advantages from previous backups. Data Domain was inspired by this and introduced a partial solution called Boost (the name was originally coined from the expression Bandwidth Optimized OST, where OST is Symantec’s OpenStorage). I am glad to see Quantum (Accent) and HP (Catalyst) moving to similar strategies. What is needed now is a real ‘benchmark’ for the cost involved in this data deduplication: not just the CapEx of deploying it, but also the OpEx, including the production resources (CPU, memory and I/O) consumed on the system actually doing the deduplication, network bandwidth used, application awareness, administration overhead and so on.

      I wrote this blog only because the announcements at both EMC World and HP Discover highlighted the benchmark purely in terms of TB/hr protected. But there is much more than this mere number that an organization needs to look at prior to investing in a solution.

      RE: Isilon performance

      That is a good one Neil! Great point. EMC should ask BRS and Storage division to sync up 🙂

      Warm regards,

      Rasheed
