The Big Hole in EMC Big Data backup story

One of the crucial roles of the marketing team in any organization is to communicate the value of its products and services. It is not uncommon (pardon the double negative) for organizations to show the best side of their story while deliberately hiding the weaker aspects in fine print. The left side of the picture below is a snapshot of the breakfast cereal (General Mills’ Total) that came with my breakfast order at a Sheraton while travelling on business.

EMC appears to have a Big Hole in its Big Data Backup

Note that General Mills claims 100% of the daily value of 11 vitamins and minerals, but with an asterisk. The claim is true only if I consume a 53g serving, but the box has only 33g!

Although I may have felt a bit taken aback as a consumer, I enjoyed giving my General Mills friends a bit of a hard time and I moved on. This is a small transaction.

What if you were responsible for a transaction worth tens of thousands of dollars and were pitched a glass half-full story like this? It does happen. That General Mills cereal box is what came to my mind when I saw this blog from EMC on protecting Big Data (Teradata) workloads using EMC ‘Big Data backup solution’.

General Mills had the courtesy to state in fine print that part of the vitamins and minerals are missing from its box. EMC’s blog didn’t really call out what was missing from its ‘box’, aka the Data Domain device, to protect Teradata workloads using the Teradata Data Stream Architecture. In fact, it is missing the real brain of the solution: NetBackup!

First, a little bit of history and some naked truth. Teradata has been working with NetBackup for over a decade to provide data protection for its workloads. In fact, Teradata sells the NetBackup Agent for Teradata to its customers. This agent pushes the data stream to NetBackup media servers. This is where the real workload-aware intelligence (the real brain for this Big Data backup) is built. Once the NetBackup media server receives the data stream, it can store it on any supported storage: NetBackup Deduplication Pool, NetBackup Advanced Disk Pool, NetBackup OpenStorage Pool or even a tape storage unit! When it comes to NetBackup OpenStorage Pool, it does not matter who the OpenStorage partner is; it can be EMC Data Domain, Quantum DXi,… The naked truth is that the backend devices are dumb storage devices from the view of the NetBackup Agent for Teradata (the Teradata BAR component depicted in the blog).

EMC’s blog appears to have been designed to mislead the reader. It tends to imply that there is some sort of special sauce built natively into Data Domain (or Data Domain Boost) for Teradata BAR stream. The blog is trying to attach EMC to Big Data type workloads through marketing. May I say that the hole is quite big in EMC’s Big Data backup story!

I am speculating that EMC has been telling this story for a while in private engagements with clients. Note that the blog simply displays some of EMC’s slides that are marked ‘confidential’; the author forgot to remove the marking before publishing. In closed meetings with joint customers of Teradata and NetBackup, a slide like this would create the illusion that Data Domain has something special for Teradata backup. Now the truth has leaked!

Data Domain: The TiVo in data deduplication market

It is that time of the year when Christmas shopping is in full swing. If you rewind time by just six to eight years, TiVo’ing was a widely used verb, and my team was no exception. TiVo was a must-have gadget in the house. My manager once said, “it is not easy to find a technical gadget that my wife would love, but TiVo was something she could operate and enjoy without being coached.”

TiVo was the gold standard of digital video recorders. Consumers were willing to pay a premium for the box upfront and sign up for monthly fees to get the ‘TiVo Service’, the data service through which TiVo populated programming schedules and tasks. These costs were in addition to what consumers were already paying for cable or satellite services.

Cable and satellite operators came out with built-in DVRs in their set-top boxes. Still, TiVo used to be the star. It had 4.36 million subscribers in 2006. The technology and usability of competitors’ DVRs were so poor that consumers continued to pay a premium to enjoy the hassle-free experience of TiVo.

Fast forward a few years from its peak in 2006, and TiVo has stumbled into an identity crisis. The competition came not just from cable and satellite providers who matched the simplicity of TiVo in their all-in-one set-top offerings; it also came indirectly from streaming services (Netflix, Amazon, Hulu), cheaper purpose-built set-top boxes for streaming (Apple TV, Roku) and multi-purpose devices (Wii, PlayStation 3, Xbox). Now TiVo is struggling to stay relevant. It no longer asks for an upfront premium for the box: if you commit to a 2-year subscription, the box is yours. It partners with competitors to bring their services into TiVo. If there was a market for ‘Digital Video Recorders’, it is now being squeezed by players from adjacent markets.

Holiday Offer from TiVo.com, seen on November 27, 2012

Today, deduplication storage is such a market, and Data Domain used to be its TiVo. It was a powerful yet simple device that an IT administrator could manage without reading the manual. It displaced tape as the main backup storage medium once it started to integrate with market-leading backup applications, especially with Symantec’s NetBackup through OpenStorage. EMC had to pay a fortune (and fight with NetApp) to acquire this technology, but it paid off, as Data Domain was the only cash cow in EMC’s Backup and Recovery Services division.

Data Domain could ask for a premium as other players in the market couldn’t match the technology and simplicity. But now… the tide is changing…

Direct competitors are getting their act together. HP matched Data Domain’s scale and performance and added high availability on top of it. Symantec launched integrated all-in-one appliances with content-aware deduplication built in. Most backup software vendors have deduplication available as a feature. Even standard file systems (Symantec’s VxFS, Microsoft’s NTFS, Oracle’s ZFS) now include deduplication. Now “data deduplication as a market” is being squeezed by competitors and adjacent players. Customers are less and less likely to pay a premium for deduplication, as it is becoming a commodity.

TiVo managed to kill VHS tapes, which were the primary recording medium for television shows. EMC touted Data Domain as a tape killer in the backup industry. While disk-based solutions have indeed limited the value of tape (it is now used primarily for long-term retention), Data Domain as a standalone premium deduplication storage device may be extinct even before tape draws its last breath. Time will tell.

Dear EMC Avamar, please stop leeching from enterprise vSphere environments

VMware introduced vStorage APIs for Data Protection (VADP) so that backup products can do centralized, efficient, off-host LAN free backup of vSphere virtual machines.

In the physical world, most systems have plenty of resources, often underutilized. Running a backup agent on such a system wasn’t a primary concern for most workloads. The era of virtualization changed things drastically. Server consolidation via virtualization allowed organizations to get the most out of their hardware investment. That means backup agents no longer have the luxury of simply taking resources from production workloads, as the underlying ESXi infrastructure is optimized and right-sized to keep line-of-business applications running smoothly.

VMware solved the backup agent problem from the early days of ESX/ESXi hosts. The SAN transport method for virtual machine backup was born during the old VCB (VMware Consolidated Backup) days and further enhanced in VADP (vStorage APIs for Data Protection). The idea is simple. Let the snapshots of a virtual machine be presented to a workhorse backup host and allow that system to do the heavy lifting of processing and moving data to backup storage. The CPU, memory and I/O resources on ESX/ESXi hosts are not used during backups. Thus the production virtual machines are not starved of hypervisor resources during backups.

For non-SAN environments like NFS-based datastores, the same dedicated host can use Network Block Device (NBD) transport to stream data through the management network. Although it is not as efficient as SAN transport, it still offloads most of the backup processing to the dedicated physical host.

Dedicating one or more workhorse backup systems to do backups was not practical for small business environments and remote offices. To accommodate that business need, VMware allowed virtual machines to act as backup proxy hosts for smaller deployments. This is how hotadd transport was introduced.

Thus your backup strategy should be to use a dedicated physical workhorse backup system to offload all or part of the backup processing using SAN or NBD transports. For really small environments, a virtual machine with NBD or hotadd transport would suffice.
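The rule of thumb above can be sketched as a small decision function. This is only an illustration of the reasoning in this post, not anything shipped by VMware or a backup vendor: the function name `pick_backup_transport` and its two boolean parameters are hypothetical, and the returned strings are simply labels for the SAN, NBD and hotadd transports discussed above.

```python
def pick_backup_transport(datastore_on_san: bool, dedicated_physical_host: bool) -> str:
    """Return a label for the transport this post would suggest.

    Hypothetical helper: it encodes only the rule of thumb from the text.
    """
    if dedicated_physical_host:
        # A physical workhorse host can read SAN LUNs directly;
        # for NFS and other non-SAN datastores it falls back to NBD.
        return "san" if datastore_on_san else "nbd"
    # No dedicated physical host (small shop or remote office):
    # a virtual machine proxy is used. The post notes NBD is also
    # possible here; hotadd is picked as the representative choice.
    return "hotadd"


if __name__ == "__main__":
    print(pick_backup_transport(datastore_on_san=True, dedicated_physical_host=True))
    print(pick_backup_transport(datastore_on_san=False, dedicated_physical_host=True))
    print(pick_backup_transport(datastore_on_san=False, dedicated_physical_host=False))
```

The point of writing it this way is that the virtual proxy branch is the exception for small environments, not the default for an enterprise.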

Somehow EMC missed this memo. Ironically, EMC had been a proponent of running the Avamar agent inside the guest instead of adopting VMware’s VADP. The argument was that source-side deduplication in the Avamar agent minimizes the amount of data to be moved across the wire. While that is indeed true, EMC conveniently forgot to mention that CPU-intensive deduplication within the backup agent would leech ESXi resources away from production workloads!

Then EMC conceded and announced VADP support. But the saga continues. What EMC provided is hotadd support for VADP. That means you allocate multiple proxy virtual machines even in enterprise vSphere environments. Some of the best practice documents for Avamar suggest deploying a backup proxy host for every 20 virtual machines. A typical vSphere environment in an enterprise would have 1000 to 3000 virtual machines. That translates to 50 to 150 proxy hosts! These systems are literally leeches in the vSphere environment, draining resources that belong to production applications.
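The proxy arithmetic is easy to check. Here is a minimal sketch, assuming the one-proxy-per-20-virtual-machines ratio quoted above; the function name `proxies_needed` is made up for illustration.

```python
import math


def proxies_needed(vm_count: int, vms_per_proxy: int = 20) -> int:
    """Number of proxy VMs needed, rounding up for any partial group."""
    return math.ceil(vm_count / vms_per_proxy)


if __name__ == "__main__":
    for vms in (1000, 3000):
        print(f"{vms} virtual machines -> {proxies_needed(vms)} proxy hosts")
```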

The giant tower of energy-consuming nodes in the Avamar grid is not even lifting a finger in processing backups! It is merely a storage system. The real workhorses are the ESXi hosts, giving up CPU, memory and I/O resources to Avamar proxy hosts to generate and deduplicate the backup stream.

The story does not change even if you replace the Avamar Datastore with a Data Domain device. In that case, the DD Boost agents running on Avamar proxy hosts are draining resources from ESXi to reduce data at the source and send deduplicated data to the Data Domain system.

EMC BRS should seriously look at the way Avamar proxy hosts, with or without DD Boost, are leeching resources from precious production workloads. The method used by Avamar is recommended only for SMB and remote office environments. Take the hint from VMware engineering as to why Avamar technology was borrowed to provide a solution for SMB customers in the vSphere Data Protection (VDP) product. You can’t chop a tree with a penknife!

The best example for effectively using VADP for enterprise vSphere is NetBackup 5220. EMC BRS could learn a lesson or two from how Symantec integrates with VMware in a much better way. This appliance is a complete backup system with intelligent deduplication and VADP support built right in for VMware backups.  This appliance does the heavy lifting so that production workloads are unaffected by backups.

How about recovery? For thick provisioned disks, SAN transport is indeed the fastest. For thin provisioned disks, NBD performs much better. The good news with Symantec NetBackup 5220 is that the user can control the transport method for restores as well. You might have done the backup using SAN transport, but you can do the restore using NBD if you are restoring thin provisioned virtual machines. For Avamar, hotadd is the be-all and end-all. NBD on a virtual proxy isn’t useful, so it is a moot point when the product offers only a virtual machine proxy for VADP.

The question is…

Dear EMC Avamar, when will you offer an enterprise-grade VADP-based backup for your customers? They deserve enterprise-grade protection for the investment they have made in large Avamar Datastores and Data Domain devices.

What do NetApp ONTAP and Symantec NetBackup have in common?

A friend of mine forwarded this link to the interview SearchStorage.com recently did with Dave Hitz, one of the founders of NetApp. It is an interesting read, and the major topic is the new clustering capabilities in ONTAP 8. When he was asked about EMC’s Isilon, I thought his response hit a home run.

“If you look at features EMC can support, you end up with a complete list. If you break apart their architectures and look at the same feature list by architecture, you end up finding the main feature Isilon has is clustering, which is great. Unfortunately, it’s not in combination with the full suite of rich data management capabilities. That’s the No. 1 difference Ontap has: it’s the same Ontap that has all this cool stuff in it,” said Dave Hitz.

The context here is the fact that the foundational technology powering all storage systems from NetApp is ONTAP (with E-series being an outlier), and customers get the choice of footprint and features to match their workloads. EMC’s storage division, on the other hand, provides different products, like VNX, VMAX and Isilon, for overlapping sets of workloads.

If you think about it, this response applies to other business units at EMC as well. My favorite is EMC’s Backup and Recovery Services (BRS) division. They have four different products (Avamar, Data Domain, NetWorker and HomeBase) pretty much serving the same market. If I were to fit Dave’s quote into the context of backup and recovery and use Symantec’s NetBackup as the competitor for EMC Backup, it would go something like this.

If you look at features EMC can support as a vendor for backup and recovery, you end up with a near-complete list. If you break apart their architectures and look at the same feature list by architecture, you end up finding that the main value Data Domain has is storage reduction at target with federation capabilities for limited application workloads. Avamar has full management capabilities but only for smaller workloads. NetWorker has decent long-term retention capabilities and track record but has been on life support. HomeBase provides Bare Metal Recovery. Unfortunately, none of these products comes with a full suite of rich data management capabilities for end-to-end protection that can bring down capital and operational expenses in managing recovery points. That’s the No. 1 difference NetBackup has: it’s the same NetBackup that has all that cool stuff in one platform and a lot more innovations like managing snapshots, replicas, virtualized applications, backup acceleration etc.

As always, the standard disclaimer applies here. This is just my opinion. Although I work for Symantec, the above statement should not be considered as the view of my employer.

Will EMC BRS kill Avamar or NetWorker?

EMC World 2012 has come and gone. Those watching the Backup and Recovery Services (BRS) division will have noticed a drastic shift in strategy since last year. Is Avamar counting its days?

Surprised? Let me explain. Remember the “Tape sucks! Move on!” campaign sung by BRS last year? They even mocked Google for recovering from tapes. They wanted the world to look at Avamar and Data Domain, the two products with spinning disks, as the houses of backups. The other child, NetWorker, was mostly ignored and was on life support just to get by in the era of tapes.

BRS seems to have come to grips with reality to some extent. The incremental updates to Avamar and the revelation of NetWorker 8 features tend to indicate that BRS is taking a 180-degree turn.

No real updates for Avamar Data Store: All of the newly announced business-critical application support in Avamar is for both Data Domain Boost and the Avamar native client. Hyper-V, which is popular among SMB workloads, is now available through Boost to a Data Domain target. Last year, BRS’ announcement was that DD is for specific workloads and Avamar Data Store is for everything else. Now Boost is getting more attention and the Avamar engine by itself pretty much stays the same. The blackout windows in Avamar Data Store already annoy customers. Is the Data Domain deduplication engine now preferred for target dedupe, and will DD Boost eventually replace source-side deduplication? Inspired by Symantec’s Dedupe Everywhere strategy?

Note: Thanks to Ian for clarifying in his comment that the newer application support is available for Avamar as well, not just for Data Domain through DD Boost.

Emergence of Media Access Node: BRS realized that customers with longer retention requirements would not buy into the ‘keep it on disk’ message. Tape provides economies of scale. Modern tape technologies are superior in performance and reliability. Now, BRS ships a NetWorker node under the covers as the Media Access Node in Avamar to copy rehydrated data onto tape in NetWorker tape format.

NetWorker 8.0 getting a facelift: Although NetWorker was ignored in keynotes, BRS made a deliberate attempt this year to show what is happening with NetWorker. It was headed for the morgue but has now been pulled back and is getting revved up. There is a long road ahead to convince customers, but BRS says it is putting as many resources on NetWorker as it did on Avamar. Not to mention the newfound love, Spectra Logic, to compete with IBM and Oracle.

If you pay closer attention, all that Avamar got serves to make things better for Data Domain (Boost expansion, multi-stream support…) and NetWorker (data stored in NetWorker tape format). In a nutshell, BRS wants everyone to keep backup data in either Data Domain dedupe format or NetWorker tape format. Once the NetWorker and Data Domain Boost combination can support backups over the WAN, Avamar may not have anything to offer. From an operating margin perspective, might Avamar as a product become a dog in the BCG growth-share matrix? Is the one eventually going to the morgue the Avamar dedupe engine?

Deduplication Storage Pool Reliability: The devil is in the details

As you guys already know, I travel a lot and attend trade shows where I represent Symantec. While I was briefing a visitor at the Symantec booth on the NetBackup 5020 appliance, he asked a question that was quite interesting. “We have requested RFPs from multiple vendors for deploying deduplication solution for backups. EMC sales team told us that Data Domain 800 series is better than NetBackup 5020 appliances in terms of reliability. They said that if one node in a multi-node NetBackup 5020 goes down, the entire deduplication pool goes down. What do you think about it?”

I thanked him for his question. I took a good 20 minutes to explain the situation. I thought it would be nice to document this in a blog for a fair comparison.

Let us compare configurations based on Data Domain 860 and NetBackup 5020. Let us say that the customer is looking to create 96TB of deduplication pool right now. He may need more storage in future.

With Data Domain 860, it would require four ES30 shelves (with 2TB drives) to create this capacity. Plus you need the 860 head unit.  With NetBackup 5020, you would need three nodes.

Implementing a 96TB deduplication pool

Thus, the EMC solution has a total of 5 components (1 head and 4 shelves). EMC’s 96TB deduplication pool will go down if any of the five components fail.

Symantec solution has a total of three components (3 NetBackup 5020 nodes). Symantec’s 96TB deduplication pool will go down if any of the three components fail.

Observation 1: EMC solution has more single points of failure than Symantec’s solution for a given capacity.

Let us dig deeper. Let us look at the components that actually store data, the storage modules.

Each Data Domain ES30 shelf will have 15 spindles: 12 data drives, 2 parity drives and 1 hot spare. Each shelf can withstand 3 concurrent drive failures.

Each NetBackup 5020 node has 22 spindles (not counting the two drives in RAID1 for the system disk): 18 data drives, 2 parity drives and 2 hot spares. This configuration can withstand four concurrent drive failures.

Both systems use SATA drives. The theoretical [1] annualized failure rate (AFR) for a SATA drive is approximately 1.46%. Robin Harris’ StorageMojo [2] blog has some great information on a study done by Google. He cites a calculated AFR of 2.88%.

Since we are actually comparing the overall storage modules (ES30 storage shelf vs. NetBackup 5020 storage shelf), let us not worry about the absolute value of AFR of a disk drive. For our discussion, let us assume that both Symantec and Data Domain are buying disks from the same manufacturer. Let the AFR be 3% to simplify probability calculations.

An AFR of 3% indicates that the probability of a SATA drive failing within a year is 3/100.

In case of Data Domain 860 with ES30 shelves, you will lose data if more than 3 drives fail in a year and failed drives were not replaced. The probability of four drives failing in a year can be calculated using conditional probability [3]. The value is (3/100)^4 = 0.000081%

In case of a NetBackup 5020 node, you will lose data if more than 4 drives fail in a year and were not replaced. The probability here is (3/100)^5 = 0.00000243%

Note that the probability of data loss is low in both cases even if you don’t replace the failed drives for a year. This is why RAID6 and hot spares play a significant role in delivering storage reliability. That is the main point I want to make here. However, the probability of losing data on an ES30 shelf is about 33 times higher than the probability of losing data on a NetBackup 5020 node! The reason is the extra hot spare in a NetBackup 5020 node, which provides additional protection.
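For anyone who wants to check the numbers, here is a minimal sketch of the same simplified arithmetic. It assumes, as this post does, independent drive failures at a 3% AFR and treats data loss as one more failure than the module tolerates. It deliberately ignores how many ways those drives can be picked from a shelf and any drive replacement, so it is an illustration of the comparison and not a real reliability model. The function name `loss_probability` is made up for this example.

```python
from fractions import Fraction

# Assumed annualized failure rate per drive, as in the post.
AFR = Fraction(3, 100)


def loss_probability(tolerated_failures: int, afr: Fraction = AFR) -> Fraction:
    """Simplified model: data is lost on one failure beyond what is tolerated,
    with independent failures, so the probability is afr ** (tolerated + 1)."""
    return afr ** (tolerated_failures + 1)


es30_shelf = loss_probability(3)     # ES30 shelf tolerates 3 drive failures
nbu_5020_node = loss_probability(4)  # NetBackup 5020 node tolerates 4
relative_risk = es30_shelf / nbu_5020_node

if __name__ == "__main__":
    print(f"ES30 shelf:          {float(es30_shelf) * 100:.8f}%")
    print(f"NetBackup 5020 node: {float(nbu_5020_node) * 100:.8f}%")
    print(f"Relative risk:       {float(relative_risk):.1f}x")
```

Exact fractions are used so the ratio comes out as exactly 100/3, which is where the "33 times" figure comes from.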

Observation 2: From storage module perspective, although the absolute probability of losing data is quite low for both EMC and Symantec solutions, the relative probability of losing data on EMC’s ES30 shelf is 33 times higher than that in NetBackup 5020 if drives have identical AFR.

So don’t you disagree with what the EMC sales rep reportedly said about NetBackup 5020 appliances? The devil is always in the details, isn’t it?

Disclaimer: As I had already stated in About Me page in MrVRay.com, the thoughts expressed here are my own. My employer or school has not endorsed/supported any of the content in this blog. If there are errors in this post, contact me at @AbdulRasheed127 on Twitter and I will be happy to correct it. I am not entertaining comments until I invest in a good spam blocker, sorry for the inconvenience 🙁

References:

  1. Annualized Failure Rate (AFR) and Mean Time between Failures (MTBF) in: Seagate Barracuda ES SATA Product Manual, Page 29, Chapter 2.12: Reliability
  2. Robin Harris. Google’s Disk Failure Experience
  3. Conditional Probability: P(AB) = P(A)*P(B|A)

If A and B are independent outcomes, P(B|A) = P(B)

In which case, P(AB) = P(A) * P(B)