VMware announces vSphere Data Protection (VDP), what is in it for you?

vSphere Data Protection (VDP) is VMware's new virtual backup appliance for SMBs, available in VMware vSphere 5.1. It replaces the older VMware Data Recovery (vDR) product. There has been some confusion around this announcement, partly due to the way EMC, VMware's parent company, made some press releases.

Is VDP the same as EMC’s Avamar Virtual Edition (Avamar VE)?

No, it is not. VDP is a product from VMware. The only technology VMware uses from Avamar is its deduplication engine. The older vDR had limited dedupe capabilities, as its data reduction mainly came from changed block tracking (CBT) in the vStorage APIs for Data Protection (VADP). With Avamar's technology, VDP now provides variable-block deduplication.

I heard that I can upgrade from VDP to EMC Avamar if I need to grow beyond 2TB, is that true?

No, VDP is not a ‘lite’ version of Avamar. It is a different product altogether.

What are my options if I need to grow beyond 2TB?

You could add additional VDP appliances; up to 10 VDP appliances are supported under one vCenter server. However, these are separate islands of storage: the appliances do not provide global deduplication across their storage pools.

Having said that, you are more likely to hit other limitations in VDP before hitting the 2TB limit. Note that the Avamar-based deduplication engine is suitable only for SMBs that can afford blackout windows and maintenance windows in their backup solution. These are the periods of time when the housekeeping work is done by the dedupe engine; the system is not available for running backup jobs during these windows.

Only 8 virtual machines can be backed up concurrently, which might increase backup windows. There is no SAN transport capability to offload production ESXi hosts from backup tasks. There is no good way to make additional copies for redundancy or extended retention, such as replication to a remote location or the cloud. VMware has made it clear that VDP is truly for SMBs and encourages customers to look at enterprise-class backup solutions from partners for larger environments.

Why would EMC let VMware use its Avamar technology at no additional cost to customers? Is EMC trying to promote its products?

Just as Windows/UNIX/Linux operating environments provide basic backup utilities, VMware has always provided a basic backup solution with its offerings. In the days of the ESX service console, the Linux-based console provided tools like tar and cpio. With ESXi, where the service console is no more, vDR was brought to the table. vDR had its limitations. The choice was to keep innovating on vDR or to license a relatively mature technology. As the parent company has a solution, VMware went the route of taking the Avamar dedupe engine for storage and building its own capabilities for scheduling backups and managing recovery points.

EMC's Avamar is a popular product in small environments. Although EMC has been trying hard to make Avamar enterprise ready, its deduplication engine has significant limitations. It requires blackout and maintenance windows, and with larger capacities, the duration of these windows also increases. With the acquisition of Data Domain, EMC is now focusing more on using its DD Boost technology to distribute the deduplication workload. In fact, EMC recommends the use of Data Domain Boost with Avamar (instead of Avamar's dedupe engine) for larger workloads. I believe it was a good decision to support VMware's SMB market with a technology that was meant for SMBs in the first place. I think the Avamar dedupe engine is counting its days as a technology that can make money. See my earlier blog on EMC's backup portfolio.

Stay tuned. More on VDP coming soon!

Client Direct: NetBackup vs. NetWorker

NetBackup introduced the Client Direct capability a few years back with the NetBackup 7.0 release. This was a breakthrough innovation in backup infrastructure architecture. Traditionally, backup is a process where data is read from the production client, transmitted over the wire in its entirety to a backup server, and then written to storage. The emergence of target dedupe appliances behind a backup server meant that backup data now takes an extra hop through the network: first from the client to the backup server, then from the backup server to the deduplication appliance. NetBackup changed this game. The NetBackup client can dedupe the backup stream at the source and send deduplicated data directly to NetBackup's deduplication pool, for example a NetBackup 5020 deduplication appliance, as illustrated below.

NetBackup Client Direct

This architecture is possible in NetBackup because it has several innovations that reduce the impact of running deduplication at the production client.

  1. NetBackup Accelerator: This technology features a platform-independent track log that intelligently detects changed files without enumerating the entire file system, and then optimally synthesizes a full backup image at the storage. The result: full backups can be run using the resources needed for an incremental backup.
  2. NetBackup Client-Side Deduplication Cache: This enables the production client to run deduplication by comparing the fingerprints generated for the chunks in the changed files (detected intelligently, as explained in 1) against the previous backup set, without shipping the fingerprints to storage for comparison. The result: superior federated deduplication without the slow chatter across the network (a toy sketch follows this list).
  3. Intelligent Hybrid Chunking that is not CPU bound: Deduplication chunking is typically done using either a variable-block method or a fixed-block method. The first is CPU intensive and the second is less efficient in data reduction. NetBackup takes the best of both worlds with intelligent hybrid chunking: because the deduplication fingerprinting logic is built into the client, it can start chunking exactly at the identified object boundaries. Thus you get the advantage of not being CPU bound while also not suffering from a low deduplication rate.
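To make the client-side fingerprint cache in point 2 concrete, here is a toy Python sketch of the general idea. This is my own illustration, not Symantec's actual code: it uses naive fixed-size chunks and an in-memory cache, whereas the real engine uses the object-aware hybrid chunking described in point 3.

```python
import hashlib

CHUNK_SIZE = 128 * 1024  # toy fixed-size chunks; real engines pick object-aware boundaries


def backup_file(path, fingerprint_cache, backup_storage):
    """Ship only the chunks whose fingerprints are not already cached locally."""
    sent = skipped = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            fp = hashlib.sha256(chunk).hexdigest()
            if fp in fingerprint_cache:
                skipped += 1                    # known from a previous backup: no network round trip
            else:
                backup_storage[fp] = chunk      # send the new, unique chunk to backup storage
                fingerprint_cache.add(fp)
                sent += 1
    return sent, skipped


# Example: a second backup of the same unchanged file sends nothing.
cache, storage = set(), {}
print(backup_file(__file__, cache, storage))    # e.g. (1, 0) on the first run
print(backup_file(__file__, cache, storage))    # (0, 1) on the second run
```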

Reducing the impact on the production client's resources, reducing the impact on the production network, reducing the number of hops and reducing the impact on the backup server (translation: increased scalability) make NetBackup Client Direct a unique feature. The popularity of this feature has made 'Client Direct' a common innovation name that appears in RFPs for backup solutions.

The pressure is causing other backup vendors to come up with their own 'Client Direct'. EMC announced last week that NetWorker 8.0 will have this capability, and even named it 'Client Direct' so that the checkboxes in RFPs can be ticked. A closer look reveals that NetWorker Client Direct is fine for ticking a checkbox in an RFP, but it is really not ready for primetime as is.

  1. The NetWorker client has no intelligent detection of changed files. NetWorker also does not have any sort of optimized synthetics. The result: running full backups with NetWorker Client Direct will consume a significant amount of processing power on production clients.
  2. The NetWorker client does not federate deduplication; that is done by DD Boost. As the two are essentially unaware of each other's format, there is no way to cache fingerprints of the chunks from previous backups. That means excessive chitchat with the target Data Domain device during backups.
  3. DD Boost works by offloading some of the Data Domain deduplication processing to other systems; in this case, the production clients take that load. As clearly documented for the Data Domain SISL architecture, "SISL takes the pressure off of disk accesses as a bottleneck so that the system relies on the speed of the CPU to deliver inline deduplication performance". Translation: CPU-bound chunking. When this is offloaded to production clients, it can severely affect the performance of production systems with large backup workloads.

Even though EMC can mark the checkboxes in RFPs, their specialists are less likely to encourage POCs with NetWorker Client Direct. In a head-to-head battle, it appears that NetWorker has a long road ahead to match NetBackup Client Direct.

What do NetApp ONTAP and Symantec NetBackup have in common?

A friend of mine forwarded me the link to an interview SearchStorage.com recently did with Dave Hitz, one of the founders of NetApp. It is an interesting read, and the major topic is the new clustering capabilities in ONTAP 8. When he was asked about EMC's Isilon, I found his response to be a home run.

“If you look at features EMC can support, you end up with a complete list. If you break apart their architectures and look at the same feature list by architecture, you end up finding the main feature Isilon has is clustering, which is great. Unfortunately, it’s not in combination with the full suite of rich data management capabilities. That’s the No. 1 difference Ontap has — it’s the same Ontap that has all this cool stuff in it,” said Dave Hitz.

The context here is the fact that the foundational technology powering all storage systems from NetApp is ONTAP (with the E-Series being an outlier), and customers get the choice of footprint and features to match their workloads. EMC's storage division, on the other hand, provides different products for overlapping sets of workloads: VNX, VMAX, Isilon, etc.

If you think about it, this response is applicable even when you look at other business units of EMC. My favorite is EMC's Backup and Recovery Services (BRS) division. It has four different products (Avamar, Data Domain, NetWorker and HomeBase) pretty much serving the same market. If I were to fit Dave's quote into the context of backup and recovery and use Symantec's NetBackup as the competitor for EMC Backup, it would go something like this.

If you look at the features EMC can support as a vendor for backup and recovery, you end up with a near-complete list. If you break apart their architectures and look at the same feature list by architecture, you end up finding that the main value Data Domain has is storage reduction at the target, with federation capabilities for limited application workloads. Avamar has full management capabilities, but only for smaller workloads. NetWorker has decent long-term retention capabilities and a track record, but has been on life support. HomeBase provides bare metal recovery. Unfortunately, none of these products comes with a full suite of rich data management capabilities for end-to-end protection that can bring down the capital and operational expenses of managing recovery points. That’s the No. 1 difference NetBackup has — it’s the same NetBackup that has all that cool stuff in one platform and a lot more innovations like managing snapshots, replicas, virtualized applications, backup acceleration, etc.

As always, the standard disclaimer applies here. This is just my opinion. Although I work for Symantec, the above statement should not be considered as the view of my employer.

 

EMC or HP: Who is stretching the truth on deduplication system performance?

EMC proudly announced the availability of the Data Domain 990 at EMC World 2012 on May 21st. The claim in the news release was that the system could back up as much as 248 TB within an 8-hour backup window at 31 TB/hr throughput. Further, it claimed to be 6x faster than its closer competitor.

The pride was shattered within two weeks; even a Kardashian marriage lasted longer than the claim. HP announced that it could protect up to 100 TB/hr using its StoreOnce family of products. EMC looked at it with tears and finally responded as given here:

EMC said HP’s decision was “puzzling”, and argued the comparison was not fair because HP’s claim was for four hardware systems working on four storage pools compared to EMC’s figures for one system and one pool. Deduplication, which removes copies of data from storage to improve usage, only works within pools of data.

Now it is time for a reality check.

Number of systems involved in deduplication processing: EMC's claim is that the Data Domain 990 is a single head unit while the HP StoreOnce B6200 is a multi-node system. At first look, it sounds like a legitimate argument, but the reality is that EMC has no reason to shed crocodile tears about this. Here is why.

The 31 TB/hr rate for the Data Domain 990 comes from Data Domain Boost, the software component that offloads most of the processor-intensive deduplication processing to backup servers and/or application servers. The unit by itself is not doing all the work. The story is no different for the HP B6200 either; it makes use of StoreOnce Catalyst software, which does for the B6200 something similar to what Data Domain Boost does for the Data Domain 990.

The absolute number of processing heads should not matter in this case, as the actual performance numbers are skewed on account of distributed processing. I would even give credit to HP, as their solution is highly available, with two nodes serving one storage pool. Backups are the last line of defense in an enterprise, and high availability brings additional customer value.

Number of name spaces: A single name space provides deduplication across all the workloads ingested into the storage pool. The Data Domain 990 is a single name space device with one processing head. You buy the HP B6200 in the form of two nodes plus storage, known as a couplet. It is not crystal clear from HP's documentation whether multiple couplets can share the same name space or use dedicated name spaces. I am giving EMC the benefit of the doubt that it did the research before making this statement. Some of the defensive comments HP made after EMC's reaction tend to indicate that HP stretched the truth a little here.

HP marketing veep Craig Nunes says an 8-node B6200 is a single system because it is managed as one and has a single namespace. The single namespace is segmented into four individual namespaces, one per couplet, and, he says, “next year I could do a firmware update and change that”.

So, I am inclined to support EMC on this point unless someone can confirm from HP's documentation that a four-couplet unit uses a single name space.

Truth in comparisons: 

EMC’s claim: 6x faster than closer competitor. HP’s claim: 3 times faster (backups) than closest competitor

The statements won't actually tell you how the 'closer/closest' competitor is decided. EMC defines its closer competition based on IDC's market share report on Purpose-Built Backup Appliances (PBBA), and it is referring to IBM. EMC selected IBM for the comparison because IBM has the poorest number. The other vendors in the list (HP at 25 TB/hr without Catalyst and Symantec at 23.7 TB/hr for its NetBackup 5220) have solutions superior to IBM's! EMC could not even claim 2x (let alone 6x) if the closest competitor were chosen based on performance itself.

HP defined the closest competitor in terms of actual performance: they compared against EMC's 31 TB/hr to make the '3 times faster' claim with 100 TB/hr.

Verdict: Always ask questions on metrics! It is easy to make a claim while staying vague on details.

Not seeing your comments on this post? Please read this note.

Will EMC BRS kill Avamar or NetWorker?

EMC World 2012 has come and gone. Those watching the Backup and Recovery Services (BRS) division would notice a drastic shift in strategy since last year. Is Avamar counting its days?

Surprised? Let me explain. Remember the “Tape sucks! Move on!” campaign sung by BRS last year? They even mocked Google for recovering from tapes. They wanted the world to look at Avamar and Data Domain, the two products with spinning disks, as the homes of backups. The other child, NetWorker, was mostly ignored and kept on life support just to see out the era of tapes.

BRS seems to have come to grips with reality to some extent. The incremental updates to Avamar and the revelation of NetWorker 8 features tend to indicate that BRS is taking a 180-degree turn.

No real updates for Avamar Data Store: All the newly announced business-critical application support in Avamar is available for both Data Domain Boost and the Avamar native client. Hyper-V, which is popular among SMB workloads, is now available through Boost to a Data Domain target. Last year, BRS' announcement was that DD is for specific workloads and Avamar Data Store is for everything else. Now Boost is getting more attention, and the Avamar engine itself stays pretty much the same. The blackout windows in Avamar Data Store already annoy customers. Is the Data Domain deduplication engine now preferred for target dedupe, with DD Boost eventually replacing source-side deduplication? Inspired by Symantec's Dedupe Everywhere strategy?

Note: Thanks to Ian's comment for clarifying that the newer application support is available for Avamar as well, not just for Data Domain through DD Boost.

Emergence of Media Access Node: BRS realized that customers with longer retention requirements would not buy into the 'keep it on disk' message. Tape provides economies of scale, and modern tape technologies are superior in performance and reliability. Now BRS ships a NetWorker node under the covers as the Media Access Node in Avamar, to copy rehydrated data to tape in the NetWorker tape format.

NetWorker 8.0 getting a facelift: Although NetWorker was ignored in the keynotes, BRS made a deliberate attempt this year to show what is happening with NetWorker. It was expected at the morgue, but it has now been pulled back and is getting revved up. There is a long road ahead to convince customers, but BRS says it is putting an equal number of resources on NetWorker as it did on Avamar. Not to mention the newfound love, Spectra Logic, to compete with IBM and Oracle.

If you pay closer attention, all that Avamar got are improvements that make things better for Data Domain (Boost expansion, multi-stream support…) and NetWorker (data stored in NetWorker tape format). In a nutshell, BRS wants everyone to keep backup data in either the Data Domain dedupe format or the NetWorker tape format. Once the NetWorker and Data Domain Boost combination can support backups over the WAN, Avamar may not have anything left to offer. From an operating margin perspective, Avamar as a product may become a dog in the BCG growth-share matrix? The one eventually going to the morgue looks to be the Avamar dedupe engine?

Not seeing your comments about this post? Please read this note. 

Deduplication Storage Pool Reliability: The devil is in the details

As you guys already know, I travel a lot and attend trade shows where I represent Symantec. While I was briefing a visitor at the Symantec booth on the NetBackup 5020 appliance, he asked quite an interesting question: "We have requested proposals from multiple vendors for deploying a deduplication solution for backups. The EMC sales team told us that the Data Domain 800 series is better than NetBackup 5020 appliances in terms of reliability. They said that if one node in a multi-node NetBackup 5020 pool goes down, the entire deduplication pool goes down. What do you think about it?"

I thanked him for his question and took a good 20 minutes to explain the situation. I thought it would be nice to document this in a blog for a fair comparison.

Let us compare configurations based on the Data Domain 860 and the NetBackup 5020. Let us say the customer is looking to create a 96TB deduplication pool right now and may need more storage in the future.

With the Data Domain 860, you would need four ES30 shelves (with 2TB drives) to create this capacity, plus the 860 head unit. With NetBackup 5020, you would need three nodes.

Implementing a 96TB deduplication pool

Thus, the EMC solution has a total of five components (1 head and 4 shelves). EMC's 96TB deduplication pool will go down if any of the five components fails.

The Symantec solution has a total of three components (3 NetBackup 5020 nodes). Symantec's 96TB deduplication pool will go down if any of the three components fails.

Observation 1: EMC solution has more single points of failure than Symantec’s solution for a given capacity.

Let us dig deeper. Let us look at the components that actually store data, the storage modules.

Each Data Domain ES30 shelf has 15 spindles: 12 data drives, 2 parity drives and 1 hot spare. Each shelf can withstand 3 concurrent drive failures.

Each NetBackup 5020 node has 22 spindles (not counting the two drives in RAID1 for the system disk): 18 data drives, 2 parity drives and 2 hot spares. This configuration can withstand 4 concurrent drive failures.

Both systems use SATA drives. The theoretical[1] annualized failure rate (AFR) for a SATA drive is approximately 1.46%. Robin Harris' StorageMojo[2] blog has some great information on a study done by Google; he quotes a calculated AFR of 2.88%.

Since we are actually comparing the overall storage modules (the ES30 storage shelf vs. the NetBackup 5020 node's storage), let us not worry about the absolute value of the AFR of a disk drive. For our discussion, let us assume that both Symantec and Data Domain buy disks from the same manufacturer, and let the AFR be 3% to simplify the probability calculations.

An AFR of 3% indicates that the probability of a SATA drive failing within a year is 3/100.

In the case of a Data Domain 860 with ES30 shelves, you will lose data if more than 3 drives in a shelf fail within a year and the failed drives are not replaced. The probability of four drives failing in a year can be calculated using conditional probability[3]. The value is (3/100)^4 = 0.000081%.

In the case of a NetBackup 5020 node, you will lose data if more than 4 drives fail within a year and are not replaced. The probability here is (3/100)^5 = 0.00000243%.

Note that the probability of data loss is low in both cases even if you do not replace the failed drives for a year. This is why RAID6 and hot spares play a significant role in delivering storage reliability; that is the main point I want to make here. However, the probability of losing data on an ES30 shelf is 33 times higher than the probability of losing data on a NetBackup 5020 node! The reason is the extra hot spare in the NetBackup 5020 node, which provides additional protection.
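By the way, the arithmetic is easy to check yourself. Here is a quick Python sketch that mirrors the simplified model above (independent drive failures, 3% AFR, failed drives never replaced during the year):

```python
# Simplified model from the post: independent drive failures, 3% AFR,
# failed drives never replaced during the year.
afr = 0.03

p_es30 = afr ** 4   # ES30 shelf: a 4th failure exceeds RAID6 (2 parity) + 1 hot spare
p_5020 = afr ** 5   # NetBackup 5020: a 5th failure exceeds RAID6 (2 parity) + 2 hot spares

print(f"ES30 shelf     : {p_es30:.8%}")            # 0.00008100%
print(f"NetBackup 5020 : {p_5020:.8%}")            # 0.00000243%
print(f"Relative risk  : {p_es30 / p_5020:.0f}x")  # ~33x
```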

Observation 2: From a storage module perspective, although the absolute probability of losing data is quite low for both the EMC and Symantec solutions, the relative probability of losing data on EMC's ES30 shelf is 33 times higher than that on a NetBackup 5020 node if the drives have identical AFRs.

Now, wouldn't you disagree with what the EMC sales rep reportedly said about the NetBackup 5020 appliances? The devil is always in the details, isn't it?

Disclaimer: As I had already stated in About Me page in MrVRay.com, the thoughts expressed here are my own. My employer or school has not endorsed/supported any of the content in this blog. If there are errors in this post, contact me at @AbdulRasheed127 on Twitter and I will be happy to correct it. I am not entertaining comments until I invest in a good spam blocker, sorry for the inconvenience 🙁

References:

  1. Annualized Failure Rate (AFR) and Mean Time between Failures (MTBF) in: Seagate Barracuda ES SATA Product Manual, Page 29, Chapter 2.12: Reliability
  2. Robin Harris. Google’s Disk Failure Experience
  3. Conditional Probability: P(AB) = P(A)*P(B|A)

If A and B are independent outcomes, P(B|A) = P(B)

In which case, P(AB) = P(A) * P(B)

vSphere changed block tracking: A powerful weapon for backup applications to shrink backup window

Changed block tracking is not a new technology. Those who have used Storage Foundation for Oracle would know that the VERITAS File System (VxFS) provides no-data checkpoints, which backup applications can use to identify and back up just the changed blocks from the file systems where database files are housed. This integration has been in NetBackup since version 4.5, which was released 10 years ago! It is still used by Fortune 500 companies to protect mission-critical Oracle databases that would otherwise require a large backup window with traditional RMAN streaming backups.

VMware introduced changed block tracking (CBT) in vSphere 4.0, and it is available for virtual machines at hardware version 7 or higher. NetBackup 7.0 added support for CBT right away, and backing up VMware vSphere environments got faster. When a VM has CBT turned on, it can track changes to virtual machine disk (VMDK) sectors, and its impact on VM performance is marginal. Backup applications with VADP (vStorage APIs for Data Protection) support can use an API (named QueryChangedDiskAreas) to identify and copy changed blocks from a particular point in time. That point in time is identified using an argument named ChangeId in the API call.
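For the curious, here is a minimal pyVmomi sketch of how a backup application might walk the changed areas of a disk. It is only an illustration of the API call, not any vendor's implementation, and it assumes an already authenticated session, a vm object with CBT enabled, an existing backup snapshot reference, the disk's deviceKey and capacity, and the ChangeId saved from the previous backup ("*" on the very first backup returns all allocated areas).

```python
def changed_extents(vm, snapshot, device_key, capacity_bytes, prev_change_id="*"):
    """Enumerate (offset, length) extents of a VMDK changed since prev_change_id."""
    extents, offset = [], 0
    while offset < capacity_bytes:
        info = vm.QueryChangedDiskAreas(
            snapshot=snapshot,          # snapshot taken for this backup
            deviceKey=device_key,       # key of the virtual disk device
            startOffset=offset,
            changeId=prev_change_id,    # "*" = all allocated areas (first full backup)
        )
        extents.extend((area.start, area.length) for area in info.changedArea)
        if info.length == 0:            # defensive: avoid looping forever
            break
        offset = info.startOffset + info.length
    return extents
```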

VMware has made this quite easy for backup vendors to implement, but powerful weapons can be dangerous when not used with utmost care. An unfortunate problem in Avamar's implementation of CBT came to light recently. I am not picking on Avamar developers here; it is not possible to predict all the edge cases during development, and they are working hard to fix this data loss situation. As an engineer myself, I truly empathize with the Avamar developers for getting into this unfortunate situation. This blog is a humble attempt to explain what happened, as I got a few questions from the field seeking input on the use of CBT after the issue EMC reported in Avamar.

As we know, VADP lets you query the changed disk areas to get all the changes in a VMDK since a point in time corresponding to a previous snapshot. Once the changed blocks are identified, those blocks are transferred to the backup storage. The way the changed blocks are used by the backup application to create the recovery point (i.e. the backup image) varies from vendor to vendor.

No matter how the recovery point is synthesized, the backup application must make sure that the changed blocks are accurately associated with the correct VMDK, because a VM can have many disks. As you can imagine, if the blocks were associated with the wrong disk in the backup image, the image would not be an accurate representation of the source. A recovery from this backup image will fail or will result in corrupt data on the source.

The correct way to identify a VMDK is by its UUID, which is always unique. Using positional identifiers like controller-target-LUN at the VM level is not reliable, as those numbers can change when some VMDKs are removed or new ones are added to a VM. This is an example of the disk re-order problem, and the re-ordering can also happen through non-user-initiated operations. In Avamar's case, the problem was that the changed blocks belonging to one VMDK were getting associated with a different VMDK in backup storage on account of VMDK re-ordering. Thus the resulting backup image (recovery point) did not represent the actual state of the VMDK being protected.

To make the unfortunate matter worse, there was a cascading effect. It appears that Avamar's implementation generates a recovery point by using the previous backup as the base. If disk re-ordering happened after the nth backup, all backups after the nth are affected, because each new backup inherits its base from the corrupted image.

This sounds scary, and that is how I started getting questions from the field on the reliability of CBT for backups. Symantec supports CBT in both Backup Exec and NetBackup. Are Symantec customers safe?

Yes, Symantec customers using NetBackup and Backup Exec are safe.

How do Symantec NetBackup and Backup Exec handle re-ordering? Block-level tracking and the associated risks were well thought out during implementation. Implementing block-level tracking is not new to Symantec engineering, because such situations were accounted for in the design of VxFS's no-data checkpoint block-level tracking several years ago.

There are multiple layers of resiliency built into Symantec's implementation of CBT support. I shall share oversimplified explanations of two of them that are relevant to ensuring data integrity.

Using UUIDs to accurately associate a ChangeId with the correct VMDK: We already touched on this. A UUID is always unique, and using it to associate the previous point in time with a VMDK is safe. Even when VMDKs get re-ordered in a VM, the UUID stays the same. Thus both NetBackup and Backup Exec always associate the changed blocks with the correct VM disk.
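As an illustration of the idea (my own pyVmomi sketch, not the actual NetBackup or Backup Exec code), a backup application can key its per-disk CBT state by the VMDK UUID instead of by SCSI position:

```python
from pyVmomi import vim


def disk_state_by_uuid(vm):
    """Map each virtual disk's stable UUID to its (deviceKey, current ChangeId)."""
    state = {}
    for dev in vm.config.hardware.device:
        if isinstance(dev, vim.vm.device.VirtualDisk):
            backing = dev.backing
            # Positional identifiers (e.g. scsi0:1) can change when disks are
            # added or removed; the UUID travels with the VMDK itself.
            if hasattr(backing, "uuid"):
                state[backing.uuid] = (dev.key, getattr(backing, "changeId", None))
    return state
```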

Superior architecture that eliminates the 'cascading effect': Generating a corrupted recovery point is bad; what is worse is using it as the base for newer recovery points, because the corruption goes on to hurt the business if left unnoticed for a long time. NetBackup and Backup Exec never directly inject changed blocks into an existing backup to create a new recovery point. The changed blocks are referenced separately in the backup storage, and during a restore, NetBackup recreates the point in time at run-time. This is the reason NetBackup and Backup Exec are able to support block-level incremental backups even to tape media! Thus a corrupted backup (should that ever happen) never propagates corruption to future backups.
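Here is an oversimplified illustration of that design choice, again my own sketch rather than NetBackup's actual image format: the base image is never rewritten, each incremental stays in its own extent list, and the point in time is assembled only at restore time, so a bad incremental cannot leak into the base or into later backups.

```python
def restore_point_in_time(base_image: bytes, incrementals):
    """Assemble a recovery point at restore time from an untouched base image
    plus the extent lists of each subsequent incremental backup."""
    image = bytearray(base_image)            # the stored base is never modified
    for extents in incrementals:             # one list of (offset, data) per incremental
        for offset, data in extents:
            image[offset:offset + len(data)] = data
    return bytes(image)


# Example: restore to the state after the second incremental.
full = b"\x00" * 16
incr1 = [(0, b"AB")]
incr2 = [(4, b"CD")]
print(restore_point_in_time(full, [incr1, incr2]))
```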

Introduction to VMware vStorage APIs for Data Protection aka VADP

6. Getting to know NetBackup for VMware vSphere

Note: This is an extended version of my blog in VMware Communities: Where do I download VADP? 

Now that we have talked about NetBackup master servers and media servers, it is time to learn how the NetBackup client on the VMware backup host (sometimes known as the VMware proxy host) protects the entire vSphere infrastructure. In order to get there, we first need a primer on vStorage APIs for Data Protection (VADP) from VMware. We will use two blogs to cover this topic.

Believe it or not, the question of what VADP really is comes up quite often in VMware Communities, especially in Backup & Recovery Discussions.

Backup is like an insurance policy. You don’t want to pay for it, but not having it is the recipe for sleepless nights. You need to protect data on your virtual machines to guard against hardware failures and user errors. You may also have regulatory and compliance requirements to protect data for longer term.

With modern hardware and cutting-edge hypervisors like VMware's, you can protect data just by running a backup agent within the guest operating system. In fact, for certain workloads, this is still the recommended way.

VMware has made data protection easy for administrators. That is what vStorage APIs for Data Protection (VADP) is about, and it has been available since the vSphere 4.0 release. It is a set of APIs (Application Programming Interfaces) made available by VMware to independent backup software vendors. These APIs make it possible for backup software vendors to embed the intelligence needed to protect virtual machines without installing a backup agent within the guest operating system. Through these APIs, the backup software can create snapshots of virtual machines and copy them to backup storage.
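To make that flow concrete, here is a minimal pyVmomi sketch of the snapshot, copy and cleanup cycle a VADP-based backup performs. It is an illustration only: the copy_disks_from_snapshot step is a placeholder for the actual data movement, which real products do through VDDK transports.

```python
from pyVim.task import WaitForTask


def copy_disks_from_snapshot(snapshot):
    """Placeholder: real backup software reads the snapshot's VMDKs via VDDK."""
    pass


def backup_vm(vm):
    task = vm.CreateSnapshot_Task(
        name="vadp-backup",
        description="temporary snapshot for backup",
        memory=False,
        quiesce=True,                     # ask VMware Tools to quiesce guest I/O
    )
    WaitForTask(task)
    snapshot = task.info.result           # reference to the snapshot just created
    try:
        copy_disks_from_snapshot(snapshot)
    finally:
        WaitForTask(snapshot.RemoveSnapshot_Task(removeChildren=False))
```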

Okay, now let us come to the point. As a VMware administrator, what do I need to do to make use of VADP? Where do I download VADP? The answer comes down to two things.

  1. Ensure that you are NOT using hosts with free ESXi licenses
  2. Choose a backup product that has support for VADP

The first one is easy to determine. If you are not paying anything to VMware, chances are that you are using free ESXi. In that case, the only way to protect data in VMs is to run a backup agent within the VM. No VADP benefits.

Choosing a backup product that supports VADP can be tricky. If your organization is migrating to a virtualized environment, see what backup product is currently in use for protecting the physical infrastructure. Most of the leading backup vendors have added support for VADP; Symantec NetBackup, Symantec Backup Exec, IBM TSM, EMC NetWorker and CommVault Simpana are examples.

If you are not currently invested in a backup product (say, you are working for a start-up), there are a number of things to consider. VMware has a free product called VMware Data Recovery (VDR) that supports VADP. It is an easy-to-use virtual appliance with which you can schedule backups and store them in deduplicated storage. There are also point products (Quest vRanger, Veeam Backup & Replication, etc.) which provide additional features. All these products are good for managing and storing backups of virtual machines on disk for shorter retention periods. However, if your business requirements call for long-term retention, you would need another backup product to protect the backup repositories of these VM-only solutions, which can be a challenge. Moreover, it is unlikely that a business is 100% virtualized; you are likely to have NAS devices for file serving, desktops and laptops for end users, and so on. Hence a backup product that supports both physical systems and VADP is ideal in most situations.

Although VADP support is available from many backup vendors, the devil is in the details. Not all solutions use VADP the same way. Furthermore, many vendors add innovative features on top of VADP to make things better.  We will cover this next.

Back to NetBackup 101 for VMware Professionals main page

Next: Coming Soon!

Turning cheap disk storage into an intelligent deduplication pool in NetBackup

5. NetBackup Intelligent Deduplication Pool

Deduplication for backups does not need an introduction. In fact, deduplication is what made disk storage a viable alternative to tapes. Deduplication storage is available from several vendors in the form of pre-packaged storage and software. Most backup vendors also provide some level of data reduction using deduplication or deduplication-like features.

Often we hear that backups of virtual environments are ideal for deduplication. While I agree with this statement, several articles tend to give the wrong perception when it comes to why it is a good idea.

The general wisdom goes like this: as there are many instances of guest operating systems, there are many duplicate files and hence deduplication is recommended. A vendor may use this reasoning to sell you a deduplication appliance or to differentiate their backup product from others. This is a short-sighted view. First of all, multiple instances of the same version of an operating system are possible even when your environment is not virtualized, so that argument is weak. Secondly, operating system files contribute less than 10% of the data in most virtual machines hosting production applications. Hence, if a vendor tells you that you need to group virtual machines from the same template into a single backup job to make use of 'deduplication', what they provide is not true deduplication. Typically such techniques simply use the block-level tracking provided by vStorage APIs for Data Protection (VADP) combined with aggressive compression, and the data reduction does not go beyond a given backup job.

Behold NetBackup Intelligent Deduplication. We talked about NetBackup media servers before. Attach cheap disk storage of your choice and turn on NetBackup Intelligent Deduplication by running a wizard. Your storage transforms into a powerful deduplication pool that deduplicates inline across multiple backup jobs. You can deduplicate at the target (i.e. the media server), or you can deduplicate at the source if you have configured a dedicated VMware backup host.

Why is this referred to as an intelligent deduplication pool? When backup streams arrive, the deduplication engine sees the actual objects (files, database objects, application objects, etc.) through a set of patent-pending technologies referred to as Symantec V-Ray. Thus it deduplicates blocks after accurately identifying exact object boundaries. Compare this to third-party target deduplication devices, where the backup stream is blindly chopped to guess the boundaries and identify duplicate segments.

The other aspect of the NetBackup Intelligent Deduplication pool is its scale-out capability: the ability to grow storage and processing capacity independently as your environment grows. The storage capacity can be grown from 1TB to 32TB, letting you protect hundreds of terabytes of backup images. In addition, you can add media servers to do dedupe processing on behalf of the media server hosting the deduplication storage. The scale-out capability can also be extended by simply adding VMware backup hosts. Global deduplication occurs across multiple backup jobs, multiple VMware backup hosts and multiple media servers. It is scale-out in multiple dimensions! A typical NetBackup environment can protect multiple vSphere environments and deduplicate across the virtual machines in all of them.

Back to NetBackup 101 for VMware Professionals main page

No point in locking the door when walls have fallen

Security for information assets is crucial for business continuity. Corporate information in the wrong hands can compromise the survival of an organization, which is why security must be considered wherever that information lives. The last stop for the protection and recovery of this information is backup. It protects against the loss of information from hardware failures, logical corruption and human errors. Unfortunately, the security of backup information is often overlooked. As a leader in backup and security, Symantec advocates a holistic approach to managing and securing corporate information.

Let us consider a few of the key components for secure enterprise backup solutions.

  • The solution must securely transfer data from source to backup storage.
  • The solution must store the backup image securely.
  • It should offer authentication and authorization built-in to control access.  You don’t want an intruder or a client system masquerading as a production client and retrieving data from backups.
  • It must protect itself completely. If the system hosting the solution suffers a hardware failure, it must be able to recover image metadata, security certificates, access control rules and other important data so that the solution once again controls access to backup images.

While there are more considerations that could be listed, these underscore the broader requirements for an enterprise backup solution.  Backup is much more than moving data as quickly as possible from source to backup storage.  Other functions need to exist to ensure the security of backup images and to prevent the production data embedded in a backup image from falling into wrong hands.

Moving data to the cloud is a great use case to consider in more detail. In an era when organizations are looking to cut operating costs by moving information to the cloud, attention must be paid to the protection of user data end to end, regardless of whether it lives on primary storage on a VM hosted for the user or is sent to backup storage. Choosing a backup solution that fails to deliver security for backup images may result in corporate embarrassment, liability and loss of business.

I came across an interesting post from Mike Beevor, who works for Veeam as a Systems Engineer. You can read the details here. Mike put together a nice article consolidating the scripted responses needed when an IT security team evaluates the risks of using Veeam for VM backups in a secured environment. Unfortunately, he appears to have left one huge hole unattended. If a tornado knocks down the walls, is there any point in putting locks on the rest of the doors? Let me explain what I mean.

Locking the front door when walls have fallen! Illustration by Scott G.

The failure point in question is the Veeam Backup File (VBK). When production VMs with precious data are backed up by Veeam, it stores the virtual machine files in a container file with the extension .vbk. This file is kept on a plain file system on a backup server with direct-attached storage, a SAN-attached array or a NAS device.

Most production VMs have one VMDK file for the operating system and one or more VMDK files for data. A utility was originally developed to give users a way to import a VM from a VBK backup file directly into vSphere. Veeam created this process because the backup solution does not offer a good way to protect itself (see the fourth requirement above: protect the backup and recovery system). A person who gets a copy of a VBK file can import the VM in that file onto his/her own ESXi host, detach the data disk and mount it on his/her own VM to get access to production data! Veeam does not provide any sort of encryption for VBK files. Unfortunately, the only way to recover individual objects from a Veeam backup is to run the entire VM from backup storage.

The lack of security for the container file makes it easy for anyone to retrieve data. Users of Veeam are already concerned about this weakness. Posts in the Veeam forums related to this issue seem to be conveniently moderated out. Here is an example of a post which appeared in a Google search before it was deleted for the Veeam 6 launch. Finally, a modified response appeared saying that Veeam will consider this as a future feature.

 

Customers requesting enhanced security for VBK, moderated thread reappeared recently

The user is asking for enhanced VBK security through the use of password protection.  It is a step in the right direction. Hopefully, Veeam will work on this soon.

As you can imagine, this is a huge security hole, especially as users virtualize more mission-critical applications. Unfortunately, Veeam 6 makes this problem even worse: these VBK files are now scattered across multiple repository hosts, increasing the chances of exposure.

What to do if you are currently using Veeam for backups?

  • Enable file-system-level encryption on all repositories if the overall performance after encryption is acceptable.
  • When using an NFS/CIFS based deduplication device for storage (e.g. Data Domain), enable encryption within the device.
  • Make sure that the NFS/CIFS shares are exposed only to proxy servers and Veeam servers. NAS devices' default export policies generally grant read-only access to 'world'. In the case of VBK files, this read-only access is enough to compromise the production data in backups.
  • Harden passwords on all Veeam backup servers, repository servers and proxy servers. Do not use the same password on all repository servers.
  • Talk to your security team and leverage their investments. For example, the security team may have the Symantec Critical System Protection suite. Install CSP agents on backup servers and repository servers to provide non-signature-based host intrusion prevention. It protects against zero-day attacks using granular OS hardening policies along with application, user and device controls.
  • Consider switching to a backup solution that offers encryption for both data in flight and data at rest. In fact, you may already be using another backup solution to back up Veeam backup files. Most backup applications offer VADP integration with VMware. Did you know that Backup Exec is #1 in backing up Veeam? Veeam's Doug Hazelman admitted this to Curtis Preston.

Veeam has done similar things in the past to conveniently hide the root of a problem with deflection techniques. Remember DCIG's Jerome Wendt, who uncovered the real motive behind SureBackup? Learn more about his discovery here.

Naturally, the next question is how a leader like Symantec provides security for backups.  Let us use Symantec’s Backup Exec as an example.

 

  • When making use of a deduplication folder (Backup Exec's built-in deduplication), it is not possible for an intruder to identify specific backup images even if he gains access to the file system where the folder resides. If he decides to steal the entire folder, he cannot import the images in this folder to an alternate system without the credentials needed to access the deduplication folder.
  • When sending backups to tape, both software and hardware encryption (the T10 encryption standard) are supported. Thus you do not have to worry about information being leaked even if tapes are stolen.
  • Backup Exec uses security certificates between clients and media servers. It is not possible for an intruder to masquerade as a client and request a restore of production data.
  • Self-protection: Backup Exec not only protects the production data, but it has the capability to protect itself against hardware failures or human errors.