Getting to know the Network Block Device Transport in VMware vStorage APIs for Data Protection

When you back up a VMware vSphere virtual machine using vStorage APIs for Data Protection (VADP), one of the common ways to transmit data from the VMware datastore to the backup server is the Network Block Device (NBD) transport. NBD is a Linux-style module that attaches to the VMkernel and makes the snapshot of the virtual machine visible to the backup server as if it were a block device on the network. While NBD is quite popular and easy to implement, it is also the least understood transport mechanism in VADP-based backups.

NBD is based on VMware’s Network File Copy (NFC) protocol. NFC uses a VMkernel port for network traffic. As you already know, VMkernel ports may also be used by other services like host management, vMotion, Fault Tolerance logging, vSphere Replication, NFS, iSCSI and so on. It is recommended to create dedicated VMkernel ports attached to dedicated network adapters if you are using a bandwidth-intensive service. For example, it is highly recommended to dedicate an adapter for Fault Tolerance logging.

Naturally, the first logical solution for driving high throughput from NBD backups would be to dedicate a bigger pipe to the VADP NBD transport. Many vendors list this as a best practice, but that alone won’t give you performance and scale.

Let me explain this with an example. Assume that you have a backup server streaming six virtual machines from an ESXi host over NBD transport sessions. The host and backup server are equipped with 10Gb adapters. In general, a single 10Gb pipe can deliver around 600 MB/sec. So you would expect each virtual machine to be backed up at around 100 MB/sec (600 MB/sec divided across 6 streams, one per virtual machine), right? In reality, however, each stream gets a much smaller share of the bandwidth because the VMkernel automatically caps each session for stability. Let me show you the actual results from a benchmark we conducted, where we measured performance as we increased the number of streams.

[Figure: NBD Transport and number of backup streams]

As you can see, by the time the number of streams reaches 4 (in other words, four virtual machines being backed up simultaneously), each stream delivers just 55 MB/sec and the overall throughput is 220 MB/sec. This is nowhere near the available bandwidth of 600 MB/sec.

The reasoning behind this type of bandwidth throttling is straightforward. You don’t want the VMkernel to be strained serving copy operations when it has better things to do; its primary function is to orchestrate VM processes. VMware engineering (VMware was also a partner in this benchmark; we submitted the full story as a paper for VMworld 2012) confirmed this behavior as normal.

This naturally makes NBD a second-class citizen in the backup transport world, doesn’t it? The good news is that there is a way to solve this problem! Instead of backing up too many virtual machines from the same host, configure your backup policies/jobs to distribute the load over multiple hosts. Unfortunately, in environments with hundreds of hosts and thousands of virtual machines, it may be difficult to do this manually. Veritas NetBackup provides VMware Resource Limits as part of its Intelligent Policies for VMware backup, with which you can limit the number of jobs at various vSphere object levels; this is quite handy in these situations. For example, I ask customers to limit the number of jobs per ESXi host to 4 or fewer using such intelligent policies and resource limit settings. NetBackup can then scale out its throughput by tapping NBD connections from multiple hosts, keeping its available pipe fully utilized while limiting the impact of NBD backups on production ESXi hosts, along the lines of the sketch below.
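To make the idea concrete, here is a minimal scheduling sketch in Python. It is not NetBackup code; the inventory, the schedule() helper and the per-host cap are all hypothetical, and a real Intelligent Policy with Resource Limits does this selection for you. It simply shows how capping concurrent NBD streams per ESXi host while drawing from many hosts keeps the backup server’s pipe busy.

```python
# Hypothetical scheduling sketch (the inventory, schedule() helper and cap are
# made up; a real NetBackup Intelligent Policy with Resource Limits does this
# for you). It caps concurrent NBD streams per ESXi host while drawing work
# from several hosts, so the backup server's pipe stays busy.
MAX_STREAMS_PER_HOST = 4          # e.g. the "4 or fewer" guidance above

def schedule(vms_by_host):
    """Return batches of (host, vm) pairs; each batch backs up at most
    MAX_STREAMS_PER_HOST VMs per ESXi host concurrently."""
    pending = {host: list(vms) for host, vms in vms_by_host.items()}
    batches = []
    while any(pending.values()):
        batch = []
        for host, vms in pending.items():
            take, pending[host] = vms[:MAX_STREAMS_PER_HOST], vms[MAX_STREAMS_PER_HOST:]
            batch.extend((host, vm) for vm in take)
        batches.append(batch)
    return batches

inventory = {                      # hypothetical inventory
    "esxi-01": ["vm01", "vm02", "vm03", "vm04", "vm05", "vm06"],
    "esxi-02": ["vm07", "vm08", "vm09"],
}
for i, batch in enumerate(schedule(inventory), 1):
    print(f"run {i}: {len(batch)} concurrent streams -> {batch}")
```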

Thus Veritas NetBackup elevates NBD to first-class status for protecting large environments even when the backend storage isn’t on a Fibre Channel SAN. For example, NetBackup’s NBD has proven its scale on NetApp FlexPod, VCE VBLOCK, Nutanix and VMware EVO (VSAN). Customers can enjoy the simplicity of NBD and the scale-out performance of NetBackup on these converged platforms.

References:

Taking VMware vSphere Storage APIs for Data Protection to the Limit: Pushing the Backup Performance Envelope; Rasheed, Winter et al. VMworld 2012

Full presentation on Pushing the Backup Performance Envelope

Checkmate Amazon! Google Nearline may be the Gmail of cold storage

April Fools’ Day 2004: Google announced Gmail, a free search-based e-mail service with a storage capacity of 1 gigabyte per user[1]. The capacity was unbelievably high compared to other free Internet e-mail providers of that time; Hotmail and Yahoo! were giving 2-4MB per user. Gone are the days when inbox management was a daily chore. The initial press release from the search giant differentiated its offering from others on three S’s: Search, Storage and Speed.

[Figure: Google Nearline may be the Gmail of cold storage]

I wish Google had waited a couple more weeks to announce Google Cloud Storage Nearline. It would have been fun to see it announced on April Fools’ Day. Nearline is to a business today what Gmail was to a consumer a decade ago.

Search: Google doesn’t talk about search in the context of Nearline. But the nut doesn’t fall far from the tree. Google wants your business to dump all of its cold data in its cloud. It has the resources to adopt a loss-leader strategy to help you keep data at lower cost in its cloud. Later you may be offered data mining and analytics as a service, where Google would really shine and make money. The economies of scale will benefit both Google and you. Does anyone remember the search experience in Hotmail a decade ago?

Storage: Sorry, you aren’t getting the storage for free, but it is cheap: a penny per gigabyte per month for data at rest. Instead of declaring a price war with Amazon’s Glacier, Google decided to match its pricing while differentiating itself radically from Glacier in simplicity and access. Unlike Amazon, Google uses the same access method for cold and standard storage, thereby eliminating operational overhead and programming needs.

Speed: Amazon went old school with Glacier. It is designed to look and feel like tape; it takes a few days for you to retrieve data, analogous to getting tapes shipped to you from an offsite location. This is where Google directly poked Amazon. Google is offering an average 3-second response time for data requests! Do you recall how Gmail’s JavaScript-based interface made Hotmail look like a turtle, reloading entire web pages for each action?

Let’s come back to April Fools’ Day again. It happens to be the day after World Backup Day. For most businesses, cold storage today is backup. One of the strategic partnerships Google made for the Nearline launch is impeccable. According to Veritas/Symantec, NetBackup manages half of the world’s enterprise data. It is not surprising that Google wanted Veritas on the Nearline bandwagon[2]. The best data pump for business data is NetBackup, and that relationship is a strategic win for Google right off the bat.

  1. Google Gets the Message, Launches Gmail
  2. Access, Agility, Availability: NetBackup and Google Cloud Storage Nearline

The Big Hole in EMC’s Big Data backup story

One of the crucial roles of the marketing team in any organization is to communicate the value of its products and services. It is not uncommon (pardon the double negative) for organizations to show the best side of their story while deliberately hiding the weaker aspects in fine print. The left side of the picture below is a snapshot of the breakfast cereal (General Mills’ Total) that came with my breakfast order at a Sheraton while traveling on business.

[Figure: EMC appears to have a Big Hole in its Big Data Backup]

Note that General Mills claims 100% of the daily value of 11 vitamins and minerals, but with an asterisk. The claim is true only if I consume a 53g serving, yet the box has only 33g!

Although I may have felt a bit taken aback as a consumer, I enjoyed giving a bit of a hard time to my General Mills friends and moved on. This is a small transaction.

What if you were responsible for a transaction worth tens of thousands of dollars and were pitched a glass-half-full story like this? It does happen. That General Mills cereal box is what came to my mind when I saw this blog from EMC on protecting Big Data (Teradata) workloads using the EMC ‘Big Data backup solution’.

General Mills at least had the courtesy to put in the fine print that part of the vitamins and minerals are missing from its box. EMC’s blog didn’t really call out what was missing from its ‘box’, aka the Data Domain device, for protecting a Teradata workload using the Teradata Data Stream Architecture. In fact, it is missing the real brain of the solution: NetBackup!

First, a little bit of history and some naked truth. Teradata has been working with NetBackup for over a decade to provide data protection for its workloads. In fact, Teradata sells the NetBackup Agent for Teradata to its customers. This agent pushes the data stream to NetBackup media servers, and this is where the real workload-aware intelligence (the real brain of this Big Data backup) is built. Once a NetBackup media server receives the data stream, it can store it on any supported storage: a NetBackup Deduplication Pool, a NetBackup Advanced Disk Pool, a NetBackup OpenStorage Pool or even a tape storage unit! When it comes to the NetBackup OpenStorage Pool, it does not matter who the OpenStorage partner is; it can be EMC Data Domain, Quantum DXi, and so on. The naked truth is that the backend devices are dumb storage devices from the point of view of the NetBackup Agent for Teradata (the Teradata BAR component depicted in the blog).

EMC’s blog appears to have been designed to mislead the reader. It implies that there is some sort of special sauce built natively into Data Domain (or Data Domain Boost) for the Teradata BAR stream. The blog is trying to attach EMC to Big Data workloads through marketing. May I say that the hole is quite big in EMC’s Big Data backup story!

I speculate that EMC had been telling this story for a while in private engagements with clients. Note that the blog simply displays some EMC slides that are marked ‘confidential’; the author forgot to remove that before publishing. In closed meetings with joint customers of Teradata and NetBackup, a slide like this will create the illusion that Data Domain has something special for Teradata backup. Now the truth has leaked!

What’s up with VADP backups and VDDK on vSphere 5.1?

VMware vSphere 5.1 has been in the market for more than a few months now, and interest in the new capabilities is high. Because of this, the market saw many backup vendors rush to announce support for vSphere 5.1 in their VADP (vStorage APIs for Data Protection) integration. Everything looked clean, shiny and new.

On November 21, Symantec made an interesting announcement[1]. In a nutshell, the statement was that support for vSphere 5.1 would be delayed in its NetBackup and Backup Exec products because issues were discovered while testing the VADP 5.1 API for integration. The API in its current form may introduce risk in performing consistent backups and ensuring reliable restores. All vendors receive the same API; not all vendors perform the same level of testing.

In order to explain the intricacies, we first need to take a quick look at how a backup product integrates with VMware vSphere. With each release of vSphere, VMware publishes a set of APIs known as vStorage APIs for Data Protection, or VADP. One of the key components of VADP is the Virtual Disk Development Kit, aka VDDK. This is the component through which third-party code receives authenticated access to vSphere datastores and virtual machine disk files. VMware makes this component available to its technology partners. Partners (backup product vendors in this case) ship it along with their products that make calls to the vStorage APIs.

With each version of vSphere, an equivalent version of VDDK is released. The VDDK is generally backward compatible with one or more earlier versions of vSphere. For example, VDDK 5.1 supports[2] vSphere 5.1, 5.0 and 4.1. VDDK 5.0 supports[3] vSphere 5.0, 4.1, 4.0 and VI 3.5. Since the updated VDDK is required to understand the modified data structures in a new version of vSphere, lower versions of VDDK are in general not supported for accessing a higher version of vSphere. For example, VMware historically and currently (as of today) does not support the use of VDDK 5.0 to access datastores in vSphere 5.1. VMware documents the supported versions of vSphere for each of its VDDK versions in the release notes, as sketched below.
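As a quick illustration, the support matrix quoted above from the release notes can be captured in a few lines. This is only a sketch with a hypothetical is_supported() helper; it is not part of any VMware or backup vendor tool.

```python
# Quick sketch: encode the support matrix quoted above from the VDDK release
# notes and refuse unsupported pairings. The is_supported() helper is
# hypothetical; it is not part of any VMware or backup vendor tool.
SUPPORTED_VSPHERE = {
    "5.1": {"5.1", "5.0", "4.1"},            # per the VDDK 5.1 release notes
    "5.0": {"5.0", "4.1", "4.0", "3.5"},     # per the VDDK 5.0 release notes (VI 3.5)
}

def is_supported(vddk_version, vsphere_version):
    return vsphere_version in SUPPORTED_VSPHERE.get(vddk_version, set())

assert is_supported("5.1", "5.0")        # backward compatible
assert not is_supported("5.0", "5.1")    # the unsupported combination discussed here
```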

The key thing to remember is the statement in bold face above. VMware does not support such combinations because of the risks and uncertainties. Partners are expected to ship the correct version of VDDK when they announce the availability of support for a given vSphere release.

What Symantec announced, and VMware confirmed[4], is that VDDK 5.1 has issues and hence support for vSphere 5.1 in its products will be delayed. This makes sense, since VDDK 5.1 is the only version currently allowed to access vSphere 5.1. The face-saving reactions from other vendors to this announcement caused some of the dirty games and ugly truths in the area of VADP/VDDK integration to come out.

 

  1. Vendors were claiming support for vSphere 5.1 but still shipping VDDK 5.0 with their products. This is currently not supported by VMware because of the uncertainties. That may change, but at the time these vendors claimed support, they were taking risks that are typically not acceptable in the data protection business.
  2. Vendors were mucking with API calls and silently killing hung processes. That may work for an isolated or random hang, but it will not work when there are repeatable hang situations like those observed in VDDK 5.1. Plus, there are performance and reliability concerns in abruptly ending sessions with vSphere.
  3. Most vendors weren’t testing all the edge cases and never noticed the problems in VDDK 5.1, thus prematurely announcing support for 5.1.

 

If your backup vendor currently supports vSphere 5.1, be sure to ask what their situation is.

Sources and references:

1. Quality wins every time: vSphere 5.1 support update, Symantec official blog.

2. VDDK 5.1 Release Notes, VMware Support resources

3. VDDK 5.0 Release Notes, VMware Support resources

4. Third-party backup software using VDDK 5.1 may encounter backup/restore failures, VMware Support KB

Deduplication Storage Pool Reliability: The devil is in the details

As you already know, I travel a lot and attend trade shows where I represent Symantec. While I was briefing a visitor at the Symantec booth on the NetBackup 5020 appliance, he asked quite an interesting question: “We have sent RFPs to multiple vendors for deploying a deduplication solution for backups. The EMC sales team told us that the Data Domain 800 series is better than NetBackup 5020 appliances in terms of reliability. They said that if one node in a multi-node NetBackup 5020 goes down, the entire deduplication pool goes down. What do you think about it?”

I thanked him for his question and took a good 20 minutes to explain the situation. I thought it would be nice to document this in a blog for a fair comparison.

Let us compare configurations based on the Data Domain 860 and the NetBackup 5020. Let us say that the customer is looking to create a 96TB deduplication pool right now, and may need more storage in the future.

With the Data Domain 860, it would take four ES30 shelves (with 2TB drives) to create this capacity, plus the 860 head unit. With NetBackup 5020, you would need three nodes.

[Figure: Implementing a 96TB deduplication pool]

Thus, the EMC solution has a total of five components (1 head unit and 4 shelves). EMC’s 96TB deduplication pool will go down if any of the five components fails.

The Symantec solution has a total of three components (3 NetBackup 5020 nodes). Symantec’s 96TB deduplication pool will go down if any of the three components fails.

Observation 1: The EMC solution has more single points of failure than Symantec’s solution for a given capacity.

Let us dig deeper. Let us look at the components that actually store data, the storage modules.

Each Data Domain ES30 shelf has 15 spindles: 12 data drives, 2 parity drives and 1 hot spare. Each shelf can withstand 3 concurrent drive failures.

Each NetBackup 5020 node has 22 spindles (not counting the two drives in RAID 1 for the system disk): 18 data drives, 2 parity drives and 2 hot spares. This configuration can withstand 4 concurrent drive failures.

Both systems use SATA drives. The theoretical[1] annualized failure rate (AFR) for a SATA drive is approximately 1.46%. Robin Harris’ StorageMojo[2] blog has some great information on a study done by Google, which puts the calculated AFR at 2.88%.

Since we are actually comparing the overall storage modules (ES30 storage shelf vs. NetBackup 5020 storage shelf), let us not worry about the absolute value of the AFR of a disk drive. For this discussion, let us assume that both Symantec and Data Domain buy disks from the same manufacturer, and let the AFR be 3% to simplify the probability calculations.

An AFR of 3% indicates that the probability of a SATA drive failing within a year is 3/100.

In the case of the Data Domain 860 with ES30 shelves, you will lose data if more than 3 drives in a shelf fail in a year and the failed drives are not replaced. The probability of four drives failing in a year can be calculated using conditional probability[3]: (3/100)^4 = 0.000081%.

In the case of a NetBackup 5020 node, you will lose data if more than 4 drives fail in a year and are not replaced. The probability here is (3/100)^5 = 0.00000243%.

Note that the probability of data loss is low in both cases, even if you don’t replace the failed drives for a year. This is why RAID 6 and hot spares play a significant role in delivering storage reliability; that is the main point I want to make here. However, the probability of losing data on an ES30 shelf is 33 times higher than the probability of losing data on a NetBackup 5020 node! The reason is the extra hot spare in the NetBackup 5020 node, which provides additional protection. The quick calculation below shows the same numbers.
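For those who want to check the arithmetic, here is the same back-of-the-envelope calculation in Python, under the assumptions stated above (3% AFR, independent drive failures, no replacement for a year).

```python
# Back-of-the-envelope version of the calculation above, under the stated
# assumptions: 3% AFR per drive, independent failures, no replacement for a year.
AFR = 0.03

def p_data_loss(tolerated_failures):
    """Probability that one more drive than the configuration tolerates fails."""
    return AFR ** (tolerated_failures + 1)

p_es30 = p_data_loss(3)   # ES30 shelf: RAID 6 (2 parity) + 1 hot spare
p_5020 = p_data_loss(4)   # NetBackup 5020: RAID 6 (2 parity) + 2 hot spares

print(f"ES30 shelf    : {p_es30 * 100:.6f}%")     # 0.000081%
print(f"5020 node     : {p_5020 * 100:.8f}%")     # 0.00000243%
print(f"relative risk : {p_es30 / p_5020:.0f}x")  # ~33x
```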

Observation 2: From the storage module perspective, although the absolute probability of losing data is quite low for both the EMC and Symantec solutions, the relative probability of losing data on EMC’s ES30 shelf is 33 times higher than on the NetBackup 5020 if the drives have identical AFRs.

So, don’t you disagree with what the EMC sales rep reportedly said about NetBackup 5020 appliances? The devil is always in the details, isn’t it?

Disclaimer: As I have already stated on the About Me page at MrVRay.com, the thoughts expressed here are my own. My employer or school has not endorsed/supported any of the content in this blog. If there are errors in this post, contact me at @AbdulRasheed127 on Twitter and I will be happy to correct them. I am not entertaining comments until I invest in a good spam blocker, sorry for the inconvenience 🙁

References:

  1. Annualized Failure Rate (AFR) and Mean Time between Failures (MTBF) in: Seagate Barracuda ES SATA Product Manual, Page 29, Chapter 2.12: Reliability
  2. Robin Harris. Google’s Disk Failure Experience
  3. Conditional probability: P(A∩B) = P(A) · P(B|A)

If A and B are independent outcomes, P(B|A) = P(B), in which case P(A∩B) = P(A) · P(B).

vSphere changed block tracking: A powerful weapon for backup applications to shrink backup window

Changed block tracking is not a new technology. Those who have used Storage Foundation for Oracle would know that the VERITAS File System (VxFS) provides no-data checkpoints, which backup applications can use to identify and back up just the changed blocks from the file systems where database files are housed. This integration has been in NetBackup since version 4.5, released 10 years ago! It is still used by Fortune 500 companies to protect mission-critical Oracle databases that would otherwise require a large backup window with traditional RMAN streaming backups.

VMware introduced Changed Block Tracking (CBT) in vSphere 4.0, and it is available for virtual machines at hardware version 7 or higher. NetBackup 7.0 added support for CBT right away, and backing up VMware vSphere environments got faster. When a VM has CBT turned on, it can track changes to virtual machine disk (VMDK) sectors, and its impact on VM performance is marginal. Backup applications with VADP (vStorage APIs for Data Protection) support can use an API (named QueryChangedDiskAreas) to identify and copy the blocks changed since a particular point in time. That point in time is identified using an argument named ChangeId in the API call; a rough sketch follows.
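Here is a rough sketch of how a backup application might walk the changed areas of one virtual disk using the open-source pyVmomi SDK. It assumes you already hold the vim.VirtualMachine object, the snapshot reference, the disk’s deviceKey and capacity, and the ChangeId saved from the previous backup; error handling and the actual data copy are omitted.

```python
# Rough sketch using the open-source pyVmomi SDK. Assumes you already hold the
# vim.VirtualMachine object, the snapshot reference, the disk's deviceKey and
# capacity in bytes, and the ChangeId saved at the previous backup ("*" asks
# for all in-use areas). Error handling and the actual data copy are omitted.
def changed_extents(vm, snapshot, device_key, change_id, disk_capacity):
    """Yield (offset, length) tuples describing the areas of one virtual disk
    that changed since the point in time identified by change_id."""
    offset = 0
    while offset < disk_capacity:
        info = vm.QueryChangedDiskAreas(snapshot=snapshot,
                                        deviceKey=device_key,
                                        startOffset=offset,
                                        changeId=change_id)
        for extent in (info.changedArea or []):
            yield extent.start, extent.length
        offset = info.startOffset + info.length   # move past the covered region
```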

VMware has made this quite easy for backup vendors to implement. But powerful weapons can be dangerous when not used with the utmost care. An unfortunate problem in Avamar’s implementation of CBT came to light recently. I am not picking on Avamar developers here; it is not possible to predict all the edge cases during development, and they are working hard to fix this data loss situation. As an engineer myself, I truly empathize with the Avamar developers for getting into this unfortunate situation. This blog is a humble attempt to explain what happened, as I got a few questions from the field seeking input on the use of CBT after the issues EMC reported in Avamar.

As we know, VADP lets you query the changed disk areas to get all the changes in a VMDK since a point in time corresponding to a previous snapshot. Once the changed blocks are identified, those blocks are transferred to the backup storage. The way the changed blocks are used by the backup application to create the recovery point (i.e. backup image) varies from vendor to vendor.

No matter how the recovery point is synthesized, the backup application must make sure that the changed blocks are accurately associated with the correct VMDK, because a VM can have many disks. As you can imagine, if the blocks were associated with the wrong disk in the backup image, the image is not an accurate representation of the source. A recovery from this backup image will fail or will result in corrupt data on the source.

The correct way to identify VMDKs is by their UUIDs, which are always unique. Positional identifiers like controller-target-LUN at the VM level are not reliable, as those numbers can change when some VMDKs are removed or new ones are added to a VM. This is an example of the disk re-order problem. Re-ordering can also happen for non-user-initiated operations. In Avamar’s case, the problem was that the changed blocks belonging to one VMDK were getting associated with a different VMDK in backup storage on account of VMDK re-ordering. Thus the resulting backup image (recovery point) did not represent the actual state of the VMDK being protected.

To make the unfortunate matter worse, there was a cascading effect. It appears that Avamar’s implementation generates a recovery point by using the previous backup as the base. If disk re-ordering happened after the nth backup, all backups after the nth are affected, because new backups inherit their base from the corrupted image.

This sounds scary. That is how I started getting questions from the field on the reliability of CBT for backups. Symantec supports CBT in both Backup Exec and NetBackup. Are Symantec customers safe?

Yes, Symantec customers using NetBackup and Backup Exec are safe.

How do Symantec NetBackup and Backup Exec handle re-ordering? Block-level tracking and the associated risks were well thought out during the implementation. Implementing block-level tracking is not something new for Symantec engineering, because such situations were accounted for in the design of VxFS’s no-data checkpoint block-level tracking several years ago.

There are multiple layers of resiliency built into Symantec’s implementation of CBT support. I shall share oversimplified explanations of the two that are relevant to ensuring data integrity here.

Using the UUID to accurately associate the ChangeId with the correct VMDK: We already touched on this. The UUID is always unique, and using it to associate the previous point in time with a VMDK is safe. Even when VMDKs get re-ordered in a VM, the UUID stays the same. Thus both NetBackup and Backup Exec always associate the changed blocks with the correct VM disk, along the lines of the sketch below.
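A minimal illustration of that point: key the per-disk state (such as the ChangeId recorded at the last backup) by the VMDK’s UUID, never by its controller/slot position. The dictionary and field names below are made up for the example; they are not NetBackup internals.

```python
# Minimal illustration (made-up dictionary and field names, not NetBackup
# internals): per-disk backup state is keyed by the VMDK's UUID, never by its
# controller/slot position, so disk re-ordering cannot mix up the disks.
previous_backup_state = {
    # VMDK UUID                             -> ChangeId recorded at last backup
    "6000C29b-1111-2222-3333-444455556666": "change-id-446",
    "6000C29b-aaaa-bbbb-cccc-ddddeeeeffff": "change-id-447",
}

def change_id_for(disk):
    """Look up the prior ChangeId by UUID; a missing entry means no prior
    backup, i.e. fall back to a full backup of this disk."""
    return previous_backup_state.get(disk["uuid"])

# Even if this disk moved from SCSI 0:1 to SCSI 0:2, the lookup is unaffected.
disk = {"uuid": "6000C29b-1111-2222-3333-444455556666", "position": "SCSI 0:2"}
print(change_id_for(disk))  # -> change-id-446
```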

Superior architecture that eliminates the cascading effect: Generating a corrupted recovery point is bad. What is worse is using it as the base for newer recovery points; the corruption goes on and hurts the business if left unnoticed for a long time. NetBackup and Backup Exec never directly inject changed blocks into an existing backup to create a new recovery point. The changed blocks are referenced separately in the backup storage, and during a restore, NetBackup recreates the point in time at run time. This is the reason NetBackup and Backup Exec are able to support block-level incremental backups even to tape media! Thus a corrupted backup (should that ever happen) never ‘propagates’ corruption to future backups. The sketch below illustrates the idea.
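An oversimplified sketch of that architecture: incremental changed-block sets are stored as separate objects and only merged at restore time, so stored images are never rewritten and a bad incremental cannot propagate. This is an illustration of the concept, not Symantec’s actual on-disk format.

```python
# Oversimplified sketch of the idea: incremental changed-block sets live as
# separate objects and are only merged at restore time, so nothing already on
# backup storage is ever rewritten and a bad incremental cannot "propagate".
def restore(full_image, increments, point):
    """Rebuild a recovery point by overlaying increments 1..point onto a copy
    of the full backup at run time; the stored images stay untouched."""
    image = bytearray(full_image)              # work on a copy
    for inc in increments[:point]:             # inc: {offset: changed bytes}
        for offset, data in inc.items():
            image[offset:offset + len(data)] = data
    return image

full = bytearray(b"\x00" * 16)                 # the (tiny) full backup
incs = [{4: b"AAAA"}, {8: b"BBBB"}]            # two block-level incrementals
print(restore(full, incs, 2))                  # point in time after the 2nd incremental
```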

Introduction to VMware vStorage APIs for Data Protection aka VADP

6. Getting to know NetBackup for VMware vSphere

Note: This is an extended version of my blog in VMware Communities: Where do I download VADP? 

Now that we have talked about NetBackup master servers and media servers, it is time to learn how the NetBackup client on the VMware backup host (sometimes known as the VMware proxy host) protects the entire vSphere infrastructure. In order to get there, we first need a primer on vStorage APIs for Data Protection (VADP) from VMware. We will use two blogs to cover this topic.

Believe it or not, the question of what VADP really is comes up quite often in VMware Communities, especially in Backup & Recovery Discussions.

Backup is like an insurance policy. You don’t want to pay for it, but not having it is a recipe for sleepless nights. You need to protect the data on your virtual machines to guard against hardware failures and user errors. You may also have regulatory and compliance requirements to protect data for the longer term.

With modern hardware and cutting-edge hypervisors like VMware’s, you can protect data just by running a backup agent within the guest operating system. In fact, for certain workloads, this is still the recommended way.

VMware has made data protection easy for administrators, and that is what vStorage APIs for Data Protection (VADP) is about. Available since the vSphere 4.0 release, it is a set of APIs (Application Programming Interfaces) made available by VMware to independent backup software vendors. These APIs make it possible for backup software vendors to embed the intelligence needed to protect virtual machines without installing a backup agent within the guest operating system. Through these APIs, the backup software can create snapshots of virtual machines and copy them to backup storage, roughly along the lines of the sketch below.
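A highly simplified sketch of that flow using the open-source pyVmomi SDK is shown below: snapshot the VM, copy its data while the snapshot holds the disks still, then remove the snapshot. The copy_disks() placeholder stands in for the VDDK-based read that a real backup product performs; connection setup and error handling are left out.

```python
# Highly simplified sketch of the VADP-style flow using the open-source
# pyVmomi SDK: snapshot the VM, copy its data while the snapshot holds the
# disks still, then remove the snapshot. Connection setup, error handling and
# the real VDDK-based disk read are omitted.
from pyVim.task import WaitForTask

def copy_disks(vm, snapshot):
    """Placeholder: a real backup product reads the snapshot's VMDKs here
    through VDDK (NBD, SAN or HotAdd transport)."""

def backup_vm(vm):
    """vm is a pyVmomi vim.VirtualMachine object obtained from vCenter."""
    task = vm.CreateSnapshot_Task(name="vadp-backup",
                                  description="temporary backup snapshot",
                                  memory=False, quiesce=True)
    WaitForTask(task)
    snapshot = task.info.result                 # the snapshot managed object
    try:
        copy_disks(vm, snapshot)                # copy from the quiesced snapshot
    finally:
        WaitForTask(snapshot.RemoveSnapshot_Task(removeChildren=False))
```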

Okay, now let us come to the point. As a VMware administrator, what do I need to do to make use of VADP? Where do I download VADP? The answer comes down to two things:

  1. Ensure that you are NOT using hosts with free ESXi licenses
  2. Choose a backup product that has support for VADP

The first one is easy to determine. If you are not paying anything to VMware, chances are that you are using free ESXi. In that case, the only way to protect the data in your VMs is to run a backup agent within each VM. No VADP benefits.

Choosing a backup product that supports VADP can be tricky. If your organization is migrating to a virtualized environment, see what backup product is currently in use for protecting the physical infrastructure. Most of the leading backup vendors have added support for VADP; Symantec NetBackup, Symantec Backup Exec, IBM TSM, EMC NetWorker and CommVault Simpana are examples.

If you are not currently invested in a backup product (say, you are working for a start-up), there are a number of things you need to consider. VMware has a free product called VMware Data Recovery (VDR) that supports VADP. It is an easy-to-use virtual appliance with which you can schedule backups and store them on deduplicated storage. There are also point products (Quest vRanger, Veeam Backup & Replication, etc.) which provide additional features. All these products are good for managing and storing backups of virtual machines on disk for shorter retention periods. However, if your business requirements call for long-term retention, you would need another backup product to protect the backup repositories of these VM-only solutions, which can be a challenge. Moreover, it is unlikely that a business is 100% virtualized; you are likely to have NAS devices for file serving, desktops and laptops for end users and so on. Hence a backup product that supports both physical systems and VADP is ideal in most situations.

Although VADP support is available from many backup vendors, the devil is in the details. Not all solutions use VADP the same way. Furthermore, many vendors add innovative features on top of VADP to make things better.  We will cover this next.

Back to NetBackup 101 for VMware Professionals main page

Next: Coming Soon!

Turning cheap disk storage into an intelligent deduplication pool in NetBackup

5. NetBackup Intelligent Deduplication Pool

Deduplication for backups needs no introduction. In fact, deduplication is what made disk storage a viable alternative to tape. Deduplication storage is available from several vendors in the form of pre-packaged storage and software. Most backup vendors also provide some level of data reduction using deduplication or deduplication-like features.

Often we hear that backups of virtual environments are ideal for deduplication. While I agree with this statement, several articles tend to give the wrong impression as to why it is a good idea.

The general wisdom goes like this: as there are many instances of guest operating systems, there are many duplicate files and hence deduplication is recommended. A vendor may use this reasoning to sell you a deduplication appliance or to differentiate their backup product from others. This is a short-sighted view. First of all, multiple instances of the same version of an operating system are possible even when your environment is not virtualized, so that argument is weak. Secondly, operating system files contribute less than 10% of your data in most virtual machines hosting production applications. Hence, if a vendor tells you that you need to group virtual machines from the same template into one backup job to make use of ‘deduplication’, what they provide is not true deduplication. Typically such techniques simply use the block-level tracking provided by vStorage APIs for Data Protection (VADP) combined with excessive compression, and the data reduction does not go beyond a given backup job.

Behold NetBackup Intelligent Deduplication. We talked about NetBackup media servers before. Attach cheap disk storage of your choice and turn on NetBackup Intelligent Deduplication by running a wizard. Your storage transforms into a powerful deduplication pool that deduplicates inline across multiple backup jobs. You can deduplicate at the target (i.e. the media server), or you can deduplicate at the source if you have configured a dedicated VMware backup host.

Why is this referred to as an intelligent deduplication pool? When backup streams arrive, the deduplication engine sees the actual objects (files, database objects, application objects, etc.) through a set of patent-pending technologies referred to as Symantec V-Ray. Thus it deduplicates blocks after accurately identifying exact object boundaries. Compare this to third-party target deduplication devices, where the backup stream is blindly chopped up to guess the boundaries and identify duplicate segments. The generic sketch below shows why fingerprinting segments lets data deduplicate across jobs.
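A generic sketch of segment fingerprinting shows why deduplication can work across jobs. This is not Symantec’s V-Ray implementation (which identifies object boundaries rather than using fixed-size chunks); it is just the bare idea of storing a segment once and referencing it from every job that produces the same fingerprint.

```python
# Generic illustration of segment fingerprinting (NOT Symantec's V-Ray
# implementation; fixed-size chunking is used here purely for brevity).
# A segment is stored once, and every backup job that produces a segment with
# the same fingerprint just adds a reference to it.
import hashlib

SEGMENT_SIZE = 128 * 1024      # illustrative segment size
pool = {}                      # fingerprint -> segment data (the dedupe pool)

def ingest(stream: bytes):
    """Return the list of fingerprints that reference this backup's data."""
    refs = []
    for i in range(0, len(stream), SEGMENT_SIZE):
        segment = stream[i:i + SEGMENT_SIZE]
        fp = hashlib.sha256(segment).hexdigest()
        pool.setdefault(fp, segment)       # stored only once, across all jobs
        refs.append(fp)
    return refs

job1 = ingest(b"A" * SEGMENT_SIZE * 3)     # first backup job
job2 = ingest(b"A" * SEGMENT_SIZE * 3)     # a different job with the same data
print(len(job1) + len(job2), "segments referenced,", len(pool), "stored")
```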

The other aspect of the NetBackup Intelligent Deduplication pool is its scale-out capability: the ability to grow storage and processing capacity independently as your environment grows. The storage capacity can be grown from 1TB to 32TB, letting you protect hundreds of terabytes of backup images. In addition, you can add media servers to do dedupe processing on behalf of the media server hosting the deduplication storage. The scale-out capability can also be established by simply adding VMware backup hosts. Global deduplication occurs across multiple backup jobs, multiple VMware backup hosts and multiple media servers; it is scale-out in multiple dimensions! A typical NetBackup environment can protect multiple vSphere environments and deduplicate across the virtual machines in all of them.

Back to NetBackup 101 for VMware Professionals main page

vPower: brand new solution, really?

When I started exploring AIX nearly eight years ago, there were two things that fascinated me right off the bat. I was already a certified professional for Solaris at that time and had also managed Tru64 UNIX and HP-UX, mainly for Oracle workloads. Those were the days of tuning shared memory, message queue and semaphore parameters. During my days working as a contractor for a large financial institution, and later for VERITAS/Symantec NetBackup technical support, tuning the UNIX kernel’s IPCS parameters was more the norm than the exception. AIX intrigued me because it featured a dynamic kernel! It was really a big deal for the kind of job I used to do!

The second thing that looked unique in comparison with the rest of the UNIX platforms was AIX’s mksysb. In AIX, you could send the entire rootvg (all the boot files, system files and additional data file systems you may want to include in the root volume group) to a backup tape. When you need to restore your system from bare metal, you simply boot from the tape medium and run the installer; your system is back to the same point in time as when you did the mksysb backup. Furthermore, if needed, you can also boot from tape and restore selected files with a little help from tape positioning commands.

I went on to get certified on AIX, not just because of those two bells and whistles, but because VERITAS Storage Foundation was expanding to AIX and it was a good thing to add an AIX certification when we integrated its snapshot capabilities into NetBackup.

The mksysb started to become a bit obsolete for two reasons.

  1. It is expensive to have a standalone tape drive with every pSeries system; not just because of the need for a tape drive on each system, but because of the increased operational expenditure for system administrators to manually track tapes with mksysb images for each system and to maintain a time-series catalog of all images.
  2. Enterprise data protection solutions like NetBackup added Bare Metal Restore (BMR) support. The NetBackup BMR feature makes it possible to recover any physical system (be it AIX, HP-UX, Solaris, Linux, Windows…) from bare metal just by running a single command on the master server to tell NetBackup that a client needs to be rebuilt from bare metal. You also have the option to specify whether you need to bring the client to the most recent point in time (suitable in case of hardware failures) or to a point in time from the past (suitable in case of logical corruption that happened before the most recent backup). After that you simply reboot the client. The client boots from the network and recovers itself. The process is 100% unattended once the reboot is initiated.

What about virtual machines? You can indeed use the NetBackup BMR feature on virtual machines; it is supported. But the availability of deeper integration with VMware VADP and Hyper-V VSS makes it possible to perform agentless backups of virtual machines from which you can restore the entire VM or individual objects, so you don’t need BMR for VMs hosted by those hypervisors. You can use NetBackup BMR for VMs on other hypervisors like Citrix XenServer, IBM PowerVM, Oracle VM Server and so on. With NetBackup BMR and NetBackup Intelligent Deduplication, you have a solution no matter how many kinds of hypervisors are powering your clouds.

Why this story? Recently, during the after-party of a PR event hosted by Intel, I had a conversation with an old friend. He works for an organization that happens to be a Veeam partner. He mentioned that Veeam and Visioncore are having a patent battle over the ability to run a system directly from the backup image. Veeam calls this feature vPower; Visioncore calls it FlashRestore. This technology is really the virtual machine version of what IBM offered for AIX pSeries systems: you boot and run the system directly from the backup image and recover the whole system or selected files. The value additions, like the flexibility to keep it running while live migrating it to production storage, come from VMware’s innovative Storage vMotion technology, which isn’t really something Veeam or Visioncore can take credit for. Visioncore may not have much difficulty fighting this battle.

We had a good laugh when we pulled up Veeam’s marketing pitch on U-AIR, which is nothing but running the VM from backup and copying required application files back to the production VM over the wire. He raised his iPad to show Veeam’s datasheet to the group.

“vPower also enables quick recovery of individual objects from any virtualized application, on any OS. It’s a brand-new solution to the age-old problem of what to do when users accidentally delete important emails or scripts incorrectly update records.”

Brand new solution for the age-old problem, really?

NetBackup media servers and vSphere ESXi hosts: The real workhorses

4. OpenStorage for secondary storage, now VMware is onto the same thing for VM storage

If vCenter is the command and control center of a vSphere environment, the ESXi hosts are the workhorses really doing most of the heavy lifting. ESXi hosts house VMs; they provide CPU, memory, storage and other resources for virtual machines to function. Along the same lines, NetBackup media servers make backups, restores and replications happen in a NetBackup domain under the control of the NetBackup master server. The media servers are the ones really ‘running’ the various jobs.

ESXi hosts have storage connected to them for housing virtual machines. The storage allocated to ESXi hosts is called a datastore. More than one ESXi host can share the same datastore; in such configurations, we refer to the set of ESXi hosts as an ESXi cluster.

NetBackup media servers also have storage connected to them for storing backups, and more than one media server can share the same storage. NetBackup decouples storage from the media server in its architecture to a higher degree than vSphere does for ESXi hosts. An ESXi host does not treat storage as intelligent: although most enterprise-grade storage systems have plenty of intelligence built in, you still have to allocate LUNs from the storage for ESXi hosts. VMware understands that this old-school method of storage (which has been used in the industry for decades) does not scale well and does not take advantage of the features and functions that intelligent storage systems can manage on their own. If you were at VMworld 2011, you may already know that VMware is taking steps to move away from the LUN-based storage model; see Nick Allen’s blog for more info. NetBackup took the lead for secondary storage half a decade back!

NetBackup is already there! Symantec announced the OpenStorage program around the same time NetBackup 6.5 was released, and it revolutionized the way backups are stored on disk. All backup vendors traditionally treat disk in the LUN model: you allocate a LUN to the backup server and create a file system on top of it, or you present a file system to the backup server via an NFS/CIFS share. To make matters worse, some storage systems present disk as tape using VTL interfaces. The problems with these old-school methods fall into two categories.

First of all, the backup application is simply treating the intelligent storage system as a dumping ground for backup images. There is really no direct interaction between the backup system and the storage system. Thus, if your storage system has the capability to selectively replicate objects to another system, the backup server does not know about the additional copy that was made. If your storage system is capable of deduplicating data, the backup server does not know about it. Thus the backup server cannot intelligently manage storage capacity. For example, the free space reported at the file system layer may be 10GB, but the storage may be able to handle a 50GB backup because it deduplicates. Similarly, expiring a 100GB backup image may not really free up that much space, but the backup server has no way of knowing this.

Secondly, general-purpose file systems and protocols like NTFS, UFS, ext3, CIFS and NFS are optimized for random access. That is a good thing for production applications, but it comes with its own overhead. Backups and restores generally follow a sequential I/O pattern with large chunks of writes and reads. For example, presenting a high-performance deduplication system like the NetBackup 5000 series appliances, Data Domain, Quantum DXi, ExaGrid and the like as an NFS share implies unnecessary overhead, as the NFS protocol is really built for random access.

Symantec OpenStorage addresses this problem by asking storage vendors to provide OpenStorage disk pools and disk volumes for backups. This is just like what VMware wants capacity pools and VM volumes to do for VM datastores in the future. OpenStorage is a framework through which NetBackup media servers can query, write to and read from intelligent storage systems. The API and SDK are made available to storage vendors so that they can develop plug-ins. When such a plug-in is installed on a media server, the media server gains the intelligence to see the storage system and speak its language. Now the media server can simply stream backups to the storage device (without depending on overloaded protocols) and the intelligent storage system can store them in its native format. The result is 3 to 5x faster performance and the ability to tap into other features of the storage system, like replication. A purely hypothetical sketch of the plug-in idea follows.
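Purely as a thought experiment, the plug-in idea can be pictured as an interface the media server calls into. The real OpenStorage SDK is provided by Symantec to storage vendors and its actual API is not reproduced here; the class and method names below are hypothetical.

```python
# Purely hypothetical illustration of the plug-in idea; the real OpenStorage
# (OST) SDK is provided by Symantec to storage vendors and its actual API is
# not reproduced here. The point: the media server calls into a vendor plug-in
# instead of dumping images onto a generic file-system, NFS/CIFS or VTL path.
from abc import ABC, abstractmethod

class StorageServerPlugin(ABC):
    """What a vendor plug-in conceptually exposes to the media server."""

    @abstractmethod
    def query_capacity(self) -> dict:
        """Report logical and physical capacity so the backup application can
        manage space in a deduplication-aware way."""

    @abstractmethod
    def write_image(self, image_id: str, data: bytes) -> None:
        """Stream a backup image to the device, stored in its native format."""

    @abstractmethod
    def replicate_image(self, image_id: str, target: str) -> None:
        """Ask the device to replicate an image; the backup application knows
        about, and can catalog, the extra copy."""
```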

In NetBackup terms, the media server is now simply a data mover: it moves data from the client to storage. Since the storage system is intelligent and the media server can communicate with it, it is referred to as a storage server. Multiple media servers can share a storage server. When backups (or other jobs like restores, duplications and so on) need to be started, the NetBackup master server determines which media server has the least load. The selected media server then loads the plug-in and preps the storage server to start receiving backups. You can compare this to the way VMware DRS and HA work, where the vCenter Server picks the least loaded ESXi host to start a VM from a common datastore.

Okay, so we talked about intelligent storage servers. How about dumb storage (JBOD) and tape drives? NetBackup media servers support those as well. Even in the case of a JBOD attached to the media server, NetBackup media servers make it intelligent! That story is next.

Next: Coming Soon!

Back to NetBackup 101 for VMware Professionals main page