VMware EVO: The KFC of SDDC

EVO is the KFC of SDDC

VMware EVO brings to software-defined data centers the same type of business model that Kentucky Fried Chicken brought to restaurants decades ago. VMware is hungry to grow and is expanding its business into new territories. Colonel Sanders's revolutionary vision of selling his chicken recipe and brand through a franchise model is now coming to IT infrastructure as ready-to-eat value meals.

Most of the press reports and analyst blogs are focused on VMware's arrival into the converged infrastructure market. Of course, vendors like Nutanix and SimpliVity will lose sleep as the 800-pound gorilla sets its eyes on their market. However, VMware's strategy goes much deeper than taking the converged infrastructure market from upstarts; it is a bold attempt to disrupt the business model of selling IT infrastructure stacks while keeping public cloud providers away from enterprise IT shops.

Bargaining power of the supplier: Have you noticed VMware's commanding power in the EVO specifications? Partners like Dell and EMC are simply franchisees of VMware's infrastructure recipe and brand. It is no secret that traditional servers and storage are on the brink of disruption because buyers won't pay a premium for brand names much longer. It is time for them to let go of individuality and become the delivery model for a prescriptive architecture (the franchise model) from a stronger supplier in the value chain.

Software is now the king, no more OEM: In the old world, where hardware vendors owned brand power and distribution chains, software vendors had to make OEM deals to get their solutions to market in those hardware vehicles. Now the power is shifting to software. The software vendor prescribes (a softened term that actually stands for 'dictates') how infrastructure stacks should be built.

Short-term strategy, milk the converged infrastructure market: This is the most obvious hint VMware has given; reporters, bloggers and analysts have all picked up on it. As more and more CIOs look to reduce capital and operational costs, the demand for converged systems is growing rapidly. Even the primitive assembled-to-order solutions from VCE and NetApp-Cisco are milking the current demand for simplified IT infrastructure stacks. Nutanix leads the pack in the relatively newer and better hyper-convergence wave. VMware's entry into this market validates that convergence is a key trend in modern IT.

Long-term strategy, own data center infrastructure end-to-end while competing with public clouds: Two of the three key pillars of VMware's strategy are enabling software-defined data centers and delivering hybrid clouds. Although SDDC and hybrid cloud may look like two separate missions, the combination is what is needed to keep Amazon and other public cloud providers from taking workloads away from IT shops. The core of VMware's business is selling infrastructure solutions for on-prem data centers. Although VMware positions itself as an enabler of service providers, it understands that the bargaining power of customers will stay low as long as organizations stick to on-prem solutions. This is where the SDDC strategy fits. By commoditizing infrastructure components (compute, storage and networking) and shifting the differentiation to infrastructure management and service delivery, VMware wants to become the commander in control of SDDCs (just as Intel processors dictated the direction of PCs over the last two decades). EVO happens to be the SDDC recipe it wants to franchise to partners so that customers can taste the same SDDC no matter who their preferred hardware vendor is. Thus EVO is the KFC of SDDC. It is not there just as a Nutanix killer; VMware also wants to take share from Cisco (Cisco UCS is nearly #1 in the server market, and Cisco is #1 in networking infrastructure), EMC storage (let us keep the money in the family; the old man's hardware identity is counting its days) and other traditional infrastructure players. At the same time, VMware wants to transform vCloud Air (the rebranded vCloud Hybrid Service) into the app store for EVO-based SDDCs to host data services in the cloud. It is a clever plan to keep selling to enterprises while hiding them away from the likes of Amazon. Well played, VMware!

So what will be the competitive response from Amazon and other public cloud providers? Amazon has the resources to build a ready-to-eat private Fire Cloud for enterprises that can act as the gateway to AWS. So far, Amazon has focused mainly on on-prem storage solutions that extend to AWS. We can certainly expect the king of public clouds to do something more. It is not a question of 'if'; rather it is a question of 'when'.

EMC’s Hardware Defined Control Center vs. VMware’s Software Defined Data Center

EMC trying to put the clock back from software-defined storage movement

EMC's storage division appears to be in Old Yeller mode. It knows that customers will eventually stop paying a premium for branded storage. The bullets to put branded storage out of its misery are coming from the software-defined storage movement led by its own stepchild, VMware. But the old man is still clever, pretending to hang out with the cool kids to stay relevant while trying to survive as long as there are CIOs willing to pay a premium for storage with a label.

Software-defined storage is all about building storage and data services on top of commodity hardware. No more vendor-locked storage platforms on proprietary hardware. This movement offers high performance at lower cost by bringing storage closer to compute. Capacity and performance are two independent vectors in software-defined storage.

TwinStrata follows that simplicity model and has helped customers extend the life of existing investments with true software solutions. Its data service layer offers storage tiering where the last tier can be a public cloud. EMC wants the market to believe that its acquisition of TwinStrata is an attempt to embrace the software-defined storage movement. But the current execution plan is a little backward. EMC's plan is a bolted-on type of integration of TwinStrata IP on top of the legacy VMAX storage platform. That means EMC wants to keep the 'software-defined' IP close to its proprietary array itself. The goal, of course, is to prolong the life of VMAX in the software-defined world. While it defeats the rationale behind the software-defined storage movement, it may be enough to pull the clock back a little.

Hopefully there is another project where EMC will seriously consider building a true software-defined storage solution from the acquired IP without the deadweight of legacy platforms. Perhaps transform ViPR from vaporware into something that really rides the wave of the software-defined movement?

vSphere changed block tracking: A powerful weapon for backup applications to shrink backup window

Changed block tracking is not a new technology. Those who have used Storage Foundation for Oracle will know that the VERITAS file system (VxFS) provides no-data checkpoints, which backup applications can use to identify and back up just the changed blocks from the file systems where database files are housed. This integration has been in NetBackup since version 4.5, released 10 years ago! It is still used by Fortune 500 companies to protect mission-critical Oracle databases that would otherwise require a large backup window with traditional RMAN streaming backups.

VMware introduced changed block tracking (CBT) in vSphere 4.0; it is available for virtual machines at hardware version 7 or higher. NetBackup 7.0 added support for CBT right away, and backing up VMware vSphere environments got faster. When a VM has CBT turned on, it tracks changes to virtual machine disk (VMDK) sectors, with only a marginal impact on VM performance. Backup applications with VADP (vStorage APIs for Data Protection) support can use an API (named QueryChangedDiskAreas) to identify and copy the blocks changed since a particular point in time. That point in time is identified using an argument named ChangeId in the API call.
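
To make the API call concrete, below is a minimal sketch (not NetBackup's implementation) of how a backup application could walk the changed areas of one virtual disk using pyVmomi, the open-source Python bindings for the vSphere API. The helper name and parameters are illustrative; passing "*" as the ChangeId would instead return the allocated areas for an initial full backup.

    def collect_changed_extents(vm, snapshot, device_key, prev_change_id, capacity_bytes):
        """Return (offset, length) extents changed since prev_change_id.

        vm             -- vim.VirtualMachine managed object with CBT enabled
        snapshot       -- snapshot taken for the current backup
        device_key     -- key of the virtual disk (VMDK) device to query
        prev_change_id -- ChangeId recorded at the previous backup
        capacity_bytes -- size of the virtual disk in bytes
        """
        extents = []
        offset = 0
        while offset < capacity_bytes:
            # One call may cover only part of the disk, so keep asking until
            # the reported region reaches the end of the disk.
            info = vm.QueryChangedDiskAreas(snapshot=snapshot,
                                            deviceKey=device_key,
                                            startOffset=offset,
                                            changeId=prev_change_id)
            extents.extend((area.start, area.length) for area in info.changedArea)
            offset = info.startOffset + info.length
        return extents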

VMware has made this quite easy for backup vendors to implement. But powerful weapons can be dangerous when not used with utmost care. An unfortunate problem in Avamar's implementation of CBT came to light recently. I am not picking on Avamar developers here; it is not possible to predict all the edge cases during development, and they are working hard to fix this data loss situation. As an engineer myself, I truly empathize with the Avamar developers for getting into this unfortunate situation. This blog is a humble attempt to explain what happened, as I got a few questions from the field seeking input on the use of CBT after EMC reported the issues in Avamar.

As we know, VADP lets you query the changed disk areas to get all the changes in a VMDK since a point in time corresponding to a previous snapshot. Once the changed blocks are identified, those blocks are transferred to the backup storage. How the changed blocks are used by the backup application to create the recovery point (i.e. the backup image) varies from vendor to vendor.

No matter how the recovery point is synthesized, the backup application must make sure that the changed blocks are associated with the correct VMDK, because a VM can have many disks. As you can imagine, if the blocks were associated with the wrong disk in the backup image, the image would not be an accurate representation of the source. A recovery from this backup image will fail or will result in corrupt data on the source.

The correct way to identify VMDKs is by their UUIDs, which are always unique. Positional identifiers like controller-target-LUN at the VM level are not reliable, as those numbers can change when some VMDKs are removed or new ones are added to a VM. This is the disk re-order problem, and the re-order can also happen for non-user-initiated operations. In Avamar's case, the problem was that the changed blocks belonging to one VMDK were getting associated with a different VMDK in backup storage on account of VMDK re-ordering. The resulting backup image (recovery point) did not represent the actual state of the VMDK being protected.

To make the unfortunate matter worse, there was a cascading effect. It appears that Avamar's implementation generates a recovery point by using the previous backup as the base. If a disk re-order happened after the nth backup, every backup after the nth is affected, because each new backup inherits its base from the corrupted image.

This sounds scary. That is how I started getting questions from the field on the reliability of CBT for backups. Symantec supports CBT in both Backup Exec and NetBackup. Are Symantec customers safe?

Yes, Symantec customers using NetBackup and Backup Exec are safe.

How do Symantec NetBackup and Backup Exec handle re-ordering? Block level tracking and the associated risks were well thought out during implementation. Block level tracking is not something new for Symantec engineering; such situations were already accounted for in the design of VxFS's no-data checkpoint block level tracking several years ago.

There are multiple layers of resiliency built into Symantec's implementation of CBT support. I shall share oversimplified explanations of the two that are most relevant to ensuring data integrity here.

Using UUIDs to accurately associate a ChangeId with the correct VMDK: We already touched on this. A UUID is always unique, and using it to associate the previous point in time with a VMDK is safe. Even when VMDKs get re-ordered in a VM, the UUID stays the same. Thus both NetBackup and Backup Exec always associate the changed blocks with the correct VM disk.
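
As an oversimplified sketch of this point (not Symantec's actual code), keying the tracking state by disk UUID rather than by bus position makes a re-order between backups harmless; the dictionary and field names below are purely illustrative.

    def record_change_ids(state, disks):
        """state: dict mapping VMDK UUID -> ChangeId from the last backup.
        disks: (uuid, bus_position, change_id) tuples observed in the
        snapshot taken for the current backup."""
        for uuid, bus_position, change_id in disks:
            # Keyed by UUID: if this disk shows up at a different bus position
            # next time, its change history still lines up with the right disk.
            # Keying by bus_position is exactly what breaks on a re-order.
            state[uuid] = change_id
        return state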

Superior architecture that eliminates the 'cascading effect': Generating a corrupted recovery point is bad. What is worse is using it as the base for newer recovery points; the corruption goes on and hurts the business if left unnoticed for a long time. NetBackup and Backup Exec never directly inject changed blocks into an existing backup to create a new recovery point. The changed blocks are referenced separately in the backup storage, and during a restore NetBackup recreates the point in time at run-time. This is why NetBackup and Backup Exec are able to support block level incremental backups even to tape media! Thus a corrupted backup (should that ever happen) never 'propagates' corruption to future backups.
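
Here is a generic sketch of that idea (not NetBackup's actual on-disk format): each incremental keeps its own changed extents, and the recovery point is assembled only at restore time, so a bad incremental never becomes the base that later backups are built on.

    def synthesize_recovery_point(full_image, incrementals):
        """full_image: bytes of the baseline backup.
        incrementals: list of extent lists, oldest first; each extent is an
        (offset, data) pair captured independently by one backup run."""
        disk = bytearray(full_image)
        for extents in incrementals:              # replay changes in order
            for offset, data in extents:
                disk[offset:offset + len(data)] = data
        return bytes(disk)                        # the requested point in time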

Introduction to VMware vStorage APIs for Data Protection aka VADP

6. Getting to know NetBackup for VMware vSphere

Note: This is an extended version of my blog in VMware Communities: Where do I download VADP? 

Now that we have talked about NetBackup master servers and media servers, it is time to learn how the NetBackup client on the VMware backup host (sometimes known as the VMware proxy host) protects the entire vSphere infrastructure. In order to get there, we first need a primer on vStorage APIs for Data Protection (VADP) from VMware. We will use two blogs to cover this topic.

Believe it or not, the question of what VADP really is comes up quite often in VMware Communities, especially in Backup & Recovery Discussions.

Backup is like an insurance policy. You don't want to pay for it, but not having it is a recipe for sleepless nights. You need to protect the data on your virtual machines to guard against hardware failures and user errors. You may also have regulatory and compliance requirements to protect data for the longer term.

With modern day hardware and cutting edge hypervisors like VMware's, you can protect data just by running a backup agent within the guest operating system. In fact, for certain workloads, this is still the recommended way.

VMware has made data protection easy for administrators; that is what vStorage APIs for Data Protection (VADP) is about. It has been available since the vSphere 4.0 release. VADP is a set of APIs (Application Programming Interfaces) made available by VMware to independent backup software vendors. These APIs make it possible for backup software vendors to embed the intelligence needed to protect virtual machines without installing a backup agent within the guest operating system. Through these APIs, the backup software can create snapshots of virtual machines and copy them to backup storage.
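
A bare-bones sketch of that agent-less flow using pyVmomi is shown below. The back_up_disks() callback is a hypothetical placeholder for whatever the backup software does with the quiesced disks (for example, reading them through the VDDK transports); everything else uses standard vSphere API calls.

    from pyVim.task import WaitForTask

    def backup_vm(vm, back_up_disks):
        """Snapshot-based, agent-less protection of a single VM."""
        # 1. Ask vSphere for a quiesced snapshot; no agent runs in the guest.
        task = vm.CreateSnapshot_Task(name="backup",
                                      description="temporary backup snapshot",
                                      memory=False,
                                      quiesce=True)
        WaitForTask(task)
        snapshot = task.info.result
        try:
            # 2. Copy the VM's disks as of the snapshot to backup storage.
            back_up_disks(vm, snapshot)
        finally:
            # 3. Always clean up the temporary snapshot.
            WaitForTask(snapshot.RemoveSnapshot_Task(removeChildren=False))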

Okay, now let us come to the point. As a VMware administrator, what do I need to do to make use of VADP? Where do I download VADP? The answer comes in two parts:

  1. Ensure that you are NOT using hosts with free ESXi licenses
  2. Choose a backup product that has support for VADP

The first one is easy to determine. If you are not paying anything to VMware, chances are that you are using free ESXi. In that case, the only way to protect data in VMs is to run a backup agent within the VM. No VADP benefits.

Choosing a backup product that supports VADP can be tricky. If your organization is migrating to a virtualized environment, see what backup product is currently in use for protecting physical infrastructure. Most of the leading backup vendors have added support for VADP. Symantec NetBackup, Symantec Backup Exec, IBM TSM, EMC NetWorker, CommVault Simpana are examples.

If you are not currently invested in a backup product (say, you are working for a start-up), there are a number of things to consider. VMware has a free product called VMware Data Recovery (VDR) that supports VADP. It is an easy-to-use virtual appliance with which you can schedule backups and store them on deduplicated storage. There are also point products (Quest vRanger, Veeam Backup & Replication etc.) which provide additional features. All these products are good for managing and storing backups of virtual machines on disk for shorter retention periods. However, if your business requirements call for long-term retention, you would need another backup product to protect the backup repositories of these VM-only solutions, which can be a challenge. Moreover, it is unlikely that your business is 100% virtualized. You are likely to have NAS devices for file serving, desktops and laptops for end users and so on. Hence a backup product that supports both physical systems and VADP is ideal in most situations.

Although VADP support is available from many backup vendors, the devil is in the details. Not all solutions use VADP the same way. Furthermore, many vendors add innovative features on top of VADP to make things better.  We will cover this next.

Turning cheap disk storage into an intelligent deduplication pool in NetBackup

5. NetBackup Intelligent Deduplication Pool

Deduplication for backups does not need an introduction. In fact, deduplication is what made disk storage a viable alternative for tapes. Deduplication storage is available from several vendors in the form of pre-packaged storage and software. Most of the backup vendors also provide some level of data reduction using deduplication or deduplication-like features.

Often we hear that backups of virtual environments are ideal for deduplication. While I agree with this statement, several articles tend to give the wrong perception when it comes to why it is a good idea.

The general wisdom goes like this: as there are many instances of guest operating systems, there are many duplicate files, and hence deduplication is recommended. A vendor may use this reasoning to sell you a deduplication appliance or to differentiate its backup product from others. This is a short-sighted view. First of all, multiple instances of the same version of an operating system are possible even when your environment is not virtualized, so that argument is weak. Secondly, operating system files contribute less than 10% of your data in most virtual machines hosting production applications. Hence if a vendor tells you that you need to group virtual machines from the same template into one backup job to make use of 'deduplication', what they provide is not true deduplication. Typically such techniques simply use the block level tracking provided by vStorage APIs for Data Protection (VADP) combined with aggressive compression, and the data reduction does not go beyond a given backup job.

Behold NetBackup Intelligent Deduplication. We talked about NetBackup media servers before. Attach cheap disk storage of your choice and turn on NetBackup Intelligent Deduplication by running a wizard. Your storage transforms into a powerful deduplication pool that deduplicates inline across multiple backup jobs. You can deduplicate at the target (i.e. the media server) or you can let it deduplicate at the source, if you have configured a dedicated VMware backup host.

Why is this referred to as an intelligent deduplication pool? When backup streams arrive, the deduplication engine sees the actual objects (files, database objects, application objects etc.) through a set of patent-pending technologies referred to as Symantec V-Ray. Thus it deduplicates blocks after accurately identifying exact object boundaries. Compare this to third-party target deduplication devices where the backup stream is blindly chopped to guess the boundaries and identify duplicate segments.
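
For contrast, here is a generic illustration (no vendor's actual engine) of fingerprint-based deduplication with fixed-size, content-blind chunking, the "blind chopping" described above. The function and the store are illustrative only.

    import hashlib

    def dedupe_stream(stream, store, chunk_size=128 * 1024):
        """store: dict mapping SHA-256 fingerprint -> chunk bytes.
        Returns the ordered list of fingerprints (the 'recipe') needed to
        rebuild the stream at restore time."""
        recipe = []
        while True:
            chunk = stream.read(chunk_size)     # fixed-size, content-blind split
            if not chunk:
                break
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in store:        # only unique chunks consume space
                store[fingerprint] = chunk
            recipe.append(fingerprint)
        return recipe

Note that a one-byte shift in the stream changes every subsequent fixed-size chunk and defeats matching; knowing the real object boundaries, as described above, avoids exactly that problem.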

The other aspect of the NetBackup Intelligent Deduplication pool is its scale-out capability: the ability to grow storage and processing capacity independently as your environment grows. The storage capacity can be grown from 1TB to 32TB, letting you protect hundreds of terabytes of backup images. In addition, you can add media servers to do dedupe processing on behalf of the media server hosting the deduplication storage. The scale-out capability can also be established by simply adding additional VMware backup hosts. The global deduplication occurs across multiple backup jobs, multiple VMware backup hosts and multiple media servers. It is scale-out in multiple dimensions! A typical NetBackup environment can protect multiple vSphere environments and deduplicate across the virtual machines in all of them.

vPower: brand new solution, really?

When I started exploring AIX nearly eight years ago, there were two things that fascinated me right off the bat. I was already a certified professional for Solaris at the time. I had also managed Tru64 UNIX and HP-UX, mainly for Oracle workloads. Those were the days of tuning shared memory, message queue and semaphore parameters. During my days working as a contractor for a large financial institution, and later for VERITAS/Symantec NetBackup technical support, tuning the UNIX kernel for IPCS parameters was more the norm than the exception. AIX intrigued me because it featured a dynamic kernel! That was a really big deal for the kind of job I used to do!

The second thing that looked unique in comparison with the rest of the UNIX platforms was AIX's mksysb. In AIX, you could send the entire rootvg (all the boot files, system files and any additional data file systems you may want to include in the root volume group) to a backup tape. When you need to restore your system from bare metal, you simply boot from the tape medium and run the installer; your system is back to the same point in time as the mksysb backup. Furthermore, if needed, you can also boot from tape and restore selected files with a little help from tape positioning commands.

I went on to get certified on AIX, not just because of those two bells and whistles, but because VERITAS Storage Foundation was expanding to AIX and it was a good thing to add an AIX certification when we integrated its snapshot capabilities into NetBackup.

The mksysb started to become a bit obsolete for two reasons.

  1. It is expensive to have a standalone tape drive with every pSeries system: not just because of the need for a tape drive on each system, but also because of the increased operational expenditure for system administrators to manually track tapes with mksysb images for each system and to maintain a time-series catalog of all images.
  2. Enterprise data protection solutions like NetBackup added Bare Metal Restore (BMR) support. The NetBackup BMR feature makes it possible to recover any physical system (be it AIX, HP-UX, Solaris, Linux, Windows…) from bare metal just by running a single command on the master server to tell NetBackup that a client needs to be rebuilt from bare metal. You also have the option to specify whether you need to bring the client to the most recent point in time (suitable in case of hardware failures) or to a point in time from the past (suitable in case of logical corruptions that happened before the most recent backup). After that, you simply reboot the client. The client boots from the network and recovers itself. The process is 100% unattended once the reboot is initiated.

What about virtual machines? You can indeed use the NetBackup BMR feature on virtual machines; it is supported. However, the deeper integration with VMware VADP and Hyper-V VSS makes it possible to perform agent-less backups of virtual machines whereby you can restore the entire VM or individual objects, so you don't need BMR for VMs hosted on those hypervisors. You can use NetBackup BMR for VMs on other hypervisors like Citrix XenServer, IBM PowerVM, Oracle VM Server etc. With NetBackup BMR and NetBackup Intelligent Deduplication, you have a solution no matter how many kinds of hypervisors are powering your clouds.

Why this story? Recently, during the after-party of a PR event hosted by Intel, I had a conversation with an old friend. He works for an organization that happens to be a partner of Veeam. He mentioned that Veeam and VisionCore are having a patent battle over the ability to run a system directly from the backup image. Veeam calls this feature vPower; VisionCore calls it FlashRestore. This technology is really the virtual machine version of what IBM offered for AIX pSeries systems: you boot and run the system directly from the backup image and recover the whole system or selected files. The value additions, like the flexibility to keep it running while live migrating it to production storage, come from VMware's innovative Storage vMotion technology, which isn't really something Veeam or VisionCore can take credit for. VisionCore may not have much difficulty fighting this battle.

We had a good laugh when we pulled up Veeam's marketing pitch on U-AIR, which is nothing but running the VM from backup and copying the required application files back to the production VM over the wire. He raised his iPad to show Veeam's datasheet to the group:

“vPower also enables quick recovery of individual objects from any virtualized application, on any OS. It’s a brand-new solution to the age-old problem of what to do when users accidentally delete important emails or scripts incorrectly update records.”

Brand new solution for the age-old problem, really?

NetBackup media servers and vSphere ESXi hosts: The real workhorses

4. OpenStorage for secondary storage, now VMware is onto the same thing for VM storage

If vCenter is the command and control center in a vSphere environment, the ESXi hosts are the workhorses really doing most of the heavy lifting. ESXi hosts house VMs. They provide CPU, memory, storage and other resources for virtual machines to function. Along the same lines, NetBackup media servers make backups, restores and replications happen in a NetBackup domain under the control of the NetBackup master server. Media servers are the ones really 'running' the various jobs.

ESXi hosts have storage connected to them for housing virtual machines. The storage allocated to ESXi hosts is called a datastore. More than one ESXi host can share the same datastore; in such configurations, we refer to the set of ESXi hosts as an ESXi cluster.

NetBackup media servers also have storage connected to them for storing backups, and more than one media server can share the same storage. NetBackup decouples storage from the media server in its architecture to a higher degree than vSphere does for ESXi hosts. An ESXi host does not treat storage as intelligent. Although most enterprise grade storage systems have plenty of intelligence built in, you still have to allocate LUNs from the storage for ESXi hosts. VMware understands that the old school method of storage (which has been used in the industry for decades) does not scale well and does not take advantage of the many features and functions that intelligent storage systems can manage on their own. If you were at VMworld 2011, you may already know that VMware is taking steps to move away from the LUN based storage model. See Nick Allen's blog for more info. NetBackup took the lead on this for secondary storage half a decade back!

NetBackup is already there! Symantec announced the OpenStorage program around the same time NetBackup 6.5 was released, and it revolutionized the way backups are stored on disk. All backup vendors treat disk in the LUN model: you allocate a LUN to the backup server and create a file system on top of it, or you present a file system to the backup server via an NFS/CIFS share. To make matters worse, some storage systems presented disk as tape using VTL interfaces. The problems with these old school methods fall into two categories.

First of all, the backup application is simply treating the intelligent storage system as a dumping ground for backup images. There is really no direct interaction between the backup system and the storage system. Thus if your storage system has the capability to selectively replicate objects to another system, the backup server does not know about the additional copy that was made. If your storage system is capable of deduplicating data, the backup server does not know about it either. As a result, the backup server cannot intelligently manage storage capacity. For example, free space reported at the file system layer may be 10GB, but the storage may be able to handle a 50GB backup because it deduplicates. Similarly, expiring a backup image of 100GB may not really free up that much space, but the backup server has no way of knowing this.

Secondly, general-purpose file systems and protocols like NTFS, UFS, ext3, CIFS, NFS etc. are optimized for random access. This is a good thing for production applications, but it comes with its own overhead. Backups and restores generally follow a sequential I/O pattern with large chunks of writes and reads. For example, presenting a high performance deduplication system like the NetBackup 5000 series appliances, Data Domain, Quantum DXi, ExaGrid etc. as an NFS share implies unnecessary overhead, as the NFS protocol is really for random access.

Symantec OpenStorage addresses this problem by asking storage vendors to provide OpenStorage disk pools and disk volumes for backups. This is just like what VMware wants capacity pools and VM volumes to do for VM datastores in the future. OpenStorage is a framework through which NetBackup media servers can query, write to and read from intelligent storage systems. The API and SDK are made available to storage vendors so that they can develop plug-ins. When such a plug-in is installed on a media server, the media server gains the intelligence to see the storage system and speak its language. Now the media server can simply stream backups to the storage device (without depending on overloaded protocols) and the intelligent storage system can store them in its native format. The result is 3 to 5x faster performance and the ability to tap into other features of the storage system, like replication.
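
To illustrate the plug-in idea, here is a hypothetical Python interface, not the real OpenStorage SDK (which is a C API supplied to storage vendors). It only shows the kind of operations a media server gains once a vendor plug-in lets it speak the device's language.

    from abc import ABC, abstractmethod

    class StorageServerPlugin(ABC):
        """What a vendor plug-in might expose to a NetBackup-style media server."""

        @abstractmethod
        def capabilities(self) -> set:
            """E.g. {'dedupe', 'replication'}, so the backup server can reason
            about real capacity and about copies it did not make itself."""

        @abstractmethod
        def create_image(self, name: str):
            """Create a backup image held in the device's native format."""

        @abstractmethod
        def write(self, image, data: bytes):
            """Stream backup data with no general-purpose file system in between."""

        @abstractmethod
        def replicate(self, image, target: str):
            """Ask the device to copy the image with its own replication engine."""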

In NetBackup terms, the media server is now simply a data mover: it moves data from client to storage. Since the storage system is intelligent and the media server can communicate with it, it is referred to as a storage server. Multiple media servers can share a storage server. When backups (or other jobs like restores, duplications etc.) need to be started, the NetBackup master server determines which media server has the least load; the selected media server then loads the plug-in and preps the storage server to start receiving backups. You can compare this to the way VMware DRS and HA work, where the vCenter server picks the least loaded ESXi host for starting a VM from a common datastore.
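
As a toy illustration of that selection step (not NetBackup's actual scheduler), picking the least loaded media server sharing a storage server looks just like DRS picking a lightly loaded host:

    def pick_media_server(active_jobs):
        """active_jobs: dict mapping media server name -> number of running jobs."""
        return min(active_jobs, key=active_jobs.get)

    # Example: pick_media_server({"media1": 4, "media2": 1}) returns "media2".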

Okay, so we talked about intelligent storage servers. What about dumb storage (JBOD) and tape drives? NetBackup media servers support those as well. Even in the case of a JBOD attached to the media server, NetBackup media servers make it intelligent! That story is next.

NetBackup master server and VMware vCenter

3. The control and command center

The vCenter server is the center of an enterprise vSphere environment. Although the ESXi hosts and virtual machines will continue to function even when the vCenter server is down, enterprise data centers and cloud providers cannot afford such downtime. Without the vCenter server, crucial operations like vMotion, VMware HA, VMware FT, DRS etc. cease to function. A number of third party applications count on plug-ins in the vCenter server, and a number of monitoring and notification functions are governed by vCenter. Hence larger enterprises and cloud providers deploy vCenter on highly redundant systems. Some use high availability clustering solutions like Microsoft Cluster Server or VERITAS Cluster Server. Some deploy vCenter on a virtual machine protected by VMware HA that is run by another vCenter server.

The NetBackup master server plays a similar role. It is the center of a NetBackup domain. If this system goes down, you cannot do backups or restores. Unlike the vCenter server, which runs on Windows (and now on Linux), you can deploy the master server on a variety of operating systems: Windows, enterprise flavours of Linux, AIX, HP-UX and Solaris. NetBackup includes cluster agents for Microsoft Cluster Server, VERITAS Cluster Server, IBM HACMP, HP-UX Service Guard and Sun/Oracle Cluster for free. If you have any of these HA solutions, NetBackup lets you install the master server with an easy-to-deploy cluster installation wizard.

An enterprise vCenter server uses a database management system, usually Microsoft SQL Server, for storing its objects. NetBackup ships with Sybase ASA embedded in the product. This is a highly scalable application database, so there is no need to provide a separate database management system.

In addition to the Sybase ASA database, NetBackup also stores backup image metadata (index) and other configuration in binary and ASCII formats. The entire collection of Sybase ASA databases, binary image indexes and ASCII based databases is referred to as the NetBackup catalog. NetBackup provides a specific kind of backup policy, the catalog backup policy, to copy the entire catalog to secondary storage (disk or tape) easily. Thus even if you lose your master server, you can perform a catalog recovery to rebuild it.

In VMware, you might have dealt with vCenter Server Heartbeat. This feature provides the capability to replicate the vCenter configuration to a remote site so that you can start the replicated vCenter server at that site in case of a primary site loss. NetBackup goes a bit further. Unlike vCenter Heartbeat, which has an Active-Passive architecture, NetBackup provides A.I.R. (Auto Image Replication). When you turn on A.I.R. for your backups, NetBackup appends the catalog information for the backup to the backup image itself. The images are replicated using the storage device's native replication engine. At the remote site you can have a fully functional master server (which is serving media servers and clients locally). The device in the remote master server's domain that receives A.I.R. images can automatically notify the remote master, which then imports the image catalog information from storage. Unlike traditional import processes where the entire image needs to be scanned to recreate the catalog remotely, this optimized import finishes in a matter of seconds (even if the backup image is several terabytes) because the catalog information is embedded within the image for quick retrieval. The result is Active-Active NetBackup domains at both sites. They can replicate in both directions and also act as the DR domain for each other. You can have many NetBackup domains replicating to a central site (fan-in), one domain replicating to multiple domains (fan-out) or a combination of both. This is why NetBackup is the data protection platform that cloud pilots need to master; it is evolving to serve clouds, which typically span multiple sites.

vCenter integrates with Active Directory to provide role-based access control. Similarly, NetBackup provides NetBackup Access Control, which can be integrated with Active Directory, LDAP, NIS and NIS+. NetBackup also features audit trails so that you can track users' activities.

One thing that really makes NetBackup stand out from point solutions like vRanger, Veeam etc. is the ability for virtual machine owners (the application administrators) to self-serve their recovery needs. For example, the Exchange administrator can use the NetBackup GUI on the client, authenticate himself/herself and browse Exchange objects in the backups. NetBackup presents the objects directly from its catalog or live browse interface, depending on the type of object being requested. The user simply selects the object needed and initiates the restore; NetBackup does the rest. There are no complex ticket systems where the application owner makes a request to the backup administrator. No need to mount an entire VM on your precious ESX resources in production just to retrieve a 1k object. No need to learn how to manipulate objects (for example, manually running application tools to copy objects from a temporary VM) and face the risks associated with user errors. All the user interfaces connect directly to the master server, which figures out what to restore and starts the job on a media server.

So NetBackup is an enterprise platform that turns a traditional VM administrator into a cloud pilot of the future. It is nice that NetBackup supports a variety of hardware and operating systems, but is there a way to deploy a NetBackup domain without building a master server on my own? The answer is indeed yes! NetBackup 5200 series appliances are available for exactly this purpose. These appliances are built on Symantec hardware and can be deployed in a matter of minutes. Everything you need to create a NetBackup master server and/or media server is available in these appliances.

Deduplication for dollar zero?

One of the data protection experts asked me a question after reading my blog on Deduplication Dilemma: Veeam or Data Domain.

I am paraphrasing his question, as our conversation was limited to 140 characters at a time on Twitter.

“Have you seen this best practice blog on Veeam with ExaGrid? Here is the blog. It says not to do reverse incremental backups. The test Mr. Attila ran was incomplete. Veeam's deduplication at the first pass is poor, but after that it is worth it, right?”

These are all great questions. I thought I would dissect each aspect and share it here. Before I do that, I want to make it clear that deduplication devices are fantastic for use in backups. They work great with backup applications that really offer the ability to restore individual objects. If the backup application 'knows' how to retrieve specific objects from backup storage, target deduplication adds a lot of value. That is why NetBackup, Backup Exec, TSM, NetWorker and the like play well with target deduplication appliances. Veeam, on the other hand, simply mounts the VMDK file from the backup store and asks the application administrator to fish for the item he/she is looking for. This is where Veeam falls apart if you try to deploy it in medium to large environments. Although target deduplication appliances are disk based, they are optimized for sequential access, as backup jobs mostly follow a sequential I/O pattern. When you perform random I/O on these devices (as happens when a VM is run directly from them), there is a limit to how well they can perform.

Exagrid: a great company helping out customers

ExaGrid has an advantage here. It has the flexibility to keep the most recent backup in hydrated form (ExaGrid uses post-process deduplication), which works well with Veeam if you employ reverse incremental backups. In reverse incremental backups, the most recent backup is always a full backup. You can eliminate the performance issues inherent in mounting the image on an ESX host when the image is served in hydrated form. This is good from the recovery performance perspective. However, ExaGrid recommends not turning on the reverse incremental method because it burdens the appliance during backups. This is another dilemma; you have to pick backup performance or recovery performance (RTO), not both.

Let me reiterate this. The problem is not with ExaGrid in this case. They are sincerely trying to help customers who happened to choose Veeam. ExaGrid is doing the right thing; you want to find ways to help customers achieve ROI no matter what backup solution they ended up choosing. I take my hat off to ExaGrid in respect.

Now let us take a closer look at other recommendations from Exagrid to alleviate the pain points with Veeam.

Turn off compression in Veeam and optimize for local target: Note that ExaGrid suggests turning off compression and choosing the 'Optimize for local target' option. These settings have the effect of eliminating most of what Veeam's deduplication offers. By choosing those options, you let the real guy (the ExaGrid appliance) do the work.

Weren’t Mr Attila’s tests incomplete?

Mr. Attila stopped his tests after the initial backup. The advantage of deduplication is visible only on subsequent backups, hence his tests weren't complete. However, as I stated in the blog, that test simply triggered my own research; I wasn't basing my opinions just on Mr. Attila's tests. I should have mentioned this in the earlier blog, but it was already becoming too big.

As I mentioned in the blog earlier, Veeam deduplication capabilities are limited. Quoting Exagrid this time: “Once the ExaGrid has all the data, it can look at the entire job at a more granular level and compress and dedupe within jobs and also across jobs! Veeam can’t do this because it has data constantly streaming into it from the SAN or ESX host, so it’s much harder to get a “big picture” of all the data.”   

If Veeam's deduplication is the only thing you have, the problem is not just limited to the initial backup. Here are a few other reasons why a target deduplication appliance is important when using Veeam.

  1. The deduplication is limited to a job. Veeam's manual recommends putting VMs created from the same template into a single job to achieve a good dedupe rate. It is true that VMs created from the same template have a lot of redundant OS files and whitespace, so the dedupe rate will look good at the beginning. But these are just the skins or shells of your enterprise production data. The real meat is the actual data, which is much less likely to be the same across multiple VMs. We are better off giving that task to the real deduplication engines!
  2. Let us say you have a job with 20 production VMs. You are going to install something new on one of the VMs, so you prefer to take a one-time backup before making any changes. Veeam requires you to create a new job to do this. This is not only inconvenient, but you also lose the advantage of incremental backup: you have to stream the entire VM again. Can we afford this in a production environment?
  3. Veeam incremental backups are heavily dependent on the vCenter server. If you move a VM from one vCenter to another, or if you have to rebuild your vCenter (Veeam cannot protect an enterprise grade vCenter running on a physical system, but let us not go there for now), you need to start seeding full backups for all your VMs again. For example, if you want to migrate from a traditional vCenter server running 4.x to a vCSA 5.0, expect to reseed all the backups.

My point is that, with these limitations, Veeam deduplication is not something you can count on to protect a medium to large environment. It is priced at $0 for a reason.

NetBackup and Backup Exec let you take advantage of target deduplication appliances to their fullest potential. As these platforms track which image has the objects the application administrator is looking for, they can retrieve just those objects from backup storage. The application administrator can self-serve their needs; no need for a 20th-century ticket system! The journey to the cloud starts with empowering users to self-serve their needs from the cloud.

Deduplication Dilemma: Veeam or Data Domain?

Recently I came across a blog post from Szamos Attila. He ran a deduplication contest between Data Domain and Veeam. His was a very small test environment, just 12 virtual machines with 133GB of data, but his observations were significant. I thought I would share them here.

Veeam vs. Data Domain deduplication contest run by Szamos Attila

You can read more about Mr. Attila's tests at his blog here.

What does this tell you right off the bat? Well, Veeam's deduplication is slow; but they do not charge for deduplication separately, so it's not a big deal, right?

Not exactly; there is much more to this story if you take a look at the big picture.

First of all, note that this is a very small data set (just 133GB; even my laptop has more data!). Veeam's deduplication is not really a true deduplication engine that fingerprints data segments and stores only one copy. It is basically a data reduction technique that works only on a predefined set of backup files. Veeam refers to this set of backup files as a backup repository. You can run only one backup job to a given repository at any given time. Hence if you want to back up two virtual machines concurrently, you need to send them to two different backup repositories, and if you do that your backup data is not deduplicated across those two jobs. Thus your data reduction from Veeam's deduplication and your concurrent processing of jobs are inversely proportional to one another. This is a major drawback, as VMs generally contain a lot of redundant data. In fact, Veeam recommends running deduplication mainly for a backup set where all the VMs come from the same template.

Secondly, note that even with a single backup repository, this tiny data set (of just 133GB) took twice as long as with Data Domain's deduplication. Now think about a small business environment with a few terabytes of data and imagine the time it would take to protect it. When it comes to an enterprise data center (hundreds of terabytes), you must depend on a target based deduplication solution like Data Domain to get the job done.

So, can I simply let Veeam do the data movement and count on Data Domain to do the deduplication? That is one way to solve this problem. But that approach brings a multitude of other issues because of the way Veeam does restores.

Veeam does not have a good way to let application administrators in the guest operating system (e.g. the Exchange administrator on a VM running Microsoft Exchange) self-serve their restore needs. First the application administrator submits a ticket for a restore. Then the backup administrator mounts the VMDK files from backup using a temporary VM that starts up on a production ESX host. Even to restore a small object, you have to allocate resources for the entire VM (the marketing name for this multi-step restore is U-AIR) on the ESX host. As this VM needs to 'run' from backup storage, it is not recommended for the backup image to be on deduplicated storage served over the network. As target deduplication devices are designed for streaming data sequentially, the random I/O pattern caused by running a VM from such storage is painfully slow. This is even stated by the partners who offer deduplication storage for Veeam. HP did tests with Veeam using the HP StoreOnce target deduplication appliance and has published a white paper on this; please see the whitepaper in Business Week, particularly the section on Performance Considerations.

It should be further noted that only the most recent backup typically stays as a single image in Veeam's reverse incremental backup strategy. If you unfortunately need to restore from a copy that is not the most recent one, performance degrades further while running the temporary VM from backup storage, as a lot of random I/O needs to happen at the back end.

Even after you have patiently waited for the VM to start up from backup storage, the application administrator needs to figure out how to restore the required objects. If the object is not in the currently mounted backup image, he/she has to send another ticket to the Veeam administrator to mount a different backup image on a temporary VM. This saga continues until the application administrator finds the correct object. What a pain!

There you have it. On one side you have scalability and backup performance issues if you use Veeam's deduplication. On the other side, you have poor recovery performance and usability issues when using a target deduplication appliance with Veeam. This is the deduplication dilemma!

The good news is that target deduplication devices work well with NetBackup and Backup Exec. Both products provide user interfaces for application administrators so that they can self-serve their recovery needs. At the same time, VM backup and recovery remains agent-less. The V-Ray powered NetBackup and Backup Exec have the capability to stream the actual object from the backup; there is no need to mount it using a temporary VM.