Introduction to VMware vStorage APIs for Data Protection aka VADP

6. Getting to know NetBackup for VMware vSphere

Note: This is an extended version of my blog in VMware Communities: Where do I download VADP? 

Now that we have talked about NetBackup master servers and media servers, it is time to learn how the NetBackup client on a VMware backup host (sometimes known as a VMware proxy host) protects the entire vSphere infrastructure. To get there, we first need a primer on vStorage APIs for Data Protection (VADP) from VMware. We will use two blogs to cover this topic.

Believe it or not, the question of what VADP really is comes up quite often in VMware Communities, especially in the Backup & Recovery Discussions forum.

Backup is like an insurance policy. You don’t want to pay for it, but not having it is a recipe for sleepless nights. You need to protect the data on your virtual machines to guard against hardware failures and user errors. You may also have regulatory and compliance requirements to protect data for the longer term.

With modern-day hardware and cutting-edge hypervisors like VMware's, you can protect data just by running a backup agent within the guest operating system. In fact, for certain workloads, this is still the recommended way.

VMware has made data protection easy for administrators. That is what vStorage APIs for Data Protection (VADP) is about. Available since the vSphere 4.0 release, it is a set of APIs (Application Programming Interfaces) made available by VMware to independent backup software vendors. These APIs make it possible for backup software vendors to embed the intelligence needed to protect virtual machines without installing a backup agent within the guest operating system. Through these APIs, the backup software can create snapshots of virtual machines and copy those to backup storage.
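
To make the idea concrete, here is a toy Python sketch of the snapshot-then-copy flow that VADP enables. Everything here (the `FakeVM` class and its method names) is illustrative only; real VADP-based backups go through vSphere and VMware's SDKs, not these calls:

```python
class FakeVM:
    """Toy stand-in for a virtual machine with one virtual disk."""
    def __init__(self, disk: bytearray):
        self.disk = disk          # the live disk keeps receiving writes
        self.snapshot = None      # point-in-time copy, when one exists

    def create_snapshot(self):
        # VADP asks the hypervisor to quiesce and snapshot the VM;
        # here we simply freeze a copy of the disk contents.
        self.snapshot = bytes(self.disk)

    def delete_snapshot(self):
        self.snapshot = None

def vadp_style_backup(vm: FakeVM) -> bytes:
    vm.create_snapshot()            # 1. snapshot the VM
    image = vm.snapshot             # 2. read the disk from the snapshot
    vm.disk[-1:] = b"2"             # (a write arriving mid-backup...
    assert vm.snapshot == image     # ...does not change the snapshot)
    vm.delete_snapshot()            # 3. release the snapshot
    return image

backup = vadp_style_backup(FakeVM(bytearray(b"application data v1")))
print(backup)   # b'application data v1' -- a consistent point-in-time image
```

The point of the snapshot step is consistency: writes that land on the live disk after the snapshot is taken do not disturb the point-in-time copy the backup software reads from.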

Okay, now let us come to the point. As a VMware administrator, what do I need to do to make use of VADP? Where do I download VADP? The answer: there is nothing to download. You just need to do two things.

  1. Ensure that you are NOT using hosts with free ESXi licenses
  2. Choose a backup product that has support for VADP

The first one is easy to determine. If you are not paying anything to VMware, the chances are that you are using free ESXi. In that case, the only way to protect data in VMs is to run a backup agent within the VM. No VADP benefits.

Choosing a backup product that supports VADP can be tricky. If your organization is migrating to a virtualized environment, see what backup product is currently in use for protecting physical infrastructure. Most of the leading backup vendors have added support for VADP. Symantec NetBackup, Symantec Backup Exec, IBM TSM, EMC NetWorker, CommVault Simpana are examples.

If you are not currently invested in a backup product (say, you are working for a start-up), there are a number of things you need to consider. VMware has a free product called VMware Data Recovery (VDR) that supports VADP. It is an easy-to-use virtual appliance with which you can schedule backups and store them on deduplicated storage. There are also point products (Quest vRanger, Veeam Backup & Replication, etc.) which provide additional features. All these products are good for managing and storing backups of virtual machines on disk for shorter retention periods. However, if your business requirements call for long-term retention, you would need another backup product to protect the backup repositories of these VM-only solutions, which can be a challenge. Moreover, it is unlikely that your business is 100% virtualized. You are likely to have NAS devices for file serving, desktops and laptops for end users and so on. Hence a backup product that supports both physical systems and VADP is ideal in most situations.

Although VADP support is available from many backup vendors, the devil is in the details. Not all solutions use VADP the same way. Furthermore, many vendors add innovative features on top of VADP to make things better.  We will cover this next.

Back to NetBackup 101 for VMware Professionals main page

Next: Coming Soon!

Turning cheap disk storage into an intelligent deduplication pool in NetBackup

5. NetBackup Intelligent Deduplication Pool

Deduplication for backups does not need an introduction. In fact, deduplication is what made disk storage a viable alternative to tape. Deduplication storage is available from several vendors in the form of pre-packaged storage and software. Most of the backup vendors also provide some level of data reduction using deduplication or deduplication-like features.

Often we hear that backups of virtual environments are ideal for deduplication. While I agree with this statement, several articles tend to give the wrong perception when it comes to why it is a good idea.

The general wisdom goes like this: as there are many instances of guest operating systems, there are many duplicate files, and hence deduplication is recommended. A vendor may use this reasoning to sell you a deduplication appliance or to differentiate their backup product from others. This is a short-sighted view. First of all, multiple instances of the same version of an operating system are possible even when your environment is not virtualized; hence that argument is weak. Secondly, operating system files contribute less than 10% of your data in most virtual machines hosting production applications. Hence if a vendor tells you that you need to group virtual machines from the same template into the same backup job to make use of ‘deduplication’, what they provide is not true deduplication. Typically such techniques simply combine the block-level change tracking provided by vStorage APIs for Data Protection (VADP) with excessive compression. Data reduction does not go beyond a given backup job.
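
The distinction matters in practice. Here is a hedged Python sketch of what segment-level deduplication does (fixed 4KB segments fingerprinted with SHA-256; the sizes and class names are invented for illustration): identical segments are stored once, no matter which backup job they arrive in.

```python
import hashlib

class DedupPool:
    """Toy global deduplication pool: each unique segment is stored once."""
    def __init__(self, segment_size: int = 4096):
        self.segment_size = segment_size
        self.store = {}                      # fingerprint -> segment bytes

    def ingest(self, job_data: bytes) -> int:
        """Ingest one backup job; return bytes of new physical storage used."""
        written = 0
        for i in range(0, len(job_data), self.segment_size):
            seg = job_data[i:i + self.segment_size]
            fp = hashlib.sha256(seg).hexdigest()
            if fp not in self.store:         # duplicate segments cost nothing,
                self.store[fp] = seg         # whichever job sent them
                written += len(seg)
        return written

pool = DedupPool()
os_blocks = b"A" * 8192                      # same guest OS blocks in both VMs
job1 = pool.ingest(os_blocks + b"B" * 4096)  # backup job for VM 1
job2 = pool.ingest(os_blocks + b"C" * 4096)  # separate job for VM 2
print(job1, job2)  # 8192 4096: job 2 stores only its unique 4KB segment
```

Data reduction that stops at the boundary of a single backup job would have stored the OS blocks all over again for VM 2; a global pool does not.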

Behold NetBackup Intelligent Deduplication. We talked about NetBackup media servers before. Attach cheap disk storage of your choice and turn on NetBackup Intelligent Deduplication by running a wizard. Your storage transforms into a powerful deduplication pool that deduplicates inline across multiple backup jobs. You can deduplicate at the target (i.e. the media server) or you can let it deduplicate at the source, if you have configured a dedicated VMware backup host.

Why is this referred to as an intelligent deduplication pool? When backup streams arrive, the deduplication engine sees the actual objects (files, database objects, application objects, etc.) through a set of patent-pending technologies referred to as Symantec V-Ray. Thus it deduplicates blocks after accurately identifying exact object boundaries. Compare this to third-party target deduplication devices, where the backup stream is blindly chopped up to guess the boundaries and identify duplicate segments.
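
The effect of boundary placement on deduplication can be shown in a few lines of Python. This is a deliberately simplified model (the stream layout and sizes are invented, and it does not represent how V-Ray is actually implemented): when a small header shifts the stream, fixed-offset chopping finds no duplicates, while segmenting at known object boundaries still does.

```python
import hashlib

def fingerprints(segments):
    return {hashlib.sha256(s).hexdigest() for s in segments}

def blind_chop(stream: bytes, size: int = 8):
    """Fixed-size chopping: boundaries are guessed purely by offset."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]

# Two backup streams carrying the same two file objects; the second
# stream has a 3-byte metadata header in front of them.
file_a, file_b = b"AAAAAAAA", b"BBBBBBBB"
stream1 = file_a + file_b
stream2 = b"HDR" + file_a + file_b

# Blind chopping: the header shifts every fixed boundary, so the two
# streams share no segments at all.
s1, s2 = fingerprints(blind_chop(stream1)), fingerprints(blind_chop(stream2))
print(len(s1 & s2))   # 0

# Object-aware segmentation cuts exactly at object boundaries, so
# file_a and file_b deduplicate regardless of the header.
o1 = fingerprints([file_a, file_b])
o2 = fingerprints([b"HDR", file_a, file_b])
print(len(o1 & o2))   # 2
```

Real target appliances mitigate this with variable-length chunking, but knowing the exact object boundaries removes the guesswork entirely.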

The other aspect of the NetBackup Intelligent Deduplication pool is its scale-out capability: the ability to grow storage and processing capacity independently as your environment grows. The storage capacity can be grown from 1TB to 32TB, letting you protect hundreds of terabytes of backup images. In addition, you can add media servers to do dedupe processing on behalf of the media server hosting the deduplication storage. The scale-out capability can also be extended by simply adding additional VMware backup hosts. Global deduplication occurs across multiple backup jobs, multiple VMware backup hosts and multiple media servers. It is scale-out in multiple dimensions! A typical NetBackup environment can protect multiple vSphere environments and deduplicate across the virtual machines in all of them.

Back to NetBackup 101 for VMware Professionals main page

NetBackup media servers and vSphere ESXi hosts: The real workhorses

4. OpenStorage for secondary storage; now VMware is onto the same thing for VM storage

If vCenter is the control and command center in a vSphere environment, the ESXi hosts are the workhorses doing most of the heavy lifting. ESXi hosts house VMs. They provide CPU, memory, storage and other resources for virtual machines to function. Along the same lines, NetBackup media servers make backups, restores and replications happen in a NetBackup domain under the control of the NetBackup master server. Media servers are the ones really ‘running’ various jobs.

ESXi hosts have storage connected to them for housing virtual machines. This storage allocated to ESXi hosts is called data store. More than one ESXi host can share the same data store. In such configurations, we refer to the set of ESXi hosts as an ESXi cluster.

NetBackup media servers also have storage connected to them for storing backups. More than one media server can share the same storage. NetBackup decouples storage from the media server to a greater degree than vSphere does for ESXi hosts. An ESXi host does not treat storage as intelligent. Although most enterprise-grade storage systems have intelligence built in, you still have to allocate LUNs from the storage for ESXi hosts. VMware understands that this old-school method of storage allocation (which has been used in the industry for decades) does not scale well and does not take advantage of the features and functions that intelligent storage systems can manage on their own. If you were at VMworld 2011, you may already know that VMware is taking steps to move away from the LUN-based storage model. See Nick Allen’s blog for more info. NetBackup took the lead for secondary storage half a decade back!

NetBackup is already there! Symantec announced the OpenStorage program around the time NetBackup 6.5 was released, and it revolutionized the way backups are stored on disk. Traditionally, all backup vendors treat disk using the LUN model: you allocate a LUN to the backup server and create a file system on top of it, or you present a file system to the backup server via an NFS/CIFS share. To make matters worse, some storage systems presented disk as tape using VTL interfaces. The problems with these old-school methods fall into two categories.

First of all, the backup application is simply treating the intelligent storage system as a dumping ground for backup images. There is no direct interaction between the backup system and the storage system. Thus if your storage system has the capability to selectively replicate objects to another system, the backup server does not know about the additional copy that was made. If your storage system is capable of deduplicating data, the backup server does not know about it. Thus the backup server cannot intelligently manage storage capacity. For example, free space reported at the file system layer may be 10GB, but the storage may be able to handle a 50GB backup thanks to deduplication. Similarly, expiring a 100GB backup image may not really free up that much space, but the backup server has no way of knowing this.
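
The capacity example above is simple arithmetic once the backup server can ask the storage about its data reduction. A hypothetical interface, with names and ratios invented purely for illustration:

```python
class IntelligentStorage:
    """Toy model of a dedup storage system that can answer capacity queries."""
    def __init__(self, physical_free_gb: float, dedup_ratio: float):
        self.physical_free_gb = physical_free_gb
        self.dedup_ratio = dedup_ratio       # e.g. 5.0 means 5:1 reduction

    def can_accept(self, backup_size_gb: float) -> bool:
        # The query an integrated backup server could make: will this
        # backup fit after deduplication?
        return backup_size_gb / self.dedup_ratio <= self.physical_free_gb

storage = IntelligentStorage(physical_free_gb=10, dedup_ratio=5.0)
print(storage.can_accept(50))   # True: 50GB dedupes down to about 10GB
print(storage.can_accept(60))   # False: would need about 12GB physical
```

A backup server that only sees the file system's 10GB of free space would wrongly refuse the 50GB backup.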

Secondly, general-purpose file systems and protocols like NTFS, UFS, ext3, CIFS, NFS, etc. are optimized for random access. This is a good thing for production applications, but it comes with additional overhead. Backups and restores generally follow a sequential I/O pattern with large chunks of writes and reads. For example, presenting a high-performance deduplication system like the NetBackup 5000 series appliances, Data Domain, Quantum DXi, ExaGrid, etc. as an NFS share implies unnecessary overhead, as the NFS protocol is really meant for random access.

Symantec OpenStorage addresses this problem by asking storage vendors to provide OpenStorage disk pools and disk volumes for backups. This is just like what VMware wants capacity pools and VM volumes to do for VM data stores in the future. OpenStorage is a framework through which NetBackup media servers can query, write to and read from intelligent storage systems. The API and SDK are made available to storage vendors so that they can develop plug-ins. When such a plug-in is installed on a media server, the media server gains the intelligence to see the storage system and speak its language. Now the media server can simply stream backups to the storage device (without depending on overloaded protocols) and the intelligent storage system can store them in its native format. The result is 3 to 5x faster performance and the ability to tap into other features of the storage system, like replication.
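
Conceptually, the plug-in model looks like a small contract that the media server loads at run time. The real OpenStorage API is a C SDK with its own function names; the Python below is only a sketch of the idea, with every name invented:

```python
from abc import ABC, abstractmethod

class StoragePlugin(ABC):
    """Illustrative plug-in contract: query, write and read."""
    @abstractmethod
    def query_free_space(self) -> int: ...
    @abstractmethod
    def write_image(self, name: str, data: bytes) -> None: ...
    @abstractmethod
    def read_image(self, name: str) -> bytes: ...

class VendorPlugin(StoragePlugin):
    """One vendor's plug-in, speaking its device's native language."""
    def __init__(self, capacity: int = 10**12):
        self.capacity = capacity
        self.images = {}

    def query_free_space(self) -> int:
        return self.capacity - sum(len(d) for d in self.images.values())

    def write_image(self, name: str, data: bytes) -> None:
        self.images[name] = data     # stored in the device's native format

    def read_image(self, name: str) -> bytes:
        return self.images[name]

# The media server codes only against the contract; installing a
# different vendor's plug-in swaps the storage system underneath.
plugin: StoragePlugin = VendorPlugin()
plugin.write_image("vm01_full_0001", b"backup stream bytes")
print(plugin.read_image("vm01_full_0001") == b"backup stream bytes")  # True
```

The design choice is the same one vSphere makes with storage plug-ins: keep the host generic and let the vendor-supplied module speak the device's dialect.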

In NetBackup terms, the media server is now simply a data mover: it moves data from the client to storage. Since the storage system is intelligent and the media server can communicate with it, it is referred to as a storage server. Multiple media servers can share a storage server. When backups (or other jobs like restores, duplications, etc.) need to be started, the NetBackup master server determines which media server has the least load. The selected media server then loads the plug-in and preps the storage server to start receiving backups. You can compare this to the way VMware DRS and HA work, where the vCenter server picks the least loaded ESXi host for starting a VM from a common data store.
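
The selection step can be sketched in a line of Python. This is a simplification: real NetBackup load balancing weighs more factors than a raw job count, so treat the function below as illustrative only.

```python
def pick_media_server(active_jobs: dict) -> str:
    """Pick the least loaded media server, much as DRS picks a host.
    active_jobs maps media server name -> number of running jobs."""
    return min(active_jobs, key=active_jobs.get)

load = {"media1": 7, "media2": 2, "media3": 5}
print(pick_media_server(load))   # media2
```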

Okay, so we talked about intelligent storage servers. How about dumb storage (JBOD) and tape drives? NetBackup media servers support those as well. Even in the case of a JBOD attached to the media server, NetBackup media servers make it intelligent! That story is next.

Next: Coming Soon!

Back to NetBackup 101 for VMware Professionals main page


NetBackup master server and VMware vCenter

3. The control and command center

The vCenter server is the center of an enterprise vSphere environment. Although the ESXi hosts and virtual machines will continue to function even when the vCenter server is down, enterprise data centers and cloud providers cannot afford such downtime. Without the vCenter server, crucial operations like vMotion, VMware HA, VMware FT, DRS, etc. cease to function. A number of third-party applications count on plug-ins in the vCenter server. A number of monitoring and notification functions are governed by vCenter. Hence larger enterprises and cloud providers deploy vCenter on highly redundant systems. Some use high-availability clustering solutions like Microsoft Cluster Server or VERITAS Cluster Server. Some deploy vCenter on a virtual machine protected by VMware HA that is run by another vCenter server.

The NetBackup master server plays a similar role. It is the center of a NetBackup domain. If this system goes down, you cannot do backups or restores. Unlike the vCenter server, which runs on Windows (and now on Linux), you can deploy the master server on a variety of operating systems: Windows, enterprise flavours of Linux, AIX, HP-UX and Solaris. NetBackup includes cluster agents for Microsoft Cluster Server, VERITAS Cluster Server, IBM HACMP, HP Serviceguard and Sun/Oracle Cluster for free. If you have any of these HA solutions, NetBackup lets you install the master server with an easy-to-deploy cluster installation wizard.

An enterprise vCenter server uses a database management system, usually Microsoft SQL Server, for storing its objects. NetBackup comes with Sybase ASA which is embedded in the product. This is a highly scalable application database. No need to provide a separate database management system.

In addition to the Sybase ASA database, NetBackup also stores backup image metadata (indexes) and other configuration in binary and ASCII formats. The entire collection of Sybase ASA databases, binary image indexes and ASCII-based databases is referred to as the NetBackup catalog. NetBackup provides a specific kind of backup policy, the catalog backup policy, to copy the entire catalog to secondary storage devices (disk or tape) easily. Thus even if you lose your master server, you can perform a catalog recovery to rebuild it.

In VMware, you might have dealt with vCenter Server Heartbeat. This product gives you the capability to replicate the vCenter configuration to a remote site so that you can start the replicated vCenter server at that site in case of a primary site loss. NetBackup goes a bit further. Unlike vCenter Server Heartbeat, which has an Active-Passive architecture, NetBackup provides A.I.R. (Auto Image Replication). When you turn on A.I.R. for your backups, NetBackup appends the catalog information for the backup to the backup image itself. The images are replicated using the storage device’s native replication engine. At the remote site you can have a fully functional master server (which is serving media servers and clients locally). The device in the remote master server's domain which receives A.I.R. images can automatically notify the remote master. The remote master then imports the image catalog info from storage. Unlike traditional import processes, where the entire image needs to be scanned to recreate the catalog remotely, this optimized import finishes in a matter of seconds (even if the backup image is several terabytes) because the catalog info is embedded within the image for quick retrieval. The result is Active-Active NetBackup domains at both sites. They can replicate in both directions and also act as the DR domain for each other. You can have many NetBackup domains replicating to a central site (fan-in), one domain replicating to multiple domains (fan-out) or a combination of both. This is why NetBackup is the data protection platform that cloud pilots need to master. It is evolving to serve clouds, which typically span multiple sites.
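
The reason the optimized import is fast can be shown with a toy model: ship the catalog metadata inside the image at a known location, so the remote master reads only that metadata and never scans the data. The format below is invented for illustration; the real image layout is, of course, proprietary.

```python
import json

def make_air_image(backup_data: bytes, catalog: dict) -> bytes:
    """Append catalog metadata plus an 8-byte length footer to the image."""
    meta = json.dumps(catalog).encode()
    return backup_data + meta + len(meta).to_bytes(8, "big")

def optimized_import(image: bytes) -> dict:
    """Remote master: read the footer and metadata only, however large
    the backup data itself is."""
    meta_len = int.from_bytes(image[-8:], "big")
    return json.loads(image[-8 - meta_len:-8])

data = b"\x00" * 1_000_000     # stands in for terabytes of backup data
image = make_air_image(data, {"client": "vm01", "policy": "vmware_gold"})
print(optimized_import(image)["client"])   # vm01
```

Because the import touches only the embedded metadata, its cost is independent of the backup's size; a traditional import that rescans the whole image is not.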

vCenter integrates with Active Directory to provide role based access control. Similarly NetBackup provides NetBackup Access Control that can be integrated with Active Directory, LDAP, NIS and NIS+. NetBackup also features audit trails so that you can track users’ activities.

One thing that really makes NetBackup stand out from point solutions like vRanger, Veeam, etc. is the ability of virtual machine owners (the application administrators) to self-serve their recovery needs. For example, the Exchange administrator can use the NetBackup GUI on the client, authenticate himself/herself and browse Exchange objects in backups. NetBackup presents the objects directly from its catalog or a live browse interface, depending on the type of object being requested. The user simply selects the object needed and initiates the restore. NetBackup does the rest. There are no complex ticket systems where the application owner makes a request to the backup administrator. No need to mount an entire VM on your precious ESX resources in production just to retrieve a 1KB object. No need to learn how to manipulate objects (for example, manually running application tools to copy objects from a temporary VM) and face the risks associated with user errors. All the user interfaces connect directly to the master server, which figures out what to restore and starts the job on a media server.

So NetBackup is an enterprise platform that turns a traditional VM administrator into the cloud pilot of the future. It is nice to see that NetBackup has support for a variety of hardware and operating systems. Is there a way to deploy a NetBackup domain without building a master server on my own? The answer is indeed yes! NetBackup 5200 series appliances are available for this purpose. These appliances are built on Symantec hardware and can be deployed in a matter of minutes. Everything you need to create a NetBackup master server and/or media server is available in these appliances.

Next: Coming Soon!

Back to NetBackup 101 for VMware Professionals main page

NetBackup domain and vSphere domain

2. The resemblance is uncanny

When you took your first class on VMware vSphere, you would have noticed that the VMware platform is based on a three-tier architecture. It is quite easy to learn NetBackup if you are already a certified VMware professional.

You have virtual machines; these are the lifeblood of the organization. This is where your applications are running. Multiple virtual machines are hosted by ESXi hosts. Multiple ESXi hosts are managed by a vCenter server. That is how scalability is achieved and how vSphere became an enterprise platform.

NetBackup pioneered this model more than a decade ago. It features three tiers. At the lowest level are NetBackup clients. Multiple clients may be protected by a NetBackup media server. Multiple media servers are managed by a NetBackup master server.

NetBackup and vSphere: Architecture

A NetBackup client can be a physical system (a Windows PC, a Mac, a UNIX system hosting an Oracle database, etc.) sending backup streams to a media server. In a virtual environment, it can be a virtual machine or a physical system that can read data from the VMware datastore. It is important to remember that your production virtual machines themselves do not stream backups; that operation is offloaded to a dedicated VM or a physical system known as the VMware backup host. Thus NetBackup provides agentless backups for your virtual machines.

Now let us look at the media server. In our architectural comparison, we compared a media server in a NetBackup domain to an ESXi host in a vSphere domain. Just as ESXi hosts have storage connected for serving virtual machines, a media server has storage attached to it for serving backup clients. The storage used by ESXi hosts is referred to as the data store or primary storage. It is on primary storage that your production virtual machines and applications live. The storage attached to media servers is known as secondary storage or backup storage. It is used for storing the backups.

You know that ESXi systems support multiple kinds of data stores. You have NFS data stores and VMFS data stores. You also know that VMFS can be on direct-attached, Fibre Channel SAN-attached or iSCSI SAN-attached storage. Similarly, a media server can have various kinds of secondary storage attached to it: plain disk storage, capacity-managed disk storage, deduplicated disk storage or even a tape library. The disk storage may be directly mounted on the media server or served from a dedicated storage server.

We know that multiple ESXi hosts can share the same data store. Similarly, multiple media servers can share a storage server or tape library. Just as VMware DRS can start VMs on the least loaded ESXi hosts, backup jobs are load balanced across media servers.

We know that vCenter is really the center of vSphere. vCenter is the management control station. Similarly, NetBackup master server is the center of NetBackup. Just like vCenter, master server hosts a central database and manages data protection for the entire backup environment.

In a vSphere domain, the ESXi hypervisor and VMs coexist on a physical system. In a NetBackup domain, the media server and clients are almost always on different physical systems. There are some exceptions to this rule.

vCenter is generally a separate system in enterprise environments. For smaller environments, it could also be a VM on an ESXi host. Similarly, the NetBackup master server is a separate system in large environments. It may also coexist with a media server.

It is worth mentioning that NetBackup also has a fourth tier, called NetBackup OpsCenter. NetBackup OpsCenter can do management and reporting across a number of NetBackup domains that are served by different master servers. This layer gives NetBackup even better scalability. You may have data centers across the globe. A NetBackup master server at each data center manages its own media servers, which are protecting the clients. All these master servers report into OpsCenter. It is like a super control and command center. By logging into the central OpsCenter dashboard, you get a single-pane-of-glass view of the entire data protection infrastructure.

For a very crude comparison, think about vCenter Linked Mode, which lets you work with multiple vCenter instances. OpsCenter is like Linked Mode but much superior. OpsCenter is a standalone system with its own database for management, monitoring and reporting tasks. Linked Mode is more or less glue that makes it possible to view all instances of vCenter from a single vSphere client GUI.

That is it for today! We will move on to details on each of these three layers in subsequent blogs.

Next: NetBackup Master Server vs. VMware vCenter Server

Back to NetBackup 101 for VMware Professionals main page

From VMware Administrator to Cloud Pilot

1. Your journey to the cloud

Are you an expert in virtualization? What is your title? VMware System Administrator, Virtualization Specialist, VM Administrator, vSphere guru……

Congratulations! You are on your way to the cloud. Sean Regan wrote a blog on Cloud Pilots. As organizations get into virtualizing business-critical applications, the six-figure income (or whatever is the equivalent) has to come from something bigger than managing a bunch of ESX hosts.

Why? Well, smaller environments are less likely to host their own virtual infrastructure; their computing requirements are served from the cloud. So they need you at cloud providers’ data centers. Larger businesses may have their own private clouds. Whether you are a technology-savvy college grad looking to start a career in IT or an expert in managing virtual machine environments, it will be a huge plus to add a few additional skills to your arsenal to get ready for the future.

How do we get there? If we have learned something from the recent economic downturn, the keyword is relevance. More and more organizations are looking for multi-skilled professionals. As a VMware professional you have already gained some exposure to storage. That is one area where you could sharpen the saw. Another area where you must have exposure is data protection. This is really the golden nugget for the future.

Data protection is a huge responsibility and hence comes with higher compensation. Now that cloud providers (whether serving external or internal customers) are hosting everything for mission-critical applications, their cloud infrastructure must meet RPO and RTO requirements (how much data you can afford to lose, and how quickly you must recover), which were traditionally the responsibility of a backup administrator.

If you are a vSphere professional for a small environment, you may be already using some point products to protect your virtual machine data. VMware Data Recovery (vDR), vRanger, Veeam etc. are a few examples. Now imagine your next role as a Cloud Pilot for a large organization.

  • The infrastructure is no longer a bunch of ESX hosts managed by a vCenter server
  • You are no longer exclusively a virtual machine administrator; you are in charge of protecting assets on the various platforms required to power the cloud
  • You need end-to-end visibility into the entire infrastructure; you must see both the physical and virtual layers powering the cloud
  • Data protection is no longer copying a bunch of VMDK files to disk storage. Enter the world of storage-level replication, continuous data protection, long-term archival and more

This is what inspired me to write a series of blogs for virtual machine administrators. It is easy to learn lower-end solutions like the ones mentioned earlier, as those are designed for very small environments. But those solutions cannot grow the cloud pilot in you. An enterprise data protection platform like NetBackup may look a bit intimidating at first sight, but imagine your worth and relevance if you can be the one capable of managing the cloud. So here is my attempt to bring you the concepts of NetBackup in a language you are already familiar with.

Next: NetBackup domain vs. vSphere domain

Back to NetBackup 101 for VMware Professionals main page