Did Rubrik make Veeam’s Modern Data Protection a bit antiquated?

Veeam Antiquated?

Veeam actually holds a trademark on Modern Data Protection™. No, I am not joking; it is true! Veeam started with a focused strategy: do nothing but VMware VM backups. Thankfully, VMware had done most of the heavy lifting with the vStorage APIs for Data Protection (VADP), so developing a VM-only backup solution was as simple as creating a software plugin for those APIs and a storage platform for keeping the VM copies. With a good marketing engine, Veeam won the hearts of virtual machine administrators, and it paid off.

As the opportunity to reap the benefits as a niche VM-only backup vendor started to erode (intense competition, low barrier to entry on account of VADP), Veeam is attempting to reinvent its image by exploring broader use cases like physical systems protection, availability, etc. Some of these efforts make it look like its investors are hoping for Microsoft to buy Veeam. The earlier wish to sell itself to VMware was shattered when VMware adopted EMC Avamar’s storage to build its own data protection solution.

Now Rubrik is coming to market, attacking the very heart of Veeam’s little playground and making Veeam’s modern data protection a thing of the past. Rubrik’s market entry is also through VMware backups using the vStorage APIs, but with a better storage backend that can scale out.

Both Veeam and Rubrik have two high level tiers. The frontend tier connects to vSphere through VMware APIs. It discovers and streams virtual machine data. Then there is a backend storage tier where virtual machine data is stored.

For Veeam, the frontend is a standalone backup server and optional backup proxies. The proxies (thanks to VMware hot-add) enable a limited level of scale-out for the frontend, but this approach leeches resources from production and increases complexity. The backend is one or more backup repositories. There is nothing special about the repository; it is a plain file system. Although Veeam claims to have deduplication built in, it is perhaps the most primitive in the industry and works only across virtual machines from the same backup job.

Rubrik is a scale-out solution where the frontend and backend are fused together from the user’s perspective. You buy Rubrik bricks, where each brick consists of four nodes. These compute and storage components serve both the frontend, streaming virtual machines from vSphere via NBD or SAN transport (kudos to Rubrik for ditching hot-add!), and the backend, a cluster file system that spans nodes and bricks. Rubrik claims to have global deduplication across its entire cluster file system namespace.

Historically, the real innovation from Veeam was the commercial success of powering on virtual machines directly from backup storage. Veeam may list several other innovations under its belt (e.g. it may claim to have ‘invented’ agentless backups, but that was actually done by VMware in its APIs), but exporting VMs directly from backup is something every other vendor followed afterwards, so kudos go to Veeam on that one. Yet this innovation may backfire: it may help Veeam customers transition to Rubrik seamlessly.

Veeam customers are easy targets for Rubrik for a few reasons.

  • One of the cornerstones of Veeam’s foundation is its dependency on the vStorage APIs from VMware; that is not a differentiator, because all VMware partners have access to those APIs. Unlike other backup vendors, Veeam didn’t focus on building application awareness and granular quiescence until late in the game.
  • Veeam is popular in smaller IT shops and shadow projects within large IT environments. It is a handy backup tool, but it is not perceived as a critical piece in meeting regulatory and compliance needs. It has been marketed toward virtual machine administrators; hence higher-level buying centers do not have much visibility. That adversely affects Veeam’s ‘stickiness’ in an account.
  • Switching from one backup application to another has historically been a major undertaking. But that is not the case if customers want to switch from Veeam to something else. In earlier days, IT shops needed to stand up both solutions until all backup images from the old solution hit their expiration dates, or develop strategies to migrate old backups into the new system, a costly affair. When the source is Veeam, with 14 recovery points per VM by default, you could build workflows that spin up each VM backup in a sandbox and let the new solution back it up as if it were a production copy. (Rubrik may want to build a small migration tool for this.)
  • Unlike Veeam, which stitched on support for other hypervisors and physical systems after the fact, Rubrik has architected its platform to accommodate future needs. That design may appeal to customers looking to diversify into other hypervisors and containers.
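The migration workflow sketched in the third bullet can be expressed as a simple planner. This is a hypothetical sketch, not any vendor’s tooling: the catalog layout, task names and step descriptions are all assumptions made for illustration.

```python
# Hypothetical sketch of the Veeam-to-new-solution migration workflow described
# above: spin up each retained restore point in a sandbox and re-back it up.
# The catalog structure and step names are assumptions, not a real Rubrik/Veeam API.

from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationTask:
    vm_name: str
    restore_point: str   # e.g. an ISO-8601 timestamp of the old backup
    steps: tuple


def plan_migration(old_catalog: dict) -> list:
    """Build an ordered task list: oldest restore point first, so the new
    solution sees each VM's history in chronological order."""
    tasks = []
    for vm_name, restore_points in sorted(old_catalog.items()):
        for rp in sorted(restore_points):
            tasks.append(MigrationTask(
                vm_name=vm_name,
                restore_point=rp,
                steps=(
                    "instant-recover VM from old backup into an isolated sandbox",
                    "back up the sandboxed VM with the new solution",
                    "power off and discard the sandboxed VM",
                ),
            ))
    return tasks


# Example: two VMs with a couple of retained restore points each.
catalog = {
    "web01": ["2015-06-20T02:00", "2015-06-21T02:00"],
    "db01": ["2015-06-21T02:00"],
}
plan = plan_migration(catalog)
print(len(plan))   # 3 tasks
```

In practice each emitted task would drive real instant-recovery and backup APIs; the point is only that the per-restore-point loop is simple enough to automate.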

The fine print is that Rubrik is yet to be proven. If the actual product delivers on its promises, it may well have antiquated Veeam. The latter may become a good case study for business schools on the risk of building a product that depends too much on someone else’s technology.

Thanks to #VFD5 TechFieldDay for sharing Rubrik’s story. You can watch it here: Rubrik Technology Deep Dive

Disclaimer: I work for Veritas/Symantec, opinions here are my own.

No point in locking the door when walls have fallen

Security for information assets is crucial for business continuity. Corporate information in the wrong hands can compromise the survival of an organization which is why security must be considered wherever that information lives. The last stop for the protection and recovery of this information is backup.  It protects against the loss of information from hardware failures, logical corruptions and human errors.  Unfortunately, the security of backup information can often be overlooked. As a leader in backup and security, Symantec advocates a holistic approach to managing and securing corporate information.

Let us consider a few of the key components for secure enterprise backup solutions.

  • The solution must securely transfer data from source to backup storage.
  • The solution must store the backup image securely.
  • It should offer authentication and authorization built-in to control access.  You don’t want an intruder or a client system masquerading as a production client and retrieving data from backups.
  • It must protect itself completely. If the system hosting the solution suffers a hardware failure, it must be able to recover image metadata, security certificates, access control rules and other important data so that the solution once again controls access to backup images.
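The third requirement above (built-in authentication so a rogue client cannot masquerade as a production client) can be illustrated with a minimal sketch. Real products use security certificates, as noted later for Backup Exec; the HMAC scheme, client names and key handling below are assumptions for illustration only.

```python
# Minimal sketch of the authentication requirement above: a restore request is
# honored only if it carries a MAC computed with that client's secret key, so a
# rogue host cannot masquerade as a production client and pull backup data.

import hashlib
import hmac

# Per-client keys issued at enrollment time (hypothetical values).
CLIENT_KEYS = {"prod-client-01": b"s3cret-key-issued-at-enrollment"}


def sign_request(client: str, request: str, key: bytes) -> str:
    return hmac.new(key, f"{client}|{request}".encode(), hashlib.sha256).hexdigest()


def authorize_restore(client: str, request: str, mac: str) -> bool:
    key = CLIENT_KEYS.get(client)
    if key is None:
        return False  # unknown client: no key was ever issued
    expected = sign_request(client, request, key)
    return hmac.compare_digest(expected, mac)  # constant-time comparison


good = sign_request("prod-client-01", "restore:/vm/db01", CLIENT_KEYS["prod-client-01"])
print(authorize_restore("prod-client-01", "restore:/vm/db01", good))   # True
print(authorize_restore("rogue-host", "restore:/vm/db01", good))       # False
```

A client that never enrolled has no key, so it cannot produce a valid MAC no matter what hostname it claims.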

While there are more considerations that could be listed, these underscore the broader requirements for an enterprise backup solution.  Backup is much more than moving data as quickly as possible from source to backup storage.  Other functions need to exist to ensure the security of backup images and to prevent the production data embedded in a backup image from falling into wrong hands.

Moving data to the cloud is a great use case to consider in more detail. In an era where organizations are looking to cut operating costs by moving information to the cloud, attention must be paid to the protection of user data from end-to-end, regardless of whether it is on primary storage on a VM hosted for the user or sent to a backup storage.  Choosing a backup solution that fails to deliver security for backup images may result in corporate embarrassment, liability and loss of business.

I came across an interesting post from Mike Beevor, who works for Veeam as a Systems Engineer. You can read the details here. Mike put together a nice article consolidating the scripted responses needed when an IT security team is evaluating the risks of using Veeam for VM backups in a secured environment. Unfortunately, he seems to have left one huge hole unattended. If a tornado knocks down the walls, is there any point in putting locks on the rest of the doors? Let me explain what I mean.

Locking the front door when walls have fallen! Illustration by Scott G.

The failure point in question is the Veeam Backup File (VBK). When production VMs with precious data are backed up by Veeam, it stores the virtual machine files in a container file with the extension .vbk. This file is kept on a plain file system on a backup server with direct-attached storage, a SAN-attached array or a NAS device.

Most production VMs will have one VMDK file for the operating system and one or more VMDK files for data. A utility was originally developed to provide users with a way to import a VM from a VBK backup file directly into vSphere. Veeam created this process because the backup solution does not offer a good way to protect itself (see rule number 4 above: protect the backup and recovery system). A person who gets a copy of a VBK file can import the VM contained in the file onto his/her own ESXi host, detach the data disk and mount it on his/her own VM to get access to production data! Veeam does not provide any sort of encryption for VBK files. Unfortunately, the only way to recover individual objects from a Veeam backup is to run the entire VM from backup storage.

The lack of security for the container file makes it easy for anyone to retrieve data. Users of Veeam are already concerned about this weakness, yet posts in the Veeam forums related to this issue seem to be conveniently moderated out. Here is an example of a post which appeared in a Google search before it was deleted ahead of the Veeam 6 launch. Eventually a modified response appeared stating that Veeam will consider this as a future feature.

 

Customers requesting enhanced security for VBK, moderated thread reappeared recently

The user is asking for enhanced VBK security through the use of password protection.  It is a step in the right direction. Hopefully, Veeam will work on this soon.

As you can imagine, this is a huge security hole especially as users virtualize more mission critical applications.  Unfortunately, Veeam 6 makes this problem even worse. Now these VBK files are scattered around multiple repository hosts thereby increasing the chances for exposure.  

What to do if you are currently using Veeam for backups?

  • Enable file system level encryption on all repositories if overall performance after encryption is acceptable.
  • When using a NFS/CIFS based deduplication device for storage (e.g. Data Domain), enable encryption within the device.
  • Make sure that the NFS/CIFS shares are exposed only to proxy servers and Veeam servers. NAS devices’ default export policies are generally read-only access for ‘world’. In the case of VBK files this read-only access is enough to compromise production data in backups.
  • Harden passwords on all Veeam backup servers, repository servers and proxy servers.  Do not use the same password on all repository servers.
  • Talk to your security team and leverage their investments. For example, the security team may have the Symantec Critical System Protection suite. Install CSP agents on backup servers and repository servers to provide non-signature-based Host Intrusion Prevention. It protects against zero-day attacks using granular OS hardening policies along with application, user and device controls.
  • Consider switching to a backup solution that offers encryption for both data in flight and data at rest. In fact, you may already be using another backup solution to back up Veeam backup files. Most backup applications offer vADP integration with VMware. Did you know that Backup Exec is the #1 solution for backing up Veeam backup files? Veeam’s Doug Hazelman admitted this to Curtis Preston.
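The advice above about export policies and repository permissions is easy to spot-check. Here is a small audit sketch (assumptions: a POSIX file system, and that backup containers carry .vbk/.vib extensions) that walks a repository and flags any backup file readable by ‘world’:

```python
# Audit sketch for the hardening advice above: walk a Veeam repository and flag
# any .vbk/.vib container files that are world-readable (POSIX 'other' read bit).
# The repository layout and extensions are assumptions; adapt to your environment.

import os
import stat
import tempfile


def world_readable_backups(repo_root: str, exts=(".vbk", ".vib")):
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            if not name.lower().endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if mode & stat.S_IROTH:  # readable by everyone
                exposed.append(path)
    return exposed


# Demo on a throwaway directory: one exposed file, one locked down.
repo = tempfile.mkdtemp()
for name, mode in (("job1.vbk", 0o644), ("job2.vbk", 0o600)):
    path = os.path.join(repo, name)
    open(path, "w").close()
    os.chmod(path, mode)

exposed = world_readable_backups(repo)
print([os.path.basename(p) for p in exposed])   # ['job1.vbk']
```

Any file this audit flags is one a casual intruder could copy and import into his/her own ESXi host, per the attack described earlier.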

Veeam has done similar things in the past to conveniently hide the root of the problem with deflection techniques. Remember DCIG’s Jerome Wendt, who uncovered the real motive behind SureBackup? Learn more about his discovery here.

Naturally, the next question is how a leader like Symantec provides security for backups.  Let us use Symantec’s Backup Exec as an example.

 

  • When making use of a deduplication folder (Backup Exec’s built-in deduplication), it is not possible for an intruder to identify specific backup images even if he gains access to the file system where the folder resides. If he decides to steal the entire folder, he cannot import the images in this folder to an alternate system without the credentials needed to access the deduplication folder.
  • When sending backups to tape, software and hardware encryption (T10 encryption standard) are supported. Thus you do not have to worry about information getting leaked even if tapes are stolen.
  • Backup Exec uses security certificates between clients and media servers. It is not possible for an intruder to masquerade as a client and request a restore of production data.
  • Self-protection: Backup Exec not only protects the production data, but it has the capability to protect itself against hardware failures or human errors.

 

vPower: brand new solution, really?

When I started exploring AIX nearly eight years ago, there were two things that fascinated me right off the bat. I was already a certified professional for Solaris at that time, and I had also managed Tru64 UNIX and HP-UX, mainly for Oracle workloads. Those were the days of tuning shared memory, message queue and semaphore parameters. During my days working as a contractor for a large financial institution, and later for VERITAS/Symantec NetBackup technical support, tuning the UNIX kernel’s IPCS parameters was more the norm than the exception. AIX intrigued me because it featured a dynamic kernel! That was a really big deal for the kind of job I used to do!

The second thing that looked unique in comparison with rest of the UNIX platforms was AIX’s mksysb. In AIX, you could send the entire rootvg (all the boot files, system files and additional data file systems you may want to include in the root volume group) to a backup tape. When you need to restore your system from bare metal, you simply boot from tape medium and run the installer; your system is back to the same point in time when you did the mksysb backup. Furthermore, if needed, you can also boot from tape and restore selected files with a little help from tape positioning commands.

I went on to get certified on AIX, not just because of those two bells and whistles, but VERITAS Storage Foundation was expanding to AIX and it was a good thing to add AIX certification when we integrated its snapshot capabilities in NetBackup.

The mksysb approach started to become obsolete for two reasons.

  1. It is expensive to have a standalone tape drive with every pSeries system; not just because of the need for a tape drive on each system, but also because of the increased operational expenditure for system administrators to manually track tapes with mksysb images for each system and maintain a time-series catalog of all images.
  2. Enterprise data protection solutions like NetBackup added Bare Metal Restore (BMR) support. The NetBackup BMR feature makes it possible to recover any physical system (be it AIX, HP-UX, Solaris, Linux, Windows…) from bare metal just by running a single command on the master server to tell NetBackup that a client needs to be rebuilt from bare metal. You also have the option to specify whether you need to bring the client to the most recent point in time (suitable for hardware failures) or to a point in time from the past (suitable for logical corruptions that happened before the most recent backup). After that you simply reboot the client. The client boots from the network and recovers itself. The process is 100% unattended once the reboot is initiated.

What about virtual machines? You can indeed use NetBackup BMR feature on virtual machines. It is supported. The availability of deeper integration with VMware vADP and Hyper-V VSS makes it possible to perform agent-less backups of virtual machines whereby you could restore the entire VM or individual objects. Hence you don’t need it for VMs hosted by those hypervisors. You can use NetBackup BMR for VMs on other hypervisors like Citrix XenServer, IBM PowerVM, Oracle VM Server etc.  With NetBackup BMR and NetBackup Intelligent Deduplication, you have a solution no matter how many kinds of hypervisors are powering your clouds.

Why this story? Recently, during the after-party of a PR event hosted by Intel, I had a conversation with an old friend. He works for an organization that happens to be a partner of Veeam. He mentioned that Veeam and Vizioncore are having a patent battle over the ability to run a system directly from the backup image. Veeam calls this feature vPower; Vizioncore calls it FlashRestore. This technology is really the virtual machine version of what IBM offered for AIX pSeries systems: you boot and run the system directly from the backup image and recover the whole system or selected files. The value additions, like the flexibility to keep the VM running while live migrating it to production storage, come from VMware’s innovative Storage vMotion technology, which isn’t really something Veeam or Vizioncore can take credit for. Vizioncore may not have much difficulty fighting this battle.

We had a good laugh when we pulled up Veeam’s marketing pitch on U-AIR, which is nothing but running the VM from backup and copying the required application files back to the production VM over the wire. He raised his iPad to show Veeam’s datasheet to the group.

“vPower also enables quick recovery of individual objects from any virtualized application, on any OS. It’s a brand-new solution to the age-old problem of what to do when users accidentally delete important emails or scripts incorrectly update records.”

Brand new solution for the age-old problem, really?

Deduplication for dollar zero?

A data protection expert asked me a question after reading my blog on Deduplication Dilemma: Veeam or Data Domain.

I am paraphrasing his question, as our conversation was limited to 140 characters at a time on Twitter.

“Have you seen this best practice blog on Veeam with ExaGrid? Here is the blog. It says not to do reverse incremental backups. The test Mr. Attila ran was incomplete. Veeam’s deduplication on the first pass is poor, but after that it is worth it, right?”

These are all great questions. I thought of dissecting each aspect and share it here. Before I do that I want to make it clear that deduplication devices are fantastic for use in backups. These work great with backup applications that really offer the ability to restore individual objects. If the backup application ‘knows’ how to retrieve specific objects from backup storage, target deduplication adds a lot of value.  That is why NetBackup, Backup Exec, TSM, NetWorker and the like play well with target deduplication appliances. Veeam, on the other hand, simply mounts the VMDK file from backup store and asks the application administrator to fish for the item he/she is looking for. This is where Veeam falls apart if you try to deploy it in medium to large environments. Although target deduplication appliances are disk based, they are optimized more for sequential access as backup jobs mostly follow sequential I/O pattern. When you perform random I/O on these devices (as it happens when a VM is directly run from it), there is a limit to which those devices can perform.
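The difference described above comes down to cataloging. A backup application that records where each object lives inside an image can restore that object with one small sequential read; without a catalog, the whole VM must be mounted and run from backup storage. The catalog layout below is an illustration, not any vendor’s format:

```python
# Sketch of the point above: with a catalog mapping object -> (offset, length)
# inside a backup image, restoring one object is a single small read, instead
# of mounting and running the whole VM from backup storage.

import io

# A "backup image" holding three objects back to back (made-up contents).
image = io.BytesIO(b"mail001.edb-data" b"doc-a.txt-data" b"cfg.xml-data")

# Catalog built at backup time: object name -> (offset, length) in the image.
catalog = {
    "mail001.edb": (0, 16),
    "doc-a.txt": (16, 14),
    "cfg.xml": (30, 12),
}


def restore_object(name: str) -> bytes:
    offset, length = catalog[name]   # no mounting, no temporary VM
    image.seek(offset)
    return image.read(length)


print(restore_object("doc-a.txt"))   # b'doc-a.txt-data'
```

This is also why sequential-optimized deduplication appliances pair well with cataloging backup applications: an object restore is one short, mostly sequential read, not the random I/O of a running VM.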

Exagrid: a great company helping out customers

ExaGrid has an advantage here. It has the flexibility to keep the most recent backup in hydrated form (ExaGrid uses post-process deduplication), which works well with Veeam if you employ reverse incremental backups. In reverse incremental backups, the most recent backup is always a full backup. You can eliminate the performance issues inherent in mounting the image on an ESX host when the image is served in hydrated form. This is good from the recovery performance perspective. However, ExaGrid recommends not turning on the reverse incremental method because it burdens the appliance during backups. This is another dilemma: you have to pick backup performance or recovery performance (RTO), not both.
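For readers unfamiliar with the scheme, the reverse incremental idea can be shown in a toy sketch: the newest restore point is always a full (hydrated) copy, and older points are kept as reverse deltas off the copy that replaced them. The block model below is a simplification for illustration, not Veeam’s on-disk format:

```python
# Toy sketch of reverse incremental backups: the newest point is always a full
# copy; each older point is reconstructed by applying reverse deltas backwards.
# A VM's blocks are modeled as a dict of block-id -> contents.

def reverse_delta(old: dict, new: dict) -> dict:
    """Blocks needed to turn `new` back into `old` (changed plus deleted blocks)."""
    delta = {k: v for k, v in old.items() if new.get(k) != v}
    delta.update({k: None for k in new if k not in old})  # None = drop this block
    return delta


def ingest(chain: dict, new_full: dict) -> None:
    if chain["full"] is not None:
        chain["deltas"].append(reverse_delta(chain["full"], new_full))
    chain["full"] = dict(new_full)  # newest point stays fully hydrated


def restore(chain: dict, points_back: int) -> dict:
    """Walk backwards from the newest full, applying reverse deltas."""
    state = dict(chain["full"])
    for delta in reversed(chain["deltas"][len(chain["deltas"]) - points_back:]):
        for block, data in delta.items():
            if data is None:
                state.pop(block, None)
            else:
                state[block] = data
    return state


chain = {"full": None, "deltas": []}
ingest(chain, {"b0": "os-v1", "b1": "data-v1"})
ingest(chain, {"b0": "os-v1", "b1": "data-v2", "b2": "new"})
print(restore(chain, 0))  # newest point, served straight from the full copy
print(restore(chain, 1))  # one point back, rebuilt by applying one delta
```

The sketch makes the trade-off visible: restoring the newest point touches only the hydrated full, while every step further back adds delta application, which is why older restore points get slower.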

Let me reiterate: the problem here is not with ExaGrid. They are sincerely trying to help customers who happened to choose Veeam. ExaGrid is doing the right thing; you want to find methods to help customers achieve ROI no matter what backup solution they ended up choosing. I take my hat off to ExaGrid.

Now let us take a closer look at other recommendations from Exagrid to alleviate the pain points with Veeam.

Turn off compression in Veeam and optimize for local target: Note that ExaGrid suggests turning off compression and choosing the ‘Optimize for local target’ option. These settings have the effect of eliminating most of what Veeam’s deduplication offers. By choosing those options, you let the real deduplication engine (the ExaGrid appliance) do the work.

Weren’t Mr. Attila’s tests incomplete?

Mr. Attila stopped his tests after the initial backup. The advantage of deduplication is visible only on subsequent backups, so his tests weren’t complete. However, as I stated in the blog, that test simply triggered my own research; I wasn’t basing my opinions just on Mr. Attila’s tests. I should have mentioned this in the earlier blog, but it was already becoming too long.

As I mentioned in the blog earlier, Veeam deduplication capabilities are limited. Quoting Exagrid this time: “Once the ExaGrid has all the data, it can look at the entire job at a more granular level and compress and dedupe within jobs and also across jobs! Veeam can’t do this because it has data constantly streaming into it from the SAN or ESX host, so it’s much harder to get a “big picture” of all the data.”   

If Veeam’s deduplication is the only thing you have, the problem is not just limited to the initial backup. Here are a few other reasons why a target deduplication is important when using Veeam.

  1. The deduplication is limited to a job. Veeam’s manual recommends putting VMs created from the same template into a single job to achieve a good dedupe rate. It is true that VMs created from the same template have a lot of redundant OS files and whitespace, so the dedupe rate will be good at the beginning. But these are just the skins or shells of your enterprise production data. The real meat is the actual data, which is far less likely to be the same across multiple VMs. We are better off giving that task to a real deduplication engine!
  2. Let us say you have a job with 20 production VMs. You are going to install something new on one of the VMs, so you prefer to do a one-time backup before making any changes. Veeam requires you to create a new job to do this. This is not only inconvenient; you also lose the advantage of incremental backup and have to stream the entire VM again. Can we afford this in a production environment?
  3. Veeam incremental backups are heavily dependent on the vCenter server. If you move a VM from one vCenter to another, or if you have to rebuild your vCenter (Veeam cannot protect an enterprise-grade vCenter running on a physical system, but let us not go there for now), you need to start seeding full backups for all your VMs again. For example, if you want to migrate from a traditional vCenter server running 4.x to a vCSA 5.0, expect to reseed all the backups.

My point is that Veeam deduplication is not something you can count on to protect a medium to large environment with these limitations. It has the price of $0 for a reason.

NetBackup and Backup Exec let you take advantage of target deduplication appliances to their fullest potential. As these platforms track which image has the objects the application administrator is looking for, they can simply retrieve those objects alone from backup storage. Application administrators can self-serve their needs; no need for a 20th-century ticket system! The journey to the Cloud starts with empowering users to self-serve their needs from the Cloud.

Deduplication Dilemma: Veeam or Data Domain?

Recently I came across a blog post from Szamos Attila. He ran a deduplication contest between Data Domain and Veeam. His was a very small test environment, just 12 virtual machines with 133GB of data, but his observations were significant. I thought I would share them here.

Veeam vs. DataDomain Deduplication Contest run by Szamos Attila

You can read more about Mr. Attila’s tests at his blog here.

What does this tell you right off the bat? Well, Veeam’s deduplication is slow; but since Veeam does not charge for deduplication separately, that is not a big deal, right?

Not exactly; there is much more to this story if you take a look at the big picture.

First of all, note that this is a very small data set (just 133GB; even my laptop has more data!). Veeam’s deduplication is not really a true deduplication engine that fingerprints data segments and stores only one copy. It is basically a data reduction technique that works only on a predefined set of backup files. Veeam refers to this set of backup files as a backup repository. You can only run one backup job to a given repository at any given time; hence if you want to back up two virtual machines concurrently, you need to send them to two different backup repositories. If you do that, your backup data is not deduplicated across those two jobs. Thus your data reduction and your concurrent processing of jobs are inversely proportional to one another. This is a major drawback, as VMs generally contain a lot of redundant data. In fact, Veeam recommends running deduplication mainly on a backup set where all the VMs come from the same template.
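The scope limitation above is easy to quantify with a sketch: fingerprint the same blocks once with a pool per job (repository-scoped, as described for Veeam) and once with a single shared pool (as a global deduplication engine would). The block contents are made up for illustration:

```python
# Sketch of deduplication scope: identical OS blocks shared by two jobs are
# stored twice under per-job scoping, but only once under global scoping.

import hashlib


def unique_blocks(jobs: list) -> int:
    """Global scope: one fingerprint pool shared by every job."""
    pool = set()
    for job in jobs:
        for block in job:
            pool.add(hashlib.sha256(block).hexdigest())
    return len(pool)


def per_job_blocks(jobs: list) -> int:
    """Per-job scope: each job keeps its own pool, so shared blocks are stored again."""
    return sum(unique_blocks([job]) for job in jobs)


# Two jobs whose VMs share the same template OS blocks but hold distinct data.
os_blocks = [b"kernel", b"libs", b"whitespace"]
job_a = os_blocks + [b"exchange-db"]
job_b = os_blocks + [b"sql-db"]

print(per_job_blocks([job_a, job_b]))   # 8: the three OS blocks stored twice
print(unique_blocks([job_a, job_b]))    # 5: the three OS blocks stored once
```

The gap grows with the number of jobs, which is exactly why splitting VMs across repositories to get concurrency erodes the data reduction.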

Secondly, note that even with a single backup repository; this tiny data set (of just 133GB) took twice as long as Data Domain’s deduplication. Now think about a small business environment with a few terabytes of data. Imagine the time it would take to protect that data. When it comes to an enterprise data center (100s of terabytes); you must depend on a target based deduplication solution like Data Domain to get the job done.

So, can I simply let Veeam do the data movement and count on Data Domain to do the deduplication? That is one way to solve the problem. But you run into a multitude of other issues with that approach because of the way Veeam does restores.

Veeam does not have a good way to let application administrators in the guest operating system (e.g. the Exchange administrator on a VM running Microsoft Exchange) self-serve their restore needs. First the application administrator submits a ticket for a restore. Then the backup administrator mounts the VMDK files from backup using a temporary VM that starts up on a production ESX host. Even to restore a small object, you have to allocate resources for the entire VM (the marketing name for this multi-step restore is U-AIR) on the ESX host. As this VM needs to ‘run’ from backup storage, it is not recommended to keep the backup image on deduplicated storage served over the network. Because target deduplication devices are designed for streaming data sequentially, the random I/O pattern caused by running a VM from such storage is painfully slow. This is even stated by the partners who offer deduplication storage for Veeam. HP ran tests with Veeam using the HP StoreOnce target deduplication appliance and published a white paper on this; please see the whitepaper in Business Week, particularly the section on Performance Considerations.

It should be further noted that only the most recent backup typically stays as a single image in Veeam’s reverse incremental backup strategy. If you unfortunately need to restore from a copy that is not the most recent, performance degrades further while running the temporary VM from backup storage, as a lot of random I/O needs to happen at the back end.

Even after you have patiently waited for the VM to start up from backup storage, the application administrator still needs to figure out how to restore the required objects. If the object is not in the currently mounted backup image, he/she has to send another ticket to the Veeam administrator to mount a different backup image on a temporary VM. This saga continues until the application administrator finds the correct object. What a pain!

There you have it. On one side you have scalability and backup performance issues if using Veeam’s deduplication. On the other side, you have poor recovery performance and usability issues when using a target deduplication appliance with Veeam. This is the deduplication dilemma!

The good news is that target deduplication devices work well with NetBackup and Backup Exec. Both products provide user interfaces for application administrators so that they can self-serve their recovery needs. At the same time, VM backup and recovery remains agentless. The V-Ray powered NetBackup and Backup Exec have the capability to stream the actual object from the backup; there is no need to mount it using a temporary VM.