Snapshots: Data Backup and Recovery in a Snap

By CIOReview | Friday, August 19, 2016

Backup and recovery give a computing system a degree of time independence. The science of storage and data backup has matured impressively, in parallel with advancements in cloud technology and virtualization. The emergence of a competitive market and of software-defined storage has fueled innovation, making possible tasks that were once regarded as too expensive or herculean.

Features like snapshots and flat backups have enabled virtualization to be used as a speedy backup mechanism. Snapshots are copies of the Virtual Machine’s Disk file (VMDK) and can be used to restore VMs. They serve as an alternative to time-consuming backup procedures by enabling faster system recovery. Unlike traditional backup software, which changes the format of the backup data, snapshots maintain the original disk-based format.

The snapshot feature of virtual machines was originally introduced by VMware as a modest recovery option. Snapshots were not considered a reliable backup option, since they initially showed incompatibility issues with application servers. Lack of application-level support was also a common complaint. However, the feature has matured over the years and now significantly improves Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). Snapshot mechanisms are now application-aware: they retain information about applications, such as state, resource requirements and utilization patterns, which in turn enables optimized data layouts, tuned caching behavior and improved quality of service. The relatively newer redirect-on-write (ROW) snapshots are designed to affect applications less than the earlier copy-on-write (COW) method. Flat backups, the practice of replicating snapshots to other secure locations, address the risk that snapshots are corrupted or lost when the very storage system they reside on fails.
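The difference between the two snapshot methods can be sketched with a toy in-memory model. This is an illustration only, not any vendor's implementation: the class names `CowVolume` and `RowVolume` and the `io_ops` counter are assumptions made for this example, and real arrays handle space reclamation, multiple snapshots and metadata persistence that are omitted here.

```python
class CowVolume:
    """Toy copy-on-write volume: the first overwrite of a block after a
    snapshot copies the old data aside before writing in place."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshot = None  # block index -> preserved old data
        self.io_ops = 0       # hypothetical counter of block reads/writes

    def take_snapshot(self):
        self.snapshot = {}

    def write(self, index, data):
        if self.snapshot is not None and index not in self.snapshot:
            # Read the old block and write it to the snapshot area.
            self.snapshot[index] = self.blocks[index]
            self.io_ops += 2
        self.blocks[index] = data  # overwrite in place
        self.io_ops += 1

    def read(self, index):
        return self.blocks[index]

    def read_snapshot(self, index):
        if index in self.snapshot:
            return self.snapshot[index]
        return self.blocks[index]  # block unchanged since the snapshot


class RowVolume:
    """Toy redirect-on-write volume: new data goes to a fresh location and
    only the live block map is updated; old blocks stay where they are.
    Space is never reclaimed in this sketch."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # append-only block store
        self.live_map = list(range(len(blocks)))  # logical -> physical index
        self.snap_map = None
        self.io_ops = 0

    def take_snapshot(self):
        self.snap_map = list(self.live_map)  # metadata copy only

    def write(self, index, data):
        self.store.append(data)                    # single write, new location
        self.live_map[index] = len(self.store) - 1
        self.io_ops += 1

    def read(self, index):
        return self.store[self.live_map[index]]

    def read_snapshot(self, index):
        return self.store[self.snap_map[index]]


cow = CowVolume(["a", "b", "c"])
cow.take_snapshot()
cow.write(1, "B")

row = RowVolume(["a", "b", "c"])
row.take_snapshot()
row.write(1, "B")

print("COW block operations:", cow.io_ops)
print("ROW block operations:", row.io_ops)
```

In this model the copy-on-write volume spends three block operations on the first overwrite after a snapshot, while the redirect-on-write volume spends one. That gap is the intuition behind the claim that ROW disturbs running applications less.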

To get the most out of flat backups, a recommended best practice is to have vendors implement a three-storage-system approach comprising two on-site systems and one off-site system (exclusively for disaster recovery). Mature vendor software may be used to orchestrate the storage systems and the snapshot schedule. Cross-replicating snapshots between the two on-site systems allows a data center to employ two less powerful systems instead of one high-performance system. However, the off-site system must have enough capacity to accept data from both on-site systems. Such a setup ensures continuity of data access even in the event of interruptions.
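The capacity constraint in this three-system layout can be checked with a few lines of arithmetic. The sketch below assumes, for illustration, that each system reserves a fixed amount of space for snapshots, that each on-site system holds its own snapshots plus a replica of its peer's, and that the off-site system holds both sets. The names `StorageSystem` and `check_flat_backup_layout` and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageSystem:
    name: str
    capacity_tb: float               # space reserved for snapshots (assumed)
    local_snapshots_tb: float = 0.0  # snapshots this system produces itself


def check_flat_backup_layout(onsite_a, onsite_b, offsite):
    """Return a list of capacity problems; an empty list means the layout fits."""
    problems = []
    # Cross-replication: each on-site system stores its own snapshots
    # plus a replica of its peer's snapshots.
    for system, peer in ((onsite_a, onsite_b), (onsite_b, onsite_a)):
        needed = system.local_snapshots_tb + peer.local_snapshots_tb
        if needed > system.capacity_tb:
            problems.append(
                f"{system.name}: needs {needed} TB, has {system.capacity_tb} TB"
            )
    # The off-site system must accept the snapshots of both on-site systems.
    needed_offsite = onsite_a.local_snapshots_tb + onsite_b.local_snapshots_tb
    if needed_offsite > offsite.capacity_tb:
        problems.append(
            f"{offsite.name}: needs {needed_offsite} TB, has {offsite.capacity_tb} TB"
        )
    return problems


site_a = StorageSystem("onsite-a", capacity_tb=10, local_snapshots_tb=4)
site_b = StorageSystem("onsite-b", capacity_tb=10, local_snapshots_tb=5)
dr_site = StorageSystem("offsite-dr", capacity_tb=8)

for problem in check_flat_backup_layout(site_a, site_b, dr_site):
    print(problem)
```

With these made-up figures both on-site systems fit, but the off-site system is flagged because it cannot hold the combined snapshot data of the two.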

Much like photos on a digital camera (or its burst-shot feature), any number of snapshots can be taken at any time, and administrators can manage or delete them at will; storage space is the only limiting factor. From a capacity perspective, space must be allocated for snapshots, but how often to take them and how much space to allocate depend on factors such as the nature of the workload and vendor specifications. There is a paradigm shift happening in the backup realm: more emphasis is being placed on backup metadata than on the data itself. In fact, snapshot metadata can serve as a visual/virtual database detailing backup history, which provides clear insight into recovery points and recovery time objectives. Snapshot manager software may be used to manage the multiple copies of metadata that snapshots generate. Such software may be part of an application, a file system, a hypervisor, a software-defined storage platform or a physical storage array.
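As a rough illustration of how snapshot metadata can answer recovery-point questions, the sketch below treats a snapshot catalog as a plain list of timestamps. The function names and the sample times are assumptions for this example; a real snapshot manager would track far more per snapshot (application state, location, retention policy).

```python
from datetime import datetime


def latest_recovery_point(snapshot_times, failure_time):
    """Return the most recent snapshot taken at or before the failure, or None."""
    candidates = [t for t in snapshot_times if t <= failure_time]
    return max(candidates) if candidates else None


def worst_case_gap(snapshot_times):
    """Return the longest interval between consecutive snapshots, i.e. the
    largest window of changes that could be lost under this schedule."""
    ordered = sorted(snapshot_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps) if gaps else None


# Hypothetical catalog: three snapshots taken on the same day.
catalog = [
    datetime(2016, 8, 19, 8, 0),
    datetime(2016, 8, 19, 12, 0),
    datetime(2016, 8, 19, 13, 0),
]
failure = datetime(2016, 8, 19, 12, 45)

restore_point = latest_recovery_point(catalog, failure)
print("Restore from:", restore_point)
print("Changes lost:", failure - restore_point)
print("Longest gap in schedule:", worst_case_gap(catalog))
```

Comparing the longest gap against a target RPO shows whether the snapshot schedule is frequent enough for the workload.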

Although advancements in disaster recovery preparedness and virtualization have brought innovative capabilities to recovery and retention technologies, several analysts still advocate traditional backup options, citing security and longevity. Traditional backup approaches can also be implemented well in a cloud ecosystem. Issues such as bandwidth limitations and memory type mismatch are inevitable but addressable. Since the two approaches are not mutually exclusive, considering opportunities to implement both in tandem, after a thorough evaluation of a firm’s objectives and product offerings, is not a farfetched idea. It would perhaps be the ideal strategy, offering all-round protection for a company’s data assets.