Backup, Archive and Restore – The keys to Disaster Recovery and Business Continuity
As we come to the close of 2009, and the larger close of a decade, I remind all of you out there to test your backups. With recent events like the loss of flight data at Shaw Air Force Base and the subsequent realization that backups were not working, it is more important than ever to test the integrity of your backup, archive and restoration platforms. It is no secret that enterprises have slashed IT budgets over the last few years, and some of this pruning came at the expense of backup platforms and personnel. In many of these enterprises, no one has stepped up to verify that the critical data backups (if this data is being backed up at all) are valid. I implore you not to become one of the negative statistics. Let’s take a look at the holy trinity of Disaster Recovery and Business Continuity.
Backup
Once a company has identified the business value of its systems and data, it typically assigns a risk value to losing them. This sets the wheels in motion to get a backup system in place. Backup, the very first step in Disaster Recovery and Business Continuity planning, is the base upon which you will build your strategy. Without the backup, there is nothing to archive or restore later. Backing up typically entails duplicating data onto a secondary medium, which acts as a safeguard against primary storage failure. This can be something as simple as disk-to-disk (D2D) replication to a second storage system, or as complex as an NDMP stream across a fabric infrastructure to tape libraries that write the data to magnetic tape media.
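To make the simplest case concrete, here is a minimal sketch of a disk-to-disk copy in Python. It is only an illustration under stated assumptions: the function name backup_directory, the directory names and the sample file are all hypothetical, and a real D2D product adds scheduling, snapshots, retention and alerting that this does not attempt.

```python
import shutil
import tempfile
from pathlib import Path


def backup_directory(source: Path, destination: Path) -> int:
    """Duplicate the tree under source into destination.

    Returns the number of files present in destination afterwards.
    backup_directory is a hypothetical helper, not part of any backup product.
    """
    # copytree copies the whole directory tree; dirs_exist_ok (Python 3.8+)
    # allows repeated runs against an existing target.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return sum(1 for path in destination.rglob("*") if path.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins for the primary and secondary storage systems.
        primary = Path(tmp) / "primary"
        secondary = Path(tmp) / "secondary"
        primary.mkdir()
        (primary / "orders.csv").write_text("id,total\n1,9.99\n")

        copied = backup_directory(primary, secondary)
        print(f"{copied} file(s) now on secondary storage")
```

Note that a copy like this only protects against failure of the primary disk. It says nothing about whether the copy is valid.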
Archive
If we then take the tapes on which we backed up our data and move them to a secure remote storage facility, we have archived that data. The main difference between a backup and an archive (although some hybrid solutions blur the lines a little) is that backups can be on-line or live, while archives are typically off-line and stored somewhere other than the physical facility where the primary data storage resides. A disk-to-disk solution can act as an intermediary to a full-fledged tape backup and archive solution, with the ability to restore data locally very quickly. With an archive, you typically have to request the appropriate data set (on tape media) from the off-site facility, load it into the local tape library and restore the data from the media. This is essential for comprehensive Disaster Recovery and Business Continuity plans. If your primary physical location becomes unusable due to a natural or man-made disaster, you can relocate to a new facility and restore all data from tape media. The downtime may be longer than with a local backup solution, but it is not permanent. Failure to have an adequate off-site data archival solution can result in the loss of all data and permanent downtime.
Restore
The end result of data loss or catastrophic storage system failure is a restore from the appropriate media. This final step should always be preceded by several tests to validate the integrity of the data residing on the local or archived backup. When data is lost, it can be restored either from a local backup or from archival media. Either way, the lost data is replaced with the last known good copy, and business continues as usual. Every successful restoration hinges on the validity of the data that was backed up in the initial stage. If any part of the backup and archival process failed, there is a good chance that the media will not be valid for restore. This issue is compounded when your backup, archival and restore system does not properly alert an administrator to an error in the process. This is why tests of the backup media are so critical.
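One simple form such a test can take is a checksum comparison between the original files and the backup copy. The sketch below is an assumption on my part, not a feature of any particular backup product: the helper names sha256_of and verify_backup are hypothetical, and it compares against the live source, which only makes sense if the source has not changed since the backup ran. In practice you would compare against checksums recorded at backup time, and you would still perform a real test restore.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> List[str]:
    """List every source file that is missing from, or differs in, the backup.

    An empty list means every source file has an identical copy in backup.
    """
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(src_file) != sha256_of(copy):
            problems.append(f"mismatch: {relative.as_posix()}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source, backup = Path(tmp) / "source", Path(tmp) / "backup"
        source.mkdir()
        backup.mkdir()
        (source / "good.txt").write_text("intact")
        (backup / "good.txt").write_text("intact")
        (source / "lost.txt").write_text("never copied")

        # A non-empty result is the alert an administrator needs to see.
        for problem in verify_backup(source, backup):
            print("ALERT", problem)
```

The point of returning a list rather than a simple pass or fail is that the administrator learns exactly which files cannot be trusted before a restore is ever attempted.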
There is a plethora of backup, archival and restoration solutions on the market, and they vary greatly in cost and complexity. Some include features that many enterprises will not need, and some fall short in areas where a more robust solution is often warranted. It is imperative that you first plan your Disaster Recovery and Business Continuity scenarios, then select the platform that best fits them. While one enterprise can get away with simple storage-to-tape archival, more complex enterprises will want array-level replication to a hot site with local D2D cached copies and a full-fledged tape archival solution. Adequate cost analysis and risk assessment should point you in the right overall direction. Avoid leaning on a particular vendor, and seek the professional assessment and recommendation of a DR or BC adviser.