April 27, 2017

Small to Medium Business, Uptime and Single Points of Failure

Throughout the history of IT, there has always been an effort to reduce the number of single points of failure (SPOF). 

A single point of failure is any part or system that, if it fails, brings down your entire system or network. In the smaller business world we have always had to live with several of these SPOFs, but whenever possible we built in redundancy for the parts that failed most often. Examples include hard drives combined in RAID arrays and multiple power supplies in a server. (RAID systems often, but not always, use several hard disks to provide redundancy.)
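The definition above can be turned into a quick checklist. Here is a minimal Python sketch that flags every component with no redundant partner. The inventory and its counts are made up for illustration, and a real audit would also have to consider how the components depend on one another.

```python
# Hypothetical inventory: component name -> number of units installed.
# The names and counts are illustrative assumptions, not a real system.
inventory = {
    "hard drive": 2,      # two drives mirrored in a RAID array
    "power supply": 2,    # redundant power supplies
    "motherboard": 1,
    "network switch": 1,
}


def single_points_of_failure(components):
    """Return the components that have no redundant partner."""
    return sorted(name for name, count in components.items() if count < 2)


print(single_points_of_failure(inventory))
# ['motherboard', 'network switch']
```

Anything on the resulting list is a part whose failure takes the whole system down until it is replaced.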

These are great systems, but they can introduce single points of failure of their own: a RAID array depends on its RAID controller, which is itself a single point of failure. Although RAID controllers do not fail often, when they do, it’s ugly, and most clients don’t have spare RAID controllers sitting around. Other SPOFs include motherboards (also called mainboards) and processors. The usual answer to these issues has been to rely on backups, but that can carry a terrible cost if you are paying 50 employees hourly while they wait for the backup to be restored. There could also be some loss of data, depending on how recent the last backup was.

The more recent solutions are BDR (Backup and Disaster Recovery) appliances and virtualization, specifically VMware shared storage. While neither solution is inexpensive, both provide for quick recovery should a SPOF fail. The two have very different functions and do not depend on each other, but both can be used in the same IT environment. Each works as follows.

VMware Shared Storage

VMware shared storage is a solution that takes advantage of inexpensive storage. Specifically, two or more physical servers (commonly called “hosts” in the world of virtualization) keep exact copies of your “guest” servers on all of the physical hosts. If a SPOF event happens on one physical server, you can simply turn on the guest server on the other physical device. Not clear? I’ll try again with an analogy for the “host” and “guest” terms that define the virtualization world. Say your Uncle Wilbur comes to stay at your house.

But since your spouse doesn’t really like Uncle Wilbur, he decides to keep an exact replica of his belongings at your sister’s house for the inevitable time when your spouse kicks him out. When that happens, his belongings are already waiting at your sister’s house. It is a bit of a simplification, but hopefully the concept is clearer. VMware shared storage is NOT a backup solution: because every copy mirrors the same data, a file that is deleted or corrupted on one host is deleted or corrupted on the others too. So any time you think of using this technology, make sure you budget for a good backup solution as well.
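For readers who think better in code than in uncles, the same idea can be sketched in a few lines of Python. This is a plain model of the concept, not VMware's API; the `Host` class, the `fail_over` function, and the names `host-a`, `host-b`, and `file-server` are all hypothetical.

```python
class Host:
    """A physical server that holds copies of guest servers."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.guest_copies = set()  # guests stored on this host
        self.running = set()       # guests currently powered on here


def fail_over(guest, hosts):
    """Power on `guest` on the first healthy host that holds a copy of it."""
    for host in hosts:
        if host.healthy and guest in host.guest_copies:
            host.running.add(guest)
            return host.name
    return None  # no healthy copy left: this is where backups come in


your_house = Host("host-a")
sisters_house = Host("host-b")
for host in (your_house, sisters_house):
    host.guest_copies.add("file-server")  # both hosts keep an exact copy
your_house.running.add("file-server")

# A SPOF event takes out the first host.
your_house.healthy = False
your_house.running.clear()

print(fail_over("file-server", [your_house, sisters_house]))
# host-b
```

Note the `None` case at the end of `fail_over`: if every host holding a copy is down, nothing in this model can bring the guest back, which is why a separate backup is still needed.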

BDR Appliance

A Backup and Disaster Recovery appliance is a tool that leverages your backups to avoid downtime. This device doesn’t work at all unless you get good backups and test them frequently, but if you do, it can minimize any losses due to SPOFs. In the past, if something failed and the part you needed was not readily available, you would run around like a chicken with its head cut off looking for a replacement, or you replaced the whole system and recovered from backup. Inevitably you would hit a few unexpected speed bumps in the recovery process while your employees hung around asking if the system was up yet.

The BDR appliance resolves this by acting as the storage location for your backups and, most importantly, as the server you can recover to in the event of a failure. Effectively, the appliance becomes the "host" for your "guest" server if needed.
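To put rough numbers on the downtime a BDR appliance is meant to avoid, here is a small Python sketch of the wages paid while staff wait for the system to come back. Only the 50 employees figure comes from the example earlier in this post. The hourly wage and both recovery times are assumptions chosen for illustration, so substitute your own.

```python
def idle_wage_cost(employees, hourly_wage, hours_down):
    """Wages paid to staff who cannot work while the system is down."""
    return employees * hourly_wage * hours_down


EMPLOYEES = 50        # from the example earlier in this post
HOURLY_WAGE = 20.0    # assumed average wage in dollars
RESTORE_HOURS = 8.0   # assumed: replace hardware, then restore from backup
BDR_HOURS = 0.5       # assumed: start the server on the appliance instead

traditional = idle_wage_cost(EMPLOYEES, HOURLY_WAGE, RESTORE_HOURS)
with_bdr = idle_wage_cost(EMPLOYEES, HOURLY_WAGE, BDR_HOURS)

print(f"Traditional restore: ${traditional:,.2f}")  # $8,000.00
print(f"BDR appliance:       ${with_bdr:,.2f}")     # $500.00
```

This counts wages only. Lost sales, missed deadlines, and any work redone since the last backup would come on top of it.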