Planning a Disaster Recovery

Well, February was a lost cause for me, between the flu and three emergencies that came in requiring me and not the old man. I fell behind on everything and ended up in emergency “I put out fires” mode until this week. As bad as it was for me, it was almost worse for one of my customers.

Since I haven’t updated routinely in a while, I’m glad to get back to weekend updates. That aside, I feel compelled to share some of the knowledge from the most labor-intensive project of the bunch; some of the planning and the things learned may prove useful for people who have data to preserve, although I think most of it is common sense and I’m probably preaching to the choir. Forgive me if I oversimplify or fail to simplify; both are possible given the technical nature of the topic:

On Feb 4th, one of my customers had his office broken into and his laptop, his primary work computer, stolen.

Since this customer is in the financial industry, he was required to have a disaster recovery plan for reestablishing operations within 24~48 hours of a disaster. (I’m not sure which set of rules he falls under, or I’d reference them.) We updated his recovery plan about four years ago, and about eight months ago we helped him obtain a new office computer, as the old one was dying a slow death.

Disaster recovery plans are interesting things. Fundamentally, they boil down to coming up with a good answer to the question “How do I not lose anything?”, and on the computer side of things, redundancy is the answer.

In the case of my customer, I created a minimum of four levels of redundancy in the form of a cloud backup service (insert Carbonite sales pitch here) and a local network backup to another computer in his office. This is a fairly common and generic plan, and it probably falls short of what the industry considers “best practices”, but I’m sharing it because it’s easy to set up and relatively inexpensive. If you have a desktop and a laptop, and are willing to invest in a cloud backup program or purchase extra space on something like SkyDrive/Dropbox/etc., a little bit of time will let you implement the same or a similar plan to protect your own info.

The logic goes something like this:

  1. In the event his system failed, he could swap to the backup system while purchasing replacements.

     Technical note: the backup in the implementation we used is provided by a second Windows computer with a shared folder, using Windows Sync Center to synchronize the contents between the two machines over the network. The process is fully automatic with the exception of “collisions”, situations where both the local and remote copies have changed, or one has been deleted. These have to be resolved manually, through an interface that pops up down by the system tray/notification area. (A bare-bones sketch of the one-way version of this idea follows this list.)

     There are other third-party programs that will do this job as well, a few of which are able to use Volume Shadow Copy to synchronize locked files like Outlook PSTs/OSTs. These programs are generally not free, although some offer crippleware/trialware versions.
  2. If the backup system fails, he still has cloud backups to rely on while the backup is replaced.
  3. If the cloud service is shut down, experiences an outage, or freezes his account, he still has local backups. In this case, I happen to know that Carbonite keeps redundant copies of information in more than one data center, meaning the technical side is unlikely to experience outages.
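For the curious, here’s roughly what the synchronization in the technical note above boils down to, sketched in Python. This is the one-way version only, with placeholder paths; Windows Sync Center does considerably more (two-way merging, offline files, the collision interface), so treat it as an illustration rather than a replacement.

```python
import shutil
from pathlib import Path

# Placeholder paths - substitute your own documents folder and the
# shared folder on the backup machine.
LOCAL = Path(r"C:\Users\Owner\Documents")
REMOTE = Path(r"\\BACKUP-PC\SharedDocs")

def mirror(src: Path, dst: Path) -> None:
    """Copy files from src to dst when the source copy is newer.

    A real two-way sync also has to detect "collisions" - both sides
    changed, or one side deleted - and ask the user what to do; this
    sketch only handles the simple one-way case.
    """
    for src_file in src.rglob("*"):
        if not src_file.is_file():
            continue
        dst_file = dst / src_file.relative_to(src)
        if not dst_file.exists() or src_file.stat().st_mtime > dst_file.stat().st_mtime:
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)  # copy2 preserves timestamps

if __name__ == "__main__":
    mirror(LOCAL, REMOTE)
```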

This specific incident fell under number 1. When the call came in, before we realized how much work was required, my dad went out to the customer’s office to handle it himself. He discovered that the thieves had left the backup computer alone and quickly shifted it over, reestablishing operational minimums immediately, as virtually all the information was already there. I was pulled in to get the CRM software reinstalled and recover some minor stuff that couldn’t be marked for synchronization with Windows Sync. (Databases that employ file locking are a pain; more on that later.)

Once we had basic function restored, the curveball that caused the majority of the work came: he wanted to take the opportunity to update all of his computer use habits and move forward on cloud adoption, as his industry’s periodicals are emphasizing taking advantage of the cloud. We spent the next four weeks improving the recovery plan, purchasing replacement equipment and software, and fixing the hiccups that inevitably come from upgrading all of your 5~10-year-old software at once.

Some useful observations from this process, generalized:

Consider what kind of backup you need.

While automated cloud services are great for most things (for example, they met all of my customer’s needs), occasionally you will need a backup you can roll back to an earlier point. Most cloud backup services will not offer incremental rollback, and those that do rarely go back farther than 30 days; they’re doing well to keep ahead of their customers’ CURRENT data requirements. For those files where revisions matter, revision control software like Git can be a lifesaver. (Git is also not the only one of its kind, just the most popular.)

Revision control software is typically used by programmers to keep track of their work, so that if they break something they can go back and see what it looked like before they broke it. While Git isn’t intended for backing up personal data, it works remarkably well in that role. As Git is intended for programmers, it might be a little hard to set up your own private Git server if you aren’t one, but there are some nice front-end tools for it and some great guides on using it. A possible remote backup strategy might be using a cloud service to back up your Git repository, or subscribing to a Git hosting service that offers private repositories. That privacy note is important. Many of the Git hosting options you can sign up for are not private by default. Verify you can keep private what needs to be private before you start using something like GitHub.
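To make that concrete, here’s a minimal sketch of “Git as a personal backup”, written as a Python script that shells out to the git command line. It assumes Git is installed and on your PATH, and the folder path is a placeholder to substitute with your own.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Placeholder: the folder you want revision history for.
DATA_DIR = Path.home() / "Documents" / "important-files"

def git(*args: str) -> None:
    """Run a git command inside the data folder."""
    subprocess.run(["git", *args], cwd=DATA_DIR, check=True)

def snapshot() -> None:
    """Commit everything that changed since the last snapshot."""
    if not (DATA_DIR / ".git").exists():
        git("init")  # first run: turn the folder into a repository
    git("add", "--all")
    # git commit exits nonzero when nothing changed, so no check=True here.
    subprocess.run(
        ["git", "commit", "-m", f"snapshot {datetime.now():%Y-%m-%d %H:%M}"],
        cwd=DATA_DIR,
    )

if __name__ == "__main__":
    snapshot()
    # To roll a single file back to an earlier snapshot:
    #   git log -- somefile.txt                (find the commit you want)
    #   git checkout <commit> -- somefile.txt  (restore that version)
```

Run it by hand or from a scheduled task, and every file in that folder quietly accumulates a history you can step back through.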

When in doubt, you can always fall back to the tried-and-true practice of creating your own local incremental backup. Most classical “backup to CD/DVD/tape” software features an “incremental backup” option. While not as convenient, it does work and will give you a fallback option, albeit with a greater amount of nannywork.
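If you’d rather roll your own, the core idea is small enough to sketch: copy only what changed since the last run. This is a bare-bones Python illustration with made-up paths; real backup software also handles deletions, catalogs, compression, and spanning across media.

```python
import json
import shutil
import time
from pathlib import Path

SOURCE = Path.home() / "Documents"    # what to protect (placeholder)
BACKUP_ROOT = Path("E:/backups")      # external drive (placeholder)
STATE_FILE = BACKUP_ROOT / "last_run.json"

def incremental_backup() -> None:
    """Copy only the files modified since the previous backup run."""
    last_run = 0.0
    if STATE_FILE.exists():
        last_run = json.loads(STATE_FILE.read_text())["last_run"]

    # Each increment goes into its own timestamped folder.
    dest = BACKUP_ROOT / time.strftime("%Y%m%d-%H%M%S")
    for f in SOURCE.rglob("*"):
        if f.is_file() and f.stat().st_mtime > last_run:
            target = dest / f.relative_to(SOURCE)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)

    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps({"last_run": time.time()}))

if __name__ == "__main__":
    incremental_backup()
```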

Beware of any database-driven application that you are trying to back up with automated cloud backup services that simply back up files.

In particular, anything that uses MSSQL or MSSQL Lite seems especially vulnerable to this, but MySQL, Btrieve, and other database variations might have similar issues: there’s a resource conflict that occurs when the automated backup service and the database service both access the same file at the same time. Certain programs are able to use Volume Shadow Copy to touch “in use” data without triggering a conflict, but it still doesn’t work very well. You can usually identify these programs because they either say “installing SQL” as part of their initial setup, or will just stop working when you add their data to your cloud backup program’s list of things to back up. They also might throw a fit if you encrypt them with BitLocker or the like.

Database-driven applications generally have their own backup process. Most end-user programs feature a menu option for creating a manual backup. For actual server-based database tools, there are typically instructions somewhere on the internet for using a database export option to create a backup, and often ways to automate it.
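As an illustration of “use the database’s own backup process”, here’s a sketch using SQLite’s built-in backup API, which asks the engine for a consistent copy instead of fighting it over a locked file. The paths and names are made up; for a server database like MSSQL, the equivalent would be a scheduled BACKUP DATABASE job run with the vendor’s own tools.

```python
import sqlite3
from datetime import datetime
from pathlib import Path

# Placeholder paths - the live database, and a folder your cloud backup
# service IS allowed to watch without causing lock conflicts.
LIVE_DB = Path(r"C:\ProgramData\SomeCRM\crm.db")
EXPORT_DIR = Path.home() / "Documents" / "db-exports"

def export_backup() -> Path:
    """Write a consistent snapshot of the live database to EXPORT_DIR.

    Copying the live file while the application holds it open is exactly
    the resource conflict described above; the engine's backup API takes
    a clean snapshot even while the database is in use.
    """
    EXPORT_DIR.mkdir(parents=True, exist_ok=True)
    out = EXPORT_DIR / f"crm-{datetime.now():%Y%m%d}.db"
    src = sqlite3.connect(LIVE_DB)
    dst = sqlite3.connect(out)
    try:
        src.backup(dst)  # SQLite's online backup: safe on an open database
    finally:
        src.close()
        dst.close()
    return out

if __name__ == "__main__":
    print("Wrote", export_backup())
```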

Make sure that what you’re trying to back up is actually storing its data where you think it is.

There’s more than one way this common problem can rear its head. Verify that you’re backing up the right files by checking for software data backup/maintenance instructions on product websites and forums. There are also utilities out there to help with this, although I couldn’t tell you much about those.

Besides simply failing to back up a file, you can end up restoring data to the wrong place, or, if there are three or four versions of a file, restoring the wrong one. The “last changed” date also may not be correct, which is the issue I ran into with a particularly terrible CRM program: the software’s “data file” is nothing more than an XML document specifying the folder the actual information is stored in, so it only shows the date the original file was created, while other files in subdirectories show the correct dates. This is not uncommon with database-driven applications, either.

Other things to be mindful of are the C:\programdata\ folder and the (user)\appdata\ folders. Lots of files end up in those folders; usually they’re not data so much as user settings and temp files, but some major software hides its data files there, such as Firefox, Chrome, Outlook, and Minecraft. Minecraft players will take note that their lives can be summed up by the size of the (user)\appdata\roaming\.minecraft\world\ folder.
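A low-tech trick for finding where a program actually keeps its data: change and save something in the program, then scan the likely hiding places for files modified in the last few minutes. A quick Python sketch of that, with the candidate folders as assumptions:

```python
import os
import time
from pathlib import Path

# Likely hiding places on Windows; adjust to taste.
CANDIDATES = [Path.home() / "AppData", Path("C:/ProgramData")]

def recently_modified(minutes: int = 5) -> list[Path]:
    """List files changed in the last few minutes."""
    cutoff = time.time() - minutes * 60
    hits: list[Path] = []
    for root in CANDIDATES:
        # os.walk quietly skips folders it isn't allowed to read.
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                f = Path(dirpath) / name
                try:
                    if f.stat().st_mtime > cutoff:
                        hits.append(f)
                except OSError:
                    pass  # file vanished or is locked; skip it
    return hits

if __name__ == "__main__":
    for path in recently_modified():
        print(path)
```

Save something in the program you’re investigating, run this, and whatever shows up is probably where its data actually lives.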

Some software from the Windows 3.1~XP era is also notorious for saving data inside its program folder or straight off the root of the C: drive, but this is not possible if you are using UAC and decline to grant a program administrator rights. The go-to example here is old copies of Quicken, which would default to storing all of their information in C:\quickenw\. If a program like this lets you choose where to save, save in your user folder to simplify things. (This is good advice even on non-Windows operating systems; the only thing all the operating systems agree on is that a user’s user folder is his to do with as he pleases.)

When recovering, verify data integrity, THEN upgrade applications.

It’s easy for a data file with corruption issues to creep along unnoticed for years because the corruption doesn’t affect the version of the software you’re using; then, all of a sudden, when you go to upgrade, it throws all sorts of errors. It’s a very easy mistake to install the latest version of a program and upgrade your restored backup all in one go, only to end up with unusable data that you blame on a bad backup.

This one has an analogy in cars: you have an old car, it runs well, and you never have problems with it; you lend it to a family member, and they experience all sorts of problems because they’re expecting something different from what they get. It’s the same with different versions of a program and the data they expect. If a program expects something different from what it gets, sometimes it will just say “I can’t upgrade this, it’s corrupt.”

If there’s something wrong with the data, be prepared to run in circles while you figure out how to generate a clean copy for your software. If the software doesn’t offer a data repair tool (most do), a common and easy fix is to open the data in the old program and create a manual backup/export file, if the application allows for it. It doesn’t always work, but many times it will.
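And before you blame a bad backup, it’s cheap to check whether the restore actually matches what’s sitting in the backup, byte for byte. A minimal sketch, with placeholder paths:

```python
import hashlib
from pathlib import Path

RESTORED = Path(r"C:\Users\Owner\Documents")  # placeholder
BACKUP = Path(r"E:\backups\latest")           # placeholder

def sha256(path: Path) -> str:
    """Hash a file in chunks so large files don't eat all your RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify() -> None:
    """Report restored files that are missing or differ from the backup."""
    for backup_file in BACKUP.rglob("*"):
        if not backup_file.is_file():
            continue
        restored_file = RESTORED / backup_file.relative_to(BACKUP)
        if not restored_file.exists():
            print("MISSING:", restored_file)
        elif sha256(restored_file) != sha256(backup_file):
            print("MISMATCH:", restored_file)

if __name__ == "__main__":
    verify()
```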

Plan for new hardware before recovering from a failure.

If you are integrating new devices at the same time you’re restoring data, plan the order you’ll set things up before you start restoring, to minimize complications. In the case of my customer, he decided, based on his office building’s insecurity, to relocate his local backup computer from the workplace to his house, requiring a rework of his backup implementation. The solution we chose was to install SkyDrive on both computers to sync them through the cloud. A similar cloud service would have been required if he had chosen to add a tablet to his workflow, depending on the kind of tablet.

Implementing sooner rather than later allowed me to make the experience appear seamless: I installed SkyDrive on his new computer in my workshop before delivery, which let me sync his data overnight on 35 Mb/s internet rather than the 7 Mb/s internet at his workplace.

You can usually save yourself some time or work just by doing a little bit of planning. Still, don’t forget to stay leery of running upgrades right from the start.

Don’t buy an upgrade without planning for the upgrade after.

When upgrading or replacing obsolete software during a recovery, be careful to choose software based on how progressive the company’s adoption of new technology is. Cloud and Handbrain (smartphone/tablet) adoption is the current trend, and at the moment it’s almost impossible to tell what tomorrow’s trend will be, but it’s important to keep an eye out for trends while they’re still optional. Many software companies will not survive the transition into the Handbrain era; others will bend over backwards to support every platform that has any semblance of mainstream adoption. (And sadly, it has been decided we’re moving into the Handbrain era, regardless of whether that’s actually the right choice. That’s another rant, tho’.)

My personal approach is to identify the things that would be important to have accessible on the go and look for companies trying to fill that need. Most people don’t want to fight with a touchscreen input system for something like word processing, but would be happy to see or change their schedule on the fly. The more convenient something would be, the more likely someone is to attempt to sell it. If you choose wisely, you can enjoy these conveniences; if you choose poorly, you could be left looking at everyone around you and thinking “I need to get that program…”

On a related note, keep a saved wishlist of parts that meet your technical requirements with a vendor, and update it every six months or so. That way, if something fails, you don’t have to go shopping around; just verify that your parts list is still current and place your order.

Keep a local copy of your info.

When choosing cloud software, always look for a provider who allows you to keep private backups.  There is very little as frustrating as losing all of your work to an account closure.

This is one reason why I like the cloud services that sync data to your hard drive, such as SkyDrive, Google Drive, and Dropbox, as opposed to straight SaaS software with no local copy. If Microsoft were to arbitrarily decide a file somehow violated their licensing agreement for whatever reason, there would still be local copies.

If you take most of this into account, you should do pretty well at protecting data and knowing what you need.  Now a short list of things to avoid when designing a recovery plan:

Excessive paranoia about security.  

I don’t want to get too in-depth on security here because it’s a complicated topic and could easily take this post from 3k words up to 10k and beyond. And that’s just going through the highly plausible list; if you review every possible, every imaginable case, you could be dealing with security forever. Building a plan that’s “good enough” for your specific purpose and features easy-to-implement damage control options is generally superior to making a perfect plan that has multiple layers of encrypted backup stored in five or more locations and never touches the cloud. Going over the top is like being one of the Scooby-Doo antagonists with their elaborate schemes: in the end, something will happen to your perfect plan, and it’ll be the fault of those meddling script kiddies.

That’s not to say that security is a bad idea. There are some very nice folder encryption tools available specifically for encrypting your Dropbox or SkyDrive folder. Just keep it simple. “I have a program to encrypt this, and I keep a backup of the decryption key in a secure place” is an easy plan to implement with a minimal amount of work and expense. Two-factor authentication is also slowly becoming the new norm. (I strongly encourage you to turn on two-factor authentication if you’re willing to put up with it.)
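To show how little work “encrypt it and keep a backup of the key” really is, here’s a sketch using the third-party Python cryptography package (my pick for illustration, not one of the tools alluded to above; pip install cryptography). The file names are placeholders.

```python
from pathlib import Path

from cryptography.fernet import Fernet  # pip install cryptography

def encrypt_file(plain: Path, key_file: Path) -> None:
    """Encrypt one file and write the key somewhere you control.

    The whole plan in miniature: the encrypted copy can go to the cloud,
    while the key file goes somewhere else entirely (USB stick, safe,
    printout). Lose the key and every backup of the data is useless.
    """
    key = Fernet.generate_key()
    key_file.write_bytes(key)  # back THIS up separately from the data
    encrypted = Fernet(key).encrypt(plain.read_bytes())
    plain.with_name(plain.name + ".enc").write_bytes(encrypted)

def decrypt_file(enc: Path, key_file: Path) -> bytes:
    """Recover the original - possible only while the key survives."""
    return Fernet(key_file.read_bytes()).decrypt(enc.read_bytes())

if __name__ == "__main__":
    encrypt_file(Path("taxes.doc"), Path("backup.key"))  # placeholders
```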

Generally speaking, the information you’re storing dictates the level of security you need, and some information should simply not be stored digitally at all.  Most files on a computer are not worth encrypting.  Sure, it might be embarrassing if someone hacks your computer/cloud account and your vacation photos end up on reddit, causing your derpy tourist moment to become a new meme, but that’s not going to obliterate your life the same way it would if that hacker discovered you’ve built what to him is the ultimate identity theft kit in a Word doc.   Bank account numbers?  Social Security numbers?  Credit card numbers?  Scans of your birth certificate?  I personally wouldn’t keep them on a computer, but if you do, that data does need to be encrypted.

All that time spent making a perfect plan? It could be better spent combing your computer and cloud services to make sure you haven’t left sensitive information stored in an insecure place, such as a PDF/doc/jpg file or your webmail inbox/outbox, or just spent memorizing a password longer than 6 characters. (Anything important really deserves a unique 16+ character password, and, well, XKCD’s joke on the subject almost deserves to be mandatory reading, because I think everyone loses sight of how little it takes to build a long password that’s easy to remember.)
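In the spirit of that XKCD strip, here’s how little it takes to generate a long password that’s easy to remember. The wordlist file is an assumption; any decent list of common words (the EFF diceware list, for instance) will do.

```python
import secrets

# Assumption: a plain-text wordlist, one word per line.
WORDLIST = "wordlist.txt"

def passphrase(n_words: int = 4) -> str:
    """String together random words: long, memorable, hard to brute-force."""
    with open(WORDLIST) as f:
        words = [w.strip() for w in f if w.strip()]
    # secrets rather than random: cryptographically strong choices.
    return " ".join(secrets.choice(words) for _ in range(n_words))

if __name__ == "__main__":
    print(passphrase())  # e.g. "correct horse battery staple"
```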

Relying on technology in a way that introduces a single point of failure.

I’ve emphasized redundancy earlier, so this should be redundant by itself, but any single point of failure is bad.

The thing that made me think about this one is probably a bit more obscure, and I haven’t seen it in years, but years back I had a computer with RAID mirroring that used a certain kind of proprietary RAID controller. More specifically, the kind where you can’t read a hard drive without it being part of the RAID array. This isn’t quite the redundancy you think it is: sure, if a drive fails you can swap it without a system outage, but what happens when the controller fails?

For a more down-to-earth example, a common mistake is to have RAID mirroring and, because your data is mirrored, assume you don’t need further backups. In the event of common destruction incidents (fire/lightning/theft) where the entire computer or building is obliterated, if you’ve failed to keep an offsite backup (cloud service, tape drive, hot-swap HDs, rotating USB drives, etc.), you’ve lost data.

A less obvious but still frequent mistake is to use encryption and fail to keep proper backups of your key. If something happens and you can’t access your key, it doesn’t matter how many backups of your data you have; you’ve just lost all of them to a single point of failure. Your key backup also falls into a special category that does require some paranoia to secure properly. That is to say, you shouldn’t keep it on your computer at all. If you back it up in a cloud service, that account should have two-factor authentication enabled. If you back it up on USB sticks, use more than one and store them in an appropriate place, like a safe. If you have a safe deposit box, keeping one USB drive/CD/etc. there, along with a paper copy of the key in a legible font at a good size, would also be a good idea.

 

Well, that was long, but hopefully helpful.  It was also by no means exhaustive – there’s no end of information on the subject.

Now, to get back to programming!  😀
