Backup and Archive

March 1, 2009 9:42 pm

I love old photographs.  They are windows into the past and reminders that history is made of a long string of daily moments.  Consider the photo below, which was taken in Montana just after the turn of the century.  There is only one thing written on the back: Old Time Band.

As a (mostly aspiring) writer, this is just the sort of thing that gets me fired up.  Who are the people in this photo?  What did they do?  What brought them together to play their instruments?  Were they any good?  While it is fun to speculate on the answers to these questions, it is impossible to give any answers without more information.  The history of this photo is lost, and all that is left is speculation; and while speculation is a wonderful thing, it is thin fare when compared with the truth.

Old Time Band

History is central to our identities.  If we don’t know where we came from, we don’t know who we are.  And until quite recently, history was tactile: letters, pictures, artifacts, art.  But the digital era has fundamentally changed how we store the physical artifacts of memory.  Email replaced the letter, digital photography replaced film, scholarship moved online, and much of the most amazing art now exists only on the internet.  While digital certainly has the potential to exist forever, the reality is that it usually has a much shorter shelf-life.  In this way digital is infinitely inferior to physical.  And worse, when a digital memory is destroyed, all evidence is gone forever.  A letter or grainy photograph can at least linger forgotten in the attic, ripe for rediscovery.  So here’s the bottom line: digital data needs to be treated with care.  It’s not just zeros and ones on a hard drive, but the full text of life and experience.  There should be a plan to keep it safe and take care of it.

Backup: Keeping It Safe

At some point, everyone will lose a hard drive.  It is an extremely unnerving experience that can induce panic, rage, and miscellaneous despair.  Why this bizarre combination?  Because you are utterly powerless to stop it.  Hard drives are mechanical devices and eventually they die; it is a sad truth of the physical universe.  Even solid state drives won’t last forever.  While they don’t have moving parts, there comes a time when individual blocks of memory can no longer accept a charge.  For both standard drives and solid state devices, the estimated lifespan is between five and seven years.  The only defense is to back things up.  Have a plan and use tools that simplify the process.  The more convenient the tools (with automatic being the best), the better you are able to respond to disaster.  People don’t often do things that are hard or painful, and manual backup can be both.

Backup tools fall into two major camps: simple file backup and image based backup.  Here’s the important difference: a file based backup is fantastic for backing up pictures, documents, presentations, or the other miscellaneous pieces of a digital life.  Image based backup is used to restore a hard drive that has gone bad.  Both have their place.  Image based tools can require many hours to run, though they back up 100% of the data.  And while file based backup is faster and allows for multiple versions of a given file to be saved, it cannot restore a computer back to its previous state.
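To make the idea of file based backup with multiple versions a little more concrete, here is a minimal Python sketch.  It is not how Time Machine or the Vista tools work internally; it simply copies a folder into a new, timestamped folder on every run, so older versions of a file stay available.  The function name backup_files and the folder layout are assumptions made up for this illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_files(source: Path, backup_root: Path, stamp: Optional[str] = None) -> Path:
    """Copy everything under `source` into a new folder named after the run time.

    `backup_files` is a hypothetical helper for illustration only.
    Each run creates a separate folder, so earlier versions are never overwritten.
    """
    # Default to the current date and time, e.g. "2009-03-01_214200".
    stamp = stamp or datetime.now().strftime("%Y-%m-%d_%H%M%S")
    target = backup_root / stamp
    # copytree recreates the folder layout and fails if `target` already exists,
    # which protects an earlier backup from being clobbered.
    shutil.copytree(source, target)
    return target
```

Run once a week against a documents folder, this yields one folder per week.  The cost is disk space, since every run stores a full copy; real backup tools usually avoid that by storing only what changed.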

Given the amount of time that an image based backup takes to complete, it is not something that needs to be done every day or even every month.  Personally, I image my hard drive after I first set the system up and after I make major changes like the installation of a new program.  In other words, about once or twice per year.  In contrast, I run file based backup once a week.  When I do need to recover from a crashed hard drive, I follow a two step process.  First I restore the programs (through the image) and then I restore the files from my backup.  Fortunately, nearly all modern operating systems include both types of tools.  Mac OS X has Time Machine built-in and Windows Vista has the Backup and Restore Center.

Typical of the Mac approach, Time Machine is a comprehensive solution.  It manages both file based and image based backup automatically.  In fact, most users won’t ever need to draw a distinction between the two.  Backup happens automatically and restore is almost as easy.  It consists of clicking the Time Machine icon and finding the desired file.  For a system restore, put in the install disk and click the date to which you want to return the system.

Windows Vista, in contrast, treats the two as separate entities.  And unfortunately, not all versions of the operating system contain the same tools.  The tools for automatic file backup are omnipresent, but only the Business and Ultimate versions include the image backup.  In both cases, they are accessed by clicking on “Backup and Restore Center.”  People who use the Home versions of Windows Vista will need a separate, third party tool for image backup.  DriveImage is one free alternative, though I personally prefer Open Source alternatives (GParted and PartImage, available on the same disk here).

Archive: Taking Care of It

While backup is a good first step, it is important to emphasize that it is only the first step.  When I stop to consider how much of my writing, thoughts and identity are locked up on my computer, it is frightening.  I can’t help but give serious thought to an important question: Will I be able to access my information in ten years?  What about thirty?  What will become of my pictures, digital drawings, files or writings?  Without a little bit of care, these things might become inaccessible; not because of disaster or mechanical failure, but through obsolescence.

In the battle with time, there are two major issues: 1) Advances in tech make old things obsolete and they fall out of use.  2) Software evolves; companies go out of business, Open Source projects fail, and file formats change.  These problems require planning beyond what backup alone demands.

The purpose of an archive is to organize data for long term storage, and while everything should be backed up, not everything should be archived.  Archiving, when done right, takes a bit of work.  Not only are you making a copy, but it’s important to think about file format issues.  You might not always have access to a copy of Adobe Photoshop that can read that PSD, or Illustrator that can cope with the AI.  Thus, when I add an important file to my archive, I actually create two additional copies: 1) a PDF so that I can quickly preview or share the file with others and 2) an ASCII encoded text document with the unformatted content (or a jpeg if the file is an image).  I chose PDF and simple text because these two formats are nearly ubiquitous.  They are human readable and can be opened by most programs.  This is a scenario that is unlikely to change.  The same simply cannot be said of other file formats, even those which are Open Source.
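The two-copies rule is easy to forget, so a small check can help.  The Python sketch below scans one archive folder and reports which original files still lack a PDF copy or a text (or jpeg) copy with the same base name.  It is only an illustration: the function name, the same-base-name convention, and the list of suffixes are assumptions, not part of any tool mentioned here.

```python
from pathlib import Path

# Formats treated as the portable copies themselves (an assumption for this sketch).
PORTABLE_SUFFIXES = {".pdf", ".txt", ".jpg", ".jpeg"}


def missing_portable_copies(folder: Path) -> dict:
    """Map each original file name to the portable copies it still lacks.

    Hypothetical helper: it assumes copies share the original's base name,
    e.g. logo.psd alongside logo.pdf and logo.jpg.
    """
    report = {}
    for original in sorted(folder.iterdir()):
        # Skip sub-folders and files that are already portable copies.
        if not original.is_file() or original.suffix.lower() in PORTABLE_SUFFIXES:
            continue
        missing = []
        if not original.with_suffix(".pdf").exists():
            missing.append("pdf")
        # Either a plain text copy or a jpeg counts as the second copy.
        if not any(original.with_suffix(s).exists() for s in (".txt", ".jpg", ".jpeg")):
            missing.append("txt or jpeg")
        if missing:
            report[original.name] = missing
    return report
```

An empty result means every original in the folder has both companions.  The check only looks at names; it does not confirm that the PDF or text copy actually matches the original.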

Thus, while the best way to archive something is in printed form (on acid free paper with acid free ink, which, as long as it is kept out of the sun, can last for hundreds, if not thousands, of years), careful digital archiving can be very valuable.  There is a second problem which you also have to contend with: digital media degrades relatively quickly.  Hard drives last between five and seven years, and CDs and DVDs (which are slightly more durable) will probably be useless after 10.  (Note: While it is possible to get special archival media that can last for as long as 50 years, it is extremely expensive.)

The only way to ensure that digital information endures is redundancy: multiple copies stored in different places.  And an archive, typically, should be readable by any computer or device without needing to be decoded (which disqualifies most automatic backup programs).  I, therefore, maintain a special set of archive folders organized by topic.  These include folders for work related information, research, and personal projects.  I then convert the most important files to PDF and text, and place them within the corresponding archive folder.  Some would advocate compressing the individual folders to save space.  But if you’re not careful, this might lead to corruption of the data.  Hard drive space is extremely cheap, so I don’t compress the files.
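Redundancy only helps if a damaged copy can be told apart from a good one.  One common way to do that, sketched below in Python, is to record a checksum for every file and compare later.  This is a swapped-in technique for illustration, not something the workflow above depends on; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive: Path) -> dict:
    """Map each file's path (relative to the archive folder) to its checksum."""
    return {
        p.relative_to(archive).as_posix(): checksum(p)
        for p in sorted(archive.rglob("*"))
        if p.is_file()
    }


def changed_files(archive: Path, manifest: dict) -> list:
    """List files from an earlier manifest that are now missing or different."""
    current = build_manifest(archive)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

The idea is to build the manifest when the archive is known to be good, keep it with each copy, and run changed_files against any copy before trusting it.  A file that was edited on purpose will also show up, so the manifest needs rebuilding after deliberate changes.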

A good sync program, like Live Mesh, robocopy or rsync (Mac OS X, Linux), can greatly simplify the process of maintaining copies of your archive.  I keep a reference copy on every computer I use, in addition to a backup copy on my server and a redundant, encrypted copy on an offsite FTP server.  All of the reference copies are synced to the version on the backup server.
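For readers curious what a sync program does at its simplest, here is a rough Python sketch of a one-way mirror.  It is not a replacement for rsync or robocopy and does not reflect how they are implemented; it only copies files that are new or different from a reference folder into a copy, and it never deletes anything.  The function name mirror is made up for this example.

```python
import filecmp
import shutil
from pathlib import Path


def mirror(reference: Path, copy: Path) -> list:
    """Copy new or changed files from `reference` into `copy`.

    Hypothetical helper for illustration. Returns the relative paths that were
    copied. Files that exist only in `copy` are left alone.
    """
    copied = []
    for src in sorted(reference.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(reference)
        dst = copy / rel
        # shallow=False compares file contents, not just size and timestamps.
        if dst.exists() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves modification times
        copied.append(rel.as_posix())
    return copied
```

Calling mirror with the server copy as the reference and a local folder as the copy brings the local folder up to date.  Comparing full contents is slow for large archives, which is one reason the dedicated tools are the better choice for real use.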


Backup and archiving of data are two important and separate issues.  Whereas backup is about keeping data safe, archiving is about taking care of it.  This requires that more complicated questions (like file format) be considered.  As more parts of our lives move into the electronic space, both of these issues become frighteningly relevant.  After all, the evidence of a human life should be more than a smiling face in a forgotten photograph.
