Backup for Linux, Done Right- Part 1: A Mini Melodrama

 | August 7, 2009 12:58 am

Time Drive

It is a terrible thing to realize that you are stuck in a rut.  Being in a rut effectively means that you’ve stopped advancing and life has evolved to monotony.  No one likes to be around people in ruts, but it’s even worse to discover that you are personally trapped in one.  And, most unfortunately, I am in a rut.

Don’t believe me?  Take a look at the home page of this blog.  You will likely notice that a full six of the ten most recent posts have dealt with one subject: backing up your computer.  That’s pretty conclusive evidence of a rut.

Now, backing up your computer is a very important thing to do; you should do it regularly and have a plan.  But … well … it’s boring.  Talking, thinking and writing about nothing but backup is dull.  As one of the doctors I work with likes to say, “That isn’t sexy.  If I’m going to spend any time with it – women, food, wine; it doesn’t matter – it should be sexy.”

He’s got a valid point: backup is not “sexy,” and I’d like to write about things that are, at least for a while.  This, therefore, will be my last post on backups, archives, or servers for the relatively foreseeable future (technology is just too cool to lay it aside for too long).  But before doing that, I want to summarize where I ended up in my quest for the ultimate backup system.

Backup on the Mac is taken care of: I use Time Machine pointed at a Samba share.  More adventurous persons than I might even say that this arrangement approaches sexy.  It’s convenient, fast, and robust.  It even covers disaster recovery.

Backup on Windows is also covered.  The built-in file backup is easy to use and works well.  Moreover, setting up a disaster recovery system is relatively painless.

But the third major operating system, Linux, is a bit of the odd man out.  Certainly, you can find some excellent backup systems; Back In Time is one such example.  With a bit of work, you can even tweak it so that it is almost perfect.  But it’s the “almost perfect” and its closely related cousins (“mostly useful” and “good enough”) that are the problem.  They have those stupid qualifiers – almost, mostly, enough – bolted on.

Any time you hear a qualifier, you can rest assured that you aren’t going to like what follows.  Consider the rather innocuous phrase, “that may be a problem.”  Here, the term “may” makes an already bad situation much worse.  Instead of specifying some probability of problemhood, it all but guarantees it.  Positive qualifiers are just as bad.

As a result, it angers me that nearly every backup program available for Linux requires some kind of qualifier.  It shouldn’t be like this.  Linux is a brilliant operating system in practically every way.  It is highly integrated, wonderfully modular and tremendously easy to extend.  So, after finding that nearly every backup utility in existence failed to meet my needs, I decided the situation was intolerable and did something about it.

I wrote my own.

Backgrounds and Backends

Fixation and ruts can make you do silly things like that.  Instead of just accepting the limitations of an existing situation, a fixated person will demand that the world bend to their expectations.  While this sometimes leads to great advances, more often it results in interpersonal disasters of epic scale.  And the general rule is, the more menial the detail, the larger the scandal.  (At this point, it might be argued that backup to an external hard drive versus across the network is a rather menial detail.  To all such detractors, all I can say is: stuff it.  This is my story.)

Basic Needs

Luckily, however, the story of my backup utility isn’t nearly so sordid.  This is probably because my needs are actually pretty simple.  I need a backup utility to do just a few things, but I require that it do them well:

  1. It should create versioned snapshots of my drive.  This lets me restore a file to any number of past states.
  2. The utility should only transfer the portions of the file or directory that have changed.  This makes the backup operation quick and efficient.
  3. I should be able to back up over a network or across the internet.  (Very important!)
  4. Restoring a file should be quick, easy and painless. A backup is only as good as your ability to get at the information.  And like it or not, information will be lost due to disaster, carelessness or miscellaneous stupidity.

Being a somewhat clever person, I wanted to avoid duplicating as much existing work as possible.  As a result, I started creating my program by reviewing the application that comes closest to doing what I want: Back In Time.  The Back In Time user interface is simple and elegant.  More importantly, though, the configuration options make sense.  Whether you want to run a backup, give your snapshot a name, or just get rid of it, you don’t have to go hunting to find the controls.  Additionally, Back In Time makes good use of existing open source programs to actually do the backing up.  Though they might be a bit old, rsync and the Unix copy command, cp, are a formidable duo.

But I quickly discovered something unfortunate: Back In Time really couldn’t be tweaked to meet all of my requirements.  You see, it utilizes a technique called hard-linking to create space-efficient repositories.  But hard links only work as long as they are on the same filesystem.  Because I want to store my files on the network or across the internet, that isn’t good enough.
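To see why, here is a small sketch of what hard-linking actually does (the file names and contents below are throwaway examples, not anything from Back In Time itself).  A hard link is just a second directory entry pointing at the same inode, which is why an unchanged file in a snapshot costs no extra space, and also why the trick cannot reach across a mount point onto a network share.

```shell
# Demonstration: two hard-linked names share one inode, so the "snapshot"
# costs no additional disk space. (Paths here are throwaway examples.)
dir=$(mktemp -d)
echo "report draft" > "$dir/original.txt"

# Create a hard link, the way snapshot tools reuse unchanged files:
ln "$dir/original.txt" "$dir/snapshot.txt"

# Both names resolve to the same inode number:
stat -c %i "$dir/original.txt"
stat -c %i "$dir/snapshot.txt"   # prints the same number as above

# An inode number is only meaningful within a single filesystem, so a
# link onto another mount (an NFS or Samba share, say) fails with
# "Invalid cross-device link":
#   ln "$dir/original.txt" /mnt/nfs-backup/snapshot.txt

rm -r "$dir"
```

This is the whole limitation in miniature: the snapshot and the data it links to must live on the same filesystem.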

So, I started to look elsewhere for other options, which is when a colleague pointed me toward a command line script called Duplicity.  In a striking bit of irony, while the GUI tools of Linux backup are all fundamentally flawed, the command line versions are the best in existence.  They easily accomplish all of the requirements on my list, and do a great many things I’ve never even heard of.  As a result, if you are willing to roll up your sleeves and hit the learning curve, there is nothing that you can’t accomplish.  And, as you might deduce from my glowing introduction, Duplicity makes short work of my requirements list.

Can it do incremental backup?  Yes.  What about backup over the network?  Ditto.  In fact, it handles most protocols I’ve heard of, and quite a few that I haven’t.  Should incremental backup through SSH, WebDAV, FTP, SMB, or IMAP be insufficient, it also supports backup to Amazon S3.  File restoration is also relatively painless, though you have to work through the command line interface to do so.

And that’s only for starters: Duplicity also supports encryption via passphrase or key, compression of the files into archives, and local caching of file signatures so that backup operations are lightning quick.  I decided relatively quickly that Duplicity would be the ideal backend for my utility.
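For the curious, here is a rough sketch of how Duplicity covers that requirements list from the command line.  The server name, user, paths, and key ID below are all placeholders I made up for illustration; consult the duplicity man page before trusting any of it with real data.

```shell
# First run makes a full backup; later runs are automatically
# incremental, transferring only the changed portions of files:
duplicity /home/rob sftp://rob@backup.example.com//srv/backups/rob

# Encrypt the archive with a GnuPG key instead of a passphrase prompt
# (DEADBEEF is a placeholder key ID):
duplicity --encrypt-key DEADBEEF /home/rob \
    sftp://rob@backup.example.com//srv/backups/rob

# List the chain of full and incremental snapshots in the archive:
duplicity collection-status sftp://rob@backup.example.com//srv/backups/rob

# Restore a single file as it existed three days ago:
duplicity restore -t 3D --file-to-restore Documents/thesis.odt \
    sftp://rob@backup.example.com//srv/backups/rob /tmp/thesis.odt
```

Swap the sftp:// URL for an ftp://, webdav://, imap://, or s3+http:// target and the same commands back up over those protocols instead.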

Frontend and Features

But while a solid foundation is a great starting point, it can only get you so far.  After all, you don’t brag about the brickwork and reinforced concrete of your newly purchased home; you’re far more likely to show off the kitchen and home theater.

For a program, the point of interest is going to be the user interface.  That makes two lessons learned while working on LyX-Outline very important: 1) It’s really hard to put together an interface that is both intuitive and uncluttered.  2) I’m really bad at it.  Both points prompted me to act on the advice of a famous painter:

Good artists copy; great artists steal.

Thus, I decided to steal the user interface from Back In Time.  (It’s open source, which makes it okay.)  And speaking from a practical standpoint, why wouldn’t I?  The developer spent a great deal of time working out how he wanted his program to work.  It would be silly to duplicate that effort.  Moreover, the fact that I was able to rave about it for nearly 2800 words means that it wasn't wasted effort.  As you browse the menus and configuration panes of my derivative tool, you will probably notice that it bears a striking resemblance to other programs.  A few things will be “innovatively different,” but not many.  Just remember, I stole the interface.  Wholesale.  But I’m a discriminating thief, and only stole the best parts.

Which brings us to the real point of interest of this post and my program: the feature list.  In part 2 of this article, we’ll take a look at my little creation, which I have unimaginatively dubbed “Time Drive.”


6 Responses to “Backup for Linux, Done Right- Part 1: A Mini Melodrama”

Ryan wrote a comment on August 12, 2009

Have you looked at Deja Dup? It's another GUI solution based on Duplicity. I just found it after reading your blog, and it sounds quite similar in terms of its goals.

notize wrote a comment on August 12, 2009

How does this handle duplicate files scattered across multiple computers?  In our house we've got pictures and music duplicated left and right.  Would this back up files that are already stored somewhere else?  I'm asking for storage space calculation .... these days storage space comes a bit cheaper than time to organize everything ... nothing wrong with my kids playing with their own media files, but if I could save on backup storage I would.

Please let us know!

Rob Oakes wrote a comment on August 12, 2009

Hi Ryan, I came across Deja Dup after finishing most of the functionality for Time Drive (one of those bizarre ironies of life).  I've been in touch with its lead developer, Michael Terry, and we've talked about joining code-bases so that the archive browser can be ported over.

Because Time Drive is written using Qt and Deja Dup written with Gtk, they will both probably continue to live and evolve (eventually sharing a common codebase where possible). I'm personally more of a KDE man (though using Gnome at the moment), while Michael is geared towards Gnome.

But, Deja Dup is a wonderful program. And archives from Time Drive are compatible with archives made with Deja Dup because of the common backend, Duplicity.

Rob Oakes wrote a comment on August 12, 2009

@Notize, unfortunately, it will make a backup copy for each of the files.  (Though it would be really cool if it could compare file signatures and only back up one copy.  That is something that I will pass along to the developers of Duplicity, the backend for Time Drive, as a feature idea.  Because it already compares signatures, it would probably only be moderately difficult to add.)

To the best of my knowledge, the only major program that is capable of intelligently combining backups from multiple machines is Windows Home Server.  Not even Time Machine is capable of doing that.  Enterprise-scale options, like Zmanda, might be able to, but I've never spent much time using them.  They are just too far outside of my own needs.  I do really well with file backup and have never needed anything heavier.

Ryan wrote a comment on August 15, 2009

On the subject of saving space on duplicate files, there's a package called backuppc that does exactly that. It's designed to run on a server and pull backups from all the other machines via rsync (or other methods), and then it hardlinks all the identical files to save space. (As opposed to each computer individually pushing backups to the server.) When backing up a bunch of Windows machines, that means that it only makes one copy of each system file, for example.

There's also gibak, which is a thin command-line frontend to git that turns it into a backup suite. Git does some creative compression and hard-linking and other stuff that I don't quite understand, which makes it good for backups.

As for the on-disk compatibility of Deja Dup and Time Drive, does that mean that I could configure them both to use the same backup directory and then use one for backup and the other for restoring, or something like that?

Rob Oakes wrote a comment on August 15, 2009

@Ryan: Thanks for pointing out the two backup solutions.  I hadn't heard of either one of them and they look very, very cool.  I'm particularly interested in backuppc and will probably download and take a look at it.  I might even try to build support for it into Time Drive, since it appears to do much the same thing that Home Server does, except with open source tools and Linux.  I also took a look at gibak, which looks neat, but I've never really been able to wrap my head fully around git.

Re the second question, the answer is yes.  If you wanted to configure Deja Dup to do the actual backing up, you could then configure Time Drive to use the same directory and restore the files.  A few things to be aware of: Time Drive runs a separate duplicity job for each folder in your backup; thus, if you go to your archive directory, you will find a separate directory and set of .tar files for every folder.  In contrast, Deja Dup runs a single duplicity job and just uses the --include flag to add each folder to the job.

A better bet might just be to use the advanced restore options and connect right to the archive, rather than trying to mess with getting the configuration options just right.  But in terms of compatibility, they both use duplicity.  As a result, if you get the config options right, you can use either program to add or restore files.  Given the features of the two programs, in such a scenario, I would use Deja Dup for backup and Time Drive for restore.  (Presently, Deja Dup doesn't even support individual file restore, just restoration of the full archive.  I've been talking with Michael Terry, the lead developer, and we intend to change that in the immediate future.)

I hope that's of some help.