
Painless Amazon EC2 Backup

by Paul Kenjora on March 13th, 2009

The past year on Amazon EC2 has taught me many things, but first and foremost is to back up consistently. I’ll say it again: back up consistently! Amazon even makes the backup almost painless, almost…

Amazon has EC2 (the compute cloud) and S3 (the data repository). Out of the box you can back up from EC2 to S3 using simple Amazon-provided scripts like ec2-bundle-vol and ec2-upload-bundle. Both come preinstalled on the Linux images you start with.

I’ll inject some philosophy here. Some people believe that backing up the entire image is a waste of resources and you should instead back up only the data you need to S3. These people obviously have copious amounts of spare time on their hands. I’m not going to cry over the extra few MB; the benefit is being able to restore a backup in 30 automated seconds vs. twiddling with buckets and databases. Time is money.

The idea behind the code below is to back up entire images of a server on a rotating daily schedule. Rotation is set to create a unique manifest in your bucket per day of the week. If the system goes down or is compromised, pick a backup day to restore, relaunch the instance from that image, and re-assign your Amazon IP address. The whole restore process should take no more than 30 seconds (not counting instance boot).

To implement this backup strategy simply copy the following script into /root/backup.py and set up the following cron task (midnight backup):

crontab -e

0 0 * * * /root/backup.py
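
As a minimal sketch of the rotating backup described above (not the original listing), the script could look roughly like this in Python 3. It assumes ec2-bundle-vol and ec2-upload-bundle are on the PATH and that /mnt has room for the bundle. Every key, path, account number, and bucket name here is a hypothetical placeholder. The weekday list is ordered to match Python's date.weekday(), where Monday is 0. By default the sketch only prints the commands; pass --run to execute them.

```python
#!/usr/bin/env python3
import datetime
import glob
import os
import subprocess
import sys

# Hypothetical placeholders: replace every value with your own.
ACCESS_KEY = "YOUR_ACCESS_KEY"
SECRET_KEY = "YOUR_SECRET_KEY"
ACCOUNT_ID = "000000000000"
PRIVATE_KEY = "/root/pk.pem"
CERTIFICATE = "/root/cert.pem"
BUCKET = "my-backup-bucket"
WORK_DIR = "/mnt"

# Seven entries, ordered to match datetime.date.weekday() (Monday == 0).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def day_prefix(today=None):
    """Return the weekday name used as the bundle prefix."""
    today = today or datetime.date.today()
    return WEEKDAYS[today.weekday()]


def build_commands(prefix):
    """Build the bundle and upload command lines for one day's backup."""
    manifest = os.path.join(WORK_DIR, prefix + ".manifest.xml")
    bundle = ["ec2-bundle-vol", "-k", PRIVATE_KEY, "-c", CERTIFICATE,
              "-u", ACCOUNT_ID, "-d", WORK_DIR, "-p", prefix]
    upload = ["ec2-upload-bundle", "-b", BUCKET, "-m", manifest,
              "-a", ACCESS_KEY, "-s", SECRET_KEY]
    return [bundle, upload]


def run_backup(dry_run=True):
    """Print the commands (dry run) or clean up and execute them."""
    prefix = day_prefix()
    commands = build_commands(prefix)
    if dry_run:
        for command in commands:
            print(" ".join(command))
        return commands
    # Remove last week's bundle files for the same weekday.
    for stale in glob.glob(os.path.join(WORK_DIR, prefix + "*")):
        if os.path.isfile(stale):
            os.remove(stale)
    for command in commands:
        # check=True raises CalledProcessError if a step fails.
        subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_backup(dry_run="--run" not in sys.argv)
```

Because each step runs with check=True, a failed bundle raises an error and the upload is not attempted. With this sketch the cron entry would need the --run argument, and the file must be executable.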

All codes above are fake but you should replace them with your own. Run the script manually or from cron. Either way you get an automated painless Amazon EC2 backup.
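
The restore steps (register the chosen day's image, launch an instance from it, re-assign the IP address) can be sketched the same way. This is only an illustration under assumptions: it uses the ec2-register, ec2-run-instances, and ec2-associate-address commands from the EC2 API tools, and the exact arguments your version of the tools expects may differ. The bucket, key pair, AMI ID, instance ID, and IP address below are hypothetical placeholders. The AMI ID comes from the output of the register step, and the instance ID from the output of the launch step.

```python
def register_command(bucket, day):
    """Register the bundle uploaded for the given weekday as an AMI."""
    return ["ec2-register", "{}/{}.manifest.xml".format(bucket, day)]


def launch_command(ami_id, key_pair):
    """Launch a new instance from the registered AMI."""
    return ["ec2-run-instances", ami_id, "-k", key_pair]


def associate_command(instance_id, elastic_ip):
    """Point the existing Amazon IP address at the new instance."""
    return ["ec2-associate-address", "-i", instance_id, elastic_ip]


# Hypothetical values, printed only to show the order of the steps.
steps = [
    register_command("my-backup-bucket", "friday"),
    launch_command("ami-00000000", "my-keypair"),
    associate_command("i-00000000", "203.0.113.10"),
]
for step in steps:
    print(" ".join(step))
```

Nothing here is executed; the commands are printed so they can be checked and run by hand in order.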

  • Ja
    Thanks for this, it's great. It's a minor thing but you've got 'sturday' instead of 'saturday'. And as Darren points out you have two 'sunday' in there.

    Finally, how do you restore from this?
  • I'll make the spelling edits soon. The Arkayne project is keeping me busy, also all in Django.

    You restore from this by listing all your AWS images. Log into your AWS console via the web and check your images. There is also a command line tool in the ec2-api java toolset for launching instances. First run the register command then run launch instance.
  • Really informative article, so I have to work harder on backups, right? If this really works, it could work for me.
  • Darren
    Thanks, this is useful. You've got an extra "Sunday" in your weekday array though - I believe you should remove the first one, or your backups will always be for the wrong day.
  • Alejandro
    Hi,

    Could you describe the restore process? I think it's clear to me but I don't want to miss anything...

    Also, this backup looks like a system-only backup, right? no /mnt
  • thom
    Why are you not registering the AMI?
  • Thanks for the excellent script, but please do not ever run a cron at midnight, especially not in a shared environment like EC2. Most networks have surges at precisely that time because of all the cron scripts running. Better off picking an unusual time like 1:37. Midnight is far from the lowest traffic time anyway, most sites minimum is at 4-5 in the morning.
  • arpitjain
    Doesn't EC2 allow images of max size 10 GB when bundling volumes? That surely can't be enough for bundling everything. A better way would be to store your data in a separate directory on EBS and bundle just the system files.
  • Interesting stuff. I did something similar with a bash script but not as a scheduled task (script would take image name as a parameter and then do the leg work).

    I am personally keeping all data, logs and application config files on an EBS Volume and only run this bash script when I make changes to the OS. This way I plan to make backups with snapshots of the volume and/or copies of the data to S3.

    Does this sound reasonable?
  • Hi, thanks so much for this post - it's come in very handy. I'm curious though if anyone else sees sporadic results with this script? Sometimes it works, and sometimes it bombs on me. When it does bomb, I get the following error from my cronjob...

    ------
    rm -f /mnt/saturday*
    /usr/local/bin/ec2-bundle-vol .....
    Copying / into the image file /mnt/saturday...
    Excluding:
    /sys
    /proc
    /proc/sys/fs/binfmt_misc
    /dev
    /media
    /mnt
    /proc
    /sys
    /mnt/saturday
    /mnt/img-mnt
    1+0 records in
    1+0 records out
    1048576 bytes (1.0 MB) copied, 0.00216 seconds, 485 MB/s
    mke2fs 1.39 (29-May-2006)
    ERROR: execution failed: "rsync -rlpgoD -t -r -S -l --exclude /sys --exclude /proc
    --exclude /proc/sys/fs/binfmt_misc --exclude /dev --exclude /media --exclude /mnt
    --exclude /proc --exclude /sys --exclude /mnt/saturday --exclude /mnt/img-mnt -X
    /* /mnt/img-mnt 2>&1 > /dev/null"
    /usr/local/bin/ec2-upload-bundle ....
    --manifest has invalid value '/mnt/saturday.manifest.xml': File does not exist or
    is not a file.
    Try 'ec2-upload-bundle --help'

    ----------

    I've removed some of the command params, but you should get the idea. It seems to me that for some reason the manifest.xml file isn't being generated prior to trying to upload it to S3. Any thoughts?

    Thanks in advance for any help!
  • Forgot to mention, that I only get that error when it runs from Cron. Anytime I run the script directly on the command line it seems to work fine. Very strange...
  • Check your spelling, but nice post. This will help me a lot. I hope these codes are not fake...
  • Are these real? Someone here posted that the codes are fake; it's good I haven't tried it.
  • danb
    very helpful! thanks!

    missing a line break on that first line.. import should be on its own line... and saturday is misspelled.

  • MS Windows taught me to back up. All those crashes back in the old days meant that if I wasn't saving every couple of minutes I'd probably lose a lot of stuff. Now I'm backing up every piece of data I have.
  • Jonathan Fine
    'sturday' should be 'saturday'.
  • Rajesh D
    If your server gets compromised, this script gives the hacker full access to your EC2 account -- private certificates, AWS secret key, and all. If that's a concern, one could scp a version of this script to the production EC2 instance and execute it only when new application and configuration updates are deployed to production after which the script could be deleted. That coupled with employing EBS for all persistent storage (and using EBS snapshots for daily data backups) is worth considering. EBS volume snapshots can be taken from outside of AWS and they don't require your AWS secret key -- so the script that takes those data snapshots doesn't even have to reside on the EC2 instance.

  • Nick
    Please read the blog post carefully: all codes are fake.
  • Rajesh D
    Sorry, I didn't explain my point clearly enough. I didn't mean to say that you are exposing your own codes (I understand fully that the codes you have shown above are made up.)

    What I meant is that when you do deploy this script on to your server for real, you will have to put in real AWS codes into this script for it to work. At that point, a potential server compromise leaves a hacker with plain access to these codes and hence to your AWS account.
  • Rajesh,

    Very valid point, I'm not sure it's avoidable though. To back up to S3 the script needs the codes. Even if you do the backup manually your history log will show the keys in the command. I guess you could clear your history but does anyone do that?

    Plus if your server is compromised, the hacker could simply wait until you back up again then capture the commands and codes. So I'm not sure this problem is solvable; if the server is compromised then there is no way to avoid risk.
  • Rajesh D
    If you use bash, it's very easy to disable writing of your command lines to the history file when you log out of ssh (basically, you would use "unset HISTFILE" in your bashrc.) There's also a command to remove your history from memory. I have this and some other tweaks on all my EC2 boxes.

    You're right that a hacker could wait for you to use your secret keys/certs and grab them at that time. It's still a bit more reassuring not to have to keep your secret keys right on the server all the time.
  • Nick
    Actually this is an offtopic comment. But this is the only place I am able to contact you.

    Your Arkayne "Contact us" form doesn't work, showing strange {message}} message.
    "Contact us" form on Aware Labs doesn't work either, showing "SMTPDataError at /contact/
    (503, '5.0.0 Need RCPT (recipient)')".

    I believe that you want to listen to your users/clients but something went wrong.

    WBR, Nick

  • Mike Parsons
    Dude ... you got your Amazon keys in your code snippet!!!

    access_key = '8459945JFG8FDGJ38233'
    secret_key = 'h9rtnretlfkgdnfgg843twlejrktjwlktwekl'
  • All codes above are fake but you should replace them with your own.
  • HW
    Kind of offtopic, but I'll throw it out anyway:

    Why are you not simply using a shell script if you just want to run a few shell commands? I think in this case it would make the code even simpler.
  • Registering the AMI does not back up /mnt, and has a 10 GB limit.
  • I've never been good at shell scripting. If you send me a shell version I will gladly post it.