Bash Script: Incremental Encrypted Backups with Duplicity (Amazon S3)

Purpose: use a script to automatically back up certain folders/files to Amazon’s S3 off-site storage in an encrypted fashion. The point of the script is to make it a little easier to manage the details and keep a log of what, exactly, is going on. Last Updated: Jan 31st, 2009

about the script

This bash script was designed to automate and simplify the remote backup process using duplicity and Amazon S3.  After the script is configured, you can easily back up, restore, verify and clean without having to remember lots of different command options.

Furthermore, you can even automate the process of saving your script and the gpg key for your backups in a single password-protected file — this way, you know you have everything you need for a restore, in case your machine goes down.

You can run the script from cron with no command-line options (all options set in the script itself); however, you can also run it outside of cron with some options for more control.

--full: forces a full backup (instead of waiting the specified number of days)
--verify: verifies the backup (no cleanup is run)
--restore: restores the backup to the directory specified in the script
--backup-this-script: lets you back up the script and secret key to the current working directory.
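
As a rough illustration of the two ways to run it, the sketch below prints a crontab entry for the unattended case and lists the manual invocations as comments. The install path /usr/local/bin/DT-S3-Backup-v3.sh, the 02:30 schedule, and the cron_entry helper are assumptions made for the example, not something the script requires.

```shell
# Hypothetical location of the configured script; adjust to your setup.
BACKUP_SCRIPT="/usr/local/bin/DT-S3-Backup-v3.sh"

# Print a crontab line that runs the script once a day at the given time.
# With no arguments the script runs its default backup, cleanup and
# file-size report.
cron_entry()
{
  minute="$1"
  hour="$2"
  printf '%s %s * * * %s\n' "$minute" "$hour" "$BACKUP_SCRIPT"
}

cron_entry 30 2

# Manual runs, outside of cron:
#   /usr/local/bin/DT-S3-Backup-v3.sh --full
#   /usr/local/bin/DT-S3-Backup-v3.sh --verify
#   /usr/local/bin/DT-S3-Backup-v3.sh --restore
#   /usr/local/bin/DT-S3-Backup-v3.sh --backup-this-script
```

The printed line can be pasted into the crontab of whichever user runs the backups (crontab -e).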

how to use

Download the current version of the script: DT-S3-Backup (Version 3)

You’ll also need to have a number of things in place in order to utilize this script, specifically: gpg, duplicity, an Amazon S3 account, and (optionally) s3cmd. If you need help getting these pieces in place, I wrote another post about putting it all together. It’s not all that difficult, but does take a few pieces of the puzzle to be in order.

Once you have the script, you will need to fill out the <FOOBAR> variables with your own specific information.  I suggest testing the script on a small directory of files and a local directory for your destination first to make sure it is working.
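
For that first test run, the settings might look something like the sketch below. Every path under /tmp/dt-backup-test is made up for the example; the variable names are the ones the script already uses. Because DEST is a file:// URL, nothing is sent to S3.

```shell
# Hypothetical scratch locations for a trial run; nothing here touches S3.
TEST_BASE="/tmp/dt-backup-test"

ROOT="${TEST_BASE}/"                              # backup starts here
INCLIST=( "${TEST_BASE}/source" )                 # small directory to back up
EXCLIST=( "${TEST_BASE}/source/skip-me" )         # something to leave out
DEST="file://${TEST_BASE}/destination/"           # local destination
RESTORE="${TEST_BASE}/restored"                   # where --restore writes
LOGDIR="${TEST_BASE}/logs/"                       # keep the trailing slash

echo "Test backup of ${INCLIST[0]} goes to ${DEST}"
```

Once a backup, a --verify and a --restore all behave with these values, swap DEST back to the s3+http:// form.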

the script (version 3 – Jan 31 2009)

#!/bin/bash
#
# Copyright (c) 2008-2009 Damon Timm.
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
# ---------------------------------------------------------------------
#
# Version 3 - Jan 31 2009
# Incremental Encrypted Backups with Duplicity and Amazon S3
#
# This bash script was designed to automate and simplify the remote backup
# process using duplicity and Amazon S3.  Hopefully, after the script is
# configured, you can easily backup, restore, verify and clean without
# having to remember lots of different command options.
#
# Furthermore, you can even automate the process of saving your script
# and the gpg key for your backups in a single password-protected file --
# this way, you know you have everything you need for a restore,
# in case your machine goes down.
#
# You can run the script from cron with no command-line options
# (all options set in the script itself); however, you can also run it
# outside of cron with some options for more control.
#
# --full: forces a full backup (instead of waiting the specified number
#         of days)
# --verify: verifies the backup (no cleanup is run)
# --restore: restores the backup to the directory specified in the script
# --backup-this-script: lets you back up the script and secret key to the
#                       current working directory.
#
# See more info about the script online at:
# blog.damontimm.com/bash-script-incremental-encrypted-backups-duplicity-amazon-s3/

# TO DO:
#   - allow command line restore options (specific files, etc)
#   - allow command line cleanup options (# of days [30D] or full backups [2])
#   - allow restore to specific path from the command line

# AMAZON S3 INFORMATION
export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""

# GPG PASSPHRASE & GPG KEY (Automatic/Cron Usage)
# If you aren't running this from a cron, comment this line out
# and duplicity should prompt you for your password.
# I put my GPG passphrase in a text file at
# ~/.gnupg/.gpg-passphrase and chmod it 0600.
export PASSPHRASE=""
GPG_KEY=""

# The ROOT of your backup (where you want the backup to start);
# This can be / or somewhere else -- I use /home/ because all the
# directories start with /home/ that I want to backup.
ROOT="/home/"

# BACKUP DESTINATION INFORMATION
# In my case, I use Amazon S3, so I made up a unique
# bucket name (you don't have to have one created, it will do it
# for you).  If you don't want to use Amazon S3, you can back up
# to a file or any of duplicity's supported outputs.
#
# NOTE: You do need to keep the "s3+http:///" format;
# even though duplicity supports "s3:///" this script
# needs to read the former.
#DEST="file:///home/damon/new-backup-test/"
DEST="s3+http://backup-bucket/backup-folder/"

# RESTORE FOLDER
# Being ready to restore is important to me, so I have this script
# set up to easily be able to restore a backup by adding the
# "--restore" flag.  Indicate where you want the files to restore to
# here so you're ready to go.
RESTORE="/home/damon/restore-backup-01"

# INCLUDE LIST OF DIRECTORIES
# Here is a list of directories to include; if you want to include
# everything that is in root, you could leave this list empty, I think.
#INCLIST=( "/home/*/Documents" \
#    	  "/home/*/Projects" \
#	      "/home/*/logs" \
#	      "/home/www/mysql-backups" \
#        ) 

INCLIST=( "/home/damon/Documents/Scripts/" ) # small dir for testing

# EXCLUDE LIST OF DIRECTORIES
# Even though I am being specific about what I want to include,
# there is still a lot of stuff I don't need.
EXCLIST=( "/home/*/Trash" \
	      "/home/*/Projects/Completed" \
	      "/**.DS_Store" "/**Icon?" "/**.AppleDouble" \
           ) 

# FULL BACKUP & REMOVE OLDER THAN SETTINGS
# Because duplicity will continue to add to each backup as you go,
# it will eventually create a very large set of files.  Also, incremental
# backups leave room for problems in the chain, so doing a "full"
# backup every so often isn't a bad idea.
#
# I set the default to do a full backup every 14 days and to remove
# all files over 31 days old.  This should leave me at least two full
# backups available at any time, as well as a month's worth of incremental
# data.

FULL_IF_OLDER_THAN="14D"
CLEAN_UP_TYPE="remove-older-than"
CLEAN_UP_VARIABLE="31D"

# If you would rather keep a certain (n) number of full backups (rather
# than removing the files based on their age), uncomment the following
# two lines and select the number of full backups you want to keep.
# CLEAN_UP_TYPE="remove-all-but-n-full"
# CLEAN_UP_VARIABLE="5"

# LOGFILE INFORMATION DIRECTORY
# Provide directory for logfile, ownership of logfile, and verbosity level.
# I run this script as root, but save the log files under my user name --
# just makes it easier for me to read them and delete them as needed. 

# LOGDIR="/dev/null"
LOGDIR="/home/damon/logs/test2/"
LOG_FILE="duplicity-`date +%Y-%m-%d-%M`.txt"
LOG_FILE_OWNER="damon:damon"
VERBOSITY="-v3"

##############################################################
# Script Happens Below This Line - Shouldn't Require Editing #
##############################################################
LOGFILE="${LOGDIR}${LOG_FILE}"
DUPLICITY="$(which duplicity)"
S3CMD="$(which s3cmd)"

NO_S3CMD="WARNING: s3cmd is not installed, remote file \
size information unavailable."
NO_S3CMD_CFG="WARNING: s3cmd is not configured, run 's3cmd --configure' \
in order to retrieve remote file size information."

if [ ! -x "$DUPLICITY" ]; then
  echo "ERROR: duplicity not installed, that's gotta happen first!" >&2
  exit 1
elif  [ `echo ${DEST} | cut -c 1,2` = "s3" ]; then
  if [ ! -x "$S3CMD" ]; then
    echo $NO_S3CMD; S3CMD_AVAIL=FALSE
  elif [ ! -f "${HOME}/.s3cfg" ]; then
    echo $NO_S3CMD_CFG; S3CMD_AVAIL=FALSE
  else
    S3CMD_AVAIL=TRUE
  fi
fi

if [ ! -d ${LOGDIR} ]; then
  echo "Attempting to create log directory ${LOGDIR} ..."
  if ! mkdir -p ${LOGDIR}; then
    echo "Log directory ${LOGDIR} could not be created by this user: ${USER}"
    echo "Aborting..."
    exit 1
  else
    echo "Directory ${LOGDIR} successfully created."
  fi
elif [ ! -w ${LOGDIR} ]; then
  echo "Log directory ${LOGDIR} is not writeable by this user: ${USER}"
  echo "Aborting..."
  exit 1
fi

get_source_file_size()
{
  echo "---------[ Source File Size Information ]---------" >> ${LOGFILE}

  for exclude in ${EXCLIST[@]}; do
    DUEXCLIST="${DUEXCLIST}${exclude}\n"
  done

  for include in ${INCLIST[@]}
    do
      echo -e $DUEXCLIST | \
      du -hs --exclude-from="-" ${include} | \
      awk '{ print $2"\t"$1 }' \
      >> ${LOGFILE}
  done
  echo >> ${LOGFILE}
}

get_remote_file_size()
{
  echo "------[ Destination File Size Information ]------" >> ${LOGFILE}
  if [ `echo ${DEST} | cut -c 1,2` = "fi" ]; then
    TMPDEST=`echo ${DEST} | cut -c 6-`
    SIZE=`du -hs ${TMPDEST} | awk '{print $1}'`
  elif [ `echo ${DEST} | cut -c 1,2` = "s3" ] && [ -x "$S3CMD" ]; then
      TMPDEST=$(echo ${DEST} | cut -c 11-)
      SIZE=`s3cmd du -H s3://${TMPDEST} | awk '{print $1}'`
  fi
  echo "Current Remote Backup File Size: ${SIZE}" >> ${LOGFILE}
}

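include_exclude()
{
  # Quote the array expansions and use the --option=value form so that
  # glob patterns such as /home/*/Trash reach duplicity unexpanded
  # instead of being expanded by the shell first.
  for include in "${INCLIST[@]}"
  do
    INCLUDE="${INCLUDE} --include=${include}"
  done
  for exclude in "${EXCLIST[@]}"
  do
    EXCLUDE="${EXCLUDE} --exclude=${exclude}"
  done
  EXCLUDEROOT="--exclude=**"
}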

duplicity_cleanup()
{
  echo "-----------[ Duplicity Cleanup ]-----------" >> ${LOGFILE}
  ${DUPLICITY} ${CLEAN_UP_TYPE} ${CLEAN_UP_VARIABLE} --force \
	    --encrypt-key=${GPG_KEY} \
	    --sign-key=${GPG_KEY} \
	    ${DEST} >> ${LOGFILE}
  echo >> ${LOGFILE}
}

duplicity_backup()
{
  ${DUPLICITY} ${OPTION} ${FULL_IF_OLDER_THAN} \
   ${VERBOSITY} \
   --encrypt-key=${GPG_KEY} \
   --sign-key=${GPG_KEY} \
   ${EXCLUDE} \
   ${INCLUDE} \
   ${EXCLUDEROOT} \
   ${ROOT} ${DEST} \
   >> ${LOGFILE}

}

get_file_sizes()
{
  get_source_file_size
  get_remote_file_size

  sed -i '/-------------------------------------------------/d' ${LOGFILE}
  chown ${LOG_FILE_OWNER} ${LOGFILE}
}

backup_this_script()
{
  if [ `echo ${0} | cut -c 1` = "." ]; then
    SCRIPTFILE=$(echo ${0} | cut -c 2-)
    SCRIPTPATH=$(pwd)${SCRIPTFILE}
  else
    SCRIPTPATH=$(which ${0})
  fi
  TMPDIR=DT-S3-Backup-`date +%Y-%m-%d`
  TMPFILENAME=${TMPDIR}.tar.gpg

  echo "You are backing up: "
  echo "      1. ${SCRIPTPATH}"
  echo "      2. GPG Secret Key: $GPG_KEY"
  echo "Backup will be saved: `pwd`/${TMPFILENAME}"
  echo
  echo ">> Are you sure you want to do that ('yes' to continue)?"
  read ANSWER
  if [ "$ANSWER" != "yes" ]; then
    echo "You said << ${ANSWER} >> so I am exiting now."
    exit 1
  fi

  mkdir -p ${TMPDIR}
  cp $SCRIPTPATH ${TMPDIR}/
  gpg -a --export-secret-keys ${GPG_KEY} > ${TMPDIR}/s3-secret.key.txt
  echo
  echo "Encrypting tarball, choose a password you'll remember..."
  tar c ${TMPDIR} | gpg -aco ${TMPFILENAME}
  rm -Rf ${TMPDIR}
  echo
  echo ">> To restore files, run the following (remember your password!)"
  echo "gpg -d ${TMPFILENAME} | tar x"
}

if [ "$1" = "--backup-this-script" ]; then
  backup_this_script
  exit
elif [ "$1" = "--full" ]; then
  OPTION="full"
  FULL_IF_OLDER_THAN=
  include_exclude
  duplicity_backup
  duplicity_cleanup
  get_file_sizes

elif [ "$1" = "--verify" ]; then
  OLDROOT=${ROOT}
  ROOT=${DEST}
  DEST=${OLDROOT}
  OPTION="verify"
  FULL_IF_OLDER_THAN=

  echo "-------[ Verifying Source & Destination ]-------" >> ${LOGFILE}
  include_exclude
  duplicity_backup
  echo >> ${LOGFILE}
  get_file_sizes  

elif [ "$1" = "--restore" ]; then
  ROOT=$DEST
  DEST=$RESTORE
  FULL_IF_OLDER_THAN=
  OPTION=

  if [ "$2" != "yes" ]; then
    echo ">> You will restore to ${DEST}"
    echo ">> You can override this question by executing '--restore yes' next time"
    echo "Are you sure you want to do that ('yes' to continue)?"
    read ANSWER
    if [ "$ANSWER" != "yes" ]; then
      echo "You said << ${ANSWER} >> so I am exiting now."
      exit 1
    fi
    echo "Restoring now ..."
  fi
  duplicity_backup

elif [ "$1" = "--test" ]; then
  echo "This was a non-duplicity scripting test: check logfile for file sizes."
  get_file_sizes
else
  OPTION="--full-if-older-than"

  include_exclude
  duplicity_backup
  duplicity_cleanup
  get_file_sizes

fi

unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset PASSPHRASE
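
The script builds its include and exclude options as flat strings, which is fine for simple paths but gets fragile once a pattern contains spaces. A possible alternative, sketched here and not part of Version 3, is to collect the options in a bash array and expand it quoted. The names build_filter_args and FILTER_ARGS are invented for this example, and the sample lists are placeholders.

```shell
# Sample lists in the same style as the script's configuration section.
INCLIST=( "/home/*/Documents" "/home/www/mysql-backups" )
EXCLIST=( "/home/*/Trash" "/**.DS_Store" )

# Fill FILTER_ARGS with one duplicity option per pattern. Each pattern is
# kept as a single array element, so spaces and glob characters survive.
build_filter_args()
{
  FILTER_ARGS=()
  local pattern
  for pattern in "${EXCLIST[@]}"; do
    FILTER_ARGS+=( "--exclude=${pattern}" )
  done
  for pattern in "${INCLIST[@]}"; do
    FILTER_ARGS+=( "--include=${pattern}" )
  done
  # Same catch-all the script uses: skip everything not included above.
  FILTER_ARGS+=( "--exclude=**" )
}

build_filter_args
printf '%s\n' "${FILTER_ARGS[@]}"
```

In duplicity_backup the three string variables would then be replaced by a single "${FILTER_ARGS[@]}" placed before ${ROOT} ${DEST}.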

change log

Here is a list of the changes so far.

Version Three (01/31/09)

  1. Added comment to explain why folks need to use s3+ and not s3: for Amazon buckets
  2. Used “unset” to remove the variables at end of the script (thanks: alvaro)
  3. Fixed a problem when the backup folder on S3 was nested inside another bucket (thanks John Kinsella)
  4. Changed the PASSPHRASE field to default to the actual passphrase, so one can easily backup the entire script and not have to worry about remembering the passphrase or where it’s kept.
  5. Added --backup-this-script option which will turn the script and the secret key into an encrypted tarball that can be kept somewhere safe for easy restores if the machine goes down.
  6. Cleaned up the get_file_size function so it wouldn’t run when it wasn’t supposed to.

Version Two (12/03/08) [Download Version 2]

  1. added GPL license
  2. changed the cleanup feature to automatically force a full backup after (n) number of days as well as automatically cleanup after (n) number of days
  3. added option to force cleanup after (n) number of full backups (rather than by days)
  4. option to change log file ownership
  5. runtime checks for installed required software and write permissions on log directory
  6. fixed formatting of logfile to be a little more consistent
  7. setup everything in clever functions

Version One (11/24/08) [Download Version 1]

  1. Initial release.

54 Comments (newest first)

  1. Thank you for your script.

    I found some bugs in this script and fixed them.
    I would like to improve the script and have made a few small changes.

    Would you like to contact me by private email?

  2. Vlad says:

    How do I restore a backup that came from one machine, on another?
    I backed up from one machine, and now I want to get that backup on another machine, in case the first one completely goes belly up. So far I get this message when I run ./DT-S3-Backup-v3.sh --restore

    ===== Begin GnuPG log =====
    gpg: encrypted with ELG-E key, ID 16C3509D
    gpg: decryption failed: secret key not available
    ===== End GnuPG log =====

    One way is to replace the entire .gnupg directory, which would contain the right key then…
    Thoughts?

    • Damon says:

      Hi – sorry for the delay (email notification was being treated as Spam) … It looks like you don’t have the GPG key installed on the second machine … If you run the script with the --backup-this-script option you can then add your key to the second machine’s key ring. I need to add clearer instructions for that but haven’t gotten to it yet.

      I can’t think of the gpg command off hand but it is something like --add-private-key
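
For what it is worth, the gpg option that adds a key to a key ring is --import. A minimal sketch of the second-machine step, assuming the bundle was made with --backup-this-script (so it unpacks to a directory named after the archive, with the key in s3-secret.key.txt, as in the script above). The two function names are invented for the example.

```shell
# Directory name packed inside the bundle: the archive file name without
# its .tar.gpg suffix (this mirrors how backup_this_script names things).
bundle_dir_for()
{
  basename "$1" .tar.gpg
}

# Decrypt and unpack the bundle into the current directory, then import
# the secret key it contains. gpg prompts for the bundle password.
import_bundle_key()
{
  bundle="$1"
  dir="$(bundle_dir_for "$bundle")"
  gpg -d "$bundle" | tar x || return 1
  gpg --import "${dir}/s3-secret.key.txt"
}

# Example (the file name is hypothetical):
#   import_bundle_key DT-S3-Backup-2009-01-31.tar.gpg
```

After the import, gpg --list-secret-keys should show the key ID that the restore error complained about.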

  3. i couldn’t get this to work, so i made something myself. this php script only backs up tables that have changed since the last backup. each table is backed up into a gz archive. i dump these archives straight into my web folder, which i backup with duplicity regularly. this is semi-incremental: the largest tables in my database hardly ever change, and this way, i don’t create new archives for those tables, so i also don’t have to push them to s3 every time.

    Details and download:
    http://www.netpresent.net/CMS/Products/Products/#mysqlincbak

    [hmmm, is this comment system even working? trying again. i hope i'm not posting multiple times, but i'm not seeing any result, nor an error message]

  4. Hi, firstly excellent script – it really is incredibly useful. However, I seem to be hitting intermittent issues (once every couple of days). The error is “duplicity.backends.BackendException: Error downloading”. It attempts 5 times, but then crashes out. Looking at my s3 repository, the file duplicity errors on is definitely there, so i’m not sure what could be the reason. has anyone else come across this?

    • Damon says:

      Hi Benjamin – I haven’t seen that error myself. Hmm – is strange. Seems to be an error with the S3 functionality. Sorry I don’t have much input, however, on how to go about fixing this! If you do find out the reason, let us know.

  5. Sam S says:

    Hey, nice script … so I’m running into an issue with the script not wanting to delete the old backup sets. I keep getting the following error:

    “Which can’t be deleted because newer sets depend on them”

    My variables are …

    OLDER_THAN="14D"
    FULL="7D"

    I’ve been logging into S3 using the S3 plugin for Firefox and deleting the old backups that way. Any ideas as to why it keeps failing?

    TIA

    • Damon says:

      How long will it go before it finally deletes a backup? I had/have a similar issue but I think it has to do with the timing/overlap of the full backup and the remove_older_than variable. That is, it seems to go one more “full backup” than I would expect. But they do get removed.
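
One way to reason about the timing, as a back-of-the-envelope sketch and not a statement of duplicity's exact rules: the error says a set is kept while newer sets depend on it, so a full backup and its incrementals can only be removed together, once the newest incremental in that chain is itself past the cutoff. Assuming one run per day, that works out to roughly the full-backup interval plus the removal age. The function name is made up for illustration.

```shell
# Approximate number of days after a full backup before its whole chain
# (the full plus its incrementals) becomes removable, assuming one backup
# run per day.
#   $1 = days between full backups
#   $2 = remove-older-than age in days
chain_removable_after()
{
  echo $(( $1 + $2 ))
}

chain_removable_after 7 14     # FULL=7D, OLDER_THAN=14D   -> prints 21
chain_removable_after 14 31    # the Version 3 defaults    -> prints 45
```

So with a 7-day full interval and a 14-day cutoff, seeing the warning for about a week before the old sets finally disappear would be expected behaviour and not a failure.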

  6. Matthew says:

    Any idea what can be causing this? Gentoo, duplicity 0.4.11.

    Thanks.

    mail DT-S3-Backup-v3 # sh DT-S3-Backup-v3.sh       
    Traceback (most recent call last):
      File "/usr/bin/duplicity", line 463, in <module>
        with_tempdir(main)
      File "/usr/bin/duplicity", line 458, in with_tempdir
        fn()
      File "/usr/bin/duplicity", line 444, in main
        full_backup(col_stats)
      File "/usr/bin/duplicity", line 155, in full_backup
        bytes_written = write_multivol("full", tarblock_iter, globals.backend)
      File "/usr/bin/duplicity", line 87, in write_multivol
        globals.gpg_profile,globals.volsize)
      File "//usr/lib/python2.5/site-packages/duplicity/gpg.py", line 217, in GPGWriteFile
        file.write(data)
      File "//usr/lib/python2.5/site-packages/duplicity/gpg.py", line 125, in write
        return self.gpg_input.write(buf)
    IOError: [Errno 32] Broken pipe
    close failed in file object destructor:
    IOError: [Errno 32] Broken pipe
    No old backup sets found, nothing deleted.

  7. Andreas says:

    Thanks Damon, great script and instructions.

    For any users on Mac Leopard: for the sed-variant included in 10.5., line 253 needs to be like so:

    sed -i "" '/-------------------------------------------------/d' ${LOGFILE}

    Without the empty double quotes, you get an error.
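
A variant that should work unchanged with both GNU sed and the BSD sed shipped with Mac OS X is to give -i a backup suffix attached directly to the flag and delete the backup afterwards. This is a sketch of that idiom, not what Version 3 does; the function name is invented.

```shell
# Delete the separator lines from a log file in place. A suffix attached
# to the flag (-i.bak) is accepted by both GNU and BSD sed, so no
# platform check is needed; the backup copy is removed afterwards.
strip_separator_lines()
{
  logfile="$1"
  sed -i.bak '/-------------------------------------------------/d' "$logfile" &&
    rm -f "${logfile}.bak"
}

# In the script this would be called as:
#   strip_separator_lines "${LOGFILE}"
```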

  8. Dalby says:

     

    ./backup.sh --full
    Command line error: Too many arguments
    See the duplicity manual page for instructions
    Command line error: Too many arguments
    See the duplicity manual page for instructions

    anybody an idea?

    (with and without --full)
    trying to ftp btw

    • Damon says:

      Hi – I would guess that one of the settings you are using at the top of the script is incorrect or not parsing quite right. I have never tried this over ftp; perhaps one of the settings is S3 specific. If you want, post your settings and I can try and take a look. -D

  9. Alex says:

    Many thanks for your comments. To be honest before Amazon S3 I was never familiar with duplicity, always a good old rsync…

  10. Alex says:

    Hi

    Excellent script, however, my question is if it is possible to resume the initial backup if it has been interrupted? In my case the script said “Warning found incomplete backup set, probably left from aborted session” and started from the very beginning. I have too many Gbs to backup monthly and I do not want to start it all from the beginning if the script is aborted due to network problems. Any ideas?

    • Damon says:

      That’s a very good question and I don’t have the answer to it — I think, though, the answer probably lies outside of the script and with duplicity itself. I would check their mailing list or just post a question there … not sure, actually! Sorry!

      • Alex says:

        Thanks Damon. Also another good question is if it is possible to verify the integrity of the backup without downloading it to the local machine? Is it possible to restore just a specific file/folder instead of restoring the whole backup?

        • Damon says:

          I think duplicity needs to download the data in order to decode it, and then verify its integrity. However, using the --archive-dir option may fix that … I haven’t added it yet to the script but it’s an easy addition. See this comment.

          If you want to restore a specific file or folder, you can do that using traditional duplicity parlance — you may try to add it to the script, as well … check out this comment.

  11. AskApache says:

    Love the script, very nice! I’m recommending it to all my readers.

  12. sbeam says:

    Running this now and seems to be working nicely. Your hints on how to use s3cmd and gpg were very helpful.

    I will probably make some small changes to allow for more command line options – so I can split my stuff into different segments with different backup rules. For instance, I have sensitive personal/work data I’d like backed up daily, and encrypted (2Gb total). But my personal photos, videos and music collections can probably go in the clear and only once a week or so (80Gb total – lots of overhead to encrypt that…)

    thanks a lot for this and kudos.

  13. Steve says:

    Hi the script seems to be working for me except i’m getting the following errors:

    du: illegal option -- -
    usage: du [-H | -L | -P] [-a | -s | -d depth] [-c] [-h | -k | -m] [-n] [-x] [-I mask] [file ...]
    du: illegal option -- -
    usage: du [-H | -L | -P] [-a | -s | -d depth] [-c] [-h | -k | -m] [-n] [-x] [-I mask] [file ...]
    sed: 1: "/usr/local/www/apache22 ...": extra characters at the end of l command

    FYI I’m running on FreeBSD 7.0

    Any ideas? I’m pretty new to this sort of stuff. Thanks :)

    • Damon says:

      Your version of du might not take all the options I am giving it; check out line 189 … I would guess the “--exclude-from=” part is what is throwing it off (won’t work on the Mac, too, probably) … you might try erasing that part and seeing what happens … although, it won’t necessarily give you an accurate reading (depending on what you are excluding and including).

    • T D says:

      To make it work in FreeBSD, change line 194 from

      189
      
      du -hs --exclude-from="-" ${include} | \

      to

      189
      
      du -hs ${include} | \

      I guess the du syntax is a little different.
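
Instead of editing the line by hand on each platform, the script could probe whether the local du understands --exclude-from and fall back to a plain du -hs when it does not. The sketch below is one way to do that under the assumption that an unsupported option makes du exit non-zero (which the FreeBSD output above suggests). Both function names are invented for the example.

```shell
# Succeeds when the local du accepts GNU-style --exclude-from.
du_has_exclude_from()
{
  du -s --exclude-from=/dev/null /dev/null >/dev/null 2>&1
}

# Print "<path><TAB><size>" for one directory.
#   $1 = directory, remaining arguments = exclude patterns
# On a du without --exclude-from the patterns are ignored, so the size is
# only an upper bound. Paths containing spaces are not handled by the awk
# step (the same limitation the script has).
source_size()
{
  dir="$1"
  shift
  if du_has_exclude_from; then
    printf '%s\n' "$@" | du -hs --exclude-from=- "$dir"
  else
    du -hs "$dir"
  fi | awk '{ print $2 "\t" $1 }'
}

# Example: source_size /home/damon/Documents '*.DS_Store'
```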

  14. lee says:

    I love the script, thanks. I was just wondering if there is a way to restore backups from say, 12 days ago. If I have a month of backups in s3, how do I restore a specific day? Thanks again.

    • Damon says:

      Hi Lee — there isn’t any way to do that yet without altering the script … duplicity does support this feature, however, so to make the change isn’t that hard … you could edit the script around line 321:

      elif [ "$1" = "--restore" ]; then
        ROOT=$DEST
        DEST=$RESTORE
        FULL_IF_OLDER_THAN=
        OPTION=

      Go ahead and put in an option … like this: OPTION="-t 12D" … that should work (though I haven’t tested it). Let me know if that solves the problem.
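
A slightly more general sketch of the same idea: a helper that prints the extra restore options, so the age (and, for Alex's question above, a single file or folder) can be chosen per run. The -t option is the one suggested in the reply; --file-to-restore is duplicity's option for restoring one path, though it is worth checking the man page for your version. The function name is hypothetical and, like the suggestion above, this has not been tested against a real backup.

```shell
# Print extra duplicity options for a restore, one word per line.
#   $1 = age of the backup to restore, e.g. 12D (optional)
#   $2 = path inside the backup, relative to ROOT (optional)
restore_options()
{
  age="${1:-}"
  path="${2:-}"
  [ -n "$age" ] && printf '%s\n' "-t" "$age"
  [ -n "$path" ] && printf '%s\n' "--file-to-restore" "$path"
  return 0
}

# In the --restore branch of the script this could replace "OPTION=":
#   OPTION="$(restore_options 12D damon/Documents)"
# The script expands ${OPTION} unquoted, so the words are split again
# there; a path containing spaces would need the array approach instead.
```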

  15. Hey Damon – first, thx a ton for writing this. Very handy!

    One issue I managed to track down: If you define a DEST such as…

    DEST="s3+http://test.com/backuptest/"

    You’ll get an error which boils down to

    Problem: AttributeErr: S3Error instance has no attribute 'Code'

    What’s going on is after the backup, the script tries to run s3cmd du, but around line 185 the DEST variable gets piped through sed which strips off the slashes, so what gets run is actually

    s3cmd du -H s3://test.combackuptest

    …which generates the nonsensical error about ‘Code’…

    Not sure if it’s best to just document (If I missed docs, sorry) or have the code check the variable…

    John
    (btw would be nice to have comment markup help here)

    • Damon says:

      Hi John – Oh! I see, I guess I didn’t take into account that someone would have a folder within a bucket … yea, that would cause a problem with no forward slashes. Smile. Thanks for pointing that out.

      Obviously, if you use DEST="s3+http://single-bucket-name/" it works without the error … because the slashes can all be stripped.

      Good idea about adding guidelines for comment markup … you can use most html stuff like: em, strong, code, … the only fancy one is that you can use ” pre lang=’bash’ ” to mark your code snippets so that it looks something like this:

        elif [ `echo ${DEST} | cut -c 1,2` = "s3" ] && [ -x "$S3CMD" ]; then
            TMPDEST=`echo ${DEST} | cut -c 10- | sed s:/::g`   
            SIZE=`s3cmd du -H s3://${TMPDEST} | awk '{print $1}'`
        fi

      Which shows the problem of stripping all the slashes … I’ll have to work on that!

      Thanks.

    • Damon says:

      I think I fixed it in Version 3 … let me know if it works for you.
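
For reference, the bucket-and-folder part can also be extracted without cut or sed by stripping the scheme prefix with shell parameter expansion, which leaves the inner slashes alone. This is a sketch of that alternative, not the code Version 3 uses; the function name is made up.

```shell
# Strip the leading "s3+http://" from a duplicity destination URL,
# keeping any folder path inside the bucket intact.
s3_path_from_dest()
{
  dest="$1"
  printf '%s\n' "${dest#s3+http://}"
}

s3_path_from_dest "s3+http://test.com/backuptest/"   # prints test.com/backuptest/

# The size lookup would then become:
#   s3cmd du -H "s3://$(s3_path_from_dest "${DEST}")"
```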

  16. T D says:

    Thanks for the script. Took some fiddling to make it work with fbsd but now I’m golden. Finally, the backup solution I’ve been looking for.

    • Damon says:

      Hi TD – glad you got it to work. I’ve never used FreeBSD before … what kind of changes did you have to make?

      • T D says:

        I replied to Steve’s post below with the changes.

        I noticed you changed CLEAN_UP_VARIABLE to 31D from 14D. I made that same change a few weeks ago (without seeing your updated post) because my s3 bill was triple what it should have been due to outbound data transfer. Outbound?? Turns out, duplicity would download my data to check if newer sets depended on them. Telling it to delete items more than 14D old causes this because of the maintain-two-full-copies directive. So for the first two weeks everything is peachy…

        [user@machine /home/user/s3logs]$ cat duplicity-2009-01-24-06.txt 
        --------------[ Backup Statistics ]--------------
        StartTime 1232788342.33 (Sat Jan 24 01:12:22 2009)
        EndTime 1232788563.32 (Sat Jan 24 01:16:03 2009)
        ElapsedTime 220.99 (3 minutes 40.99 seconds)
        SourceFiles 9112
        SourceFileSize 4821175397 (4.49 GB)
        NewFiles 0
        NewFileSize 0 (0 bytes)
        DeletedFiles 0
        ChangedFiles 0
        ChangedFileSize 0 (0 bytes)
        ChangedDeltaSize 0 (0 bytes)
        DeltaEntries 0
        RawDeltaSize 9148781 (8.72 MB)
        TotalDestinationSizeChange 2105762 (2.01 MB)
        Errors 0
        -------------------------------------------------
         
        -----------[ Duplicity Cleanup ]-----------
        No old backup sets found, nothing deleted.

        Then it starts:

         
        [user@machine /home/cronuser/s3logs]$ cat duplicity-2009-02-22-06.txt 
        --------------[ Backup Statistics ]--------------
        StartTime 1235293877.43 (Sun Feb 22 01:11:17 2009)
        EndTime 1235294117.27 (Sun Feb 22 01:15:17 2009)
        ElapsedTime 239.84 (3 minutes 59.84 seconds)
        SourceFiles 9254
        SourceFileSize 4946798976 (4.61 GB)
        NewFiles 0
        NewFileSize 0 (0 bytes)
        DeletedFiles 0
        ChangedFiles 0
        ChangedFileSize 0 (0 bytes)
        ChangedDeltaSize 0 (0 bytes)
        DeltaEntries 0
        RawDeltaSize 9149797 (8.73 MB)
        TotalDestinationSizeChange 2030732 (1.94 MB)
        Errors 0
        -------------------------------------------------
         
        -----------[ Duplicity Cleanup ]-----------
        There are backup set(s) at time(s):
        Wed Jan 21 09:49:21 2009
        Wed Jan 21 10:26:44 2009
        Thu Jan 22 01:06:35 2009
        Which can't be deleted because newer sets depend on them.
        No old backup sets found, nothing deleted.
         
        ---------[ Source File Size Information ]---------
        /usr/home/xxx/s3xxx    4.0G
        /usr/local/xxx/xxx.net   1.7G
        /usr/local/xxx/xxxxx.com        303M
         
        ------[ Destination File Size Information ]------
        Current Remote Backup File Size: 2G

        And it goes on until there are 14 entries in that list, then they all get wiped and the process restarts.

        • Damon says:

          Hi – so, how did you change your variables to solve this ? Currently, I am using (at home):

          FULL_IF_OLDER_THAN="14D"
          CLEAN_UP_TYPE="remove-older-than"
          CLEAN_UP_VARIABLE="28D"

          I think this should keep two full backups — however, less than 28 days ago I switched to a new home server so I didn’t keep the old logs … will have to wait until I hit 28 days to see how it works. Am curious how you remedied this and what you would recommend.

          Thanks,
          Damon

          • T D says:

            I chose 31D because it was much longer than 14D. I don’t know if the ideal number is 28D, 31D, or anything else for that matter. I’m still waiting to see if the behavior appears again with the new variable.

            That said, I think the better solution here is to use the --archive-dir option so that duplicity need not download the remote files to compute hashes regardless. I just made that modification:
            line 228, before --encrypt-key:
            --archive-dir=${LOCAL_ARCHIVE_DIR} \
            line 238 (239 after the above addition), before --encrypt-key:
            --archive-dir=${LOCAL_ARCHIVE_DIR} \

            Finally, add this in your header…

            # Provide an optional local archive directory to store copies of your
            # backups. Duplicity uses these local copies in preference to remote ones
            # when calculating hashes. This is useful for minimizing your Amazon S3
            # outgoing bandwidth bill. This archive may be deleted or discarded at any
            # time without affecting your remote backups - it's only used for the sake of
            # minimizing data transfer.

            LOCAL_ARCHIVE_DIR="/usr/home/bos/s3archive"

            Just tested and it seems to work, although you’ll need some time to see the benefits of it (old files on the server do not get automatically mirrored, just new ones are copied over as they’re made)

            • T D says:

              I can confirm that using the --archive-dir option has solved the issue with duplicity downloading old chunks to compare / compute hashes. Now, the only outbound data transfer from S3 is the verification download duplicity does after uploading a new diff, meaning I transfer in and out about the same amount of data, and that amount is just the diff.
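
If the archive directory is meant to stay optional, as the header comment in this thread suggests, the option can be emitted only when LOCAL_ARCHIVE_DIR is set. A small sketch of that; LOCAL_ARCHIVE_DIR is T D's variable, while the helper name is invented here.

```shell
# Leave empty to run without a local archive directory.
LOCAL_ARCHIVE_DIR="${LOCAL_ARCHIVE_DIR:-}"

# Print the --archive-dir option when a directory is configured, and
# nothing otherwise, so the duplicity command line stays valid either way.
archive_dir_option()
{
  if [ -n "${LOCAL_ARCHIVE_DIR}" ]; then
    printf '%s\n' "--archive-dir=${LOCAL_ARCHIVE_DIR}"
  fi
}

# In duplicity_backup and duplicity_cleanup, before --encrypt-key:
#   $(archive_dir_option) \
```

As with the script's other unquoted expansions, this assumes the directory path contains no spaces.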

  17. John says:

    Hi Damon,

    I get the same errors without the --full option. Any idea on when you will have a version available that would display the commands? I may put some echos in there myself.

    Oh yeah i am trying to run this on CentOS 5.2. Any issues with that OS and this script?

    Thanks,

    • Damon says:

      Well – you could try to bump up the verbosity in the script to -v9 and see if that gives any more details … something odd is happening because you are getting errors both from s3cmd and duplicity … I would typically expect you to have one or the other. Have you experimented with either command line utility outside of using it within the script? Like, just done some test runs yourself before plugging in the variables?

      I dont’ have CentOS and haven’t tried it; I’ve only run it on Mac OS X and Ubuntu. My guess is maybe something is going on non-related to the script itself … of course, I could be wrong. What version of duplicity are you using ?

      • John says:

        Hey Damon,

        The script works fine when backing up to a local directory. I haven’t tried the commands alone yet; I will try this tonight. I will also try the -v9.

        Thanks,

        • Damon says:

          My guess is that the S3 info isn’t correct, because you are getting errors both from s3cmd and duplicity … I would double-check your key pairs and make sure you are selecting a very random bucket name (or it may already be taken).

          • John says:

            For the bucket name, does it have to have http://, as in DEST="s3+http://group-backup-01/",
            or would DEST="s+.images" work for this script?

            thanks,

            • Damon says:

              Hi John – sorry, your comment got held for moderation … I don’t know why … maybe because it thinks the http:// stuff is you spamming me with links. Heh. Anyway – the duplicity man page gives two options:

              s3://host/bucket_name[/prefix]
              s3+http://bucket_name[/prefix]

              I have to assume either one will work … not sure where you are headed with the .images bit … needs to be a name of your bucket.
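A rough sketch of a sanity check for the two URL shapes quoted from the man page above. The function name is_s3_dest is hypothetical, and the check only looks at the shape of the string; it does not confirm that the bucket exists or that the credentials are valid.

```shell
# Hypothetical helper: succeed only if the argument looks like one of
#   s3://host/bucket_name[/prefix]
#   s3+http://bucket_name[/prefix]
is_s3_dest() {
    case "$1" in
        s3://?*/?*)   return 0 ;;   # host and bucket name both present
        s3+http://?*) return 0 ;;   # bucket name present
        *)            return 1 ;;
    esac
}

for dest in "s3+http://group-backup-01/" "s+.images"; do
    if is_s3_dest "$dest"; then
        echo "looks like an S3 destination: $dest"
    else
        echo "not a recognised S3 destination: $dest"
    fi
done
```

With the two values from the question above, the first is accepted and the second is rejected.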

  18. John says:

    Hey,

    anyone happen to know what is going on with the below?

    ./DT-S3-Backup-v2.sh --full
    Traceback (most recent call last):
      File "/usr/bin/duplicity", line 463, in ?
        with_tempdir(main)
      File "/usr/bin/duplicity", line 458, in with_tempdir
        fn()
      File "/usr/bin/duplicity", line 444, in main
        full_backup(col_stats)
      File "/usr/bin/duplicity", line 155, in full_backup
        bytes_written = write_multivol("full", tarblock_iter, globals.backend)
      File "/usr/bin/duplicity", line 87, in write_multivol
        globals.gpg_profile,globals.volsize)
      File "/usr/lib64/python2.4/site-packages/duplicity/gpg.py", line 213, in GPGWriteFile
        data = block_iter.next(bytes_to_go).data
      File "/usr/lib64/python2.4/site-packages/duplicity/diffdir.py", line 407, in next
        result = self.process(self.input_iter.next(), size)
      File "/usr/lib64/python2.4/site-packages/duplicity/diffdir.py", line 284, in get_delta_iter_w_sig
        sigTarFile.close()
      File "/usr/lib64/python2.4/site-packages/duplicity/tarfile.py", line 508, in close
        self.fileobj.write("" * (RECORDSIZE - remainder))
      File "/usr/lib64/python2.4/site-packages/duplicity/dup_temp.py", line 101, in write
        return self.fileobj.write(buf)
      File "/usr/lib64/python2.4/site-packages/duplicity/gpg.py", line 125, in write
        return self.gpg_input.write(buf)
    IOError: [Errno 32] Broken pipe
    close failed: [Errno 32] Broken pipe
    No old backup sets found, nothing deleted.
     
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        An unexpected error has occurred.
      Please report the following lines to:
      s3tools-general@lists.sourceforge.net
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
     
    S3cmd:  0.9.8.4
    Python: 2.4.3 (#1, May 24 2008, 13:57:05)  [GCC 4.1.2 20070626 (Red Hat 4.1.2-14)]
     
    Traceback (most recent call last):
      File "/usr/bin/s3cmd", line 1070, in ?
        main()
      File "/usr/bin/s3cmd", line 1049, in main
        cmd_func(args)
      File "/usr/bin/s3cmd", line 47, in cmd_du
        subcmd_bucket_usage(s3, uri)
      File "/usr/bin/s3cmd", line 73, in subcmd_bucket_usage
        if S3.codes.has_key(e.Code):
    AttributeError: S3Error instance has no attribute 'Code'
     
     
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        An unexpected error has occurred.
        Please report the above lines to:
      s3tools-general@lists.sourceforge.net
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Thanks,

    • Damon says:

      Hi John – are you able to successfully run the script without using the --full option? I am not on my home machine so can’t double check, but maybe there is an issue when --full is run without any previous “normal” backups? Not sure, to be honest …

      One thing I would like to add is the ability to capture, via standard out, the exact command used to run duplicity via the script … that way we can see if something funny got mixed in there … and also, then the real duplicity folks could help troubleshoot it.
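One way the idea above might look, as a sketch only: a small wrapper that prints every argument of a command before running it. The names format_command and run_logged are hypothetical and not part of the script. This version writes the command line to standard error rather than standard out, so it does not mix with the command's own output; only arguments are printed, not exported environment variables.

```shell
# Hypothetical helper: print each argument in [brackets] so that spaces
# or empty values inside an argument are easy to spot.
format_command() {
    sep=''
    for arg in "$@"; do
        printf '%s[%s]' "$sep" "$arg"
        sep=' '
    done
    printf '\n'
}

# Hypothetical wrapper: show the command on stderr, then run it unchanged.
run_logged() {
    format_command "$@" >&2
    "$@"
}

# Demonstration with a harmless command instead of duplicity.
run_logged echo "backup of /my dir finished"
```

In the script, the duplicity call would be prefixed with run_logged in the same way as the echo above.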

  19. Damon says:

    @ Alvaro: that’s an interesting suggestion … I, obviously, didn’t know that! I guess, in part, my thinking on this script was that if someone is on my box (physically or has hacked into the system) they have pretty direct access to all my files anyway!

    But, maybe if someone were being more clever than me, they could hide the files and these unset variables would tip off a would-be (or already-be) intruder. Anyway, I will change that on the script and when I get some more fixes, will include it in version 3. Thanks!

  20. Alvaro says:

    Hey, I gotta say it’s a very nice script.

    But just for the record, I usually do “unset var” instead of “export var=”. You may ask what the difference is.

    (Maybe there are associated problems; in my environment it works great.)

    If anyone hacks into your system (or it’s a multiuser box), someone could see that you have FOO, BAR and FOOBAR vars empty. They could then check when you use them, what values they have, etc. In this case, it seems like pretty valuable info, so I personally prefer to be paranoid.

    Anyway, keep up the good work!
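A small demonstration of the difference described above. DEMO_SECRET and the in_environment helper are made-up names for illustration; the script's own variables are not touched.

```shell
# Hypothetical helper: succeed if the named variable appears in the
# environment that child processes would inherit.
in_environment() {
    env | grep -q "^$1="
}

export DEMO_SECRET="example-value"

# Approach 1: overwrite with an empty value. The name stays exported.
export DEMO_SECRET=
if in_environment DEMO_SECRET; then after_empty="still listed"; else after_empty="gone"; fi

# Approach 2: unset. The name disappears from the environment entirely.
unset DEMO_SECRET
if in_environment DEMO_SECRET; then after_unset="still listed"; else after_unset="gone"; fi

echo "after export DEMO_SECRET=: ${after_empty}"
echo "after unset DEMO_SECRET:   ${after_unset}"
```

The first line reports "still listed" and the second reports "gone", which is the trace an empty export leaves behind and unset does not.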

  21. Charlie says:

    If I understand it right, the hash data is stored locally (rather than remotely) to speed-up the process?

    It stores it locally *and* remotely, but it tries to retrieve it locally first. If it’s not found, then yes, it will go out and get the one in the repository. Now, if they somehow get out of sync, I’m not sure what would happen.

    So far my “implementation” is just commenting out the GPG lines and hard-coding in the additional options inside duplicity_backup(). When I get it working, I’ll try to clean it up.

    The real challenge I’m having is trying to get duplicity to follow symlinks when backing up.

    What I’m trying to implement is a usb thumbdrive backup that I can run from whichever PC I happen to be plugged into. The challenge is that the drive mappings change. So I added a drive letter parameter so that the script can remove and create the proper link, say from /usb -> /cygdrive/f to /usb -> /cygdrive/h

    But… it’s not working. It’s possible that it’s just not going to work, unfortunately.

  22. Damon says:

    @ Charlie: would be interested to see your implementation if you wanted to share it … I looked at the --archive-dir option on the man page and it states:

    When backing up or restoring, specify the local archive directory. This option is not necessary, but if hash data is found locally in path it will be used in preference to the remote hash data. Use of this option does not imply that the archive data is no longer stored in the backup destination, nor that the local archive directory need be kept safe. The local archive directory is a performance optimization only, and may safely be discarded at any time.

    If I understand it right, the hash data is stored locally (rather than remotely) to speed-up the process ? Seems though (as would make sense) that if no local version were found it would revert to the remote copy? Maybe?

  23. Charlie says:

    Very nice script. A few of the modifications that I’m working on/would like to see:

    1) Ability to use straight pass-phrase (symmetric key) rather than GPG
    2) Use of the --archive-dir option to help with performance
    3) Use of --time-separator=. which is required for cygwin use

    Still, very clean, nice script!

  24. Ian Ward says:

    Hi Damon, thanks for adding the license and the updates. If I ever end up making any significant changes for my own need I’ll let you know what I do.

    cheers,
    Ian

  25. Damon says:

    Just uploaded an updated version of the script … added the GPL license and some other stuff.

  26. Damon says:

    @ Ian: you know, I hadn’t really thought about a license, but now that you brought it up, maybe I should go ahead and apply one. Have a preference?

    When I started the script, it seemed so small that a license didn’t seem necessary — I actually was thinking of posting a question to slashdot: when should you apply a license to a script?

    I didn’t know when it became necessary. I wanted to add a couple changes to the script (especially as to how it handles the “full backup” schedule as well as the “cleaning” schedule) … maybe I will do that tonight or tomorrow and add the license then.

    Thanks for bringing it up — that will get me to do it.

  27. Ian Ward says:

    Nice script. I was just wondering if there is any kind of license on the script. Thanks,

    Ian

  28. Damon says:

    @ Bertrand: You know, that is a good question. I would try to just run it again and see if it recovers; if not, it’s sure to throw an error or say “switching to full”, in which case it may make sense to start over with a full backup.

  29. Bertrand says:

    Very useful script, thank you !
    One question I have: I launched a first (full) backup of 3 GB to Amazon S3 and at around 80% done, it stopped because of a problem on my server. Do you know if I have to redo the whole full backup, or can I just restart duplicity to finish it?
