Yet another bash backup script, using rsync --link-dest

Question 1

I’ve written this bash backup script. It uses the --link-dest option of rsync, so the user has access to the backed-up data as it stood at each backup time, with relatively little storage overhead. Any duplicated data should be hard linked; the overhead mostly comes from the directory structure.
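To illustrate the hard-link property that --link-dest relies on, here is a minimal sketch. It does not run rsync at all; it only uses ln in a scratch directory, and the names backup1 and backup2 are made up for the example.

```bash
#!/bin/bash
# Scratch area with two hypothetical backup directories.
work=$(mktemp -d)
mkdir "${work}/backup1" "${work}/backup2"
echo "unchanged data" > "${work}/backup1/file"

# A hard link is a second name for the same inode; no data is copied.
ln "${work}/backup1/file" "${work}/backup2/file"

if [[ "${work}/backup1/file" -ef "${work}/backup2/file" ]]; then
  same_inode=yes
else
  same_inode=no
fi
echo "same inode: ${same_inode}"

# Deleting one backup leaves the data reachable through the other name.
rm -rf "${work}/backup1"
remaining=$(cat "${work}/backup2/file")
echo "${remaining}"

rm -rf "${work}"
```

This is also why removing an old backup directory should not damage newer ones: the data only goes away once the last name pointing at it is deleted.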

It’s mostly based on this very nice guide by Mike Rubel and various other contributors, as well as a few answers from Unix SE and other web references.

The script is meant to be run at regular (typically hourly) intervals with cron; other scripts are in charge of safely keeping daily/weekly backups.

Of course, I want to minimize the size of the backups. I also want to delete older backups before newer ones. To do so, I build ${backups}, an array of every backup¹ sorted by modification date from newest to oldest. Thus ${backups[0]} (if it exists) is the latest complete backup, and ${backups[@]:$n} for some integer $n lists every backup but the $n newest (indices 0 to $n - 1).
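As a small illustration of the slicing described above: bash writes the offset after the subscript, as in ${backups[@]:n}. The array contents below are made-up stand-ins for real backup paths.

```bash
#!/bin/bash
# Hypothetical backup names, newest first.
backups=("hourly 5" "hourly 4" "hourly 3" "hourly 2" "hourly 1")
n=2

# Everything except the $n newest entries, i.e. indices n and up.
older=("${backups[@]:n}")

printf '%s\n' "${older[@]}"
```

With n=2 this prints the three oldest names, one per line, which is the list the script would hand to rm.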

As always with bash scripts, I’m especially afraid of quoting issues, but any remark is welcome.

I particularly dislike how I use find with both -mindepth and -maxdepth, but I couldn’t find any way around it.

Most of the "standard" commands, such as cut, sort or grep, are provided by BusyBox 1.16.1 and may not have every option available on recent Linux distributions. cut, in particular, does not understand the -z option, hence the ugly tr trick.
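For readers unfamiliar with the tr trick, here is a minimal sketch of the idea in isolation: swap NUL and newline so that a line-oriented cut can process NUL-terminated records, then swap back. The records function and its sample data are invented for the example, and it assumes the names themselves contain no colon.

```bash
#!/bin/bash
# Two NUL-terminated "timestamp:name" records (made-up sample data).
records() { printf '%s\0' '1462253160:hourly b' '1462253100:hourly a'; }

names=()
while IFS= read -r -d ''; do
  names+=("${REPLY}")
done < <(
  records |
    tr '\n\0' '\0\n' |  # NUL becomes newline: one record per line for cut
    cut -d: -f2 |       # keep only the name field
    tr '\n\0' '\0\n'    # newline becomes NUL again for read -d ''
)

printf '%s\n' "${names[@]}"
```

Because newlines inside a name are temporarily turned into NUL bytes, cut never sees them as record separators, which is the whole point of the double swap.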

#!/bin/bash
# Check if we are root (no one else should run this)
# ==================================================
if (( $(id -u) != 0 )); then
 echo "/ ! \ Only root can run this script. Backup cancelled." >&2
 exit 3
fi
# Functions
# =========
# Given a date (or a placeholder), returns the corresponding hourly backup name
function scheme {
 local token="$@"
 echo "hourly ${token}"
}
# Parameters
# ==========
# password_file=/etc/backup/passwd # Network yet untested
backup_directory=/path/to/backups # ABSOLUTE PATH required
source_directory=/path/to/data
backup_count=24 # number of backups to keep
# Name new daily backup
# =====================
new_backup="${backup_directory}"/$(scheme $(date +"%-d-%m-%Y a %Hh%M") )
# Check that we can run
# =====================
# Check that we don’t overwrite anything
if [[ -e "${new_backup}" ]]; then
 echo "/ ! \ ${new_backup} already exists! We don’t want to overwrite it; backup cancelled." >&2
 exit 4
fi
# Create the directory which contains all the backups if it doesn’t exist yet
if [[ ! -e "${backup_directory}" ]]; then
 echo "Creating directory ${backup_directory}"
 mkdir -p "${backup_directory}"
elif [[ ! -d "${backup_directory}" ]]; then
 echo "/ ! \ Destination ${backup_directory} already exists but is not a directory! Backup cancelled." >&2
 exit 4
fi
# Create a temporary working directory
# ====================================
temp_backup=$(mktemp -d -p "${backup_directory}")
# Manage previous backups
# =======================
# List every previous backup and put it into an array
backups=()
while read -r -d ''; do
 backups+=("${REPLY}")
done < <( find "${backup_directory}" -mindepth 1 -maxdepth 1 -name "$(scheme \*)" -printf "%A@:%p\0" | \
 sort -z -t: -n -r | \
 tr '\n\0' '\0\n' | cut -d: -f2 - | tr '\n\0' '\0\n' \
 )
# If it exists, select the latest backup as a reference for rsync --link-dest
if (( ${#backups[@]} > 0 )); then
 latest_backup="${backups[0]}"
else
 latest_backup=""
fi
# Compute the backups to remove
# We add one backup before cleaning up
# Thus, we keep $backup_count - 1 from the ${backups[@]}
old_backups=("${backups[@]:${backup_count} - 1}")
# Cleanup function
# ================
# We now have everything we need to define a cleanup function
# It will be called only if the backup succeeds
function cleanup {
 echo
 echo "Cleaning up"
 echo "==========="
 echo
 if (( ${#old_backups[@]} > 0 )); then
  echo "Deleting ${#old_backups[@]} backup(s)!"
  echo
  # echo rm -rf "${old_backups[@]}"
  (set -x; rm -rf "${old_backups[@]}")
 else
  echo "There is nothing to delete."
 fi
}
# User feedback
# =============
echo "Backing up ${source_directory}"
echo "Backing up ${source_directory}" | sed "s/./=/g"
echo
echo "New backup: ${new_backup}"
# Setting up rsync options
# ========================
RSYNC_FLAGS=("--archive" "--stats")
# Set rsync --password-file if the matching variable is defined and
# we are using rsync (::) **YET UNTESTED**
if [[ "${password_file}" != "" && "${source_directory}" =~ "::" ]]; then
 RSYNC_FLAGS+=("--password-file=${password_file}")
fi
# Use rsync to backup. If a previous backup exists,
# uses --link-dest to hard link to it.
if [[ "${latest_backup}" != "" ]]; then
 echo "Previous backup: ${latest_backup}"
 RSYNC_FLAGS+=("--link-dest=${latest_backup}")
else
 echo "This is the first backup ever, it might take a while."
fi
echo
# Backing-up
# ==========
# TODO Check if something was actually written before creating a new backup
# TODO Add an exclusion file
(set -x; rsync "${RSYNC_FLAGS[@]}" "${source_directory}" "${temp_backup}") && \
 (set -x; mv "${temp_backup}" "${new_backup}") && cleanup
echo

¹ Actually the "name of the repository which contains the backup", of course.

Question 2

Been writing shell script for 20 years and only use find when there is no other way. The syntax is just f***ing bizarre to me and thus highly error-prone. Good job on thorough use of double-quotes. A | alone will also do line continuation in bash.
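A quick sketch of the line-continuation point: a line ending in | makes bash keep reading the pipeline on the next line, so the trailing backslashes are optional. The numbers are arbitrary sample input.

```bash
#!/bin/bash
# Each line ends with |, so no backslash is needed to continue the pipeline.
smallest=$(
  printf '%s\n' 3 1 2 |
    sort -n |
    head -n 1
)
echo "${smallest}"
```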

Question 3

@DocSalvager Any alternative to find is very welcome

Question 4

Edited to clarify that the order of ${backups} is meaningful, as that wasn’t clear in the previous version.

Question 5

The script is nicely written. I only have minor suggestions that are barely more than nitpicks.

Function declaration style

Instead of this:

function scheme {

The generally preferred style for declaring functions is this:

scheme() {

Redundant local variable

The local variable token is redundant here:

function scheme {
 local token="$@"
 echo "hourly ${token}"
}

You could simplify to:

echo "hourly $@"

Simplify condition

This condition can be simplified:

if (( ${#backups[@]} > 0 )); then
 latest_backup="${backups[0]}"
else
 latest_backup=""
fi

To just this:

latest_backup="${backups[0]}"
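One caveat worth sketching: the one-line form works in the script as posted, but if set -u were ever enabled (an assumption, the original script does not use it), expanding ${backups[0]} on an empty array would abort with an "unbound variable" error. Adding an empty default keeps the one-liner safe in both cases.

```bash
#!/bin/bash
set -u  # assumption: not enabled in the original script

backups=()  # no previous backup yet

# ${backups[0]} alone would abort here under set -u;
# the :- default turns a missing element into an empty string.
latest_backup="${backups[0]:-}"

if [[ -z "${latest_backup}" ]]; then
  echo "no previous backup"
fi
```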

Instead of this:

if [[ "${password_file}" != "" ]]; then

You can omit the != "":

if [[ "${password_file}" ]]; then

Don't repeat yourself

The echo statement is duplicated for the sake of underlining:

echo "Backing up ${source_directory}"
echo "Backing up ${source_directory}" | sed "s/./=/g"

It would be good to create a helper function for this purpose:

print_heading() {
 echo "$@"
 echo "$@" | sed "s/./=/g"
}
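As a possible variant (not the code from the answer above), the underline can also be built with bash parameter expansion, which avoids the second echo and the sed process. The heading text in the usage line is just the example path from the question.

```bash
#!/bin/bash
print_heading() {
  local title="$*"
  printf '%s\n' "${title}"
  # Replace every character of the title with "=" to build the underline.
  printf '%s\n' "${title//?/=}"
}

print_heading "Backing up /path/to/data"
```

Using printf instead of echo also sidesteps surprises if a heading ever starts with something echo would treat as an option, such as -n.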

Question 6

This looks exceptionally good. But per your request, I see a few improvement possibilities...

Backup file name

The scheme() function is not necessary unless you need it to do several other operations not shown.

The 'Command Substitution' used to build the string should also be within the quotes to avoid unexpected interpretation by the shell.

Spaces in filenames require total accuracy in quoting to keep straight, which is the most confusing part of bash scripting, so your life will be a lot easier if you can avoid them.

Note too that each Command Substitution $(...) is a new context. So we can use double-quotes within them without escaping. Don't be confused by the IDE reversing the colors at each level. That's just the way they work.

So these lines...

backup_directory="/path/to/backups"
 :
new_backup="${backup_directory}"/$(scheme $(date +"%-d-%m-%Y a %Hh%M") )

Would be more reliable like this...

backup_directory="/path/to/backups"
 :
new_backup="${backup_directory}/hourly_$(date +"%Y-%m-%d_a_%Hh%M")"

Running this snippet and echoing $new_backup gives me...

/path/to/backups/hourly_2016-05-03_a_05h26

Alternative to `find`

A better solution here relies on two features of bash that are not well understood...

Pathname Expansion - Pattern Matching
Bash expands wildcards itself, before the resulting words are passed to the command. We thus don't need find or ls or anything else to get a list of the files in a directory; a pattern that starts with the directory path already yields full paths, and printf can then print them one per line.

printf applies format to all arguments
printf has an odd feature that's just the thing we need here. From the manpage...

The format is reused as necessary to consume all of the arguments.

Printf will reuse the format string on each filename returned by Pathname Expansion.
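A minimal sketch of these two features together, run against a scratch directory with made-up backup names. The nullglob setting is an addition not mentioned above: without it, a pattern that matches nothing is passed along as the literal string, which would then look like a backup name.

```bash
#!/bin/bash
# Scratch directory with hypothetical backup names, for illustration only.
dir=$(mktemp -d)
mkdir "${dir}/hourly_a" "${dir}/hourly_b"

# nullglob makes an unmatched pattern expand to nothing instead of
# being left as the literal string ".../hourly_*".
shopt -s nullglob
matches=("${dir}"/hourly_*)

# One format, many arguments: printf repeats '%s\n' for each of them.
# The length check avoids printing a lone empty line when nothing matched.
if (( ${#matches[@]} > 0 )); then
  printf '%s\n' "${matches[@]}"
fi

rm -rf "${dir}"
```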

Thus this code...

backups=()
while read -r -d ''; do
 backups+=("${REPLY}")
done < <( find "${backup_directory}" -mindepth 1 -maxdepth 1 -name "$(scheme \*)" -printf "%A@:%p\0" | \
 sort -z -t: -n -r | \
 tr '\n\0' '\0\n' | cut -d: -f2 - | tr '\n\0' '\0\n' \
 )

Could be replaced with...

backups=()
while IFS= read -r -d ''; do
 backups+=("${REPLY}")
done < <( printf '%s\0' "${backup_directory}"/* | sort -z -r )

Printed one per line, the sorted list should look something like this...

> printf "%s\n" "${backup_directory}"/* | sort -r
/path/to/backups/hourly_2016-05-03_a_05h29
/path/to/backups/hourly_2016-05-03_a_05h26
/path/to/backups/hourly_2016-05-03_a_05h25

Question 7

This wasn’t clear in the original question so I edited it in: I want ${backups} to contain the names of the backups sorted from newest to oldest. I could use the file names to do that if I used a date format s.t. alphabetical order is chronological order, and then reverse ${backups}, but I want to allow more user-friendly names. I tried using stat --printf kind of like your printf, but when it is given no argument it does not simply do nothing.
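One possible way to get a newest-first order that does not depend on the names, and uses neither find nor stat, is to compare modification times with bash's -nt test. The sketch below is only an illustration under a few assumptions: sort_newest_first and the global array sorted are hypothetical names, the directory names are made up, and the insertion sort is quadratic, which should be acceptable for a few dozen backups.

```bash
#!/bin/bash
# Sort the paths given as arguments from newest to oldest by mtime,
# leaving the result in the global array "sorted".
sort_newest_first() {
  sorted=()
  local path i
  for path in "$@"; do
    # Shift older entries right until the new path's slot is found.
    i=${#sorted[@]}
    while (( i > 0 )) && [[ "${path}" -nt "${sorted[i-1]}" ]]; do
      sorted[i]="${sorted[i-1]}"
      i=$(( i - 1 ))
    done
    sorted[i]="${path}"
  done
}

# Usage on a scratch directory with deliberately unsortable names.
dir=$(mktemp -d)
mkdir "${dir}/first try" "${dir}/monday evening" "${dir}/latest"
touch -t 201605030525 "${dir}/first try"
touch -t 201605030526 "${dir}/monday evening"
touch -t 201605030529 "${dir}/latest"

shopt -s nullglob   # no match means no arguments, so "sorted" stays empty
sort_newest_first "${dir}"/*
printf '%s\n' "${sorted[@]##*/}"

rm -rf "${dir}"
```

With no arguments the function does nothing and leaves an empty array, which sidesteps the problem of a tool misbehaving when the glob matches nothing.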

Question 8

I've modified the date format and added sort -r to present old backups in reverse order as requested.

janos · Accepted Answer · 2016-04-29 20:23:00Z

