
I'm running a simple Bash script that uses rsync to do an incremental backup of my web server every hour. What I'm looking for is an efficient algorithm to delete the proper backups so that in the end I keep:

  • Hourly backups for 24 hours.

  • Daily backups for 1 week.

  • Weekly backups for 1 month.

  • Monthly backups from that point on.

I'll figure out when to delete the monthly backups based upon when I run out of space. So we're not worried about that. I'm also familiar with the various wrappers for rsync like rdiff-backup and rsnapshot but those aren't necessary. I prefer to write code myself whenever possible even if it means reinventing the wheel sometimes. At least that way if I get a flat tire I know how to fix it :)

Here's the actual commented code that runs every hour:

#if it's not Sunday and it's not midnight
if [ $(date +%w) -ne 0 ] && [ $(date +%H) -ne 0 ]; then
    #remove the backup from one day ago and one week ago
    rm -rf $TRG1DAYAGO
    rm -rf $TRG1WEEKAGO
fi
#if it's Sunday
if [ $(date +%w) -eq 0 ]; then
    #if it's midnight
    if [ $(date +%H) -eq 0 ]; then
        #if the day of the month is greater than 7
        #we know it's not the first Sunday of the month
        if [ $(date +%d) -gt 7 ]; then
            #delete the previous week's files
            rm -rf $TRG1WEEKAGO
        fi
    #if it's not midnight
    else
        #delete the previous day and week
        rm -rf $TRG1DAYAGO
        rm -rf $TRG1WEEKAGO
    fi
fi

Basically:

If it's not Sunday and it's not midnight:
    - delete the backup from one day ago
    - delete the backup from one week ago
If it is Sunday:
    If it is midnight:
        If the day of the month is greater than 7:
            - delete the backup from one week ago
    Else (if it's not midnight):
        - delete the backup from one day ago
        - delete the backup from one week ago

That seems to work but I'm wondering if anyone can come up with a simpler, more efficient algorithm or add any ideas for a better way of accomplishing this.

asked Jan 9, 2015 at 17:10

1 Answer


That seems to work but I'm wondering if anyone can come up with a simpler, more efficient algorithm or add any ideas for a better way of accomplishing this.

I think so. Use a naming scheme with a common prefix, and a variable suffix depending on the period, for example:

  • Hourly backups for 24 hours: hourly-$(date +%H).gz, results in:

    • hourly-00.gz
    • hourly-01.gz
    • hourly-02.gz
    • ... and so on until hourly-23.gz after which it starts over from hourly-00.gz
  • Daily backups for 1 week: daily-$(date +%a).gz, results in:

    • daily-Sun.gz
    • daily-Mon.gz
    • daily-Tue.gz
    • ... and so on until daily-Sat.gz, after which it starts over
  • Weekly backups for 1 month: weekly-$((10#$(date +%W) % 4)).gz (the 10# prefix forces base 10, since %W zero-pads and a value like 08 would otherwise be rejected as invalid octal), results in:

    • weekly-0.gz
    • weekly-1.gz
    • weekly-2.gz
    • weekly-3.gz after which it starts over from weekly-0.gz

You'll never have to delete anything. You will have the same set of files, and rsync (with appropriate parameters) will copy only the changed ones.
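
For illustration, here's a minimal sketch of an hourly script built around this scheme. The paths SRC and BASE are hypothetical, and I'm assuming plain rsync directory mirrors rather than .gz archives:

#!/usr/bin/env bash
# Self-rotating backup targets: each name repeats after its period,
# so the next run simply overwrites the oldest copy in place.
SRC=/var/www/   # hypothetical source directory
BASE=/backups   # hypothetical backup root

hourly="$BASE/hourly-$(date +%H)"                # hourly-00 .. hourly-23
daily="$BASE/daily-$(date +%a)"                  # daily-Sun .. daily-Sat
weekly="$BASE/weekly-$(( 10#$(date +%W) % 4 ))"  # weekly-0 .. weekly-3

# -a preserves permissions and timestamps; --delete keeps each target
# an exact mirror of SRC, so only changed files are transferred.
rsync -a --delete "$SRC" "$hourly"
rsync -a --delete "$SRC" "$daily"
rsync -a --delete "$SRC" "$weekly"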

As for the posted code, the single biggest problem is the duplicate calls to $(date ...) with the same parameters:

  • Inefficient: multiple unnecessary process executions
  • Bad practice: duplicated logic
  • Unnecessary and error prone: the multiple calls to $(date +%w) (for example) presumably expect to get the same result, so there should be only one call, saved in a variable. If the day happens to turn over between two calls, you may get a nasty bug, and in any case it's a completely unintended situation.
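
For example, here is the posted logic with the clock read only once (a sketch; splitting a single date call into three variables is just one way to do it):

# Read the clock once so every test sees the same instant.
# %w = weekday (0 = Sunday), %H = hour, %d = day of month
read -r dow hour dom <<< "$(date '+%w %H %d')"

if [ "$dow" -ne 0 ] && [ "$hour" -ne 0 ]; then
    # not Sunday and not midnight: drop the day-old and week-old backups
    rm -rf "$TRG1DAYAGO" "$TRG1WEEKAGO"
elif [ "$dow" -eq 0 ]; then
    if [ "$hour" -eq 0 ]; then
        # Sunday midnight: past the first Sunday of the month,
        # drop last week's backup
        if [ "$dom" -gt 7 ]; then
            rm -rf "$TRG1WEEKAGO"
        fi
    else
        # Sunday, but not midnight: same cleanup as an ordinary hour
        rm -rf "$TRG1DAYAGO" "$TRG1WEEKAGO"
    fi
fi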
answered Jan 10, 2015 at 22:55
  • I think you're on to something here. I was using a naming convention based on date and time, but not in the way you've done it. And just yesterday a guy mentioned his deletes were taking way too long because he backs up 3 TB of data and there were tons of hard links. It seems this would solve that problem as well, or help tremendously. Commented Jan 11, 2015 at 12:42
  • @user2044510 I've been doing it this way for years, for daily, weekly and monthly backups. The files are self-rotated this way: every day/week/month precisely one file gets updated, which then gets picked up by rsync. The setup is easy to understand, simple, and robust. Commented Jan 11, 2015 at 12:50
