I'm running a simple Bash script that uses rsync to do an incremental backup of my web server every hour. What I'm looking for is an efficient algorithm to delete the proper backups so that in the end I keep:
Hourly backups for 24 hours.
Daily backups for 1 week.
Weekly backups for 1 month.
Monthly backups from that point on.
I'll figure out when to delete the monthly backups based upon when I run out of space, so we're not worried about that. I'm also familiar with the various wrappers for rsync like rdiff-backup and rsnapshot, but those aren't necessary. I prefer to write code myself whenever possible, even if it means reinventing the wheel sometimes. At least that way if I get a flat tire I know how to fix it :)
Here's the actual commented code that runs every hour:
```bash
# if it's not Sunday and it's not midnight
if [ $(date +%w) -ne 0 ] && [ $(date +%H) -ne 0 ]; then
    # remove the backup from one day ago and one week ago
    rm -rf $TRG1DAYAGO
    rm -rf $TRG1WEEKAGO
fi

# if it's Sunday
if [ $(date +%w) -eq 0 ]; then
    # if it's midnight
    if [ $(date +%H) -eq 0 ]; then
        # if the day of the month is greater than 7,
        # we know it's not the first Sunday of the month
        if [ $(date +%d) -gt 7 ]; then
            # delete the previous week's files
            rm -rf $TRG1WEEKAGO
        fi
    # if it's not midnight
    else
        # delete the previous day and week
        rm -rf $TRG1DAYAGO
        rm -rf $TRG1WEEKAGO
    fi
fi
```
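The `TRG1DAYAGO` and `TRG1WEEKAGO` variables aren't defined in the snippet. One possible way to derive them (purely an assumption on my part, using GNU `date -d` and an assumed directory layout) would be:

```shell
#!/bin/bash
# Hypothetical definitions for the target variables used above.
# The real script's paths are not shown; these are illustrative only.
BACKUP_ROOT="/backups"   # assumed backup location

# GNU date accepts relative date strings via -d
TRG1DAYAGO="$BACKUP_ROOT/$(date -d '1 day ago' +%Y-%m-%d-%H)"
TRG1WEEKAGO="$BACKUP_ROOT/$(date -d '1 week ago' +%Y-%m-%d-%H)"

echo "$TRG1DAYAGO"
echo "$TRG1WEEKAGO"
```

Note that `date -d` with relative strings is a GNU extension; on BSD/macOS the equivalent is `date -v-1d`.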
Basically:
If it's not Sunday and it's not midnight:
    - delete the backup from one day ago
    - delete the backup from one week ago
If it is Sunday:
    If it is midnight:
        If the day of the month is greater than 7:
            - delete the backup from one week ago
    Else (if it's not midnight):
        - delete the backup from one day ago
        - delete the backup from one week ago
That seems to work but I'm wondering if anyone can come up with a simpler, more efficient algorithm or add any ideas for a better way of accomplishing this.
1 Answer
> That seems to work but I'm wondering if anyone can come up with a simpler, more efficient algorithm or add any ideas for a better way of accomplishing this.
I think so. Use a naming scheme with a common prefix, and a variable suffix depending on the period, for example:
Hourly backups for 24 hours: `hourly-$(date +%H).gz`, which results in:
- hourly-00.gz
- hourly-01.gz
- hourly-02.gz
- ... and so on until hourly-23.gz, after which it starts over from hourly-00.gz
Daily backups for 1 week: `daily-$(date +%a).gz`, which results in:
- daily-Sun.gz
- daily-Mon.gz
- daily-Tue.gz
- ... and so on until daily-Sat.gz, after which it starts over
Weekly backups for 1 month: `weekly-$((10#$(date +%W) % 4)).gz` (the `10#` prefix forces decimal, since `%W` is zero-padded and weeks 08 and 09 would otherwise be rejected as invalid octal). This results in:
- weekly-0.gz
- weekly-1.gz
- weekly-2.gz
- weekly-3.gz, after which it starts over from weekly-0.gz
You'll never have to delete anything. You will always have the same set of files, and rsync (with appropriate parameters) will copy only the changed ones.
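Putting the scheme together, a minimal sketch might look like this (the source and destination paths, and the rsync options, are my assumptions, not from the post):

```shell
#!/bin/bash
# Sketch of the self-rotating naming scheme. Each run overwrites the
# slot for the current period, so nothing ever needs to be deleted.
SRC="/var/www"    # assumed source directory
DST="/backups"    # assumed destination directory

hourly="hourly-$(date +%H)"              # cycles 00..23
daily="daily-$(date +%a)"                # cycles Sun..Sat
weekly="weekly-$((10#$(date +%W) % 4))"  # cycles 0..3; 10# avoids octal parsing

echo "$hourly $daily $weekly"

# The actual copy would be something like (commented out in this sketch):
# rsync -a --delete "$SRC/" "$DST/$hourly/"
```

The weekly slot only changes once per week, so rsync sees an unchanged target on all other runs and transfers nothing for it.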
As for the posted code, the single biggest problem is the duplicate calls to `$(date ...)` with the same parameters:
- Inefficient: multiple unnecessary process executions
- Bad practice: duplicated logic
- Error prone: the multiple calls to `$(date +%w)` (for example) presumably expect to get the same result. There should be only one call, saved in a variable. If the day happens to turn over between two calls, you may get a nasty bug, and in any case it's a completely unintended situation.
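That suggestion can be sketched as follows, with `echo` standing in for the actual `rm` calls. Reading all three fields from a single `date` invocation guarantees they are mutually consistent even across a midnight or day rollover:

```shell
#!/bin/bash
# Sketch: call date exactly once and reuse the fields. The echo lines
# stand in for the real deletion commands from the original script.
read -r dow hour dom <<< "$(date '+%w %H %d')"   # one atomic call

if [ "$dow" -ne 0 ] && [ "$hour" -ne 0 ]; then
    echo "would delete: one day ago, one week ago"
elif [ "$dow" -eq 0 ]; then
    if [ "$hour" -eq 0 ]; then
        # day of month > 7 means it's not the first Sunday of the month
        [ "$dom" -gt 7 ] && echo "would delete: one week ago"
    else
        echo "would delete: one day ago, one week ago"
    fi
fi
```

The two top-level `if` blocks in the original are mutually exclusive (one requires `dow != 0`, the other `dow == 0`), so they collapse cleanly into an `if`/`elif` chain.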
user2044510 (Jan 11, 2015): I think you're on to something here. I was using a naming convention based on date and time, but not in the way you've done it. And just yesterday a guy mentioned his deletes were taking way too long because he backs up 3 TB of data and there were tons of hard links. It seems this would solve that problem also, or help tremendously.
janos (Jan 11, 2015): @user2044510 I've been doing it this way for years, for daily, weekly and monthly backups. The files are self-rotated this way: every day/week/month precisely one file gets updated, which then gets picked up by rsync. The setup is easy to understand, simple, and robust.