I'm writing a simple automatic backup/versioning bash script.
It should basically mirror a directory structure somewhere, and then copy files over there when they've been altered. The ghost files should then be named filename.YYYYMMDD-HHMMSS.ext
(the included date/time being the time of last modification).
This is what I came up with - it seems to work already, sort of. But since I've never written shell scripts before, I suspect I might be doing some things fundamentally wrong and inefficient.
How can I make this faster? Iterating over the files seems really slow. Is this robust? It sometimes seems to "lock up" without error messages, and I can't explain why. What am I doing wrong?
I intend to run this as a cron job, every 30 minutes or so.
#!/bin/sh
# 1ドル : backup "root"
# 2ドル : from: directory to be backed up
# 3ドル : to: destination
rpath=1ドル
for f in $(find 2ドル); do
r=$(./rel.sh $rpath $f)
if [ -f $f ]
then
basename=$(basename $f)
dir=$(dirname $r)
name=$(echo $basename | cut -d'.' -f1)
ext=$(echo $basename | cut -d'.' -f2)
mod=$(stat --format=%y $f | awk -F'.' '{printf 1ドル}' | sed 's/[-:]//g' | sed 's/ /-/g')
if [ ! -f "3ドル/$dir/$name.$mod.$ext" ]
then
echo "+f $mod $r"
# cp $f "3ドル/$dir/$name.$mod.$ext"
fi
elif [ -d $f ]
then
if [ ! -d "3ドル/$r" ]
then
echo "+dir $r"
mkdir "3ドル/$r"
fi
fi
done
exit
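(For reference: rel.sh, which isn't shown here, prints the path of its second argument relative to its first. A hypothetical stand-in, not the author's actual script:)
#!/bin/sh
# hypothetical stand-in for rel.sh: strip the root prefix 1ドル from 2ドル
# (assumes 1ドル contains no characters special to sed)
echo "2ドル" | sed "s|^1ドル/\{0,1\}||"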
3 Answers
The previous answers addressed some alternative ways of accomplishing the backup and versioning goals; in this answer I'll comment on four possible improvements to your script.
• For clarity, I prefer at the start of a script like this to copy all of the 1,ドル 2,ドル 3ドル parameters to named variables, as you did for the first of them.
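For example, a minimal sketch (the names are just suggestions):
rpath=1ドル # backup "root"
from=2ドル # directory to be backed up
to=3ドル # destination
Quoting these wherever they are used (e.g. "$from") also guards against paths containing spaces, which may be related to your "lock up" symptoms.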
• The find 2ドル lists all the files and directories in 2ドル and below, but presumably the bulk of those files will have been treated already in previous runs. To avoid processing files older than the second-previous run, write a time-marker file on each run and use -newer:
mv Mark1 Mark0; mv Mark2 Mark1; touch Mark2
find -newer Mark0 ...
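One wrinkle worth noting: on the very first run the marker files do not exist yet. A small guard (a sketch; where you keep the markers is up to you) creates them dated far in the past, so that everything is copied once:
# first run only: markers dated in the past make every file count as new
[ -f Mark2 ] || touch -t 197001010000 Mark0 Mark1 Mark2
After that, the rotation above runs as usual.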
• Your script runs four new processes while setting basename, dir, name, and ext. To avoid all those separate processes, use the shell parameter expansions below. (E.g., when f=/home/tx.7/xyz.axi.pan, these produce xyz.axi.pan, /home/tx.7, xyz.axi, and pan, respectively. Note: these expansions should work for all the files in your directories as listed by find, but in other scripts they will stumble on names like . or / and some other edge cases.)
basename=${f##*/}
dir=${f%/*}
name=${basename%.*}
ext=${basename##*.}
• The for f in $(find 2ドル) structure is likely to produce a large list of file names and then process it. That list need not be stored if you instead use, e.g.,
find 2ドル -exec filescript '{}' 3ドル \;
where filescript represents a separate script that does the stuff found inside your current for loop (a sketch appears below). Of course you can also add -newer Mark0 to the find command:
find 2ドル -newer Mark0 -exec filescript '{}' 3ドル \;
I would expect using -exec to be faster than using the shell for loop, but it might be worthwhile to run a timing test.
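For illustration, filescript might look roughly like this: a sketch that inlines the body of your loop, not a drop-in replacement. It takes the found path as 1,ドル the backup root as 2,ドル and the destination as 3,ドル since rel.sh needs the root; the invocation then becomes find "2ドル" -newer Mark0 -exec ./filescript '{}' "1ドル" "3ドル" \; using the outer script's parameters.
#!/bin/sh
# filescript: per-file work lifted from the original loop (a sketch)
# 1ドル: path found by find 2ドル: backup root 3ドル: destination
f=1ドル
r=$(./rel.sh "2ドル" "$f") # the question's helper: $f relative to the root
if [ -f "$f" ]; then
 basename=${f##*/}
 dir=$(dirname "$r") # dirname kept: ${r%/*} misbehaves when $r has no slash
 name=${basename%.*}
 ext=${basename##*.}
 # e.g. "2012年09月23日 20:27:07.690748 +0200" -> "20120923-202707"
 mod=$(stat --format=%y "$f" | cut -d'.' -f1 | sed 's/[-:]//g; s/ /-/g')
 if [ ! -f "3ドル/$dir/$name.$mod.$ext" ]; then
 echo "+f $mod $r"
 cp "$f" "3ドル/$dir/$name.$mod.$ext"
 fi
elif [ -d "$f" ]; then
 if [ ! -d "3ドル/$r" ]; then
 echo "+dir $r"
 mkdir "3ドル/$r"
 fi
fi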
Thank you very much for your constructive feedback and suggestions. I'll definitely look into them and see how they affect the script. I'm really glad you took the time to correct this script instead of offering alternative versioning tools; it allows me to learn and better understand shell scripting. – jorenl, Sep 23, 2012 at 22:44
Implementing solutions to problems like this is good exercise, and good for learning.
But it is almost never good to re-invent the wheel in production systems. Please have a look at well-established data synchronization software such as rsync, and do some research on it and other data synchronization/backup/snapshot techniques.
So, your question is good up until "I intend to run this as a cron job, every 30 minutes or so." :-) I am not that much of a bash expert myself, so others might point out the potential weaknesses of your approach.
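For instance, a minimal sketch of the versioned-backup idea with rsync (all paths hypothetical): files about to be overwritten in the mirror are first moved aside into a timestamped version directory:
rsync -a --backup --backup-dir="/backups/versions/$(date +%Y%m%d-%H%M%S)" \
 /path/to/source/ /backups/current/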
Thank you for your feedback! It isn't really a production environment, and I'm definitely interested in learning. – jorenl, Sep 23, 2012 at 20:27
Why not just tar up the entire folder:
tar -cvzf backup-`date +%Y-%m-%d`.tar.gz /path/to/backup
Or use find to do it all:
# in script go to path
cd /path/to/folder;
# find all folders and make them in /tmp/backup-todays-date
find . -type d -exec mkdir -p /tmp/backup`date +%Y-%m-%d`/{} \;
# cp all files from current path to /tmp/backup-todays-date
find . -type f -exec cp {} /tmp/backup`date +%Y-%m-%d`/{} \;
ls -l /tmp/backup2012-09-23/
Using the top method produces one file for all the content, and you could then use logrotate to rotate the tar files after 10 backups or so, so backups don't fill up the disk.
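For example, a rough sketch of such a rotation (path and file pattern hypothetical), keeping only the ten newest archives:
# list newest first, skip the first 10, delete the rest (GNU xargs -r)
cd /path/to/backups && ls -t backup-*.tar.gz | tail -n +11 | xargs -r rm --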
The 2nd method will keep creating /tmp/backup-date folders, with no easy way of managing them unless you wrote another script to monitor them that did something like:
find /tmp/backup* -mtime +10 -exec rm -rf {} \;
The initial rsync suggestion is great for server-to-server copying, and could be used for local copying too. The thing that stood out is that your script adds dates to files, which means you want to be able to look back at the same file from X days ago. For doing version control with rsync, visit http://www.howtoforge.com/backing-up-with-rsync-and-managing-previous-versions-history
The alternative, if files are being backed up for version-control purposes, would be to use something like svn (Subversion) or git to check files in and out; that way an external process manages the changes.
And finally, using tar to do the same thing:
mkdir /tmp/backup`date +%Y-%m-%d`; (cd /path/to/backup; tar -cvzf - .) | (cd /tmp/backup`date +%Y-%m-%d`; tar -xvzf -)
All the best.
Thank you for your time and answer. My script isn't just creating periodic copies of the whole folder, and that's not what I intend to do: it stores ghost copies of files only when they have been modified. I know and use svn and git, and they're definitely not what I want to use here; I'd just like to know how to make something like this. The environment I'll be using it in is not professional or in production. – jorenl, Sep 23, 2012 at 22:30
cp -p preserves permissions and file dates, and tar will preserve these too. So use either the final tar option, or the find approach outlined above with -p. But if you are only interested in when files have been modified, then rsync is your best bet. – vahid, Sep 24, 2012 at 8:38
How could I forget: if you were re-inventing the wheel, a much easier way is to md5sum both files and copy only if they differ. – vahid, Sep 24, 2012 at 19:50
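(A shell sketch of that md5sum comparison, paths hypothetical:)
src=/path/to/source/file
dst=/path/to/backup/file
# copy only when the checksums differ; if $dst doesn't exist yet, the
# comparison fails (with a harmless stderr message) and the file is copied
[ "$(md5sum < "$src")" = "$(md5sum < "$dst")" ] || cp -p "$src" "$dst"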