I'm writing a simple automatic backup/versioning bash script.
It should basically mirror a directory structure somewhere, and then copy files over there when they've been altered. The ghost files should then be named filename.YYYYMMDD-HHMMSS.ext
(the included date/time being the time of last modification).
This is what I came up with - it seems to work already, sort of. But since I've never written shell scripts before, I suspect I might be doing some things fundamentally wrong and inefficient.
How can I make this faster? Iterating over the files seems really slow. Is this robust? It sometimes seems to "lock up" without error messages, and I can't explain why. What am I doing wrong?
I intend to run this as a cron job, every 30 minutes or so.
#!/bin/sh
# 1ドル : backup "root"
# 2ドル : from: directory to be backed up
# 3ドル : to: destination
rpath=1ドル
for f in $(find 2ドル); do
r=$(./rel.sh $rpath $f)
if [ -f $f ]
then
basename=$(basename $f)
dir=$(dirname $r)
name=$(echo $basename | cut -d'.' -f1)
ext=$(echo $basename | cut -d'.' -f2)
mod=$(stat --format=%y $f | awk -F'.' '{printf 1ドル}' | sed 's/[-:]//g' | sed 's/ /-/g')
if [ ! -f "3ドル/$dir/$name.$mod.$ext" ]
then
echo "+f $mod $r"
# cp $f "3ドル/$dir/$name.$mod.$ext"
fi
elif [ -d $f ]
then
if [ ! -d "3ドル/$r" ]
then
echo "+dir $r"
mkdir "3ドル/$r"
fi
fi
done
exit
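(For reference: rel.sh, which isn't shown here, prints the path of its second argument relative to its first. A hypothetical stand-in, not the author's actual script:)
#!/bin/sh
# hypothetical stand-in for rel.sh: strip the root prefix 1ドル from 2ドル
# (assumes 1ドル contains no characters special to sed)
echo "2ドル" | sed "s|^1ドル/\{0,1\}||"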
3 Answers
The previous answers addressed some alternative ways of accomplishing the backup and versioning goals; in this answer I'll comment on four possible improvements to your script.
• For clarity, I prefer at the start of a script like this to copy all of the 1,ドル 2,ドル 3ドル parameters to named variables, as you did for the first of them.
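For example, a minimal sketch (the names are just suggestions):
rpath=1ドル # backup "root"
from=2ドル # directory to be backed up
to=3ドル # destination
Quoting these wherever they are used (e.g. "$from") also guards against paths containing spaces, which may be related to your "lock up" symptoms.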
• The find 2ドル lists all the files and directories in 2ドル and below, but presumably the bulk of those files will have been treated already in previous runs. To avoid processing files older than the second-previous run, write a time-marker file on each run and use -newer:
mv Mark1 Mark0; mv Mark2 Mark1; touch Mark2
find -newer Mark0 ...
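One wrinkle worth noting: on the very first run the marker files do not exist yet. A small guard (a sketch; where you keep the markers is up to you) creates them dated far in the past, so that everything is copied once:
# first run only: markers dated in the past make every file count as new
[ -f Mark2 ] || touch -t 197001010000 Mark0 Mark1 Mark2
After that, the rotation above runs as usual.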
• Your script runs four new processes while setting basename, dir, name, and ext. To avoid all those separate processes, use the shell parameter expansions below. (E.g., when f=/home/tx.7/xyz.axi.pan, these produce xyz.axi.pan, /home/tx.7, xyz.axi, and pan, respectively. Note: these expansions should work for all the files in your directories as listed by find, but in other scripts they will stumble on names like . or / and some other edge cases.)
basename=${f##*/}
dir=${f%/*}
name=${basename%.*}
ext=${basename##*.}
• The for f in $(find 2ドル) structure is likely to produce a large list of file names and then process it. That list need not be stored if you instead use, e.g.,
find 2ドル -exec filescript '{}' 3ドル \;
where filescript represents a separate script that does the stuff found inside your current for loop (a sketch appears below). Of course you can also add -newer Mark0 to the find command:
find 2ドル -newer Mark0 -exec filescript '{}' 3ドル \;
I would expect using -exec to be faster than using the shell for loop, but it might be worthwhile to run a timing test.
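For illustration, filescript might look roughly like this: a sketch that inlines the body of your loop, not a drop-in replacement. It takes the found path as 1,ドル the backup root as 2,ドル and the destination as 3,ドル since rel.sh needs the root; the invocation then becomes find "2ドル" -newer Mark0 -exec ./filescript '{}' "1ドル" "3ドル" \; using the outer script's parameters.
#!/bin/sh
# filescript: per-file work lifted from the original loop (a sketch)
# 1ドル: path found by find 2ドル: backup root 3ドル: destination
f=1ドル
r=$(./rel.sh "2ドル" "$f") # the question's helper: $f relative to the root
if [ -f "$f" ]; then
 basename=${f##*/}
 dir=$(dirname "$r") # dirname kept: ${r%/*} misbehaves when $r has no slash
 name=${basename%.*}
 ext=${basename##*.}
 # e.g. "2012年09月23日 20:27:07.690748 +0200" -> "20120923-202707"
 mod=$(stat --format=%y "$f" | cut -d'.' -f1 | sed 's/[-:]//g; s/ /-/g')
 if [ ! -f "3ドル/$dir/$name.$mod.$ext" ]; then
 echo "+f $mod $r"
 cp "$f" "3ドル/$dir/$name.$mod.$ext"
 fi
elif [ -d "$f" ]; then
 if [ ! -d "3ドル/$r" ]; then
 echo "+dir $r"
 mkdir "3ドル/$r"
 fi
fi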
Thank you very much for your constructive feedback and suggestions. I'll definitely look into them and see how they affect the script. I'm really glad you took the time to correct this script instead of offering alternative versioning tools; it allows me to learn and better understand shell scripting. – jorenl, Sep 23, 2012 at 22:44
Implementing solutions to problems like this is good exercise, and good for learning.
But it is almost never good to re-invent the wheel in production systems. Please have a look at well-established data synchronization software such as rsync, and do some research on it and other data synchronization/backup/snapshot techniques.
So, your question is good up until "I intend to run this as a cron job, every 30 minutes or so." :-) I am not that much of a bash expert myself, so others might point out the potential weaknesses of your approach.
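For instance, a minimal sketch of the versioned-backup idea with rsync (all paths hypothetical): files about to be overwritten in the mirror are first moved aside into a timestamped version directory:
rsync -a --backup --backup-dir="/backups/versions/$(date +%Y%m%d-%H%M%S)" \
 /path/to/source/ /backups/current/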
Thank you for your feedback! It isn't really a production environment, and I'm definitely interested in learning. – jorenl, Sep 23, 2012 at 20:27
Why not just tar up the entire folder:
tar -cvzf backup-`date +%Y-%m-%d`.tar.gz /path/to/backup
Or use find to do it all:
# in script go to path
cd /path/to/folder;
# find all folders and make them in /tmp/backup-todays-date
find . -type d -exec mkdir -p /tmp/backup`date +%Y-%m-%d`/{} \;
# cp all files from current path to /tmp/backup-todays-date
find . -type f -exec cp {} /tmp/backup`date +%Y-%m-%d`/{} \;
ls -l /tmp/backup2012-09-23/
Using the top method produces one file for all the content, and you could then use logrotate to rotate the tar files after 10 backups or so, so backups don't fill up the disk.
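For example, a rough sketch of such a rotation (path and file pattern hypothetical), keeping only the ten newest archives:
# list newest first, skip the first 10, delete the rest (GNU xargs -r)
cd /path/to/backups && ls -t backup-*.tar.gz | tail -n +11 | xargs -r rm --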
The 2nd method will keep creating /tmp/backup-date folders, with no easy way of managing them unless you wrote another script to monitor them that did something like:
find /tmp/backup* -mtime +10 -exec rm -rf {} \;
The initial rsync suggestion is great for server-to-server copying, and could be used for local copying too. The thing that stood out is that your script adds dates to files, which means you want to be able to look back at the same file from X days ago. For doing version control with rsync, visit http://www.howtoforge.com/backing-up-with-rsync-and-managing-previous-versions-history
The alternative, if files are being backed up for version-control purposes, would be to use something like svn (Subversion) or git to check files in and out; that way an external process manages the changes.
And finally, using tar to do the same thing:
mkdir /tmp/backup`date +%Y-%m-%d`; (cd /path/to/backup; tar -cvzf - .) | (cd /tmp/backup`date +%Y-%m-%d`; tar -xvzf -)
All the best.
Thank you for your time and answer. My script isn't just creating periodic copies of the whole folder, and that's not what I intend to do: it stores ghost copies of files only when they have been modified. I know and use svn and git, and they're definitely not what I want to use here; I'd just like to know how to make something like this. The environment I'll be using it in is not professional or in production. – jorenl, Sep 23, 2012 at 22:30
cp -p preserves permissions and file dates, and tar will preserve these too. So use either the final tar option, or the find approach outlined above with -p. But if you are only interested in when files have been modified, then rsync is your best bet. – vahid, Sep 24, 2012 at 8:38
How could I forget: if you were re-inventing the wheel, a much easier way is to md5sum both files and copy only if they differ. – vahid, Sep 24, 2012 at 19:50
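(A shell sketch of that md5sum comparison, paths hypothetical:)
src=/path/to/source/file
dst=/path/to/backup/file
# copy only when the checksums differ; if $dst doesn't exist yet, the
# comparison fails (with a harmless stderr message) and the file is copied
[ "$(md5sum < "$src")" = "$(md5sum < "$dst")" ] || cp -p "$src" "$dst"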