I work on an Ubuntu machine and my backup requirements are straightforward: the usual copy and paste, except that only changed files (i.e. files whose modification time or size has changed) should be replaced.
Since I could not find such an option in the default Nautilus copy-paste (it only allows a merge with a blanket replace), I decided to write a backup script in Python myself, which I would like to have reviewed. Here is backup.py:
#!/usr/bin/env python
#@module: backup.py
#@description: Script to take backup to a fixed location
#@author: Prahlad Yeri
#@copyright: MIT Licensed
#from __future__ import print_function
import os
import os.path
import sys
import time
from datetime import datetime
import shutil
backup_loc = '/media/username/1tera/backup'
#backup_loc = '/tmp/backup'
locations = ['/home/username/docs',
'/home/username/source',
'/home/username/scripts',
'/home/username/library',
'/home/username/programs',
'/home/username/staging',
'/home/username/soft',
'/home/username/Desktop',
'/home/username/Downloads',
'/home/username/movies',
'/home/username/songs',
]
if __name__ == "__main__":
    #loop thru the folders
    start = time.clock()
    num=0
    for s in locations: #[0:1]:
        #print s + "\n"
        #files = os.listdir(s)
        print 'listing for ' + s
        for (root, dirs, files) in os.walk(s):
            subpath = root.replace('/home/prahlad','')
            for f in files:
                filename = os.path.join(root, f)
                dfilename = backup_loc + subpath + os.sep + f
                link = ''
                if os.path.islink(filename):
                    link = os.readlink(filename)
                if not os.path.exists(dfilename):
                    #check dirs
                    if not os.path.exists(backup_loc + subpath):
                        os.makedirs(backup_loc + subpath)
                        print 'creating directory: ' + backup_loc + subpath
                    #just copy the files
                    print 'copying from: ' + filename
                    print 'to: ' + dfilename
                    if link == '':
                        shutil.copy2(filename, dfilename)
                    else:
                        os.symlink(link, dfilename)
                    num+=1
                else:
                    sz = os.path.getsize(filename); lm = datetime.fromtimestamp(os.path.getmtime(filename)).timetuple()
                    dsz = os.path.getsize(dfilename); dlm = datetime.fromtimestamp(os.path.getmtime(dfilename)).timetuple()
                    if (sz == dsz and lm == dlm):
                        print 'skipped: ' + dfilename
                        #time.sleep(3)
                    else:
                        #copy the files
                        print 'copying from: ' + filename
                        print 'to: ' + dfilename
                        if link == '':
                            shutil.copy2(filename, dfilename)
                        else:
                            os.symlink(link, dfilename)
                        num+=1
    mins = (time.clock() - start)
    #print "All files copied in %d minutes" % mins
    print "{0} files copied in {1} minutes".format(int(num), round(mins))
Why not just use rsync? It automatically doesn't copy files that are the same at the destination. – whereswalden, Jun 26, 2014 at 14:28
@whereswalden - As I've mentioned, I want a more customized solution. For instance, a recent requirement I've thought of: I'm in the habit of renaming my folders for aesthetics (e.g. apache-mysql to lamp). In those cases, I want the old folder in the corresponding backup to be deleted first; otherwise it wastes disk space and leaves the backup disorganized. Can rsync do that? – Prahlad Yeri, Jul 1, 2014 at 14:34
1 Answer
I can't think of any way the file size could change without the mtime changing too. On the other hand, it does no harm to check, aside from making the code a little more complex.
You use the expression backup_loc + subpath quite often, so I would do this:
subpath = root.replace('/home/prahlad/','') # note extra slash
backup_path = os.path.join(backup_loc, subpath)
...
dfilename = os.path.join(backup_path, f)
...
if not os.path.exists(backup_path):
    os.makedirs(backup_path)
    print 'creating directory: ' + backup_path
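As a variation on the same idea, the prefix stripping could be done with os.path.relpath instead of str.replace, so that it does not depend on a hard-coded string or on getting the trailing slash right. This is only a sketch: the helper names destination_dir and ensure_dir and the source_base parameter are hypothetical, not part of the original script.

```python
import os


def destination_dir(root, source_base, backup_loc):
    """Map a directory under source_base to its counterpart under backup_loc.

    All three names are illustrative; source_base plays the role of the
    home directory prefix that the original script strips from root.
    """
    rel = os.path.relpath(root, source_base)
    # relpath returns '.' when root == source_base; normpath drops it again.
    return os.path.normpath(os.path.join(backup_loc, rel))


def ensure_dir(path):
    """Create path (and any missing parents); return True if it was created."""
    if not os.path.isdir(path):
        os.makedirs(path)
        return True
    return False
```

Inside the os.walk loop, backup_path would then be computed once per directory as destination_dir(root, source_base, backup_loc) and reused for both the makedirs check and dfilename.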
For these lines, I would do one thing per line, and move the complex expression into a function:
sz = os.path.getsize(filename); lm = datetime.fromtimestamp(os.path.getmtime(filename)).timetuple()
dsz = os.path.getsize(dfilename); dlm = datetime.fromtimestamp(os.path.getmtime(dfilename)).timetuple()
As follows:
def file_mtime(path):
    return datetime.fromtimestamp(os.path.getmtime(path)).timetuple()
...
sz = os.path.getsize(filename)
lm = file_mtime(filename)
dsz = os.path.getsize(dfilename)
dlm = file_mtime(dfilename)
However, since you're only comparing one timestamp to another, and not doing anything else with the timestamps, I don't see why you couldn't just do this:
lm = os.path.getmtime(filename)
...
dlm = os.path.getmtime(dfilename)
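One caveat with comparing the raw values: timetuple() has no sub-second field, so the original comparison worked at whole-second resolution, whereas os.path.getmtime returns a float. A timestamp copied by shutil.copy2 may not round-trip exactly on every filesystem (FAT, for example, stores modification times with 2-second resolution). A minimal sketch of a tolerant comparison follows; the function name is_unchanged and the default tolerance of 2 seconds are assumptions, not something from the original script.

```python
import os


def is_unchanged(src, dst, tolerance=2.0):
    """Return True if dst has the same size as src and nearly the same mtime.

    tolerance is in seconds; 2.0 is an assumed default chosen to cover
    filesystems with coarse timestamp resolution.
    """
    s = os.stat(src)
    d = os.stat(dst)
    same_size = s.st_size == d.st_size
    same_time = abs(s.st_mtime - d.st_mtime) <= tolerance
    return same_size and same_time
```

With this helper, the skip branch becomes a single test, if is_unchanged(filename, dfilename), and the four temporary variables are no longer needed.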
time.clock() returns time in seconds, not minutes; and num is already an int; so:
mins = round((time.clock() - start) / 60)
print "{0} files copied in {1} minutes".format(num, int(mins))
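A further point on the timing: on Unix, time.clock() measured processor time rather than elapsed wall-clock time, and the function was removed in Python 3.8. For a script that spends most of its time waiting on disk I/O, a wall-clock reading is probably what is wanted. The sketch below uses time.time() instead; the helper name elapsed_minutes is hypothetical, and num is a placeholder for the counter in the original script.

```python
import time


def elapsed_minutes(start, end):
    """Convert two time.time() readings (in seconds) to whole minutes."""
    return int(round((end - start) / 60.0))


start = time.time()
# ... the copy loop would run here ...
num = 0  # placeholder for the number of files copied
mins = elapsed_minutes(start, time.time())
print("{0} files copied in {1} minutes".format(num, mins))
```

Calling print with parentheses and a single argument works under both Python 2 and Python 3, so this fragment does not depend on the interpreter version.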
Thanks. I liked your suggestions regarding backup_loc + subpath. I'm accepting this answer. – Prahlad Yeri, Jul 1, 2014 at 14:35