\$\begingroup\$

I work on an Ubuntu machine, and my backup requirements are straightforward: the usual copy and paste, except that only changed files (i.e., files whose modification time OR size has changed) should be replaced.

Since I could not find such an option in the default Nautilus copy-paste (it only allows a merge with a blanket replace), I decided to write a backup script in Python myself, and I would like to get it reviewed. Here is backup.py:

#!/usr/bin/env python
#@module: backup.py
#@description: Script to take backup to a fixed location
#@author: Prahlad Yeri
#@copyright: MIT Licensed
#from __future__ import print_function
import os
import os.path
import sys
import time
from datetime import datetime
import shutil
backup_loc = '/media/username/1tera/backup'
#backup_loc = '/tmp/backup'
locations = ['/home/username/docs',
    '/home/username/source',
    '/home/username/scripts',
    '/home/username/library',
    '/home/username/programs',
    '/home/username/staging',
    '/home/username/soft',
    '/home/username/Desktop',
    '/home/username/Downloads',
    '/home/username/movies',
    '/home/username/songs',
    ]
if __name__ == "__main__":
    #loop thru the folders
    start = time.clock()
    num=0
    for s in locations: #[0:1]:
        #print s + "\n"
        #files = os.listdir(s)
        print 'listing for ' + s
        for (root, dirs, files) in os.walk(s):
            subpath = root.replace('/home/prahlad','')
            for f in files:
                filename = os.path.join(root, f)
                dfilename = backup_loc + subpath + os.sep + f
                link = ''
                if os.path.islink(filename):
                    link = os.readlink(filename)
                if not os.path.exists(dfilename):
                    #check dirs
                    if not os.path.exists(backup_loc + subpath):
                        os.makedirs(backup_loc + subpath)
                        print 'creating directory: ' + backup_loc + subpath
                    #just copy the files
                    print 'copying from: ' + filename
                    print 'to: ' + dfilename
                    if link == '':
                        shutil.copy2(filename, dfilename)
                    else:
                        os.symlink(link, dfilename)
                    num+=1
                else:
                    sz = os.path.getsize(filename); lm = datetime.fromtimestamp(os.path.getmtime(filename)).timetuple()
                    dsz = os.path.getsize(dfilename); dlm = datetime.fromtimestamp(os.path.getmtime(dfilename)).timetuple()
                    if (sz == dsz and lm == dlm):
                        print 'skipped: ' + dfilename
                        #time.sleep(3)
                    else:
                        #copy the files
                        print 'copying from: ' + filename
                        print 'to: ' + dfilename
                        if link == '':
                            shutil.copy2(filename, dfilename)
                        else:
                            os.symlink(link, dfilename)
                        num+=1
    mins = (time.clock() - start)
    #print "All files copied in %d minutes" % mins
    print "{0} files copied in {1} minutes".format(int(num), round(mins))
asked Jun 22, 2014 at 22:10
\$\endgroup\$
  • \$\begingroup\$ Why not just use rsync? It automatically skips files that are already the same at the destination. \$\endgroup\$ Commented Jun 26, 2014 at 14:28
  • \$\begingroup\$ @whereswalden - As I've mentioned, I want a more customized solution. For instance, a recent requirement I've thought of: I'm in the habit of renaming my folders for aesthetics (e.g. apache-mysql to lamp). In those cases, I want the old folder in the corresponding backup to be deleted first; otherwise it wastes disk space and leaves the backup disorganized. Can rsync do that? \$\endgroup\$ Commented Jul 1, 2014 at 14:34

1 Answer

\$\begingroup\$

I can't think of any way the file size could change without the mtime changing too. On the other hand, it does no harm to check, aside from making the code a little more complex.
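
For what it's worth, one situation where the size check could earn its keep is a tool that modifies a file and then puts the old timestamps back. The following is only a minimal sketch of that situation, written for Python 3; `grow_but_keep_mtime` is a hypothetical helper invented for this illustration and is not part of the script under review.

```python
import os
import tempfile


def grow_but_keep_mtime(path, extra=b"more data"):
    """Append to `path`, then restore its original timestamps.

    Hypothetical helper: it only demonstrates that size can change
    while the recorded mtime stays the same.
    """
    before = os.stat(path)
    with open(path, "ab") as fh:
        fh.write(extra)
    # Put the old access and modification times back, in nanoseconds,
    # the way a timestamp-preserving tool might.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    after = os.stat(path)
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "wb") as fh:
            fh.write(b"original")
        before, after = grow_but_keep_mtime(path)
        print("size:", before.st_size, "->", after.st_size)
        print("same mtime:", before.st_mtime_ns == after.st_mtime_ns)
```

A comparison based on mtime alone would treat the file as unchanged here, while the size comparison would still catch it.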

You use the expression backup_loc + subpath quite often, so I would do this:

subpath = root.replace('/home/prahlad/','') # note extra slash
backup_path = os.path.join(backup_loc, subpath)
...
dfilename = os.path.join(backup_path, f)
...
if not os.path.exists(backup_path):
    os.makedirs(backup_path)
    print 'creating directory: ' + backup_path
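
A possible variation on the same idea, sketched in Python 3: `str.replace` substitutes the prefix wherever it occurs in the path, so `os.path.relpath` may be a safer way to derive the subpath. The function name `backup_dir_for` and the `source_base` parameter are assumptions made for this sketch; they do not appear in the original script.

```python
import os.path


def backup_dir_for(root, source_base, backup_loc):
    """Map a directory under `source_base` to its mirror under `backup_loc`.

    Hypothetical helper for illustration only.
    """
    # relpath turns '/home/username/docs/a' into 'docs/a'
    # when source_base is '/home/username'.
    subpath = os.path.relpath(root, source_base)
    # normpath removes the trailing '.' produced when root == source_base.
    return os.path.normpath(os.path.join(backup_loc, subpath))


if __name__ == "__main__":
    print(backup_dir_for('/home/username/docs/a',
                         '/home/username',
                         '/media/username/1tera/backup'))
```

With this approach the "extra slash" no longer matters, because `relpath` never returns a leading separator for a directory below `source_base`.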

For these lines, I would do one thing per line, and move the complex expression into a function:

sz = os.path.getsize(filename); lm = datetime.fromtimestamp(os.path.getmtime(filename)).timetuple()
dsz = os.path.getsize(dfilename); dlm = datetime.fromtimestamp(os.path.getmtime(dfilename)).timetuple()

As follows:

def file_mtime(path):
    return datetime.fromtimestamp(os.path.getmtime(path)).timetuple()
...
sz = os.path.getsize(filename)
lm = file_mtime(filename)
dsz = os.path.getsize(dfilename)
dlm = file_mtime(dfilename)

However, since you're only comparing one timestamp to another, and not doing anything else with the timestamps, I don't see why you couldn't just do this:

lm = os.path.getmtime(filename)
...
dlm = os.path.getmtime(dfilename)
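
Putting the size and mtime checks together, the comparison could be sketched as a single predicate (Python 3). One caveat: the original `timetuple()` comparison only had whole-second resolution, whereas raw `getmtime` floats are stricter, and some filesystems (FAT, for example) store mtimes at a coarser resolution than others. The `mtime_tolerance` parameter is an assumption added here to absorb that difference, and `needs_copy` is a hypothetical name.

```python
import os
import shutil
import tempfile


def needs_copy(src, dst, mtime_tolerance=0.0):
    """Return True if `dst` is missing or differs from `src` in size or mtime.

    `mtime_tolerance` (seconds) is an assumed knob for backup drives
    whose filesystem rounds timestamps.
    """
    if not os.path.exists(dst):
        return True
    src_stat = os.stat(src)
    dst_stat = os.stat(dst)
    if src_stat.st_size != dst_stat.st_size:
        return True
    return abs(src_stat.st_mtime - dst_stat.st_mtime) > mtime_tolerance


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "a.txt")
        dst = os.path.join(tmp, "b.txt")
        with open(src, "w") as fh:
            fh.write("hello")
        print(needs_copy(src, dst))          # destination does not exist yet
        shutil.copy2(src, dst)               # copy2 also copies the mtime
        print(needs_copy(src, dst, mtime_tolerance=2.0))
```

This would also let the two duplicated "copy the files" branches in the script collapse into one `if needs_copy(...)` block.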

time.clock() returns time in seconds, not minutes; and num is already an int; so:

mins = round((time.clock() - start) / 60)
print "{0} files copied in {1} minutes".format(num, int(mins))
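
A related point, offered as a sketch rather than a required change: on Unix, `time.clock()` measures processor time, not wall-clock time, so an I/O-heavy copy may report far less time than actually passed. It was also removed in Python 3.8. A Python 3 version based on `time.time()` might look like the following; `elapsed_minutes` and `format_summary` are hypothetical helper names.

```python
import time


def elapsed_minutes(start, end):
    """Convert two time.time() readings (in seconds) into minutes."""
    return (end - start) / 60.0


def format_summary(num, start, end):
    """Build the same summary line the script prints at the end."""
    minutes = int(round(elapsed_minutes(start, end)))
    return "{0} files copied in {1} minutes".format(num, minutes)


if __name__ == "__main__":
    start = time.time()
    num = 0  # the copy loop would increment this counter
    print(format_summary(num, start, time.time()))
```
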
answered Jun 23, 2014 at 0:39
\$\endgroup\$
  • \$\begingroup\$ Thanks. I liked your suggestions regarding backup_loc + subpath. I'm accepting this answer. \$\endgroup\$ Commented Jul 1, 2014 at 14:35
