I've written this Python script to make regular backups of a Linux server. It works pretty well except that the script sometimes backups files more than one time, which is pretty strange and I don't know why that happens.
But that's not the key point of my posting here (just someone may see a little error I've made which explains this behaviour). Since I've learned all I know about Python by myself I'm a little bit nervous to use my own code in a productive environment, so I need a code review.
In short the script should check all files in the filesystem of a Linux server except a small collection of directories to avoid. Every 'walked' file should be checked for last modification date. If it's been modified within the last 24 hours it should backup and log into its logfile. Further more it should check after each run how many backups exist on the network share and if a certain number is reached it should delete the oldest one.
import datetime
import re
import os
import subprocess
import socket
import glob
from stat import *
import time
from collections import defaultdict
host = socket.gethostname()
date = datetime.datetime.today()
date = date.strftime("%Y-%m-%d")
mountPoint = "/mnt/linuxbackup/"
nas = '//172.19.3.5/backup/linuxserver'
credentials = 'sec=ntlm,credentials=/root/.backup.secret'
dirPath = '/'
past = time.time() - 86400
# Global settings
fullBackup_dict = {
'backupdir' : mountPoint + host + '/',
'backupname' : host + '.' + date,
'backupstokeep' : 4,
'excludelist' : ('/proc',
'/lost+found',
'/sys',
'/media',
'/var/cache',
'/var/log',
'/mnt',
'/dev',
'/run',
'/tmp'
),
}
dailyBackupDict = defaultdict(list)
includelist = ''
excludelist = ''
for a in fullBackup_dict['excludelist']:
excludelist = excludelist + '--exclude=%s ' %a
fullBackup_dict['excludelist'] = excludelist
if os.path.exists(mountPoint):
if not os.path.ismount(mountPoint):
subprocess.check_call(["mount", "-t", "cifs", "-o", credentials, nas, mountPoint])
if not os.path.exists(fullBackup_dict['backupdir']):
os.makedirs(fullBackup_dict['backupdir'])
else:
os.makedirs(mountPoint)
subprocess.check_call(["mount", "-t", "cifs", "-o", credentials, nas, mountPoint])
class backup:
def __init__(self, backupdir, backupname):
self.backupdir = backupdir
self.backupname = backupname
self.run()
def run(self,):
# Getting a list of previous backups
fnames = [os.path.basename(x) for x in
glob.glob(self.backupdir + host + '.[0-9][0-9][0-9][0-9]' +
'-[0-9][0-9]' + '-[0-9][0-9]' + '.tar.bz2')]
#print self.backupdir + host + '[0-9][0-9][0-9][0-9]' + '-[0-9][0-9]' + '-[0-9][0-9]' + '.tar.bz2'
print 'fnames = %s' %fnames
print 'Backing up the system'
# Full Backup
if datetime.date.today().strftime("%A") == "Friday":
command = 'nice -n 10 tar cvpjf ' + self.backupdir + self.backupname + '_FULL' + '.tar.bz2 ' \
+ fullBackup_dict['excludelist'] + ' / ' + '--index-file ' + self.backupdir + self.backupname + '.log' + ' 2>&1'
# daily Backup
else:
includelist = ''
try:
for path, dirs, files in os.walk(dirPath, topdown=True):
files = [f for f in files if not f[0] == '.']
dirs[:] = [d for d in dirs if (d not in fullBackup_dict['excludelist']) and (d[0] != '.')]
for fn in files:
filePath = os.path.join(path, fn)
try:
if os.path.getmtime(filePath) >= past:
if os.path.islink(filePath):
continue
else:
dailyBackupDict['includelist'].append(filePath)
for addFiles in dailyBackupDict['includelist']:
includelist = includelist + '--add-file="%s" ' %addFiles
except OSError:
pass
except OSError:
pass
dailyBackupDict['includelist'] = includelist
command = 'nice -n 10 tar cvpjf ' + self.backupdir + self.backupname + '.tar.bz2 ' + fullBackup_dict['excludelist'] \
+ dailyBackupDict['includelist'] + ' > ' + self.backupdir + self.backupname + '.log' + ' 2>&1'
print command
os.system(command)
while len(fnames) > fullBackup_dict['backupstokeep']:
print 'more than %s file(s) here!' %fullBackup_dict['backupstokeep']
print 'Removing oldest file'
filetimes = {}
accesstimes = []
for filename in fnames:
print self.backupdir + filename
mode = os.stat(self.backupdir + filename)[ST_MTIME]
filetimes[mode] = filename
accesstimes.append(mode)
print mode
accesstimes.sort()
print filetimes
print accesstimes
fileToDelete = filetimes[accesstimes[0]]
print 'Deleting file %s' %fileToDelete
try:
os.remove(self.backupdir + fileToDelete)
os.remove(self.backupdir + fileToDelete[:-7] + 'log')
except Exception, inst:
print inst
fnames = [os.path.basename(x) for x in
glob.glob(self.backupdir + host + '-' + '[0-9][0-9][0-9][0-9]' + '-' + '[0-9][0-9]' +
'-' + '[0-9][0-9]' + '.tar.bz2')]
print '%s%s' %(self.backupdir,self.backupname)
if os.path.ismount(mountPoint):
subprocess.check_call(["umount", mountPoint])
def main():
backup(fullBackup_dict['backupdir'], fullBackup_dict['backupname'])
if __name__ == '__main__':
main()
1 Answer 1
First of all, you should definitely read the Python style guide (PEP0008). It contains tons of good formatting and style info to make code more readable and neat for you and other users. I'll reference some of the points relevant to you but the whole page is good to commit to memory.
You should rearrange the imports so they're grouped for readability. Plain imports together and relative imports in one block.
import datetime
import re
import os
import subprocess
import socket
import glob
import time
from collections import defaultdict
from stat import *
Also though, import *
is almost always a bad idea. It means I have no idea what you're importing from stat
. It's a good thing it's a builtin module, because if it was third party or custom then I'd be at a loss to understand what some of the functions imported from there do. If you just do import stat
, having to call stat.function
is much more readable. But if you want to avoid that, from stat import function1, function2, CONSTANT
is still easy to trace.
I prefer to combine calls like your date
assignments. It would be quite readable and not too long a line.
date = datetime.datetime.today().strftime("%Y-%m-%d")
Try to avoid mixing single and double quotes. It can get confusing. You may need to on occasion, but avoid it as much as possible.
mountPoint = "/mnt/linuxbackup/"
nas = '//172.19.3.5/backup/linuxserver'
Also mountPoint
isn't Pythonic naming convention. Python uses snake_case for variables and functions. mount_point
would be a better name. Speaking of names, you use good ones for these variables, but names aren't always enough. past
is not a self evident name that requires no comment to be understood. Because of your description of the program I know that you're basically checking for 24 hours worth of seconds, as that's the time into the past your script checks. But past
does not communicate this. Change it to indicate its use, like
backup_period = time.time() - 86400 # 24 hours
much clearer. Still on names, don't name a variable based on its type. fullBackup_dict
is clearly a dictionary when you define it. The syntax is unmistakeable. Likewise when you call it with keywords Python programmers will know what kind of data it is. You only need to make types explicit in variable names when you're using unusual types that might be mistaken for others. Like a data type that might be derived from a dictionary. But even in that case, consider whether or not it's actually worth noting in the name.
I also don't think you should refer to your settings as 'Global'. Python has a global
keyword but that's not what you're trying to say here. Something like "Configuration" makes more sense and wont accidentally imply any Python keyword.
Unless I'm missing something here, you're making a file path with string concatenation. Don't do that, use os.path.join
. It's an intelligent function that knows what OS the user is running and can properly account for slashes whether they're in the strings or not. In other words, it wont care if you have a /
at the end of mountPoint
. It can create the correct path in either case. Of course if you're intentionally not using that, you should comment on why this is a special case.
'backupdir' : os.path.join(mountPoint, host) + '/',
Again comments would help here but what's includelist
for? I can guess at the intent, but it's first set to an empty string and then only changed in an else condition halfway down your script. Context is hard to discern without clear explanations. Also, the note about not naming variables after their types goes double for when it's not actually their type. Naming a string with list
is a recipe for confusion. Instead call it something like exclude_files
or exclude_folders
, to avoid accidentally indicating the wrong thing.
As for excludelist
, you could shorten this with the str.join
method. You can use it to concatenate all the elements of a list into one string, but helpfully in your case you can also specify a string to go between each element, --exclude=
in your case. There is one slight drawback, you need to bookend it with --exclude=
and to make it exactly as before, but I still think it looks neater.
excludelist = ('--exclude=' + ' --exclude='.join(fullBackup_dict['excludelist']) + ' ')
What's odd is that you then assign this back to your dictionary, why not assign it there in the first place? Either put the expression directly into the dictionary or create the value before you build the dictionary. I'd opt for the latter in this case, just because it's a long expression to embed in a dictionary.
These checks you perform here seem important, I think they'd be worth clarifying what you're doing. You're clearly checking on the mount point's existence but I don't understand the check_call
s. Maybe if I was familiar with Linux I would but it can't hurt to comment.
# Check the mount point exists as a mount and is cifs-y
if os.path.exists(mountPoint):
if not os.path.ismount(mountPoint):
subprocess.check_call(["mount", "-t", "cifs", "-o", credentials, nas, mountPoint])
if not os.path.exists(fullBackup_dict['backupdir']):
os.makedirs(fullBackup_dict['backupdir'])
else:
os.makedirs(mountPoint)
subprocess.check_call(["mount", "-t", "cifs", "-o", credentials, nas, mountPoint])
Now I get to backup
, and I don't think this should be a class. It takes two attributes, and then __init__
immediately calls the only other function that this object implements. This sounds much more like a candidate to be a plain function to me. I don't see anything here that having a class provides and a function doesn't. Speaking of functions, you could use more. backup.run
(which I'm just going to now call backup
) has over 70 lines and so much nesting. Splitting this out into multiple functions would make it far easier to read, use and improve.
Firstly, getting a list of previous back ups sounds like a good idea for a function. It can just return the fnames
list for you, and then it's far easier to see the criteria for how previous backups are found.
def get_previous_backups(backupdir):
# Getting a list of previous backups
return [os.path.basename(x) for x in
glob.glob(self.backupdir + host + '.[0-9][0-9][0-9][0-9]' +
'-[0-9][0-9]' + '-[0-9][0-9]' + '.tar.bz2')]
def backup(backupdir, backupname):
fnames = get_previous_backups(backupdir)
See how much neater that is? Now, that comment for get_previous_backups
could be more useful if it explained how it detected them. The function name is clear in indicating what it does, but the how might be unclear. The comment could also be made a docstring, too. You can read about them here, but basically they're programmatically accessible comments. Useful for interpreters and users alike.
def get_previous_backups(backupdir):
"""Returns list of back up .tar.bz2 files that fit the date format"""
For your prints, you're actually using the old string formatting syntax. You should instead use the new way, str.format
as it's cleaner and more useful. It implicitly coerces variables to strings so you don't need to explicitly type them at all.
You could also do with more whitespace separating out your blocks. # Full Backup
could have a blank line above it so that it's distinct from the previous back ups that have been grabbed. Same with # Daily Backup
.
print 'fnames = {}'.format(fnames)
print 'Backing up the system'
# Full Backup
if datetime.date.today().strftime("%A") == "Friday":
command = 'nice -n 10 tar cvpjf ' + self.backupdir + self.backupname + '_FULL' + '.tar.bz2 ' \
+ fullBackup_dict['excludelist'] + ' / ' + '--index-file ' + self.backupdir + self.backupname + '.log' + ' 2>&1'
# Daily Backup
else:
includelist = ''
Instead of using \\
to break up long lines, use parentheses. This command is also a great indication of when you should use str.format
too, it cleans up the whole thing:
command = ('nice -n 10 tar cvpjf {}_FULL.tar.bz2 {} / --index-file {}.log 2>&1'
.format(os.path.join(backupdir, backupname),
fullBackup_dict['excludelist'],
os.path.join(backupdir, backupname))
Now it's much easier to parse the command and the parameters being passed. I also again opted to use os.path.join
You have tons of nesting here. Sometimes it's unavoidable, and all you can do is try to improve it. I have a couple notes for here. You could flip your islink
test to remove the need for a continue
:
if os.path.getmtime(filePath) >= past:
if not os.path.islink(filePath):
dailyBackupDict['includelist'].append(filePath)
for addFiles in dailyBackupDict['includelist']:
includelist = includelist + '--add-file="%s" ' %addFiles
Notice, you can now combine the two conditions into one:
if os.path.getmtime(filePath) >= past and not os.path.islink(filePath):
dailyBackupDict['includelist'].append(filePath)
for addFiles in dailyBackupDict['includelist']:
includelist = includelist + '--add-file="%s" ' %addFiles
Also, you don't need to use [:]
when assigning to dirs
. Python always evaluates everything on the right of an assignment operator before assigning any values to the names on the left of it. So you can just use dirs
without clashing.
Also to test if a string starts with a character it's more idiomatic to use f.startswith('.')
which returns a boolean. You don't need parentheses around your conditions. Though I'd personally add them around the whole test, to make it visually clear what part is the test.
Don't use the Exception, inst
syntax, it's deprecated in later Python versions for being confusing. Instead use except Exception as inst
. It works the same, but is more readable and would allow multiple exceptions to be caught by doing except (BaseException, OtherException) as inst:
except Exception, inst:
print inst
Oh look, getting fnames
again? This becomes very easy if we have a function to do this.
fnames = [os.path.basename(x) for x in
glob.glob(self.backupdir + host + '-' + '[0-9][0-9][0-9][0-9]' + '-' + '[0-9][0-9]' +
'-' + '[0-9][0-9]' + '.tar.bz2')]
-
\$\begingroup\$ Thank you for all the feedback! Pretty much to process but really helpful! I could adapt most of your suggestions and fixed alot to make it work better. I even found a more efficient way for my backups and could delete some code thanks to it. I add my improvements. \$\endgroup\$user86193– user861932015年10月08日 11:20:18 +00:00Commented Oct 8, 2015 at 11:20
-
\$\begingroup\$ @Mortorq Glad to help! And good to know it wasn't too much. :P On Code Review, instead of editing questions to update with new versions, it's actually encouraged to post a new question and just link back to the old version for reference. You can read about why this is here. \$\endgroup\$SuperBiasedMan– SuperBiasedMan2015年10月08日 17:12:28 +00:00Commented Oct 8, 2015 at 17:12