Briefly, I'm looking at getting the code below to execute faster. I have 100k images to go through. I'm running a query against MySQL, looping through the results, running exiftool against each image, and then moving it.
I started running it and it quickly became evident it wouldn't be a quick thing :-(
```python
import mysql.connector
import os

cnx = mysql.connector.connect(user='root', database='database', password='password')
cursor = cnx.cursor()
query = ("SELECT post_title,Event,File,Name from a order by File")
cursor.execute(query)

def shellquote(s):
    return s.replace("'", "")

for (post_title, Event, File, Name) in cursor:
    olddir = r'/home/alan/Downloads/OLD/'
    newdir = r'/home/alan/Downloads/NEW/' + post_title
    oldfile = olddir + File
    newfile = newdir + "/" + File
    if not os.path.exists(newfile):
        os.makedirs(newfile)
    if os.path.isfile(oldfile):
        print " > PROCESSING: " + oldfile
        os.system("exiftool -q "+shellquote(oldfile)+" -xmp:title='"+shellquote(post_title)+"'")
        os.system("exiftool -q "+shellquote(oldfile)+" -xmp:description='"+shellquote(Name)+" courtesy of https://www.festivalflyer.com'")
        os.system("exiftool -q "+shellquote(oldfile)+" -description='"+shellquote(Name)+" courtesy of https://www.festivalflyer.com'")
        os.rename(oldfile, newfile)

cursor.close()
cnx.close()
```
I tried using `subprocess` but, for whatever reason, I didn't get it to run. Any advice is welcome.

I suppose I could merge the three `exiftool` commands into one and pass multiple arguments. I also saw `-stay_open` as an `exiftool` option, but I'm not sure how to apply it.
2 Answers
- Close your connection to your database, even if there's an error. Use a `try`-`finally` to do this.
- Make some functions; moving the database stuff into its own function makes it much easier to read.
- From the `os.system` docs:

  > The `subprocess` module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the *Replacing Older Functions with the subprocess Module* section in the `subprocess` documentation for some helpful recipes.

- You may want to use `os.path` for various file-related things, such as `os.path.join` to join path sections.
- `print` is slow. Try removing it for a massive speed-up.
And so you may want to start changing your code to look like:
```python
import mysql.connector
import os
import subprocess

def read_database():
    cnx = mysql.connector.connect(user='root', database='database', password='password')
    cursor = cnx.cursor()
    try:
        query = ("SELECT post_title,Event,File,Name from a order by File")
        cursor.execute(query)
        for item in cursor:
            yield item
    finally:
        cursor.close()
        cnx.close()

def main():
    path = os.path
    old_dir = r'/home/alan/Downloads/OLD/'
    new_dir = r'/home/alan/Downloads/NEW/'
    for (post_title, event, file_name, name) in read_database():
        old_file = path.join(old_dir, file_name)
        new_file = path.join(new_dir, post_title, file_name)
        if not path.exists(new_file):
            os.makedirs(new_file)
        if path.isfile(old_file):
            subprocess.call(["exiftool", "-q", old_file, "-xmp:title='" + post_title.replace("'", "") + "'"])
            subprocess.call(["exiftool", "-q", old_file, "-xmp:description='" + name.replace("'", "") + " courtesy of https://www.festivalflyer.com'"])
            subprocess.call(["exiftool", "-q", old_file, "-description='" + name.replace("'", "") + " courtesy of https://www.festivalflyer.com'"])
            os.rename(old_file, new_file)

if __name__ == '__main__':
    main()
```
Just two pointers in addition to the answer by Peilonrayz:
1. **Why the repetition of `os.makedirs(new_file)`?** – Your code as it stands has a fixed new directory, and then you test for a presumably non-existent file with `os.path.exists(newfile)` before calling `makedirs()`. Given that `File` doesn't have any directory parts, you'll call `makedirs()` trying to recreate the directory all over again.

   In other words, this could be done once in front of the loop, and never done again. Triggering OS calls is expensive, so this could save a lot of time: it would reduce creating the directory from 100K attempts to just 1.
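A minimal sketch of hoisting the directory creation out of the loop (a temporary directory stands in for the real paths, which are hypothetical here):

```python
import os
import tempfile

# Hypothetical stand-in for the fixed destination root from the question.
new_dir = os.path.join(tempfile.mkdtemp(), 'NEW')

# Create the destination directory once, before the loop.
# exist_ok=True (Python 3.2+) makes a repeated call harmless.
os.makedirs(new_dir, exist_ok=True)

for file_name in ['a.jpg', 'b.jpg']:  # stand-in for the database rows
    new_file = os.path.join(new_dir, file_name)
    # process and move the file here; no makedirs call inside the loop
```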
2. **Reduce subprocess calls whenever possible** – Initiating new subprocesses triggers quite a lot of work, so it's usually well worth reducing this as much as possible.

   In your case there are two possible options, which might drastically reduce the overhead cost of initiating subprocesses:

   - Join the three `exiftool` commands into one command. This could reduce the number of started subprocesses from 300K to 100K. Depending on the cost of actually running the `exiftool` command, this could cut the running time by up to a third.
   - Consider gathering all the `exiftool` commands into a batch file, which you execute after processing all the database output. This wouldn't reduce the running time of the `exiftool` command itself, but you wouldn't start a new subprocess 100K times, just once when running the batch file.
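The first option, joining the three commands, could look like the sketch below. `exiftool` accepts several `-TAG=VALUE` assignments per invocation, so one subprocess can write all three tags; the helper name and sample values are made up for illustration. Passing the arguments as a list also bypasses the shell, so the quote-stripping `shellquote()` hack is no longer needed.

```python
import subprocess

def exiftool_args(old_file, post_title, name):
    """Build a single exiftool command that writes all three tags at once."""
    credit = name + " courtesy of https://www.festivalflyer.com"
    return [
        "exiftool", "-q",
        "-xmp:title=" + post_title,
        "-xmp:description=" + credit,
        "-description=" + credit,
        old_file,
    ]

# In the loop, one call instead of three:
# subprocess.call(exiftool_args(old_file, post_title, name))
```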
So in general, always try to reduce the number of subprocesses started, as they are quite expensive. And `os.system` calls trigger a subprocess just as `subprocess.call` does.
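One way to realise the batch-file option is `exiftool`'s own `-@ ARGFILE` feature: the argfile holds one argument per line, and `-execute` marks the end of each logical command, so a single `exiftool` process can work through every image. The rows and paths below are hypothetical stand-ins for the database results:

```python
# Write one argfile covering all images, then launch exiftool once.
rows = [  # hypothetical stand-ins for (post_title, File, Name) rows
    ("Festival A", "flyer1.jpg", "Flyer One"),
    ("Festival B", "flyer2.jpg", "Flyer Two"),
]

with open("exiftool.args", "w") as argfile:
    for post_title, file_name, name in rows:
        credit = name + " courtesy of https://www.festivalflyer.com"
        argfile.write("-q\n")
        argfile.write("-xmp:title=" + post_title + "\n")
        argfile.write("-xmp:description=" + credit + "\n")
        argfile.write("-description=" + credit + "\n")
        argfile.write("/home/alan/Downloads/OLD/" + file_name + "\n")
        argfile.write("-execute\n")  # run this command, then start the next

# A single subprocess then handles the whole batch:
# subprocess.call(["exiftool", "-@", "exiftool.args"])
```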
```python
with exiftool.ExifTool() as et:
    for ...:
        et.execute("-q " + oldfile,
                   "-xmp:title='" + shellquote(post_title) + "'",
                   "-xmp:description='{}' courtesy of ...".format(shellquote(Name)),
                   "-description='{}' courtesy of ...".format(shellquote(Name)))
```