Briefly, I'm looking at getting the code below to execute faster. I have 100k images to go through. I'm running a query against MySQL, looping through the results, running exiftool against each image, and then moving it.
I started running it and it quickly became evident it wouldn't be a quick thing :-(
```python
import mysql.connector
import os

cnx = mysql.connector.connect(user='root', database='database', password='password')
cursor = cnx.cursor()
query = ("SELECT post_title,Event,File,Name from a order by File")
cursor.execute(query)

def shellquote(s):
    return s.replace("'", "")

for (post_title, Event, File, Name) in cursor:
    olddir = r'/home/alan/Downloads/OLD/'
    newdir = r'/home/alan/Downloads/NEW/' + post_title
    oldfile = olddir + File
    newfile = newdir + "/" + File
    if not os.path.exists(newfile):
        os.makedirs(newfile)
    if os.path.isfile(oldfile):
        print " > PROCESSING: " + oldfile
        os.system("exiftool -q "+shellquote(oldfile)+" -xmp:title='"+shellquote(post_title)+"'")
        os.system("exiftool -q "+shellquote(oldfile)+" -xmp:description='"+shellquote(Name)+" courtesy of https://www.festivalflyer.com'")
        os.system("exiftool -q "+shellquote(oldfile)+" -description='"+shellquote(Name)+" courtesy of https://www.festivalflyer.com'")
        os.rename(oldfile, newfile)

cursor.close()
cnx.close()
```
I tried using `subprocess` but, for whatever reason, I didn't get it to run. Any advice is welcome.

I suppose I could merge the three `exiftool` commands into one and pass multiple arguments. I also saw `-stay_open` as an `exiftool` option, but I'm not sure how to apply it.
2 Answers
- Close your connection to your database, even if there's an error. Use a `try`-`finally` to do this.
- Make some functions; moving the database stuff into its own function makes it much easier to read.
- From the `os.system` docs:

  > The `subprocess` module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the *Replacing Older Functions with the subprocess Module* section in the `subprocess` documentation for some helpful recipes.

- You may want to use `os.path` for various file-related things, such as `os.path.join` to join path sections.
- `print` is slow. Try removing it for a massive speed-up.
And so you may want to start changing your code to look like:
```python
import mysql.connector
import os
import subprocess

def read_database():
    cnx = mysql.connector.connect(user='root', database='database', password='password')
    cursor = cnx.cursor()
    try:
        query = ("SELECT post_title,Event,File,Name from a order by File")
        cursor.execute(query)
        for item in cursor:
            yield item
    finally:
        cursor.close()
        cnx.close()

def main():
    path = os.path
    old_dir = r'/home/alan/Downloads/OLD/'
    new_dir = r'/home/alan/Downloads/NEW/'
    for (post_title, event, file_name, name) in read_database():
        old_file = path.join(old_dir, file_name)
        new_file = path.join(new_dir, post_title, file_name)
        if not path.exists(new_file):
            os.makedirs(new_file)
        if path.isfile(old_file):
            subprocess.call(["exiftool", "-q", old_file, "-xmp:title='" + post_title.replace("'", "") + "'"])
            subprocess.call(["exiftool", "-q", old_file, "-xmp:description='" + name.replace("'", "") + " courtesy of https://www.festivalflyer.com'"])
            subprocess.call(["exiftool", "-q", old_file, "-description='" + name.replace("'", "") + " courtesy of https://www.festivalflyer.com'"])
            os.rename(old_file, new_file)

if __name__ == '__main__':
    main()
```
Just two pointers in addition to the answer by Peilonrayz:
1. **Why the repetition of `os.makedirs(new_file)`?** – Your code as it stands has a fixed new directory, and then you test for a presumably non-existent file with `os.path.exists(newfile)` before calling `makedirs()`. Given that `File` doesn't have any directory parts, you'll call `makedirs()` trying to recreate the directory all over again.

   In other words, this could be done once in front of the loop, and never done again. Triggering OS calls is expensive, so this could save a lot of time: it would reduce creating the directory from 100K attempts to just 1.
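A minimal sketch of hoisting the directory creation out of the loop (a temporary directory stands in for the real paths, which are hypothetical here):

```python
import os
import tempfile

# Hypothetical stand-in for the fixed destination root from the question.
new_dir = os.path.join(tempfile.mkdtemp(), 'NEW')

# Create the destination directory once, before the loop.
# exist_ok=True (Python 3.2+) makes a repeated call harmless.
os.makedirs(new_dir, exist_ok=True)

for file_name in ['a.jpg', 'b.jpg']:  # stand-in for the database rows
    new_file = os.path.join(new_dir, file_name)
    # process and move the file here; no makedirs call inside the loop
```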
2. **Reduce subprocess calls whenever possible** – Initiating new subprocesses triggers quite a lot of work, so it's usually well worth reducing this as much as possible.

   In your case there are two possible options, which might drastically reduce the overhead cost of initiating subprocesses:

   - Join the three `exiftool` commands into one command. This could reduce the number of started subprocesses from 300K to 100K. Depending on the cost of actually running the `exiftool` command, this could cut the running time by up to a third.
   - Consider gathering all the `exiftool` commands into a batch file, which you execute after processing all the database output. This wouldn't reduce the running time of the `exiftool` command itself, but you wouldn't start a new subprocess 100K times, just once when running the batch file.
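The first option, joining the three commands, could look like the sketch below. `exiftool` accepts several `-TAG=VALUE` assignments per invocation, so one subprocess can write all three tags; the helper name and sample values are made up for illustration. Passing the arguments as a list also bypasses the shell, so the quote-stripping `shellquote()` hack is no longer needed.

```python
import subprocess

def exiftool_args(old_file, post_title, name):
    """Build a single exiftool command that writes all three tags at once."""
    credit = name + " courtesy of https://www.festivalflyer.com"
    return [
        "exiftool", "-q",
        "-xmp:title=" + post_title,
        "-xmp:description=" + credit,
        "-description=" + credit,
        old_file,
    ]

# In the loop, one call instead of three:
# subprocess.call(exiftool_args(old_file, post_title, name))
```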
So in general, always try to reduce the number of subprocesses started, as they are quite expensive. And `os.system` calls trigger a subprocess just as `subprocess.call` does.
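One way to realise the batch-file option is `exiftool`'s own `-@ ARGFILE` feature: the argfile holds one argument per line, and `-execute` marks the end of each logical command, so a single `exiftool` process can work through every image. The rows and paths below are hypothetical stand-ins for the database results:

```python
# Write one argfile covering all images, then launch exiftool once.
rows = [  # hypothetical stand-ins for (post_title, File, Name) rows
    ("Festival A", "flyer1.jpg", "Flyer One"),
    ("Festival B", "flyer2.jpg", "Flyer Two"),
]

with open("exiftool.args", "w") as argfile:
    for post_title, file_name, name in rows:
        credit = name + " courtesy of https://www.festivalflyer.com"
        argfile.write("-q\n")
        argfile.write("-xmp:title=" + post_title + "\n")
        argfile.write("-xmp:description=" + credit + "\n")
        argfile.write("-description=" + credit + "\n")
        argfile.write("/home/alan/Downloads/OLD/" + file_name + "\n")
        argfile.write("-execute\n")  # run this command, then start the next

# A single subprocess then handles the whole batch:
# subprocess.call(["exiftool", "-@", "exiftool.args"])
```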
```python
with exiftool.ExifTool() as et:
    for ...:
        et.execute("-q " + oldfile,
                   "-xmp:title='" + shellquote(post_title) + "'",
                   "-xmp:description='{}' courtesy of ...".format(shellquote(Name)),
                   "-description='{}' courtesy of ...".format(shellquote(Name)))
```