Simple Batch Templating Utility in Python

Question 1

I would like to present for review my (much) revised batch templating utility, which had its humble beginnings here in a previous post. As I mentioned there, this program is my entry into Python programming. I am trying to grow the simple script from my previous post into a more robust utility.

Questions I am hoping to answer with this post:

Is the overall structure sound?
Is my use of exceptions correct?
Is my documentation OK? This was my first intro into docstrings and I have worked hard to make them as complete as possible.
One part that bugs me is the try block in the main() function.
- First, this whole block should probably be a separate function? I left it in main() since it's the meat of the program.
- Secondly, there seems to be a lot of code between the try: and the except: and I know this should be minimized, but I couldn't come up with a better method.

Program Inputs

The program takes two required input files, a CSV data file and a template file, plus an optional appended file. Its output is a set of rendered files, one for each data row in the CSV file. The CSV data file contains a header row, which is used as a set of keys mapped to tags inside the template file. For every row in the data file, each tag in the template is replaced with the data item stored under the matching key, the appended file is added (wrapped in opening and closing HTML tags for *.js files), and the rendered file is written to disk. Pretty straightforward, I think. The main docstring illustrates a quick example.
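To make the key mapping concrete, here is a minimal sketch of how csv.DictReader turns the header row into keys. It uses an in-memory string in place of a real data file, and the helper name read_rows is only for illustration; it is not part of the program below.

```python
import csv
import io

# In-memory stand-in for the CSV data file (rows taken from the docstring example).
CSV_TEXT = """stockID,color,material,url
340,Blue,80% Wool / 20% Acrylic,http://placehold.it/400
275,brown,100% Cotton,http://placehold.it/600
"""


def read_rows(text):
    """Return (header keys, list of row dicts) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # each row is a mapping keyed by the header values
    return reader.fieldnames, rows


keys, rows = read_rows(CSV_TEXT)
print(keys)
for row in rows:
    print(row["stockID"], "->", row["color"])
```

Each row mapping can be handed straight to the template for substitution, which is what main() does.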

Template Syntax

The program uses Python's string.Template class, which performs $-based substitution, and adds the requirement that tags use the { and } curly braces that string.Template itself treats as optional. So, for a particular Key from the data file header row, the template tag would be ${Key}.
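A short sketch of that syntax in isolation, using a cut-down version of the template from the docstring example. The variable names are only for illustration.

```python
import string

template = string.Template("<h1>Stock ID: ${stockID}</h1>\n<li>${color}</li>")
row = {"stockID": "340", "color": "Blue"}

rendered = template.substitute(row)
print(rendered)

# substitute() raises KeyError when a tag has no matching key...
try:
    template.substitute({"stockID": "340"})
except KeyError as err:
    missing = err.args[0]
    print("missing key:", missing)

# ...whereas safe_substitute() leaves the unmatched tag in place.
partial = template.safe_substitute({"stockID": "340"})
print(partial)
```

Note that string.Template would also accept $stockID without braces; the braces-only rule is something the program enforces itself when it scans the template for tags.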

Wall of Code

I think the docstrings explain pretty well what is going on...

"""
A simple batch templating utility for Python.
Initially conceived as a patch to quickly generate small HTML files from
catalog data. The program takes as inputs two (2) required files, a CSV
data file and a template file (see below) and the option to append a
third file. Output of the program is a set of rendered files, one for
each data row in the CSV data file.
USAGE:
 Current rendition of program uses a simple guided prompt interface
 to walk user through process.
**SPECIAL WARNING**
 This program copies template file and appended file to strings which
 means they will both be loaded fully into memory. Common sense
 should be exercised when dealing with extremely large files.
CSV DATA FILE:
 Data File shall contain a header row. Header row contains the keys
 that will be used to render the output files. Keys shall not
 contain spaces. There shall be a corresponding tag in the template
 file for each key in the CSV Data File.
 File can contain any (reasonable) number of data rows and columns.
 Each item in a row is swapped out with the tag in the template file
 which corresponds to appropriate key from the header row. There
 will be one output file generated for each row in the data file.
TEMPLATE FILE:
 The template file is basically a copy of the desired output file
 with tags placed wherever a particular piece of data from the CSV
 Data File should be placed in the output.
 Syntax:
 The program uses Python's string.Template() string substitution
 method which utilizes the `$` replacement syntax. The program
 further restricts the syntax requiring the use of the optional `{`
 and `}` curly braces surrounding tags. So, for a particular 'Key'
 from the data file header row, the template tag would be ${Key}.
APPENDED FILE:
 The appended file is strictly copied _verbatim_ to the end of the
 rendered output file. There is really no restriction on the
 appended file other than special warning above.
 Special Feature:
 If the appended file is a Javascript file (detected using the *.js
 file extension), the program will add appropriate opening and
 closing HTML tags.
QUICK EXAMPLE:
 Assume CSV Data File: <some_file.csv>
 stockID,color,material,url
 340,Blue,80% Wool / 20% Acrylic,http://placehold.it/400
 275,brown,100% Cotton,http://placehold.it/600
 Assume Template File: <another_file.html>
 <h1>Stock ID: ${stockID}</h1>
 <ul>
 <li>${color}</li>
 <li>${material}</li>
 </ul>
 <img src='${url}'>
 Assume ...Appended File? --> No
 Output file 1 = 'Listing_340.html'
 <h1>Stock ID: 340</h1>
 <ul>
 <li>Blue</li>
 <li>80% Wool / 20% Acrylic</li>
 </ul>
 <img src='http://placehold.it/400'>
 Output file 2 = 'Listing_275.html'
 <h1>Stock ID: 275</h1>
 <ul>
 <li>brown</li>
 <li>100% Cotton</li>
 </ul>
 <img src='http://placehold.it/600'>
Author: Chris E. Pearson (christoper.e.pearson.1 at gmail dot com)
Copyright (c) Chris E. Pearson, 2015
License: TBD
"""
import os
import re
import csv
import string


def main():
    """
    A simple batch templating utility for Python.
    See main docstring for details.
    """
    # Collect input file names and contents for text files.
    fname_data = prompt_filename('Data File')
    fname_template = prompt_filename('Template File')
    fcontents_template = get_contents(fname_template)
    fname_appended, fcontents_appended = get_appended()
    # Validate the inputs
    tag_set = set(re.findall(r'\${(\S+)}', fcontents_template))
    primary_key, key_set = get_keys(fname_data)
    validate_inputs(tag_set, key_set)
    validated_template = string.Template(fcontents_template)
    # Generate the output
    try:
        # This seems like a lot to put in a try statement...?
        with open(fname_data) as f:
            reader = csv.DictReader(f)
            f_count = 0
            for row in reader:
                # Create output filename
                output_filename = ('Listing_{}.html'.format(row[primary_key]))
                f_count += 1
                print('File #{}: {}'.format(f_count, output_filename))
                # Prep string
                output_main = validated_template.substitute(row)
                write_string = '{}{}'.format(output_main, fcontents_appended)
                # Write File
                with open(output_filename, 'w') as f_out:
                    f_out.write(write_string)
    except OSError:
        print('No such file {!r}. Check file name and path and try again.'
              .format(fname_data))
        raise
    else:
        print('{} of {} files created'.format(str(f_count),
                                              str(reader.line_num-1)))


def prompt_filename(fclass):
    """
    Prompt user for a filename for given file classification.
    Args:
        fclass (string):
            A descriptive string describing the type of file for which the
            filename is requested. _e.g._ 'Template File'
    Returns:
        filename (string)
    """
    while True:
        filename = input('Enter {0} --> '.format(fclass))
        if os.path.isfile(filename):
            return filename
        else:
            print('No such file: {!r}.'.format(filename))
            print('Please enter a valid file name')
            continue


def get_contents(fname):
    """
    Return contents of file `fname` as a string if file exists.
    Args:
        fname (string):
            Name of the file to be opened and returned as a string.
    Returns:
        text_file (string):
            The entire contents of `fname` read in as a string.
    Exceptions:
        OSError: informs user that fname is invalid.
    """
    try:
        with open(fname) as f:
            text_file = f.read()
    except OSError:
        print('No such file {!r}. Check file name and path and try again.'
              .format(fname))
        raise
    else:
        return text_file


def get_appended():
    """
    Ask user if appended file and prompt filename if so.
    Returns:
        fname_appended (string)
            Filename for appended file.
        fcontents_appended (string)
            The entire contents of `fname_appended` as a string.
    Exceptions:
        OSError: Raised by function prompt_filename informs user that
            fname is invalid.
    See Also:
        Function: prompt_filename
        Function: get_contents
    """
    prompt_for_appended = input('Is there an appended file? --> ')
    if prompt_for_appended.lower().startswith('y'):
        fname_appended = prompt_filename('Appended File')
        fcontents_appended = get_contents(fname_appended)
        if fname_appended.lower().endswith('.js'):
            open_tag = '<script type="text/javascript">'
            close_tag = '</script>'
            fcontents_appended = '\n{0}\n{1}\n{2}'.format(open_tag,
                                                          fcontents_appended,
                                                          close_tag)
    else:
        fname_appended = None
        fcontents_appended = ''
    return fname_appended, fcontents_appended


def get_keys(fname):
    """
    Get key set as header row of given CSV file and get primary key.
    Given a CSV data file `fname`, return the header row from file
    as a set of "keys". Also return the primary key for the data file.
    The primary key is simply the header for the first column.
    Args:
        fname (string):
            Name of the CSV file for which the keys are needed.
    Returns:
        primary_key (string)
            Header value of first column in given CSV file.
        key_set (set of strings)
            A set comprised of all header row values for given CSV file.
    Exceptions:
        OSError: informs user that fname is invalid.
    """
    try:
        with open(fname) as f:
            key_list = f.readline().strip().split(',')
    except OSError:
        print('No such file {!r}. Check file name and path and try again.'
              .format(fname))
        raise
    else:
        primary_key = key_list[0]
        key_set = set(key_list)
        return primary_key, key_set


def validate_spaces(item_set):
    """
    Read through a set of strings and checks for spaces.
    The function takes a set of strings and searches through each string
    looking for spaces. If a space is found, string is appended to a
    list. Once all strings are searched, if any spaces found, print
    error with generated list and terminate program.
    Args:
        item_set (set of strings)
    Returns:
        None
    Exceptions:
        A `KeyingError` is raised if any spaces are detected in the data
        file key set.
    """
    bad_items = []
    for item in item_set:
        if ' ' in item:
            bad_items.append(item)
    if bad_items != []:
        try:
            raise KeyingError('Keys cannot contain spaces.')
        except KeyingError as e:
            print(e)
            print('Please correct these keys:\n', bad_items)
            # quit()
            raise


def validate_inputs(tag_set, key_set):
    """
    Validate template tag_set against data file key_set.
    Validates the key_set from a given data file against the tag_set
    from the corresponding template file, first checking the key set for
    lack of spaces and then checking if the two sets are equivalent. If
    either condition is not met, an exception will be raised and the
    program will terminate.
    Args:
        tag_set (set of strings)
        key_set (set of strings)
    Returns:
        None
    Exceptions:
        A `KeyingError` is raised by function `validate_spaces` if any
        spaces are detected in the data file key set.
        A `MisMatchError` is raised if the two input sets are not
        equivalent.
    See also:
        Function: validate_spaces
    """
    try:
        validate_spaces(key_set)
    except KeyingError as e:
        print('Goodbye')
        quit()
    if key_set != tag_set:
        try:
            raise MisMatchError('Tags and keys do not match')
        except MisMatchError as e:
            print(e)
            if tag_set - key_set == set():
                print('missing tags for key(s):', key_set - tag_set)
                print('(or tag(s) contains spaces)')
            else:
                print('Check template file tags for key(s):',
                      key_set - tag_set)
                print('Template shows:', tag_set - key_set)
            print('Goodbye')
            quit()


class KeyingError(Exception):
    def __init__(self, arg):
        self.arg = arg


class MisMatchError(Exception):
    def __init__(self, arg):
        self.arg = arg


if __name__ == '__main__':
    main()

Answer 1

Regarding the code inside the try block in main, rather than trying to minimize that code, I would think about what other exceptions you might be able to catch and handle at that point. Try doing some of the other calls from that block of code in the interactive prompt, with invalid data, and see what they toss at you. I don't think it's too much code anyway. I wouldn't put it in a separate function; while it may be several lines of code, it feels "mainy". As you say, it's the meat, or maybe more like the backbone: it provides the structure that binds together all the other code. I think it's fine where it is.

I see some issues with your use of exceptions. Regarding the code inside prompt_filename:

def prompt_filename(fclass):
    """
    Prompt user for a filename for given file classification.
    Args:
        fclass (string):
            A descriptive string describing the type of file for which the
            filename is requested. _e.g._ 'Template File'
    Returns:
        filename (string)
    """
    while True:
        filename = input('Enter {0} --> '.format(fclass))
        if os.path.isfile(filename):
            return filename
        else:
            print('No such file: {!r}.'.format(filename))
            print('Please enter a valid file name')
            continue

There's a precept in Python, EAFP, which stands for "Easier to Ask Forgiveness than Permission". What it means is that Python programmers tend not to check things with conditionals, like doing if os.path.isfile(filename). The style in Python is more to assume everything is good, and let the program throw an exception if it's not good. In this case, you would just try to open the file; if there's no such file, open() throws an OSError (IOError is the older name for the same exception) and you complain. Sometimes you do want to ask permission, but I think this is a case where it's easier to ask forgiveness.

This is something I also see elsewhere in your code. It's good to be safe, but in Python, people tend to really lean more heavily on exceptions than on explicit conditional checks in most cases. The case where you do use a conditional is when there are multiple possibilities, all of which are valid, and you need to figure out which case you're in. But if something is wrong or invalid or unexpected, like the passed file name not being a real file, I recommend exceptions.
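As a rough sketch of what the EAFP version might look like, here is a variant that opens the file inside the loop instead of testing for it first. The name prompt_contents and the ask parameter are my own inventions for illustration; ask defaults to input and exists only so the loop can be tried without a terminal.

```python
def prompt_contents(fclass, ask=input):
    """Keep asking for a filename until one can be opened.

    Returns a (filename, contents) tuple. `ask` is any callable that takes
    a prompt string and returns the user's answer.
    """
    while True:
        filename = ask('Enter {0} --> '.format(fclass))
        try:
            with open(filename) as f:
                return filename, f.read()
        except OSError as err:
            # Covers missing files, directories, permission problems, etc.
            print('Cannot open {!r}: {}'.format(filename, err.strerror))
```

Besides being shorter, this avoids the gap between checking that the file exists and actually opening it, during which the file could disappear.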

(By the way, you don't need continue in your else clause. What continue does is skip over any code that comes after it to go on to the next iteration of the loop. In this case, there is no code after continue, so it would always go to the next iteration anyway.)

In both get_keys and get_contents, you have some code like

try:
    with open(fname) as f:
        text_file = f.read()
except OSError:
    print('No such file {!r}. Check file name and path and try again.'
          .format(fname))
    raise

Rather than print a message from inside this function, I would probably re-raise with a new message:

except OSError:
    raise OSError('No such file {!r}. Check file name and path and try again.'.format(fname))

For the most part, I don't believe in catching exceptions unless you're going to do something about them. But re-throwing with more specific info is a perfectly valid thing to do. Also, I don't like to have functions other than main printing to the console. I'd prefer to re-throw with a new message, catch the exception in main, and print the message.
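Put together, that might look something like the sketch below. The `from err` chaining and the simplified main(fname) signature are my additions for illustration, not something taken from your program.

```python
def get_contents(fname):
    """Return the contents of `fname`, re-raising OSError with a clearer message."""
    try:
        with open(fname) as f:
            return f.read()
    except OSError as err:
        # `from err` keeps the original exception attached as __cause__.
        raise OSError('Cannot read {!r}. Check file name and path and try again.'
                      .format(fname)) from err


def main(fname):
    """Only main() talks to the console; the helper just raises."""
    try:
        text = get_contents(fname)
    except OSError as err:
        print(err)
        return 1
    print('{} characters read'.format(len(text)))
    return 0
```

The helper stays silent and reusable, and the decision about what to show the user lives in one place.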

Related to this, I see the following code in validate_inputs:

try:
    validate_spaces(key_set)
except KeyingError as e:
    print('Goodbye')
    quit()

I would prefer not to catch the KeyingError here. Letting an exception go uncaught will just stop the whole program, which seems to be what you wanted. Some other languages force you to catch or declare every exception, but Python will just bring down the whole program around you. That's not what you want for production code, but that is absolutely what you want for development: anything anomalous will make the program crash and die right away, with a reference to the line number where the crashing and dying occurred. As a bonus, it's quicker and easier to write the code that way, because you don't have to add try/except blocks around everything. If you can do something about the invalid input, then definitely catch the exception and do something. But if all you can do is say "You screwed up, fix it", then why not just let the exception be thrown?

This piece of code from validate_spaces could be a lot shorter and cleaner:

if bad_items != []:
    try:
        raise KeyingError('Keys cannot contain spaces.')
    except KeyingError as e:
        print(e)
        print('Please correct these keys:\n', bad_items)
        # quit()
        raise

I think it would look better like this:

if bad_items:
    raise KeyingError("Keys cannot contain spaces. Please correct: {}".format(bad_items))

The empty list is falsey, while the non-empty list is truthy, so writing if bad_items is equivalent to testing if the bad_items list is non-empty.

There's just not much point in throwing and catching an exception in the same function. If you really wanted to print a message and die inside this function, just do it, without throwing an exception. But in an application like this, I prefer my error handling to be mostly in main. It's just easier if your application has a single valid exit point. For example, if the application has multiple exit points and you have a bug where it's exiting anomalously, then to figure out why, you have to monitor all of those exit points. You might not even realize right away that it exited anomalously.
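Here is one way the single exit point could be arranged, as a sketch only. The TemplaterError base class and the main(key_set) signature are hypothetical; your program would pass in the key set it read from the data file.

```python
import sys


class TemplaterError(Exception):
    """Hypothetical base class for errors the user can fix."""


class KeyingError(TemplaterError):
    pass


def validate_spaces(item_set):
    """Raise KeyingError naming every key that contains a space."""
    bad_items = sorted(item for item in item_set if ' ' in item)
    if bad_items:
        raise KeyingError('Keys cannot contain spaces. Please correct: {}'
                          .format(bad_items))


def main(key_set):
    """Return an exit status instead of calling quit() deep in the program."""
    try:
        validate_spaces(key_set)
    except TemplaterError as err:
        print(err, file=sys.stderr)
        return 1
    return 0


status = main({'stockID', 'unit price'})
print('exit status:', status)  # a real script would end with sys.exit(status)
```

With a shared base class, main needs only one except clause for every error the user can fix, and anything else still crashes with a full traceback.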

A program like this, that seems to be a command line utility, is probably better served by taking command line arguments than by interactively reading filenames. That's the next direction I'd go in. The simple way to do this is to read sys.argv. If you've ever done bash, sys.argv[0] is the script name, just like $0, and sys.argv[1:] are the positional arguments passed on the command line, just like $1, $2, etc.:

python templater.py some_file.csv another_file.html

would call templater.py with "some_file.csv" as the value of sys.argv[1] and "another_file.html" as the value of sys.argv[2].

The more complex way to do this is to use the argparse module from the standard library. If you're sticking with all positional arguments, then reading sys.argv directly is probably fine. You can do something like

import sys

try:
    append_file = sys.argv[3]
except IndexError:
    append_file = None  # Optional argument not passed

to check whether optional arguments were passed. But argparse really shows its value if you want to have options and switches, which are a pain with sys.argv and near impossible with interactive input. (You have to either have a config file somewhere, or annoy the user every time with "Do you want gold-plating? (y/n)".)
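For comparison, a minimal argparse sketch. The argument names and the -v/--verbose switch are illustrative choices, not anything your program already defines; the explicit list passed to parse_args is there only so the example runs without a real command line.

```python
import argparse


def build_parser():
    """Build a parser for two required files, one optional file, and a switch."""
    parser = argparse.ArgumentParser(
        description='Render one output file per CSV data row.')
    parser.add_argument('data_file', help='CSV data file with a header row')
    parser.add_argument('template_file', help='template containing ${Key} tags')
    # Optional third positional: None when omitted, so no IndexError handling.
    parser.add_argument('appended_file', nargs='?', default=None,
                        help='file copied to the end of each output file')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print each output filename as it is written')
    return parser


args = build_parser().parse_args(['some_file.csv', 'another_file.html', '-v'])
print(args.data_file, args.template_file, args.appended_file, args.verbose)
```

In the real script you would call parse_args() with no arguments so it reads sys.argv, and you get usage and --help output for free.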

To end on a positive note, I have nothing but good things to say about your use of docstrings, especially the ones on your functions. This is exactly the kind of excellent docstring that Clojure has and that Python mostly lacks, at least in the standard library.

Comment

This is great stuff. Just what I needed. I am definitely going to work on the error handling. Also, I think the argv will work great. Hadn't gotten to that section yet in the library (it's section 29). I am also going to look at the argparse module as I may want to add some switches like a -v verbose mode or maybe change syntax for specific file types (the ${} syntax certainly doesn't play nice with javascript). Thx!

tsleyson · Answer 1 · 2015-03-20 08:12:29Z

