I am new to Python and I am writing my first utility as a way to learn about strings, files, etc. I am writing a simple utility using string replacement to batch output HTML files. The program takes as inputs a CSV file and an HTML template file and will output an HTML file for each data row in the CSV file.
CSV Input File: test1.csv
The CSV file, which has a header row, contains some catalog data, one product per row, like below:
stockID,color,material,url
340,Blue and magenta,80% Wool / 20% Acrylic,http://placehold.it/400
275,Purple,100% Cotton,http://placehold.it/600
318,Blue,100% Polyester,http://placehold.it/400x600
HTML Template Input File: testTemplate.htm
The HTML template file is simply a copy of the desired output with string replace tags %s placed at the appropriate locations:
<h1>Stock ID: %s</h1>
<ul>
<li>%s</li>
<li>%s</li>
</ul>
<img src='%s'>
The Python is pretty straightforward, I think. I open the template file and store it as a string. I then open the CSV file using csv.DictReader. I then iterate through the rows of the CSV, build the file names, and write the output files using string replacement on the template string with the dictionary keys.
import csv
# Open template file and pass string to 'data'. Should be in HTML format except with string replace tags.
with open('testTemplate.htm', 'r') as myTemplate:
    data = myTemplate.read()
    # print template for visual cue.
    print('Template passed:\n' + '-'*30 +'\n' + data)
    print('-'*30)
# open CSV file that contains the data and store to a dictionary 'inputFile'.
with open('test1.csv') as csvfile:
    inputFile = csv.DictReader(csvfile)
    x = 0 # counter to display file count
    for row in inputFile:
        # create filenames for the output HTML files
        filename = 'listing'+row['stockID']+'.htm'
        # print filenames for visual cue.
        print(filename)
        x = x + 1
        # create output HTML file.
        with open(filename, 'w') as outputFile:
            # run string replace on the template file using items from the data dictionary
            # HELP--> this is where I get nervous because chaos will reign if the tags get mixed up
            # HELP--> is there a way to add identifiers to the tags? like %s1 =row['stockID'], %s2=row['color'] ... ???
            outputFile.write(data %(row['stockID'], row['color'], row['material'], row['url']))
    # print the number of files created as a cue program has finished.
    print('-'*30 +'\n' + str(x) + ' files created.')
The program works as expected with the test files I have been using (which is why I am posting here and not on SO). My concern is that it seems pretty fragile. In 'production' the CSV file will contain many more columns (around 30-40) and the HTML will be much more complex, so the chances of one of the tags in the string replace getting mixed up seem pretty high. Is there a way to add identifiers to the tags, like %s1 = row['stockID'], %s2 = row['color'], ..., that could be placed either in the template file or in the write() statement (or both)? Any alternative methods or improvements I could learn would be great (note I am well aware of the Makos and Mustaches of the world and plan to learn a couple of template packages soon).
- Look into a proper HTML templating engine. (CodesInChaos, Mar 1, 2015 at 13:36)
- Thanks @Codes, I do plan to learn a couple of templating packages like I mentioned. Any recommendations? Right now I am thinking of learning both Mako and Mustache just for fun. (Christopher Pearson, Mar 1, 2015 at 16:47)
3 Answers
Python has a number of templating options, but the simplest to start with is probably string.Template, described in https://docs.python.org/3/library/string.html#template-strings
This supports placeholders such as $StockId and is used as below:
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
If you need more output options, look at the str.format functionality, but string.Template is probably the best one to start with.
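As a rough sketch of the str.format alternative mentioned above: named replacement fields can be filled from a dictionary, and format specifications give control over things like width and precision. The row dictionary and its price column are made up for illustration and are not part of the question's CSV file.

```python
# Hypothetical row, shaped like one record from csv.DictReader
# (the 'price' column is an assumption, not in the original CSV).
row = {'stockID': '340', 'color': 'Blue', 'price': 12.5}

# Named fields are looked up by key; ':>8.2f' right-aligns the
# number in a field of 8 characters with two decimal places.
line = 'Stock {stockID} ({color}): {price:>8.2f}'.format(**row)
print(line)
```

Unlike string.Template, a literal { or } in the template text has to be doubled, which can be awkward in HTML that contains inline CSS or JavaScript.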
- Yep, this is exactly what I needed. I found the great resource PEP 292. I am currently rewriting my code per this PEP and will post my solution when complete (and give you the green checkmark of course!). (Christopher Pearson, Mar 1, 2015 at 17:47)
Style
Python has a style guide called PEP8. Among many other great things, it gives guidelines about spacing that you do not follow. Indeed, your spacing seems to be quite inconsistent. You'll find tools such as pep8 to check your compliance with PEP8 and other tools such as autopep8 to fix your code automatically.
It can be a good habit to move the part of your program doing things (as opposed to the part of your program defining things) behind an if __name__ == "__main__" guard.
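A minimal sketch of what such a guard looks like, assuming a hypothetical helper named build_filename (it is not in your code): the definition is available to anyone importing the file, while the call under the guard only runs when the file is executed as a script.

```python
def build_filename(stock_id):
    """Return the output file name for one stock ID (hypothetical helper)."""
    return 'listing' + stock_id + '.htm'


if __name__ == '__main__':
    # Runs only when this file is executed as a script,
    # not when it is imported from another module.
    print(build_filename('340'))
```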
You can also use tools such as pylint to check your code. Among other things, it reports names that do not follow the Python naming conventions.
Don't repeat yourself / avoid magic numbers
I can see 30 in multiple places. This is usually a bad sign: if you ever want to change the value to something else, you'll have to change it in multiple places. You should probably define a constant to hold that value behind a meaningful name.
Even better, you could define a function to perform the particular behavior that you want (see print_with_line in the code below).
Getting the length the right way
At the moment, you are keeping track of the number of rows in input_file by incrementing a variable x. It is much clearer to simply use len(input_file). Also, x = x + 1 can simply be written x += 1.
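One caveat, reported in the comments under this answer: csv.DictReader is an iterator and does not support len(). A possible alternative, sketched here under the assumption that you only need the count after the loop, is to let enumerate do the counting. The in-memory CSV text stands in for test1.csv so the snippet is self-contained.

```python
import csv
import io

# In-memory stand-in for test1.csv (illustrative data only).
csv_text = 'stockID,color\n340,Blue\n275,Purple\n'

reader = csv.DictReader(io.StringIO(csv_text))
count = 0  # stays 0 if the file has no data rows
for count, row in enumerate(reader, start=1):
    print(count, row['stockID'])

print(str(count) + ' files created.')
```

The reader's line_num attribute is another option, but it counts physical lines read from the source, so it can differ from the number of records when a quoted field spans several lines.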
Taking these various comments into account, you get:
import csv
SIZE_LINE = 30
def print_with_line(s):
    print(s)
    print('-' * SIZE_LINE)
if __name__ == '__main__':
    # Open template file and pass string to 'data'.
    # Should be in HTML format except with string replace tags.
    with open('testTemplate.htm', 'r') as my_template:
        data = my_template.read()
        # print template for visual cue.
        print_with_line('Template passed:')
        print_with_line(data)
    # open CSV file that contains the data and
    # store to a dictionary 'input_file'.
    with open('test1.csv') as csv_file:
        input_file = csv.DictReader(csv_file)
        for row in input_file:
            # create filenames for the output HTML files
            filename = 'listing' + row['stockID'] + '.htm'
            # print filenames for visual cue.
            print(filename)
            # create output HTML file.
            with open(filename, 'w') as output_file:
                # run string replace on the template file
                # using items from the data dictionary
                # HELP--> this is where I get nervous because
                # chaos will reign if the tags get mixed up
                # HELP--> is there a way to add identifiers to
                # the tags? like %s1 =row['stockID'], %s2=row['color'] ... ???
                output_file.write(data % (
                    row['stockID'],
                    row['color'],
                    row['material'],
                    row['url']))
        # print the number of files created as a cue program has finished.
        print_with_line(str(len(input_file)) + ' files created.')
- Thanks for the useful info @Josay. Considering this was my FIRST program, I was more focused on figuring things out than worrying about style. Your len(input_file) does not work:
    Traceback (most recent call last):
      File "C:\Code\anotherTry.py", line 45, in <module>
        print_with_line(str(len(input_file)) + ' files created.')
    TypeError: object of type 'DictReader' has no len()
  (Christopher Pearson, Mar 1, 2015 at 2:16)
- Ah! I should have tried >_< I'll try and have a look in a few hours. Sorry for the inconvenience. (SylvainD, Mar 1, 2015 at 2:18)
- Can you explain what the if __name__ == '__main__': does? (Christopher Pearson, Mar 1, 2015 at 2:21)
- There is a link to an explanation. Basically, what's behind it only gets executed when your file is used as a script (and not imported as a module, for instance). If you want to write reusable code, you have to use this to be able to import modules without interference. (SylvainD, Mar 1, 2015 at 2:25)
- The proper way to get the length @Josay is going to be str(input_file.line_num - 1), with the -1 to account for the header row. (Christopher Pearson, Mar 2, 2015 at 22:08)
Fixing the style
After reading through PEP8 and the supporting docs, you will gain a deeper insight into the importance of style in your Python code. As @Josay mentioned in his thoughtful answer, the style of the code is poor (spacing issues, line lengths, naming conventions), leading to poor readability. In addition to using the autopep8 tool mentioned in @Josay's answer, some of the style issues can be fixed with changes to methods, as explained below.
I am using Sublime Text 3 as my editor, so there are also several features available to ensure proper styling of future code. Some quick changes to the user settings in ST3 can include:
{
    ...
    // ruler at 72 for docstrings and block notes
    // ruler at 79 for code
    "rulers": [72, 79],
    "translate_tabs_to_spaces": true,
    "draw_white_space": "all",
    "trim_trailing_white_space_on_save": true
    ...
}
In addition, a linting package for ST3 can be installed and used to highlight style issues as you code. I selected the SublimeLinter3 package with the pep8 plug-in.
A more robust string substitution method:
As mentioned in the original post, the simple string replace method of using the string formatting operator like:
>>> print('A %s runs into a %s' % ('foo', 'bar'))
A foo runs into a bar
is very fragile. It works fine for very short strings like this example, but not for longer strings (like documents). Mix up your list and your bar suddenly runs into a foo!
- A slightly better approach:
The data in the program is being read by csv.DictReader() into a dictionary with column heads as the keys, so a slightly better approach would be to at least use the optional mapping key on the string formatting operator to take advantage of that. The above example becomes:
>>> row = {'item1': "foo", 'item2': "bar"}
>>> print('A %(item1)s runs into a %(item2)s' % (row))
A foo runs into a bar
This is certainly better than the original post and solves my concern about list mixing (or, more likely, columns being moved in the native Excel file). However, it still has some weaknesses. Forget one of the s conversion characters and Python hands you an error:
>>> print('A %(item1)s runs into a %(item2)' % (row))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: incomplete format
or worse (in my opinion), you put in too many s characters or put one in the wrong place and you can end up with a semantic error, i.e. you do not get an error from Python, but the output is wrong:
>>> print('A %s(item1)s runs into a %(item2)s' % (row))
A {'item1': 'foo', 'item2': 'bar'}(item1)s runs into a bar
Pretty easy pitfalls for a beginning programmer to fall into.
- A MUCH better approach:
The most elegant solution (possibly short of using a proper template engine package) was proposed by @Gwyn Evans. The string module provides us with a Template class, which allows for $-based string replacement rather than the normal %-based format and also eliminates the need for the pesky s conversion types. The document PEP 292 provides an excellent in-depth explanation of the rationale behind this method (which I discovered the hard way, as explained above). You define your string as a Template and then perform a .substitute() using your key-mapped data. The above example becomes:
>>> import string
>>> s = string.Template('A $item1 runs into a $item2.')
>>> row = {'item1': "foo", 'item2': "bar"}
>>> print(s.substitute(row))
A foo runs into a bar.
You can also change the $ delimiter character or make other changes to the template behavior if desired, as described in the docs.
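A small sketch of two of those options, as I understand the docs: subclassing Template to change the delimiter, and using safe_substitute() to leave unknown placeholders in place instead of raising KeyError. The class name PercentTemplate and the $missing placeholder are my own, chosen for illustration.

```python
import string


class PercentTemplate(string.Template):
    """Template variant that uses % instead of $ (illustrative name)."""
    delimiter = '%'


row = {'item1': 'foo', 'item2': 'bar'}

# Same substitution as above, but with %name placeholders.
result = PercentTemplate('A %item1 runs into a %item2.').substitute(row)
print(result)

# safe_substitute() leaves placeholders without a matching key untouched.
partial = string.Template('$item1 and $missing').safe_substitute(row)
print(partial)
```

A different delimiter could be handy if the HTML itself contains many literal $ characters, which would otherwise each need to be written as $$.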
In Summary
So, correcting the original post using the string.Template method results in a template file:
<h1>Stock ID: $stockID</h1>
<ul>
<li>$color</li>
<li>$material</li>
</ul>
<img src='$url'>
Rolling all of the above into the code thus yields:
import csv
import string
if __name__ == '__main__':
    # Open template file and pass string to 'data'.
    # Will be in HTML format except with the string.Template replace
    # tags with the format of '$var'. The 'var' MUST correspond to the
    # items in the heading row of the input CSV file.
    with open('testTemplate2.htm', 'r') as my_template:
        data = my_template.read()
        # Print template for visual cue.
        print('Template loaded:')
        print(data)
        # Pass 'data' to string.Template object data_template.
        data_template = string.Template(data)
    # Open the input CSV file and pass to dictionary 'input_file'
    with open('test1.csv') as csv_file:
        input_file = csv.DictReader(csv_file)
        for row in input_file:
            # Create filenames for the output HTML files
            filename = 'listing' + row['stockID'] + '.htm'
            # Print filenames for visual cue.
            print(filename)
            # Create output HTML file.
            with open(filename, 'w') as output_file:
                # Run string.Template substitution on data_template
                # using data from 'row' as source and write to
                # 'output_file'.
                output_file.write(data_template.substitute(row))
        # Print the number of files created as a cue program has finished.
        print(str(input_file.line_num - 1) + ' files were created.')
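Since the placeholders MUST match the CSV heading row, one possible safeguard is to compare the two before writing any files. This is only a sketch: missing_fields is a hypothetical helper of my own, and it relies on the compiled pattern attribute that string.Template exposes, with its named and braced groups.

```python
import string


def missing_fields(template_text, fieldnames):
    """Return placeholder names in the template that have no CSV column."""
    names = set()
    for match in string.Template.pattern.finditer(template_text):
        # $name is captured in 'named', ${name} in 'braced';
        # both are None for an escaped $$ or an invalid placeholder.
        name = match.group('named') or match.group('braced')
        if name:
            names.add(name)
    return sorted(names - set(fieldnames))


template_text = "<h1>Stock ID: $stockID</h1><img src='$url'>"
print(missing_fields(template_text, ['stockID', 'color', 'material']))
```

In the program above, the second argument could be input_file.fieldnames, checked once before the loop so that a misspelled tag is reported up front rather than as a KeyError partway through the batch.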