String replace templating utility

Question 1

I am new to Python and I am writing my first utility as a way to learn about strings, files, etc. I am writing a simple utility using string replacement to batch output HTML files. The program takes as inputs a CSV file and an HTML template file and will output an HTML file for each data row in the CSV file.

CSV Input File: test1.csv

The CSV file, which has a header row, contains some catalog data, one product per row, like below:

stockID,color,material,url
340,Blue and magenta,80% Wool / 20% Acrylic,http://placehold.it/400
275,Purple,100% Cotton,http://placehold.it/600
318,Blue,100% Polyester,http://placehold.it/400x600

HTML Template Input File: testTemplate.htm

The HTML template file is simply a copy of the desired output with string replace tags %s placed at the appropriate locations:

<h1>Stock ID: %s</h1>
<ul>
 <li>%s</li>
 <li>%s</li>
</ul>
<img src='%s'>

The Python is pretty straightforward, I think. I open the template file and store it as a string. I then open the CSV file using csv.DictReader(). I then iterate through the rows of the CSV, build the file names, and write the output files using string replacement on the template string with the dictionary keys.

import csv
# Open template file and pass string to 'data'. Should be in HTML format except with string replace tags.
with open('testTemplate.htm', 'r') as myTemplate:
    data = myTemplate.read()
    # print template for visual cue.
    print('Template passed:\n' + '-'*30 +'\n' + data)
    print('-'*30)
# open CSV file that contains the data and store to a dictionary 'inputFile'.
with open('test1.csv') as csvfile:
    inputFile = csv.DictReader(csvfile)
    x = 0 # counter to display file count
    for row in inputFile:
        # create filenames for the output HTML files
        filename = 'listing'+row['stockID']+'.htm'
        # print filenames for visual cue.
        print(filename)
        x = x + 1
        # create output HTML file.
        with open(filename, 'w') as outputFile:
            # run string replace on the template file using items from the data dictionary
            # HELP--> this is where I get nervous because chaos will reign if the tags get mixed up
            # HELP--> is there a way to add identifiers to the tags? like %s1 =row['stockID'], %s2=row['color'] ... ???
            outputFile.write(data %(row['stockID'], row['color'], row['material'], row['url']))
# print the number of files created as a cue program has finished.
print('-'*30 +'\n' + str(x) + ' files created.')

The program works as expected with the test files I have been using (which is why I am posting here and not on SO). My concern is that it seems pretty fragile. In 'production' the CSV file will contain many more columns (around 30-40) and the HTML will be much more complex, so the chances of one of the tags in the string replace getting mixed up seem pretty high. Is there a way to add identifiers to the tags, like %s1 = row['stockID'], %s2 = row['color'], ..., that could be placed either in the template file or in the write() statement (or both)? Any alternative methods or improvements I could learn would be great. (Note that I am well aware of the Makos and Mustaches of the world and plan to learn a couple of template packages soon.)

Question 2

Look into a proper HTML templating engine.

Question 3

Thanks @Codes I do plan to learn a couple of templating packages like I mentioned. Any recommendations? Right now I am thinking of learning Mako and Mustache both just for fun.

Question 4

Python has a number of templating options, but the simplest to start is probably the string.Template one described in https://docs.python.org/3/library/string.html#template-strings

This supports targets such as $StockId and is used as below

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'

If you need more output options, look at the str.format functionality, but string.Template is probably the best one to start with.
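For comparison, here is a minimal sketch of the str.format route, using named replacement fields. The template text and the row dictionary are made up for illustration and are not from the original post. One caveat: literal braces in the template (inline CSS or JavaScript, for example) would have to be doubled as {{ and }}.

```python
# Hypothetical row, shaped like one record from csv.DictReader.
row = {'stockID': '340', 'color': 'Blue and magenta'}

# Named fields are looked up by key, so their order does not matter.
# The ':>20' format spec right-aligns the value in a 20-character field.
template = '<h1>Stock ID: {stockID}</h1><li>{color:>20}</li>'

# format_map takes the mapping directly; a missing key raises KeyError.
html = template.format_map(row)
print(html)
```

The format spec after the colon is the kind of extra output control that string.Template does not offer.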

Question 5

Yep, this is exactly what I needed. I found the great resource PEP 292. I am currently rewriting code per this PEP and will post my solution when complete (and give you the green checkmark of course!).

Question 6

Style

Python has a style guide called PEP8. Among many other great things, it gives guidelines about spacing that you do not follow. Indeed, your spacing seems to be quite inconsistent. You'll find tools such as pep8 to check your compliance with PEP8 and other tools such as autopep8 to fix your code automatically.

It can be a good habit to move the part of your program doing things (as opposed to the part of your program defining things) behind an if __name__ == "__main__" guard.

You can also use tools such as pylint to check your code. Among other things, Python naming conventions are not followed.

Don't repeat yourself / avoid magic numbers

I can see 30 in multiple places. This is usually a bad sign: if you ever want to change the value to something else, you'll have to change it in multiple places. You probably should define a constant to hold that value behind a meaningful name.

Even better, you could define a function to perform the particular behavior that you want, as print_with_line does in the listing below.

Getting the length the right way

At the moment, you are keeping track of the number of rows in input_file by incrementing a variable x. It is much clearer to simply use len(input_file). Also, x = x + 1 can simply be written x += 1.

Taking these various comments into account, you get:

import csv
SIZE_LINE = 30
def print_with_line(s):
    print(s)
    print('-' * SIZE_LINE)
if __name__ == '__main__':
    # Open template file and pass string to 'data'.
    # Should be in HTML format except with string replace tags.
    with open('testTemplate.htm', 'r') as my_template:
        data = my_template.read()
        # print template for visual cue.
        print_with_line('Template passed:')
        print_with_line(data)
    # open CSV file that contains the data and
    # store to a dictionary 'input_file'.
    with open('test1.csv') as csv_file:
        input_file = csv.DictReader(csv_file)
        for row in input_file:
            # create filenames for the output HTML files
            filename = 'listing' + row['stockID'] + '.htm'
            # print filenames for visual cue.
            print(filename)
            # create output HTML file.
            with open(filename, 'w') as output_file:
                # run string replace on the template file
                # using items from the data dictionary
                # HELP--> this is where I get nervous because
                # chaos will reign if the tags get mixed up
                # HELP--> is there a way to add identifiers to
                # the tags? like %s1 =row['stockID'], %s2=row['color'] ... ???
                output_file.write(data % (
                    row['stockID'],
                    row['color'],
                    row['material'],
                    row['url']))
    # print the number of files created as a cue program has finished.
    # NOTE: csv.DictReader has no len(), so this line raises TypeError
    # (see the follow-up comments).
    print_with_line(str(len(input_file)) + ' files created.')

Question 7

Thanks for the useful info @Josay. Considering this was my FIRST program, I was more focused on figuring things out than worrying about style. Your len(input_file) does not work:

Traceback (most recent call last):
  File "C:\Code\anotherTry.py", line 45, in <module>
    print_with_line(str(len(input_file)) + ' files created.')
TypeError: object of type 'DictReader' has no len()

Question 8

Ah! I should have tried >_< I'll try and have a look in a few hours. Sorry for the inconvenience

Question 9

Can you explain what the if __name__ == '__main__': does?

Question 10

There is a link to an explanation. Basically, the code behind the guard only gets executed when your file is run as a script (and not when it is imported as a module, for instance). If you want to write reusable code, you need this guard so that your modules can be imported without side effects.
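A small sketch of how the guard behaves, in case it helps. The helper build_filename and the main function are hypothetical names introduced here for illustration; they are not part of the code under review.

```python
def build_filename(stock_id):
    """Return the output file name for one stock ID (hypothetical helper)."""
    return 'listing' + stock_id + '.htm'


def main():
    # The "doing things" part of the program lives here.
    print(build_filename('340'))


if __name__ == '__main__':
    # __name__ is '__main__' when the file is run as a script, but it is
    # the module's own name when the file is imported, so an import
    # defines build_filename without running main().
    main()
```

With this layout, another script can import build_filename and reuse it without any files being read or written as a side effect.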

Question 11

The proper way to get the length @Josay is going to be str(input_file.line_num - 1) with the -1 to account for the header row.
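One caveat worth knowing: line_num counts the lines read from the source, not the records, so the two can differ when a quoted field contains a line break. The sketch below counts records with enumerate instead. The inline CSV text is made up for illustration and stands in for test1.csv.

```python
import csv
import io

# Inline stand-in for test1.csv so the sketch runs on its own.
csv_text = (
    'stockID,color\n'
    '340,"Blue and\nmagenta"\n'   # one record whose quoted field spans two lines
    '275,Purple\n'
)

reader = csv.DictReader(io.StringIO(csv_text, newline=''))
count = 0
for count, row in enumerate(reader, start=1):
    pass  # the output file for `row` would be written here

# count is the number of records; line_num - 1 is the number of
# physical lines read after the header.
print(count, 'records,', reader.line_num - 1, 'lines after the header')
```

For a CSV with no embedded line breaks the two numbers agree, so line_num - 1 is fine for the test file in the question.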

Question 12

Fixing the style
After reading through PEP8 and the supporting docs, you will gain a deeper insight into the importance of style in your Python code. As @Josay mentioned in his thoughtful answer, the style of the code is poor (spacing issues, line lengths, naming conventions), leading to poor readability. In addition to using the autopep8 tool mentioned in @Josay's answer, some of the style issues can be fixed with changes to methods as explained below.

I am using Sublime Text 3 as my editor, so there are also several features available to ensure proper styling in future code. Some quick changes to the user settings in ST3 can include:

{
    ...
    // ruler at 72 for docstrings and block notes
    // ruler at 79 for code
    "rulers": [72, 79],
    "translate_tabs_to_spaces": true,
    "draw_white_space": "all",
    "trim_trailing_white_space_on_save": true
    ...
}

In addition, a linting package for ST3 can be installed and used to highlight style issues as you code. I selected the SublimeLinter3 package with the pep8 plug-in.

A more robust string substitution method:
As mentioned in the original post, the simple string replace method of using the string formatting operator like:

>>> print('A %s runs into a %s' % ('foo', 'bar'))
A foo runs into a bar

is very fragile. It works fine for very short strings like this example, but not for longer strings (like documents). Mix up your list and your bar suddenly runs into a foo!

- A slightly better approach: The data in the program is being read by csv.DictReader() into a dictionary with the column heads as the keys, so a slightly better approach would be to at least use the optional mapping key on the string formatting operator to take advantage of that. The above example becomes:

>>> row = {'item1': "foo", 'item2': "bar"}
>>> print('A %(item1)s runs into a %(item2)s' % (row))
A foo runs into a bar

This is certainly better than the original post and solves my concern about list mixing (or, more likely, columns being moved in the native Excel file). However, it still has some weaknesses. Forget one of the s's and Python hands you an error:

>>> print('A %(item1)s runs into a %(item2)' % (row))
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
ValueError: incomplete format

or worse (in my opinion), you put too many s's in or put it in the wrong place and you can end up with a semantic error, i.e. you do not get an error from Python, but the output is wrong:

>>> print('A %s(item1)s runs into a %(item2)s' % (row))
A {'item1': 'foo', 'item2': 'bar'}(item1)s runs into a bar

Pretty easy pitfalls for a beginning programmer to fall into.

- A MUCH better approach: The most elegant solution (possibly short of using a proper template engine package) was proposed by @Gwyn Evans. The string module provides us with a Template class which allows for $-based string replacement rather than the normal %-based format and also eliminates the need for the pesky s conversion types. The document PEP 292 provides an excellent in-depth explanation of the rationale behind this method (which I discovered the hard way as explained above). You define your string as a Template and then perform a .substitute() using your key mapped data. The above example becomes:

>>> import string
>>> s = string.Template('A $item1 runs into a $item2.')
>>> row = {'item1': "foo", 'item2': "bar"}
>>> print(s.substitute(row))
A foo runs into a bar.

You can also change the $ delimiter character or make other changes to the template behavior if desired as described in the docs.
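As a rough sketch of both points: the delimiter is a class attribute that a subclass can override, and safe_substitute is the documented alternative to substitute for data that may be incomplete. The class name PercentTemplate and the sample text are assumptions made up for this example.

```python
import string


class PercentTemplate(string.Template):
    """Hypothetical subclass that uses % instead of $ as the delimiter."""
    delimiter = '%'


s = PercentTemplate('A %item1 runs into a %item2 for $5.')
row = {'item1': 'foo'}   # 'item2' is deliberately missing

# substitute(row) would raise KeyError for the missing 'item2'.
# safe_substitute leaves unknown placeholders in place instead.
result = s.safe_substitute(row)
print(result)
```

Changing the delimiter can be handy when the document itself contains many literal $ signs, which would otherwise have to be written as $$.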

In Summary
So, correcting the original post using the string.Template method results in a template file:

<h1>Stock ID: $stockID</h1>
<ul>
 <li>$color</li>
 <li>$material</li>
</ul>
<img src='$url'>

Rolling all of the above into the code thus yields:

import csv
import string
if __name__ == '__main__':
    # Open template file and pass string to 'data'.
    # Will be in HTML format except with the string.Template replace
    # tags with the format of '$var'. The 'var' MUST correspond to the
    # items in the heading row of the input CSV file.
    with open('testTemplate2.htm', 'r') as my_template:
        data = my_template.read()
        # Print template for visual cue.
        print('Template loaded:')
        print(data)
        # Pass 'data' to string.Template object data_template.
        data_template = string.Template(data)
    # Open the input CSV file and pass to dictionary 'input_file'
    with open('test1.csv') as csv_file:
        input_file = csv.DictReader(csv_file)
        for row in input_file:
            # Create filenames for the output HTML files
            filename = 'listing' + row['stockID'] + '.htm'
            # Print filenames for visual cue.
            print(filename)
            # Create output HTML file.
            with open(filename, 'w') as output_file:
                # Run string.Template substitution on data_template
                # using data from 'row' as source and write to
                # 'output_file'.
                output_file.write(data_template.substitute(row))
    # Print the number of files created as a cue program has finished.
    print(str(input_file.line_num - 1) + ' files were created.')
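Since the '$var' names must match the CSV heading row, it may be worth checking that before writing any files. The sketch below is one possible way to do it, not part of the posted solution: it uses the compiled regex that string.Template exposes as its pattern attribute, and inline stand-ins for the template and CSV files (with a deliberate 'colour' misspelling) so it runs on its own.

```python
import csv
import io
import string

# Inline stand-ins for testTemplate2.htm and test1.csv (made up here).
template_text = "<h1>Stock ID: $stockID</h1>\n<li>${colour}</li>\n"
csv_text = 'stockID,color\n340,Blue\n'

# Template.pattern is the regex the class uses to find placeholders;
# its 'named' and 'braced' groups capture $name and ${name}.
placeholders = set()
for match in string.Template.pattern.finditer(template_text):
    name = match.group('named') or match.group('braced')
    if name:
        placeholders.add(name)

# fieldnames reads the heading row of the CSV.
reader = csv.DictReader(io.StringIO(csv_text))
missing = sorted(placeholders - set(reader.fieldnames))
print('Placeholders with no matching CSV column:', missing)
```

Without a check like this, the mismatch only shows up as a KeyError from substitute() on the first row.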

Accepted answer (the string.Template suggestion): Gwyn Evans, 2015-03-01 07:49:25Z

