I've implemented the beginnings of a Python preprocessor that I plan to flesh out sometime in the future, and the code below is my simple prototype. At the moment, it only converts `until` and `unless` statements.
py_preproc.py
"""
A simple Python utility that converts
non-valid Python code, like `until`,
and `unless` blocks to valid Python
code.
"""
from sys import argv

try:
    file_path = argv[1]
    execute_results = argv[2]
except IndexError:
    raise Exception("Two argv arguments are required, file_path, and execute_results.")


def open_code_file():
    """
    Opens the code file to do the conversion
    on, and returns the read version as a
    string.
    """
    with open(file_path, "r+") as code_file:
        return code_file.read()


def replace_items(file_string):
    """
    Replace specific pieces of the code_file
    with valid Python code. Currently only
    `until`, and `unless` blocks are replaced.
    """
    return file_string.replace(
        "until", "while not"
    ).replace(
        "unless", "if not"
    )


def evaluate_result(result_string):
    """
    Evaluates the converted result, after
    the code has been converted to valid
    Python code.
    """
    py_string_compiled = compile(result_string, "fakemodule", "exec")
    exec(py_string_compiled)


def main():
    file_string = open_code_file()
    new_file_string = replace_items(file_string)
    if execute_results == "-true":
        evaluate_result(new_file_string)
    elif execute_results == "-false":
        # Mode "w" is required for writing and already truncates the file,
        # so no explicit truncate() call is needed.
        with open(file_path, "w") as file_to_write:
            file_to_write.write(new_file_string)
    else:
        raise Exception("Invalid argument \"{argument}\".".format(argument=execute_results))


if __name__ == '__main__':
    main()
For reference, here's an example of a code file using `until` and `unless`, before it's preprocessed.
iterator = 10
until iterator == 0:
    print(iterator)
    iterator -= 1
unless iterator != 0:
    print("iterator == 0")
The above example is then converted to this:
iterator = 10
while not iterator == 0:
    print(iterator)
    iterator -= 1
if not iterator != 0:
    print("iterator == 0")
Finally, here's the command line syntax for running the command, where `file_to_execute_or_convert.py` is the Python file that you want to run or convert, and `execute_or_write` is `-true` if you want to just execute it, or `-false` if you want to convert it.
python py_preproc.py [file_to_execute_or_convert.py] [execute_or_write]
3 Answers
Remco Gerlich is quite right to point out that this preprocessor does not work on general Python code, because it uses string replacement. Thus `unless` will be changed to `if not` in strings (such as regular expressions, templates, and docstrings), potentially breaking the code. The preprocessor itself is just one example of the kind of code which will be broken: it is not good enough to respond that this concern is "silly".

Additionally, Bakuriu is quite right to point out that the transformation of `unless` to `if not` is inadequate: it only works on very simple examples. But consider:

a unless b or c else d

which would be changed to:

a if not b or c else d

where the condition is wrong, because `not b or c` parses as `(not b) or c`. The correct transformation is:

a if not (b or c) else d
This problem makes it hopeless to attempt this preprocessing at the level of text, or even of tokens: you need a parser to find the end of the condition. Consider an expression like:
a unless b or (c unless d and e else f) else g
which needs to be transformed into:
a if not (b or (c if not (d and e) else f)) else g
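The precedence problem is easy to check directly. The following sketch (the function names `naive` and `intended` are made up for illustration) evaluates both forms of the first example with values for which they disagree:

```python
def naive(a, b, c, d):
    # What plain text replacement produces; Python parses the condition
    # as `(not b) or c`.
    return a if not b or c else d


def intended(a, b, c, d):
    # What `a unless b or c else d` should mean: the whole condition
    # `b or c` is negated.
    return a if not (b or c) else d


# With b false and c true, the two versions pick different branches.
print(naive("a", False, True, "d"))     # a
print(intended("a", False, True, "d"))  # d
```

Any input where `b` is false and `c` is true exposes the difference, so even simple real-world conditions would be affected.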
I can see you've tried to guard against module execution by using `if __name__ == '__main__'`, but the `try`/`except` block at the beginning of the file sits at module level, so it will still be executed on module import. This may go awry if you import your module and try to use or test its functions in a library fashion.

You could consider using the Python `argparse` module to handle arguments within your `main()`. It will handle common idioms like `-argumentname value` for you (or simply map the first argument to `inputfile` and the second argument to `outputfile`, for example, if you prefer), provide a default help text, allow you to sanitise values with callbacks, allow you to refer to the arguments in code by field name rather than index, and so on.
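As a rough sketch of that suggestion, argument handling could move into a function that is only called from `main()`. The positional name `file_path` follows the question's code; the `--execute` flag and the `parse_args` helper are assumptions made for this example, not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments; nothing here runs at import time."""
    parser = argparse.ArgumentParser(
        description="Convert `until` and `unless` blocks to valid Python."
    )
    parser.add_argument("file_path", help="Python file to convert or execute")
    # Hypothetical flag standing in for the "-true"/"-false" strings.
    parser.add_argument(
        "--execute",
        action="store_true",
        help="execute the converted code instead of writing it back",
    )
    # With argv=None, argparse falls back to sys.argv[1:].
    return parser.parse_args(argv)


# Passing an explicit list makes the parser easy to try out or test.
args = parse_args(["example.py", "--execute"])
print(args.file_path, args.execute)  # example.py True
```

Because the parser accepts an explicit argument list, the module can be imported and tested without any command-line arguments being present.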
Your code looks really good, and there's very little I can see here that can be improved. Nonetheless, the show goes on.

with open(file_path, "r+") as code_file:
    return code_file.read()

I can't see that you re-use `code_file` to save, meaning plain `r` is sufficient instead of `r+`. As @QPaysTaxes pointed out in the comments, a missing or mistyped file path should produce a clear error, so displaying one here would be a good idea. Note that neither `r` nor `r+` creates a missing file; both raise an error instead.
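One possible way to report a missing file is sketched below. It assumes the path is passed in as a parameter rather than read from a module-level variable; that signature and the wording of the error message are choices made for this example.

```python
import sys


def open_code_file(file_path):
    """Return the contents of the code file, or exit with a readable error."""
    try:
        # Plain "r" is enough because the file is only read here.
        with open(file_path, "r") as code_file:
            return code_file.read()
    except FileNotFoundError:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Cannot find code file: {}".format(file_path))
```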
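The chained `.replace().replace()` can be improved. I had the following solution in mind, turning:

return file_string.replace(
    "until", "while not"
).replace(
    "unless", "if not"
)

into:

changes = {
    "until": "while not",
    "unless": "if not",
}
# Apply each replacement in turn to the same string; a list comprehension
# here would return a list of partly converted strings instead.
for old, new in changes.items():
    file_string = file_string.replace(old, new)
return file_string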
I think it'd be best if `execute_results` defaulted to `-false` rather than raising an error when the flag is empty, because flags like this are usually optional, right?

execute_results = argv[2]

Possibly into the following, if you can accept its malformed appearance.

execute_results = argv[2] if argv[2] else "-false"

Note that `argv[2]` still raises `IndexError` when the argument is omitted entirely, so this only covers an empty string; a length check on `argv` is needed to cover a missing argument.

(Thanks @frerich-raabe for the suggestion)
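A minimal sketch of a default that also copes with the argument being left out entirely. `get_execute_flag` is a hypothetical helper; it takes the argument list as a parameter so it can be tried without touching `sys.argv`.

```python
def get_execute_flag(args, default="-false"):
    """Return the third command-line item, or the default if missing or empty."""
    # Check the length first: indexing a missing item raises IndexError.
    if len(args) > 2 and args[2]:
        return args[2]
    return default


print(get_execute_flag(["py_preproc.py", "code.py"]))           # -false
print(get_execute_flag(["py_preproc.py", "code.py", "-true"]))  # -true
```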
I can't really see anything other than that, well done!
...Doesn't `r+` create the file if it's not there? – anon, Jun 29, 2015 at 5:10
Well, my point is that reuse doesn't matter: `r+` shouldn't be used, but it shouldn't be used because you should throw an error. – anon, Jun 29, 2015 at 5:40
@Quill My point is that in order to test whether a string is empty, you can just use the string itself: empty strings evaluate to `False`, i.e. you can use `execute_results = argv[2] if argv[2] else "-false"`; PEP8 explains: "For sequences, (strings, lists, tuples), use the fact that empty sequences are false." – Frerich Raabe, Jun 29, 2015 at 14:42
[...] `while not (iterator == 0)` and `if not (iterator == 0)`. Otherwise if the user uses a condition like `unless x and y` it then becomes `while not x and y` which is equivalent to `while (not x) and y`. But then you can't simply use `replace` anymore...
[...] `"until"` inside a string literal, and the code itself was an example of that. That it is reserved doesn't matter either, because this script will also change the variable name `run_until_completed` into `run_while not_completed`. And what Bakuriu said.