I start coding a simple application: I transform markdown files in folders into documentation.
My architecture is very flat. My code looks like a series of unit function called one after the other.
def main():
config = get_user_config(os.getcwd())
sources_dir = config['sources_dir']
for root, dirnames, filenames in os.walk(sources_dir):
for filename in fnmatch.filter(filenames, '*.md'):
markdown_file_path = os.path.join(root, filename)
html_block = transform_html_into_markdown(markdown_file_path)
wrapped_html = '{{% extends "base.html" %}}{{% block content %}}{0}{{% endblock %}}'.format(html_block)
html = render_html(wrapped_html, config)
html_file_path = get_html_file_path(markdown_file_path, sources_dir)
write_html_file(html_file_path, html)
What I really like:
- I can unit test every function. Code is simple to understand, no fancy stuff.
- Each function is in its own module.
What I do not like:
- My
main()
method is growing until I create two functions to rule them all. I have the impression of hiding the dust under the carpet. - If I want for instance increase the number of process doing some calculation, I will probably need some hard refactoring.
- Everything is synchronous.
Do you have some idea to improve my code? Should I try something else?
1 Answer 1
First change - extract iteration logic to its own generator:
def markdown_files(config):
sources_dir = config['sources_dir']
for root, dirnames, filenames in os.walk(sources_dir):
for filename in fnmatch.filter(filenames, '*.md'):
yield os.path.join(root, filename)
def main():
config = get_user_config(os.getcwd())
for path in markdown_files(config):
html_block = transform_html_into_markdown(path)
# ...
For one, this eliminates a level of nesting, which is always a good thing
Also you can add things like exclude paths to your config at a later date, without having to change your processing code. Note that this might require you to rethink passing sources_dir into get_html_file_path
, as any filtering logic might end up duplicated...
Second change: wrap the entire processing routine in a single function:
def process_file(config, path):
html_block = transform_html_into_markdown(path)
# ...
def main():
config = get_user_config(os.getcwd())
for md_path in markdown_files(config):
process_file(config, md_path)
So that later you can parallelize it, if you really need to:
import multiprocessing
import functools # we can't directly pass a lambda into map
pool = multiprocessing.Pool()
pool.map(
functools.partial(process_file, config=config),
markdown_files(config)
)
-
\$\begingroup\$ nice answer, thank you. So I should only make my code more "functionnal" ? \$\endgroup\$Guillaume Vincent– Guillaume Vincent2015年07月27日 06:46:37 +00:00Commented Jul 27, 2015 at 6:46
Explore related questions
See similar questions with these tags.
main(){}
function become huge, I used to separate in two functionsmain(){a(); b();}
and so on. \$\endgroup\$