1
\$\begingroup\$

I start coding a simple application: I transform markdown files in folders into documentation.

My architecture is very flat. My code looks like a series of unit function called one after the other.

def main():
 config = get_user_config(os.getcwd())
 sources_dir = config['sources_dir']
 for root, dirnames, filenames in os.walk(sources_dir):
 for filename in fnmatch.filter(filenames, '*.md'):
 markdown_file_path = os.path.join(root, filename)
 html_block = transform_html_into_markdown(markdown_file_path)
 wrapped_html = '{{% extends "base.html" %}}{{% block content %}}{0}{{% endblock %}}'.format(html_block)
 html = render_html(wrapped_html, config)
 html_file_path = get_html_file_path(markdown_file_path, sources_dir)
 write_html_file(html_file_path, html)

What I really like:

  • I can unit test every function. Code is simple to understand, no fancy stuff.
  • Each function is in its own module.

What I do not like:

  • My main() method is growing until I create two functions to rule them all. I have the impression of hiding the dust under the carpet.
  • If I want for instance increase the number of process doing some calculation, I will probably need some hard refactoring.
  • Everything is synchronous.

Do you have some idea to improve my code? Should I try something else?

200_success
145k22 gold badges190 silver badges478 bronze badges
asked Jul 26, 2015 at 19:15
\$\endgroup\$
2
  • \$\begingroup\$ "two functions to rule them all" - which is the second? \$\endgroup\$ Commented Jul 26, 2015 at 21:22
  • \$\begingroup\$ @Eric If my main(){} function become huge, I used to separate in two functions main(){a(); b();} and so on. \$\endgroup\$ Commented Jul 27, 2015 at 6:48

1 Answer 1

3
\$\begingroup\$

First change - extract iteration logic to its own generator:

def markdown_files(config):
 sources_dir = config['sources_dir']
 for root, dirnames, filenames in os.walk(sources_dir):
 for filename in fnmatch.filter(filenames, '*.md'):
 yield os.path.join(root, filename)
def main():
 config = get_user_config(os.getcwd())
 for path in markdown_files(config): 
 html_block = transform_html_into_markdown(path)
 # ...

For one, this eliminates a level of nesting, which is always a good thing

Also you can add things like exclude paths to your config at a later date, without having to change your processing code. Note that this might require you to rethink passing sources_dir into get_html_file_path, as any filtering logic might end up duplicated...


Second change: wrap the entire processing routine in a single function:

def process_file(config, path): 
 html_block = transform_html_into_markdown(path)
 # ...
def main():
 config = get_user_config(os.getcwd())
 for md_path in markdown_files(config):
 process_file(config, md_path)

So that later you can parallelize it, if you really need to:

import multiprocessing
import functools # we can't directly pass a lambda into map
pool = multiprocessing.Pool()
pool.map(
 functools.partial(process_file, config=config),
 markdown_files(config)
)
answered Jul 26, 2015 at 21:26
\$\endgroup\$
1
  • \$\begingroup\$ nice answer, thank you. So I should only make my code more "functionnal" ? \$\endgroup\$ Commented Jul 27, 2015 at 6:46

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.