Transforming markdown files in folders into a documentation

Question 1

I start coding a simple application: I transform markdown files in folders into documentation.

My architecture is very flat. My code looks like a series of unit function called one after the other.

def main():
 config = get_user_config(os.getcwd())
 sources_dir = config['sources_dir']
 for root, dirnames, filenames in os.walk(sources_dir):
 for filename in fnmatch.filter(filenames, '*.md'):
 markdown_file_path = os.path.join(root, filename)
 html_block = transform_html_into_markdown(markdown_file_path)
 wrapped_html = '{{% extends "base.html" %}}{{% block content %}}{0}{{% endblock %}}'.format(html_block)
 html = render_html(wrapped_html, config)
 html_file_path = get_html_file_path(markdown_file_path, sources_dir)
 write_html_file(html_file_path, html)

What I really like:

I can unit test every function. Code is simple to understand, no fancy stuff.
Each function is in its own module.

What I do not like:

My main() method is growing until I create two functions to rule them all. I have the impression of hiding the dust under the carpet.
If I want for instance increase the number of process doing some calculation, I will probably need some hard refactoring.
Everything is synchronous.

Do you have some idea to improve my code? Should I try something else?

Question 2

"two functions to rule them all" - which is the second?

Question 3

@Eric If my main(){} function become huge, I used to separate in two functions main(){a(); b();} and so on.

Question 4

First change - extract iteration logic to its own generator:

def markdown_files(config):
 sources_dir = config['sources_dir']
 for root, dirnames, filenames in os.walk(sources_dir):
 for filename in fnmatch.filter(filenames, '*.md'):
 yield os.path.join(root, filename)
def main():
 config = get_user_config(os.getcwd())
 for path in markdown_files(config): 
 html_block = transform_html_into_markdown(path)
 # ...

For one, this eliminates a level of nesting, which is always a good thing

Also you can add things like exclude paths to your config at a later date, without having to change your processing code. Note that this might require you to rethink passing sources_dir into get_html_file_path, as any filtering logic might end up duplicated...

Second change: wrap the entire processing routine in a single function:

def process_file(config, path): 
 html_block = transform_html_into_markdown(path)
 # ...
def main():
 config = get_user_config(os.getcwd())
 for md_path in markdown_files(config):
 process_file(config, md_path)

So that later you can parallelize it, if you really need to:

import multiprocessing
import functools # we can't directly pass a lambda into map
pool = multiprocessing.Pool()
pool.map(
 functools.partial(process_file, config=config),
 markdown_files(config)
)

Question 5

nice answer, thank you. So I should only make my code more "functionnal" ?

Eric Eric 3401 silver badge7 bronze badges · Accepted Answer · 2015-07-26 21:26:17Z

First change - extract iteration logic to its own generator:

def markdown_files(config):
 sources_dir = config['sources_dir']
 for root, dirnames, filenames in os.walk(sources_dir):
 for filename in fnmatch.filter(filenames, '*.md'):
 yield os.path.join(root, filename)
def main():
 config = get_user_config(os.getcwd())
 for path in markdown_files(config): 
 html_block = transform_html_into_markdown(path)
 # ...

For one, this eliminates a level of nesting, which is always a good thing

Also you can add things like exclude paths to your config at a later date, without having to change your processing code. Note that this might require you to rethink passing sources_dir into get_html_file_path, as any filtering logic might end up duplicated...

Second change: wrap the entire processing routine in a single function:

def process_file(config, path): 
 html_block = transform_html_into_markdown(path)
 # ...
def main():
 config = get_user_config(os.getcwd())
 for md_path in markdown_files(config):
 process_file(config, md_path)

So that later you can parallelize it, if you really need to:

import multiprocessing
import functools # we can't directly pass a lambda into map
pool = multiprocessing.Pool()
pool.map(
 functools.partial(process_file, config=config),
 markdown_files(config)
)

nice answer, thank you. So I should only make my code more "functionnal" ?

Stack Exchange Network

Transforming markdown files in folders into a documentation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Transforming markdown files in folders into a documentation

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions