Source formatting of Markdown Table

Question 1

I use MultiMarkdown tables more than I thought I ever would. In my .md files, I would like the source table code to be neat, evenly spaced, and resemble (in an alignment sense) the final HTML rendered table.

Solution

Select the source of the table, press a hotkey on my computer, replace the untidy table with the tidy table.

Notes:

This script (macro?) is for my own use; I will always write the source table the same way
The script is written for Python 2.7
I use a Mac app that allows me to assign a hot key to an action (e.g. a script), and the following seems to work with that at the moment!

Example

I write something like the following:

| Header 1 | Header 2 | ... | Header m | 
| :--- | :--: | :--: | --: | 
| $a_{11}$ | $a_{12}$ | ... | $a_{1m}$ | 
| ... | ... | ... | ... | 
| $a_{n1}$ | $a_{n2}$ | ... | $a_{nm}$ |

select it, press my hot key, and it is replaced by something like this (in the source):

| Header 1 | Header 2 | ...  | Header m |
| :---     |   :--:   | :--: |      --: |
| $a_{11}$ | $a_{12}$ | ...  | $a_{1m}$ |
| ...      |   ...    | ...  |      ... |
| $a_{n1}$ | $a_{n2}$ | ...  | $a_{nm}$ |

As mentioned, the script seems to work.

How can I improve the script?

I would really appreciate some criticism of the script I've written. I am not a coder, but try on occasion to write little tool type things like this.

The Script

#!/usr/bin/env python
"""MMD Table Formatter.
Silly script that takes a MMD table as a 
string and returns a tidied version of the table
"""
import sys
import StringIO
query = sys.argv[1]
# For "cleaned" table entries:
rows = []
# This NEEDS TO BE CLOSED AFTER!!
s_obj = StringIO.StringIO(query)
# Clean the entries:
for line in s_obj:
    l = line.split('|')
    rows.append([entry.strip() for entry in l])
# CLOSE
s_obj.close()
# Max length of each "entry" is what we'll use
# to evenly space "columns" in the final table
cols = zip(*rows)
col_widths = []
for columns in cols:
    row_widths = map(lambda x: len(x), columns)
    col_widths.append(max(row_widths))
# Let's align entries as per intended formatting.
# Second line of input string contains alignment commands:
# ":---" left aligned
# "---:" right aligned
# ":--:" centered (also accepts "---")
alignment = []
for r in rows[1]:
    if r.startswith(":") and not r.endswith(":"):
        alignment.append("lalign")
    elif r.endswith(":") and not r.startswith(":"):
        alignment.append("ralign")
    else:
        alignment.append("centered")
# Prepare for output string:
out = []
for row in rows:
    for entry_and_width in zip(row, col_widths, alignment):
        if entry_and_width[1] == 0:
            continue
        if entry_and_width[2] == "centered":
            outstring = "| " + entry_and_width[0].center(entry_and_width[1]) + ' '
            out.append(outstring)
        if entry_and_width[2] == "lalign":
            outstring = "| " + entry_and_width[0].ljust(entry_and_width[1]) + ' '
            out.append(outstring)
        if entry_and_width[2] == "ralign":
            outstring = "| " + entry_and_width[0].rjust(entry_and_width[1]) + ' '
            out.append(outstring)
    out.append("|\n")
query = "".join(out)
sys.stdout.write(query)

Question 2

Note that a similar question was asked a few days ago. You might get some insights there.

Question 3

As far as I can see, there is no need for the StringIO (?). You can just use query.split('\n'). Regardless, that for loop can be condensed into a list comprehension:

rows = [[el.strip() for el in row.split('|')] for row in query.splitlines()]
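
To make the result of that comprehension concrete, here is a small sketch run on a two-line table. The sample string is made up for illustration; in the script it comes from sys.argv[1]. Because every line starts and ends with a pipe, each row gets an empty first and last entry, which is why zero-width columns are skipped later on.

```python
# Hypothetical sample input; the real script reads it from sys.argv[1].
query = "| Header 1 | Header 2 |\n| :--- | --: |\n"

# Split into lines, then into cells, stripping the padding around each cell.
rows = [[el.strip() for el in row.split('|')] for row in query.splitlines()]

for row in rows:
    print(row)
# ['', 'Header 1', 'Header 2', '']
# ['', ':---', '--:', '']
```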

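If the StringIO is really needed, I would use with..as. In Python 2.7, StringIO.StringIO does not implement the context-manager protocol itself, so it has to be wrapped in contextlib.closing:

from contextlib import closing

with closing(StringIO.StringIO(query)) as s_obj:
    rows = [[el.strip() for el in row.split('|')] for row in s_obj]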

For the column widths, you can use the fact that len is already a function, so there is no need for lambda x: len(x). This way you can also inline all of it into one list comprehension:

col_widths = [max(map(len, column)) for column in zip(*rows)]
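
As a quick illustration of why this works, the sketch below uses a few hand-written rows (hypothetical data, shaped like the output of the list comprehension above). zip(*rows) transposes the table, so each tuple holds the cells of one column, and the longest cell gives that column's width.

```python
# Hypothetical rows, already split on '|' and stripped.
rows = [
    ['', 'Header 1', 'H2', ''],
    ['', ':---', '--:', ''],
    ['', 'x', 'longer cell', ''],
]

# zip(*rows) transposes the table: one tuple per column.
columns = list(zip(*rows))
col_widths = [max(map(len, column)) for column in columns]

print(columns[1])  # ('Header 1', ':---', 'x')
print(col_widths)  # [0, 8, 11, 0]
```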

For the alignments, I would define a function that returns the alignment, given the content of a cell. First I had this function return your strings. But then I realized that you are already using entry.ljust further down. I then changed it to return the function to use here. Note that str.ljust("ab", 2) and "ab".ljust(2) are equivalent, so later we just call align(entry, width).

def get_alignment(cell):
    """
    ":---" left aligned
    "---:" right aligned
    ":--:" centered (also accepts "---"), default
    """
    if cell.startswith(":") and not cell.endswith(":"):
        return str.ljust
    elif cell.endswith(":") and not cell.startswith(":"):
        return str.rjust
    return str.center
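
A short usage sketch may help here. It repeats the function so that it runs on its own, then calls the returned method with the string as its first argument; the cell "ab" and the width 6 are arbitrary values chosen for the demonstration.

```python
def get_alignment(cell):
    """Return the str method matching a MultiMarkdown alignment marker."""
    if cell.startswith(":") and not cell.endswith(":"):
        return str.ljust
    elif cell.endswith(":") and not cell.startswith(":"):
        return str.rjust
    return str.center


# The returned object is a method looked up on the str class, so the
# string to pad is passed explicitly as the first argument.
for marker in (":---", "--:", ":--:", "---"):
    align = get_alignment(marker)
    print(repr(align("ab", 6)))
# 'ab    '
# '    ab'
# '  ab  '
# '  ab  '
```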

To get all alignments, we just use map again:

alignments = map(get_alignment, rows[1])

Finally, the output part. Since you already build a nice tuple with zip, you should use tuple unpacking to give the elements readable names. entry_and_width is a confusing name, especially since it also contains the alignment!

Here we can now get rid of most of your code, since it boils down to:

entry = align(entry, width)
out.append("| {} ".format(entry))

Here I used str.format to make it a bit easier and avoid the costly string addition. Note that we could also use str.format to do the adjusting for us (using e.g. "{:>2}".format(entry) instead of "{}".format(str.rjust(entry, 2))), but that would mean nesting formats, which starts to get ugly very quickly.
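
For reference, this is roughly what letting str.format do the padding looks like. The alignment characters <, > and ^ and the nested replacement field are standard format-spec syntax; the variable names and values are made up for this sketch. One caveat: when the leftover padding is odd, ^ and str.center may put the extra space on different sides, so the two approaches are not guaranteed to match character for character.

```python
entry, width = "ab", 6  # hypothetical cell content and column width

# Width written directly into the format spec.
print(repr("{:>6}".format(entry)))                     # '    ab'

# Width supplied at call time: a replacement field nested inside the spec.
print(repr("{:>{width}}".format(entry, width=width)))  # '    ab'

# The alignment character can be nested as well, which is where the
# format string starts to get hard to read.
for direction in "<>^":
    cell = "| {:{d}{w}} ".format(entry, d=direction, w=width)
    print(repr(cell))
```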

I also use the fact that 0 is falsy to shorten the check that skips a column if it is empty.

out = []
for row in rows:
    for entry, width, align in zip(row, col_widths, alignments):
        if not width:
            continue
        out.append("| {} ".format(align(entry, width)))
    out.append("|\n")
query = "".join(out)

Final code:

#!/usr/bin/env python
"""MMD Table Formatter.
Silly script that takes a MMD table as a
string and returns a tidied version of the table
"""
import sys


def get_alignment(cell):
    """
    ":---" left aligned
    "---:" right aligned
    ":--:" centered (also accepts "---"), default
    """
    if cell.startswith(":") and not cell.endswith(":"):
        return str.ljust
    elif cell.endswith(":") and not cell.startswith(":"):
        return str.rjust
    return str.center


query = sys.argv[1]
# For "cleaned" table entries:
rows = [[el.strip() for el in row.split('|')] for row in query.splitlines()]
# Max length of each "entry" is what we'll use
# to evenly space "columns" in the final table
col_widths = [max(map(len, column)) for column in zip(*rows)]
# Let's align entries as per intended formatting.
# Second line of input string contains alignment commands:
alignments = map(get_alignment, rows[1])
# Prepare for output string:
out = []
for row in rows:
    for entry, width, align in zip(row, col_widths, alignments):
        if not width:
            continue
        out.append("| {} ".format(align(entry, width)))
    out.append("|\n")
query = "".join(out)
sys.stdout.write(query)
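
The script above targets Python 2.7, where map returns a list. Under Python 3, map returns a one-shot iterator, so alignments would be used up after the first row. The following is a minimal sketch of the same logic wrapped in a function so that it can be tested and also runs on Python 3; the name tidy_table and the sample table are made up for this example and are not part of the original script.

```python
import sys


def get_alignment(cell):
    """Return the str method matching a MultiMarkdown alignment marker."""
    if cell.startswith(":") and not cell.endswith(":"):
        return str.ljust
    elif cell.endswith(":") and not cell.startswith(":"):
        return str.rjust
    return str.center


def tidy_table(query):
    """Return the table in query with evenly padded columns (hypothetical helper)."""
    rows = [[el.strip() for el in row.split('|')] for row in query.splitlines()]
    col_widths = [max(map(len, column)) for column in zip(*rows)]
    # Built as a list because it is reused for every row.
    alignments = [get_alignment(cell) for cell in rows[1]]
    out = []
    for row in rows:
        for entry, width, align in zip(row, col_widths, alignments):
            if not width:  # empty column outside the outer pipes
                continue
            out.append("| {} ".format(align(entry, width)))
        out.append("|\n")
    return "".join(out)


# Made-up sample table for demonstration.
sample = "| Name | Qty |\n| :--- | --: |\n| apple | 3 |\n"
sys.stdout.write(tidy_table(sample))
# | Name  | Qty |
# | :---  | --: |
# | apple |   3 |
```

Since the cells are stripped before measuring, running the function on its own output should give the same table back.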

Question 4

You should use query.splitlines() rather than query.split('\n'). Applied to 'test string\n', the former yields ['test string'] whereas the latter yields ['test string', ''], which is not that convenient to work with.
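
The difference is easy to check in a few lines. The second pair of calls is an extra illustration that is not mentioned above: splitlines() also recognises other line endings such as '\r\n'.

```python
text = 'test string\n'

print(text.splitlines())  # ['test string']
print(text.split('\n'))   # ['test string', '']

# splitlines() also handles Windows-style line endings.
windows_text = 'a\r\nb\r\n'
print(windows_text.splitlines())  # ['a', 'b']
print(windows_text.split('\n'))   # ['a\r', 'b\r', '']
```

In the table script, the trailing empty string would turn into a one-cell row, and since zip stops at the shortest row, the width calculation would only see the first (empty) column.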

Question 5

Returning str.ljust et al. was exactly what I was looking for, and I was unaware I could do that; so cheers. Thanks for reminding me of map too :) @MathiasEttinger Originally I had used splitlines() but opted for StringIO for speed; I have no idea if this is justified.

Question 6

@css As a wild guess, I would say it's not justified, unless you had performance issues, ran a profiler, and saw that the bottleneck was splitlines.
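
One way to settle this kind of question is to measure it. Below is a rough sketch using the standard timeit module; the 200-row input and the function names are invented for the comparison, and io.StringIO stands in for the Python 2 StringIO module so that the snippet also runs on Python 3. Timings depend on the machine, so no numbers are quoted here.

```python
import io
import timeit

# Hypothetical input: a 200-row, two-column table (unicode for io.StringIO).
query = u"| a | b |\n" * 200


def with_splitlines():
    return [row.split('|') for row in query.splitlines()]


def with_stringio():
    s_obj = io.StringIO(query)
    try:
        # Iterating a StringIO yields lines that keep their trailing newline.
        return [row.split('|') for row in s_obj]
    finally:
        s_obj.close()


for func in (with_splitlines, with_stringio):
    seconds = timeit.timeit(func, number=1000)
    print("{:<16} {:.4f} s".format(func.__name__, seconds))
```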

Question 7

To add to @Graipher's answer, I would simplify the formatting of the rows by using generator expressions rather than building a list to feed into join:

query = '\n'.join(
    '| {} |'.format(  # Build a row composed of an inner part between delimiters
        ' | '.join(align(entry, width)
                   for entry, width, align in zip(row, col_widths, alignments)
                   if width))  # Skip the empty cells outside the outer pipes
    for row in rows)

Graipher · Accepted Answer · 2016-10-26 08:39:35Z

