I use MultiMarkdown tables more than I thought I ever would. In my .md
files, I would like the table source to be neat, evenly spaced, and to resemble (in terms of alignment) the final
rendered HTML table.
Solution
Select the source of the table, press a hotkey on my computer, replace the untidy table with the tidy table.
Notes:
- This script (macro?) is for my own use; I will always write the source table the same way
- The script is written for Python 2.7
- I use a Mac app that allows me to assign a hot key to an action (e.g. a script), and the following seems to work with that at the moment!
Example
I write something like the following:
| Header 1 | Header 2 | ... | Header m |
| :--- | :--: | :--: | --: |
| $a_{11}$ | $a_{12}$ | ... | $a_{1m}$ |
| ... | ... | ... | ... |
| $a_{n1}$ | $a_{n2}$ | ... | $a_{nm}$ |
select it, press my hot key, and it is replaced by something like this (in the source):
| Header 1 | Header 2 | ...  | Header m |
| :---     |   :--:   | :--: |      --: |
| $a_{11}$ | $a_{12}$ | ...  | $a_{1m}$ |
| ...      |   ...    | ...  |      ... |
| $a_{n1}$ | $a_{n2}$ | ...  | $a_{nm}$ |
As mentioned, the script seems to work.
How can I improve the script?
I would really appreciate some criticism of the script I've written. I am not a coder, but try on occasion to write little tool type things like this.
The Script
#!/usr/bin/env python
"""MMD Table Formatter.
Silly script that takes a MMD table as a
string and returns a tidied version of the table
"""
import sys
import StringIO
query = sys.argv[1]
# For "cleaned" table entries:
rows = []
# This NEEDS TO BE CLOSED AFTER!!
s_obj = StringIO.StringIO(query)
# Clean the entries:
for line in s_obj:
    l = line.split('|')
    rows.append([entry.strip() for entry in l])
# CLOSE
s_obj.close()
# Max length of each "entry" is what we'll use
# to evenly space "columns" in the final table
cols = zip(*rows)
col_widths = []
for columns in cols:
    row_widths = map(lambda x: len(x), columns)
    col_widths.append(max(row_widths))
# Let's align entries as per intended formatting.
# Second line of input string contains alignment commands:
# ":---" left aligned
# "---:" right aligned
# ":--:" centered (also accepts "---")
alignment = []
for r in rows[1]:
    if r.startswith(":") and not r.endswith(":"):
        alignment.append("lalign")
    elif r.endswith(":") and not r.startswith(":"):
        alignment.append("ralign")
    else:
        alignment.append("centered")
# Prepare for output string:
out = []
for row in rows:
    for entry_and_width in zip(row, col_widths, alignment):
        if entry_and_width[1] == 0:
            continue
        if entry_and_width[2] == "centered":
            outstring = "| " + entry_and_width[0].center(entry_and_width[1]) + ' '
            out.append(outstring)
        if entry_and_width[2] == "lalign":
            outstring = "| " + entry_and_width[0].ljust(entry_and_width[1]) + ' '
            out.append(outstring)
        if entry_and_width[2] == "ralign":
            outstring = "| " + entry_and_width[0].rjust(entry_and_width[1]) + ' '
            out.append(outstring)
    out.append("|\n")
query = "".join(out)
sys.stdout.write(query)
Note that a similar question was asked a few days ago. You might get some insights there. – 301_Moved_Permanently, Oct 26, 2016 at 9:49
2 Answers
As far as I can see, there is no need for the StringIO. You can just use query.split('\n') or, as in the line below, query.splitlines(). Regardless, that for loop can be condensed into a list comprehension:
rows = [[el.strip() for el in row.split('|')] for row in query.splitlines()]
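To see what that comprehension actually produces, here is a small sketch with a made-up two-line `query` (in the script it comes from `sys.argv[1]`). Because every line starts and ends with a pipe, each row gets an empty first and last entry, which is why zero-width columns have to be skipped later on.

```python
# Hypothetical two-line input; in the script this comes from sys.argv[1].
query = "| a | bb |\n| :--- | --: |\n"

# splitlines() drops the trailing newline, so no empty final row appears.
rows = [[el.strip() for el in row.split('|')] for row in query.splitlines()]

print(rows)
# [['', 'a', 'bb', ''], ['', ':---', '--:', '']]

# split('\n') would leave one extra, empty "line" at the end.
print(query.split('\n')[-1] == '')
# True
```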
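If the StringIO is really needed, I would use with..as. In Python 2.7, StringIO.StringIO is not a context manager itself, so it needs wrapping in contextlib.closing:
from contextlib import closing

with closing(StringIO.StringIO(query)) as s_obj:
    rows = [[el.strip() for el in row.split('|')] for row in s_obj]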
For the column widths, you can use the fact that len is already a function, so there is no need for lambda x: len(x). This also lets you inline all of it into one list comprehension:
col_widths = [max(map(len, column)) for column in zip(*rows)]
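If the `zip(*rows)` part looks opaque, this sketch may help. The rows are made up for the example and are assumed to be already split and stripped as in the previous step.

```python
# Made-up rows, already split and stripped as in the previous step.
rows = [
    ['', 'Header 1', 'H2', ''],
    ['', ':---', '--:', ''],
    ['', 'x', 'longer cell', ''],
]

# zip(*rows) transposes the table: each tuple holds one column's cells.
columns = list(zip(*rows))

# The widest cell in each column sets that column's width.
col_widths = [max(map(len, column)) for column in columns]

print(columns[1])
# ('Header 1', ':---', 'x')
print(col_widths)
# [0, 8, 11, 0]
```

The two zeros are the empty edge columns produced by the outer pipes.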
For the alignments, I would define a function that returns the alignment, given the content of a cell. At first I had this function return your strings, but then I realized that you are already calling entry.ljust and friends further down, so I changed it to return the function to use there. Note that str.ljust("ab", 2) and "ab".ljust(2) are equivalent, so later we can just call align(entry, width).
def get_alignment(cell):
    """
    ":---" left aligned
    "---:" right aligned
    ":--:" centered (also accepts "---"), default
    """
    if cell.startswith(":") and not cell.endswith(":"):
        return str.ljust
    elif cell.endswith(":") and not cell.startswith(":"):
        return str.rjust
    return str.center
To get all alignments, we just use map again:
alignments = map(get_alignment, rows[1])
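A short sketch of what ends up in that collection, using a made-up alignment row. The function is repeated only so the snippet runs on its own. I build the list with a comprehension here instead of a bare `map`, on the assumption that the code might one day run under Python 3, where `map` returns an iterator that can only be consumed once.

```python
def get_alignment(cell):
    """Same logic as above, repeated so this snippet runs on its own."""
    if cell.startswith(":") and not cell.endswith(":"):
        return str.ljust
    elif cell.endswith(":") and not cell.startswith(":"):
        return str.rjust
    return str.center

# Hypothetical alignment row; the outer entries are the empty edge columns.
alignment_row = ['', ':---', ':--:', '--:', '']

# A list, so it can be iterated once per table row.
alignments = [get_alignment(cell) for cell in alignment_row]

# Each element is a plain function taking (string, width).
padded = [align("ab", 6) for align in alignments[1:4]]
print(padded)
# ['ab    ', '  ab  ', '    ab']
```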
Finally, the output part. Since you already build a nice tuple with zip, you should use tuple unpacking to give the elements readable names. entry_and_width is a confusing name, especially since it also contains the alignment!
Here we can now get rid of most of your code, since it boils down to:
entry = align(entry, width)
out.append("| {} ".format(entry))
Here I used str.format to make it a bit easier to read and to avoid the repeated string addition. Note that we could also use str.format to do the adjusting for us (using e.g. "{:>2}".format(entry) instead of "{}".format(str.rjust(entry, 2))), but that would mean nesting formats, which starts to get ugly very quickly.
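For comparison, this is roughly what the format-spec route would look like. It is only a sketch with made-up values for `entry` and `width`; the nested `{w}` and `{f}` fields are what I mean by nesting formats.

```python
entry, width = "ab", 6

# Width written directly in the format spec; '>' right-aligns.
fixed = "| {:>6} ".format(entry)

# A variable width needs a nested replacement field inside the spec.
nested = "| {:>{w}} ".format(entry, w=width)

# With the alignment flag variable too: '<' left, '^' centre, '>' right.
variants = ["| {:{f}{w}} ".format(entry, f=flag, w=width) for flag in "<^>"]

print(repr(fixed))      # '|     ab '
print(fixed == nested)  # True
for line in variants:
    print(repr(line))
```

It works, but the table of alignment functions reads more clearly than a table of one-character flags spliced into a format string.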
I also use the fact that 0 is falsy to shorten the check that skips a column if it is empty.
out = []
for row in rows:
    for entry, width, align in zip(row, col_widths, alignments):
        if not width:
            continue
        out.append("| {} ".format(align(entry, width)))
    out.append("|\n")
query = "".join(out)
Final code:
#!/usr/bin/env python
"""MMD Table Formatter.
Silly script that takes a MMD table as a
string and returns a tidied version of the table
"""
import sys
def get_alignment(cell):
    """
    ":---" left aligned
    "---:" right aligned
    ":--:" centered (also accepts "---"), default
    """
    if cell.startswith(":") and not cell.endswith(":"):
        return str.ljust
    elif cell.endswith(":") and not cell.startswith(":"):
        return str.rjust
    return str.center
query = sys.argv[1]
# For "cleaned" table entries:
rows = [[el.strip() for el in row.split('|')] for row in query.splitlines()]
# Max length of each "entry" is what we'll use
# to evenly space "columns" in the final table
col_widths = [max(map(len, column)) for column in zip(*rows)]
# Let's align entries as per intended formatting.
# Second line of input string contains alignment commands:
alignments = map(get_alignment, rows[1])
# Prepare for output string:
out = []
for row in rows:
    for entry, width, align in zip(row, col_widths, alignments):
        if not width:
            continue
        out.append("| {} ".format(align(entry, width)))
    out.append("|\n")
query = "".join(out)
sys.stdout.write(query)
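For trying this out without the hotkey, the same steps can be wrapped in a function. This is only a sketch: the name `tidy_table` and the sample input are made up, and the alignments are built as a list rather than with a bare `map` so that the code would also behave under Python 3, where a `map` object is exhausted after the first row.

```python
def get_alignment(cell):
    """Pick the str method matching an alignment marker such as ':---'."""
    if cell.startswith(":") and not cell.endswith(":"):
        return str.ljust
    elif cell.endswith(":") and not cell.startswith(":"):
        return str.rjust
    return str.center


def tidy_table(query):
    """Return a tidied copy of the MMD table in `query` (hypothetical helper)."""
    rows = [[el.strip() for el in row.split('|')] for row in query.splitlines()]
    col_widths = [max(map(len, column)) for column in zip(*rows)]
    # A list, not a bare map, so it can be reused for every row.
    alignments = [get_alignment(cell) for cell in rows[1]]
    out = []
    for row in rows:
        for entry, width, align in zip(row, col_widths, alignments):
            if not width:
                continue  # skip the empty columns outside the outer pipes
            out.append("| {} ".format(align(entry, width)))
        out.append("|\n")
    return "".join(out)


# Made-up sample input.
untidy = "| Name | Qty |\n| :--- | --: |\n| apple | 3 |\n| kiwi | 12 |\n"
tidy = tidy_table(untidy)
print(tidy)
# | Name  | Qty |
# | :---  | --: |
# | apple |   3 |
# | kiwi  |  12 |
```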
You should use query.splitlines() rather than query.split('\n'). Applied to 'test string\n', the former yields ['test string'] whereas the latter yields ['test string', ''], which is not that convenient to work with. – 301_Moved_Permanently, Oct 26, 2016 at 9:52
Returning str.ljust et al. was exactly what I was looking for, and I was unaware I could do it; so cheers. Thanks for reminding me of map too :) @MathiasEttinger Originally I had used splitlines() but opted for StringIO for speed - no idea if this is justified. – c ss, Oct 26, 2016 at 10:18
@css As a wild guess I would say it's not justified, unless you had performance issues, ran a profiler and saw that the bottleneck was splitlines. – 301_Moved_Permanently, Oct 26, 2016 at 10:20
To complete @Graipher's answer, I would simplify the formatting of the rows by using list comprehensions/generator expressions rather than building a list to feed into join:
query = '\n'.join(
    '| {} |'.format(  # Build a row composed of an inner part between delimiters
        ' | '.join(align(entry, width)
                   for entry, width, align in zip(row, col_widths, alignments)
                   if width))  # Skip the empty columns left by the outer pipes
    for row in rows)
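One thing to watch with this form: the rows still contain the empty first and last entries left by the outer pipes, so they need filtering out or each line gains an extra pair of delimiters. A minimal sketch with made-up values standing in for the `rows`, `col_widths` and `alignments` computed earlier:

```python
# Made-up values standing in for the ones computed in the earlier answer.
rows = [['', 'Name', 'Qty', ''], ['', ':---', '--:', ''], ['', 'kiwi', '12', '']]
col_widths = [0, 4, 3, 0]
alignments = [str.center, str.ljust, str.rjust, str.center]

query = '\n'.join(
    '| {} |'.format(
        ' | '.join(align(entry, width)
                   for entry, width, align in zip(row, col_widths, alignments)
                   if width))  # drop the zero-width edge columns
    for row in rows)

print(query)
# | Name | Qty |
# | :--- | --: |
# | kiwi |  12 |
```

Unlike the loop version, this does not end the output with a newline, so add one when writing it out if the hotkey tool expects it.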