To present some results, I had to write a short method that turns an array (a list of lists) into a nicely presentable LaTeX table. It needed some formatting features, and the configuration options of pandas.DataFrame.to_latex were just not enough.
So I came up with this code. First is the Cell class, which can format the values in different ways (a tuple can be either a range, which should get a - in between, or a (value, uncertainty) pair, which should be rounded and get a $\pm$ sign in between).
Next is the actual LatexTable class. The array can be either a list of columns (default) or a list of rows (with transpose=True).
import math


class Cell(str):
    """
    A Cell of a LatexTable. Supports displaying values as
    just str, a range or a rounded value with uncertainty.
    Inherits from string so we can use it in `str.join` calls.
    """

    def __new__(cls, value, type_):
        try:
            if type_ == "r":
                # is a range
                value = " -- ".join(map(str, value))
            elif type_ == "v":
                # Is a value with an uncertainty
                # In the actual code some more sophisticated function that
                # rounds value and uncertainty to the same precision
                # and by some better rules is used here
                # value = "{} $\\pm$ {}".format(*pdg_round(*value))
                value = "{} $\\pm$ {}".format(round(value[0], 2), round(value[1], 2))
        except TypeError:
            # value is not iterable
            pass
        return str.__new__(cls, value)


class LatexTable:
    """
    Make a pretty printing LaTeX table.
    Requires `\\usepackage{booktabs}`.
    """

    def __init__(self, values, **kwargs):
        """
        Args:
            values: List of columns or list of rows (default: list of columns)
        Keyword args:
            transpose (bool): `values` is a list of rows
            columns (list): List of column headers
            alignments (str): a string denoting the alignments of each column.
                Choices: (l, c, r)
                Default: all l
            types (str): a string denoting columns as:
                'n' (normal, nothing),
                'v' (value with uncertainty),
                'r' (range) for nicer formatting.
                Will by default use 'v', which rounds to the PDG
                specification (only when a 2-tuple is passed)
            top_rule (bool): Add a top rule (default: True)
            bottom_rule (bool): Add a bottom rule (default: True)
        """
        if kwargs.get('transpose', False):
            self.n_cols = len(values[0])
            self.rows = values
        else:
            self.n_cols = len(values)
            self.rows = zip(*values)
        self.columns = kwargs.get('columns', map(str, range(self.n_cols)))
        if len(self.columns) != self.n_cols:
            raise ValueError("columns does not have the same length as the values ({} for the values vs {} for the columns)".format(
                self.n_cols, len(self.columns)))
        self.alignments = kwargs.get('alignments', "l" * self.n_cols)
        if len(self.alignments.replace("|", "")) != self.n_cols:
            raise ValueError("alignments does not have the same length as the values ({} for the values vs {} for the alignments)".format(
                self.n_cols, len(self.alignments)))
        self.types = kwargs.get('types', "v" * self.n_cols)
        if len(self.types) != self.n_cols:
            raise ValueError("types does not have the same length as the values ({} for the values vs {} for the types)".format(
                self.n_cols, len(self.types)))
        self.preamble = ["\\begin{{tabular}}{{{}}}".format(
            self.alignments)]
        if kwargs.get('top_rule', True):
            self.preamble.append("\\toprule")
        self.postamble = ["\\end{tabular}"]
        if kwargs.get('bottom_rule', True):
            self.postamble = ["\\bottomrule"] + self.postamble

    def __str__(self):
        table = self.preamble[:]
        table.append(" & ".join(self.columns) + "\\\\")
        table.append("\\midrule")
        for row in self.rows:
            row = [Cell(x, t) for x, t in zip(row, self.types)]
            table.append(" & ".join(row) + "\\\\")
        table += self.postamble
        return "\n".join(table)
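The `pdg_round` helper named in the commented-out line of `Cell` is not part of this post. Purely as an illustration, here is a rough sketch of rounding a value and its uncertainty to the same precision. It assumes the usual PDG convention: two significant digits when the three leading digits of the uncertainty are 100 to 354, one when they are 355 to 949, and rounding up to the next power of ten for 950 to 999. The name `pdg_round_sketch` and the choice to return strings are assumptions, not the author's actual function.

```python
from decimal import Decimal


def pdg_round_sketch(value, uncertainty):
    """Hypothetical helper: round both numbers to the same precision, return two strings."""
    if uncertainty <= 0:
        # No usable uncertainty: fall back to plain string conversion.
        return str(value), str(uncertainty)
    digits = Decimal(str(uncertainty))
    exponent = digits.adjusted()  # power of ten of the leading digit
    three_digits = int(digits.scaleb(2 - exponent))  # leading digits, 100..999
    if three_digits < 355:
        significant = 2
    elif three_digits < 950:
        significant = 1
    else:
        # 950..999: the uncertainty becomes the next power of ten
        exponent += 1
        uncertainty = 10.0 ** exponent
        significant = 2
    ndigits = significant - 1 - exponent  # decimal places, may be negative
    fmt = "{{:.{}f}}".format(max(ndigits, 0))
    return fmt.format(round(value, ndigits)), fmt.format(round(uncertainty, ndigits))


# Plugged into the same format string that Cell uses:
cell_text = "{} $\\pm$ {}".format(*pdg_round_sketch(3502, 297))
```

Going through `Decimal(str(...))` avoids misreading the leading digits because of binary floating-point noise. The final `round` calls still operate on the original numbers, so exact ties follow Python's own rounding behaviour.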
The options are all in the generic **kwargs because, in the beginning, values was *values, so there was no other way. I could switch to def __init__(self, values, columns=None, transpose=False, alignments=None, types=None, top_rule=True, bottom_rule=True), but that feels even more cluttered than what I have now. Any thoughts on this would be welcome.
It can be used like this:
>>> table = LatexTable([[1,2,3], [4,5,6]])
>>> print table
\begin{tabular}{ll}
\toprule
0 & 1\\
\midrule
1 & 4\\
2 & 5\\
3 & 6\\
\bottomrule
\end{tabular}
Or a fancier one:
>>> fancy_table = LatexTable([[(0, 10), (10, 20), (20, 30)], [(213.5, 10), (3502, 297), (16343, 3133)]], columns=["Age", "x"], types="rv", alignments="cr")
>>> print fancy_table
\begin{tabular}{cr}
\toprule
Age & x\\
\midrule
0 -- 10 & 214 $\pm$ 10\\
10 -- 20 & 3500 $\pm$ 300\\
20 -- 30 & 16300 $\pm$ 3100\\
\bottomrule
\end{tabular}
Compiled it looks like this:
And for anybody interested, here are the unit tests, exploring all the different options (I have not written the tests for Cell yet, I know. Also, the lines are longer than 120 characters here, and I don't care. I could be using multi-line strings, maybe...):
import unittest
from latex_table import LatexTable


class TestLatexTable(unittest.TestCase):

    def test_simple(self):
        """Simplest table, two columns, three rows, no labels, alignments"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]])
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_transpose(self):
        """Instead of passing a list of columns, we can also pass a list of rows"""
        table1 = LatexTable([[1, 2, 3], [4, 5, 6]])
        table2 = LatexTable([[1, 4], [2, 5], [3, 6]], transpose=True)
        self.assertEqual(str(table1), str(table2))

    def test_uncertainties_automatic(self):
        """Some values have uncertainties"""
        table = LatexTable([[(1, 1), 2, 3], [4, (5, 0), 6]])
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1.0 $\\pm$ 1.0 & 4\\\\\n2 & 5.0 $\\pm$ 0.0\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_manual_types(self):
        """But sometimes we really want to print a tuple in some column"""
        table = LatexTable([[(1, 1), 2, 3], [4, (5, 0), 6]], types="rv")
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 -- 1 & 4\\\\\n2 & 5.0 $\\pm$ 0.0\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_types_length_mismatch(self):
        """Length of types must match the length of values"""
        with self.assertRaises(ValueError):
            table = LatexTable([[1, 2, 3], [4, 5, 6]], types="r")

    def test_manual_alignment(self):
        """Manually set alignments of columns"""
        table = LatexTable([[(1, 1), 2, 3], [(4, 4), 5, 6], [(7, 7), 8, 9]], alignments="lcr")
        self.assertEqual(str(table),
                         '\\begin{tabular}{lcr}\n\\toprule\n0 & 1 & 2\\\\\n\\midrule\n1.0 $\\pm$ 1.0 & 4 $\\pm$ 4 & 7 $\\pm$ 7\\\\\n2 & 5 & 8\\\\\n3 & 6 & 9\\\\\n\\bottomrule\n\\end{tabular}')

    def test_alignments_mismatch(self):
        """Length of alignments must match the length of values"""
        with self.assertRaises(ValueError):
            table = LatexTable([[1, 2, 3], [4, 5, 6]], alignments="l")

    def test_alignments_pipes_allowed(self):
        """Pipes, denoting vertical separators, are not counted when calculating the mismatch"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]], alignments="|l|r|")
        self.assertEqual(str(table),
                         '\\begin{tabular}{|l|r|}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_columns(self):
        """Column names can be set"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]], columns=["A", "B"])
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\nA & B\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_columns_mismatch(self):
        """Length of columns must match the length of values"""
        with self.assertRaises(ValueError):
            table = LatexTable([[1, 2, 3], [4, 5, 6]], columns=["A"])

    def test_zip_shortest(self):
        """When columns have differing lengths, it is cut off after the end of the shortest"""
        table = LatexTable([[1, 2, 3], [4, 5]])
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n\\bottomrule\n\\end{tabular}')

    def test_top_rule(self):
        """Can turn off the printing of the top-rule"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]], top_rule=False)
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_bottom_rule(self):
        """Can turn off the printing of the bottom-rule"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]], bottom_rule=False)
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\end{tabular}')


if __name__ == "__main__":
    unittest.main()
    # suite = unittest.TestLoader().loadTestsFromTestCase(TestLatexTable)
    # unittest.TextTestRunner(verbosity=2).run(suite)
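The multi-line-string idea mentioned before the tests could look roughly like the sketch below. It uses a raw triple-quoted string together with `textwrap.dedent`, so the expected table reads the same way it prints. The constant names `EXPECTED_SIMPLE` and `ONE_LINE` are made up for this illustration.

```python
import textwrap

# Raw string: every backslash is literal, so LaTeX commands need no escaping.
# dedent() strips the common indentation, strip() the outer newlines.
EXPECTED_SIMPLE = textwrap.dedent(r"""
    \begin{tabular}{ll}
    \toprule
    0 & 1\\
    \midrule
    1 & 4\\
    2 & 5\\
    3 & 6\\
    \bottomrule
    \end{tabular}
""").strip()

# The escaped one-line form from test_simple, kept here only for comparison.
ONE_LINE = '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}'
```

Both constants hold the same text, so `self.assertEqual(str(table), EXPECTED_SIMPLE)` would check the same thing as the original assertion while keeping the lines short.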
I am looking for improvements to the actual class (maybe I should have made Cell an inner class?), but comments on how to make the unit tests better are also very welcome.
1 Answer
The problem with inheriting from str in Cell is that it gives you many more options than you want. Your Cell can be represented as a str, but it is not a str, and that "is a" relationship is the main idea behind inheritance. If we look at all the methods str provides, I think we can see that inheriting from it is overkill. I think it'd be a better idea to override the __str__ method of your Cell object.
The type_ parameter in Cell could use more documentation, especially since it's a string. If I were to use your code, I couldn't figure out what I'm supposed to pass for this parameter without reading the implementation. You should consider documenting it in the docstring (even though I now realise Cell is more of an "internal" class, I think it'd be useful).
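The two points above could be sketched roughly as follows: a plain class with its own `__str__` and a docstring that spells out `type_`. The formatting logic is copied from the question's `Cell`. The default `type_="n"` and the `line` example are assumptions added for illustration.

```python
class Cell(object):
    """
    One cell of a LatexTable, rendered with ``str(cell)``.

    Args:
        value: a plain value, or a 2-tuple for the 'r' and 'v' types.
        type_ (str): how to render ``value``:
            'n' as is,
            'r' as a range, "low -- high",
            'v' as a value with uncertainty, rounded to 2 decimals.
            Values that are not sequences are rendered as is.
    """

    def __init__(self, value, type_="n"):
        self.value = value
        self.type_ = type_

    def __str__(self):
        try:
            if self.type_ == "r":
                return " -- ".join(map(str, self.value))
            if self.type_ == "v":
                return "{} $\\pm$ {}".format(round(self.value[0], 2),
                                             round(self.value[1], 2))
        except TypeError:
            # value is not a sequence, fall through to the plain rendering
            pass
        return str(self.value)


# str.join only accepts real strings, so the cells are converted explicitly.
row = [Cell((0, 10), "r"), Cell((213.5, 10), "v"), Cell(3, "v")]
line = " & ".join(map(str, row)) + "\\\\"
```

The cost of dropping the `str` base class is the explicit `map(str, row)` in the join. In return, a `Cell` no longer carries the whole `str` interface.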
Regarding the LatexTable class, I would consider separating the argument validation from the actual table construction code, this way:
def __init__(self, values, **kwargs):
    if kwargs.get('transpose', False):
        self.n_cols = len(values[0])
        self.rows = values
    else:
        self.n_cols = len(values)
        self.rows = zip(*values)
    self.columns = kwargs.get('columns', map(str, range(self.n_cols)))
    self.alignments = kwargs.get('alignments', "l" * self.n_cols)
    self.types = kwargs.get('types', "v" * self.n_cols)
    self.validate_parameters()
    self.preamble = ["\\begin{{tabular}}{{{}}}".format(
        self.alignments)]
    if kwargs.get('top_rule', True):
        self.preamble.append("\\toprule")
    self.postamble = ["\\end{tabular}"]
    if kwargs.get('bottom_rule', True):
        self.postamble = ["\\bottomrule"] + self.postamble

def validate_parameters(self):
    if len(self.columns) != self.n_cols:
        raise ValueError("columns does not have the same length as the values ({} for the values vs {} for the columns)".format(
            self.n_cols, len(self.columns)))
    if len(self.alignments.replace("|", "")) != self.n_cols:
        raise ValueError("alignments does not have the same length as the values ({} for the values vs {} for the alignments)".format(
            self.n_cols, len(self.alignments)))
    if len(self.types) != self.n_cols:
        raise ValueError("types does not have the same length as the values ({} for the values vs {} for the types)".format(
            self.n_cols, len(self.types)))
This way, the construction code is less cluttered with the validation, which shouldn't change much. I also rearranged the code so that we "load" all the parameters from kwargs at the beginning. One might argue that this way we do operations that aren't necessary if there's a validation error, but the cost is so small that I think the increase in readability is worth it.
The one thing I would change about the unit tests is the method names. I read a great book about unit testing (sadly, I can't remember the title; it's been years), and one thing that stuck with me was the convention of naming your tests this way:
method_whatsbeingtested_expectedresult
This way, when a test fails, you know right away what went wrong, and you basically don't have to read the unit test to understand where your code failed. Your code is small enough that it might not make a very big difference, but I think it's a good way of writing unit tests.
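A small sketch of that naming scheme with `unittest` follows. By default `unittest` only collects methods whose names start with `test`, so the prefix stays in front of the three parts. The `LatexTable` here is a cut-down stand-in with just the column check, so that the snippet runs on its own. The test names are only suggestions.

```python
import unittest


class LatexTable(object):
    """Stand-in for the real class: only the column-header check is kept."""

    def __init__(self, values, columns=None):
        self.n_cols = len(values)
        if columns is None:
            columns = [str(i) for i in range(self.n_cols)]
        self.columns = columns
        if len(self.columns) != self.n_cols:
            raise ValueError("columns does not have the same length as the values")


class TestLatexTable(unittest.TestCase):

    # test_<method>_<what is being tested>_<expected result>
    def test_init_too_few_column_headers_raises_value_error(self):
        with self.assertRaises(ValueError):
            LatexTable([[1, 2, 3], [4, 5, 6]], columns=["A"])

    def test_init_no_column_headers_uses_column_indices(self):
        table = LatexTable([[1, 2, 3], [4, 5, 6]])
        self.assertEqual(table.columns, ["0", "1"])


# Run the suite explicitly so the verbose output lists the test names.
suite = unittest.TestLoader().loadTestsFromTestCase(TestLatexTable)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

With `verbosity=2` the runner prints one line per test method, so a failure can be identified from the name alone.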
I always thought the methods need to start with test_ in order for unittest to automatically execute them? In addition, if a test fails, the input, the expected and the actual output are printed, so I see no need to also encode them in the method name. – Graipher, Aug 15, 2019 at 15:18
@Graipher Oops, yes, obviously if your testing framework works with methods prefixed with test_, you should keep the prefix. The book in question was for C#, but the point still stands. I understand your point about the output, but if your unit tests were executed by an external tool, it might be easier to just see which test failed instead of having to "dig" through multiple outputs. It also makes it easier to see what the test tests without having to read the code, which I think is great when it comes to seeing what's tested in the class. – IEatBagels, Aug 15, 2019 at 15:21
**kwargs [...] Any thoughts on this would be welcome." Use Python 3.5, where def __init__(self, *values, columns=None, ...) is valid syntax?
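A minimal sketch of that suggestion, assuming Python 3 (keyword-only arguments after `*values` are not valid in Python 2, which the `print table` statements in the question point to). The class name `LatexTableSignature` is hypothetical and only the argument handling is shown, not the table output.

```python
class LatexTableSignature:
    """Hypothetical stand-in: argument handling only, no LaTeX output."""

    def __init__(self, *values, columns=None, transpose=False,
                 alignments=None, types=None, top_rule=True, bottom_rule=True):
        # Everything after *values can only be passed by keyword.
        self.n_cols = len(values[0]) if transpose else len(values)
        if columns is None:
            columns = [str(i) for i in range(self.n_cols)]
        self.columns = columns
        self.alignments = alignments if alignments is not None else "l" * self.n_cols
        self.types = types if types is not None else "v" * self.n_cols
        self.top_rule = top_rule
        self.bottom_rule = bottom_rule


# Columns are passed positionally, options by name.
table = LatexTableSignature([1, 2, 3], [4, 5, 6], columns=["A", "B"])

# A misspelled option now fails loudly instead of being silently ignored.
try:
    LatexTableSignature([1, 2, 3], [4, 5, 6], transposed=True)
    typo_caught = False
except TypeError:
    typo_caught = True
```

Compared with `**kwargs` plus `kwargs.get(...)`, the explicit signature documents the defaults in one place and rejects unknown keywords such as `transposed`.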