Print a nice LaTeX table from an array of values

Question 1

To present some results, I had to write a short class that turns an array (a list of lists) into a presentable LaTeX table. It needed some formatting features, and the configuration options of pandas.DataFrame.to_latex were not enough.

So I came up with this code. First is the Cell class, which can format values in different ways: a tuple can be either a range, which gets a -- in between, or a (value, uncertainty) pair, which is rounded and gets a $\pm$ sign in between.

Next is the actual LatexTable class. The array can be either a list of columns (default) or a list of rows (with transpose=True).

import math


class Cell(str):
    """
    A Cell of a LatexTable. Supports displaying values as
    just str, a range or a rounded value with uncertainty.
    Inherits from string so we can use it in `str.join` calls.
    """

    def __new__(cls, value, type_):
        try:
            if type_ == "r":
                # is a range
                value = " -- ".join(map(str, value))
            elif type_ == "v":
                # Is a value with an uncertainty
                # In the actual code some more sophisticated function that
                # rounds value and uncertainty to the same precision
                # and by some better rules is used here
                # value = "{} $\\pm$ {}".format(*pdg_round(*value))
                value = "{} $\\pm$ {}".format(round(value[0], 2), round(value[1], 2))
        except TypeError:
            # value is not iterable
            pass
        return str.__new__(cls, value)


class LatexTable:
    """
    Make a pretty printing LaTeX table.
    Requires `\\usepackage{booktabs}`.
    """

    def __init__(self, values, **kwargs):
        """
        Args:
            values: List of columns or list of rows (default: list of columns)
        Keyword args:
            transpose (bool): `values` is a list of rows
            columns (list): List of column headers
            alignments (str): a string denoting the alignments of each column.
                Choices: (l, c, r)
                Default: all l
            types (str): a string denoting columns as:
                'n' (normal, nothing),
                'v' (value with uncertainty),
                'r' (range) for nicer formatting.
                Will by default use 'v', which rounds to the PDG
                specification (only when a 2-tuple is passed)
            top_rule (bool): Add a top rule (default: True)
            bottom_rule (bool): Add a bottom rule (default: True)
        """
        if kwargs.get('transpose', False):
            self.n_cols = len(values[0])
            self.rows = values
        else:
            self.n_cols = len(values)
            self.rows = zip(*values)
        self.columns = kwargs.get('columns', map(str, range(self.n_cols)))
        if len(self.columns) != self.n_cols:
            raise ValueError("columns does not have the same length as the values ({} for the values vs {} for the columns)".format(
                self.n_cols, len(self.columns)))
        self.alignments = kwargs.get('alignments', "l" * self.n_cols)
        if len(self.alignments.replace("|", "")) != self.n_cols:
            raise ValueError("alignments does not have the same length as the values ({} for the values vs {} for the alignments)".format(
                self.n_cols, len(self.alignments)))
        self.types = kwargs.get('types', "v" * self.n_cols)
        if len(self.types) != self.n_cols:
            raise ValueError("types does not have the same length as the values ({} for the values vs {} for the types)".format(
                self.n_cols, len(self.types)))
        self.preamble = ["\\begin{{tabular}}{{{}}}".format(
            self.alignments)]
        if kwargs.get('top_rule', True):
            self.preamble.append("\\toprule")
        self.postamble = ["\\end{tabular}"]
        if kwargs.get('bottom_rule', True):
            self.postamble = ["\\bottomrule"] + self.postamble

    def __str__(self):
        table = self.preamble[:]
        table.append(" & ".join(self.columns) + "\\\\")
        table.append("\\midrule")
        for row in self.rows:
            row = [Cell(x, t) for x, t in zip(row, self.types)]
            table.append(" & ".join(row) + "\\\\")
        table += self.postamble
        return "\n".join(table)
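
The `pdg_round` helper mentioned in the comment above is not shown in the post. As a rough stand-in, and explicitly not the PDG rule itself, rounding the uncertainty to two significant digits and the value to the same decimal place could look something like this. The names `round_to_uncertainty` and `sig_digits` are made up for illustration.

```python
import math


def round_to_uncertainty(value, uncertainty, sig_digits=2):
    """
    Hypothetical stand-in for pdg_round (NOT the PDG rule):
    round the uncertainty to `sig_digits` significant digits
    and the value to the same decimal place.
    """
    if uncertainty <= 0:
        # No meaningful precision can be derived, so leave both untouched
        return value, uncertainty
    # Position of the leading digit of the uncertainty (297 -> 2, 0.012 -> -2)
    exponent = int(math.floor(math.log10(uncertainty)))
    # Number of decimals to keep; negative means rounding to tens, hundreds, ...
    decimals = sig_digits - 1 - exponent
    return round(value, decimals), round(uncertainty, decimals)


print(round_to_uncertainty(3502, 297))
print(round_to_uncertainty(16343, 3133))
```

For the two pairs printed here, the rounded numbers agree with the fancy table further down (3500 and 300, 16300 and 3100), but other inputs may well differ from what the real PDG rule gives.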

The options are all in the generic **kwargs because, in the beginning, values was *values, so there was no other way. I could switch to def __init__(self, values, columns=None, transpose=False, alignments=None, types=None, top_rule=True, bottom_rule=True), but that feels even more cluttered than what I have now. Any thoughts on this would be welcome.

It can be used like this:

>>> table = LatexTable([[1,2,3], [4,5,6]])
>>> print table
\begin{tabular}{ll}
\toprule
0 & 1\\
\midrule
1 & 4\\
2 & 5\\
3 & 6\\
\bottomrule
\end{tabular}

Or, a fancier one:

>>> fancy_table = LatexTable([[(0, 10), (10, 20), (20, 30)], [(213.5, 10), (3502, 297), (16343, 3133)]], columns=["Age", "x"], types="rv", alignments="cr")
>>> print fancy_table
\begin{tabular}{cr}
\toprule
Age & x\\
\midrule
0 -- 10 & 214 $\pm$ 10\\
10 -- 20 & 3500 $\pm$ 300\\
20 -- 30 & 16300 $\pm$ 3100\\
\bottomrule
\end{tabular}

Compiled, it looks like this:

[Image: compiled LaTeX table]

And for anybody interested, here are the unit tests, exploring all the different options. (I have not written the tests for Cell yet, I know. Also, the lines are longer than 120 characters here, and I don't care. I could be using multi-line strings, maybe...)

import unittest
from latex_table import LatexTable


class TestLatexTable(unittest.TestCase):

    def test_simple(self):
        """Simplest table, two columns, three rows, no labels, alignments"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]])
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_transpose(self):
        """Instead of passing a list of columns, we can also pass a list of rows"""
        table1 = LatexTable([[1, 2, 3], [4, 5, 6]])
        table2 = LatexTable([[1, 4], [2, 5], [3, 6]], transpose=True)
        self.assertEqual(str(table1), str(table2))

    def test_uncertainties_automatic(self):
        """Some values have uncertainties"""
        table = LatexTable([[(1, 1), 2, 3], [4, (5, 0), 6]])
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1.0 $\\pm$ 1.0 & 4\\\\\n2 & 5.0 $\\pm$ 0.0\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_manual_types(self):
        """But sometimes we really want to print a tuple in some column"""
        table = LatexTable([[(1, 1), 2, 3], [4, (5, 0), 6]], types="rv")
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 -- 1 & 4\\\\\n2 & 5.0 $\\pm$ 0.0\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_types_length_mismatch(self):
        """Length of types must match the length of values"""
        with self.assertRaises(ValueError):
            table = LatexTable([[1, 2, 3], [4, 5, 6]], types="r")

    def test_manual_alignment(self):
        """Manually set alignments of columns"""
        table = LatexTable([[(1, 1), 2, 3], [(4, 4), 5, 6], [(7, 7), 8, 9]],
                           alignments="lcr")
        self.assertEqual(str(table),
                         '\\begin{tabular}{lcr}\n\\toprule\n0 & 1 & 2\\\\\n\\midrule\n1.0 $\\pm$ 1.0 & 4 $\\pm$ 4 & 7 $\\pm$ 7\\\\\n2 & 5 & 8\\\\\n3 & 6 & 9\\\\\n\\bottomrule\n\\end{tabular}')

    def test_alignments_mismatch(self):
        """Length of alignments must match the length of values"""
        with self.assertRaises(ValueError):
            table = LatexTable([[1, 2, 3], [4, 5, 6]], alignments="l")

    def test_alignments_pipes_allowed(self):
        """Pipes, denoting vertical separators, are not counted when calculating the mismatch"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]], alignments="|l|r|")
        self.assertEqual(str(table),
                         '\\begin{tabular}{|l|r|}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_columns(self):
        """Column names can be set"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]], columns=["A", "B"])
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\nA & B\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_columns_mismatch(self):
        """Length of columns must match the length of values"""
        with self.assertRaises(ValueError):
            table = LatexTable([[1, 2, 3], [4, 5, 6]], columns=["A"])

    def test_zip_shortest(self):
        """When columns have differing lengths, it is cut off after the end of the shortest"""
        table = LatexTable([[1, 2, 3], [4, 5]])
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n\\bottomrule\n\\end{tabular}')

    def test_top_rule(self):
        """Can turn off the printing of the top-rule"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]], top_rule=False)
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\bottomrule\n\\end{tabular}')

    def test_bottom_rule(self):
        """Can turn off the printing of the bottom-rule"""
        table = LatexTable([[1, 2, 3], [4, 5, 6]], bottom_rule=False)
        self.assertEqual(str(table),
                         '\\begin{tabular}{ll}\n\\toprule\n0 & 1\\\\\n\\midrule\n1 & 4\\\\\n2 & 5\\\\\n3 & 6\\\\\n\\end{tabular}')


if __name__ == "__main__":
    unittest.main()
    # suite = unittest.TestLoader().loadTestsFromTestCase(TestLatexTable)
    # unittest.TextTestRunner(verbosity=2).run(suite)

I am looking for improvements to the actual class (maybe I should have made Cell an inner class?), but comments on how to make the unit tests better are also very welcome.

Question 2

"The options are all in the generic **kwargs [...] Any thoughts on this would be welcome." Use Python 3.5 where def __init__(self, *values, columns=None, ...) is valid syntax?

Question 3

@MathiasEttinger Yeah, I would love to, but the rest of my code uses a module that does not work properly with Python 3.x (hence the Python 2.7 tag)...
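
To make the two positions in this exchange concrete, here is a rough sketch. The first class shows the keyword-only signature suggested in the comment, which only parses on Python 3. The second function is one possible way to get similar strictness while staying compatible with 2.7: pop the known options out of the kwargs dict and reject whatever is left. `KeywordOnlyTable` and `pop_table_options` are hypothetical names, not part of the posted code.

```python
class KeywordOnlyTable(object):
    """Hypothetical stand-in for LatexTable; this signature is Python 3 only."""

    def __init__(self, *values, columns=None, transpose=False,
                 top_rule=True, bottom_rule=True):
        # Everything after *values can only be passed by keyword
        self.values = values
        self.columns = columns
        self.transpose = transpose
        self.top_rule = top_rule
        self.bottom_rule = bottom_rule


def pop_table_options(kwargs):
    """
    2.7-compatible alternative (hypothetical helper): pop the known
    options from the `kwargs` dict and raise on anything left over,
    so a misspelled option is not silently ignored.
    """
    options = {
        "columns": kwargs.pop("columns", None),
        "transpose": kwargs.pop("transpose", False),
        "top_rule": kwargs.pop("top_rule", True),
        "bottom_rule": kwargs.pop("bottom_rule", True),
    }
    if kwargs:
        raise TypeError("unexpected keyword arguments: {}".format(
            ", ".join(sorted(kwargs))))
    return options
```

Inside an `__init__(self, *values, **kwargs)` the helper would be called as `pop_table_options(kwargs)`. With plain `kwargs.get`, as in the question, a typo such as `colums=` goes unnoticed; both variants here turn it into a `TypeError`.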

Question 4

The problem with inheriting from str in Cell is that it gives you many more options than you want. Your Cell can be represented as a str, but it is not a str, and that "is a" relationship is the main idea behind inheritance. If we look at all the methods str provides, I think we can see that inheriting from it is overkill. I think it would be better to override the __str__ method of your Cell object.

The type_ parameter in Cell could use more documentation, especially since it's a string. If I were to use your code, I couldn't figure out what to pass for this parameter without reading the implementation. You should consider documenting it in the docstring (even though I now realise Cell is more of an "internal" class, I think it would be useful).
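
A minimal sketch of both suggestions, assuming a plain class with `__str__` and a docstring that spells out the `type_` codes. The rounding step is left out, and only tuples are treated as ranges or value/uncertainty pairs, which is a simplification of the try/except in the question.

```python
class Cell(object):
    """
    One table cell (sketch, not the original class).

    type_ is a one-character code:
      "n": normal, shown via str()
      "r": range, a tuple shown as "low -- high"
      "v": value with uncertainty, a 2-tuple shown as "value $\\pm$ error"
    Non-tuple values are always shown via str().
    """

    def __init__(self, value, type_="n"):
        self.value = value
        self.type_ = type_

    def __str__(self):
        if isinstance(self.value, tuple):
            if self.type_ == "r":
                return " -- ".join(map(str, self.value))
            if self.type_ == "v":
                return "{} $\\pm$ {}".format(*self.value)
        return str(self.value)


row = [Cell((0, 10), "r"), Cell((214, 10), "v"), Cell(3, "v")]
# str.join only accepts real strings, so each cell is converted explicitly
line = " & ".join(str(cell) for cell in row) + "\\\\"
print(line)
```

The price of dropping the str base class is the explicit `str(cell)` in the join; in exchange, Cell no longer exposes every str method.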

Regarding the LatexTable class, I would consider separating the argument validation from the actual table construction code, like this:

def __init__(self, values, **kwargs):
    if kwargs.get('transpose', False):
        self.n_cols = len(values[0])
        self.rows = values
    else:
        self.n_cols = len(values)
        self.rows = zip(*values)
    # Load every option first, then validate them in one place
    self.columns = kwargs.get('columns', map(str, range(self.n_cols)))
    self.alignments = kwargs.get('alignments', "l" * self.n_cols)
    self.types = kwargs.get('types', "v" * self.n_cols)
    self.validate_parameters()
    self.preamble = ["\\begin{{tabular}}{{{}}}".format(
        self.alignments)]
    if kwargs.get('top_rule', True):
        self.preamble.append("\\toprule")
    self.postamble = ["\\end{tabular}"]
    if kwargs.get('bottom_rule', True):
        self.postamble = ["\\bottomrule"] + self.postamble

def validate_parameters(self):
    if len(self.columns) != self.n_cols:
        raise ValueError("columns does not have the same length as the values ({} for the values vs {} for the columns)".format(
            self.n_cols, len(self.columns)))
    if len(self.alignments.replace("|", "")) != self.n_cols:
        raise ValueError("alignments does not have the same length as the values ({} for the values vs {} for the alignments)".format(
            self.n_cols, len(self.alignments)))
    if len(self.types) != self.n_cols:
        raise ValueError("types does not have the same length as the values ({} for the values vs {} for the types)".format(
            self.n_cols, len(self.types)))

This way, the construction code is less cluttered with the validation, which shouldn't change much. I also rearranged the code so that we "load" all the parameters from kwargs at the beginning. One might argue that this does unnecessary work when validation fails, but the cost is so small that I think the gain in readability is worth it.

The one thing I would change about the unit tests is the method names. I read a great book about unit testing (sadly, I can't remember the title; it's been years), and one thing that stuck with me was its convention of naming tests this way:

method_whatsbeingtested_expectedresult

This way, when a test fails, you know right away what went wrong, and you basically don't have to read the unit test to understand where your code failed. Your code is small enough that it might not make a very big difference, but I think it's a good way of writing unit tests.
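
As an illustration of the naming pattern combined with the `test_` prefix that unittest looks for, here is a small self-contained sketch. `check_lengths` is a hypothetical stand-in for the length validation in LatexTable, used only so the example runs without the original module.

```python
import unittest


def check_lengths(n_cols, columns):
    """Hypothetical stand-in for the columns length check in LatexTable."""
    if len(columns) != n_cols:
        raise ValueError("columns does not have the same length as the values")
    return True


class TestCheckLengths(unittest.TestCase):
    # Pattern: test_<method>_<what is being tested>_<expected result>

    def test_check_lengths_matching_columns_returns_true(self):
        self.assertTrue(check_lengths(2, ["A", "B"]))

    def test_check_lengths_too_few_columns_raises_value_error(self):
        with self.assertRaises(ValueError):
            check_lengths(2, ["A"])
```

A failure report then starts with a name such as `test_check_lengths_too_few_columns_raises_value_error`, which already says which behaviour broke.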

Question 5

I always thought the methods need to start with test_ for unittest to find and run them automatically? In addition, if a test fails, the input, the expected output and the actual output are printed, so I see no need to also encode them in the method name.

Question 6

@Graipher Oops, yes, obviously if your testing framework looks for methods prefixed with test_, you should keep the prefix. The book in question was for C#, but the point still stands. I understand your point about the output, but if your unit tests are executed by an external tool, it might be easier to just see which test failed instead of having to "dig" through multiple outputs. It also makes it easier to see what a test covers without having to read the code, which I think is great for getting an overview of what's tested in the class.

IEatBagels · Answer 1 · 2019-08-15 14:56:18Z

