Collating list of dicts to table format

Question 1

I've got a list of dicts, data, that updates periodically and is persisted to a database. Every time data is updated, the gui needs to display the collated data in a table format. The columns of the table, in order, are TEST, RANGE and the dates of all possible dates that are represented in data.

test names may contain multiple words, upper and lower case characters, non alphanumeric characters and whitespace.

The range key may contain None, an empty string " ", or a 2 tuple of strings.

The rows of the table should be sorted alphabetically.

The date columns should be ordered chronologically as well.

Example of data in:

data = [
 {'date': '201410171311', 'test': 'Test b', 'value': 30, 'range': ('10', '30')},
 {'date': '201610310152', 'test': 'Test A', 'value': 6, 'range': '<=10'},
 {'date': '201410171311', 'test': 'Test A', 'value': 8, 'range': '<=10'},
 {'date': '201702062358', 'test': 'Test b', 'value': 15, 'range': ('10', '30')},
 {'date': '201610310152', 'test': 'Test b', 'value': 20, 'range': ('10', '30')},
 {'date': '201402162358', 'test': 'Test A', 'value': 3, 'range': '<=10'},
 {'date': '201510171311', 'test': 'Test b', 'value': 45, 'range': ('10', '30')},
]

Example of data structure used in the app:

collated_table_data = [
 OrderedDict([
 ('TEST', 'Test A'),
 ('RANGE', '<=10'),
 ('Feb. 16, 2014', 3),
 ('Oct. 17, 2014', 8),
 ('Oct. 17, 2015', None),
 ('Oct. 31, 2016', 6),
 ('Feb. 6, 2017', None)
 ]),
 OrderedDict([
 ('TEST', 'Test b'),
 ('RANGE', '10 - 30'),
 ('Feb. 16, 2014', None),
 ('Oct. 17, 2014', 30),
 ('Oct. 17, 2015', 45),
 ('Oct. 31, 2016', 20),
 ('Feb. 6, 2017', 15)])
]

My process seems convoluted and confusing. I had a hard time figuring out what I did after a few months away. I don't like how I manage dates. It seems like I could sort by int('201410171311') but that seemed to have some other consequences.

I'm not crazy about the efficiency of the code. I need to continually collate the data as it gets updated, so it gets called frequently. But speed hasn't been an issue so far.

Rows and columns need to be sorted by name and date.

from collections import OrderedDict
from datetime import datetime
from tabulate import tabulate
def dateobj(s):
 """
 Returns a date object from a string format of '201702270153'
 """
 return datetime(int(s[:4]), int(s[4:6]), int(s[6:8]))
def datestr(dt):
 """
 Returns a readable date string from a datetime object.
 """
 return dt.strftime('%b. %-d, %Y')
def collate_results(data):
 # Create datetime objects for easier sorting.
 dates = sorted(set([dateobj(t['date']) for t in data]))
 # Convert to readable format.
 dates = [datestr(d) for d in dates]
 column_names = ['TEST', 'RANGE'] + dates
 test_names = set([t['test'] for t in data])
 # Case insensitive sort. Test names may contain non alpha chars.
 test_names = sorted(list(test_names),
 key=lambda s: ''.join(ch.lower() for ch in s if ch.isalpha()))
 rows = []
 for name in test_names:
 name_tests = [t for t in data if t['test'] == name]
 row = OrderedDict.fromkeys(column_names, None)
 for item in name_tests:
 row['TEST'] = item['test']
 if type(item['range']) is tuple:
 row['RANGE'] = item['range'][0] + ' - ' + item['range'][1]
 elif type(item['range']) is str:
 row['RANGE'] = item['range']
 else:
 row['RANGE'] = None
 # item['date'] is in a string like, '201410171311'
 # So, it needs to be converted and associated with the correct column.
 row[datestr(dateobj(item['date']))] = item['value']
 rows.append(row)
 return rows
if __name__ == '__main__':
 data = [
 {'date': '201410171311', 'test': 'Test b', 'value': 30, 'range': ('10', '30')},
 {'date': '201610310152', 'test': 'Test A', 'value': 6, 'range': '<=10'},
 {'date': '201410171311', 'test': 'Test A', 'value': 8, 'range': '<=10'},
 {'date': '201702062358', 'test': 'Test b', 'value': 15, 'range': ('10', '30')},
 {'date': '201610310152', 'test': 'Test b', 'value': 20, 'range': ('10', '30')},
 {'date': '201402162358', 'test': 'Test A', 'value': 3, 'range': '<=10'},
 {'date': '201510171311', 'test': 'Test b', 'value': 45, 'range': ('10', '30')},
 ]
 collated_table_data = [
 OrderedDict([
 ('TEST', 'Test A'),
 ('RANGE', '<=10'),
 ('Feb. 16, 2014', 3),
 ('Oct. 17, 2014', 8),
 ('Oct. 17, 2015', None),
 ('Oct. 31, 2016', 6),
 ('Feb. 6, 2017', None)
 ]),
 OrderedDict([
 ('TEST', 'Test b'),
 ('RANGE', '10 - 30'),
 ('Feb. 16, 2014', None),
 ('Oct. 17, 2014', 30),
 ('Oct. 17, 2015', 45),
 ('Oct. 31, 2016', 20),
 ('Feb. 6, 2017', 15)])
 ]
 table = collate_results(data)
 print table == expected
 print tabulate(table, headers="keys")

Output:

 True
 TEST RANGE Feb. 16, 2014 Oct. 17, 2014 Oct. 17, 2015 Oct. 31, 2016 Feb. 6, 2017
 ------ ------- --------------- --------------- --------------- --------------- --------------
 Test A <=10 3 8 6
 Test b 10 - 30 30 45 20 15

Question 2

Sorry, that was a botched copy/paste. Fixed now.

Question 3

And for the table == expected part, should we read table == collated_table_data instead?

Question 4

You offered a "specification" in the form of example input / output. Your specification is far from clear. A few motivating sentences about the business problem would be helpful.

return datetime(int(s[:4]), int(s[4:6]), int(s[6:8]))

One might use strptime() instead, but I suppose this is clear enough. Thank you for the useful comment. Rather than dateobj(), one might name this parse_iso8601. To be very picky about the (quite helpful!) docstring, PEP8 would like to see it indented 4 fewer spaces, and end with a period.

dates = sorted(set([dateobj(t['date']) for t in data]))

This makes perfect sense. The set will discard duplicate dates. It isn't obvious to me that "discard dups" is part of your specification. Another way to phrase it would be sorted(list(..., which does not discard dups. I honestly don't know which one better addresses the business need. If set is appropriate, you could use {} rather than [].

dates = [datestr(d) for d in dates]

I don't find that this improves clarity. I'd rather see sorted(set([datestr(dateobj(t['date']))... above.

You have a pair of very nice, very clear date functions. I don't know, maybe you'd prefer to combine them into a single function, which the list comprehension calls?

test_names = set([t['test'] for t in data])

Same remark, feel free to use {} for set comprehension. No need to break this out as a separate line, you could merge it into the next statement, which currently creates list, then set, then list, then sorts it. The lambda expression seems more expensive, and harder to understand, than merely s.lower(). Your test data would be more interesting if it instead mentioned 'Test B' and 'Test a'.

The identifier name_tests appears to be misnamed. I think you meant something like rows, or filtered_rows, or (though I'm not crazy about the vague data name) filtered_data.

row = OrderedDict.fromkeys(column_names, None)

No need to specify the default of None. No biggie.

else:
 row['RANGE'] = None

I don't understand what this means, in the sense that your specification and your sample data shed no light on this case.

row[datestr(dateobj(item['date']))] = item['value']

Again, this seems to motivate merging your two date functions.

print table == expected

This is a perfectly nice way to express it. A similar idiom would be: assert expected == table, table. That way, if the equality test fails, you get to see the offending table result.

from tabulate import tabulate

This is an interesting pypi library module. Thank you for teaching me about it. PEP8 asks that you highlight that it's not a standard library by inserting a blank line between the datetime and tabulate imports.

Feel free to say print(x) instead of print x, so python3 can interpret your program, as well.

Question 5

My memory is a bit fuzzy as to the reason not to use s.lower(). I want to say it failed with an error or did not produce the desired order of the tests. (I don't have the real data in front of me to test). I also might have been bent on practicing lambdas. Thanks for the feed back.

J_H J_H 41.6k3 gold badges38 silver badges157 bronze badges · Accepted Answer · 2017-09-09 00:23:03Z

You offered a "specification" in the form of example input / output. Your specification is far from clear. A few motivating sentences about the business problem would be helpful.

return datetime(int(s[:4]), int(s[4:6]), int(s[6:8]))

One might use strptime() instead, but I suppose this is clear enough. Thank you for the useful comment. Rather than dateobj(), one might name this parse_iso8601. To be very picky about the (quite helpful!) docstring, PEP8 would like to see it indented 4 fewer spaces, and end with a period.

dates = sorted(set([dateobj(t['date']) for t in data]))

This makes perfect sense. The set will discard duplicate dates. It isn't obvious to me that "discard dups" is part of your specification. Another way to phrase it would be sorted(list(..., which does not discard dups. I honestly don't know which one better addresses the business need. If set is appropriate, you could use {} rather than [].

dates = [datestr(d) for d in dates]

I don't find that this improves clarity. I'd rather see sorted(set([datestr(dateobj(t['date']))... above.

You have a pair of very nice, very clear date functions. I don't know, maybe you'd prefer to combine them into a single function, which the list comprehension calls?

test_names = set([t['test'] for t in data])

Same remark, feel free to use {} for set comprehension. No need to break this out as a separate line, you could merge it into the next statement, which currently creates list, then set, then list, then sorts it. The lambda expression seems more expensive, and harder to understand, than merely s.lower(). Your test data would be more interesting if it instead mentioned 'Test B' and 'Test a'.

The identifier name_tests appears to be misnamed. I think you meant something like rows, or filtered_rows, or (though I'm not crazy about the vague data name) filtered_data.

row = OrderedDict.fromkeys(column_names, None)

No need to specify the default of None. No biggie.

else:
 row['RANGE'] = None

I don't understand what this means, in the sense that your specification and your sample data shed no light on this case.

row[datestr(dateobj(item['date']))] = item['value']

Again, this seems to motivate merging your two date functions.

print table == expected

This is a perfectly nice way to express it. A similar idiom would be: assert expected == table, table. That way, if the equality test fails, you get to see the offending table result.

from tabulate import tabulate

This is an interesting pypi library module. Thank you for teaching me about it. PEP8 asks that you highlight that it's not a standard library by inserting a blank line between the datetime and tabulate imports.

Feel free to say print(x) instead of print x, so python3 can interpret your program, as well.

My memory is a bit fuzzy as to the reason not to use s.lower(). I want to say it failed with an error or did not produce the desired order of the tests. (I don't have the real data in front of me to test). I also might have been bent on practicing lambdas. Thanks for the feed back.

Stack Exchange Network

Collating list of dicts to table format

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Collating list of dicts to table format

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions