I've got a list of dicts, data
, that updates periodically and is persisted to a database. Every time data
is updated, the gui needs to display the collated data in a table format. The columns of the table, in order, are TEST, RANGE and the dates of all possible dates that are represented in data
.
test
names may contain multiple words, upper and lower case characters, non alphanumeric characters and whitespace.
The range
key may contain None
, an empty string " ", or a 2 tuple of strings.
The rows of the table should be sorted alphabetically.
The date columns should be ordered chronologically as well.
Example of data in:
data = [
{'date': '201410171311', 'test': 'Test b', 'value': 30, 'range': ('10', '30')},
{'date': '201610310152', 'test': 'Test A', 'value': 6, 'range': '<=10'},
{'date': '201410171311', 'test': 'Test A', 'value': 8, 'range': '<=10'},
{'date': '201702062358', 'test': 'Test b', 'value': 15, 'range': ('10', '30')},
{'date': '201610310152', 'test': 'Test b', 'value': 20, 'range': ('10', '30')},
{'date': '201402162358', 'test': 'Test A', 'value': 3, 'range': '<=10'},
{'date': '201510171311', 'test': 'Test b', 'value': 45, 'range': ('10', '30')},
]
Example of data structure used in the app:
collated_table_data = [
OrderedDict([
('TEST', 'Test A'),
('RANGE', '<=10'),
('Feb. 16, 2014', 3),
('Oct. 17, 2014', 8),
('Oct. 17, 2015', None),
('Oct. 31, 2016', 6),
('Feb. 6, 2017', None)
]),
OrderedDict([
('TEST', 'Test b'),
('RANGE', '10 - 30'),
('Feb. 16, 2014', None),
('Oct. 17, 2014', 30),
('Oct. 17, 2015', 45),
('Oct. 31, 2016', 20),
('Feb. 6, 2017', 15)])
]
My process seems convoluted and confusing. I had a hard time figuring out what I did after a few months away. I don't like how I manage dates. It seems like I could sort by int('201410171311')
but that seemed to have some other consequences.
I'm not crazy about the efficiency of the code. I need to continually collate the data as it gets updated, so it gets called frequently. But speed hasn't been an issue so far.
Rows and columns need to be sorted by name and date.
from collections import OrderedDict
from datetime import datetime
from tabulate import tabulate
def dateobj(s):
"""
Returns a date object from a string format of '201702270153'
"""
return datetime(int(s[:4]), int(s[4:6]), int(s[6:8]))
def datestr(dt):
"""
Returns a readable date string from a datetime object.
"""
return dt.strftime('%b. %-d, %Y')
def collate_results(data):
# Create datetime objects for easier sorting.
dates = sorted(set([dateobj(t['date']) for t in data]))
# Convert to readable format.
dates = [datestr(d) for d in dates]
column_names = ['TEST', 'RANGE'] + dates
test_names = set([t['test'] for t in data])
# Case insensitive sort. Test names may contain non alpha chars.
test_names = sorted(list(test_names),
key=lambda s: ''.join(ch.lower() for ch in s if ch.isalpha()))
rows = []
for name in test_names:
name_tests = [t for t in data if t['test'] == name]
row = OrderedDict.fromkeys(column_names, None)
for item in name_tests:
row['TEST'] = item['test']
if type(item['range']) is tuple:
row['RANGE'] = item['range'][0] + ' - ' + item['range'][1]
elif type(item['range']) is str:
row['RANGE'] = item['range']
else:
row['RANGE'] = None
# item['date'] is in a string like, '201410171311'
# So, it needs to be converted and associated with the correct column.
row[datestr(dateobj(item['date']))] = item['value']
rows.append(row)
return rows
if __name__ == '__main__':
data = [
{'date': '201410171311', 'test': 'Test b', 'value': 30, 'range': ('10', '30')},
{'date': '201610310152', 'test': 'Test A', 'value': 6, 'range': '<=10'},
{'date': '201410171311', 'test': 'Test A', 'value': 8, 'range': '<=10'},
{'date': '201702062358', 'test': 'Test b', 'value': 15, 'range': ('10', '30')},
{'date': '201610310152', 'test': 'Test b', 'value': 20, 'range': ('10', '30')},
{'date': '201402162358', 'test': 'Test A', 'value': 3, 'range': '<=10'},
{'date': '201510171311', 'test': 'Test b', 'value': 45, 'range': ('10', '30')},
]
collated_table_data = [
OrderedDict([
('TEST', 'Test A'),
('RANGE', '<=10'),
('Feb. 16, 2014', 3),
('Oct. 17, 2014', 8),
('Oct. 17, 2015', None),
('Oct. 31, 2016', 6),
('Feb. 6, 2017', None)
]),
OrderedDict([
('TEST', 'Test b'),
('RANGE', '10 - 30'),
('Feb. 16, 2014', None),
('Oct. 17, 2014', 30),
('Oct. 17, 2015', 45),
('Oct. 31, 2016', 20),
('Feb. 6, 2017', 15)])
]
table = collate_results(data)
print table == expected
print tabulate(table, headers="keys")
Output:
True
TEST RANGE Feb. 16, 2014 Oct. 17, 2014 Oct. 17, 2015 Oct. 31, 2016 Feb. 6, 2017
------ ------- --------------- --------------- --------------- --------------- --------------
Test A <=10 3 8 6
Test b 10 - 30 30 45 20 15
1 Answer 1
You offered a "specification" in the form of example input / output. Your specification is far from clear. A few motivating sentences about the business problem would be helpful.
return datetime(int(s[:4]), int(s[4:6]), int(s[6:8]))
One might use strptime()
instead, but I suppose this is clear enough. Thank you for the useful comment. Rather than dateobj()
, one might name this parse_iso8601
. To be very picky about the (quite helpful!) docstring, PEP8 would like to see it indented 4 fewer spaces, and end with a period.
dates = sorted(set([dateobj(t['date']) for t in data]))
This makes perfect sense. The set
will discard duplicate dates. It isn't obvious to me that "discard dups" is part of your specification. Another way to phrase it would be sorted(list(...
, which does not discard dups. I honestly don't know which one better addresses the business need. If set
is appropriate, you could use {}
rather than []
.
dates = [datestr(d) for d in dates]
I don't find that this improves clarity. I'd rather see sorted(set([datestr(dateobj(t['date']))...
above.
You have a pair of very nice, very clear date functions. I don't know, maybe you'd prefer to combine them into a single function, which the list comprehension calls?
test_names = set([t['test'] for t in data])
Same remark, feel free to use {}
for set comprehension. No need to break this out as a separate line, you could merge it into the next statement, which currently creates list, then set, then list, then sorts it. The lambda expression seems more expensive, and harder to understand, than merely s.lower()
. Your test data would be more interesting if it instead mentioned 'Test B' and 'Test a'.
The identifier name_tests
appears to be misnamed. I think you meant something like rows
, or filtered_rows
, or (though I'm not crazy about the vague data
name) filtered_data
.
row = OrderedDict.fromkeys(column_names, None)
No need to specify the default of None. No biggie.
else:
row['RANGE'] = None
I don't understand what this means, in the sense that your specification and your sample data shed no light on this case.
row[datestr(dateobj(item['date']))] = item['value']
Again, this seems to motivate merging your two date functions.
print table == expected
This is a perfectly nice way to express it. A similar idiom would be: assert expected == table, table
. That way, if the equality test fails, you get to see the offending table result.
from tabulate import tabulate
This is an interesting pypi library module. Thank you for teaching me about it. PEP8 asks that you highlight that it's not a standard library by inserting a blank line between the datetime and tabulate imports.
Feel free to say print(x)
instead of print x
, so python3 can interpret your program, as well.
-
\$\begingroup\$ My memory is a bit fuzzy as to the reason not to use
s.lower()
. I want to say it failed with an error or did not produce the desired order of the tests. (I don't have the real data in front of me to test). I also might have been bent on practicing lambdas. Thanks for the feed back. \$\endgroup\$Mox– Mox2017年09月09日 01:34:29 +00:00Commented Sep 9, 2017 at 1:34
table == expected
part, should we readtable == collated_table_data
instead? \$\endgroup\$