Is there a way to refactor the following Python code so that the number of lines in generation of the dictionary can be reduced?
data = frame['data'].split(';')
cat_data = {
'xbee_address': '\x00' + frame['source_addr'],
'state': data[0],
'flow_rate': data[1],
'operating_hours': data[2],
'operating_minutes': data[3],
'operating_seconds': data[4],
'hours_until_maintenance': data[5],
'number_of_cleansing': data[6],
'serial_number': data[7],
'item_number': data[8],
'month_of_manufacturing': data[9],
'year_of_manufacturing': data[10],
'software_version': data[11],
}
2 Answers 2
TL;DR: use {k: v for k, v in zip(names, data)}
where names
is a list of names for your values, or dict(zip(names, data))
(which I personally find less readable).
There are two sources of complexity in your code (and I mean human-percieved complexity, not computational complexity):
details, which hide the big picture of what's going on;
the fact that the values in
data
have names, and those names depend on the order of thedata
list.
The 1st is easy to solve, just split everything into functions, each of which does one thing only, on the appropriate level of abstraction.
The 2nd is more tricky, since even if you define a tuple of names for your values, there's still an implicit relationship left between the order of values in the frame['data']
string, and the order of values in your tuple. Those kinds of implicit relations are not obvious to someone who doesn't know as much as you do about your code (i.e. to everyone else).
Imagine that the structure of the string in frame['data']
changes. Then, your code will be broken, probably in a very unobvious way, and in a totally unrelated place. It's stuff like this that creates spaghetti code.
I don't know where the frame
comes from, so it's hard to give a good advice here. In general, you can pack the frame
and the tuple of names into another data structure where you create the frame
, and never use the frame
alone. Or, make the tuple of names a part of frame
. In any case, you want to keep them bound together.
Or, you may encapsulate frame
's contents in a class. This class can have a method which returns data in the appropriate dictionary form. The order of the names can be a private variable, encapsulated inside the class.
If I had to stay with frame
as it is, I'd do something like this (the names of the functions can be chosen much better, I don't know enough context to come up with good names):
NAMES = (
'state',
'flow_rate',
'operating_hours',
)
def get_cat_data(frame):
data = _get_data(frame)
base = _data_to_dict(data, NAMES)
base.update(_get_xbee_address(frame))
return base
def _get_data(frame):
return frame['data'].split(';')
def _data_to_dict(data, names):
return {k: v for k, v in zip(names, data)}
def _get_xbee_address(frame):
return {'xbee_address': '\x00' + frame['source_addr']} #consider naming the string literal and using a constant instead
I hope this helps.
-
\$\begingroup\$ Yes this helps. Unfortunately i cannot change 'frame' since it is a string of numbers that is coming from a zigbee device as payload data \$\endgroup\$Simon Kemper– Simon Kemper2016年10月14日 14:27:29 +00:00Commented Oct 14, 2016 at 14:27
Yes, consider creating a tuple (or list, does not really matter her) of all your different fields.
Then you can just iterate over that tuple and use enumerate
to get the index of data it corresponds to. Then you can put this into a dictionary-comprehension to build your dictionary.
At the end we just need to manually set the only field not given by an element of data
, 'xbee_address'
:
fields = 'state', ..., 'software_version'
data = frame['data'].split(';')
cat_data = {field: data[i] for i, field in enumerate(fields)}
cat_data['xbee_address'] = '\x00' + frame['source_addr']
Another possibility would be to use the built-in zip
:
cat_data = dict(zip(fields, data))
with the rest as above. This will also work if data
is longer than fields
, because zip
stops at the end of the shorter iterable.
Both versions only work if there are no holes in data
, so e.g. this is more complicated to replicate in this way:
cat_data = {
'xbee_address': '\x00' + frame['source_addr'],
'state': data[0],
'hours_until_maintenance': data[5],
'software_version': data[11],
}