I am developing a framework that lets users specify a machine learning model via a YAML
file, with the parameters nested so that the configuration files are easy for humans to read.
I would like to give users the option of supplying a list of values to try for a parameter instead of a single value.
I then have to generate all the valid combinations of the parameters for which the user has given multiple values.
To distinguish parameters whose value is genuinely a list from parameters that hold multiple values to try, I have chosen to prefix the names of the latter with 'multi_' (though if you have a different take I would be interested to hear it!).
So, for example, a user could write:
config = {
    'train_config': {'param1': 1, 'param2': [1,2,3], 'multi_param3':[2,3,4]},
    'model_config': {'cnn_layers': [{'units':3},{'units':4}], 'multi_param4': [[1,2], [3,4]]}
}
This indicates that 6 configuration files must be generated, with 'param3' and 'param4' taking every possible combination of their values (3 values times 2 values).
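To make the count concrete: only the two 'multi_' entries are expanded, so the number of configurations is the size of their Cartesian product. A small sketch with itertools.product (the variable names are just for illustration):

```python
import itertools

# The two 'multi_' entries from the example config above.
param3_options = [2, 3, 4]
param4_options = [[1, 2], [3, 4]]

# Every pairing of one param3 value with one param4 value.
combinations = list(itertools.product(param3_options, param4_options))

print(len(combinations))  # 6
print(combinations[0])    # (2, [1, 2])
```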
I have written a generator function to do this:
# nested_to_record is a private pandas helper; its module path differs between pandas versions
from pandas.io.json.normalize import nested_to_record
import itertools
import operator
from functools import reduce
from collections.abc import MutableMapping
from contextlib import suppress

def generate_multi_conf(config):
    flat = nested_to_record(config)
    flat = { tuple(key.split('.')): value for key, value in flat.items()}
    multi_config_flat = { key[:-1] + (key[-1][6:],) : value for key, value in flat.items() if key[-1][:5]=='multi'}
    if len(multi_config_flat) == 0: return # if there are no multi params this generator is empty
    keys, values = zip(*multi_config_flat.items())
    # delete the multi_params
    # taken from https://stackoverflow.com/a/49723101/4841832
    def delete_keys_from_dict(dictionary, keys):
        for key in keys:
            with suppress(KeyError):
                del dictionary[key]
        for value in dictionary.values():
            if isinstance(value, MutableMapping):
                delete_keys_from_dict(value, keys)
    to_delete = ['multi_' + key[-1] for key, _ in multi_config_flat.items()]
    delete_keys_from_dict(config, to_delete)
    for values in itertools.product(*values):
        experiment = dict(zip(keys, values))
        for setting, value in experiment.items():
            reduce(operator.getitem, setting[:-1], config)[setting[-1]] = value
        yield config
Iterating over this with the example above gives:
{'train_config': {'param1': 1, 'param2': [1, 2, 3], 'param3': 2}, 'model_config': {'cnn_layers': [{'units': 3}, {'units': 4}], 'param4': [1, 2]}}
{'train_config': {'param1': 1, 'param2': [1, 2, 3], 'param3': 2}, 'model_config': {'cnn_layers': [{'units': 3}, {'units': 4}], 'param4': [3, 4]}}
{'train_config': {'param1': 1, 'param2': [1, 2, 3], 'param3': 3}, 'model_config': {'cnn_layers': [{'units': 3}, {'units': 4}], 'param4': [1, 2]}}
{'train_config': {'param1': 1, 'param2': [1, 2, 3], 'param3': 3}, 'model_config': {'cnn_layers': [{'units': 3}, {'units': 4}], 'param4': [3, 4]}}
{'train_config': {'param1': 1, 'param2': [1, 2, 3], 'param3': 4}, 'model_config': {'cnn_layers': [{'units': 3}, {'units': 4}], 'param4': [1, 2]}}
{'train_config': {'param1': 1, 'param2': [1, 2, 3], 'param3': 4}, 'model_config': {'cnn_layers': [{'units': 3}, {'units': 4}], 'param4': [3, 4]}}
This is the expected result.
Any feedback on how to make this code more readable would be very much appreciated!
1 Answer
For non-trivial comprehensions such as
multi_config_flat = { key[:-1] + (key[-1][6:],) : value for key, value in flat.items() if key[-1][:5]=='multi'}
you should split the expression across multiple lines, i.e.
multi_config_flat = {key[:-1] + (key[-1][6:],): value
                     for key, value in flat.items()
                     if key[-1][:5]=='multi'}
This:
key[-1][:5]=='multi'
should be
key[-1].startswith('multi')
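A small sketch of the difference, using a hypothetical PREFIX constant that is not in the original code. Note that the original strips six characters (key[-1][6:]), so the prefix actually being removed is 'multi_'; testing for the full prefix keeps the check and the slice in agreement.

```python
PREFIX = 'multi_'  # hypothetical constant, not in the original code

key = ('train_config', 'multi_param3')

# Slice comparison: the magic number 5 must be kept in sync with the literal.
print(key[-1][:5] == 'multi')      # True

# startswith states the intent directly and needs no length.
print(key[-1].startswith(PREFIX))  # True

# Stripping by len(PREFIX) avoids the separate magic number 6.
print(key[-1][len(PREFIX):])       # param3
```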
This:
if len(multi_config_flat) == 0: return
is equivalent (more or less) to
if not multi_config_flat:
    return
The latter also catches the case of multi_config_flat being None, but that won't be possible in this context.
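The "more or less" can be illustrated with a short sketch: both tests agree for an empty dict, but only the truthiness test also accepts None.

```python
multi_config_flat = {}

# Both forms agree for an empty dict.
print(len(multi_config_flat) == 0)  # True
print(not multi_config_flat)        # True

# None is also falsy, whereas len(None) raises TypeError.
print(not None)                     # True
try:
    len(None)
    len_of_none_failed = False
except TypeError:
    len_of_none_failed = True
print(len_of_none_failed)           # True
```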
This:
for key, _ in multi_config_flat.items():
is not necessary; simply iterate over the keys:
for key in multi_config_flat:
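As a quick check that the two forms build the same list, here is a sketch using a hand-written multi_config_flat that mirrors the example from the question:

```python
multi_config_flat = {
    ('train_config', 'param3'): [2, 3, 4],
    ('model_config', 'param4'): [[1, 2], [3, 4]],
}

# Unpacking items() and discarding the value...
via_items = ['multi_' + key[-1] for key, _ in multi_config_flat.items()]
# ...gives the same result as iterating over the dict directly.
via_keys = ['multi_' + key[-1] for key in multi_config_flat]

print(via_items == via_keys)  # True
print(via_keys)               # ['multi_param3', 'multi_param4']
```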
This is fairly opaque:
reduce(operator.getitem, setting[:-1], config)[setting[-1]] = value
You should probably assign the output of reduce to a meaningfully named variable so that your code is clearer.
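Putting the points above together, here is one possible sketch rather than a definitive rewrite. It departs from the original in ways that are assumptions of this sketch: it walks the nested dict directly instead of using pandas' nested_to_record, the helper names find_multi_params and strip_multi_params and the PREFIX constant are made up for illustration, and each yielded configuration is a fresh deep copy, so the caller's config is left unmodified and the results can be collected into a list without every entry being the same dict object.

```python
import copy
import itertools
import operator
from functools import reduce

PREFIX = 'multi_'  # marker chosen in the question


def find_multi_params(config, path=()):
    """Yield (path, values) for every key that starts with PREFIX.

    path is a tuple of keys leading to the parameter, with the prefix
    already stripped from its last element.
    """
    for key, value in config.items():
        if key.startswith(PREFIX):
            yield path + (key[len(PREFIX):],), value
        elif isinstance(value, dict):
            yield from find_multi_params(value, path + (key,))


def strip_multi_params(config):
    """Return a deep copy of config without any PREFIX-ed keys."""
    stripped = {}
    for key, value in config.items():
        if key.startswith(PREFIX):
            continue
        if isinstance(value, dict):
            stripped[key] = strip_multi_params(value)
        else:
            stripped[key] = copy.deepcopy(value)
    return stripped


def generate_multi_conf(config):
    multi_params = dict(find_multi_params(config))
    if not multi_params:
        return  # no multi params: the generator is empty
    base = strip_multi_params(config)
    paths = list(multi_params)
    for combination in itertools.product(*multi_params.values()):
        experiment = copy.deepcopy(base)
        for path, value in zip(paths, combination):
            # The named intermediate makes the nested assignment readable.
            parent = reduce(operator.getitem, path[:-1], experiment)
            parent[path[-1]] = value
        yield experiment
```

With the example config from the question, list(generate_multi_conf(config)) should contain the six configurations shown there, in the same order.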
Comment by Stephen Rauch (Dec 24, 2018 at 21:30):
for key in multi_config_flat.keys() can simply be for key in multi_config_flat, as the default iterator is keys().