I have a simulation in Python which reads its configuration from a toml file. Since I have tons of parameters, the toml file can grow quite large.
This is an example file, similar in structure to my actual configuration file:
city.toml
name = 'New York'
[[houses]]
address = 'Foo Street 42'
color = 'red'
[[houses.residents]]
name = 'John'
age = 35
[[houses.residents]]
name = 'Mary'
age = 32
# [[houses]]
# etc ...
# [[houses.residents]]
# etc ...
Once loaded in Python, this turns into a dictionary similar to this:
city = {
'name': 'New York',
'houses': [
{
'address': 'Foo Street 42',
'color': 'red',
'residents': [
{
'name': 'John',
'age': 35,
},
{
'name': 'Mary',
'age': 32,
},
],
},
],
}
My issue is that the toml file can be quite repetitive. For example, if the user wants to simulate multiple residents identical to john = {'name': 'John', 'age': 35}, they have to go to each place where a house is defined and copy-paste the value a bunch of times:
[[houses]]
[[houses.residents]]
name = 'John'
age = 35
[[houses]]
[[houses.residents]]
name = 'John'
age = 35
[[houses]]
[[houses.residents]]
name = 'John'
age = 35
# [[houses]]
# etc ...
# [[houses.residents]]
# etc ...
which is both time-consuming and error-prone. In particular, the user may run a simulation that is valid but has undesired parameters, and only notice it later (if at all).
I'm thinking of solving this issue by "modularizing" the repeatable parameters. Something like:
# etc ...
[[houses]]
# etc ...
residents = ['John', 'Mary']
[[houses]]
# etc ...
residents = ['John']
[[houses]]
# etc ...
residents = ['John', 'Mary', 'James']
My idea is that, after parsing the main toml file, the Python code would be responsible for reading the strings in the residents array and loading the respective residents from individual toml files, like:
John.toml
name = 'John'
age = 35
Mary.toml
name = 'Mary'
age = 32
The final Python dictionary would then be constructed at runtime, merging different "submodules" of the configuration file.
My questions:
- Is this a good pattern to follow? Is this approach actually used anywhere?
- Are there significant drawbacks? (One I can think of is passing a value like residents = ['Mark'] without an existing Mark.toml file - the code would have to deal with these situations somehow.)
- Are there any alternative solutions that I have not considered?
- Allow code? Switch to a full Python snippet? – Thorbjørn Ravn Andersen, May 29, 2022 at 9:03
5 Answers
This doesn’t look like a configuration file to me, but a database. So use a database.
- Databases are files on steroids ;) – Thomas Junk, May 27, 2022 at 7:31
- I've considered using a database, but the issue is that it becomes much harder for users to edit the parameters, compared to plain text files. Of course I could create a frontend that gives users a simple way to create/save/load the different city/houses/resident presets. It sounds a bit like scope creep right now, but perhaps it's unavoidable. – jfaccioni, May 27, 2022 at 12:33
- @jfaccioni The database can be populated in code. – Thorbjørn Ravn Andersen, May 29, 2022 at 9:04
Your problem is not whether to modularize a configuration file or not. Your problem is managing its content.
Independent of the structure - modularized or not - you should work on a UI to manage your configuration. The configuration may be human-readable, but it is better generated automatically.
Putting the data into a database, as @gnasher729 suggested, is perhaps not a bad idea. It keeps the configuration part separate from the data part.
Edit:
To clarify:
UI doesn't necessarily mean a GUI or a TUI. It could be a command-line interface, which still helps the user interact with the configuration in a safe and sane way.
The database could be something as simple as SQLite.
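To give an idea of how small the SQLite variant could be, here is a sketch using Python's built-in sqlite3 module. The table and column names are invented for illustration, and the second house ('Bar Avenue 7') is made-up sample data that is not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path here would persist the data
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the references
conn.executescript("""
    CREATE TABLE residents (name TEXT PRIMARY KEY, age INTEGER NOT NULL);
    CREATE TABLE houses (id INTEGER PRIMARY KEY, address TEXT NOT NULL);
    CREATE TABLE house_residents (
        house_id INTEGER NOT NULL REFERENCES houses(id),
        resident TEXT NOT NULL REFERENCES residents(name)
    );
""")

# Every resident is defined once ...
conn.executemany("INSERT INTO residents VALUES (?, ?)", [("John", 35), ("Mary", 32)])
conn.executemany(
    "INSERT INTO houses VALUES (?, ?)",
    [(1, "Foo Street 42"), (2, "Bar Avenue 7")],
)
# ... and houses only refer to residents by name.
conn.executemany(
    "INSERT INTO house_residents VALUES (?, ?)",
    [(1, "John"), (1, "Mary"), (2, "John")],
)

rows = conn.execute("""
    SELECT h.address, r.name, r.age
    FROM house_residents AS hr
    JOIN houses AS h ON h.id = hr.house_id
    JOIN residents AS r ON r.name = hr.resident
    ORDER BY h.id, r.name
""").fetchall()
```

With foreign keys enabled, a reference to an undefined resident should be rejected by the database at insert time, which covers the missing Mark.toml case from the question.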
- Thanks for the comment - I am a bit reluctant to go with the UI + DB approach because it seems like such scope creep right now, but I'm considering whether it is perhaps unavoidable, given the complexity of the simulation and its parameters. – jfaccioni, May 27, 2022 at 12:44
This is a reasonable way to avoid repeating the same resident. You will need to ensure that each resident has a unique name you can use to refer to them, and (as you noted) ensure that each name actually exists, but doing this will most likely be a simple matter of raising and catching exceptions.
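As a sketch of that existence check, assuming the parsed config is a plain dict and the set of known resident names has already been collected (from file names or elsewhere). The helper names find_missing_residents and validate are hypothetical.

```python
def find_missing_residents(city: dict, known_names: set[str]) -> dict[int, list[str]]:
    """Map each house index to the resident names that have no definition."""
    missing = {}
    for index, house in enumerate(city.get("houses", [])):
        unknown = [n for n in house.get("residents", []) if n not in known_names]
        if unknown:
            missing[index] = unknown
    return missing


def validate(city: dict, known_names: set[str]) -> None:
    """Raise one error listing every undefined reference, not just the first."""
    missing = find_missing_residents(city, known_names)
    if missing:
        details = "; ".join(
            f"house {index}: {', '.join(names)}" for index, names in missing.items()
        )
        raise ValueError(f"undefined residents ({details})")


city = {"houses": [{"residents": ["John", "Mary"]}, {"residents": ["Mark"]}]}
known = {"John", "Mary"}

try:
    validate(city, known)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Reporting all missing names in one pass saves the user from fixing one typo per run.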
The only other drawback is that this might become burdensome when you have a resident that will only be included in one house, since you will have to modify either multiple files or the same file in multiple locations if you decide to merge the configs.
This sounds very similar to the way some network managers handle interfaces. You don't have to re-define each interface when routing traffic from a subnet through eth1; you only define them once and then refer to them by name.
All in all I think it largely depends on how common you think resident re-use might be. If re-use is more common than single-use, then modularize; otherwise, leave it as is.
- If there are two people named Mark, how do you ensure there is only one? Using a bullet? – gnasher729, May 27, 2022 at 7:05
- @gnasher729 a bullet, add a last name, initial, number, or potentially even add a new ID field that is guaranteed to be unique among all residents. – joshmeranda, May 27, 2022 at 11:55
- @gnasher729 there should only be a single Mark.toml file in the directory. Of course in real life there are multiple people named "Mark", but from the simulation's perspective it's simply a given preset of values (for what it's worth I'm not simulating humans in residences, it was simply an analogy). – jfaccioni, May 27, 2022 at 12:35
Can there be two different Johns?
The identifier must be the same as the filename; must it also be the same as the name?
In general, age is a really poor field to persist. Consider a date instead, and calculating age when needed.
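A small sketch of the date-based variant, assuming a hypothetical birth_date field replaces age in the preset. TOML has a native date type, so a value like birth_date = 1987-06-15 would reach Python as a datetime.date when parsed with tomllib; the dates below are made-up examples.

```python
from datetime import date


def age_on(birth_date: date, today: date) -> int:
    """Return the age in whole years on the given day."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


before_birthday = age_on(date(1987, 6, 15), date(2022, 5, 27))
on_birthday = age_on(date(1987, 5, 27), date(2022, 5, 27))
```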
Since one of your reasons for modularizing is to reduce duplication, consider still allowing the inline pattern at the discretion of the user. Sometimes a separate file is just overhead.
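One way to support both styles, sketched on already-parsed data: a string is treated as a preset name, a table is taken as an inline definition. The names PRESETS and resolve_resident are hypothetical, and 'James' with age 41 is made-up sample data. In TOML 1.0 such a mixed array could be written as residents = ['John', {name = 'James', age = 41}].

```python
# Presets as they might look after loading the individual files (assumed shape).
PRESETS = {
    "John": {"name": "John", "age": 35},
    "Mary": {"name": "Mary", "age": 32},
}


def resolve_resident(entry, presets: dict) -> dict:
    """Accept either a preset name (str) or an inline definition (dict)."""
    if isinstance(entry, str):
        if entry not in presets:
            raise ValueError(f"unknown resident preset {entry!r}")
        return dict(presets[entry])  # copy, so houses do not share one dict
    if isinstance(entry, dict):
        return entry
    raise TypeError(f"resident must be a name or a table, got {type(entry).__name__}")


house = {"residents": ["John", {"name": "James", "age": 41}]}
resolved = [resolve_resident(entry, PRESETS) for entry in house["residents"]]
```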
- "Can there be two different Johns" - not really, it's just a way of indicating which preset file you want to fill in a given value. Of course the user could name them John_surname1 and John_surname2 if they wanted, but that's probably not really needed (I'm not actually simulating people, so the issue of having two different presets referred by the same "name" should not come up). Also, "The identifier must be the same as the filename, must it also be the same as the name" - no, the preset doesn't even need to have the "name" key. – jfaccioni, May 27, 2022 at 12:40
- I've considered allowing users to declare values in both ways - either through raw values or through a named preset. It might be worth it to include this flexibility. – jfaccioni, May 27, 2022 at 12:42
Your approach of separating house and resident definitions is a sound one, but managing such a large set of files is not really easy either.
What I would suggest is, while keeping the [[houses]] array as you proposed in your question, make a table [people] (in the same or a separate file), used like this:
[people.john]
age = 35
[people.mary]
age = 32
You still have a clear separation between house and resident definitions, while having everything in one or two places, making changing it much easier.
Missing references are something that you need to handle, but that should not be hard. In my proposal, people would become a key-value mapping (most likely a Python dict) with keys like john or mary.
The only downside here is that people unfamiliar with TOML could create residents with names that do not parse as intended - for example using a name like john.doe without quoting it, which actually creates a nested table.
Further reading: TOML's table specification.