This is an actual work problem we had to solve. Put simply: given a structure (e.g. nested dictionaries) and a mapping of old dictionary keys to new ones, produce a new structure that is anatomically identical to the original, uses the new dictionary keys, and preserves every other value.
- How to encode this mapping?
- How to go about the translation?
Context
We receive these dictionaries in the form of JSON files through an API and, because of external constraints, the sender doesn't have access to our internal nomenclature system. So we need to convert the names ourselves.
Assembling the mappings is quite a laborious manual effort, as it involves figuring out semantics and talking to people. We are obviously working on a better solution, but these constraints will hold us for a while longer.
Details
Suppose a system that receives JSON messages such as
msg = {
"id": 1,
"summary": {
"origin": {
"url": "url",
"slug": "slug"
},
"tags": ["a", "b"]
},
"items": [
{
"id": "abc",
"price": 50
},
{
"id": "def",
"price": 110,
"discount": 50
}
]
}
But in order to move the data forward, the names of the dictionary keys must follow a specific nomenclature. So they must be translated, like so:
translated_msg = {
"IDENTIF": 1,
"SUMM": {
"ORIG": {
"WEBADDRESS": "url",
"LOCATOR": "slug"
},
"TAGS": ["a", "b"]
},
"PURCHASEDGOODS": [
{
"GOODSID": "abc",
"GOODSPRICE": 50
},
{
"GOODSID": "def",
"GOODSPRICE": 110,
"GIVENDISCOUNT": 50
}
]
}
The new terminology comes from a translation dictionary that has to be manually built by someone who is familiar with the data and with the nomenclature to be followed. This field map must also encode the anatomy of the original structure, because there may be multiple fields with the same name at different depths. Notice the two id fields above.
Solution
With all this in mind, here is a field map structure which fits the criteria. Its syntax is part of the solution I came up with and can be modified.
field_map = {
"/id": "IDENTIF",
"/summary": "SUMM",
"/summary/origin": "ORIG",
"/summary/origin/url": "WEBADDRESS",
"/summary/origin/slug": "LOCATOR",
"/summary/tags": "TAGS",
"/items": "PURCHASEDGOODS",
"/items//id": "GOODSID",
"/items//price": "GOODSPRICE",
"/items//discount": "GIVENDISCOUNT",
}
Notice that /items//discount has two slashes in the middle. Each slash represents going one level deeper within the structure; the empty segment between the two slashes corresponds to the list level, since list elements have no names of their own.
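To make the path convention concrete, here is a small helper I sketched purely for illustration (it is not part of the solution below): it walks a structure and yields every dictionary-key path in this syntax, which is handy for checking a field map for missing entries.

def iter_key_paths(struct, prefix=""):
    # Yield every dictionary-key path of `struct` in the field-map syntax.
    if isinstance(struct, dict):
        for key, value in struct.items():
            path = f"{prefix}/{key}"
            yield path
            yield from iter_key_paths(value, path)
    elif isinstance(struct, (list, tuple)):
        for item in struct:
            # a list level contributes an empty segment, hence the double slash
            yield from iter_key_paths(item, prefix + "/")

# sorted(set(iter_key_paths(msg))) yields exactly the keys of field_map above:
# '/id', '/items', '/items//discount', '/items//id', '/items//price',
# '/summary', '/summary/origin', '/summary/origin/slug', ...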
Inspired by https://stackoverflow.com/a/40857703/10504841, here is a recursive solution that, given a structure and a field map, walks through the entire structure and builds a translated copy:
from typing import Iterable, Union
def is_valid_iterable(struct):
return isinstance(struct, Iterable) and not isinstance(
struct, (str, bytes)
)
def is_key_in_dict(key, dict_):
try:
_ = dict_[key]
return True
except KeyError:
return False
def translate_nested_structure(
structure: Union[dict, list, tuple], trans_dict: dict, prefix: str = ""
) -> Union[dict, list, tuple]:
"""
Translate dictionary keys in a nested structure using a translation
dictionary. Maintains the same structure and primitive values.
Useful for translating JSON and Avro messages
If a key is present in the structure but not in the translation dictionary,
it is understood as undesired and removed from the output structure
If a (sub)structure is made of only lists or tuples, the output
is simply a copy of the given (sub)structure
Supported types and content limitation for dictionary (sub)structures
------------------------------------------------------
Key fields can be of any primitive type or None.
Tuple keys are somewhat supported, but not fully tested and not documented.
"/" are not allowed inside string keys, see translation dictionary syntax
Value field can be lists, tuples, dicts, any primitive or None
Translation dictionary syntax
------------------------------
The translation dictionary must capture the anatomy of the nested
structure, as different nested keys may share the same name.
The syntax for the translation dictionary keys is made of
"/"s and `orig_key`s.
"/" are used to indicate going deeper whithin the strucure,
so "/" may not be present inside string keys in the structure.
Also, the number of preceding "/" should match the nesting level
of the (sub)structure
`orig_key`s are pieces of string which contain
the name of the original key in the structure.
The syntax for the keys is easier to understand if read backwards:
every key must end with an `orig_key`, since those are
what need to be translated. A single preceding "/"
indicates `orig_key` is a key inside another dictionary
(e.g. "/start/in_a_dict"). In this case,
unless `orig_key` is the first key (e.g. "/test"), the "/"
must be preceded by another `orig_key` (e.g. "/start/test").
Multiple preceding "/" indicate `orig_key` is inside a
list or tuple (e.g. "/start//in_a_list", "//start").
Since the translation dictionary values contain the desired
new translated (sub)structure keys, the syntax and supported types are
the same as the original structure syntax for keys. See above
Parameters
----------
structure: [dict | list | tuple]
Nested dict, list or tuple.
trans_dict: dict
Translation dictionary, see example below.
prefix: str
Prefix used to find keys in the translation dictionary, leave blank
Returns
-------
translated_structure: [dict, list, tuple]
Same structure, but with translated dictionary keys
Examples
--------
>>> sample_msg = {
... "a": {
... "b": ["c", "d"],
... "e": [
... {
... "f": {"g": "h"},
... },
... {
... "f": {"g": "h", "g2": "h2"},
... },
... ],
... "i": None,
... "j": [],
... },
... }
>>> sample_translated_msg = {
... "aaaa": {
... "bbbb": ["c", "d"],
... "eeee": [
... {
... "ffff": {"gggg": "h"},
... },
... {
... "ffff": {"gggg": "h", "gggg2222": "h2"},
... },
... ],
... "iiii": None,
... "jjjj": [],
... },
... }
>>> sample_field_map = {
... "/a": "aaaa",
... "/a/b": "bbbb",
... "/a/e": "eeee",
... "/a/e//f": "ffff",
... "/a/e//f/g": "gggg",
... "/a/e//f/g2": "gggg2222",
... "/a/i": "iiii",
... "/a/j": "jjjj",
... }
>>> translated_msg = translate_nested_structure(
... sample_msg, sample_field_map
... )
>>> translated_msg == sample_translated_msg
True
TODO
----
- Improve the trans dict syntax?
"""
def translate_dict(dict_struct, trans_dict, prefix=""):
if not isinstance(dict_struct, dict):
raise TypeError("Expect dict, received %s", type(dict_struct))
new_dict = dict()
for key, value in dict_struct.items():
new_prefix = "/".join([prefix, str(key)])
if not is_key_in_dict(new_prefix, trans_dict):
continue
new_key = trans_dict[new_prefix]
if is_valid_iterable(value):
new_value = translate_nested_structure(
value, trans_dict, new_prefix
)
else:
new_value = value
new_dict[new_key] = new_value
return new_dict
def translate_simple_struct(simple_struct, trans_dict, prefix=""):
if not isinstance(simple_struct, (list, tuple)):
raise TypeError(
"Expect list or tuple, received %s", type(simple_struct)
)
cls_ = type(simple_struct)
new_simple_struct = cls_([])
for item in simple_struct:
new_prefix = "/".join([prefix, ""])
if is_valid_iterable(item):
new_item = translate_nested_structure(
item, trans_dict, new_prefix
)
else:
new_item = item
new_simple_struct += cls_([new_item])
return new_simple_struct
if isinstance(structure, dict):
return translate_dict(structure, trans_dict, prefix)
else:
return translate_simple_struct(structure, trans_dict, prefix)
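For reference, applying the function to the msg and field_map from the top of the question should reproduce translated_msg:

result = translate_nested_structure(msg, field_map)
assert result == translated_msg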
About tuples as dictionary keys: I tested a bit and it is possible to encode tuples in the current version of the field map encoding, but the syntax can become quite complicated, so I decided to leave them out for now. The encoding should be as human-friendly as possible.
- What are your thoughts on the code itself?
- Do you have any suggestions on how to improve the encoding syntax?
- What about increasing the level of abstraction and supporting more structures, such as sets, classes or custom Iterables?
- I'd also like to hear if other people face similar problems. How often, if at all, do people need to translate dictionary keys like this?
3 Answers
How to encode this mapping?
Not the way you've done it, I think. Zen says explicit is better than implicit, and your current mapping is highly implicit. You have a magic double-slash to indicate a list level, and you have an O(n²) problem with your key expressions. These are avoidable problems: don't think of your mapping as being flat, over-the-wire JSON data; think of it as well-typed, well-structured in-memory data. There's no reason for you to write a parsing layer if you don't need it.
Aside: translating from one dict-lasagna domain to another is evidence of a broader, more severe problem with lack of good models (or perhaps no models at all), but you have not shown enough other code for this to be talked about meaningfully.
If what you say is true and these data come directly from JSON, then you need to drop the code that cares about tuples because these will never happen.
Picking up on a few granular review issues (though perhaps these are moot since I'm suggesting that you throw all of the existing code away):
- is_valid_iterable should only need isinstance(struct, (dict, list)) (sketched below)
- is_key_in_dict needs to die, and the call needs to be replaced with key in some_dict
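In code, those two points amount to roughly this (a sketch reusing the question's names):

def is_valid_iterable(struct):
    # JSON payloads can only nest dicts and lists, so nothing else needs special-casing
    return isinstance(struct, (dict, list))

# ...and the membership test reads naturally inline:
trans_dict = {"/id": "IDENTIF"}
if "/id" in trans_dict:          # instead of is_key_in_dict("/id", trans_dict)
    print(trans_dict["/id"])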
Suggested
A re-thought mapping could make use of simple polymorphism, with nary an isinstance
in sight:
from dataclasses import dataclass, field
from typing import Any, Union, Optional
Payload = Union[dict[str, Any], list[Any]]
@dataclass
class Node:
replacement: Optional[str] = None
def translate(self, structure: Payload) -> Payload:
return structure
@dataclass
class DictNode(Node):
nodes: dict[str, 'Node'] = field(default_factory=dict)
def translate(self, structure: Payload) -> Payload:
translated = {}
for key, value in structure.items():
translator = self.nodes.get(key)
if translator:
key = translator.replacement or key
value = translator.translate(value)
translated[key] = value
return translated
class ListNode(DictNode):
def translate(self, structure: Payload) -> Payload:
return [
super(ListNode, self).translate(item)
for item in structure
]
def test() -> None:
from pprint import pprint
msg = {
'id': 1,
'items': [{'id': 'abc', 'price': 50},
{'discount': 50, 'id': 'def', 'price': 110}],
'summary': {'origin': {'slug': 'slug', 'url': 'url'}, 'tags': ['a', 'b']}
}
field_map = DictNode(nodes={
'id': Node('IDENTIF'),
'summary': DictNode('SUMM', {
'origin': DictNode('ORIG', {
'url': Node('WEBADDRESS'),
'slug': Node('LOCATOR'),
}),
'tags': Node('TAGS'),
}),
'items': ListNode('PURCHASEDGOODS', {
'id': Node('GOODSID'),
'price': Node('GOODSPRICE'),
'discount': Node('GIVENDISCOUNT'),
}),
})
pprint(field_map.translate(msg))
if __name__ == '__main__':
test()
Output
{'IDENTIF': 1,
'PURCHASEDGOODS': [{'GOODSID': 'abc', 'GOODSPRICE': 50},
{'GIVENDISCOUNT': 50, 'GOODSID': 'def', 'GOODSPRICE': 110}],
'SUMM': {'ORIG': {'LOCATOR': 'slug', 'WEBADDRESS': 'url'}, 'TAGS': ['a', 'b']}}
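One behavioural difference worth flagging: in this design, keys that have no entry in nodes pass through untranslated, whereas the question's implementation drops them. If the dropping behaviour is actually required, a variant along these lines would restore it (a sketch building on the classes above; StrictDictNode is my own name):

class StrictDictNode(DictNode):
    """Variant of DictNode that drops unmapped keys, matching the question's behaviour."""
    def translate(self, structure: Payload) -> Payload:
        translated = {}
        for key, value in structure.items():
            translator = self.nodes.get(key)
            if translator is None:
                continue  # keys without a mapping are removed from the output
            translated[translator.replacement or key] = translator.translate(value)
        return translated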
I'd also like to hear if other people face problems similar to these. How often do people need to translate dictionary keys like this?
I'd say it's very unusual. In my experience, such dicts are either constructed from JSON or similar formats to give users/admins a friendly way to script without needing any programming knowledge - and internally changing the keys makes no sense, except to increase complexity.
The other way dicts are used is in a programming context where associated data must be stored together. In this context, mostly constants are used as the keys, or input that stays constant. Again, internally changing the keys makes no sense.
The one case where I would see dicts used in this way is when the dict is used as a control mechanism, similar to a script engine but completely defined and used by developers. It can make some actual code look extremely neat and clean; however, in my opinion it goes against the principle of making code explicit - and therefore decreases readability and understandability.
- pbsb (Jun 17, 2022): I see your point but I am not familiar with script engines. Can you point me in the right direction? This is what comes up when I google it: stackoverflow.com/q/1691201/10504841 and docs.oracle.com/javase/7/docs/api/javax/script/…
- pbsb (Jun 17, 2022): I think I also failed to contextualize properly. This is an actual work problem we had to solve. I added some more details to the question.
- lukstru (Jun 18, 2022): @pbsb I don't know the actual terminology, but I had a project once where I noticed that the behaviour of my program was highly dependent on and similar to the data I input and my configs. So much that I had lambdas in dicts and could 'script' entire behaviours with JSON files only. The code mostly worked with the dicts to transition between states and execute calls to other programs that were defined in said JSON. I named it a script engine since it took in some 'script' - JSON - and executed code dependent on it. Not entirely like an interpreter, but a sized-down, specialized version.
- lukstru (Jun 18, 2022): TLDR: A sized-down, specialized version of an interpreter.
- lukstru (Jun 18, 2022): And to point you in the right direction, I'd suggest learning more about compilers (and interpreters, but they're included in compilers). We had very good courses in university, but I don't know how to get that good information outside university. The course was called introduction to compiler construction and conveyed the basics very well. EDIT: I don't know how much they fit your problem though, I don't think it's what you're searching for. It doesn't hurt though, it was fun getting to know the magic behind compilers!
How certain are you that you will only need key renaming? If this is a real project, your current needs are likely a simplification of your eventual needs. That's just how living software projects behave: you need something, you build something, and the experience with that built thing causes you to need other or different things. Currently, you seem to be performing a simple task: preserving the structural characteristics of the data while renaming the keys. What is the probability that you will need other things in the future: for example, value conversion (eg, int to float) or full-blown data restructuring?
Your need is not novel: do more research to learn how others have dealt with the problem. The Python ecosystem has libraries to perform different kinds of data remappings: here is one called jsonbender. I've never used it and cannot comment on its quality, but a quick scan through the README points to some issues you might want to consider -- notably, dealing with lists, configuring optionality, and building in support for callables to handle computation needs that cannot be easily expressed via a simple configuration syntax (in my own professional experience, the latter has been especially powerful on projects having some overlap with your needs).
Your implementation seems backwards and is thus too limiting. As noted in one
of the other reviews, your remapping (in field_map) strikes me as backwards: it maps
old paths/keys to new paths/keys. But that is limiting because it provides no
mechanism for controlling the output structure. It also seems less intuitive
than the alternative -- namely, declaring the structure you want and then, at
the leaf nodes, defining how/where to retrieve values from the source. I would
encourage you to define the remapping from the perspective of the desired data.
For example, if we focus just on the IDENTIF
and SUMM
keys (plus a FOO
key added for illustration), one could define a remapping as follows. Each leaf
value can be obtained by diving down though the hierarchy based on the keys
declared in each tuple. Even though this example handles only the easy
situations in your current problem, it does illustrate -- at least to my eye --
the intuitiveness of defining the remapping from the perspective of the desired
output, as well as its greater flexibility in terms of data restructuring,
should that need ever arise.
remapping = {
# Simple dict-to-dict key renaming via data-diving tuples.
"IDENTIF": ('id',),
"SUMM": {
"ORIG": {
"WEBADDRESS": ('summary', 'origin', 'url'),
"LOCATOR": ('summary', 'origin', 'slug'),
},
"TAGS": ('summary', 'tags'),
},
# Restructuring and even reuse of source nodes is possible.
"FOO": {
"BAR": ('id',),
},
}
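To make the idea concrete, here is a minimal sketch of how such a remapping could be consumed; dive and build are my own illustrative names, not an existing API.

from functools import reduce

def dive(source, keys):
    """Walk down `source` following a tuple of keys."""
    return reduce(lambda node, key: node[key], keys, source)

def build(spec, source):
    """Materialise the desired output: dicts in the spec become dicts in the
    output, tuples are resolved against the source data."""
    if isinstance(spec, dict):
        return {key: build(sub, source) for key, sub in spec.items()}
    if isinstance(spec, tuple):
        return dive(source, spec)
    raise TypeError(f"Unsupported spec node: {spec!r}")

# build(remapping, msg) would yield
# {"IDENTIF": 1, "SUMM": {...}, "FOO": {"BAR": 1}}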
Dealing with pesky lists. That simple plan falters when it comes to lists.
Your workaround was a double-slash convention, and one reviewer suggests using
explicit types like Node
, DictNode
, and ListNode
to configure the needed
remappings. A middle-ground is to continue with the simplicity of your
convention-based approach but to make it a bit more rigorous. The illustration
above relies on the convention that a dict in the remapping configuration
produces a dict in the output data. We could do the same with lists. The
example below would be interpreted as follows: PURCHASEDGOODS
will hold a
list; we obtain the source data for that list from the key(s) declared inside
the list; and the final element of the configuration-list will contain the
specification for how to build individual values composing the list. I'm not
necessarily advocating this approach, but it does illustrate a low-tech,
convention-based approach with greater intuitiveness and flexibility than your
current idea.
remapping = {
...
"PURCHASEDGOODS": ['items',
{
"GOODSID": ('id',),
"GOODSPRICE": ('price',),
"GIVENDISCOUNT": ('discount',),
}
],
}
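Interpreting that convention is a one-case extension of the previous sketch: a list in the spec names the source key(s) to dive into, and its last element describes how each output item is built (again just an illustrative sketch, not a finished implementation).

from functools import reduce

def dive(source, keys):                     # same helper as in the previous sketch
    return reduce(lambda node, key: node[key], keys, source)

def build(spec, source):
    if isinstance(spec, dict):
        return {key: build(sub, source) for key, sub in spec.items()}
    if isinstance(spec, tuple):
        return dive(source, spec)
    if isinstance(spec, list):              # e.g. ['items', {...per-item spec...}]
        *path, item_spec = spec
        return [build(item_spec, item) for item in dive(source, tuple(path))]
    raise TypeError(f"Unsupported spec node: {spec!r}")

# Note: a plain ('discount',) still raises KeyError for items without a
# discount; making that optional is exactly what the Diver idea below addresses.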
Making that approach a bit more formal via explicit types. Another
middle-ground is something like the following. It still relies on some
conventional behavior relating to dicts, but it does have explicit types to
distinguish the two primary ways to retrieve data from the source: (1) simple
data-diving via a tuple of keys or (2) data-diving over a source list to
produce an output list. One benefit of at least adding two types like these is
that they provide a mechanism to configure optionality: for example,
Diver('discount', default = 0)
. It would also provide a way to pass in
callables to handle more complex needs or even simple value-conversion behavior
you might want in the future: for example, Diver('discount', default = 0, convert = float)
.
remapping = {
"IDENTIF": Diver('id'),
"SUMM": {
"ORIG": {
"WEBADDRESS": Diver('summary', 'origin', 'url'),
"LOCATOR": Diver('summary', 'origin', 'slug'),
},
"TAGS": Diver('summary', 'tags'),
},
"PURCHASEDGOODS": ListDiver('items',
{
"GOODSID": Diver('id'),
"GOODSPRICE": Diver('price'),
"GIVENDISCOUNT": Diver('discount'),
},
),
}
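A possible shape for those two types, matching the configuration above (a sketch under my own assumptions; remap and the _MISSING sentinel are illustrative names):

_MISSING = object()

class Diver:
    """Retrieve one value by diving through the source with a sequence of keys."""
    def __init__(self, *keys, default=_MISSING, convert=None):
        self.keys = keys
        self.default = default
        self.convert = convert

    def __call__(self, source):
        node = source
        for key in self.keys:
            try:
                node = node[key]
            except (KeyError, IndexError, TypeError):
                if self.default is _MISSING:
                    raise
                return self.default
        return self.convert(node) if self.convert else node

class ListDiver(Diver):
    """Dive to a source list, then build each output element from a per-item spec."""
    def __init__(self, *args, **kwargs):
        *keys, item_spec = args          # last positional argument is the item spec
        super().__init__(*keys, **kwargs)
        self.item_spec = item_spec

    def __call__(self, source):
        return [remap(self.item_spec, item) for item in super().__call__(source)]

def remap(spec, source):
    """Dicts in the spec become dicts in the output; Divers pull source values."""
    if isinstance(spec, dict):
        return {key: remap(sub, source) for key, sub in spec.items()}
    return spec(source)

# remap(remapping, msg) works end to end once GIVENDISCOUNT is given a default,
# e.g. Diver('discount', default=0); without one, the first item raises KeyError
# because it has no discount.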
Other possibilities. The next obvious extension is to formalize the
dict-related configuration more explicitly (eg DictDiver
). Whether that's
worth the trouble depends on your expectations for the future of the project.
To my mind, that step seems the least compelling: at a certain point, every
project must adopt a variety of conventions and it's no crime to embrace them
if they are intuitive and reasonable. If you were to take that step, you would
end up with an approach similar to the substantive review you already have, but
with the reversed orientation discussed above. Finally, I'll re-emphasize the
recommendation to research other libraries that perform this kind of data
conversion. Even if you end up adopting a low-tech, convention-based solution,
your decision-making should be guided by how others have thought about this
topic. And you might get lucky and find a library that already does
exactly what you need.
- (comment) A nested structure like "a": [[{"b":0}]] can be mapped with /a///b