1
\$\begingroup\$

I wanted to have something more convenient to work with nested dictionaries that come from yaml files so I wrote this DictIndexter. Its __getitem__ and __setitem__ accept a path instead of a single key. Additionally the getter can also take a tuple with path and default value. The setter on the other hand is able to create new branches, but only when it's a clean branch. This is a branch assigned to a non-existing key. It wont overwrite another value.


Implementation


class DictIndexer:
 def __init__(self, source: Dict[str, Any], set_creates_branches: bool = True):
 self.source = source
 self.set_creates_branches = set_creates_branches
 def __getitem__(self, path: str | Tuple[str, Any]) -> Any | None:
 path, default = path if isinstance(path, tuple) else (path, None)
 item = self.source
 for key in DictIndexer._split(path):
 item = item.get(key, None)
 if item is None:
 return default
 else:
 return item
 def __setitem__(self, path: str, value: Any):
 class Missing:
 pass
 names = DictIndexer._split(path)
 prev = self.source
 item = self.source
 for key in names[:-1]:
 item = item.get(key, Missing())
 match item:
 case dict():
 prev = item
 case Missing():
 if self.set_creates_branches:
 prev[key] = {}
 else:
 raise KeyError(f"Cannot create branch '{key}. Disabled")
 case _:
 raise KeyError(f"Cannot overwrite '{key}.")
 else:
 item[names[-1]] = value
 @staticmethod
 def _split(path: str) -> List[str]:
 return re.split(r"[/\\]", path)

Tests

With pytest I made sure that the following features work:


def test_can_get_item_by_path():
 data = {
 "foo": {
 "bar": "baz"
 }
 }
 assert DictIndexer(data)["foo/bar"] == "baz"
def test_gets_null_when_last_name_does_not_exist():
 data = {
 "foo": {
 "bar": "baz"
 }
 }
 assert DictIndexer(data)["foo/baz"] is None
def test_gets_null_or_default_when_middle_name_does_not_exist():
 data = {
 "foo": {
 "bar": {
 "baz": "baz"
 }
 }
 }
 assert DictIndexer(data)["foo/baz/baz"] is None
 assert DictIndexer(data)[("foo/baz/baz", "qux")] == "qux"
def test_can_set_item():
 data = {
 "foo": {
 "bar": {
 "baz": "baz"
 }
 }
 }
 DictIndexer(data)["foo/bar/baz"] = "qux"
 assert DictIndexer(data)["foo/bar/baz"] == "qux"
def test_can_set_item_on_new_branches():
 data = {
 "foo": {
 "bar": {
 "baz": {
 "foo": "bar"
 }
 }
 }
 }
 DictIndexer(data)["foo/bar/baz/qux"] = "qux"
 assert DictIndexer(data)["foo/bar/baz/qux"] == "qux"
def test_does_not_create_new_branches_when_disabled():
 data = {
 "foo": {
 "bar": {
 }
 }
 }
 with pytest.raises(KeyError):
 DictIndexer(data, set_creates_branches=False)["foo/bar/baz/qux"] = "qux"
def test_does_not_overwrite_value_with_dict():
 data = {
 "foo": {
 "bar": {
 "baz": "baz"
 }
 }
 }
 with pytest.raises(KeyError):
 DictIndexer(data)["foo/bar/baz/qux"] = "qux"

Example

I currently use it to customize the logger cofiguration:

def _initialize_logging():
 import logging.config
 config = DictIndexer(assets.cfg("wiretap"))
 config["formatters/auto/()"] = wiretap.MultiFormatter
 config["formatters/auto/./values"] = [f"v{app.VERSION}-{os.environ['mode'][0]}"]
 config["handlers/file/filename"] = assets.home("log", f"{app.ID}.log")
 config["handlers/sqlserver/insert"] = assets.sql("wiretap", "insert log.sql")
 logging.config.dictConfig(config.source)

Why do you think? What would you improve?

Peilonrayz
44.4k7 gold badges80 silver badges157 bronze badges
asked Mar 27, 2023 at 11:26
\$\endgroup\$
1
  • \$\begingroup\$ You offer both {getter, setter}, cool! FYI, here is a nice package focused on the getter use case: pypi.org/project/glom \$\endgroup\$ Commented Mar 29, 2023 at 16:30

3 Answers 3

4
\$\begingroup\$

Overview

The code looks good, properly typed, tested. This is a great starting point.

Here are various details can could be improved.

Documentation

Your code could definitely be improved with docstrings.

Imports

This is more about the StackExchange question than about the code itself but providing the imports can make testing/reviewing easier for others.

from typing import Dict, Any, Tuple, List
import re

Tests: splitting the setup from the logic being tested

This is mostly a matter of personal preference but I find a line such as assert DictIndexer(data)[("foo/baz/baz", "qux")] == "qux" a bit hard to grasp. The test cases can probably be made easier to read if the DictIndexer was created as an initial step and then used to perform queries.

We'd get something like:

def test_gets_null_or_default_when_middle_name_does_not_exist():
 data = DictIndexer({
 "foo": {
 "bar": {
 "baz": "baz"
 }
 }
 })
 assert data["foo/baz/baz"] is None
 assert data[("foo/baz/baz", "qux")] == "qux"

This is even more relevant when with pytest.raises(KeyError) is used as you definitely do not want the constructor of DictIndexer to be the part raising the error.

Splitting

Your method to split relies on regexp. It seems slightly overkill in this particular case as you are using a single character for splitting and str.split would do the trick.

Also, you could make your code slightly more generic by providing the separator in the constructor.

 def __init__(self, source: Dict[str, Any], set_creates_branches: bool = True, separator='/'):
 self.source = source
 self.set_creates_branches = set_creates_branches
 self.separator = separator
 (...)
 def _split(self, path: str) -> List[str]:
 return path.split(self.separator)

Getting elements with default value

I find the way the default value can be provided to __getitem__ slightly awkward. My preference would be to rely on how things are done for dict:

  • d[key]: Return the item of d with key key. Raises a KeyError if key is not in the map.
  • get(key[, default]): Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
 def get(self, path, default=None):
 try:
 return self[path]
 except KeyError:
 return default
 def __getitem__(self, path: str) -> Any | None:
 item = self.source
 for key in self._split(path):
 item = item.get(key, None)
 if item is None:
 raise KeyError(f"Key '{key} not found.")
 else:
 return item

else for for loop

The optional else for for loops is a nice idiom which deserves some love and can be used to perform an operation in the case where break did not occur in the loop and the loops stops in a "normal" way.

In our particular case, there is no break involved at all so I'm not so sure that it is relevant.

Tuple unpacking using asterisks

Instead of messing with indices with names in __setitem__, you could directly unpack it into the parts you are interested in: *names, last_name = self._split(path).

Final code

At this stage, the code looks like this:

from typing import Dict, Any, Tuple, List
class DictIndexer:
 def __init__(self, source: Dict[str, Any], set_creates_branches: bool = True, separator='/'):
 self.source = source
 self.set_creates_branches = set_creates_branches
 self.separator = separator
 def get(self, path, default=None):
 try:
 return self[path]
 except KeyError:
 return default
 def __getitem__(self, path: str) -> Any | None:
 item = self.source
 for key in self._split(path):
 item = item.get(key, None)
 if item is None:
 raise KeyError(f"Key '{key} not found.")
 return item
 def __setitem__(self, path: str, value: Any):
 class Missing:
 pass
 *names, last_name = self._split(path)
 prev = self.source
 item = self.source
 for key in names:
 item = item.get(key, Missing())
 match item:
 case dict():
 prev = item
 case Missing():
 if self.set_creates_branches:
 prev[key] = {}
 else:
 raise KeyError(f"Cannot create branch '{key}. Disabled")
 case _:
 raise KeyError(f"Cannot overwrite '{key}.")
 item[last_name] = value
 def _split(self, path: str) -> List[str]:
 return path.split(self.separator)
def test_can_get_item_by_path():
 data = DictIndexer({
 "foo": {
 "bar": "baz"
 }
 })
 assert data["foo/bar"] == "baz"
def test_gets_null_when_last_name_does_not_exist():
 data = DictIndexer({
 "foo": {
 "bar": "baz"
 }
 })
 try:
 data["foo/baz"]
 except KeyError:
 pass
 assert data.get("foo/baz") is None 
def test_gets_null_or_default_when_middle_name_does_not_exist():
 data = DictIndexer({
 "foo": {
 "bar": {
 "baz": "baz"
 }
 }
 })
 try:
 data["foo/baz/baz"]
 except KeyError:
 pass
 assert data.get("foo/baz/baz", "qux") == "qux"
def test_can_set_item():
 data = DictIndexer({
 "foo": {
 "bar": {
 "baz": "baz"
 }
 }
 })
 assert data["foo/bar/baz"] != "qux"
 data["foo/bar/baz"] = "qux"
 assert data["foo/bar/baz"] == "qux"
def test_can_set_item_on_new_branches():
 data = DictIndexer({
 "foo": {
 "bar": {
 "baz": {
 "foo": "bar"
 }
 }
 }
 })
 data["foo/bar/baz/qux"] = "qux"
 assert data["foo/bar/baz/qux"] == "qux"
def test_does_not_create_new_branches_when_disabled():
 data = DictIndexer({
 "foo": {
 "bar": {
 }
 }
 }, set_creates_branches=False)
 try:
 data["foo/bar/baz/qux"] = "qux"
 except KeyError:
 pass
def test_does_not_overwrite_value_with_dict():
 data = DictIndexer({
 "foo": {
 "bar": {
 "baz": "baz"
 }
 }
 })
 try:
 data["foo/bar/baz/qux"] = "qux"
 except KeyError:
 pass

Edit

Something that I had forgotten: now that __getitem__ throws, one could directly use:

 def __getitem__(self, path: str) -> Any | None:
 item = self.source
 for key in self._split(path):
 item = item[key]
 return item

Similarly, we could imagine getting rid of the Missing class and work with exceptions directly:

 def __setitem__(self, path: str, value: Any):
 *names, last_name = self._split(path)
 prev = self.source
 item = self.source
 for key in names:
 try:
 item = item[key]
 except KeyError:
 if self.set_creates_branches:
 prev[key] = {}
 continue
 else:
 raise KeyError(f"Cannot create branch '{key}. Disabled")
 match item:
 case dict():
 prev = item
 case _:
 raise KeyError(f"Cannot overwrite '{key}.")
 item[last_name] = value
answered Mar 28, 2023 at 8:03
\$\endgroup\$
3
  • \$\begingroup\$ These are great advices! I've learnt some new tricks and better practices. Thanks! \$\endgroup\$ Commented Mar 28, 2023 at 8:56
  • 1
    \$\begingroup\$ @t3chb0t I'm glad you like it! FYI, I've added an additional suggestion. \$\endgroup\$ Commented Mar 28, 2023 at 9:58
  • \$\begingroup\$ I like it very much. Code must be like art and now it is ;-] \$\endgroup\$ Commented Mar 28, 2023 at 16:07
4
\$\begingroup\$

Bug

I'm not liking some code that I see in __setitem__, so let's run a few more tests:

>>> data = {"foo": {}}
>>> DictIndexer(data)["foo/bar/baz/qux"] = "qux"
Traceback (most recent call last):
 File "<pyshell#5>", line 1, in <module>
 DictIndexer(data)["foo/bar/baz/qux"] = "qux"
 File "C:/Users/aneufeld/Documents/Experiments/DictIndex.py", line 27, in __setitem__
 item = item.get(key, Missing())
AttributeError: 'Missing' object has no attribute 'get'

Yup. Creating more than one missing path segment is definitely a problem.

>>> data
{'foo': {'bar': {}}}

It has only created the first missing key & dictionary. So let's try a shorter test:

>>> data = {"foo": {}}
>>> DictIndexer(data)["foo/bar/qux"] = "qux"
Traceback (most recent call last):
 File "<pyshell#8>", line 1, in <module>
 DictIndexer(data)["foo/bar/qux"] = "qux"
 File "C:/Users/aneufeld/Documents/Experiments/DictIndex.py", line 39, in __setitem__
 item[names[-1]] = value
TypeError: 'Missing' object does not support item assignment

Interesting. That fails in a slightly different way.

Let's look at the code:

 prev = self.source
 item = self.source
 for key in names[:-1]:
 item = item.get(key, Missing())
 match item:
 case dict():
 prev = item
 case Missing():
 if self.set_creates_branches:
 prev[key] = {}
 else:
 raise KeyError(f"Cannot create branch '{key}. Disabled")
 case _:
 raise KeyError(f"Cannot overwrite '{key}.")

Yup. We initializes prev, then we have a loop the update the value of prev at each iteration ... except when a missing key is encountered, something can be done and the loop goes on to the next iteration without prev being updated.

Wait ... no ... the error messages are talking about the Missing object! This is not the bug I was expecting; it is an entirely different bug!

item gets left as Missing() and,

  • if it was the last iteration we can't set the final key value into item,
  • if it wasn't the last iteration, we can't look up the next path key from item.

When you create a new branch, you need to update prev to the newly created dictionary (the bug I thought I saw), but you also need to set item to that same newly created dictionary ...

 ...
 item = item.get(key, Missing()) # Intended action: update item
 match item:
 case dict():
 prev = item # Prep for next loop
 case Missing():
 if self.set_creates_branches:
 prev[key] = {} # Create missing dictionary
 item = prev.get(key) # <-- Fix: Replicate intended action
 prev = item # <-- Fix: prep for next loop
 ...

We can re-work this to remove inefficiencies. Note that before and after this code block, prev and item are the same, so we can move prev entirely inside the loop.

 item = self.source
 for key in names[:-1]:
 prev = item
 item = item.get(key, Missing())
 match item:
 case dict():
 pass
 case Missing():
 if self.set_creates_branches:
 item = prev[key] = {}
 else:
 raise KeyError(f"Cannot create branch '{key}. Disabled")
 case _:
 raise KeyError(f"Cannot overwrite '{key}.")
 else:
 item[names[-1]] = value

Testing

Tests are good! I like testing!

But, are you really testing the right things?

def test_can_set_item():
 data = {
 "foo": {
 "bar": {
 "baz": "baz"
 }
 }
 }
 DictIndexer(data)["foo/bar/baz"] = "qux"
 assert DictIndexer(data)["foo/bar/baz"] == "qux"

Have any other keys/subkeys been added to data? Any been accidentally removed? Yes, the key you are setting is tested to verify getting it returns the value you just set, but is that the only effect?

 assert data == {"foo": {"bar": {"baz": "qux"}}}

Now we're testing data is properly modified.

answered Mar 29, 2023 at 15:44
\$\endgroup\$
1
  • 3
    \$\begingroup\$ This review contains solid advice that the OP would be wise to follow: when testing, don't assert so narrowly that you effectively assume your code's correctness. I re-learned this lesson recently. A small project I'm working on had accumulated a couple dozen tests that were too repetitive. I wrote an assertion-making helper function that most of the tests were able to use. I thought I was just getting rid of code repetition, but there was an unanticipated side effect: for several tests, use of the helper widened their assertion-making considerably – exposing a couple of edge-case bugs! \$\endgroup\$ Commented Mar 29, 2023 at 16:45
3
+200
\$\begingroup\$

Your __getitem__ function indicates two conflicting ideas.

  1. You call out __getitem__ can return None.

    def __getitem__(...) -> Any | None:
    
  2. Your function cannot return None unless default is None.

    item = item.get(key, None)
    if item is None:
     return default
    
    >>> DictIndexer({"a": None})["a", "default value"]
    default value
    

Because of the code in the second item, your class doesn't work in a similar way to dict.get. A fairly large Python idiom is EAFP. To easily get the same interface as dict.get is really simple with EAFP.

>>> {"a": None}.get("a", "default value")
None
def __getitem__(self, path: str | Tuple[str, Any]) -> Any | None:
 path, default = path if isinstance(path, tuple) else (path, None)
 try:
 item = self.source
 for key in self._split(path):
 item = item[key]
 return item
 except KeyError:
 return default

I can also see a potential for conflicting ideas on type hints - Any | None is Any. I did things like Any | None when I wanted to make the class generic but didn't fully know how. So I'm going to go through an over-kill entirely generic class.

First we can define a recursive generic alias for a dictionary of dictionaries or values. Basically the dict tree we're interacting with.

TKey = TypeVar("TKey")
TValue = TypeVar("TValue")
ATree: TypeAlias = dict[TKey, TValue] | TValue | dict[TKey, "ATree[TKey, TValue]"]

From here we can define a generic class which takes a Tree as input. And we'll define _split's type hints to remove errors with mypy.

class DictIndexer(Generic[TKey, TValue]):
 def __init__(self, source: Tree[TKey, TValue]) -> None:
 self.source = source
 @classmethod
 def _split(cls, path: TKey) -> Iterable[TKey]:
 raise NotImplementedError(f"{cls.__name__}._split has not been defined")

And then finally use overload to completely type __getitem__. There are two paths:

  • no default specified so the type is None, or
  • a default type is provided which we should union with.
TDefault = TypeVar("TDefault")
...
 @overload
 def __getitem__(self, path: TKey | tuple[TKey]) -> TValue | None: ...
 @overload
 def __getitem__(self, path: tuple[TKey, TDefault]) -> TValue | TDefault: ...
 def __getitem__(self, path: Any) -> Any:

From here we can see the generic types for DictIndexer are inferred by the dictionary literal. And we can see __getitem__ correctly sets the correct return types.

tree = DictIndexer({"a": None})
reveal_type(tree)
reveal_type(tree["a"])
reveal_type(tree["a", "default value"])
>>> mypy foo.py
foo.py:36: note: Revealed type is "foo.DictIndexer[builtins.str, None]"
foo.py:37: note: Revealed type is "None"
foo.py:38: note: Revealed type is "Union[None, builtins.str]"

Your __setitem__ is far less EAFP than __getitem__. If we ignore the "able to create new branches, but only when it's a clean branch" part we can make a very simple EAFP function.

Lets look at the edge-cases:

  1. >>> {}["a"]
    KeyError: 'a'
    
  2. >>> None["a"]
    TypeError: 'NoneType' object is not subscriptable
    

So we can quite easily rewrite __setitem__ in the exact same way we did with __getitem__.

def __setitem__(self, path: TKey, value: TValue) -> None:
 *head, tail = key, *_ = self._split(path)
 try:
 item = self.source
 for key in head:
 item = item[key]
 except KeyError:
 raise KeyError(f"Cannot create branch '{key}. Disabled") from None
 except TypeError:
 raise KeyError(f"Cannot overwrite '{key}.") from None
 else:
 item[tail] = value

Now we can change __setitem__ to support the dict.setdefault like behaviour.

try:
 item = self.source
 if not self.set_creates_branches:
 for key in head:
 item = item[key]
 else:
 for key in head:
 try:
 item = item[key]
 except KeyError:
 item[key] = item = {}
except KeyError:
from typing import Any, Generic, Iterable, TypeAlias, TypeVar, overload
TKey = TypeVar("TKey")
TValue = TypeVar("TValue")
TDefault = TypeVar("TDefault")
ATree: TypeAlias = dict[TKey, TValue] | TValue | dict[TKey, "ATree[TKey, TValue]"]
class BaseDictIndexer(Generic[TKey, TValue]):
 def __init__(
 self,
 source: ATree[TKey, TValue],
 set_creates_branches: bool = True,
 ) -> None:
 self.source = source
 self.set_creates_branches = set_creates_branches
 @classmethod
 def _split(cls, path: TKey) -> Iterable[TKey]:
 raise NotImplementedError(f"{cls.__name__}._split has not been defined")
 @overload
 def __getitem__(self, path: TKey | tuple[TKey]) -> TValue | None: ...
 @overload
 def __getitem__(self, path: tuple[TKey, TDefault]) -> TValue | TDefault: ...
 def __getitem__(self, path: Any) -> Any:
 path, default = path if isinstance(path, tuple) else (path, None)
 try:
 item = self.source
 for key in DictIndexer._split(path):
 item = item[key] # type: ignore
 return item
 except KeyError:
 return default
 def __setitem__(self, path: TKey, value: TValue) -> None:
 *head, tail = key, *_ = self._split(path)
 try:
 item = self.source
 if not self.set_creates_branches:
 for key in head:
 item = item[key] # type: ignore
 else:
 for key in head:
 try:
 item = item[key] # type: ignore
 except KeyError:
 item[key] = item = {} # type: ignore
 except KeyError:
 raise KeyError(f"Cannot create branch '{key}'. Disabled") from None
 except TypeError:
 raise KeyError(f"Cannot overwrite '{key}'.") from None
 else:
 item[tail] = value # type: ignore
class DictIndexer(BaseDictIndexer[str, TValue]):
 @staticmethod
 def _split(path: str) -> list[str]:
 return path.split("/")
# tree = DictIndexer({"a": None}, set_creates_branches=False)
tree = DictIndexer({"a": None})
print(tree["a"])
tree["b/c/d"] = None
print(tree["b/c/d"])
print(tree.source)
answered Mar 29, 2023 at 0:24
\$\endgroup\$
7
  • \$\begingroup\$ This is rocket-science for me, but I now see how powerful generics can be so I'll dig into that. \$\endgroup\$ Commented Mar 29, 2023 at 11:03
  • \$\begingroup\$ I'm studying your suggestions and I'm currently stuck on Tree[TKey, TValue] that I wasn't able to crack. You defined it as a TypeAlias (with a typo as ATree) from what I understand it can be either one of the tree types separated by the pipe |. What I don't understand is why is Tree taking two generic arguments? Is it because you use the first of the types in the list this is dict[TKey, TValue] as the source or is it becuase the types within the alias use generic types so you need to specify them in the declaration. I tend to think this is the latter. \$\endgroup\$ Commented Apr 1, 2023 at 10:06
  • \$\begingroup\$ Then in the last type Dict[TKey, "Tree[TKey, TValue]"] you use a string. I guess this is because of the recursion or self-reference. I'm typing it step by step in PyCharm and it recognizes this correctly. I think I've found it in the docs. It's called a forward reference, right? \$\endgroup\$ Commented Apr 1, 2023 at 10:13
  • 1
    \$\begingroup\$ 1. I prefix all TypeAlias' with A. I also prefix Protocols with I (like C#) and suffix ABCs with Base. 2. Lets pretend ATree is ATree = dict[TKey, TValue]. When we're using ATree we're effectively just using dict. So when you pass the types to dict you need both the key and the value types. So because we've defined two generic types in the type alias we need to specify all the generics. -- An interesting side note is if you ATree = TValue | dict ... the arguments to ATree will be flipped to ATree[TValue, TKey]. Yes the string is a forward reference. \$\endgroup\$ Commented Apr 1, 2023 at 16:39
  • 1
    \$\begingroup\$ @t3chb0t 4. Multiple assignment acts from left to right. item[key] = item = {} assigns item[key] first, then assigns item. If you instead do item = item[key] = {} then you're doing item = {}, item[key] = item, which should obviously become the infinitely recursive dict you see. 3. Your result doesn't seem right, I get head=[1, 2, 3, 4, 5], tail=6, key=1, x=[2, 3, 4, 5, 6]. Notice tail=6 not tail=1. As mentioned before multiple assignments act right to left. Additionally the resulting object in tuple unpacking is the entire tuple. We haven't sliced the tuple. \$\endgroup\$ Commented Apr 1, 2023 at 16:48

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.