Navigate objects from a path provided as a string

Question 1

Recently I wrote a validator for some objects. To increase usability the validator gives the user hints on how to fix some of the found issues. Since the application is around something programmatic, and that many of the issues wouldn't make sense without identifying a path to an object; I decided to build a navigator.

At first I didn't have this so I just used getattr and defaulted to the very helpful <Unknown> value if that failed to get the value. This resulted in some useful; some less useful messages.

Upon completing the V1 of the below code I had the clever idea of allowing the validator to just fix the fixable issues. Since this is a flag, it allows the user to get a working version and then come back and clean it up once it works. However this quickly showed some errors in my initial design as I didn't plan for setting and deleting values.

I feel like given how catastrophically V1 failed V2 isn't going to be much better. Unlike with V1, with V2 I followed TDD, and so the code slowly built up over some time. This has caused a lot of the nuance to just disappear and I find _parse to be a bit of an eyesore.

I have added a full description on what this achieves in the code, however a short overview would be that it's similar to eval() just it's not evil.

Attribute indexing similar to standard Python. data.foo.bar.baz.
Escape special (all) characters using a slash \.
Index items using brackets. data["key1"]["key2"].
Allow two literals strings or integers, 1 and "key".

However it acts quite a bit differently as each level has data implicitly prepended - if it is not a literal.

[["key"]] would be the same as data[data["key"]] in Python, and
[foo] would be the same as data[data.foo] rather than data["foo"].

I would like a review of any or all of my code. However changes that reduce version compatibility are generally unhelpful.

import dataclasses
from typing import Any, Iterator, List, NoReturn, Optional, Union, cast
@dataclasses.dataclass
class Attr:
 _index: "Path"
 def get(self, root_obj: Any, obj: Any) -> Any:
 """Get attribute from object."""
 return getattr(obj, self._index.get(root_obj))
 def set(self, root_obj: Any, obj: Any, value: Any) -> Any:
 """Set object's attribute to value."""
 setattr(obj, self._index.get(root_obj), value)
 def delete(self, root_obj: Any, obj: Any) -> Any:
 """Delete object's attribute."""
 delattr(obj, self._index.get(root_obj))
@dataclasses.dataclass
class Item:
 _index: "Path"
 def get(self, root_obj: Any, obj: Any) -> Any:
 """Index object."""
 return obj[self._index.get(root_obj)]
 def set(self, root_obj: Any, obj: Any, value: Any) -> Any:
 """Set object's index to value."""
 obj[self._index.get(root_obj)] = value
 def delete(self, root_obj: Any, obj: Any) -> Any:
 """Delete object's index."""
 del obj[self._index.get(root_obj)]
@dataclasses.dataclass
class Literal:
 _index: Any
 def get(self, root_obj: Any, obj: Any) -> Any:
 """Get literal value."""
 return self._index
 def set(self, root_obj: Any, obj: Any, value: Any) -> NoReturn:
 """Unable to change a literal, the path is misconfigured."""
 raise TypeError("Can't set to a literal")
 def delete(self, root_obj: Any, obj: Any) -> NoReturn:
 """Unable to delete a literal, the path is misconfigured."""
 raise TypeError("Can't delete a literal")
@dataclasses.dataclass
class Path:
 _nodes: List[Union[Attr, Item, Literal]]
 @classmethod
 def from_str(cls, value: str) -> "Path":
 """
 Build a path from string form.
 Without any special characters this works the same way that
 `getattr` and friends works.
 >>> assert '1'.isnumeric()
 >>> assert getattr('1', 'isnumeric')()
 >>> assert Path.from_str('isnumeric').get('1')()
 You can get the same functionality as attrgetter with one
 argument by splitting by period.
 >>> import datetime
 >>> import operator
 >>> assert (
 ... operator.attrgetter('date.today')(datetime)()
 ... == Path.from_str('date.today').get(datetime)()
 ... )
 You can index an item using square brackets.
 Like we're used to in Python.
 >>> data = ['foo', 'bar', 'baz']
 >>> assert data[1] == 'bar'
 >>> assert Path.from_str('[1]').get(data) == 'bar'
 You can index a dictionary by indexing by a string literal.
 >>> data = {'foo': 'bar'}
 >>> assert data['foo'] == 'bar'
 >>> assert Path.from_str('["foo"]').get(data) == 'bar'
 You can escape characters by using a period.
 >>> data = {'foo.bar': 'baz'}
 >>> assert data['foo.bar'] == 'baz'
 >>> assert Path.from_str('["foo\\.bar"]').get(data) == 'baz'
 You can set and delete by using those methods instead.
 And you can mix all the above together to walk complex paths.
 >>> data = {'foo': 'bar', 'bar': 'baz'}
 >>> assert Path.from_str('[["foo"]]').get(data) == 'baz'
 >>> class Test:
 ... foo = ['bar', 'baz']
 >>> assert Path.from_str('foo[1]').get(Test) == 'baz'
 """
 return cls(list(_parse(iter(value))))
 def get(self, obj: Any) -> Any:
 """Walk the path and get the resulting value."""
 root_obj = obj
 for node in self._nodes:
 obj = node.get(root_obj, obj)
 return obj
 def set(self, obj: Any, value: Any) -> Any:
 """Set the leaf node to the entered value."""
 root_obj = obj
 for node in self._nodes[:-1]:
 obj = node.get(root_obj, obj)
 self._nodes[-1].set(root_obj, obj, value)
 def delete(self, obj: Any) -> Any:
 """Delete the leaf node."""
 root_obj = obj
 for node in self._nodes[:-1]:
 obj = node.get(root_obj, obj)
 self._nodes[-1].delete(root_obj, obj)
STRING_DELIMITERS = {'"', "'"}
Split = Union[None, str]
def _parse(
 chars: Iterator[str],
 end: Optional[str] = None,
) -> Iterator[Union[Attr, Item, Literal]]:
 """
 Parse a string into an easy to use representation.
 - Non-special characters are just appended to segment. It's a
 list to prevent the \$O(n^2)\$ time complexity immutable
 strings would have.
 This is later parsed by :code:`_convert` to yield its correct
 representation. This function only yields from this function.
 - Split stores the previous form of split.
 This can be "" - no split, a . or [ for a period or bracket split,
 or None for a character that split the segment but can't split.
 An example of None would be a string literal in the middle of
 a segment 'foo"bar"baz'. This is not a legal construct.
 That should be '"foobarbaz"'.
 - This function is recursive only when interpreting the [ split.
 This means foo[0].bar would have four segments. Both foo and 0
 will have a split of "". As they're the start of their own
 parse function. bar would have the split . as it follows a period.
 The empty list between ]. would have the split None. This is
 as "[foo]bar.baz" doesn't make much sense.
 """
 segment = []
 split: Split = ""
 for char in chars:
 if char == "\\":
 segment.append(next(chars))
 elif char in STRING_DELIMITERS:
 if segment:
 raise ValueError(
 "String literal can't start in the middle of an attribute"
 )
 yield from _convert(split, _extract_string(chars, char))
 split = None
 elif char == ".":
 yield from _convert(split, segment)
 segment = []
 split = "."
 elif char == "[":
 if segment:
 yield from _convert(split, segment)
 segment = []
 yield from _convert("[", _parse(chars, "]"))
 split = None
 elif char == "]":
 if char == end:
 break
 raise ValueError("Found a close bracket without a matching open bracket")
 else:
 segment.append(char)
 else:
 if end:
 raise ValueError("Found an open bracket without a matching close bracket")
 if segment:
 yield from _convert(split, segment)
def _convert(
 split: Split,
 segment: Union[List[str], Iterator[Union[Attr, Item, Literal]]],
) -> Iterator[Union[Attr, Item, Literal]]:
 """
 Convert a segment into an attribute, item or literal.
 All leaf nodes are Literals, these are normally plain old strings.
 However if the first segment (split == "") starts with an integer
 then we will convert the segment to an integer. This allows us to
 index lists.
 If we have an illegal split, None, with content in the segment
 then we raise here.
 Whilst not a segment we pass the result of recursive _parse calls
 through this function. These are converted into an Item index with
 the result given to the containing Path.
 Everything else is just converted to an attribute with a path
 containing a single literal.
 You should notice that this means "1" is a valid path and would
 just return the integer 1.
 >>> assert Path.from_str('1').get(None) == 1
 """
 if split is None:
 if segment:
 raise ValueError("String literals can't end halfway through an attribute")
 return
 if split == "[":
 _segment = cast(Iterator[Union[Attr, Item, Literal]], segment)
 yield Item(Path(list(_segment)))
 return
 value = "".join(cast(List[str], segment))
 if split == "":
 if not value:
 return
 elif value[0].isnumeric():
 yield Literal(int(value))
 return
 elif value[0] in STRING_DELIMITERS:
 yield Literal(value[1:-1])
 return
 yield Attr(Path([Literal(value)]))
def _extract_string(chars: Iterator[str], literal: str) -> List[str]:
 """Extract string with matching delimiter."""
 segment = []
 for char in chars:
 if char == "\\":
 char = next(chars)
 elif char == literal:
 break
 segment.append(char)
 else:
 raise ValueError("String literal doesn't have a closing delimiter")
 return [literal] + segment + [literal]

To test the code just run python foo.py and it'll run all the tests.
When working with the code, you may want to move the Path.from_str tests above the class Test.

if __name__ == "__main__":
 class Test:
 foo = {"bar": 1, "baz": 2}
 bar = "baz"
 assert Path.from_str("foo").get(Test) == {"bar": 1, "baz": 2}
 assert Path.from_str(".foo").get(Test) == {"bar": 1, "baz": 2}
 assert Path.from_str('foo["bar"]').get(Test) == 1
 assert Path.from_str('foo["baz"]').get(Test) == 2
 assert Path.from_str("bar").get(Test) == "baz"
 assert Path.from_str("bar[0]").get(Test) == "b"
 assert Path.from_str("bar[1]").get(Test) == "a"
 assert Path.from_str("bar[2]").get(Test) == "z"
 path = Path.from_str("foo[bar]")
 assert path.get(Test) == 2
 path.set(Test, 3)
 assert path.get(Test) == 3
 data = {"foo": "bar", "bar": "baz"}
 assert Path.from_str('[["foo"]]').get(data) == "baz"
 assert Path.from_str("1") == Path([Literal(1)])
 assert Path.from_str('"a"') == Path([Literal("a")])
 assert Path.from_str("a") == Path([Attr(Path([Literal("a")]))])
 assert Path.from_str("a\\.b") == Path([Attr(Path([Literal("a.b")]))])
 assert Path.from_str("a.b") == Path(
 [Attr(Path([Literal("a")])), Attr(Path([Literal("b")]))]
 )
 assert Path.from_str(".a") == Path([Attr(Path([Literal("a")]))])
 assert Path.from_str('["a"]') == Path([Item(Path([Literal("a")]))])
 assert Path.from_str("[a]") == Path([Item(Path([Attr(Path([Literal("a")]))]))])
 assert Path.from_str("a[b]") == Path(
 [Attr(Path([Literal("a")])), Item(Path([Attr(Path([Literal("b")]))])),]
 )
 for work in ["a[b]", '"a"[b]', "[a].b", "[a][b]"]:
 try:
 Path.from_str(work)
 except:
 print(work)
 raise
 for fail in ["[a]b", 'a"b"', '"b"c', '[a]"b"', '"a', 'a"', "[a", "a]"]:
 try:
 ret = Path.from_str(fail)
 except:
 pass
 else:
 print(fail, ret)
 assert False
 import doctest
 doctest.testmod()

Question 2

This is just a quick answer. I don't know exactly what other constraints you have, but it feels very clumsy to have to do something like this:

path = Path.from_str("foo[bar]")
assert path.get(Test) == 2
path.set(Test, 3)
assert path.get(Test) == 3

I would instead consider for Path to take the object the path will be taken off and be a wrapper for __getitem__ and __setitem__ (and probably __delitem__ as well, for completeness sake), so that this snippet could become:

test = PathWrapper(Test)
assert test["foo[bar]"] == 2
test["foo[bar]"] = 3
assert test["foo[bar]"] == 3

This has the advantage that it would also work as a class decorator, so you can add this functionality to classes already upon definition if wanted. Implementation left as an exercise ;)

The tests for the internal representation of some common paths would then probably use some PathWrapper._internal_model property or something.

In addition, as soon as your tests become more than three cases or so, I would recommend going for a more programmatic way. For most of your cases an array of input and expected output would be fine to iterate over, but not for the cases where you modify the path. It might be worth it to at least use the unittest module, which does not add too much overhead (code wise). For the doctest module you probably have too many test cases already.

Question 3

I was thinking of adding a top level function for get, set and del. Do you think that's better or worse than you've proposed? I'm confused what Path._internal_model is, is that meant to be PathWrapper._internal_model? I agree with the last paragraph, but given that pytest has doctest support I'd just use that, my setup's just a bit messed up atm

Question 4

First of all, nice code. I like the documentation and the type hints. One thing you could do is to follow a style convention for your docstrings; see some examples here.

For me, I think the most glaring thing is that you have all of your tests in a main function. I would separate these tests by type and move them over to individual files, and I would personally use (and recommend) pytest; you could pass in an instance of your class as a fixture. If you end up with a few files of tests, I'd keep these in a separate folder and would create a Makefile for running them.

One thing I noticed is that you specify return type Any for some methods that don't really return anything, such as Attr.set.

You also have some repeated code that you could refactor out; see the methods of Path:

def set(self, obj: Any, value: Any) -> Any:
 """Set the leaf node to the entered value."""
 root_obj = obj
 for node in self._nodes[:-1]:
 obj = node.get(root_obj, obj)
 self._nodes[-1].set(root_obj, obj, value)
def delete(self, obj: Any) -> Any:
 """Delete the leaf node."""
 root_obj = obj
 for node in self._nodes[:-1]:
 obj = node.get(root_obj, obj)
 self._nodes[-1].delete(root_obj, obj)

which could be something like:

def set(self, root_obj: Any, value: Any) -> None:
 """Set the leaf node to the entered value."""
 self._nodes[-1].set(
 root_obj,
 find_node(root_obj),
 value
 )
def delete(self, root_obj: Any, value: Any) -> None:
 """Delete the leaf node."""
 self._nodes[-1].delete(
 root_obj,
 find_node(root_obj)
 )
def find_node(self, root_obj: Any) -> Any:
 """Traverses tree and finds leaf node"""
 obj = root_obj
 for node in self._nodes[:-1]:
 obj = node.get(root_obj, obj)
 return obj

Finally, there are some places where I think you would benefit from more whitespace---one example is just before the following in the function _parse.

 if segment:
 yield from _convert(split, segment)

Question 5

Please can you elaborate on your first point. Am I not following conventions? Pretty sure I am, so any additional information here - pointing to where I'm not - would be helpful. I'm not a fan of make, I use tox and nox. As for whitespace black's output can be good, but some of it is aweful.

Question 6

I mean to use the docstring to clarify arguments and return type, and to mention possible exceptions and errors raised. About make vs tox/nox, your preference; I just like having a separate way of running the tests. I agree about black. I just got a bit caught reading some of the lines because I first thought they were logically closer to the functionality above, but whitespace is a preference so that was a very minor comment.

Question 7

Could I clarify do you mean "to clarify {argument and return} type", "to clarify argument and {return} type" or "to clarify {argument and return} value(s)". Because clarifying the type seems a bit redundant. Yeah using make/tox/nox as a command to run tests is nice :)

Question 8

The third in this case since we're dealing with Any—perhaps I should be more careful with my terminology... Can I ask: how do you like dealing with tox/nox? Are you working in Python exclusively?

Question 9

Ok that makes more sense. :) Yes I work exclusively with Python. I used tox for a little bit, but hit a wall where I needed to have more control than an ini. So I mostly use nox now. Here is the skeleton nox.py file I use. To note the [:-1] is not superfluous and actually breaks the code so very severely. It changes the path from foo.bar to foo.bar.bar.

Graipher Graipher 41.6k7 gold badges70 silver badges134 bronze badges · Accepted Answer · 2020-03-31 11:44:58Z

This is just a quick answer. I don't know exactly what other constraints you have, but it feels very clumsy to have to do something like this:

path = Path.from_str("foo[bar]")
assert path.get(Test) == 2
path.set(Test, 3)
assert path.get(Test) == 3

I would instead consider for Path to take the object the path will be taken off and be a wrapper for __getitem__ and __setitem__ (and probably __delitem__ as well, for completeness sake), so that this snippet could become:

test = PathWrapper(Test)
assert test["foo[bar]"] == 2
test["foo[bar]"] = 3
assert test["foo[bar]"] == 3

This has the advantage that it would also work as a class decorator, so you can add this functionality to classes already upon definition if wanted. Implementation left as an exercise ;)

The tests for the internal representation of some common paths would then probably use some PathWrapper._internal_model property or something.

In addition, as soon as your tests become more than three cases or so, I would recommend going for a more programmatic way. For most of your cases an array of input and expected output would be fine to iterate over, but not for the cases where you modify the path. It might be worth it to at least use the unittest module, which does not add too much overhead (code wise). For the doctest module you probably have too many test cases already.

I was thinking of adding a top level function for get, set and del. Do you think that's better or worse than you've proposed? I'm confused what Path._internal_model is, is that meant to be PathWrapper._internal_model? I agree with the last paragraph, but given that pytest has doctest support I'd just use that, my setup's just a bit messed up atm

Stack Exchange Network

Navigate objects from a path provided as a string

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Navigate objects from a path provided as a string

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions