Recently I wrote a validator for some objects. To increase usability the validator gives the user hints on how to fix some of the found issues. Since the application is around something programmatic, and that many of the issues wouldn't make sense without identifying a path to an object; I decided to build a navigator.
At first I didn't have this so I just used getattr
and defaulted to the very helpful <Unknown>
value if that failed to get the value. This resulted in some useful; some less useful messages.
Upon completing the V1 of the below code I had the clever idea of allowing the validator to just fix the fixable issues. Since this is a flag, it allows the user to get a working version and then come back and clean it up once it works. However this quickly showed some errors in my initial design as I didn't plan for setting and deleting values.
I feel like given how catastrophically V1 failed V2 isn't going to be much better. Unlike with V1, with V2 I followed TDD, and so the code slowly built up over some time. This has caused a lot of the nuance to just disappear and I find _parse
to be a bit of an eyesore.
I have added a full description on what this achieves in the code, however a short overview would be that it's similar to eval()
just it's not evil.
- Attribute indexing similar to standard Python.
data.foo.bar.baz
. - Escape special (all) characters using a slash \.
- Index items using brackets.
data["key1"]["key2"]
. - Allow two literals strings or integers,
1
and"key"
.
However it acts quite a bit differently as each level has data
implicitly prepended - if it is not a literal.
[["key"]]
would be the same asdata[data["key"]]
in Python, and[foo]
would be the same asdata[data.foo]
rather thandata["foo"]
.
I would like a review of any or all of my code. However changes that reduce version compatibility are generally unhelpful.
import dataclasses
from typing import Any, Iterator, List, NoReturn, Optional, Union, cast
@dataclasses.dataclass
class Attr:
_index: "Path"
def get(self, root_obj: Any, obj: Any) -> Any:
"""Get attribute from object."""
return getattr(obj, self._index.get(root_obj))
def set(self, root_obj: Any, obj: Any, value: Any) -> Any:
"""Set object's attribute to value."""
setattr(obj, self._index.get(root_obj), value)
def delete(self, root_obj: Any, obj: Any) -> Any:
"""Delete object's attribute."""
delattr(obj, self._index.get(root_obj))
@dataclasses.dataclass
class Item:
_index: "Path"
def get(self, root_obj: Any, obj: Any) -> Any:
"""Index object."""
return obj[self._index.get(root_obj)]
def set(self, root_obj: Any, obj: Any, value: Any) -> Any:
"""Set object's index to value."""
obj[self._index.get(root_obj)] = value
def delete(self, root_obj: Any, obj: Any) -> Any:
"""Delete object's index."""
del obj[self._index.get(root_obj)]
@dataclasses.dataclass
class Literal:
_index: Any
def get(self, root_obj: Any, obj: Any) -> Any:
"""Get literal value."""
return self._index
def set(self, root_obj: Any, obj: Any, value: Any) -> NoReturn:
"""Unable to change a literal, the path is misconfigured."""
raise TypeError("Can't set to a literal")
def delete(self, root_obj: Any, obj: Any) -> NoReturn:
"""Unable to delete a literal, the path is misconfigured."""
raise TypeError("Can't delete a literal")
@dataclasses.dataclass
class Path:
_nodes: List[Union[Attr, Item, Literal]]
@classmethod
def from_str(cls, value: str) -> "Path":
"""
Build a path from string form.
Without any special characters this works the same way that
`getattr` and friends works.
>>> assert '1'.isnumeric()
>>> assert getattr('1', 'isnumeric')()
>>> assert Path.from_str('isnumeric').get('1')()
You can get the same functionality as attrgetter with one
argument by splitting by period.
>>> import datetime
>>> import operator
>>> assert (
... operator.attrgetter('date.today')(datetime)()
... == Path.from_str('date.today').get(datetime)()
... )
You can index an item using square brackets.
Like we're used to in Python.
>>> data = ['foo', 'bar', 'baz']
>>> assert data[1] == 'bar'
>>> assert Path.from_str('[1]').get(data) == 'bar'
You can index a dictionary by indexing by a string literal.
>>> data = {'foo': 'bar'}
>>> assert data['foo'] == 'bar'
>>> assert Path.from_str('["foo"]').get(data) == 'bar'
You can escape characters by using a period.
>>> data = {'foo.bar': 'baz'}
>>> assert data['foo.bar'] == 'baz'
>>> assert Path.from_str('["foo\\.bar"]').get(data) == 'baz'
You can set and delete by using those methods instead.
And you can mix all the above together to walk complex paths.
>>> data = {'foo': 'bar', 'bar': 'baz'}
>>> assert Path.from_str('[["foo"]]').get(data) == 'baz'
>>> class Test:
... foo = ['bar', 'baz']
>>> assert Path.from_str('foo[1]').get(Test) == 'baz'
"""
return cls(list(_parse(iter(value))))
def get(self, obj: Any) -> Any:
"""Walk the path and get the resulting value."""
root_obj = obj
for node in self._nodes:
obj = node.get(root_obj, obj)
return obj
def set(self, obj: Any, value: Any) -> Any:
"""Set the leaf node to the entered value."""
root_obj = obj
for node in self._nodes[:-1]:
obj = node.get(root_obj, obj)
self._nodes[-1].set(root_obj, obj, value)
def delete(self, obj: Any) -> Any:
"""Delete the leaf node."""
root_obj = obj
for node in self._nodes[:-1]:
obj = node.get(root_obj, obj)
self._nodes[-1].delete(root_obj, obj)
STRING_DELIMITERS = {'"', "'"}
Split = Union[None, str]
def _parse(
chars: Iterator[str],
end: Optional[str] = None,
) -> Iterator[Union[Attr, Item, Literal]]:
"""
Parse a string into an easy to use representation.
- Non-special characters are just appended to segment. It's a
list to prevent the \$O(n^2)\$ time complexity immutable
strings would have.
This is later parsed by :code:`_convert` to yield its correct
representation. This function only yields from this function.
- Split stores the previous form of split.
This can be "" - no split, a . or [ for a period or bracket split,
or None for a character that split the segment but can't split.
An example of None would be a string literal in the middle of
a segment 'foo"bar"baz'. This is not a legal construct.
That should be '"foobarbaz"'.
- This function is recursive only when interpreting the [ split.
This means foo[0].bar would have four segments. Both foo and 0
will have a split of "". As they're the start of their own
parse function. bar would have the split . as it follows a period.
The empty list between ]. would have the split None. This is
as "[foo]bar.baz" doesn't make much sense.
"""
segment = []
split: Split = ""
for char in chars:
if char == "\\":
segment.append(next(chars))
elif char in STRING_DELIMITERS:
if segment:
raise ValueError(
"String literal can't start in the middle of an attribute"
)
yield from _convert(split, _extract_string(chars, char))
split = None
elif char == ".":
yield from _convert(split, segment)
segment = []
split = "."
elif char == "[":
if segment:
yield from _convert(split, segment)
segment = []
yield from _convert("[", _parse(chars, "]"))
split = None
elif char == "]":
if char == end:
break
raise ValueError("Found a close bracket without a matching open bracket")
else:
segment.append(char)
else:
if end:
raise ValueError("Found an open bracket without a matching close bracket")
if segment:
yield from _convert(split, segment)
def _convert(
split: Split,
segment: Union[List[str], Iterator[Union[Attr, Item, Literal]]],
) -> Iterator[Union[Attr, Item, Literal]]:
"""
Convert a segment into an attribute, item or literal.
All leaf nodes are Literals, these are normally plain old strings.
However if the first segment (split == "") starts with an integer
then we will convert the segment to an integer. This allows us to
index lists.
If we have an illegal split, None, with content in the segment
then we raise here.
Whilst not a segment we pass the result of recursive _parse calls
through this function. These are converted into an Item index with
the result given to the containing Path.
Everything else is just converted to an attribute with a path
containing a single literal.
You should notice that this means "1" is a valid path and would
just return the integer 1.
>>> assert Path.from_str('1').get(None) == 1
"""
if split is None:
if segment:
raise ValueError("String literals can't end halfway through an attribute")
return
if split == "[":
_segment = cast(Iterator[Union[Attr, Item, Literal]], segment)
yield Item(Path(list(_segment)))
return
value = "".join(cast(List[str], segment))
if split == "":
if not value:
return
elif value[0].isnumeric():
yield Literal(int(value))
return
elif value[0] in STRING_DELIMITERS:
yield Literal(value[1:-1])
return
yield Attr(Path([Literal(value)]))
def _extract_string(chars: Iterator[str], literal: str) -> List[str]:
"""Extract string with matching delimiter."""
segment = []
for char in chars:
if char == "\\":
char = next(chars)
elif char == literal:
break
segment.append(char)
else:
raise ValueError("String literal doesn't have a closing delimiter")
return [literal] + segment + [literal]
To test the code just run python foo.py
and it'll run all the tests.
When working with the code, you may want to move the Path.from_str
tests above the class Test
.
if __name__ == "__main__":
class Test:
foo = {"bar": 1, "baz": 2}
bar = "baz"
assert Path.from_str("foo").get(Test) == {"bar": 1, "baz": 2}
assert Path.from_str(".foo").get(Test) == {"bar": 1, "baz": 2}
assert Path.from_str('foo["bar"]').get(Test) == 1
assert Path.from_str('foo["baz"]').get(Test) == 2
assert Path.from_str("bar").get(Test) == "baz"
assert Path.from_str("bar[0]").get(Test) == "b"
assert Path.from_str("bar[1]").get(Test) == "a"
assert Path.from_str("bar[2]").get(Test) == "z"
path = Path.from_str("foo[bar]")
assert path.get(Test) == 2
path.set(Test, 3)
assert path.get(Test) == 3
data = {"foo": "bar", "bar": "baz"}
assert Path.from_str('[["foo"]]').get(data) == "baz"
assert Path.from_str("1") == Path([Literal(1)])
assert Path.from_str('"a"') == Path([Literal("a")])
assert Path.from_str("a") == Path([Attr(Path([Literal("a")]))])
assert Path.from_str("a\\.b") == Path([Attr(Path([Literal("a.b")]))])
assert Path.from_str("a.b") == Path(
[Attr(Path([Literal("a")])), Attr(Path([Literal("b")]))]
)
assert Path.from_str(".a") == Path([Attr(Path([Literal("a")]))])
assert Path.from_str('["a"]') == Path([Item(Path([Literal("a")]))])
assert Path.from_str("[a]") == Path([Item(Path([Attr(Path([Literal("a")]))]))])
assert Path.from_str("a[b]") == Path(
[Attr(Path([Literal("a")])), Item(Path([Attr(Path([Literal("b")]))])),]
)
for work in ["a[b]", '"a"[b]', "[a].b", "[a][b]"]:
try:
Path.from_str(work)
except:
print(work)
raise
for fail in ["[a]b", 'a"b"', '"b"c', '[a]"b"', '"a', 'a"', "[a", "a]"]:
try:
ret = Path.from_str(fail)
except:
pass
else:
print(fail, ret)
assert False
import doctest
doctest.testmod()
2 Answers 2
This is just a quick answer. I don't know exactly what other constraints you have, but it feels very clumsy to have to do something like this:
path = Path.from_str("foo[bar]")
assert path.get(Test) == 2
path.set(Test, 3)
assert path.get(Test) == 3
I would instead consider for Path
to take the object the path will be taken off and be a wrapper for __getitem__
and __setitem__
(and probably __delitem__
as well, for completeness sake), so that this snippet could become:
test = PathWrapper(Test)
assert test["foo[bar]"] == 2
test["foo[bar]"] = 3
assert test["foo[bar]"] == 3
This has the advantage that it would also work as a class decorator, so you can add this functionality to classes already upon definition if wanted. Implementation left as an exercise ;)
The tests for the internal representation of some common paths would then probably use some PathWrapper._internal_model
property or something.
In addition, as soon as your tests become more than three cases or so, I would recommend going for a more programmatic way. For most of your cases an array of input and expected output would be fine to iterate over, but not for the cases where you modify the path. It might be worth it to at least use the unittest
module, which does not add too much overhead (code wise). For the doctest
module you probably have too many test cases already.
-
1\$\begingroup\$ I was thinking of adding a top level function for get, set and del. Do you think that's better or worse than you've proposed? I'm confused what
Path._internal_model
is, is that meant to bePathWrapper._internal_model
? I agree with the last paragraph, but given that pytest has doctest support I'd just use that, my setup's just a bit messed up atm \$\endgroup\$2020年03月31日 12:16:26 +00:00Commented Mar 31, 2020 at 12:16
First of all, nice code. I like the documentation and the type hints. One thing you could do is to follow a style convention for your docstrings; see some examples here.
For me, I think the most glaring thing is that you have all of your tests in a main function. I would separate these tests by type and move them over to individual files, and I would personally use (and recommend) pytest; you could pass in an instance of your class as a fixture. If you end up with a few files of tests, I'd keep these in a separate folder and would create a Makefile
for running them.
One thing I noticed is that you specify return type Any
for some methods that don't really return anything, such as Attr.set
.
You also have some repeated code that you could refactor out; see the methods of Path
:
def set(self, obj: Any, value: Any) -> Any:
"""Set the leaf node to the entered value."""
root_obj = obj
for node in self._nodes[:-1]:
obj = node.get(root_obj, obj)
self._nodes[-1].set(root_obj, obj, value)
def delete(self, obj: Any) -> Any:
"""Delete the leaf node."""
root_obj = obj
for node in self._nodes[:-1]:
obj = node.get(root_obj, obj)
self._nodes[-1].delete(root_obj, obj)
which could be something like:
def set(self, root_obj: Any, value: Any) -> None:
"""Set the leaf node to the entered value."""
self._nodes[-1].set(
root_obj,
find_node(root_obj),
value
)
def delete(self, root_obj: Any, value: Any) -> None:
"""Delete the leaf node."""
self._nodes[-1].delete(
root_obj,
find_node(root_obj)
)
def find_node(self, root_obj: Any) -> Any:
"""Traverses tree and finds leaf node"""
obj = root_obj
for node in self._nodes[:-1]:
obj = node.get(root_obj, obj)
return obj
Finally, there are some places where I think you would benefit from more whitespace---one example is just before the following in the function _parse
.
if segment:
yield from _convert(split, segment)
-
\$\begingroup\$ Please can you elaborate on your first point. Am I not following conventions? Pretty sure I am, so any additional information here - pointing to where I'm not - would be helpful. I'm not a fan of make, I use tox and nox. As for whitespace black's output can be good, but some of it is aweful. \$\endgroup\$2020年03月31日 12:43:00 +00:00Commented Mar 31, 2020 at 12:43
-
\$\begingroup\$ I mean to use the docstring to clarify arguments and return type, and to mention possible exceptions and errors raised. About make vs tox/nox, your preference; I just like having a separate way of running the tests. I agree about black. I just got a bit caught reading some of the lines because I first thought they were logically closer to the functionality above, but whitespace is a preference so that was a very minor comment. \$\endgroup\$ades– ades2020年03月31日 17:16:02 +00:00Commented Mar 31, 2020 at 17:16
-
\$\begingroup\$ Could I clarify do you mean "to clarify {argument and return} type", "to clarify argument and {return} type" or "to clarify {argument and return} value(s)". Because clarifying the type seems a bit redundant. Yeah using make/tox/nox as a command to run tests is nice :) \$\endgroup\$2020年03月31日 17:25:02 +00:00Commented Mar 31, 2020 at 17:25
-
\$\begingroup\$ The third in this case since we're dealing with
Any
—perhaps I should be more careful with my terminology... Can I ask: how do you like dealing with tox/nox? Are you working in Python exclusively? \$\endgroup\$ades– ades2020年03月31日 17:31:38 +00:00Commented Mar 31, 2020 at 17:31 -
2\$\begingroup\$ Ok that makes more sense. :) Yes I work exclusively with Python. I used tox for a little bit, but hit a wall where I needed to have more control than an ini. So I mostly use nox now. Here is the skeleton nox.py file I use. To note the
[:-1]
is not superfluous and actually breaks the code so very severely. It changes the path fromfoo.bar
tofoo.bar.bar
. \$\endgroup\$2020年03月31日 17:39:05 +00:00Commented Mar 31, 2020 at 17:39