Use case: I sometimes write programs that want to store a little state between runs, e.g., "What was the last file selected last time this program ran?" I wanted a convenient way to track this state. Python's built-in shelve
module is pretty good but I wanted a way to restrict the possible set of keys in the shelf so that I wouldn't (due to a bag) store something under some unexpected key that would never be read. While I was at it I made it so the you access values using foo.a
instead of foo['a']
, though I paid a price for this.
Questions I have about this code are:
- Is there an easier way I could have accomplished this using built-in stuff or widely used libraries?
- Did I choose the right design?
- I could have tried to inherit from one of the existing shelf classes, though the
shelve
module decides at runtime what sort of shelf you should get back fromshelve.open
. - I could have tried to use descriptors to create my own pseudo-properties, sort of like what's done here, but I don't see a need to decide which fields to include at the class, rather than instance, level
- I could have tried to inherit from one of the existing shelf classes, though the
- Is there any less gross for me to use
__getattr__
and__setattr__
(without having to store a list of fields to be handled as special)? - Did I handle the nested context managers (shelf within
RestrictedShelf
) correctly? Do I need a bunch of try/except logic in__exit__
, as in the fourth example in this answer? - Should I be defining my own exception classes, identical with
Exception
beyond the class name, as I did? - Is making _check_special_keys a class method, rather than a static method, the correct approach?
import shelve
import os
import sys
class UnexpectedKeyError(Exception):
pass
class BadSpecialKey(Exception):
pass
class RestrictedShelf(object):
_SPECIAL_KEYS = ["_filepath", "_keys", "_shelf"]
@classmethod
def _check_special_keys(cls):
for key in cls._SPECIAL_KEYS:
if not key.startswith("_"):
raise BadSpecialKey("Special key does not start with underscore: '{}'".format(key))
def __init__(self, filepath, keys):
self._filepath = filepath
self._keys = list(keys) # Make a copy in case someone modifies the original list
for key in keys:
if not isinstance(key, str):
raise TypeError("String object expected for key, {} found".format(key.__class__.__name__))
# The keys in _SPECIAL_KEYS can't be used, but let's just rule out all keys that start with underscore for good measure
if key.startswith("_"):
raise ValueError("Key illegally starts with underscore: '{}'".format(key))
def __enter__(self):
self._shelf = shelve.open(self._filepath)
for key in self._shelf.keys():
if key not in self._keys:
# This is not quite the same thing as a regular KeyError
raise UnexpectedKeyError(key, "Shelf at '{}' contains unexpected key '{}'".format(self._filepath, key))
return self
def __exit__(self, exc_type, exc_val, exc_tb):
if sys.version_info[0] <= 2:
self._shelf.close()
else:
self._shelf.__exit__(exc_type, exc_val, exc_tb)
def __getattr__(self, key):
self.check_key(key)
return self._shelf.get(key)
def __setattr__(self, key, value):
if key in self._SPECIAL_KEYS:
# https://stackoverflow.com/a/7042247/2829764
super(RestrictedShelf, self).__setattr__(key, value)
else:
self.check_key(key)
self._shelf[key] = value
def check_key(self, key):
if key not in self._keys:
raise KeyError(key)
RestrictedShelf._check_special_keys()
if __name__ == "__main__":
with RestrictedShelf(os.path.expanduser("~/tmp/test.shelf"), "a b c".split()) as rs:
print(rs.a) # `None` the first time you run this; `10` thereafter
rs.a = 10
print(rs.a) # 10
print(rs.b) # None
1 Answer 1
1. Separation of concerns
There are three main features in this code:
Restricting the keys that can be stored in a dictionary.
Presenting a dictionary as a namespace.
Adding context manager behaviour to shelves in older versions of Python.
The principle of separation of concerns indicates that these should be implemented separately. See §3 below for one way to do this. This makes the code more understandable, testable, and reuseable.
2. Other review points
There are no docstrings. What does this code do? How do I use it?
The code in
__exit__
assumes thatShelf
objects have context manager behaviour in Python 3.0 or later. But the documentation says that this behaviour was only added in Python 3.4.Storing
_keys
as a list means that checking to see if a key is valid takes time proportional to the number of keys. It would be better to store the keys in a set so that checking validity takes constant time.There are implementations of
__getattr__
and__setattr__
but not__delattr__
.A caller who tried to store a key
check_key
in the shelve would end up overwriting the method of that name.
3. Revised code
Restricting the keys that can be stored in a dictionary.
import collections.abc class RestrictedMapping(collections.abc.MutableMapping): """A proxy for a mapping that restricts the set of allowable keys. >>> d = RestrictedMapping({0:0, 1:1}, range(3)) >>> del d[1] >>> d[2] = 2 >>> d[3] = 3 Traceback (most recent call last): ... KeyError: 3 >>> sorted(d.items()) [(0, 0), (2, 2)] """ def __init__(self, mapping, keys): keys = set(keys) for key in mapping: if key not in keys: raise ValueError('invalid key {}'.format(key)) self.keys = keys self.mapping = mapping super(RestrictedMapping, self).__init__() def _check_key(self, key): if key not in self.keys: raise KeyError(key) def __getitem__(self, key): self._check_key(key) return self.mapping[key] def __setitem__(self, key, value): self._check_key(key) self.mapping[key] = value def __delitem__(self, key): self._check_key(key) del self.mapping[key] def __iter__(self): return iter(self.mapping) def __len__(self): return len(self.mapping)
Presenting a dictionary as a namespace.
class Namespace(object): """A proxy for a mapping that provides access to items via attribute lookup. >>> d = dict(a=1, b=2, c=3) >>> n = Namespace(d) # n is now a proxy for d >>> n.d = 4 >>> n.a, n.b, n.c, n.d (1, 2, 3, 4) >>> del n.b >>> sorted(d.items()) [('a', 1), ('c', 3), ('d', 4)] """ def __init__(self, mapping): super(Namespace, self).__setattr__('_mapping', mapping) def __getattr__(self, attr): return self._mapping[attr] def __setattr__(self, attr, value): self._mapping[attr] = value def __delattr__(self, attr): del self._mapping[attr]
Combining restriction on keys, presentation as namespace, and addition of context manager behaviour.
from contextlib import contextmanager import shelve @contextmanager def restricted_shelve(filename, keys): shelf = shelve.open(filename) try: yield Namespace(RestrictedMapping(shelf, keys)) finally: shelf.close()
4. Namespace antipattern
Presenting a mapping as a namespace instead of a dictionary is superficially attractive because n.name
is three characters shorter than d['name']
, but it's an antipattern in Python. (The only place this pattern is used in the standard library is in the argparse
module.)
The problems are:
Attribute names need to be syntactically valid as Python identifiers, so you have to be very sure you're never going to need an attribute called
class
orreturn
or1
or whatever.There will be collisions between the data attributes and the implementation attributes. In my version in §3.2 above you can't have attributes named
_mapping
or__init__
or__dict__
or__slots__
or ...There's no easy way to get at the list of attributes.
There's not much you can do with a namespace-like object: the Python standard library usually expects you to use mappings instead. For example, you can't pass a namespace to
itertools.ChainMap
, or convert it to JSON usingjson.dumps
, or output it as CSV usingcsv.DictWriter
, and so on.
5. Answers to questions
As far as I know, there's no easy way to implement features 1 (restriction on keys) and 2 (presentation as namespace). Feature 3 (adding context manager behaviour) is more easily done using
contextlib.contextmanager
as shown in §3.3 above.Your design puts all features into one class. This technique is known as the "big ball of mud".
Since your code knows exactly when it is assigning a special attribute, it can call
super().__setattr__
itself, thus avoiding the need for special handling in the__setattr__
method. See §3.2 above.Your code looks OK to me, but writing
__exit__
methods is complicated and so (if you can) it's better to usecontextlib.contextmanager
which takes care of this complexity for you.It's sensible to define your own exception classes, but it's good practice to inherit directly from
Exception
only if there is no close match in the exception hierarchy. HereUnexpectedKeyError
is a runtime problem caused by an invalid value of the correct type, so it makes sense to inherit fromValueError
.The
BadSpecialKey
exception indicates a static programming error, not runtime exceptional behaviour, so I would use an assertion here.It would be better to avoid having special keys altogther: you'll see in §3 above that I didn't need them. But if you must, then
@classmethod
makes slightly more sense: in theory you might have a subclass with a different value for_SPECIAL_KEYS
.
-
\$\begingroup\$ Thanks, there's a lot to learn here. I see what you mean about the namespace antipattern and will avoid it in the future. However, let me add that
namedtuple
does function as a namespace. I think that may have been my inspiration; you could describe what I'm trying the accomplish with the code above as either a restricted shelf (as I did) or as a persistent mutable namedtuple. \$\endgroup\$kuzzooroo– kuzzooroo2015年06月09日 14:45:01 +00:00Commented Jun 9, 2015 at 14:45 -
\$\begingroup\$ Good point about
namedtuple
. It looks as if the recordtype package provides a mutable record type, but I don't know whether it serializes easily. \$\endgroup\$Gareth Rees– Gareth Rees2015年06月09日 15:09:14 +00:00Commented Jun 9, 2015 at 15:09