I've written a utility class BetterDict
to help me on some of my other projects. I've implemented the &
, +
, and -
operators. It extends from the base class dict
so I can use all the builtin functions without having to rewrite them all. I would like some feedback on my implementation, best practices, more pythonic way of doing this, etc.
from typing import NewType
BetterDict = NewType('BetterDict', object)
class BetterDict(dict):
def __and__(self, other: BetterDict) -> BetterDict:
"""
Returns a dictionary containing only the keys that are
present in both dicts.
"""
result = BetterDict()
for (sk, _), (ok, ov) in zip(self.items(), other.items()):
if sk == ok:
result[sk] = ov
return result
def __add__(self, other: BetterDict) -> BetterDict:
"""
Combines two dictionaries, replacing same keys with "other" values.
"""
result = BetterDict()
for k, v in self.items():
result[k] = v
for k, v in other.items():
result[k] = v
return result
def __sub__(self, other: BetterDict) -> BetterDict:
"""
Returns a dictionary containing only the keys that are not
present in both dicts.
"""
result = BetterDict()
keys = list(set(self.keys()).symmetric_difference(other.keys()))
for key in keys:
if key in self.keys():
result[key] = self[key]
if key in other.keys():
result[key] = other[key]
return result
Example Code
from better_dict import BetterDict
a = BetterDict({'a': 1})
b = BetterDict({'a': 3, 'b': 2})
print(a & b)
print(a + b)
print(a - b)
2 Answers 2
As @FMc already commented, your method __add__()
assumes that both dicts have the same keys at the same positions, which is a very bold assumption.
Furthermore, the documentation of __add__()
does not indicate, that values of other
will override values of self
, which may be surprising to the user (and users generally don't like to be surprised).
Some further nitpicks:
- You restrict
other
toBetterDict
, though it would also work with a plain olddict
. - I don't like
for (sk, _), (ok, ov) in zip(self.items(), other.items()):
since you throw away the value of the first item anyway. It should readfor sk, (ok, ov) in zip(self.keys(), other.items()):
imho. But since this assumes, that both dicts have the same keys in the same places, this should be rewritten entirely anyways. - Instead of the
NewType
you should usetyping.Self
when referring toBetterDict
within its methods.
Interesting class.
What motivated it?
You might come up with some more descriptive name than BetterDict
,
perhaps a name drawn from the business domain that motivated it.
comprehensions
The first two methods, {__and__
, __or__
},
might become just "return some dict comprehension".
C-language intersection
The builtin set
operators are implemented
in C rather than python bytecode.
I wonder if, at least for some workloads,
__and__
would go quicker if we were to use
the intersection of these sets:
self.keys() & other.keys()
dict addition
This is idiomatically a one-liner:
return { **self, **other }
We have defined certain semantics, but they are not the ones
the word "addition" would have called to mind.
For example, adding a pair of Counter
objects won't sum the counts.
Adding a pair of id_to_list
dicts won't yield .values() with longer lists.
Well now, this is funny.
I was about to suggest to you that dict |
union seems the
more natural description of the semantics this offers,
similar to set union.
So I did a little digging around, and learned that
pep-584
implemented d1 | d2
a few years back, in cPython 3.9.
Notice that it doesn't commute: (d1 | d2) != (d2 | d1)
dict subtraction
I guess I'm gonna have to quibble about this notation, too.
You're explicitly relying on .symmetric_difference()
,
which is very nice, I'm sure it's fast,
it works fine for a pair of sets s1 ^ s2
.
The natural notation for your dictionary operation
would be d1 ^ d2
.
I feel the current notation is misleading,
for same reason as above with addition.
I wonder if a pair of filter() operations would go quicker than the current code, at least for some workloads.
unit tests
It wouldn't hurt to throw in an automated
test suite.
The three print
's don't tell the Gentle Reader
what the expected output is, and they won't show
a Red bar if a maintenance engineer introduces a regression.
Example workloads and benchmark timings wouldn't hurt, either.
I don't agree with the notation choices, but this codebase does achieve its original design goals.
I would be willing to delegate or accept maintenance tasks on it.
__and__()
does not work as claimed. For example,print(BetterDict({'z': 3, 'a': 2, 'b': 1}) & BetterDict({'a': 3, 'b': 2}))
produces an empty dict. You need to figure out which keys exist in both dicts, not simply assume that items from each dict are stored in the same order (chances are, they are not). \$\endgroup\$