[Python-Dev] PEP 455 -- TransformDict

14 May 2015 07:32:49 -0700

Before the Python 3.5 feature freeze, I should step up and
formally reject PEP 455 for "Adding a key-transforming
dictionary to collections".
I had completed an involved review effort a long time ago
and I apologize for the delay in making the pronouncement.
What made it an interesting choice from the outset is that the
idea of a "transformation" is an enticing concept that seems
full of possibility. I spent a good deal of time exploring
what could be done with it but found that it mostly fell short
of its promise.
There were many issues. Here are some that were at the top:
* Most use cases don't need or want the reverse lookup feature
 (what is wanted is a set of one-way canonicalization functions).
 Those that do would want to have a choice of what is saved
 (first stored, last stored, n most recent, a set of all inputs,
 a list of all inputs, nothing, etc). In database terms, it
 models a many-to-one table (the canonicalization or
 transformation function) with the one being a primary key into
 another possibly surjective table of two columns (the
 key/value store). A surjection into another surjection isn't
 inherently reversible in a useful way, nor does it seem to be a
 common way to model data.
* People are creative at coming up with use cases for the TD
 but then find that the resulting code is less clear, slower,
 less intuitive, more memory intensive, and harder to debug than
 just using a plain dict with a function call before the lookup:
 d[func(key)]. It was challenging to find any existing code
 that would be made better by the availability of the TD.
* The TD seems to be all about combining data scrubbing
 (case-folding, unicode canonicalization, type-folding, object
 identity, unit-conversion, or finding a canonical member of an
 equivalence class) with a mapping (looking-up a value for a
 given key). Those two operations are conceptually orthogonal.
 The former doesn't get easier when hidden behind a mapping API
 and the latter loses the flexibility of choosing your preferred
 mapping (an ordereddict, a persistentdict, a chainmap, etc) and
 the flexibility of establishing your own rules for whether and
 how to do a reverse lookup.
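To make the plain-dict alternative concrete, here is a minimal sketch of the d[func(key)] pattern. The function canonical() and all example keys are hypothetical, chosen only for illustration: canonical() applies Unicode NFKC normalization and then case-folding. The sketch shows that one canonicalization function can sit in front of a dict, an OrderedDict, or a ChainMap unchanged, and that a reverse-lookup policy (here, keeping a list of all inputs) becomes an explicit choice rather than a built-in one.

```python
from collections import ChainMap, OrderedDict
import unicodedata


def canonical(name):
    """Hypothetical one-way canonicalization: NFKC-normalize, then case-fold."""
    return unicodedata.normalize("NFKC", name).casefold()


# Scrubbing is an ordinary function; the mapping is whatever type you prefer.
headers = {}
headers[canonical("Content-Type")] = "text/html"
print(headers[canonical("CONTENT-TYPE")])  # text/html

streets = OrderedDict()
streets[canonical("Straße")] = "street"  # casefold() maps "ß" to "ss"
print(streets[canonical("STRASSE")])  # street

defaults = {canonical("Timeout"): 30}
overrides = {canonical("TIMEOUT"): 5}
settings = ChainMap(overrides, defaults)
print(settings[canonical("timeout")])  # 5

# Reverse lookup, if wanted at all, follows whatever policy you pick.
# Here: keep a list of every raw input seen for each canonical key.
inputs_seen = {}
for raw in ["Content-Type", "content-type", "CONTENT-TYPE"]:
    inputs_seen.setdefault(canonical(raw), []).append(raw)
print(inputs_seen)  # {'content-type': ['Content-Type', 'content-type', 'CONTENT-TYPE']}
```

Keeping the two concerns apart also keeps debugging simple: canonical() can be tested on its own, and the mapping behaves exactly like the built-in type it is.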
Raymond Hettinger
P.S. Besides the core conceptual issues listed above, there
are a number of smaller issues with the TD that surfaced
during design review sessions. In no particular order, here
are a few of the observations:
* It seems to require above-average skill to figure out what
 can be used as a transform function. It is more
 expert-friendly than beginner-friendly. It takes a little
 while to get used to it. It wasn't self-evident that
 transformations happen both when a key is stored and again
 when it is looked up (contrast this with key functions for
 sorting, which are called at most once per key).
* The name, TransformDict, suggests that it might transform the
 value instead of the key or that it might transform the
 dictionary into something else. The name TransformDict is so
 general that it would be hard to discover when faced with a
 specific problem. The name also limits perception of what
 could be done with it (e.g., a function that logs accesses
 but doesn't actually change the key).
* The tool doesn't describe itself well. Looking at
 help(), the __repr__(), or the tooltips did not provide
 much insight or clarity. The dir() output shows many of the
 _abc implementation details rather than the API itself.
* The original key is stored and if you change it, the change
 isn't stored. The _original dict is private (perhaps to
 reduce the risk of putting the TD in an inconsistent state)
 but this limits access to the stored data.
* The TD is unsuitable for bijections because the API is
 inherently biased with a rich group of operators and methods
 for forward lookup but has only one method for reverse lookup.
* The reverse feature is hard to find (getitem vs __getitem__)
 and its output pair is surprising and a bit awkward to use.
 It provides only one accessor method rather than the full
 dict API that would be given by a second dictionary. The
 API hides the fact that there are two underlying dictionaries.
* It was surprising that when d[k] failed, it failed with the
 transformation's exception rather than a KeyError, violating
 the expectations of the calling code (for example, if the
 transformation function is int(), the call d["12"]
 transforms to d[12] and either returns a value or raises
 a KeyError, but the call d["12.0"] fails with a ValueError,
 since int("12.0") cannot parse the string). This limits its
 substitutability into existing code that expects real mappings
 and its suitability for exposure to end users as if it were a
 normal dictionary.
* There were other issues with dict invariants as well, and
 these affected substitutability in a sometimes subtle way.
 For example, the TD does not work with __missing__().
 Also, "k in td" does not imply that "k in list(td.keys())".
* The API is at odds with wanting to access the transformations.
 You pay a transformation cost both when storing and when
 looking up, but you can't access the transformed value itself.
 For example, if the transformation is a function that scrubs
 hand-entered mailing addresses and puts them into a standard
 format with standard abbreviations, you have no way of getting
 back to the cleaned-up address.
* One design reviewer summarized her thoughts like this:
 "There is a learning curve to be climbed to figure out what
 it does, how to use it, and what the applications [are].
 But, the [working out the same] examples with plain dicts
 requires only basic knowledge." -- Patricia
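Several of the observations above (the transform running on both store and lookup, a failed transform surfacing as something other than KeyError, and "k in td" disagreeing with list(td.keys())) can be reproduced with a small stand-in. The class MiniTransformDict below is a hypothetical simplification written for illustration only; it is not the PEP 455 reference implementation and does not mirror its API. As an assumption of this sketch, it remembers the first original key stored for each transformed key, and it uses an int()-based transform wrapped in a hypothetical logged_int() helper so the number of transform calls is visible.

```python
from collections.abc import MutableMapping


class MiniTransformDict(MutableMapping):
    """Hypothetical, simplified key-transforming mapping (not the PEP 455 code).

    Values live under transform(key); the first original key stored for each
    transformed key is remembered and returned by iteration.
    """

    def __init__(self, transform):
        self._transform = transform
        self._data = {}      # transformed key -> value
        self._original = {}  # transformed key -> first original key stored

    def __getitem__(self, key):
        # The transform runs on every lookup, not just on every store.
        return self._data[self._transform(key)]

    def __setitem__(self, key, value):
        tkey = self._transform(key)
        self._original.setdefault(tkey, key)
        self._data[tkey] = value

    def __delitem__(self, key):
        tkey = self._transform(key)
        del self._data[tkey]
        del self._original[tkey]

    def __iter__(self):
        # Iteration yields the stored original keys, not the transformed ones.
        return iter(self._original.values())

    def __len__(self):
        return len(self._data)


calls = []


def logged_int(key):
    """int() transform that records each call so the call count is visible."""
    calls.append(key)
    return int(key)


def lookup_error(mapping, key):
    """Return the exception type raised by mapping[key], or None on success."""
    try:
        mapping[key]
    except Exception as exc:
        return type(exc)
    return None


td = MiniTransformDict(logged_int)
td[12] = "twelve"       # transform called once to store
print(td["12"])         # transform called again to look up: twelve
print(calls)            # [12, '12']

print("12" in td)                # True
print("12" in list(td.keys()))  # False: keys() yields the original key 12

print(lookup_error(td, "13"))    # <class 'KeyError'>
print(lookup_error(td, "12.0"))  # <class 'ValueError'>: int("12.0") fails
```

Because this sketch inherits __contains__() and get() from collections.abc.Mapping, and those methods only catch KeyError, "12.0" in td and td.get("12.0") also raise ValueError here, which is the substitutability problem in miniature.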
_______________________________________________
Python-Dev mailing list
https://mail.python.org/mailman/listinfo/python-dev