homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Add a case-insensitive case-preserving dict
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Arfrever, barry, demian.brecht, eric.araujo, eric.smith, ethan.furman, georg.brandl, jaraco, mark.dickinson, martin.panter, mrabarnett, pitrou, r.david.murray, rhettinger, sbt, serhiy.storchaka, theller, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2013年09月09日 11:22 by pitrou, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
transform.patch pitrou, 2013年09月10日 12:38 review
transformdict.patch pitrou, 2013年09月10日 19:40 review
ctransformdict.patch serhiy.storchaka, 2013年09月11日 09:15 review
dict__transform__.patch serhiy.storchaka, 2013年09月13日 19:52 Add dict.__transform__ review
transformdict2.patch pitrou, 2013年09月14日 14:18 review
transformdict3.patch pitrou, 2013年09月14日 21:35 review
Messages (86)
msg197359 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 11:22
This is a very common need when implementing network protocols. You want to match keys case-insensitively but also preserve the original casing (e.g. for presentation).
When searching on the Web, you see many people reimplementing their own variant (often incomplete, or buggy). For example, Twisted has its own, the email package has something resembling it, WebOb also.
Having an implementation in the stdlib would spare many people the effort, ensure the implementation is complete and well-tested, and perhaps also add some optimizations to mitigate the overhead compared to a plain dict.
Note this is an instance of a more general pattern, where they key used for matching is derived from the lookup key using a constant derivation function. So maybe we want to implement the more general pattern and let users specify str.lower as the key derivation function.
msg197360 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013年09月09日 11:50
Surely a case-insensitive dict should use str.casefold, not str.lower?
msg197361 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 11:54
> Surely a case-insensitive dict should use str.casefold, not
> str.lower?
Perhaps. Network protocols will usually only allow ASCII in parts
where case is insensitive (e.g. header names), so it shouldn't make a
difference.
Implementing the generic pattern means this is left at the user's
discretion, though.
msg197362 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 13:02
For the record, I have my own implementation here:
https://bitbucket.org/optiflowsrd/obelus/src/tip/obelus/casedict.py?at=default
https://bitbucket.org/optiflowsrd/obelus/src/tip/obelus/test/test_casedict.py?at=default 
msg197366 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年09月09日 14:59
For the record, email is not a good argument for this, since email could not use this data structure (its data structure is *not* a dict, but a list with dict-like features grafted on).
I do think this would be useful, and the generic version (analogous to defaultdict) would seem to make sense.
msg197367 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 18:11
Ok, let's bikeshed this a bit. What should be the name?
- projectdict?
- normalizedict?
- normdict?
- derivedict?
- transformdict?
- any ideas?
msg197368 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013年09月09日 18:19
I would say
- transformkeydict
Too bad we can't just add an extra 'transform_key' keyword to defaultdict.
msg197369 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2013年09月09日 18:34
It would be nice to combine the behaviors that defaultdict and the case-insensitive comparisons.
msg197370 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 18:36
> It would be nice to combine the behaviors that defaultdict and the case-insensitive comparisons.
Any use case? In my experience they are used in completely different
situations. defaultdict mostly to use the writing of some (internal)
algorithms, a case-insensitive dict to store user-visible data.
msg197376 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2013年09月09日 19:13
Just today I was using a defaultdict where the keys are stock symbols. They're case insensitive (at least for this particular application).
In this case I just str.upper everything, but it would be a nice feature to case-preserve the keys that I pre-populate. I care less about the keys from user data.
msg197377 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 19:14
> In this case I just str.upper everything, but it would be a nice
> feature to case-preserve the keys that I pre-populate. I care less
> about the keys from user data.
Well, stock symbols are what I would call user data :-)
msg197379 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2013年09月09日 19:18
True enough!
I was trying to distinguish keys that I populate with initial values (mostly stock indexes) versus those where I just read values from a user-supplied file. When I populate the index values, I'd like to preserve the case I initially used. When I use user-supplied values, I don't know that the first value I use to populate the defaultdict has any more meaning that the last one I see.
It would just be a nice-to-have feature for which I have a real use case. It's not so critical that I can't work around it.
msg197380 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013年09月09日 19:24
To the point, however, Eric's example would make use of both the defaultdict portion and the transformkey portion in a single dict.
msg197381 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013年09月09日 19:26
mappeddict?
Re defaultdict, you could write a dict that does all of these things, called superdict! :-)
msg197387 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年09月09日 19:47
coercekeydict
msg197389 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 19:51
I would like to shorten the proposals to "transformdict" and "coercedict".
(after all, transforming the values would have little sense: you can do it yourself trivially)
msg197390 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013年09月09日 19:58
FYI os.environ uses something similar: keys and values are encoded and decoded using functions. So any transformation is supported.
http://hg.python.org/cpython/file/eac63e7ceb03/Lib/os.py#l636
On UNIX, the encoder and decoder are os.fsencode() and os.fsdecode() (not exactly, the real functions are more strict on the input type).
On Windows, the encoder converts the key to uppercase.
msg197391 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 20:00
> FYI os.environ uses something similar: keys and values are encoded and
> decoded using functions. So any transformation is supported.
I don't think this is the same situation. os.environ has bijective
transformations, which don't pose any implementation challenge.
The whole point of a "transformdict" is to allow for multiple keys to
actually map to the same dict entry.
msg197392 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013年09月09日 20:00
+1 For the general idea
+1 For the more generic approach of which "lowercase" is just one special case
+10 to make this a PEP so that more people have a chance to express their opinion (currently only those who noticed it on the issues mailing list). I find the issue tracker a very bad medium for any kind of brain-storming or bikeshedding.
msg197393 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 20:02
> +10 to make this a PEP so that more people have a chance to express
> their opinion (currently only those who noticed it on the issues
> mailing list). I find the issue tracker a very bad medium for any kind
> of brain-storming or bikeshedding.
Well, I don't think there is a lot of brainstorming to be done here: we
are talking about a single well-defined functionality. I don't mind
writing a PEP if other people ask for it, but I'd rather spend my time
on more important things.
As for bikeshedding, there is no "good medium" for it, really :-) At
worse we could ask python-dev for their naming contributions (a great
idea, surely).
msg197398 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年09月09日 20:41
Indeed. Although there was apparently some call for it, it doesn't sound from a quick google like defaultdict was deemed to require a PEP. Presumably the informed audience should be wider than this issue, though.
I also note that defaultdict is implemented via a special method on dict itself (__missing__), and if this one was implemented the same way it would be easy to combine the features.
msg197399 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013年09月09日 20:43
Precisely what I was thinking. :)
msg197400 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 20:44
> I also note that defaultdict is implemented via a special method on
> dict itself (__missing__), and if this one was implemented the same
> way it would be easy to combine the features.
It's not that simple: to remember the original casing you need either a
second container, or to use (original_key, value) tuples as values. Both
approaches have non-trivial repercussions on the implementation of many
methods.
msg197401 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 20:46
(as a sidenote, you might want a case-insensitive OrderedDict as well, I see no reason to make a special case for defaultdict here)
msg197402 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月09日 21:12
See also discussion on a topic: http://comments.gmane.org/gmane.comp.python.ideas/18469 .
Proposed names: custom_dict, KeyedDictionary, Dictionary.
It will be confused if this dict will not be compatible with PyDict API. It is possible to add such feature directly into the dict class (I experimented with IdentityDict).
msg197403 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 21:16
> Proposed names: custom_dict, KeyedDictionary, Dictionary.
Sounds much too vague and un-specific.
> It will be confused if this dict will not be compatible with PyDict
> API.
Why? Many custom dict-like classes aren't.
> It is possible to add such feature directly into the dict class (I
> experimented with IdentityDict).
Can you explain how?
msg197405 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月09日 21:51
> Why? Many custom dict-like classes aren't.
And this is weird (issue10977).
> Can you explain how?
By patching Objects/dictobject.c of course. I suppose it should require changing about 400 lines of code, a little more than for IdentityDict. When you provides the specification perhaps I will provide a patch.
msg197406 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 21:54
> By patching Objects/dictobject.c of course.
Am I stupid :-)
> I suppose it should require changing about 400 lines of code, a little
> more than for IdentityDict. When you provides the specification
> perhaps I will provide a patch.
Well, take a look at the code I've pointed to above. I'm curious to know
how you'll integrate it in dictobject.c without slowing down normal dict
objects, and without making them bigger.
msg197410 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月09日 22:22
> I'm curious to know how you'll integrate it in dictobject.c without slowing down normal dict objects, and without making them bigger.
It of course will make a size of source file bigger, but shouldn't affect a size or performance of normal dicts. A dict object contains dk_lookup. Constructor for keyed dict (subclass of ) should initialize it with specialized function which calls the "key" function and recalculate a hash (yes, with this simple approach a hash will be calculated twice, for original and for transformed keys). Hmm, actually it can be even simpler than for IdentityDict (for which not calculating a hash was important). Also some other methods which relies on dict implementation details (e.g. making a copy of dict) should be modified.
The most cumbersome part is the tests. Unfortunately I lost my tests for IdentityDict (used hg diff without --git). It will be good if your provide complete test suite.
msg197411 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 22:25
> It of course will make a size of source file bigger, but shouldn't
> affect a size or performance of normal dicts. A dict object contains
> dk_lookup.
You need to keep both the original keys and the transformed keys. It's not only about transforming keys on lookup. Otherwise, yes, it's trivial.
msg197412 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月09日 22:27
> It will be good if your provide complete test suite.
... Again, take a look at the code I've posted above ...
msg197430 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月10日 09:35
(now relayed on python-dev)
msg197434 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月10日 12:38
I'm uploading a pure Python transformdict implementation + tests, for Serhiy's benefits (and others') :-)
msg197445 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013年09月10日 14:53
With the current patch __repr__() will fail if the untransformed key is unhashable:
>>> d = collections.transformdict(id)
>>> L = [1,2,3]
>>> d[L] = None 
>>> d.keys()
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "C:\Repos\cpython-dirty\lib\collections\abc.py", line 444, in __repr__
 return '{0.__class__.__name__}({0._mapping!r})'.format(self)
 File "C:\Repos\cpython-dirty\lib\collections\__init__.py", line 944, in __repr__
 self._transform, repr(dict(self)))
TypeError: unhashable type: 'list'
msg197446 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月10日 14:56
> With the current patch __repr__() will fail if the untransformed key
> is unhashable:
Yeah, the __repr__() implementation will be a bit annoying to get right :-)
msg197457 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2013年09月10日 17:51
FTR this is one message from the previous thread about this: https://mail.python.org/pipermail/python-ideas/2010-June/007332.html 
msg197464 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月10日 19:40
Updated patch: fixes repr(), and retains the first key not the last.
msg197469 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013年09月11日 02:41
I would *really* like for this to start outside the standard library.
It needs to mature with user feedback before being dumped
in the collections module (which was never intended to be a
giant pile of every collection a person could think of). 
Adding yet more dictionary variants is an example of
way-too-many-ways-to-do-it.
msg197479 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月11日 08:41
Here is a preliminary C implementation.
msg197516 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月12日 10:14
Serhiy, any benchmarks for your implementation? Does it slow down regular dicts?
msg197525 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013年09月12日 14:08
On 09/11/2013 02:39 PM, Tim Delaney wrote on PyDev:
> 
> I would think that retrieving the keys from the dict would return the transformed keys (I'd 
> call them canonical keys).
The more I think about this the more I agree. A canonicaldict with a key function that simply stored the transformed key and it's value would seem to be a lot simpler:
 - no need to store a separate "presentation" key
 - no confusion about which of the first key/last key seen is stored
 - no mistakes with the "first" key not being added before real data
 and getting the presentation key wrong
Further, in order to store the non-canonical keys a separate list must be kept of the keys to preseed the canonicaldict; if we store the canonical keys a separate list must be kept for presentation purposes -- so worst case scenario we're keeping the same amount of information and best-case scenario the presentation of the keys doesn't matter and we just saved ourselves an extra data structure.
msg197526 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年09月12日 14:16
It would be simpler, but it would also be useless for the actual use case for which this issue was opened.
msg197527 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013年09月12日 14:34
True, but how big a deal is that?
For one, it seems questionable to have the presentation portion of the data be part of the key.
For two, when presentation is important a separate list must be kept anyway to preseed the dict; so just use that list to cycle through the canonicaldict:
--> some_dict = some_function_that_returns_a_conanicaldict()
--> presentation_list = ['IBM','Intel','AMD']
--> for company in presentation_list:
... key = some_dict.key[company] # demo purposes only
... value = some_dict[company]
... print(key, company, value)
ibm IBM 2172
intel Intel 3210
amd AMD 4399
msg197528 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月12日 14:50
Ethan, please don't post the same message *both* on the tracker
and on the mailing-list. I'm sure most people here also read
the ML thread.
msg197529 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013年09月12日 14:53
Right, sorry.
msg197531 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年09月12日 15:02
You are conceptualizing this very differently. In our view, this data structure is for cases where the original key is the most important piece of information (about the keys). The transformation in the lookup process is entirely in the service of looking up the value paired with that original key when there is more than one possible representation of that key. It is the original key that is critical when re-serializing the data or otherwise making use of the keys for anything other than lookup. So this is about making the data structure succinctly model the problem domain, which is what OO is supposed to be good at :)
msg197533 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月12日 15:12
> R. David Murray added the comment:
> 
> You are conceptualizing this very differently. In our view, this
> data structure is for cases where the original key is the most
> important piece of information (about the keys). The transformation
> in the lookup process is entirely in the service of looking up the
> value paired with that original key when there is more than one
> possible representation of that key. It is the original key that is
> critical when re-serializing the data or otherwise making use of the
> keys for anything other than lookup. So this is about making the
> data structure succinctly model the problem domain, which is what OO
> is supposed to be good at :)
Thanks for putting it much more convincingly than my python-dev
response :-)
msg197635 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月13日 19:52
Here is an alternative C implementation. It adds to the dict class support of the __transform__() method. If this method is defined in dict subclass it used to transforming keys. collections.TransformDict is just utilizes this feature as collections.defaultdict utilizes __missing__(). This patch changes twice less C code than previous one (227 vs 474 lines).
msg197637 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月13日 20:00
> Does it slow down regular dicts?
I were surprized, but yes. The ComplexPythonFunctionCalls test from pybench is 40% slower with ctransformdict.patch (and I still don't known why). With dict__transform__.patch it is only 2% slower. All other pybench tests are approximately equal.
msg197644 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月13日 20:23
> > Does it slow down regular dicts?
> 
> I were surprized, but yes. The ComplexPythonFunctionCalls test from
> pybench is 40% slower with ctransformdict.patch (and I still don't
> known why). With dict__transform__.patch it is only 2% slower. All
> other pybench tests are approximately equal.
Did you try any other microbenchmarks? Your approach sounds promising.
msg197648 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月13日 20:38
> Did you try any other microbenchmarks? Your approach sounds promising.
Any microbenchmarks which I tried did not show any interesting. Until I found the cause of slowing down ComplexPythonFunctionCalls I have no idea which tests can be representable.
Of course you can run benchmarks yourself.
msg197710 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月14日 14:18
Updated patch adding the getitem() method.
msg197711 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月14日 14:18
Note: I haven't renamed transformdict to TransformDict yet.
msg197733 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月14日 21:35
Uploading new patch with added transform_func property.
msg197969 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013年09月17日 08:05
Note that I'm strongly against this name of the getitem() method.
msg197970 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月17日 08:11
> Georg Brandl added the comment:
> 
> Note that I'm strongly against this name of the getitem() method.
Any suggestion?
msg197973 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013年09月17日 08:57
Not really. Would "entry" be acceptable instead of "item"?
msg197975 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月17日 09:10
> Georg Brandl added the comment:
> 
> Not really. Would "entry" be acceptable instead of "item"?
getentry() sounds decent to me, but it loses the parallel to popitem() and items().
msg197980 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013年09月17日 10:16
Hmm, I didn't consider popitem(). Maybe I'm too paranoid about users confusing __getitem__() and getitem() after all :)
msg197981 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月17日 11:15
But why not getkey()? Why you need return value too?
msg197982 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年09月17日 12:06
> But why not getkey()? Why you need return value too?
Because it's more useful to return both.
msg197983 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月17日 13:08
Sorry, I don't understand why it's more useful. We need create a tuple and then index it or unpack it and drop one of elements. This only muddles away a time and programmer's attention.
msg197984 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年09月17日 13:34
Because most often the time at which you want the original key is the point at which you are about to re-serialize the data...so you need the value too.
msg197986 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013年09月17日 14:03
I do think getitem is the most natural name for the method.
msg197987 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013年09月17日 14:08
Oh, could anyone borrow Guido's time machine and rename either __getitem__() to __getvalue__() or items() to entries()?
msg197989 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2013年09月17日 14:12
On 09/17/2013 09:34 AM, R. David Murray wrote:
> 
> R. David Murray added the comment:
> 
> Because most often the time at which you want the original key is the point at which you are about to re-serialize the data...so you need the value too.
I can't think of a case where I'd need (original_key, value) where I
wouldn't also be iterating over items(). Especially so if I'm serializing.
On the other hand, I don't have a use case for the original key, anyway.
So I don't have a strong feeling about this, other than it feels odd
that the answer to the original question (I think on python-dev) "how do
we get the original key back?" is answered by "by giving you the
original key and its value".
msg197990 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2013年09月17日 14:19
On 09/17/2013 10:12 AM, Eric V. Smith wrote:
> On the other hand, I don't have a use case for the original key, anyway.
> So I don't have a strong feeling about this, other than it feels odd
> that the answer to the original question (I think on python-dev) "how do
> we get the original key back?" is answered by "by giving you the
> original key and its value".
I meant: I don't have a use case for finding the original key outside of
iterating over items().
msg198281 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2013年09月22日 14:32
I just want to say thanks for working on this. I also have needed this functionality for various needs in the past. To fulfill my needs, I wrote this implementation:
https://bitbucket.org/jaraco/jaraco.util/src/1ab3e7061f96bc5e179b6b2c46b06d1c20f87129/jaraco/util/dictlib.py?at=default#cl-221
That implementation is used in the irc library for a case-insensitive dict, but using the IRC-specific standard for case insensitivity (https://bitbucket.org/jaraco/irc/src/1576b10dc2923d4d7234319d2d1e11a5080e1f7d/irc/dict.py?at=default#cl-49).
I share this just to add a +1 for the need and to provide additional use cases and implementations for reference.
msg198912 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年10月03日 19:34
Raymond, have you had time to look at this?
msg199652 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013年10月13日 03:41
Antoine, is the PEP ready for review?
msg199708 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013年10月13日 14:28
> Antoine, is the PEP ready for review?
Well, I think it is. Do you think other points should be addressed in it?
We still have some time.
msg205979 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013年12月12日 20:34
+1 for this (for Python 3.5, now, I guess). I've just found another place where I'd use it.
Looking at the implementation, one thing surprises me a bit: I'd expect the KeyError from a 'del' or 'pop' operation to have the untransformed key rather than the transformed key in its .args.
How about '_keys' and '_values' for the slot names, in place of '_original' and '_data'?
msg205995 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013年12月13日 00:22
Mark, what was the use case you found?
msg206027 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013年12月13日 08:12
> Mark, what was the use case you found?
It's essentially an IdentityDict, though I've found other more specific transforms useful.
I was writing a tool to find reference cycles between Python objects (we have a customer application that's working in a multithreaded COM environment and has to ensure that COM objects are released on the same types of threads they were created on, so we have to be careful about cyclic garbage and delayed garbage collection).
The graph of Python objects (class 'ObjectGraph') is modelled as a fairly standard directed graph (set of vertices, set of edges, two dictionaries mapping each edge to its head and tail), but of course for this application the dict and set have to be based on object identity rather than normal equality. Using a TransformDict (and an IdentitySet) lets me write the standard graph algorithms (e.g., for finding strongly connected components) in a natural way, leaving it to the TransformDict and IdentitySet to do the necessary id() conversions under the hood.)
I also have a similar AnnotatedGraph object (a sort of offline version of the ObjectGraph), where the edges and vertices carry additional information and it's convenient to be able to use a lightweight ID rather than an entire vertex or edge as a dictionary key. Again, using a TransformDict lets one hide the details and present the graph manipulation code readably and naturally.
Some code here, if you're interested:
https://github.com/mdickinson/refcycle/blob/refactor/refcycle/object_graph.py
Caveat: it's work in progress.
msg206195 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013年12月14日 17:26
[Mark Dickinson]
> It's essentially an IdentityDict, though I've found other 
> more specific transforms useful.
Have any of the applications had use for the part of the API that looks up the original, untransformed key?
msg206196 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013年12月14日 18:16
Not my applications, no.
msg234050 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015年01月15日 04:56
3.5 is almost here; Raymond, care to make a ruling?
msg234062 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015年01月15日 07:28
Yes.
I intend to button this one up before long.
msg234077 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015年01月15日 14:29
For the record, this is related to PEP 455 (key-transforming dictionary)
msg234087 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015年01月15日 17:04
The API is simple and well defined, the addition is small, I don't understand what is the problem with this enhancement.
msg236105 - (view) Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2015年02月16日 16:26
Some refactoring that I'm working on for http.client could use this (currently I have it as part of my patch set). I haven't run into any issues using it and it's definitely useful. Would be nice to get this merged.
msg236161 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015年02月18日 02:11
FYI, the PEP for this isn't going to be accepted (I'm working on the write-up for the reasons why and will post on python-dev). That said, it would be great if the code continues to be improved and then posted on the Python Package Index.
msg236163 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015年02月18日 02:28
I will be interested to see those reasons. Another way to do a similar thing might be using a Key(value, transform) class, somewhat along the lines of Issue 20632, but as a separate class rather than part of the core type system. But I have not thought that idea through very much.
msg236178 - (view) Author: Demian Brecht (demian.brecht) * (Python triager) Date: 2015年02月18日 15:16
> I will be interested to see those reasons.
+1. Something like what this PEP proposed would be beneficial in a few places throughout the library (header and cookie implementations would definitely benefit rather than having to deal with buggy normalization themselves). It’s unfortunate that this isn’t going to be approved.
msg236909 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2015年02月28日 20:51
I'm also eager to hear what limitations prevented the acceptance. Please do link back here when you've posted.
I have to say, I'm not entirely surprised. In my implementation, I struggled with some cases, and it certainly doesn't feel like a fully safe implementation.
That said, since I mentioned the implementation in jaraco.util earlier, I wanted to announce that those implementations (FoldedCase and FoldedCaseKeyedDict) have been moved to two libraries (jaraco.text and jaraco.collections).
msg243370 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2015年05月16日 22:07
From https://mail.python.org/pipermail/python-dev/2015-May/140003.html
======================================================================
Before the Python 3.5 feature freeze, I should step-up and
formally reject PEP 455 for "Adding a key-transforming
dictionary to collections".
I had completed an involved review effort a long time ago
and I apologize for the delay in making the pronouncement.
What made it a interesting choice from the outset is that the
idea of a "transformation" is an enticing concept that seems
full of possibility. I spent a good deal of time exploring
what could be done with it but found that it mostly fell short
of its promise.
There were many issues. Here are some that were at the top:
* Most use cases don't need or want the reverse lookup feature
 (what is wanted is a set of one-way canonicalization functions).
 Those that do would want to have a choice of what is saved
 (first stored, last stored, n most recent, a set of all inputs,
 a list of all inputs, nothing, etc). In database terms, it
 models a many-to-one table (the canonicalization or
 transformation function) with the one being a primary key into
 another possibly surjective table of two columns (the
 key/value store). A surjection into another surjection isn't
 inherently reversible in a useful way, nor does it seem to be a
 common way to model data.
* People are creative at coming up with using cases for the TD
 but then find that the resulting code is less clear, slower,
 less intuitive, more memory intensive, and harder to debug than
 just using a plain dict with a function call before the lookup:
 d[func(key)]. It was challenging to find any existing code
 that would be made better by the availability of the TD.
* The TD seems to be all about combining data scrubbing
 (case-folding, unicode canonicalization, type-folding, object
 identity, unit-conversion, or finding a canonical member of an
 equivalence class) with a mapping (looking-up a value for a
 given key). Those two operations are conceptually orthogonal.
 The former doesn't get easier when hidden behind a mapping API
 and the latter loses the flexibility of choosing your preferred
 mapping (an ordereddict, a persistentdict, a chainmap, etc) and
 the flexibility of establishing your own rules for whether and
 how to do a reverse lookup.
Raymond Hettinger
P.S. Besides the core conceptual issues listed above, there
are a number of smaller issues with the TD that surfaced
during design review sessions. In no particular order, here
are a few of the observations:
* It seems to require above average skill to figure-out what
 can be used as a transform function. It is more
 expert-friendly than beginner friendly. It takes a little
 while to get used to it. It wasn't self-evident that
 transformations happen both when a key is stored and again
 when it is looked-up (contrast this with key-functions for
 sorting which are called at most once per key).
* The name, TransformDict, suggests that it might transform the
 value instead of the key or that it might transform the
 dictionary into something else. The name TransformDict is so
 general that it would be hard to discover when faced with a
 specific problem. The name also limits perception of what
 could be done with it (i.e. a function that logs accesses
 but doesn't actually change the key).
* The tool doesn't self describe itself well. Looking at the
 help(), or the __repr__(), or the tooltips did not provide
 much insight or clarity. The dir() shows many of the
 _abc implementation details rather than the API itself.
* The original key is stored and if you change it, the change
 isn't stored. The _original dict is private (perhaps to
 reduce the risk of putting the TD in an inconsistent state)
 but this limits access to the stored data.
* The TD is unsuitable for bijections because the API is
 inherently biased with a rich group of operators and methods
 for forward lookup but has only one method for reverse lookup.
* The reverse feature is hard to find (getitem vs __getitem__)
 and its output pair is surprising and a bit awkward to use.
 It provides only one accessor method rather that the full
 dict API that would be given by a second dictionary. The
 API hides the fact that there are two underlying dictionaries.
* It was surprising that when d[k] failed, it failed with
 transformation exception rather than a KeyError, violating
 the expectations of the calling code (for example, if the
 transformation function is int(), the call d["12"]
 transforms to d[12] and either succeeds in returning a value
 or in raising a KeyError, but the call d["12.0"] fails with
 a TypeError). The latter issue limits its substitutability
 into existing code that expects real mappings and for
 exposing to end-users as if it were a normal dictionary.
* There were other issues with dict invariants as well and
 these affected substitutability in a sometimes subtle way.
 For example, the TD does not work with __missing__().
 Also, "k in td" does not imply that "k in list(td.keys())".
* The API is at odds with wanting to access the transformations.
 You pay a transformation cost both when storing and when
 looking up, but you can't access the transformed value itself.
 For example, if the transformation is a function that scrubs
 hand entered mailing addresses and puts them into a standard
 format with standard abbreviations, you have no way of getting
 back to the cleaned-up address.
* One design reviewer summarized her thoughts like this:
 "There is a learning curve to be climbed to figure out what
 it does, how to use it, and what the applications [are].
 But, the [working out the same] examplea with plain dicts
 requires only basic knowledge." -- Patricia
History
Date User Action Args
2022年04月11日 14:57:50adminsetgithub: 63186
2015年05月16日 22:07:19ethan.furmansetstatus: pending -> closed
resolution: rejected
messages: + msg243370

stage: resolved
2015年05月16日 19:18:57serhiy.storchakasetstatus: open -> pending
2015年03月31日 00:58:26martin.panterlinkissue5550 dependencies
2015年02月28日 20:51:16jaracosetmessages: + msg236909
2015年02月18日 15:16:25demian.brechtsetmessages: + msg236178
2015年02月18日 02:28:31martin.pantersetmessages: + msg236163
2015年02月18日 02:11:47rhettingersetmessages: + msg236161
2015年02月16日 16:26:33demian.brechtsetnosy: + demian.brecht
messages: + msg236105
2015年01月15日 17:04:53vstinnersetmessages: + msg234087
2015年01月15日 14:29:10martin.pantersetnosy: + martin.panter
messages: + msg234077
2015年01月15日 07:28:37rhettingersetmessages: + msg234062
2015年01月15日 04:56:50ethan.furmansetmessages: + msg234050
2013年12月14日 18:16:44mark.dickinsonsetmessages: + msg206196
2013年12月14日 17:26:59rhettingersetmessages: + msg206195
2013年12月13日 13:38:44eli.benderskysetnosy: - eli.bendersky
2013年12月13日 08:12:25mark.dickinsonsetmessages: + msg206027
2013年12月13日 00:22:21rhettingersetmessages: + msg205995
versions: + Python 3.5, - Python 3.4
2013年12月12日 20:34:01mark.dickinsonsetmessages: + msg205979
2013年10月13日 18:08:25mark.dickinsonsetnosy: + mark.dickinson
2013年10月13日 14:28:43pitrousetmessages: + msg199708
2013年10月13日 03:41:40rhettingersetmessages: + msg199652
2013年10月04日 22:50:49rhettingersetassignee: rhettinger
2013年10月03日 19:34:44pitrousetmessages: + msg198912
2013年09月22日 14:32:39jaracosetnosy: + jaraco
messages: + msg198281
2013年09月17日 14:19:52eric.smithsetmessages: + msg197990
2013年09月17日 14:12:43eric.smithsetmessages: + msg197989
2013年09月17日 14:08:57serhiy.storchakasetmessages: + msg197987
2013年09月17日 14:03:07r.david.murraysetmessages: + msg197986
2013年09月17日 13:34:17r.david.murraysetmessages: + msg197984
2013年09月17日 13:08:56serhiy.storchakasetmessages: + msg197983
2013年09月17日 12:06:36pitrousetmessages: + msg197982
2013年09月17日 11:15:29serhiy.storchakasetmessages: + msg197981
2013年09月17日 10:16:31georg.brandlsetmessages: + msg197980
2013年09月17日 09:10:38pitrousetmessages: + msg197975
2013年09月17日 08:57:08georg.brandlsetmessages: + msg197973
2013年09月17日 08:11:36pitrousetmessages: + msg197970
2013年09月17日 08:05:09georg.brandlsetnosy: + georg.brandl
messages: + msg197969
2013年09月14日 21:35:24pitrousetfiles: + transformdict3.patch

messages: + msg197733
2013年09月14日 14:18:54pitrousetmessages: + msg197711
2013年09月14日 14:18:12pitrousetfiles: + transformdict2.patch

messages: + msg197710
2013年09月13日 20:38:20serhiy.storchakasetmessages: + msg197648
2013年09月13日 20:23:45pitrousetmessages: + msg197644
2013年09月13日 20:00:04serhiy.storchakasetmessages: + msg197637
2013年09月13日 19:52:19serhiy.storchakasetfiles: + dict__transform__.patch

messages: + msg197635
2013年09月12日 15:12:57pitrousetmessages: + msg197533
2013年09月12日 15:02:39r.david.murraysetmessages: + msg197531
2013年09月12日 14:53:24ethan.furmansetmessages: + msg197529
2013年09月12日 14:50:56pitrousetmessages: + msg197528
2013年09月12日 14:34:57ethan.furmansetmessages: + msg197527
2013年09月12日 14:16:01r.david.murraysetmessages: + msg197526
2013年09月12日 14:08:37ethan.furmansetmessages: + msg197525
2013年09月12日 10:14:58pitrousetmessages: + msg197516
2013年09月11日 09:15:39serhiy.storchakasetfiles: + ctransformdict.patch
2013年09月11日 09:13:50serhiy.storchakasetfiles: - ctransformdict.patch
2013年09月11日 08:41:30serhiy.storchakasetfiles: + ctransformdict.patch

messages: + msg197479
2013年09月11日 02:41:04rhettingersetmessages: + msg197469
2013年09月10日 21:00:42Arfreversetnosy: + Arfrever
2013年09月10日 19:40:27pitrousetfiles: + transformdict.patch

messages: + msg197464
2013年09月10日 17:51:56eric.araujosetnosy: + eric.araujo
messages: + msg197457
2013年09月10日 14:56:48pitrousetmessages: + msg197446
2013年09月10日 14:53:59sbtsetnosy: + sbt
messages: + msg197445
2013年09月10日 12:38:07pitrousetfiles: + transform.patch
keywords: + patch
messages: + msg197434
2013年09月10日 09:35:11pitrousetmessages: + msg197430
2013年09月09日 22:27:00pitrousetmessages: + msg197412
2013年09月09日 22:25:26pitrousetmessages: + msg197411
2013年09月09日 22:22:29serhiy.storchakasetmessages: + msg197410
2013年09月09日 21:54:18pitrousetmessages: + msg197406
2013年09月09日 21:51:20serhiy.storchakasetmessages: + msg197405
2013年09月09日 21:16:12pitrousetmessages: + msg197403
2013年09月09日 21:12:53serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg197402
2013年09月09日 20:46:24pitrousetmessages: + msg197401
2013年09月09日 20:44:59pitrousetmessages: + msg197400
2013年09月09日 20:43:16ethan.furmansetmessages: + msg197399
2013年09月09日 20:41:19r.david.murraysetmessages: + msg197398
2013年09月09日 20:02:54pitrousetmessages: + msg197393
2013年09月09日 20:00:18eli.benderskysetnosy: + eli.bendersky
messages: + msg197392
2013年09月09日 20:00:06pitrousetmessages: + msg197391
2013年09月09日 19:58:05vstinnersetnosy: + vstinner
messages: + msg197390
2013年09月09日 19:51:06pitrousetmessages: + msg197389
2013年09月09日 19:47:29r.david.murraysetmessages: + msg197387
2013年09月09日 19:26:38mrabarnettsetmessages: + msg197381
2013年09月09日 19:24:48ethan.furmansetmessages: + msg197380
2013年09月09日 19:18:56eric.smithsetmessages: + msg197379
2013年09月09日 19:14:12pitrousetmessages: + msg197377
2013年09月09日 19:13:03eric.smithsetmessages: + msg197376
2013年09月09日 18:36:49pitrousetmessages: + msg197370
2013年09月09日 18:34:26eric.smithsetnosy: + eric.smith
messages: + msg197369
2013年09月09日 18:19:01ethan.furmansetmessages: + msg197368
2013年09月09日 18:11:10pitrousetmessages: + msg197367
2013年09月09日 14:59:57r.david.murraysetmessages: + msg197366
2013年09月09日 13:18:41ethan.furmansetnosy: + ethan.furman
2013年09月09日 13:02:44pitrousetmessages: + msg197362
2013年09月09日 11:54:42pitrousetmessages: + msg197361
2013年09月09日 11:50:59mrabarnettsetnosy: + mrabarnett
messages: + msg197360
2013年09月09日 11:38:13thellersetnosy: + theller
2013年09月09日 11:22:01pitroucreate

AltStyle によって変換されたページ (->オリジナル) /