Issue 15814: memoryview: equality-hash invariant

➜

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/60018

classification

Title:	memoryview: equality-hash invariant
Type:	behavior	Stage:	resolved
Components:	Versions:	Python 3.4

process

Dependencies:	Superseder:
Status:	closed	Resolution:	fixed
Assigned To:	skrah	Nosy List:	Alexander.Belopolsky, Arfrever, BreamoreBoy, Dag.Sverre.Seljebotn, belopolsky, christian.heimes, georg.brandl, loewis, mark.dickinson, meador.inge, ncoghlan, pitrou, python-dev, skrah, vstinner
Priority:	normal	Keywords:	patch

Created on 2012年08月29日 17:07 by skrah, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue15814-doc.diff	skrah, 2012年08月30日 11:36	review
issue15814-2.diff	skrah, 2012年09月01日 11:22	review

Messages (51)
msg169398 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年08月29日 17:07
The new PEP-3118 equality definition from #15573 that is based on element-wise comparisons breaks the equality-hash invariant: >>> from _testbuffer import ndarray >>> x = ndarray([1,2,3], shape=[3], format='f') >>> y = ndarray([1,2,3], shape=[3], format='B') >>> a = memoryview(x) >>> b = memoryview(y) >>> a == b True >>> hash(a) == hash(b) False >>>
msg169403 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年08月29日 17:54
I think the proper solution is to make memoryview objects unhashable. Any other approach will have flaws of some kind.
msg169404 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012年08月29日 18:01
> I think the proper solution is to make memoryview objects unhashable. Disagreed. If memoryviews are to be bytes-like objects they should be hashable (at least when readonly). > Any other approach will have flaws of some kind. Not more so than equality between memoryviews.
msg169408 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年08月29日 18:59
Am 29.08.12 20:01, schrieb Antoine Pitrou: >> I think the proper solution is to make memoryview objects unhashable. > > Disagreed. If memoryviews are to be bytes-like objects they should be > hashable (at least when readonly). So what specific hash algorithm do you propose? >> Any other approach will have flaws of some kind. > > Not more so than equality between memoryviews. Well, memoryviews now have a definition of equality as discussed in issue15573. You may disagree whether it's a useful definition, but I don't believe it has actual flaws (in the sense that it is incorrect or inconsistent). My claim is that any hash definition for memoryviews will have a fundamental flaw, failing to provide the basic property that A==B must imply hash(A)==hash(B), making it actually work incorrectly (when used for its primary application, namely keys in dictionaries).
msg169409 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012年08月29日 19:06
> Am 29.08.12 20:01, schrieb Antoine Pitrou: > >> I think the proper solution is to make memoryview objects unhashable. > > > > Disagreed. If memoryviews are to be bytes-like objects they should be > > hashable (at least when readonly). > > So what specific hash algorithm do you propose? The current algorithm works well in conjunction with bytes objects. > My claim is that any hash definition for memoryviews will have a > fundamental flaw, failing to provide the basic property > that A==B must imply hash(A)==hash(B), making it actually work > incorrectly Why is there such a fundamental flaw?
msg169411 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年08月29日 19:33
Am 29.08.12 21:06, schrieb Antoine Pitrou: >> So what specific hash algorithm do you propose? > > The current algorithm works well in conjunction with bytes objects. That's about the only type if works for. >> My claim is that any hash definition for memoryviews will have a >> fundamental flaw, failing to provide the basic property >> that A==B must imply hash(A)==hash(B), making it actually work >> incorrectly > > Why is there such a fundamental flaw? The current algorithm is flawed as described in Stefan's original message: two objects compare equal, yet hash different. That means that if you use them as dictionary keys, you may end up with two different values for the "same" key, depending on the size of the dictionary (as the modulo operation in the dictionary still may map the different hashes to the same dictionary slot). In general, since memoryview(obj)==obj, it would be necessary that hash(memoryview(obj))==hash(obj). However, since memoryview cannot know what hashing algorithm obj uses, it cannot compute the hash value with the same algorithm. IOW, hashing is mutually exclusive with comparison with the underlying object, unless you know what the hash algorithm of the underlying object is. So restricting tp_hash to memoryview objects where the underlying object is the bytes type would work.
msg169415 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年08月29日 20:04
Martin v. L??wis <report@bugs.python.org> wrote: > In general, since memoryview(obj)==obj, it would be necessary that > hash(memoryview(obj))==hash(obj). However, since memoryview cannot > know what hashing algorithm obj uses, it cannot compute the hash > value with the same algorithm. In the memoryview-hash thread on python-dev [1] this objection was addressed by demanding from exporters that they all use: hash(x) == hash(x.tobytes()) Since the previous equality concept was also based on x.tobytes() == y.tobytes(), this wasn't a problem. The new equality definition and any possible new hash definition should probably also be part of the buffer API documentation, since they aren't memoryview specific. [1] http://mail.python.org/pipermail/python-dev/2011-November/114459.html
msg169416 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年08月29日 20:27
And since memory_richcompare() and any potentially compatible hash function are quite tricky to implement by now, perhaps the generally useful parts could be exposed as PyBuffer_RichCompareBool() and PyBuffer_Hash().
msg169417 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年08月29日 20:50
Am 29.08.12 22:04, schrieb Stefan Krah: > In the memoryview-hash thread on python-dev [1] this objection was > addressed by demanding from exporters that they all use: > > hash(x) == hash(x.tobytes()) > > Since the previous equality concept was also based on > x.tobytes() == y.tobytes(), this wasn't a problem. In the light of this requirement, it's even more difficult to ask people that they change their hashing, since some exporters may already comply with that original request. > The new equality definition and any possible new hash definition should > probably also be part of the buffer API documentation, since they > aren't memoryview specific. That's not true: they are memoryview-specific. The notion of equality is entirely one of memoryview objects, not of buffers. I still maintain that specifying hashing for memoryviews under the new equality definition is just not feasible, and that we should give up on it (except perhaps supporting the hashing of bytes views). I also question whether it is useful to hash arbitrarily-shaped read-only buffers (along with questioning whether people will actually have arbitrarily-shaped read-only buffers). Stefan: please do propose a semantics also along with proposing interfaces.
msg169425 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2012年08月30日 00:26
My perspective is that hashing a memoryview only makes sense when the memoryview is read-only and "m == m.tobytes()" (i.e. it's a C contiguous 1D view of bytes, either because that's what the original object exported as a buffer or because the view has been cast that way) So, I think we should document that restriction in What's New and the memoryview docs for 3.3.0, and actually enforce it in 3.3.1 by raising ValueError from hash(m) when the assumption of a 1D C contiguous bytes view is violated (the "read-only" restriction is already enforced). In previous versions, this constraint didn't need to be explicitly enforced, since memoryview basically treated everything as a 1D C contiguous view of bytes.
msg169440 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年08月30日 09:24
Martin v. L??wis <report@bugs.python.org> wrote: > > hash(x) == hash(x.tobytes()) > In the light of this requirement, it's even more difficult to ask > people that they change their hashing, since some exporters may already > comply with that original request. I don't think so. memoryview.__hash__() is new in 3.3 and the requirement is not documented at all in the general PEP-3118 sections. [Adding Stefan Behnel to nosy, since Cython is pretty quick to pick up new features.] > > The new equality definition and any possible new hash definition should > > probably also be part of the buffer API documentation, since they > > aren't memoryview specific. > > That's not true: they are memoryview-specific. The notion of equality > is entirely one of memoryview objects, not of buffers. Could you name a part of the equality definition that is memoryview-specific? > I still maintain that specifying hashing for memoryviews under the > new equality definition is just not feasible, and that we should give > up on it (except perhaps supporting the hashing of bytes views). > I also question whether it is useful to hash arbitrarily-shaped > read-only buffers (along with questioning whether people will actually > have arbitrarily-shaped read-only buffers). Useful, perhaps. I don't know if it is worth the effort though. We could restrict hashing to contiguous bytes views in 3.3.1.
msg169450 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年08月30日 11:35
Here's a doc patch restricting the hash to formats 'B', 'b' and 'c'. I think non-contiguous views are fine: both __eq__() and tobytes() handle these, so the equality-hash invariant is preserved.
msg169451 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年08月30日 13:01
Am 30.08.12 11:24, schrieb Stefan Krah: >>> The new equality definition and any possible new hash definition should >>> probably also be part of the buffer API documentation, since they >>> aren't memoryview specific. >> >> That's not true: they are memoryview-specific. The notion of equality >> is entirely one of memoryview objects, not of buffers. > > Could you name a part of the equality definition that is memoryview-specific? As literally spelled out, it starts with "a memoryview and a PEP 3118 exporter". But I was actually thinking of the implementation... In any case, I'm skeptical that this is a useful extension to the buffer interface. You would imply "two exporters are equal iff their exported buffers are equal". I think this is asking too much.
msg169497 - (view)	Author: Dag Sverre Seljebotn (Dag.Sverre.Seljebotn)	Date: 2012年08月31日 03:31
It is perfectly possible for an object to export memory in a read-only way that may still change. Another object may have a writeable view: x = obj.readonly_view y = obj.writable_view obj.move_to_next_image() # changes memory under x, y So, hashing using tobytes() doesn't make any sense at all to me. A memoryview != its contents. You could compare memoryviews simply by comparing the Py_buffer structure, but that's going to be confusing for a lot of users. I would really prefer unhashable (+ if needed a method for comparing contents). (FWIW, not sure how relevant this is; in NumPy, == does In [1]: np.array([1,2,3]) == np.array([1,3,3]) Out[1]: array([ True, False, True], dtype=bool) Cython will follow this behaviour for its "typed memory views", which is Cython's builting mechanism for PEP 3118 which is not quite the same as CPython "memoryview". But I understand that following this behaviour is probably out of the question for CPython.)
msg169498 - (view)	Author: Dag Sverre Seljebotn (Dag.Sverre.Seljebotn)	Date: 2012年08月31日 03:47
OK, I can understand the desire to make memoryviews be bytes-like objects (though to my mind, bytes is "frozen" in a very different way...) If so, and it is deemed worth it, my suggestion is to add a new PyBUF_CONST flag to the buffer acquisition in that case (which can not be used together with PyBUF_WRITABLE). Given that flag, the exporter guarantees that the contents does not change (or fails to give away a buffer). Perhaps it could be possible for hash() to try to re-acquire a const buffer, so that some buffers are hashable (by contents) and others not. (I really think relying on buffers that are not writeable to be constant is dangerous. A readonly memoryview could for instance point straight to the live output of a webcam.)
msg169499 - (view)	Author: Stefan Behnel (scoder) * (Python committer)	Date: 2012年08月31日 04:14
To add on Dag's comments, this essentially means that any caching of the hash value is dangerous, unless it can be assured that the underlying buffer definitely has not changed in the meantime. There is no way for users to explicitly tell a memoryview to rehash itself, so putting it into one dict, then modifying the content, then putting it into another is a perfectly reasonable use case for users, but fails when the memory view caches its hash value. As far as I can see it, any "obvious" fix for this creates a whole bath tub full of worms: updating views transitively, adding new APIs, supporting new buffer parameters, ...
msg169508 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年08月31日 08:26
Dag Sverre Seljebotn <report@bugs.python.org> wrote: > It is perfectly possible for an object to export memory in a read-only > way that may still change. Another object may have a writeable view: > > x = obj.readonly_view > y = obj.writable_view > obj.move_to_next_image() # changes memory under x, y PEP-3118 always means different things to different people. :) Your distinction between constant, readonly, writable is one way to interpret things. But looking at the use in the Python source tree, readonly does mean immutable. Unfortunately the PEP doesn't specify readonly: Is it the same as "mount a partition readonly", "readonly CD" or O_RDONLY? Also, existing practice was that in response to a PyBUF_FULL_RO request the exporter was free to hand out a writable buffer! This is now documented and subverts reliable tagging for consumers: http://docs.python.org/dev/c-api/buffer.html#readonly-format If any of this is a major inconvenience to NumPy or other users of the buffer API, perhaps you could open a separate issue. > (FWIW, not sure how relevant this is; in NumPy, == does > > In [1]: np.array([1,2,3]) == np.array([1,3,3]) > Out[1]: array([ True, False, True], dtype=bool) Since afa3dedfee18 we compare memoryviews and any exporter in the same manner as NumPy, except that the return value is a single boolean. So I think the new comparison is equivalent to all(a == b) in NumPy.
msg169510 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年08月31日 08:42
Dag Sverre Seljebotn <report@bugs.python.org> wrote: > OK, I can understand the desire to make memoryviews be bytes-like objects > (though to my mind, bytes is "frozen" in a very different way...) We have two desires that sometimes conflict: People who want to use memoryview as a buffer legitimately want it to behave like bytes. People who want to have an "arrayview" want it to behave like NumPy or PIL arrays. The new implementation preserves fast paths for bytes and otherwise behaves pretty much like a very restricted version of NumPy's ndarray (with support for suboffsets), except that not all planned multi-dimensional features have been implemented yet. > If so, and it is deemed worth it, my suggestion is to add a new > PyBUF_CONST flag to the buffer acquisition in that case Note that memory_hash() raises an error if the exporter itself is not hashable, so it only hashes immutable objects by design. But the PyBUF_CONST flag might be worth considering (in another issue).
msg169515 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年08月31日 10:01
Am 31.08.12 06:14, schrieb Stefan Behnel: > To add on Dag's comments, this essentially means that any caching of > the hash value is dangerous Stefan Krah is right here: the proper test (which is already implemented) is whether the underlying object is hashable; this is a sufficient test. Caching of the hash is always correct, or else the hash function is incorrect: a hash of an object must not change. If it would change, you couldn't find the key anymore in a dictionary. > unless it can be assured that the > underlying buffer definitely has not changed in the meantime. There > is no way for users to explicitly tell a memoryview to rehash itself, > so putting it into one dict, then modifying the content, then putting > it into another is a perfectly reasonable use case for users, No, it's not: d = {} o = some_object m = memoryview(o) d[m] = -1 o.modify() # so that hash(m) changes print(d[m]) # keyerror, even though m in d.keys() > As far as I can see it, any "obvious" fix for this creates a whole > bath tub full of worms: updating views transitively, adding new APIs, > supporting new buffer parameters, ... So above (in this thread): the easiest fix is to just drop hashing support for memoryview objects. This can be extended in steps: - support hashing if the underlying object is bytes - support hashing if the shape is 1D "B", and the underlying object is hashable - support hashing if the exporter has somehow promised to not modify the memory
msg169622 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年09月01日 11:22
Here's a patch that enforces byte formats. Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'? I think we should at least commit the doc patch for 3.3.0. Otherwise implementors of exporting objects might waste time on a feature that's deprecated.
msg169626 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2012年09月01日 12:13
+1 for docs patch for 3.3.0 and then enforcing the format restriction for 3.3.1.
msg169627 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012年09月01日 12:40
New changeset 895e123d9476 by Stefan Krah in branch 'default': Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1. http://hg.python.org/cpython/rev/895e123d9476
msg169633 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年09月01日 13:52
Am 01.09.12 13:22, schrieb Stefan Krah: > Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'? Why be more permissive than necessary? -0 on the committed version; it should IMO further restrict it to 1D contiguous byte arrays.
msg169636 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年09月01日 14:24
Martin v. L??wis <report@bugs.python.org> wrote: > Why be more permissive than necessary? -0 on the committed version; > it should IMO further restrict it to 1D contiguous byte arrays. Does "byte arrays" include 'b' and 'c' or just 'B'? I don't see a reason to allow 'B' but not the others. I'm +-0 on allowing multi-dimensional arrays, but it would be odd to restrict hashing to contiguous arrays: >>> b = b'abcdefhhijkl' >>> m = memoryview(b) >>> b[::-1] b'lkjihhfedcba' >>> bytes(m[::-1]) b'lkjihhfedcba' >>> hash(b[::-1]) == hash(m[::-1]) True My reasoning was: If non-contiguous arrays are allowed (and I think they should be), why not allow multi-dimensional arrays, too? The definition hash(m) == hash(m.tobytes()) is pretty straightforward.
msg169638 - (view)	Author: Alexander Belopolsky (Alexander.Belopolsky)	Date: 2012年09月01日 14:42
On Sep 1, 2012, at 10:24 AM, Stefan Krah <report@bugs.python.org> wrote: > > The definition hash(m) == hash(m.tobytes()) is pretty straightforward. I probably missed something from the early discussion, but doesn't this definition only work for 1d (or 0d) views? Shouldn't shape come into calculation for multidimensional views?
msg169641 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年09月01日 15:06
tobytes() is the same as the flattened multi-dimensional list representation with all elements converted to bytes. If I'm not mistaken, that's how NumPy's tostring() behaves.
msg169642 - (view)	Author: Alexander Belopolsky (Alexander.Belopolsky)	Date: 2012年09月01日 15:14
On Sep 1, 2012, at 11:06 AM, Stefan Krah <report@bugs.python.org> wrote: > tobytes() is the same as the flattened multi-dimensional list representation > with all elements converted to bytes. This is correct, but why is it desirable to have deliberate hash collisions between views with different shapes?
msg169643 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2012年09月01日 15:16
Keep in mind that its OK if hash(m) == hash(m.tobytes()) in some cases where "m != m.tobytes()". The only cases we really need to kill are those that break the hash invariant. I don't like the idea of making the definition of hash more complicated just to rule out cases that aren't actually broken.
msg169644 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年09月01日 15:29
> why is it desirable to have deliberate hash collisions between views with different shapes? Since we're now restricting everything to bytes, the multi-dimensional case is probably not useful at all. As I said above, I would leave it in because it actually saves one check for ndim and as Nick said, the definition is simpler. Now the user only has to remember that hashing works for 'B', 'b' and 'c'.
msg169645 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年09月01日 16:06
Am 01.09.12 16:24, schrieb Stefan Krah: > Does "byte arrays" include 'b' and 'c' or just 'B'? I don't see a reason > to allow 'B' but not the others. Either type is fine with me. It's the multi-dimensional aspect I'd like to ban. > My reasoning was: If non-contiguous arrays are allowed (and I think they > should be), why not allow multi-dimensional arrays, too? I think neither should be allowed. The explicit request (from Antoine) was that memoryviews of bytes objects should be hashable. I haven't heard anybody asking for something more general (but I could see that people want to hash other memory blocks as well, such as mmap views). I can see that it is tempting to provide the most general definition, but see where this approach got us. It's always easier to widen an interface later than to narrow it.
msg169648 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年09月01日 17:20
Disallowing non-contiguous arrays leads to very strange situations though. I'm positive that there will be a bug report about this: >>> x = memoryview(b'abc')[::-1] >>> b = b'cba' >>> d = {b'cba': 101} >>> >>> b in d True >>> x == b True >>> x in d Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: memoryiew: hashing is restricted to C-contiguous arrays Could we perhaps take a small poll? My own vote is: 1) Allow bytes hashing at all: +0.5 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1 3) Allow multi-dimensional hashing: +-0
msg169654 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年09月01日 17:44
Am 01.09.12 19:20, schrieb Stefan Krah: > Disallowing non-contiguous arrays leads to very strange situations though. I don't find that strange. That two object compare equal doesn't imply that they both hash - only that if they hash, they should hash equal. In any case, this can happen already: py> x = memoryview(array.array('B',b'cba')) py> b = b'cba' py> d = {b'cba': 101} py> b in d True py> x == b True py> x in d Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: cannot hash writable memoryview object It can also happen with other types (although I had to look around a bit): py> x=set((1,2,3)) py> b=frozenset(x) py> d={b:101} py> b in d True py> x==b True py> x in d Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'set' > 1) Allow bytes hashing at all: +0.5 +0 > 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1 -1 > 3) Allow multi-dimensional hashing: +-0 -1
msg169655 - (view)	Author: Antoine Pitrou (pitrou) * (Python committer)	Date: 2012年09月01日 18:04
> Could we perhaps take a small poll? My own vote is: > > 1) Allow bytes hashing at all: +0.5 +10. The buffer() object in 2.x was hashable, and a very important use case of memoryview is replacing buffer(). > 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1 > 3) Allow multi-dimensional hashing: +-0 I don't care about these. I kind of agree with Martin that hashability should be motivated by an use case, so -0.
msg169657 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年09月01日 18:06
> py> x = memoryview(array.array('B',b'cba')) I find the array example is different. The user has to remember one thing: memoryviews based on arrays don't hash. For memoryviews based on bytes one would have to remember: - 'B', 'c' and 'b' hash - only C-contiguous arrays hash - b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not
msg169659 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年09月01日 18:22
Am 01.09.12 20:06, schrieb Stefan Krah: > - b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not I find that memoryview(b'abc')[::-1] is a strange thing to have, anyway, so I'm not bothered by it behaving different. I can accept that it needs to be supported for consistency, but I would recommend against using it in applications. Who needs reversed memory, anyway? The only case were I ever needed reversal is lists (typically sorted, perhaps cronological).
msg169682 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2012年09月02日 01:47
+1 for allowing bytes hashing. As Antoine noted, the 1D bytes variant of memoryview() fills the role previously handled by buffer(). +1 for allowing 1D non-contiguous hashing. This is from a simplicity perspective, as I don't want to have to explain to people why "hash(memoryview(b'12'[::2]))", "hash(memoryview(b'123'[::2]))" and "hash(memoryview(b'12')[::2])" all work, but "hash(memoryview(b'123')[::2])" fails. -1 for allowing hashing when ndim > 1. We don't have a solid use case, and there's no odd value based corner cases to be considered.
msg169693 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年09月02日 12:44
The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and -2 for multi-dimensional. I'll update the docs soon.
msg169694 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012年09月02日 12:57
New changeset 3b2597c1fe35 by Stefan Krah in branch 'default': Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews. http://hg.python.org/cpython/rev/3b2597c1fe35
msg169700 - (view)	Author: Alexander Belopolsky (Alexander.Belopolsky)	Date: 2012年09月02日 14:21
On Sep 2, 2012, at 8:44 AM, Stefan Krah <report@bugs.python.org> wrote: > The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and > -2 for multi-dimensional I have refrained from voting because in my line of work buffers or memoryviews deal with large objects that rarely serve as dictionary keys. As a result, I have zero experince with hashing of buffers. This observation supports the current consensus to limit hashing to 1d and 0d cases. My only concern is that with hash(m) == hash(m.tobytes()) implementing multidimensional restriction will require artificial if ndim > 1 check and an extra sentence in the docs while not simplifying anything.
msg169754 - (view)	Author: Alyssa Coghlan (ncoghlan) * (Python committer)	Date: 2012年09月03日 11:42
The main issue is that it's not quite clear how to deal with problems like C-style vs FORTRAN-style memory layouts and strides vs suboffsets in defining multidimensional hash equality. Without a use case, it's easier to just punt on the question and declare it illegal. The 1D hashing case really just comes from wanting to have 1D bytes views behave as much like a bytes object as is practical.
msg169755 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012年09月03日 11:46
New changeset c9c9d890400c by Nick Coghlan in branch 'default': Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions http://hg.python.org/cpython/rev/c9c9d890400c
msg169757 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年09月03日 12:38
Small nitpick: multi-dimensional hashing wasn't really accidental, it was perfectly aligned with the previous statically typed equality definition. When I suggested PyBuffer_Hash = hash(obj.tobytes()) on python-dev for non-contiguous and multi-dimensional arrays, I got a +1 from Raymond (which I now see was a private message). I don't see what could possibly be ill-defined about using the tobytes() definition for ND-arrays. In all places memoryview now uses the logical array, which is displayed by tolist(). Leaving aside static typing, both the previous and the new equality definitions regarded C and Fortran arrays with the same list outputs as equal.
msg169760 - (view)	Author: Alexander Belopolsky (Alexander.Belopolsky)	Date: 2012年09月03日 13:01
On Mon, Sep 3, 2012 at 8:38 AM, Stefan Krah <report@bugs.python.org> wrote: > I don't see what could possibly be ill-defined about using the > tobytes() definition for ND-arrays. In all places memoryview now > uses the logical array, which is displayed by tolist(). +1. The key restriction is that to byte views. ND aspect does not complicate the logic once .tobytes() is properly implemented.
msg169800 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年09月03日 19:59
Am 02.09.2012 16:21, schrieb Alexander Belopolsky: > I have refrained from voting because in my line of work buffers or > memoryviews deal with large objects that rarely serve as dictionary > keys. As a result, I have zero experince with hashing of buffers. > This observation supports the current consensus to limit hashing to > 1d and 0d cases. My only concern is that with hash(m) == > hash(m.tobytes()) implementing multidimensional restriction will > require artificial if ndim > 1 check and an extra sentence in the > docs while not simplifying anything. As for the "not simplifying argument": if hashing was restricted to contiguous bytes, then the implementation would certainly be simplified quite a bit: currently, if it's not contiguous, it needs to make a separate copy and hash that. This code could go away if hashing would only work for true memory "blocks".
msg169801 - (view)	Author: Alexander Belopolsky (Alexander.Belopolsky)	Date: 2012年09月03日 20:21
On Mon, Sep 3, 2012 at 3:59 PM, Martin v. Löwis <report@bugs.python.org> wrote: > if hashing was restricted > to contiguous bytes, then the implementation would certainly be > simplified quite a bit: currently, if it's not contiguous, it needs > to make a separate copy and hash that. I would be happy if this whole business of "strided" memory layouts would go away. I find it to be a failed attempt of over-generalization. It is failed because it does not cover many important formats even in 2-d: sparse matrices, symmetrical matrices, banded matrices, etc., etc. (Check level 3 BLAS or NAG for a longer list.) It is an over-generalizarion because in practice it is often more efficient to gather your data in contiguous memory, perform vector calculations and then scatter the results than to operate on scattered data to avoid copying. I would much rather see a generalization of suboffsets that would allow me to have views into arrays of pointers to variable width arrays than strides.
msg170051 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012年09月08日 13:41
New changeset ca81b9a3a015 by Stefan Krah in branch 'default': Issue #15814: Update whatsnew to the current state of hashing memoryviews. http://hg.python.org/cpython/rev/ca81b9a3a015
msg170052 - (view)	Author: Stefan Krah (skrah) * (Python committer)	Date: 2012年09月08日 13:51
Georg, thanks for including all changes that I've asked for! If you still have the patience for the constant stream of memoryview doc changes, there are three new ones that might be worth including: 3b2597c1fe35 c9c9d890400c ca81b9a3a015
msg170092 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012年09月09日 09:18
New changeset 7734eb2707a1 by Stefan Krah in branch 'default': Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1. http://hg.python.org/cpython/rev/7734eb2707a1 New changeset 71f4d80400f2 by Stefan Krah in branch 'default': Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews. http://hg.python.org/cpython/rev/71f4d80400f2 New changeset 3ffd6ad93fe4 by Nick Coghlan in branch 'default': Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions http://hg.python.org/cpython/rev/3ffd6ad93fe4 New changeset 88a0792e8ba3 by Stefan Krah in branch 'default': Issue #15814: Update whatsnew to the current state of hashing memoryviews. http://hg.python.org/cpython/rev/88a0792e8ba3
msg173201 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2012年10月17日 20:16
The 3.3.0 docs now state "Note Hashing of memoryviews with formats other than ‘B’, ‘b’ or ‘c’ as well as hashing of multi-dimensional memoryviews is possible in version 3.3.0, but will raise an error in 3.3.1 in order to be compatible with the new memoryview equality definition." I assume 3.3.1 should read 3.4, or have I missed something?
msg173203 - (view)	Author: Martin v. Löwis (loewis) * (Python committer)	Date: 2012年10月17日 20:24
3.3.1 is correct, you apparently missed msg169425.
msg174542 - (view)	Author: Roundup Robot (python-dev) (Python triager)	Date: 2012年11月02日 17:07
New changeset 969069d464bc by Stefan Krah in branch '3.3': Issue #15814: Use hash function that is compatible with the equality http://hg.python.org/cpython/rev/969069d464bc

History
Date	User	Action	Args
2022年04月11日 14:57:35	admin	set	github: 60018
2012年11月02日 17:25:55	skrah	set	status: open -> closed resolution: fixed stage: needs patch -> resolved
2012年11月02日 17:07:02	python-dev	set	messages: + msg174542
2012年10月17日 20:24:04	loewis	set	messages: + msg173203
2012年10月17日 20:16:21	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg173201
2012年09月09日 09:18:58	python-dev	set	messages: + msg170092
2012年09月08日 13:51:43	skrah	set	messages: + msg170052
2012年09月08日 13:41:24	python-dev	set	messages: + msg170051
2012年09月03日 20:21:49	Alexander.Belopolsky	set	messages: + msg169801
2012年09月03日 19:59:02	loewis	set	messages: + msg169800
2012年09月03日 13:04:14	scoder	set	nosy: - scoder
2012年09月03日 13:01:20	Alexander.Belopolsky	set	messages: + msg169760
2012年09月03日 12:38:18	skrah	set	messages: + msg169757
2012年09月03日 11:46:44	python-dev	set	messages: + msg169755
2012年09月03日 11:42:51	ncoghlan	set	messages: + msg169754
2012年09月02日 14:21:49	Alexander.Belopolsky	set	messages: + msg169700
2012年09月02日 12:57:19	python-dev	set	messages: + msg169694
2012年09月02日 12:44:30	skrah	set	messages: + msg169693
2012年09月02日 01:47:15	ncoghlan	set	messages: + msg169682
2012年09月01日 18:22:28	loewis	set	messages: + msg169659
2012年09月01日 18:06:32	skrah	set	messages: + msg169657
2012年09月01日 18:04:54	pitrou	set	messages: + msg169655
2012年09月01日 17:44:06	loewis	set	messages: + msg169654
2012年09月01日 17:20:22	skrah	set	messages: + msg169648
2012年09月01日 16:06:18	loewis	set	messages: + msg169645
2012年09月01日 15:29:49	skrah	set	messages: + msg169644
2012年09月01日 15:16:24	ncoghlan	set	messages: + msg169643
2012年09月01日 15:14:48	Alexander.Belopolsky	set	messages: + msg169642
2012年09月01日 15:06:29	skrah	set	messages: + msg169641
2012年09月01日 14:42:11	Alexander.Belopolsky	set	nosy: + Alexander.Belopolsky messages: + msg169638
2012年09月01日 14:24:21	skrah	set	messages: + msg169636
2012年09月01日 13:52:15	loewis	set	messages: + msg169633
2012年09月01日 12:40:19	python-dev	set	nosy: + python-dev messages: + msg169627
2012年09月01日 12:13:24	ncoghlan	set	messages: + msg169626
2012年09月01日 11:22:16	skrah	set	files: + issue15814-2.diff messages: + msg169622
2012年08月31日 10:01:43	loewis	set	messages: + msg169515
2012年08月31日 08:42:29	skrah	set	messages: + msg169510
2012年08月31日 08:26:49	skrah	set	messages: + msg169508
2012年08月31日 04:14:01	scoder	set	messages: + msg169499
2012年08月31日 03:47:32	Dag.Sverre.Seljebotn	set	messages: + msg169498
2012年08月31日 03:31:31	Dag.Sverre.Seljebotn	set	nosy: + Dag.Sverre.Seljebotn messages: + msg169497
2012年08月30日 13:01:51	loewis	set	messages: + msg169451
2012年08月30日 11:36:02	skrah	set	files: + issue15814-doc.diff keywords: + patch
2012年08月30日 11:35:31	skrah	set	messages: + msg169450
2012年08月30日 09:26:10	skrah	set	nosy: + scoder
2012年08月30日 09:24:48	skrah	set	messages: + msg169440
2012年08月30日 00:26:02	ncoghlan	set	messages: + msg169425
2012年08月29日 23:36:08	belopolsky	set	nosy: + belopolsky
2012年08月29日 20:50:40	loewis	set	messages: + msg169417
2012年08月29日 20:27:31	skrah	set	messages: + msg169416
2012年08月29日 20:04:09	skrah	set	messages: + msg169415
2012年08月29日 19:33:47	loewis	set	messages: + msg169411
2012年08月29日 19:06:28	pitrou	set	messages: + msg169409
2012年08月29日 18:59:10	loewis	set	messages: + msg169408
2012年08月29日 18:01:27	pitrou	set	messages: + msg169404
2012年08月29日 17:54:45	loewis	set	messages: + msg169403
2012年08月29日 17:07:08	skrah	create

homepage