This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012年08月29日 17:07 by skrah, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue15814-doc.diff | skrah, 2012年08月30日 11:36 | review | ||
| issue15814-2.diff | skrah, 2012年09月01日 11:22 | review | ||
| Messages (51) | |||
|---|---|---|---|
| msg169398 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年08月29日 17:07 | |
The new PEP-3118 equality definition from #15573 that is based on element-wise comparisons breaks the equality-hash invariant: >>> from _testbuffer import ndarray >>> x = ndarray([1,2,3], shape=[3], format='f') >>> y = ndarray([1,2,3], shape=[3], format='B') >>> a = memoryview(x) >>> b = memoryview(y) >>> a == b True >>> hash(a) == hash(b) False >>> |
|||
| msg169403 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年08月29日 17:54 | |
I think the proper solution is to make memoryview objects unhashable. Any other approach will have flaws of some kind. |
|||
| msg169404 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年08月29日 18:01 | |
> I think the proper solution is to make memoryview objects unhashable. Disagreed. If memoryviews are to be bytes-like objects they should be hashable (at least when readonly). > Any other approach will have flaws of some kind. Not more so than equality between memoryviews. |
|||
| msg169408 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年08月29日 18:59 | |
Am 29.08.12 20:01, schrieb Antoine Pitrou: >> I think the proper solution is to make memoryview objects unhashable. > > Disagreed. If memoryviews are to be bytes-like objects they should be > hashable (at least when readonly). So what specific hash algorithm do you propose? >> Any other approach will have flaws of some kind. > > Not more so than equality between memoryviews. Well, memoryviews now have a definition of equality as discussed in issue15573. You may disagree whether it's a useful definition, but I don't believe it has actual flaws (in the sense that it is incorrect or inconsistent). My claim is that any hash definition for memoryviews will have a *fundamental* flaw, failing to provide the basic property that A==B must imply hash(A)==hash(B), making it actually work incorrectly (when used for its primary application, namely keys in dictionaries). |
|||
| msg169409 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年08月29日 19:06 | |
> Am 29.08.12 20:01, schrieb Antoine Pitrou: > >> I think the proper solution is to make memoryview objects unhashable. > > > > Disagreed. If memoryviews are to be bytes-like objects they should be > > hashable (at least when readonly). > > So what specific hash algorithm do you propose? The current algorithm works well in conjunction with bytes objects. > My claim is that any hash definition for memoryviews will have a > *fundamental* flaw, failing to provide the basic property > that A==B must imply hash(A)==hash(B), making it actually work > incorrectly Why is there such a fundamental flaw? |
|||
| msg169411 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年08月29日 19:33 | |
Am 29.08.12 21:06, schrieb Antoine Pitrou: >> So what specific hash algorithm do you propose? > > The current algorithm works well in conjunction with bytes objects. That's about the only type if works for. >> My claim is that any hash definition for memoryviews will have a >> *fundamental* flaw, failing to provide the basic property >> that A==B must imply hash(A)==hash(B), making it actually work >> incorrectly > > Why is there such a fundamental flaw? The current algorithm is flawed as described in Stefan's original message: two objects compare equal, yet hash different. That means that if you use them as dictionary keys, you may end up with two different values for the "same" key, depending on the size of the dictionary (as the modulo operation in the dictionary still may map the different hashes to the same dictionary slot). In general, since memoryview(obj)==obj, it would be necessary that hash(memoryview(obj))==hash(obj). However, since memoryview cannot know what hashing algorithm obj uses, it cannot compute the hash value with the same algorithm. IOW, hashing is mutually exclusive with comparison with the underlying object, unless you know what the hash algorithm of the underlying object is. So restricting tp_hash to memoryview objects where the underlying object is the bytes type would work. |
|||
| msg169415 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年08月29日 20:04 | |
Martin v. L??wis <report@bugs.python.org> wrote: > In general, since memoryview(obj)==obj, it would be necessary that > hash(memoryview(obj))==hash(obj). However, since memoryview cannot > know what hashing algorithm obj uses, it cannot compute the hash > value with the same algorithm. In the memoryview-hash thread on python-dev [1] this objection was addressed by demanding from exporters that they all use: hash(x) == hash(x.tobytes()) Since the previous equality concept was also based on x.tobytes() == y.tobytes(), this wasn't a problem. The new equality definition and any possible new hash definition should probably also be part of the buffer API documentation, since they aren't memoryview specific. [1] http://mail.python.org/pipermail/python-dev/2011-November/114459.html |
|||
| msg169416 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年08月29日 20:27 | |
And since memory_richcompare() and any potentially compatible hash function are quite tricky to implement by now, perhaps the generally useful parts could be exposed as PyBuffer_RichCompareBool() and PyBuffer_Hash(). |
|||
| msg169417 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年08月29日 20:50 | |
Am 29.08.12 22:04, schrieb Stefan Krah: > In the memoryview-hash thread on python-dev [1] this objection was > addressed by demanding from exporters that they all use: > > hash(x) == hash(x.tobytes()) > > Since the previous equality concept was also based on > x.tobytes() == y.tobytes(), this wasn't a problem. In the light of this requirement, it's even more difficult to ask people that they change their hashing, since some exporters may already comply with that original request. > The new equality definition and any possible new hash definition should > probably also be part of the buffer API documentation, since they > aren't memoryview specific. That's not true: they *are* memoryview-specific. The notion of equality is entirely one of memoryview objects, not of buffers. I still maintain that specifying hashing for memoryviews under the new equality definition is just not feasible, and that we should give up on it (except perhaps supporting the hashing of bytes views). I also question whether it is useful to hash arbitrarily-shaped read-only buffers (along with questioning whether people will actually *have* arbitrarily-shaped read-only buffers). Stefan: please do propose a semantics also along with proposing interfaces. |
|||
| msg169425 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2012年08月30日 00:26 | |
My perspective is that hashing a memoryview only makes sense when the memoryview is read-only and "m == m.tobytes()" (i.e. it's a C contiguous 1D view of bytes, either because that's what the original object exported as a buffer or because the view has been cast that way) So, I think we should document that restriction in What's New and the memoryview docs for 3.3.0, and actually enforce it in 3.3.1 by raising ValueError from hash(m) when the assumption of a 1D C contiguous bytes view is violated (the "read-only" restriction is already enforced). In previous versions, this constraint didn't need to be explicitly enforced, since memoryview basically treated *everything* as a 1D C contiguous view of bytes. |
|||
| msg169440 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年08月30日 09:24 | |
Martin v. L??wis <report@bugs.python.org> wrote: > > hash(x) == hash(x.tobytes()) > In the light of this requirement, it's even more difficult to ask > people that they change their hashing, since some exporters may already > comply with that original request. I don't think so. memoryview.__hash__() is new in 3.3 and the requirement is not documented at all in the general PEP-3118 sections. [Adding Stefan Behnel to nosy, since Cython is pretty quick to pick up new features.] > > The new equality definition and any possible new hash definition should > > probably also be part of the buffer API documentation, since they > > aren't memoryview specific. > > That's not true: they *are* memoryview-specific. The notion of equality > is entirely one of memoryview objects, not of buffers. Could you name a part of the equality definition that is memoryview-specific? > I still maintain that specifying hashing for memoryviews under the > new equality definition is just not feasible, and that we should give > up on it (except perhaps supporting the hashing of bytes views). > I also question whether it is useful to hash arbitrarily-shaped > read-only buffers (along with questioning whether people will actually > *have* arbitrarily-shaped read-only buffers). Useful, perhaps. I don't know if it is worth the effort though. We could restrict hashing to contiguous bytes views in 3.3.1. |
|||
| msg169450 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年08月30日 11:35 | |
Here's a doc patch restricting the hash to formats 'B', 'b' and 'c'. I think non-contiguous views are fine: both __eq__() and tobytes() handle these, so the equality-hash invariant is preserved. |
|||
| msg169451 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年08月30日 13:01 | |
Am 30.08.12 11:24, schrieb Stefan Krah: >>> The new equality definition and any possible new hash definition should >>> probably also be part of the buffer API documentation, since they >>> aren't memoryview specific. >> >> That's not true: they *are* memoryview-specific. The notion of equality >> is entirely one of memoryview objects, not of buffers. > > Could you name a part of the equality definition that is memoryview-specific? As literally spelled out, it starts with "a memoryview and a PEP 3118 exporter". But I was actually thinking of the implementation... In any case, I'm skeptical that this is a useful extension to the buffer interface. You would imply "two exporters are equal iff their exported buffers are equal". I think this is asking too much. |
|||
| msg169497 - (view) | Author: Dag Sverre Seljebotn (Dag.Sverre.Seljebotn) | Date: 2012年08月31日 03:31 | |
It is perfectly possible for an object to export memory in a read-only way that may still change. Another object may have a writeable view: x = obj.readonly_view y = obj.writable_view obj.move_to_next_image() # changes memory under x, y So, hashing using tobytes() doesn't make any sense at all to me. A memoryview != its contents. You could compare memoryviews simply by comparing the Py_buffer structure, but that's going to be confusing for a lot of users. I would really prefer unhashable (+ if needed a method for comparing contents). (FWIW, not sure how relevant this is; in NumPy, == does In [1]: np.array([1,2,3]) == np.array([1,3,3]) Out[1]: array([ True, False, True], dtype=bool) Cython will follow this behaviour for its "typed memory views", which is Cython's builting mechanism for PEP 3118 which is not quite the same as CPython "memoryview". But I understand that following this behaviour is probably out of the question for CPython.) |
|||
| msg169498 - (view) | Author: Dag Sverre Seljebotn (Dag.Sverre.Seljebotn) | Date: 2012年08月31日 03:47 | |
OK, I can understand the desire to make memoryviews be bytes-like objects (though to my mind, bytes is "frozen" in a very different way...) If so, and it is deemed worth it, my suggestion is to add a new PyBUF_CONST flag to the buffer acquisition in that case (which can not be used together with PyBUF_WRITABLE). Given that flag, the exporter guarantees that the contents does not change (or fails to give away a buffer). Perhaps it could be possible for hash() to try to re-acquire a const buffer, so that some buffers are hashable (by contents) and others not. (I really think relying on buffers that are not writeable to be constant is dangerous. A readonly memoryview could for instance point straight to the live output of a webcam.) |
|||
| msg169499 - (view) | Author: Stefan Behnel (scoder) * (Python committer) | Date: 2012年08月31日 04:14 | |
To add on Dag's comments, this essentially means that any caching of the hash value is dangerous, unless it can be assured that the underlying buffer definitely has not changed in the meantime. There is no way for users to explicitly tell a memoryview to rehash itself, so putting it into one dict, then modifying the content, then putting it into another is a perfectly reasonable use case for users, but fails when the memory view caches its hash value. As far as I can see it, any "obvious" fix for this creates a whole bath tub full of worms: updating views transitively, adding new APIs, supporting new buffer parameters, ... |
|||
| msg169508 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年08月31日 08:26 | |
Dag Sverre Seljebotn <report@bugs.python.org> wrote: > It is perfectly possible for an object to export memory in a read-only > way that may still change. Another object may have a writeable view: > > x = obj.readonly_view > y = obj.writable_view > obj.move_to_next_image() # changes memory under x, y PEP-3118 always means different things to different people. :) Your distinction between constant, readonly, writable is one way to interpret things. But looking at the use in the Python source tree, readonly does mean immutable. Unfortunately the PEP doesn't specify readonly: Is it the same as "mount a partition readonly", "readonly CD" or O_RDONLY? Also, existing practice was that in response to a PyBUF_FULL_RO request the exporter was free to hand out a writable buffer! This is now documented and subverts reliable tagging for consumers: http://docs.python.org/dev/c-api/buffer.html#readonly-format If any of this is a major inconvenience to NumPy or other users of the buffer API, perhaps you could open a separate issue. > (FWIW, not sure how relevant this is; in NumPy, == does > > In [1]: np.array([1,2,3]) == np.array([1,3,3]) > Out[1]: array([ True, False, True], dtype=bool) Since afa3dedfee18 we compare memoryviews and any exporter in the same manner as NumPy, except that the return value is a single boolean. So I think the new comparison is equivalent to all(a == b) in NumPy. |
|||
| msg169510 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年08月31日 08:42 | |
Dag Sverre Seljebotn <report@bugs.python.org> wrote: > OK, I can understand the desire to make memoryviews be bytes-like objects > (though to my mind, bytes is "frozen" in a very different way...) We have two desires that sometimes conflict: People who want to use memoryview as a *buffer* legitimately want it to behave like bytes. People who want to have an "arrayview" want it to behave like NumPy or PIL arrays. The new implementation preserves fast paths for bytes and otherwise behaves pretty much like a very restricted version of NumPy's ndarray (with support for suboffsets), except that not all planned multi-dimensional features have been implemented yet. > If so, and it is deemed worth it, my suggestion is to add a new > PyBUF_CONST flag to the buffer acquisition in that case Note that memory_hash() raises an error if the exporter *itself* is not hashable, so it only hashes immutable objects by design. But the PyBUF_CONST flag might be worth considering (in another issue). |
|||
| msg169515 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年08月31日 10:01 | |
Am 31.08.12 06:14, schrieb Stefan Behnel:
> To add on Dag's comments, this essentially means that any caching of
> the hash value is dangerous
Stefan Krah is right here: the proper test (which is already
implemented) is whether the underlying object is hashable; this
is a sufficient test.
Caching of the hash is always correct, or else the hash function
is incorrect: a hash of an object must not change. If it would
change, you couldn't find the key anymore in a dictionary.
> unless it can be assured that the
> underlying buffer definitely has not changed in the meantime. There
> is no way for users to explicitly tell a memoryview to rehash itself,
> so putting it into one dict, then modifying the content, then putting
> it into another is a perfectly reasonable use case for users,
No, it's not:
d = {}
o = some_object
m = memoryview(o)
d[m] = -1
o.modify() # so that hash(m) changes
print(d[m]) # keyerror, even though m in d.keys()
> As far as I can see it, any "obvious" fix for this creates a whole
> bath tub full of worms: updating views transitively, adding new APIs,
> supporting new buffer parameters, ...
So above (in this thread): the easiest fix is to just drop hashing
support for memoryview objects. This can be extended in steps:
- support hashing if the underlying object is bytes
- support hashing if the shape is 1D "B", and the underlying object
is hashable
- support hashing if the exporter has somehow promised to not modify
the memory
|
|||
| msg169622 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月01日 11:22 | |
Here's a patch that enforces byte formats. Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'? I think we should at least commit the doc patch for 3.3.0. Otherwise implementors of exporting objects might waste time on a feature that's deprecated. |
|||
| msg169626 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2012年09月01日 12:13 | |
+1 for docs patch for 3.3.0 and then enforcing the format restriction for 3.3.1. |
|||
| msg169627 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012年09月01日 12:40 | |
New changeset 895e123d9476 by Stefan Krah in branch 'default': Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1. http://hg.python.org/cpython/rev/895e123d9476 |
|||
| msg169633 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月01日 13:52 | |
Am 01.09.12 13:22, schrieb Stefan Krah: > Does everyone agree on (or tolerate at least) allowing 'B', 'b' and 'c'? Why be more permissive than necessary? -0 on the committed version; it should IMO further restrict it to 1D contiguous byte arrays. |
|||
| msg169636 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月01日 14:24 | |
Martin v. L??wis <report@bugs.python.org> wrote: > Why be more permissive than necessary? -0 on the committed version; > it should IMO further restrict it to 1D contiguous byte arrays. Does "byte arrays" include 'b' and 'c' or just 'B'? I don't see a reason to allow 'B' but not the others. I'm +-0 on allowing multi-dimensional arrays, but it would be odd to restrict hashing to contiguous arrays: >>> b = b'abcdefhhijkl' >>> m = memoryview(b) >>> b[::-1] b'lkjihhfedcba' >>> bytes(m[::-1]) b'lkjihhfedcba' >>> hash(b[::-1]) == hash(m[::-1]) True My reasoning was: If non-contiguous arrays are allowed (and I think they should be), why not allow multi-dimensional arrays, too? The definition hash(m) == hash(m.tobytes()) is pretty straightforward. |
|||
| msg169638 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012年09月01日 14:42 | |
On Sep 1, 2012, at 10:24 AM, Stefan Krah <report@bugs.python.org> wrote: > > The definition hash(m) == hash(m.tobytes()) is pretty straightforward. I probably missed something from the early discussion, but doesn't this definition only work for 1d (or 0d) views? Shouldn't shape come into calculation for multidimensional views? |
|||
| msg169641 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月01日 15:06 | |
tobytes() is the same as the flattened multi-dimensional list representation with all elements converted to bytes. If I'm not mistaken, that's how NumPy's tostring() behaves. |
|||
| msg169642 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012年09月01日 15:14 | |
On Sep 1, 2012, at 11:06 AM, Stefan Krah <report@bugs.python.org> wrote: > tobytes() is the same as the flattened multi-dimensional list representation > with all elements converted to bytes. This is correct, but why is it desirable to have deliberate hash collisions between views with different shapes? |
|||
| msg169643 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2012年09月01日 15:16 | |
Keep in mind that its OK if hash(m) == hash(m.tobytes()) in some cases where "m != m.tobytes()". The only cases we really need to kill are those that break the hash invariant. I don't like the idea of making the definition of hash more complicated just to rule out cases that aren't actually broken. |
|||
| msg169644 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月01日 15:29 | |
> why is it desirable to have deliberate hash collisions between views with different shapes? Since we're now restricting everything to bytes, the multi-dimensional case is probably not useful at all. As I said above, I would leave it in because it actually saves one check for ndim and as Nick said, the definition is simpler. Now the user only has to remember that hashing works for 'B', 'b' and 'c'. |
|||
| msg169645 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月01日 16:06 | |
Am 01.09.12 16:24, schrieb Stefan Krah: > Does "byte arrays" include 'b' and 'c' or just 'B'? I don't see a reason > to allow 'B' but not the others. Either type is fine with me. It's the multi-dimensional aspect I'd like to ban. > My reasoning was: If non-contiguous arrays are allowed (and I think they > should be), why not allow multi-dimensional arrays, too? I think neither should be allowed. The explicit request (from Antoine) was that memoryviews of bytes objects should be hashable. I haven't heard anybody asking for something more general (but I could see that people want to hash other memory blocks as well, such as mmap views). I can see that it is tempting to provide the most general definition, but see where this approach got us. It's always easier to widen an interface later than to narrow it. |
|||
| msg169648 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月01日 17:20 | |
Disallowing non-contiguous arrays leads to very strange situations though.
I'm positive that there will be a bug report about this:
>>> x = memoryview(b'abc')[::-1]
>>> b = b'cba'
>>> d = {b'cba': 101}
>>>
>>> b in d
True
>>> x == b
True
>>> x in d
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: memoryiew: hashing is restricted to C-contiguous arrays
Could we perhaps take a small poll? My own vote is:
1) Allow bytes hashing at all: +0.5
2) If 1) is allowed, then also non-contiguous hashing is allowed: +1
3) Allow multi-dimensional hashing: +-0
|
|||
| msg169654 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月01日 17:44 | |
Am 01.09.12 19:20, schrieb Stefan Krah:
> Disallowing non-contiguous arrays leads to very strange situations though.
I don't find that strange. That two object compare equal doesn't imply
that they both hash - only that *if* they hash, they should hash equal.
In any case, this can happen already:
py> x = memoryview(array.array('B',b'cba'))
py> b = b'cba'
py> d = {b'cba': 101}
py> b in d
True
py> x == b
True
py> x in d
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: cannot hash writable memoryview object
It can also happen with other types (although I had to look around
a bit):
py> x=set((1,2,3))
py> b=frozenset(x)
py> d={b:101}
py> b in d
True
py> x==b
True
py> x in d
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
> 1) Allow bytes hashing at all: +0.5
+0
> 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1
-1
> 3) Allow multi-dimensional hashing: +-0
-1
|
|||
| msg169655 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年09月01日 18:04 | |
> Could we perhaps take a small poll? My own vote is: > > 1) Allow bytes hashing at all: +0.5 +10. The buffer() object in 2.x was hashable, and a very important use case of memoryview is replacing buffer(). > 2) If 1) is allowed, then also non-contiguous hashing is allowed: +1 > 3) Allow multi-dimensional hashing: +-0 I don't care about these. I kind of agree with Martin that hashability should be motivated by an use case, so -0. |
|||
| msg169657 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月01日 18:06 | |
> py> x = memoryview(array.array('B',b'cba'))
I find the array example is different. The user has to remember one thing:
memoryviews based on arrays don't hash.
For memoryviews based on bytes one would have to remember:
- 'B', 'c' and 'b' hash
- only C-contiguous arrays hash
- b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not
|
|||
| msg169659 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月01日 18:22 | |
Am 01.09.12 20:06, schrieb Stefan Krah: > - b'abc'[::-1] hashes, but memoryview(b'abc')[::-1] does not I find that memoryview(b'abc')[::-1] is a strange thing to have, anyway, so I'm not bothered by it behaving different. I can accept that it needs to be supported for consistency, but I would recommend against using it in applications. Who needs reversed memory, anyway? The only case were I ever needed reversal is lists (typically sorted, perhaps cronological). |
|||
| msg169682 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2012年09月02日 01:47 | |
+1 for allowing bytes hashing. As Antoine noted, the 1D bytes variant of memoryview() fills the role previously handled by buffer(). +1 for allowing 1D non-contiguous hashing. This is from a simplicity perspective, as I don't want to have to explain to people why "hash(memoryview(b'12'[::2]))", "hash(memoryview(b'123'[::2]))" and "hash(memoryview(b'12')[::2])" all work, but "hash(memoryview(b'123')[::2])" fails. -1 for allowing hashing when ndim > 1. We don't have a solid use case, and there's no odd value based corner cases to be considered. |
|||
| msg169693 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月02日 12:44 | |
The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and -2 for multi-dimensional. I'll update the docs soon. |
|||
| msg169694 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012年09月02日 12:57 | |
New changeset 3b2597c1fe35 by Stefan Krah in branch 'default': Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews. http://hg.python.org/cpython/rev/3b2597c1fe35 |
|||
| msg169700 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012年09月02日 14:21 | |
On Sep 2, 2012, at 8:44 AM, Stefan Krah <report@bugs.python.org> wrote: > The totals are +11.5 :) for hashing, +1 for allowing non-contiguous and > -2 for multi-dimensional I have refrained from voting because in my line of work buffers or memoryviews deal with large objects that rarely serve as dictionary keys. As a result, I have zero experince with hashing of buffers. This observation supports the current consensus to limit hashing to 1d and 0d cases. My only concern is that with hash(m) == hash(m.tobytes()) implementing multidimensional restriction will require artificial if ndim > 1 check and an extra sentence in the docs while not simplifying anything. |
|||
| msg169754 - (view) | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2012年09月03日 11:42 | |
The main issue is that it's not quite clear how to deal with problems like C-style vs FORTRAN-style memory layouts and strides vs suboffsets in defining multidimensional hash equality. Without a use case, it's easier to just punt on the question and declare it illegal. The 1D hashing case really just comes from wanting to have 1D bytes views behave as much like a bytes object as is practical. |
|||
| msg169755 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012年09月03日 11:46 | |
New changeset c9c9d890400c by Nick Coghlan in branch 'default': Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions http://hg.python.org/cpython/rev/c9c9d890400c |
|||
| msg169757 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月03日 12:38 | |
Small nitpick: multi-dimensional hashing wasn't really accidental, it was perfectly aligned with the previous statically typed equality definition. When I suggested PyBuffer_Hash = hash(obj.tobytes()) on python-dev for non-contiguous and multi-dimensional arrays, I got a +1 from Raymond (which I now see was a private message). I don't see what could possibly be ill-defined about using the tobytes() definition for ND-arrays. In all places memoryview now uses the logical array, which is displayed by tolist(). Leaving aside static typing, both the previous and the new equality definitions regarded C and Fortran arrays with the same list outputs as equal. |
|||
| msg169760 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012年09月03日 13:01 | |
On Mon, Sep 3, 2012 at 8:38 AM, Stefan Krah <report@bugs.python.org> wrote: > I don't see what could possibly be ill-defined about using the > tobytes() definition for ND-arrays. In all places memoryview now > uses the logical array, which is displayed by tolist(). +1. The key restriction is that to byte views. ND aspect does not complicate the logic once .tobytes() is properly implemented. |
|||
| msg169800 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年09月03日 19:59 | |
Am 02.09.2012 16:21, schrieb Alexander Belopolsky: > I have refrained from voting because in my line of work buffers or > memoryviews deal with large objects that rarely serve as dictionary > keys. As a result, I have zero experince with hashing of buffers. > This observation supports the current consensus to limit hashing to > 1d and 0d cases. My only concern is that with hash(m) == > hash(m.tobytes()) implementing multidimensional restriction will > require artificial if ndim > 1 check and an extra sentence in the > docs while not simplifying anything. As for the "not simplifying argument": if hashing was restricted to contiguous bytes, then the implementation would certainly be simplified quite a bit: currently, if it's not contiguous, it needs to make a separate copy and hash that. This code could go away if hashing would only work for true memory "blocks". |
|||
| msg169801 - (view) | Author: Alexander Belopolsky (Alexander.Belopolsky) | Date: 2012年09月03日 20:21 | |
On Mon, Sep 3, 2012 at 3:59 PM, Martin v. Löwis <report@bugs.python.org> wrote: > if hashing was restricted > to contiguous bytes, then the implementation would certainly be > simplified quite a bit: currently, if it's not contiguous, it needs > to make a separate copy and hash that. I would be happy if this whole business of "strided" memory layouts would go away. I find it to be a failed attempt of over-generalization. It is failed because it does not cover many important formats even in 2-d: sparse matrices, symmetrical matrices, banded matrices, etc., etc. (Check level 3 BLAS or NAG for a longer list.) It is an over-generalizarion because in practice it is often more efficient to gather your data in contiguous memory, perform vector calculations and then scatter the results than to operate on scattered data to avoid copying. I would much rather see a generalization of suboffsets that would allow me to have views into arrays of pointers to variable width arrays than strides. |
|||
| msg170051 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012年09月08日 13:41 | |
New changeset ca81b9a3a015 by Stefan Krah in branch 'default': Issue #15814: Update whatsnew to the current state of hashing memoryviews. http://hg.python.org/cpython/rev/ca81b9a3a015 |
|||
| msg170052 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月08日 13:51 | |
Georg, thanks for including all changes that I've asked for! If you still have the patience for the constant stream of memoryview doc changes, there are three new ones that might be worth including: 3b2597c1fe35 c9c9d890400c ca81b9a3a015 |
|||
| msg170092 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012年09月09日 09:18 | |
New changeset 7734eb2707a1 by Stefan Krah in branch 'default': Issue #15814: Document planned restrictions for memoryview hashes in 3.3.1. http://hg.python.org/cpython/rev/7734eb2707a1 New changeset 71f4d80400f2 by Stefan Krah in branch 'default': Issue #15814: Documentation: disallow hashing of multi-dimensional memoryviews. http://hg.python.org/cpython/rev/71f4d80400f2 New changeset 3ffd6ad93fe4 by Nick Coghlan in branch 'default': Issue #15814: Add NEWS entry regarding intended memoryview hashing restrictions http://hg.python.org/cpython/rev/3ffd6ad93fe4 New changeset 88a0792e8ba3 by Stefan Krah in branch 'default': Issue #15814: Update whatsnew to the current state of hashing memoryviews. http://hg.python.org/cpython/rev/88a0792e8ba3 |
|||
| msg173201 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2012年10月17日 20:16 | |
The 3.3.0 docs now state "Note Hashing of memoryviews with formats other than ‘B’, ‘b’ or ‘c’ as well as hashing of multi-dimensional memoryviews is possible in version 3.3.0, but will raise an error in 3.3.1 in order to be compatible with the new memoryview equality definition." I assume 3.3.1 should read 3.4, or have I missed something? |
|||
| msg173203 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2012年10月17日 20:24 | |
3.3.1 is correct, you apparently missed msg169425. |
|||
| msg174542 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012年11月02日 17:07 | |
New changeset 969069d464bc by Stefan Krah in branch '3.3': Issue #15814: Use hash function that is compatible with the equality http://hg.python.org/cpython/rev/969069d464bc |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:35 | admin | set | github: 60018 |
| 2012年11月02日 17:25:55 | skrah | set | status: open -> closed resolution: fixed stage: needs patch -> resolved |
| 2012年11月02日 17:07:02 | python-dev | set | messages: + msg174542 |
| 2012年10月17日 20:24:04 | loewis | set | messages: + msg173203 |
| 2012年10月17日 20:16:21 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages: + msg173201 |
| 2012年09月09日 09:18:58 | python-dev | set | messages: + msg170092 |
| 2012年09月08日 13:51:43 | skrah | set | messages: + msg170052 |
| 2012年09月08日 13:41:24 | python-dev | set | messages: + msg170051 |
| 2012年09月03日 20:21:49 | Alexander.Belopolsky | set | messages: + msg169801 |
| 2012年09月03日 19:59:02 | loewis | set | messages: + msg169800 |
| 2012年09月03日 13:04:14 | scoder | set | nosy:
- scoder |
| 2012年09月03日 13:01:20 | Alexander.Belopolsky | set | messages: + msg169760 |
| 2012年09月03日 12:38:18 | skrah | set | messages: + msg169757 |
| 2012年09月03日 11:46:44 | python-dev | set | messages: + msg169755 |
| 2012年09月03日 11:42:51 | ncoghlan | set | messages: + msg169754 |
| 2012年09月02日 14:21:49 | Alexander.Belopolsky | set | messages: + msg169700 |
| 2012年09月02日 12:57:19 | python-dev | set | messages: + msg169694 |
| 2012年09月02日 12:44:30 | skrah | set | messages: + msg169693 |
| 2012年09月02日 01:47:15 | ncoghlan | set | messages: + msg169682 |
| 2012年09月01日 18:22:28 | loewis | set | messages: + msg169659 |
| 2012年09月01日 18:06:32 | skrah | set | messages: + msg169657 |
| 2012年09月01日 18:04:54 | pitrou | set | messages: + msg169655 |
| 2012年09月01日 17:44:06 | loewis | set | messages: + msg169654 |
| 2012年09月01日 17:20:22 | skrah | set | messages: + msg169648 |
| 2012年09月01日 16:06:18 | loewis | set | messages: + msg169645 |
| 2012年09月01日 15:29:49 | skrah | set | messages: + msg169644 |
| 2012年09月01日 15:16:24 | ncoghlan | set | messages: + msg169643 |
| 2012年09月01日 15:14:48 | Alexander.Belopolsky | set | messages: + msg169642 |
| 2012年09月01日 15:06:29 | skrah | set | messages: + msg169641 |
| 2012年09月01日 14:42:11 | Alexander.Belopolsky | set | nosy:
+ Alexander.Belopolsky messages: + msg169638 |
| 2012年09月01日 14:24:21 | skrah | set | messages: + msg169636 |
| 2012年09月01日 13:52:15 | loewis | set | messages: + msg169633 |
| 2012年09月01日 12:40:19 | python-dev | set | nosy:
+ python-dev messages: + msg169627 |
| 2012年09月01日 12:13:24 | ncoghlan | set | messages: + msg169626 |
| 2012年09月01日 11:22:16 | skrah | set | files:
+ issue15814-2.diff messages: + msg169622 |
| 2012年08月31日 10:01:43 | loewis | set | messages: + msg169515 |
| 2012年08月31日 08:42:29 | skrah | set | messages: + msg169510 |
| 2012年08月31日 08:26:49 | skrah | set | messages: + msg169508 |
| 2012年08月31日 04:14:01 | scoder | set | messages: + msg169499 |
| 2012年08月31日 03:47:32 | Dag.Sverre.Seljebotn | set | messages: + msg169498 |
| 2012年08月31日 03:31:31 | Dag.Sverre.Seljebotn | set | nosy:
+ Dag.Sverre.Seljebotn messages: + msg169497 |
| 2012年08月30日 13:01:51 | loewis | set | messages: + msg169451 |
| 2012年08月30日 11:36:02 | skrah | set | files:
+ issue15814-doc.diff keywords: + patch |
| 2012年08月30日 11:35:31 | skrah | set | messages: + msg169450 |
| 2012年08月30日 09:26:10 | skrah | set | nosy:
+ scoder |
| 2012年08月30日 09:24:48 | skrah | set | messages: + msg169440 |
| 2012年08月30日 00:26:02 | ncoghlan | set | messages: + msg169425 |
| 2012年08月29日 23:36:08 | belopolsky | set | nosy:
+ belopolsky |
| 2012年08月29日 20:50:40 | loewis | set | messages: + msg169417 |
| 2012年08月29日 20:27:31 | skrah | set | messages: + msg169416 |
| 2012年08月29日 20:04:09 | skrah | set | messages: + msg169415 |
| 2012年08月29日 19:33:47 | loewis | set | messages: + msg169411 |
| 2012年08月29日 19:06:28 | pitrou | set | messages: + msg169409 |
| 2012年08月29日 18:59:10 | loewis | set | messages: + msg169408 |
| 2012年08月29日 18:01:27 | pitrou | set | messages: + msg169404 |
| 2012年08月29日 17:54:45 | loewis | set | messages: + msg169403 |
| 2012年08月29日 17:07:08 | skrah | create | |