homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Document "suboffsets if needed" in 2.7
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, pitrou, python-dev, rhansen, seberg, skrah
Priority: low Keywords: patch

Created on 2015年01月30日 01:33 by rhansen, last changed 2022年04月11日 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
PyBuffer_IsContiguous.patch rhansen, 2015年01月30日 01:33 patch against 2.7 that fixes this bug
PyBuffer_IsContiguous_v2.patch rhansen, 2015年01月30日 05:45 patch v2 (against master) that fixes this bug review
doc_c-api_buffer.patch rhansen, 2015年02月01日 08:14 documentation patch based on msg235141 review
Messages (23)
msg235014 - (view) Author: Richard Hansen (rhansen) * Date: 2015年01月30日 01:33
According to https://docs.python.org/2/c-api/buffer.html#the-new-style-py-buffer-struct if the suboffsets member of Py_buffer is non-NULL and all members of the array are negative, the buffer may be contiguous.
PyBuffer_IsContiguous() does not behave this way. It assumes that if the suboffsets member is non-NULL then at least one member of the array is non-negative, and thus assumes that the buffer is non-contiguous. This is not always correct.
One consequence of this bug is PyBuffer_ToContiguous() (and thus memoryview.tobytes()) falls back to slow (and currently buggy, see issue #23349) memory copying code.
Attached is a patch that fixes this bug. The patch is against 2.7 but can be trivially modified to apply to 3.x.
msg235016 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015年01月30日 01:39
The patch has an obvious syntax error :-)
Other than that, adding a comment would be nice. Bonus points if you can write a test (3.x has infrastructure for that, see Lib/test/test_buffer.py).
msg235017 - (view) Author: Richard Hansen (rhansen) * Date: 2015年01月30日 01:42
> The patch has an obvious syntax error :-)
Doh!
> Other than that, adding a comment would be nice.
Agreed, will do.
> Bonus points if you can write a test (3.x has infrastructure for that, see Lib/test/test_buffer.py).
I'll take a look. Thanks!
msg235020 - (view) Author: Richard Hansen (rhansen) * Date: 2015年01月30日 05:45
I've attached a new version of the patch. Suggestions for simplifying the test code would be appreciated.
msg235032 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015年01月30日 13:10
PEP-3118 says:
Py_buffer.suboffsets:
"If all suboffsets are negative (i.e. no de-referencing is needed, then this must be NULL (the default value)."
I would be inclined to go with the PEP here and make a doc fix instead.
msg235034 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015年01月30日 14:29
For the record: I'm myself guilty of accepting all-negative suboffsets
in memoryview.c (see init_slice()) and I think there's even a test for it.
However, while it's ok for a specific function to accept this
corner case, I'm not sure about is_contiguous().
People might rely on the fact that contiguous implies
suboffsets==NULL.
Sebastian, did this special case ever come up in the context of
NumPy?
msg235052 - (view) Author: Richard Hansen (rhansen) * Date: 2015年01月30日 20:10
> People might rely on the fact that contiguous implies suboffsets==NULL.
Cython (currently) relies on all-negatives being acceptable and equivalent to suboffsets==NULL. See:
https://github.com/cython/cython/pull/367
http://thread.gmane.org/gmane.comp.python.cython.devel/15569 
msg235053 - (view) Author: Sebastian Berg (seberg) * Date: 2015年01月30日 20:36
Numpy does not understand suboffsets. The buffers we create will always have them NULL. The other way around.... To be honest, think it is probably ignoring the whole fact that they might exist at all :/, really needs to be fixed if it is the case.
msg235107 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015年01月31日 13:03
Cython doesn't follow the spec though (use Python 3):
from _testbuffer import *
cpdef foo():
 cdef unsigned char[:] v = bytearray(b"testing")
 nd = ndarray(v, getbuf=PyBUF_ND)
 print(nd.suboffsets)
 nd = ndarray(v, getbuf=PyBUF_FULL)
 print(nd.suboffsets)
Cython hands out suboffsets even for a PyBUF_ND request, which is
definitely wrong. For a PyBUF_FULL request they should only be
provided *if needed*.
msg235111 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015年01月31日 14:53
To summarize, I think this is different from #22445, which also requests
relaxed contiguity tests. #22445 is motivated by the fact that slicing
can create a certain class of contiguous buffers that aren't recognized
as such.
No such reason exists here: All-negative suboffsets are only created
if a buffer provider chooses to hand them out instead of NULL. To my
knowledge only Cython does this, and it's against the PEP (though
strictly speaking not against the docs).
So I'm -0.5 on this change in general and -1 for 2.7.
I'd reject the issue, but let's wait for Antoine's opinion.
msg235122 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015年01月31日 17:53
Prepare for a partial reversal of opinion. :)
Indexing *can* actually create all-negative suboffsets in a natural way:
>>> from _testbuffer import *
>>> nd = ndarray([1,2,3,4,5,6,7,8,9,10,11,12], shape=[3,4], flags=ND_PIL)
>>> nd.tolist()
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
>>> 
>>> x = nd[1]
>>> x.tolist()
[5, 6, 7, 8]
>>> x.suboffsets
(-1,)
>>> 
This leaves me +-0 for the change, with the caveat that applications
might break.
msg235125 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015年01月31日 18:51
I don't have an opinion on this. I've never seen suboffsets in use; but it seems reasonable to detect a "dummy suboffsets" array and recognize it as contiguous.
msg235141 - (view) Author: Richard Hansen (rhansen) * Date: 2015年02月01日 02:29
(The following message is mostly off-topic but I think it is relevant to those interested in this issue. This message is about the clarity of the documentation regarding flag semantics, and what I think the flags should mean.)
> Cython doesn't follow the spec though (use Python 3):
> 
> from _testbuffer import *
> cpdef foo():
> cdef unsigned char[:] v = bytearray(b"testing")
> nd = ndarray(v, getbuf=PyBUF_ND)
> print(nd.suboffsets)
> nd = ndarray(v, getbuf=PyBUF_FULL)
> print(nd.suboffsets)
When I compile and run the above (latest Cython from Git master), I get:
 ()
 ()
Looking at the Python source code (Modules/_testbuffer.c ndarray_get_suboffsets() and ssize_array_as_tuple()) the above output is printed if the suboffsets member is NULL.
> Cython hands out suboffsets even for a PyBUF_ND request, which is
> definitely wrong. For a PyBUF_FULL request they should only be
> provided *if needed*.
suboffsets appears to be NULL in both cases in your example, which seems acceptable to me given:
 * the user didn't request suboffsets information in the PyBUF_ND
 case, and
 * there are no indirections in the PyBUF_FULL case.
Even if suboffsets wasn't NULL, I believe the behavior would still be correct for the PyBUF_ND case. My reading of <https://docs.python.org/3/c-api/buffer.html> is that flags=PyBUF_ND implies that shape member MUST NOT be NULL, but strides and suboffsets can be NULL or not (it doesn't matter). This interpretation of mine is due to the sentence, "Note that each flag contains all bits of the flags below it." If flags=PyBUF_ND meant that strides and suboffsets MUST be NULL, then PyBUF_ND and PyBUF_STRIDES would necessarily be mutually exclusive and flags=(PyBUF_ND|PyBUF_STRIDES) would not make sense (the strides member would have to be both NULL and non-NULL).
If (flags & PyBUF_INDIRECT) is false, then the consumer is not interested in the suboffsets member so it shouldn't matter what it points to. (And the consumer should not dereference the pointer in case it points to a junk address.)
IMHO, if the buffer is C-style contiguous then the producer should be allowed to populate the shape, strides, and suboffsets members regardless of whether or not any of the PyBUF_ND, PyBUF_STRIDES, or PyBUF_INDIRECT flags are set. In other words, for C-style contiguous buffers, the producer should be allowed to act as if PyBUF_INDIRECT was always provided because the consumer will always get an appropriate Py_buffer struct regardless of the actual state of the PyBUF_ND, PyBUF_STRIDES, and PyBUF_INDIRECT flags.
It *is* a bug, however, to dereference the strides or suboffsets members with ndarray(v, getbuf=PyBUF_ND) because the consumer didn't ask for strides or suboffsets information and the pointers might be bogus.
Stepping back a bit, I believe that the flags should be thought of as imposing requirements on the producer. I think those requirements should be (assuming ndim > 0):
 * PyBUF_ND: If (flags & PyBUF_ND) is true:
 - If (flags & PyBUF_STRIDES) is false *and* the producer is
 unable to produce a block of memory at [buf,buf+len)
 containing all (len/itemsize) entries in a C-style contiguous
 chunk, then the producer must raise an exception.
 - Otherwise, the producer must provide the shape of buffer.
 If (flags & PyBUF_ND) is false:
 - If the producer is unable to produce a contiguous chunk of
 (len/itemsize) entries (of any shape) in the block of memory
 at [buf,buf+len), then the producer must raise an exception.
 - Otherwise, the producer is permitted to do any of the
 following:
 + don't touch the shape member (don't set it to NULL or any
 other value; just leave it as-is)
 + set the shape member to NULL
 + set the shape member to point to an array describing the
 shape
 + set the shape member to point to any other location, even
 if dereferencing the pointer would result in a segfault
 * PyBUF_STRIDES: If (flags & PyBUF_STRIDES) is true:
 - The producer must provide the appropriate strides information.
 If (flags & PyBUF_STRIDES) is false:
 - If the producer is unable to produce a block of memory at
 [buf,buf+len) containing all (len/itemsize) entries, the
 producer must raise an exception.
 - Otherwise, the producer is permitted to do any of the
 following;
 + don't touch the strides member (don't set it to NULL or
 any other value; just leave it as-is)
 + set the strides member to NULL
 + set the strides member to point to an array describing the
 strides
 + set the strides member to point to any other location,
 even if dereferencing the pointer would result in a
 segfault
 * PyBUF_INDIRECT: If (flags & PyBUF_INDIRECT) is true:
 - If the buffer uses indirections then the producer must set the
 suboffsets member to point to an array with appropriate
 entries.
 - Otherwise, the producer can either set the suboffsets member
 to NULL or set it to point to an array of all negative
 entries.
 If (flags & PyBUF_INDIRECT) is false:
 - If the producer cannot produce a buffer that does not have
 indirections, then the producer must raise an exception.
 - Otherwise, the producer is permitted to do any of the
 following:
 + don't touch the suboffsets member (don't set it to NULL or
 any other value; just leave it as-is)
 + set the suboffsets member to NULL
 + set the suboffsets member to point to an array of all
 negative entries
 + set the suboffsets member to point to any other location,
 even if dereferencing the pointer would result in a
 segfault
msg235145 - (view) Author: Richard Hansen (rhansen) * Date: 2015年02月01日 07:57
Attached is a documentation patch that adds what I said in msg235141. I doubt everyone will agree with the changes, but maybe it will be a useful starting point.
(Despite not having an asterisk next to my username, I have signed the contributor agreement. It just hasn't been processed yet.)
msg235146 - (view) Author: Richard Hansen (rhansen) * Date: 2015年02月01日 08:45
> This leaves me +-0 for the change, with the caveat that applications
> might break.
How might an application break with this change? Compared to the current PyBuffer_IsContiguous(), the patched version is the same except it returns true for a wider range of actually-contiguous buffers.
msg235148 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015年02月01日 08:59
> Richard Hansen added the comment:
> > Cython doesn't follow the spec though (use Python 3):
> > 
> > from _testbuffer import *
> > cpdef foo():
> > cdef unsigned char[:] v = bytearray(b"testing")
> > nd = ndarray(v, getbuf=PyBUF_ND)
> > print(nd.suboffsets)
> > nd = ndarray(v, getbuf=PyBUF_FULL)
> > print(nd.suboffsets)
> 
> When I compile and run the above (latest Cython from Git master), I get:
> 
> ()
> ()
With Cython version 0.20.1post0 I get:
>>> foo.foo()
(-1,)
(-1,)
If you get the correct output from the latest Cython, it looks like this
issue has been fixed.
msg235150 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015年02月01日 09:10
> How might an application break with this change?
func1:
 if (!PyBuffer_IsContiguous(view, 'C'))
 error();
 func2(view);
func2:
 assert(view->suboffsets == NULL);
 ...
Yes, I *did* insert sanity checks like this in some places.
msg235151 - (view) Author: Richard Hansen (rhansen) * Date: 2015年02月01日 09:21
>> When I compile and run the above (latest Cython from Git master), I
>> get:
>> 
>> ()
>> ()
> 
> With Cython version 0.20.1post0 I get:
> 
> >>> foo.foo()
> (-1,)
> (-1,)
> 
> If you get the correct output from the latest Cython, it looks like
> this issue has been fixed.
Oops, I was running a locally modified version of Cython that contained a patch meant to work around issue #23349.
When I run the *actual* upstream master I get the same behavior that you do.
But either way, I don't see why it's a problem that it prints (-1,) for the PyBUF_ND case. The consumer didn't request suboffsets information so I would expect it to be OK for the producer to put whatever it wants in the suboffsets field, including a junk pointer that would segfault upon dereference.
msg235152 - (view) Author: Richard Hansen (rhansen) * Date: 2015年02月01日 09:23
>> How might an application break with this change?
> 
> assert(view->suboffsets == NULL);
Fair point.
msg235153 - (view) Author: Sebastian Berg (seberg) * Date: 2015年02月01日 09:29
I do not have an opinion either way, but I do think there is a small difference here compared to the other issue. With the other issue, there are cases where we cannot set the strides correctly.
If you ask numpy directly whether the array is contiguous (or create it from numpy) and it says "yes", and then you get it without asking for a contiguous array it is, without the change, possible to get an array that would *not* be considered contiguous.
On the other hand, setting suboffsets to NULL when all are -1 is always possible, if tedious I admit.
msg235155 - (view) Author: Richard Hansen (rhansen) * Date: 2015年02月01日 09:39
My preference is to apply the patch, of course. There is a legitimate concern that it will break existing code, but I think there are more points in favor of applying the patch:
 * there exists code that the current behavior is known to break
 * it's easier to remove assert(view->suboffsets == NULL) than to
 change producers to not provide an all-negative array (though I
 admit there may be other more subtle ways the patch would break
 existing code)
 * I may be the only one who has spoken up about this, but I doubt
 I'm the only one who has experienced it
msg235161 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015年02月01日 11:36
But only Cython does not set suboffsets to NULL and you already have a small 
patch to fix that.
The Python 3 docs say "suboffsets only if needed" and the PEP says the same,
so the situation is not completely undocumented.
I think your doc patch goes too far. Buffers propagate in unpredictable ways, 
and since the original consumer request is *not stored* in the buffer, setting
unneeded fields to NULL provides a way to figure out the original request.
It is actually *an optimization* to set suboffsets to NULL: Compliant code
that uses arbitrary buffers always has to check for:
 if (suboffsets != NULL && suboffsets[i] >= 0)
 ...
If suboffsets are consistently NULL, at least hopefully you get branch 
prediction to kick in.
As Sebastian pointed out, it's relatively easy even for slicing/indexing
functions to check for all-negative suboffsets and handle that case.
All this is probably academic, since no one appears to be using suboffsets
at all. :)
Let's keep the status-quo and make this a doc issue for 2.7.
msg235194 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015年02月01日 18:50
New changeset c4c1d68b6301 by Stefan Krah in branch '2.7':
Issue #23352: Document that Py_buffer.suboffsets must be NULL if no suboffsets
https://hg.python.org/cpython/rev/c4c1d68b6301
New changeset de5c8ee002bf by Stefan Krah in branch '3.4':
Issue #23352: Document that Py_buffer.suboffsets must be NULL if no suboffsets
https://hg.python.org/cpython/rev/de5c8ee002bf
New changeset 5d097a74766f by Stefan Krah in branch 'default':
Issue #23352: Merge from 3.4.
https://hg.python.org/cpython/rev/5d097a74766f 
History
Date User Action Args
2022年04月11日 14:58:12adminsetgithub: 67541
2015年02月02日 17:05:41skrahsetstatus: open -> closed
resolution: fixed
stage: resolved
2015年02月01日 18:50:57python-devsetnosy: + python-dev
messages: + msg235194
2015年02月01日 11:36:19skrahsetpriority: normal -> low

type: behavior -> enhancement
assignee: docs@python
components: + Documentation, - Interpreter Core
title: PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL -> Document "suboffsets if needed" in 2.7
nosy: + docs@python

messages: + msg235161
stage: patch review -> (no value)
2015年02月01日 09:39:42rhansensetmessages: + msg235155
2015年02月01日 09:29:13sebergsetmessages: + msg235153
2015年02月01日 09:23:25rhansensetmessages: + msg235152
2015年02月01日 09:21:18rhansensetmessages: + msg235151
2015年02月01日 09:10:04skrahsetmessages: + msg235150
2015年02月01日 08:59:39skrahsetmessages: + msg235148
2015年02月01日 08:45:35rhansensetmessages: + msg235146
2015年02月01日 08:14:30rhansensetfiles: + doc_c-api_buffer.patch
2015年02月01日 08:14:14rhansensetfiles: - doc_c-api_buffer.patch
2015年02月01日 08:05:41rhansensetfiles: + doc_c-api_buffer.patch
2015年02月01日 08:05:16rhansensetfiles: - doc_c-api_buffer.patch
2015年02月01日 07:57:34rhansensetfiles: + doc_c-api_buffer.patch

messages: + msg235145
2015年02月01日 02:29:22rhansensetmessages: + msg235141
2015年01月31日 18:51:52pitrousetmessages: + msg235125
2015年01月31日 17:53:23skrahsetmessages: + msg235122
2015年01月31日 14:53:58skrahsetmessages: + msg235111
2015年01月31日 13:03:16skrahsetmessages: + msg235107
2015年01月30日 20:36:31sebergsetmessages: + msg235053
2015年01月30日 20:10:03rhansensetmessages: + msg235052
2015年01月30日 14:29:25skrahsetnosy: + seberg
messages: + msg235034
2015年01月30日 13:10:57skrahsetmessages: + msg235032
2015年01月30日 05:52:11rhansensetversions: - Python 3.2, Python 3.3
2015年01月30日 05:51:37rhansensetversions: + Python 3.2, Python 3.3
2015年01月30日 05:45:06rhansensetfiles: + PyBuffer_IsContiguous_v2.patch

messages: + msg235020
2015年01月30日 01:42:25rhansensetmessages: + msg235017
2015年01月30日 01:39:50pitrousetversions: - Python 3.2, Python 3.3, Python 3.6
nosy: + skrah, pitrou

messages: + msg235016

stage: patch review
2015年01月30日 01:35:07rhansensettype: behavior
2015年01月30日 01:33:45rhansencreate

AltStyle によって変換されたページ (->オリジナル) /