This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012年09月10日 12:23 by sbt, last changed 2022年04月11日 14:57 by admin.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| iobase_read.patch | sbt, 2012年09月10日 12:23 | review | ||
| iobase_read.patch | sbt, 2012年09月17日 20:17 | review | ||
| Messages (20) | |||
|---|---|---|---|
| msg170183 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年09月10日 12:23 | |
Currently rawiobase_read() reads to a bytearray object and then copies the data to a bytes object. There is a TODO comment saying that the bytes object should be created directly. The attached patch does that. |
|||
| msg170226 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2012年09月10日 20:01 | |
It works as along as the bytes object cannot leak to Python code, (imagine a custom readinto() method which plays with gc.get_referrers, then calls hash(b)...) This is OK with this patch. |
|||
| msg170624 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年09月17日 20:17 | |
New patch which checks the refcount of the memoryview and bytes object after calling readinto(). If either refcount is larger than the expected value of 1, then the data is copied rather than resized. |
|||
| msg170627 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年09月17日 22:30 | |
> If either refcount is larger than the expected value of 1, then the > data is copied rather than resized. I think that's a useless precaution. The bytes object cannot "leak" since you are using PyMemoryView_FromMemory(), which doesn't know about the original object. Out of curiousity, have you done any benchmarks? |
|||
| msg170628 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年09月17日 23:04 | |
> I think that's a useless precaution. The bytes object cannot "leak" > since you are using PyMemoryView_FromMemory(), which doesn't know about > the original object. The bytes object cannot "leak" so, as you say, checking that refcount is pointless. But the view might "leak", and since it does not own a reference to the base object we have a problem: we can't deallocate the bytes object for fear of breaking the view. It looks like objects returned by PyMemoryView_FromMemory() must never be allowed to "leak", so I am not sure there are many circumstances in which PyMemoryView_FromMemory() is safe to use. Perhaps using PyBuffer_FillInfo() and PyMemory_FromBuffer() would keep alive the bytes object while the view is alive, without letting the bytes object "leak". > Out of curiousity, have you done any benchmarks? No. |
|||
| msg170629 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年09月17日 23:14 | |
> The bytes object cannot "leak" so, as you say, checking that refcount > is pointless. But the view might "leak", and since it does not own a > reference to the base object we have a problem: we can't deallocate the > bytes object for fear of breaking the view. Indeed, that's a problem (but your patch does deallocate the bytes object). It's quite fishy, I'm not sure how to solve the issue cleanly. Stefan, do you have an idea? |
|||
| msg170638 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月18日 08:33 | |
So the problem is that readinto(view) might result in several references to view? I don't think that can be solved on the memoryview side. One could do: view = PyMemoryView_FromObject(b); // Lie about writability ((PyMemoryViewObject *)view)->view.readonly = 0; [...] Then the view owns a reference to the bytes object. But that does not solve the problem that writable memoryviews based on a readonly object might be hanging around. |
|||
| msg170639 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年09月18日 10:47 | |
> Then the view owns a reference to the bytes object. But that does not
> solve the problem that writable memoryviews based on a readonly object
> might be hanging around.
How about doing
PyObject_GetBuffer(b, &buf, PyBUF_WRITABLE);
view = PyMemoryView_FromBuffer(&buf);
// readinto view
PyBuffer_Release(&buf);
Would attempts to access a "leaked" reference to view now result in ValueError("operation forbidden on released memoryview object")? If so then I think this would be safe.
|
|||
| msg170642 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年09月18日 12:49 | |
The current non-test uses of PyMemoryView_FromBuffer() are in _io.BufferedReader.read(), _io.BufferedWriter.write(), PyUnicode_Decode().
It looks like they can each be made to leak a memoryview that references a deallocated buffer. (Maybe the answer is Don't Do That.)
import codecs, sys
def decode(buf):
global view
view = buf
return codecs.latin_1_decode(buf)
def getregentry():
return codecs.CodecInfo(name='foobar', decode=decode,
encode=codecs.latin_1_encode)
@codecs.register
def search_function(encoding):
if encoding == 'foobar':
return codecs.CodecInfo(*getregentry())
b = b'hello'.upper()
b.decode('foobar')
print(view.tobytes()) # => b'HELLO'
del b
x = b'dump'.upper()
print(view.tobytes()) # => b'DUMP\x00'
import io, sys
class File(io.RawIOBase):
def readinto(self, buf):
global view
view = buf
n = len(buf)
buf[:] = b'x'*n
return n
def readable(self):
return True
f = io.BufferedReader(File())
f.read(1)
print(view[:5].tobytes()) # => b'xxxxx'
del f
print(view[:5].tobytes()) # => b'\xdd\xdd\xdd\xdd\xdd'
|
|||
| msg170658 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年09月18日 18:19 | |
I am rather confused about the ownership semantics when one uses PyMemoryView_FromBuffer(). It looks as though PyMemoryView_FromBuffer() "steals" ownership of the buffer since, when the associated _PyManagedBufferObject is garbage collected, PyBuffer_Release() is called on its copy of the buffer info. However, the _PyManagedBufferObject does not own a reference of the base object, so one still needs to decref the base object (at some time when it is safe to do so). So am I right in thinking that PyObject_GetBuffer(obj, &buf, ...); view = PyMemoryView_FromBuffer(&buf); // view->master owns the buffer, but view->master->obj == NULL ... Py_DECREF(view); // releases buffer (assuming no other exports) Py_XDECREF(buf.obj); has balanced refcounting and is more or less equivalent to view = PyMemoryView_FromObject(obj); ... Py_DECREF(view); The documentation is not very helpful. It just says that calls to PyObject_GetBuffer() must be matched with calls to PyBuffer_Release(). |
|||
| msg170659 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月18日 18:20 | |
Richard Oudkerk <report@bugs.python.org> wrote: > PyObject_GetBuffer(b, &buf, PyBUF_WRITABLE); > view = PyMemoryView_FromBuffer(&buf); > // readinto view > PyBuffer_Release(&buf); > > Would attempts to access a "leaked" reference to view now result in ValueError("operation forbidden on released memoryview object")? If so then I think this would be safe. You would need to call memory_release(). Perhaps we can just expose it on the C-API level as PyMemoryView_Release(). IMO the use of PyObject_GetBuffer() should be discouraged. The semantics aren't clear (see #15821). I'd suggest using: 1) A buffer provider is involved (the case here): PyMemoryView_From Object() 2) A piece of memory needs to be wrapped temporarily and no references to the memoryview are "leaked" on the Python level: PyMemoryView_FromMemory() 3) A piece of memory needs to be packaged as a memoryview with automatic cleanup in mbuf_dealloc(): PyMemoryView_FromBufferWithCleanup() (proposed in msg169613) So I think the combination of PyMemoryView_FromObject() with a call to PyMemoryView_Release() should indeed work here. |
|||
| msg170660 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月18日 18:22 | |
Richard Oudkerk <report@bugs.python.org> wrote: > The documentation is not very helpful. It just says that calls > to PyObject_GetBuffer() must be matched with calls to PyBuffer_Release(). Yes, we need to sort that out, see #15821. |
|||
| msg170675 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年09月18日 20:47 | |
> You would need to call memory_release(). Perhaps we can just expose it on the > C-API level as PyMemoryView_Release(). Should PyMemoryView_Release() release the _PyManagedBufferObject by doing mbuf_release(view->mbuf) even if view->mbuf->exports > 0? Doing Py_TYPE(view->mbuf)->tp_clear((PyObject *)view->mbuf); seems to have the desired effect of causing ValueError when I try to access any associated memoryview. > 3) A piece of memory needs to be packaged as a memoryview with automatic > cleanup in mbuf_dealloc(): > > PyMemoryView_FromBufferWithCleanup() (proposed in msg169613) Maybe this should also handle decrefing the base object (given a flag PyManagedBuffer_FreeObj). I do worry about creating memoryviews that survive deallocation of the base object. |
|||
| msg170676 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年09月18日 20:56 | |
> So I think the combination of PyMemoryView_FromObject() with a call to > PyMemoryView_Release() should indeed work here. I don't think we want to expose a mutable bytes object to outside code, so IMO PyMemoryView_FromMemory() is preferrable. |
|||
| msg170678 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月18日 21:19 | |
Richard Oudkerk <report@bugs.python.org> wrote: > Should PyMemoryView_Release() release the _PyManagedBufferObject by doing mbuf_release(view->mbuf) even if view->mbuf->exports > 0? No, I think it should really just be a wrapper: diff --git a/Objects/memoryobject.c b/Objects/memoryobject.c --- a/Objects/memoryobject.c +++ b/Objects/memoryobject.c @@ -1093,6 +1093,12 @@ return memory_release((PyMemoryViewObject *)self, NULL); } +PyObject * +PyMemoryView_Release(PyObject *m) +{ + return memory_release((PyMemoryViewObject *)m, NULL); +} + We decided in #10181 not to allow releasing a view with exports, since the logic is already quite complex. Is there a reasonable expectation that existing code creates memoryviews of the readinto() argument? |
|||
| msg170679 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2012年09月18日 21:41 | |
Antoine Pitrou <report@bugs.python.org> wrote: > I don't think we want to expose a mutable bytes object to outside code, > so IMO PyMemoryView_FromMemory() is preferrable. I agree, but PyMemoryView_FromMemory(PyBytes_AS_STRING(b), n, PyBUF_WRITE) just hides the fact that a mutable bytes object is exposed. Are we talking about a big speedup here or could we perhaps just keep the existing code? |
|||
| msg170680 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年09月18日 21:51 | |
> Antoine Pitrou <report@bugs.python.org> wrote: > > I don't think we want to expose a mutable bytes object to outside code, > > so IMO PyMemoryView_FromMemory() is preferrable. > > I agree, but PyMemoryView_FromMemory(PyBytes_AS_STRING(b), n, PyBUF_WRITE) > just hides the fact that a mutable bytes object is exposed. Except that the mutable bytes object is not exposed to any outside code, so that weird behaviour can't be observed. |
|||
| msg170681 - (view) | Author: Richard Oudkerk (sbt) * (Python committer) | Date: 2012年09月18日 22:02 | |
> Are we talking about a big speedup here or could we perhaps just keep > the existing code? I doubt it is worth the hassle. But I did want to know if there was a clean way to do what I wanted. |
|||
| msg190457 - (view) | Author: Dwight Guth (dwight.guth) | Date: 2013年06月01日 23:23 | |
I was programming something today and thought I should let you know I came across a situation where the current behavior of this function is able to expose what seems to be raw memory to the user. import io class A(io.RawIOBase): def readinto(self, b): return len(b) A().read(100) |
|||
| msg223225 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014年07月16日 15:48 | |
Please note this is also referred to from #15994. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:35 | admin | set | github: 60107 |
| 2019年04月08日 13:06:30 | BreamoreBoy | set | nosy:
- BreamoreBoy |
| 2019年04月08日 13:02:20 | methane | set | nosy:
+ methane |
| 2014年10月14日 18:01:58 | skrah | set | nosy:
- skrah |
| 2014年07月16日 15:48:19 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages: + msg223225 versions: + Python 3.5 |
| 2013年06月01日 23:23:23 | dwight.guth | set | nosy:
+ dwight.guth messages: + msg190457 |
| 2012年09月18日 22:02:18 | sbt | set | messages: + msg170681 |
| 2012年09月18日 21:51:38 | pitrou | set | messages: + msg170680 |
| 2012年09月18日 21:41:23 | skrah | set | messages: + msg170679 |
| 2012年09月18日 21:19:27 | skrah | set | messages: + msg170678 |
| 2012年09月18日 20:56:43 | pitrou | set | messages: + msg170676 |
| 2012年09月18日 20:50:10 | vstinner | set | nosy:
+ vstinner |
| 2012年09月18日 20:47:55 | sbt | set | messages: + msg170675 |
| 2012年09月18日 18:22:55 | skrah | set | messages: + msg170660 |
| 2012年09月18日 18:20:41 | skrah | set | messages: + msg170659 |
| 2012年09月18日 18:19:43 | sbt | set | messages: + msg170658 |
| 2012年09月18日 12:49:29 | sbt | set | messages: + msg170642 |
| 2012年09月18日 10:47:49 | sbt | set | messages: + msg170639 |
| 2012年09月18日 08:33:54 | skrah | set | messages: + msg170638 |
| 2012年09月17日 23:14:26 | pitrou | set | nosy:
+ skrah messages: + msg170629 |
| 2012年09月17日 23:04:56 | sbt | set | messages: + msg170628 |
| 2012年09月17日 22:30:01 | pitrou | set | nosy:
+ pitrou messages: + msg170627 |
| 2012年09月17日 20:59:07 | serhiy.storchaka | set | type: performance components: + Library (Lib), IO |
| 2012年09月17日 20:17:09 | sbt | set | files:
+ iobase_read.patch messages: + msg170624 |
| 2012年09月10日 20:01:10 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages: + msg170226 |
| 2012年09月10日 18:01:46 | jcea | set | nosy:
+ jcea stage: patch review versions: + Python 3.4 |
| 2012年09月10日 12:23:03 | sbt | create | |