homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Make rawiobase_read() read directly to bytes object
Type: performance Stage: patch review
Components: IO, Library (Lib) Versions: Python 3.4, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, dwight.guth, jcea, methane, pitrou, sbt, vstinner
Priority: normal Keywords: patch

Created on 2012年09月10日 12:23 by sbt, last changed 2022年04月11日 14:57 by admin.

Files
File name Uploaded Description Edit
iobase_read.patch sbt, 2012年09月10日 12:23 review
iobase_read.patch sbt, 2012年09月17日 20:17 review
Messages (20)
msg170183 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012年09月10日 12:23
Currently rawiobase_read() reads to a bytearray object and then copies the data to a bytes object.
There is a TODO comment saying that the bytes object should be created directly. The attached patch does that.
msg170226 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012年09月10日 20:01
It works as along as the bytes object cannot leak to Python code, (imagine a custom readinto() method which plays with gc.get_referrers, then calls hash(b)...)
This is OK with this patch.
msg170624 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012年09月17日 20:17
New patch which checks the refcount of the memoryview and bytes object after calling readinto().
If either refcount is larger than the expected value of 1, then the data is copied rather than resized.
msg170627 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年09月17日 22:30
> If either refcount is larger than the expected value of 1, then the
> data is copied rather than resized.
I think that's a useless precaution. The bytes object cannot "leak" since you are using PyMemoryView_FromMemory(), which doesn't know about the original object.
Out of curiousity, have you done any benchmarks?
msg170628 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012年09月17日 23:04
> I think that's a useless precaution. The bytes object cannot "leak" 
> since you are using PyMemoryView_FromMemory(), which doesn't know about 
> the original object.
The bytes object cannot "leak" so, as you say, checking that refcount is pointless. But the view might "leak", and since it does not own a reference to the base object we have a problem: we can't deallocate the bytes object for fear of breaking the view.
It looks like objects returned by PyMemoryView_FromMemory() must never be allowed to "leak", so I am not sure there are many circumstances in which PyMemoryView_FromMemory() is safe to use.
Perhaps using PyBuffer_FillInfo() and PyMemory_FromBuffer() would keep alive the bytes object while the view is alive, without letting the bytes object "leak".
> Out of curiousity, have you done any benchmarks?
No.
msg170629 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年09月17日 23:14
> The bytes object cannot "leak" so, as you say, checking that refcount
> is pointless. But the view might "leak", and since it does not own a
> reference to the base object we have a problem: we can't deallocate the 
> bytes object for fear of breaking the view.
Indeed, that's a problem (but your patch does deallocate the bytes object).
It's quite fishy, I'm not sure how to solve the issue cleanly. Stefan, do you have an idea?
msg170638 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012年09月18日 08:33
So the problem is that readinto(view) might result in several references
to view? I don't think that can be solved on the memoryview side.
One could do:
 view = PyMemoryView_FromObject(b);
 // Lie about writability
 ((PyMemoryViewObject *)view)->view.readonly = 0;
 [...]
Then the view owns a reference to the bytes object. But that does not
solve the problem that writable memoryviews based on a readonly object
might be hanging around.
msg170639 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012年09月18日 10:47
> Then the view owns a reference to the bytes object. But that does not
> solve the problem that writable memoryviews based on a readonly object
> might be hanging around.
How about doing
 PyObject_GetBuffer(b, &buf, PyBUF_WRITABLE);
 view = PyMemoryView_FromBuffer(&buf);
 // readinto view
 PyBuffer_Release(&buf);
Would attempts to access a "leaked" reference to view now result in ValueError("operation forbidden on released memoryview object")? If so then I think this would be safe.
msg170642 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012年09月18日 12:49
The current non-test uses of PyMemoryView_FromBuffer() are in _io.BufferedReader.read(), _io.BufferedWriter.write(), PyUnicode_Decode().
It looks like they can each be made to leak a memoryview that references a deallocated buffer. (Maybe the answer is Don't Do That.)
import codecs, sys
def decode(buf):
 global view
 view = buf
 return codecs.latin_1_decode(buf)
def getregentry():
 return codecs.CodecInfo(name='foobar', decode=decode,
 encode=codecs.latin_1_encode)
@codecs.register
def search_function(encoding):
 if encoding == 'foobar':
 return codecs.CodecInfo(*getregentry())
b = b'hello'.upper()
b.decode('foobar')
print(view.tobytes()) # => b'HELLO'
del b
x = b'dump'.upper()
print(view.tobytes()) # => b'DUMP\x00'
import io, sys
class File(io.RawIOBase):
 def readinto(self, buf):
 global view
 view = buf
 n = len(buf)
 buf[:] = b'x'*n
 return n
 def readable(self):
 return True
f = io.BufferedReader(File())
f.read(1)
print(view[:5].tobytes()) # => b'xxxxx'
del f
print(view[:5].tobytes()) # => b'\xdd\xdd\xdd\xdd\xdd'
msg170658 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012年09月18日 18:19
I am rather confused about the ownership semantics when one uses PyMemoryView_FromBuffer().
It looks as though PyMemoryView_FromBuffer() "steals" ownership of the buffer since, when the associated _PyManagedBufferObject is garbage collected, PyBuffer_Release() is called on its copy of the buffer info. However, the _PyManagedBufferObject does not own a reference of the base object, so one still needs to decref the base object (at some time when it is safe to do so).
So am I right in thinking that
 PyObject_GetBuffer(obj, &buf, ...);
 view = PyMemoryView_FromBuffer(&buf); // view->master owns the buffer, but view->master->obj == NULL
 ...
 Py_DECREF(view); // releases buffer (assuming no other exports)
 Py_XDECREF(buf.obj);
has balanced refcounting and is more or less equivalent to
 view = PyMemoryView_FromObject(obj);
 ...
 Py_DECREF(view);
The documentation is not very helpful. It just says that calls to PyObject_GetBuffer() must be matched with calls to PyBuffer_Release().
msg170659 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012年09月18日 18:20
Richard Oudkerk <report@bugs.python.org> wrote:
> PyObject_GetBuffer(b, &buf, PyBUF_WRITABLE);
> view = PyMemoryView_FromBuffer(&buf);
> // readinto view
> PyBuffer_Release(&buf);
> 
> Would attempts to access a "leaked" reference to view now result in ValueError("operation forbidden on released memoryview object")? If so then I think this would be safe.
You would need to call memory_release(). Perhaps we can just expose it on the
C-API level as PyMemoryView_Release().
IMO the use of PyObject_GetBuffer() should be discouraged. The semantics
aren't clear (see #15821). I'd suggest using:
 1) A buffer provider is involved (the case here):
 PyMemoryView_From Object()
 2) A piece of memory needs to be wrapped temporarily and no references
 to the memoryview are "leaked" on the Python level:
 PyMemoryView_FromMemory()
 3) A piece of memory needs to be packaged as a memoryview with automatic
 cleanup in mbuf_dealloc():
 PyMemoryView_FromBufferWithCleanup() (proposed in msg169613)
So I think the combination of PyMemoryView_FromObject() with a call to
PyMemoryView_Release() should indeed work here.
msg170660 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012年09月18日 18:22
Richard Oudkerk <report@bugs.python.org> wrote:
> The documentation is not very helpful. It just says that calls
> to PyObject_GetBuffer() must be matched with calls to PyBuffer_Release().
Yes, we need to sort that out, see #15821.
msg170675 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012年09月18日 20:47
> You would need to call memory_release(). Perhaps we can just expose it on the
> C-API level as PyMemoryView_Release().
Should PyMemoryView_Release() release the _PyManagedBufferObject by doing mbuf_release(view->mbuf) even if view->mbuf->exports > 0?
Doing
 Py_TYPE(view->mbuf)->tp_clear((PyObject *)view->mbuf);
seems to have the desired effect of causing ValueError when I try to access any associated memoryview.
> 3) A piece of memory needs to be packaged as a memoryview with automatic
> cleanup in mbuf_dealloc():
>
> PyMemoryView_FromBufferWithCleanup() (proposed in msg169613)
Maybe this should also handle decrefing the base object (given a flag PyManagedBuffer_FreeObj). I do worry about creating memoryviews that survive deallocation of the base object.
msg170676 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年09月18日 20:56
> So I think the combination of PyMemoryView_FromObject() with a call to
> PyMemoryView_Release() should indeed work here.
I don't think we want to expose a mutable bytes object to outside code, so IMO PyMemoryView_FromMemory() is preferrable.
msg170678 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012年09月18日 21:19
Richard Oudkerk <report@bugs.python.org> wrote:
> Should PyMemoryView_Release() release the _PyManagedBufferObject by doing mbuf_release(view->mbuf) even if view->mbuf->exports > 0?
No, I think it should really just be a wrapper:
diff --git a/Objects/memoryobject.c b/Objects/memoryobject.c
--- a/Objects/memoryobject.c
+++ b/Objects/memoryobject.c
@@ -1093,6 +1093,12 @@
 return memory_release((PyMemoryViewObject *)self, NULL);
 }
+PyObject *
+PyMemoryView_Release(PyObject *m)
+{
+ return memory_release((PyMemoryViewObject *)m, NULL);
+}
+
We decided in #10181 not to allow releasing a view with exports, since the
logic is already quite complex. Is there a reasonable expectation that
existing code creates memoryviews of the readinto() argument?
msg170679 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012年09月18日 21:41
Antoine Pitrou <report@bugs.python.org> wrote:
> I don't think we want to expose a mutable bytes object to outside code,
> so IMO PyMemoryView_FromMemory() is preferrable.
I agree, but PyMemoryView_FromMemory(PyBytes_AS_STRING(b), n, PyBUF_WRITE)
just hides the fact that a mutable bytes object is exposed.
Are we talking about a big speedup here or could we perhaps just keep
the existing code?
msg170680 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年09月18日 21:51
> Antoine Pitrou <report@bugs.python.org> wrote:
> > I don't think we want to expose a mutable bytes object to outside code,
> > so IMO PyMemoryView_FromMemory() is preferrable.
> 
> I agree, but PyMemoryView_FromMemory(PyBytes_AS_STRING(b), n, PyBUF_WRITE)
> just hides the fact that a mutable bytes object is exposed.
Except that the mutable bytes object is not exposed to any outside code,
so that weird behaviour can't be observed.
msg170681 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012年09月18日 22:02
> Are we talking about a big speedup here or could we perhaps just keep
> the existing code?
I doubt it is worth the hassle. But I did want to know if there was a clean way to do what I wanted.
msg190457 - (view) Author: Dwight Guth (dwight.guth) Date: 2013年06月01日 23:23
I was programming something today and thought I should let you know I came across a situation where the current behavior of this function is able to expose what seems to be raw memory to the user.
import io
class A(io.RawIOBase):
 def readinto(self, b):
 return len(b)
A().read(100)
msg223225 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014年07月16日 15:48
Please note this is also referred to from #15994.
History
Date User Action Args
2022年04月11日 14:57:35adminsetgithub: 60107
2019年04月08日 13:06:30BreamoreBoysetnosy: - BreamoreBoy
2019年04月08日 13:02:20methanesetnosy: + methane
2014年10月14日 18:01:58skrahsetnosy: - skrah
2014年07月16日 15:48:19BreamoreBoysetnosy: + BreamoreBoy

messages: + msg223225
versions: + Python 3.5
2013年06月01日 23:23:23dwight.guthsetnosy: + dwight.guth
messages: + msg190457
2012年09月18日 22:02:18sbtsetmessages: + msg170681
2012年09月18日 21:51:38pitrousetmessages: + msg170680
2012年09月18日 21:41:23skrahsetmessages: + msg170679
2012年09月18日 21:19:27skrahsetmessages: + msg170678
2012年09月18日 20:56:43pitrousetmessages: + msg170676
2012年09月18日 20:50:10vstinnersetnosy: + vstinner
2012年09月18日 20:47:55sbtsetmessages: + msg170675
2012年09月18日 18:22:55skrahsetmessages: + msg170660
2012年09月18日 18:20:41skrahsetmessages: + msg170659
2012年09月18日 18:19:43sbtsetmessages: + msg170658
2012年09月18日 12:49:29sbtsetmessages: + msg170642
2012年09月18日 10:47:49sbtsetmessages: + msg170639
2012年09月18日 08:33:54skrahsetmessages: + msg170638
2012年09月17日 23:14:26pitrousetnosy: + skrah
messages: + msg170629
2012年09月17日 23:04:56sbtsetmessages: + msg170628
2012年09月17日 22:30:01pitrousetnosy: + pitrou
messages: + msg170627
2012年09月17日 20:59:07serhiy.storchakasettype: performance
components: + Library (Lib), IO
2012年09月17日 20:17:09sbtsetfiles: + iobase_read.patch

messages: + msg170624
2012年09月10日 20:01:10amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg170226
2012年09月10日 18:01:46jceasetnosy: + jcea
stage: patch review

versions: + Python 3.4
2012年09月10日 12:23:03sbtcreate

AltStyle によって変換されたページ (->オリジナル) /