Issue 28973: [doc] The fact that multiprocess.Queue uses serialization should be documented.

This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/73159

classification

Title:	[doc] The fact that multiprocess.Queue uses serialization should be documented.
Type:	enhancement	Stage:	needs patch
Components:	Documentation	Versions:	Python 3.11, Python 3.10, Python 3.9

process

Dependencies:	Superseder:
Status:	open	Resolution:
Assigned To:	docs@python	Nosy List:	Bernhard10, davin, docs@python, iritkatriel, r.david.murray
Priority:	normal	Keywords:	easy

Created on 2016-12-14 14:38 by Bernhard10, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description
mwe.py	Bernhard10, 2016-12-14 14:38	Minimal working example to reproduce this bug/surprising behaviour.

Messages (8)
msg283192	Author: Bernhard10 (Bernhard10)	Date: 2016-12-14 14:38
When I did some tests involving unittest.mock.sentinel and multiprocessing.Queue, I noticed that multiprocessing.Queue changes the id of the sentinel. This behaviour is definitely surprising and not documented.
msg283193	Author: Bernhard10 (Bernhard10)	Date: 2016-12-14 15:05
See http://stackoverflow.com/a/925241/5069869 Apparently multiprocessing.Queue uses pickle to serialize the objects in the queue, which explains the change of identity, but is absolutely unclear from the documentation.
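The effect of serialization on identity can be sketched without a queue at all. The snippet below is only an illustration, not the attached mwe.py: `pickle_roundtrip` is a hypothetical helper name, and a plain list stands in for the object from the report. It pickles an object to bytes and rebuilds it, which is roughly what happens to data sent through a queue.

```python
import pickle


def pickle_roundtrip(obj):
    """Serialize obj to bytes, then build a new object from those bytes."""
    return pickle.loads(pickle.dumps(obj))


original = ["spam", {"eggs": 1}]
rebuilt = pickle_roundtrip(original)

print(rebuilt == original)  # True: the contents survive the round trip
print(rebuilt is original)  # False: the rebuilt list is a different object
```

The rebuilt list compares equal to the original but is a separate object, so `id()` differs between the two.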
msg283195	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2016-12-14 15:12
The fact that this is so is implicit in the name multiprocessing and the documented restrictions of the id function. That is, the purpose of the module is to manage computation across multiple processes. Since different processes have distinct memory spaces, you cannot depend on object identity between processes, by the definition of object identity (it is constant only for the lifetime of the object in memory, and the different processes have different memory spaces, therefore the object id may be different in the different processes). By construction this applies also to any multiprocessing mechanism that is used to transmit objects, even if the transmission turns out to be to the same process in a particular case. You can't depend on the id in that case, because the transmission mechanism must be free to change the object identity in order to work in the general case. Should we document this explicitly? Perhaps so. Maybe in the multiprocessing introduction?
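The same-process case mentioned here can be sketched as follows. This is a minimal illustration under stated assumptions, not the attached mwe.py: `send_to_self` is a hypothetical helper, a plain list stands in for the sentinel from the original report, and the 5 second timeout is an arbitrary choice.

```python
import multiprocessing


def send_to_self(obj, timeout=5.0):
    """Put obj on a multiprocessing.Queue and read it back in the same process."""
    queue = multiprocessing.Queue()
    try:
        queue.put(obj)
        # get() blocks until the data written by the queue's background
        # thread has arrived on the underlying pipe.
        return queue.get(timeout=timeout)
    finally:
        queue.close()
        queue.join_thread()


if __name__ == "__main__":
    original = [1, 2, 3]
    received = send_to_self(original)
    print(received == original)  # same contents
    print(received is original)  # not the same object
```

Even though sender and receiver are the same process, the list that comes back is a new object, so code should compare by value rather than by identity.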
msg283198	Author: Bernhard10 (Bernhard10)	Date: 2016-12-14 15:18
My first thought was that Queue was implemented using shared memory. I guess from the fact that the "Shared memory" section is separate in the multiprocessing documentation I should have known better, though. So I guess some clarification in the documentation would be helpful.
msg283199	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2016-12-14 15:37
Yeah, that's why I said "in the general case". Making it clear in the overview seems reasonable to me.
msg283206	Author: Davin Potts (davin) * (Python committer)	Date: 2016-12-14 16:33
All communication between processes in multiprocessing has consistently used pickle to serialize the data being communicated (this includes what is described in the "Shared memory" section of the docs). The documentation has not done a great job of making this clear, instead only describing the requirement that data be pickleable in select places. For example, the section on Queues contains this note: "When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe." Though it only applies to 3.6+, issue28053 still needs its own documentation improvement to make clear that the mechanism for communicating data defaults to serialization by pickle but that this can be replaced by alternatives. I agree that the documentation around the use of pickle in multiprocessing deserves improvement.
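One practical consequence of the pickleability requirement is that some objects cannot be sent at all. The sketch below uses a hypothetical helper, `is_picklable`, to check an object with the standard pickle module before sending it. It is only an approximation: multiprocessing uses its own pickler subclass, which may accept some types that plain `pickle.dumps` rejects.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except Exception:
        # Depending on the object, pickle may raise PicklingError,
        # TypeError, or AttributeError.
        return False
    return True


print(is_picklable({"numbers": [1, 2, 3]}))  # True: plain data pickles fine
print(is_picklable(threading.Lock()))        # False: locks cannot be pickled
```

A check like this can surface the problem early, before the object is handed to a queue.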
msg398809	Author: Irit Katriel (iritkatriel) * (Python committer)	Date: 2021-08-02 23:08
There is a note mentioning pickle in this section: https://docs.python.org/3/library/multiprocessing.html#pipes-and-queues It starts with "When an object is put on a queue, the object is pickled and..." A comment about the object ids can be added there.
msg399179	Author: R. David Murray (r.david.murray) * (Python committer)	Date: 2021-08-07 13:17
Mentioning ids would be pretty much redundant with mentioning pickle. If it is pickled its id is going to change. I think Davin was suggesting that while the use of serialization is documented, it is not documented consistently. Everywhere serialization happens it should be mentioned in the docs. Regardless, a proposed doc PR is the way forward here.

History
Date	User	Action	Args
2022-04-11 14:58:40	admin	set	github: 73159
2021-08-07 13:17:33	r.david.murray	set	messages: + msg399179
2021-08-02 23:08:41	iritkatriel	set	nosy: + iritkatriel versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.5, Python 3.6, Python 3.7 messages: + msg398809 keywords: + easy title: The fact that multiprocess.Queue uses serialization should be documented. -> [doc] The fact that multiprocess.Queue uses serialization should be documented.
2016-12-14 16:33:49	davin	set	versions: + Python 3.6, Python 3.7 nosy: + davin messages: + msg283206 type: behavior -> enhancement stage: needs patch
2016-12-14 15:37:09	r.david.murray	set	messages: + msg283199
2016-12-14 15:18:24	Bernhard10	set	messages: + msg283198
2016-12-14 15:12:31	r.david.murray	set	nosy: + r.david.murray messages: + msg283195
2016-12-14 15:05:57	Bernhard10	set	title: multiprocess.Queue changes objects identity -> The fact that multiprocess.Queue uses serialization should be documented. nosy: + docs@python messages: + msg283193 assignee: docs@python components: + Documentation
2016-12-14 14:38:29	Bernhard10	create
