
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: [doc] File protocol should document if writelines must handle generators sensibly
Type:
Stage: resolved
Components: Documentation, IO
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: JanKanis, JelleZijlstra, benjamin.peterson, dhaffey, dlesco, docs@python, hynek, josh.r, lemburg, miss-islington, pitrou, slateny, stutzbach, terry.reedy
Priority: normal
Keywords: easy, patch

Created on 2014-07-03 09:38 by JanKanis, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 31245 merged slateny, 2022-02-10 07:00
PR 31647 merged miss-islington, 2022-03-03 01:21
PR 31648 merged miss-islington, 2022-03-03 01:21
Messages (8)
msg222165 - Author: Jan Kanis (JanKanis) Date: 2014-07-03 09:38
The resolution of issue 5445 should be documented somewhere properly, so people can depend on it or not.
IOBase.writelines handles generator arguments without problems, i.e. without first draining the entire generator and then writing the result in one go. That would require large amounts of memory if the generator is large, and fail entirely if the generator is infinite. 
codecs.StreamWriter.writelines uses self.write(''.join(argument)) as implementation, which fails on very large or infinite arguments.
According to issue 5445, it is not part of the file protocol that .writelines must handle (large/infinite) generators, only list-like iterables. However, as far as I know this is not documented anywhere, and sometimes people assume that writelines is meant for this case, e.g. jinja (https://github.com/mitsuhiko/jinja2/blob/master/jinja2/environment.py#L1153; the dump method is explicitly documented to stream). The guarantees that .writelines makes or does not make in this regard should be documented somewhere, so that either .writelines implementations that don't handle large generators can be pointed out as bugs, or code that assumes .writelines handles large generators can be.
I personally think .writelines should handle large generators, since in the Python 3 world a lot of APIs were iterator-ified and it is what a lot of people would probably expect. But having a clear and documented decision on this is more important.
(note: I've copied most of the nosy list from #5445)
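As a rough illustration of the two behaviours described in this message, the sketch below contrasts a writelines that writes items as they arrive with one that joins everything first. The helper names (`writelines_streaming`, `writelines_joined`, `numbered_lines`, `CountingSink`) are hypothetical and exist only for this example; they are not the actual io or codecs implementations.

```python
import io


def writelines_streaming(stream, lines):
    """Write each item as soon as the iterable yields it (bounded memory)."""
    for line in lines:
        stream.write(line)


def writelines_joined(stream, lines):
    """Join the whole iterable first, as the ''.join approach does.

    The iterable is exhausted before anything is written, so a very large
    generator costs a lot of memory and an infinite one never finishes.
    """
    stream.write("".join(lines))


def numbered_lines(limit):
    """Hypothetical generator standing in for a large data source."""
    for i in range(limit):
        yield f"line {i}\n"


class CountingSink(io.StringIO):
    """In-memory text stream that counts how many write() calls it receives."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def write(self, s):
        self.calls += 1
        return super().write(s)


streamed = CountingSink()
writelines_streaming(streamed, numbered_lines(3))

joined = CountingSink()
writelines_joined(joined, numbered_lines(3))
```

Both sinks end up with the same text; the difference is that the streaming version issues one write per item, while the joined version issues a single write only after the generator has been fully consumed.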
msg222252 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-07-04 00:48
+1. I've been assuming writelines handled arbitrary generators without an issue; guess I've gotten lucky and only used the ones that do. I've fed stuff populated by enormous (though not infinite) generators created from stuff like itertools.product and the like into it on the assumption that it would safely write it without generating len(seq) ** repeat values in memory.
I'd definitely appreciate a documented guarantee of this. I don't need it to explicitly guarantee that each item is written before the next item is pulled off the iterator or anything; if it wants to buffer a reasonable amount of data in memory before triggering a real I/O that's fine (generators returning mutable objects and mutating them when the next object comes along are evil anyway, and forcing one-by-one output can prevent some useful optimizations). But anything that uses argument unpacking, collection as a list, ''.join (or at the C level, PySequence_Fast and the like), forcing the whole generator to exhaust before writing byte one, is a bad idea.
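The middle ground described in this message (buffer a reasonable amount, but never exhaust the whole iterable first) might look something like the sketch below. The function name `writelines_chunked` and the `max_chars` threshold are assumptions made for illustration, not part of any standard library API.

```python
import io


def writelines_chunked(stream, lines, max_chars=8192):
    """Hypothetical helper: join items in bounded batches before writing.

    At most roughly max_chars characters (plus one item) are held in memory
    at a time, so large or infinite iterables are still consumed lazily.
    """
    batch = []
    size = 0
    for line in lines:
        batch.append(line)
        size += len(line)
        if size >= max_chars:
            stream.write("".join(batch))
            batch.clear()
            size = 0
    if batch:
        # Flush whatever is left once the iterable is exhausted.
        stream.write("".join(batch))


out = io.StringIO()
writelines_chunked(out, (f"{i}\n" for i in range(10)), max_chars=8)
```

This keeps the number of underlying write calls low without requiring the full output to exist in memory at once.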
msg226120 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-08-30 04:17
Security-fix-only versions do not get doc fixes.
msg261399 - Author: Dan Haffey (dhaffey) Date: 2016-03-09 03:00
+1, I just lost an hour-plus compute job to this. It sure violates POLA. I've been passing large generators to file.writelines since about as long as generators have existed, so I never would have guessed that a class named "StreamWriter" of all things wouldn't, you know, stream its writelines argument.
msg409248 - Author: Stanley (slateny) * Date: 2021-12-28 04:47
I'd be interested in taking a look at this - would these changes clarify things?
Current (https://docs.python.org/3/library/codecs.html#codecs.StreamWriter):
Writes the concatenated list of strings to the stream (possibly by reusing the write() method). The standard bytes-to-bytes codecs do not support this method.
Proposed:
Writes the concatenated list of strings to the stream by reusing the write() method, and thus does not support infinite or very large generators. The standard bytes-to-bytes codecs do not support this method.
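Given that wording, one possible workaround for callers who need streaming is to skip writelines and call write() once per item. The sketch below assumes an in-memory `io.BytesIO` target and the UTF-8 codec purely for demonstration; `endless` is a hypothetical generator.

```python
import codecs
import io
import itertools

raw = io.BytesIO()
# codecs.getwriter returns the StreamWriter class for the named encoding.
writer = codecs.getwriter("utf-8")(raw)


def endless():
    """Hypothetical infinite generator of text lines."""
    for n in itertools.count():
        yield f"row {n}\n"


# Writing item by item never needs the whole sequence in memory.
# islice caps the demo at three rows; a real consumer would stop on its own terms.
for line in itertools.islice(endless(), 3):
    writer.write(line)

result = raw.getvalue()
```

Passing `endless()` directly to a join-based writelines would never return, which is the failure mode the proposed documentation change warns about.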
msg414395 - Author: Jelle Zijlstra (JelleZijlstra) * (Python committer) Date: 2022-03-03 01:21
New changeset a8c87a239ee1414d6dd0b062fe9ec3e5b0c50cb8 by slateny in branch 'main':
bpo-21910: Clarify docs for codecs writelines method (GH-31245)
https://github.com/python/cpython/commit/a8c87a239ee1414d6dd0b062fe9ec3e5b0c50cb8
msg414396 - Author: miss-islington (miss-islington) Date: 2022-03-03 01:43
New changeset 60b561c246da2073672a016340457e4534dfdf5b by Miss Islington (bot) in branch '3.10':
bpo-21910: Clarify docs for codecs writelines method (GH-31245)
https://github.com/python/cpython/commit/60b561c246da2073672a016340457e4534dfdf5b
msg414397 - Author: miss-islington (miss-islington) Date: 2022-03-03 01:45
New changeset cf8aff6319794807aa578215710e6caa4479516f by Miss Islington (bot) in branch '3.9':
bpo-21910: Clarify docs for codecs writelines method (GH-31245)
https://github.com/python/cpython/commit/cf8aff6319794807aa578215710e6caa4479516f
History
Date User Action Args
2022-04-11 14:58:05 admin set github: 66109
2022-03-03 01:45:50 miss-islington set messages: + msg414397
2022-03-03 01:43:04 miss-islington set messages: + msg414396
2022-03-03 01:22:45 JelleZijlstra set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2022-03-03 01:21:56 miss-islington set pull_requests: + pull_request29768
2022-03-03 01:21:52 miss-islington set nosy: + miss-islington
    pull_requests: + pull_request29767
2022-03-03 01:21:44 JelleZijlstra set nosy: + JelleZijlstra
    messages: + msg414395
2022-02-10 07:00:34 slateny set keywords: + patch
    stage: needs patch -> patch review
    pull_requests: + pull_request29414
2021-12-28 04:47:00 slateny set nosy: + slateny
    messages: + msg409248
2021-12-13 18:23:05 iritkatriel set keywords: + easy
    title: File protocol should document if writelines must handle generators sensibly -> [doc] File protocol should document if writelines must handle generators sensibly
    versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.4, Python 3.5
2016-03-12 00:51:09 martin.panter set stage: needs patch
2016-03-09 03:00:24 dhaffey set nosy: + dhaffey
    messages: + msg261399
2014-08-30 04:17:47 terry.reedy set messages: + msg226120
    versions: - Python 3.1, Python 3.2, Python 3.3
2014-07-04 00:48:40 josh.r set nosy: + josh.r
    messages: + msg222252
2014-07-03 09:38:29 JanKanis create
