homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: tempfile.NamedTemporaryFile can close too early if used as iterator
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, bkabrda, ethan.furman, georg.brandl, ncoghlan, paul.moore, python-dev, r.david.murray, sYnfo, serhiy.storchaka, vstinner, wolma
Priority: normal Keywords: patch

Created on 2015年03月18日 15:17 by bkabrda, last changed 2022年04月11日 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
python-3.4-tempfile-iter.patch bkabrda, 2015年03月18日 15:17 review
python-3.4-tempfile-iter-v2.patch bkabrda, 2015年03月19日 12:50 review
tempfile_iter_fix.patch serhiy.storchaka, 2015年03月20日 06:18 Works, but I don't know why review
csv_iter_commemt.patch r.david.murray, 2015年03月22日 16:22 review
Messages (35)
msg238451 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2015年03月18日 15:17
This bug is very similar to #18879, the only difference is that _TemporaryFileWrapper.__iter__ is the problem (in #18879, __getattr__ was fixed, but __iter__ was not). The real world use case that helped me find this bug is at the bottom of this report, this is a simple reproducer:
>>> import tempfile
>>> for l in tempfile.NamedTemporaryFile(mode='a+b'):
... print(l)
... 
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
ValueError: readline of closed file
I'm attaching a patch that fixes this (+ testcase).
Note: I actually discovered this while using
>>> from urllib.request import urlopen
>>> for l in urlopen('ftp://<some_ftp>'):
... print(l)
... 
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
ValueError: readline of closed file
Opening FTP uses urllib.response, which in turn uses tempfile._TemporaryFileWrapper, which makes this example fail.
msg238505 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2015年03月19日 12:50
I'm attaching second version of the patch. It now contains link to this bug and a more real test case according to suggestion from the review.
One of the reviews of first patch version mentioned that there should be a better explanation of how this issue can be provoked. I think that the test shows it nice and there's no need to explain it in greater detail. Also, the comment references this bug, where anyone can find the explanation. (Fix for #18879 fixes pretty much the same in a different method and has very similar comment, so I think this should be fine.)
msg238506 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月19日 13:07
LGTM.
msg238510 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2015年03月19日 13:22
Agreed, the test is sufficient documentation. However, I can't make the test fail here (Windows 7, Python 3.4.3):
>py ti.py
b'spam\n' b'spam\n'
b'eggs\n' b'eggs\n'
b'beans\n' b'beans\n'
>cat ti.py
import tempfile
def test_iter():
 # getting iterator from a temporary file should keep it alive
 # as long as it's being iterated over
 lines = [b'spam\n', b'eggs\n', b'beans\n']
 def make_file():
 f = tempfile.NamedTemporaryFile(mode='w+b')
 f.write(b''.join(lines))
 f.seek(0)
 return iter(f)
 for i, l in enumerate(make_file()):
 print(l, lines[i])
test_iter()
Is it somehow OS-specific?
Regardless, the patch seems fine and I have no problem with it being applied.
msg238511 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015年03月19日 13:25
New changeset 7fa741fe9425 by Serhiy Storchaka in branch '3.4':
Issue #23700: Iterator of NamedTemporaryFile now keeps a reference to
https://hg.python.org/cpython/rev/7fa741fe9425
New changeset c84a0b35999a by Serhiy Storchaka in branch 'default':
Issue #23700: Iterator of NamedTemporaryFile now keeps a reference to
https://hg.python.org/cpython/rev/c84a0b35999a 
msg238512 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月19日 13:27
Thank you for your contribution Bohuslav.
msg238513 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2015年03月19日 13:31
Thank you!
To answer Paul's question: I honestly have no idea why this can't be reproduced on Windows. I managed to reproduce this in 100 % cases on various RPM-flavour Linux distros (Fedora, CentOS, RHEL) as well as on Debian and Ubuntu.
msg238518 - (view) Author: Paul Moore (paul.moore) * (Python committer) Date: 2015年03月19日 13:37
Cool, no problem.
msg238575 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015年03月19日 22:47
test_csv now fails on Windows:
http://buildbot.python.org/all/builders/x86 Windows7 3.x/builds/9421/
======================================================================
ERROR: test_read_dict_fieldnames_from_file (test.test_csv.TestDictFields)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "D:\cygwin\home\db3l\buildarea3円.x.bolen-windows7\build\lib\test\test_csv.py", line 629, in test_read_dict_fieldnames_from_file
 self.assertEqual(next(reader), {"f1": '1', "f2": '2', "f3": 'abc'})
 File "D:\cygwin\home\db3l\buildarea3円.x.bolen-windows7\build\lib\csv.py", line 110, in __next__
 row = next(self.reader)
 File "D:\cygwin\home\db3l\buildarea3円.x.bolen-windows7\build\lib\tempfile.py", line 431, in __iter__
 yield from iter(self.file)
ValueError: I/O operation on closed file.
msg238606 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月20日 06:18
Following patch fixes the issue, but I don't understand why.
msg238611 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015年03月20日 07:24
Maybe we need to keep explicitly a reference to self.file in the method
(file = self.file) to keep it alive in the frame?
msg238613 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月20日 07:53
No, it doesn't help.
msg238614 - (view) Author: Wolfgang Maier (wolma) * Date: 2015年03月20日 07:53
I think this is a consequence of PEP380 and its decision to finalize the subgenerator when the delegating generator is closed.
Consider this simple example without tempfile:
def yielder (fileobj):
 yield from fileobj
with open('some_test_file', 'w') as f:
 f.write('line one\nline two\nline three')
with open('some_test_file', 'r') as f:
 line = next(yielder(f))
 nline = next(f)
==>
Traceback (most recent call last):
 File "<pyshell#11>", line 3, in <module>
 nline = next(f)
ValueError: I/O operation on closed file.
I think test_csv does the file-closing operation on lines 626/627 when it creates the temporary csv.reader(fileobj).
 def test_read_dict_fieldnames_from_file(self):
 with TemporaryFile("w+") as fileobj:
 fileobj.write("f1,f2,f3\r\n1,2,abc\r\n")
 fileobj.seek(0)
 reader = csv.DictReader(fileobj,
 fieldnames=next(csv.reader(fileobj)))
 self.assertEqual(reader.fieldnames, ["f1", "f2", "f3"])
 self.assertEqual(next(reader), {"f1": '1', "f2": '2', "f3": 'abc'})
msg238615 - (view) Author: Wolfgang Maier (wolma) * Date: 2015年03月20日 08:02
Actually, its scary that use of yield from can have such a subtle side-effect. Maybe PEP380 should have taken this more seriously?
msg238617 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015年03月20日 08:40
Ah yes, correct: when a generator using "yield from obj" is destroyed while yield from is not done, obj.close() is called if the method exists.
So "yield from file" *is* different than "for line in file: yield file" when we don't consume the whole generator.
A workaround is to create a wrapper class and returns it in the _TemporaryFileWrapper.__iter__() method:
---
class Iterator:
 def __init__(self, obj):
 self.obj = obj
 def __next__(self):
 if self.obj is None:
 raise StopIteration
 return next(self.obj)
 def __iter__(self):
 return self
 def close(self):
 self.obj = None
---
Or simply:
---
class Iterator:
 def __init__(self, obj):
 self.obj = obj
 def __next__(self):
 return next(self.obj)
 def __iter__(self):
 return self
---
This solution looks more complex than tempfile_iter_fix.patch.
@Serhiy: Maybe add a short comment to explain why yield from is not used and must be used. (By the way, the current comment contains "yields from" which is confusing :-))
msg238619 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月20日 08:56
> Ah yes, correct: when a generator using "yield from obj" is destroyed while
> yield from is not done, obj.close() is called if the method exists.
But why obj.close() is called? The reference to fileobj is live, it shouldn't 
be closed.
> This solution looks more complex than tempfile_iter_fix.patch.
Why you prefer more complex solution to simple solution?
msg238621 - (view) Author: Wolfgang Maier (wolma) * Date: 2015年03月20日 09:04
@Serhiy
in this line of code:
reader = csv.DictReader(fileobj, fieldnames=next(csv.reader(fileobj)))
csv.reader(fileobj) returns the generator created by fileobj.__iter__, but no reference to it is kept so the object gets destroyed right afterwards. This closes the generator and because it uses yield from also the contained subgenerator, which is the file itself.
msg238622 - (view) Author: Bohuslav "Slavek" Kabrda (bkabrda) * Date: 2015年03月20日 09:06
@wolma any idea why this only happens on Windows? I can't reproduce the CSV failing test on Linux.
msg238623 - (view) Author: Wolfgang Maier (wolma) * Date: 2015年03月20日 09:08
@bkabrda
not sure, but it may have to do with when exactly the object gets garbage collected
msg238664 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015年03月20日 13:15
tempfile_iter_fix.patch looks good to me, can you commit it please?
msg238677 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月20日 14:13
> csv.reader(fileobj) returns the generator created by fileobj.__iter__, but no reference to it is kept so the object gets destroyed right afterwards. This closes the generator and because it uses yield from also the contained subgenerator, which is the file itself.
Yes, there are no references to to the generator, created by fileobj.__iter__, but there are references to fileobj itself and to the file fileobj.file. I still don't understand why the file is closed. This looks as a bug.
Committed existing fix only to make buildbots green.
msg238678 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015年03月20日 14:13
New changeset a90ec6b96af2 by Serhiy Storchaka in branch '3.4':
Issue #23700: NamedTemporaryFile iterator closed underlied file object in
https://hg.python.org/cpython/rev/a90ec6b96af2
New changeset e639750ecd92 by Serhiy Storchaka in branch 'default':
Issue #23700: NamedTemporaryFile iterator closed underlied file object in
https://hg.python.org/cpython/rev/e639750ecd92 
msg238683 - (view) Author: Wolfgang Maier (wolma) * Date: 2015年03月20日 14:38
so let's look at this step-by-step (and I hope I fully understood this myself):
- calling fileobj.__iter__ creates a generator because the method uses yield from
- that generator does not get assigned to any reference so it will be garbage-collected
- when the generator is garbage-collected, the subgenerator specified to the rigth of the yield from is finalized (that is PEP380-mandated behavior) and, in this case, that is iter(self.file)
- for an io module-based fileobject, iter(f) is f and finalizing it means that its close method will be called
So this is not about the file object getting garbage-collected, it is about it getting closed.
Since PEP380 explicitly mentions that problem with yield from and a shared subiterator, I don't think you can call it a bug, but I think it is very problematic behavior as illustrated by this issue because client code is supposed to know whether a particular generator uses yield from or not.
msg238691 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月20日 15:22
Thank you for your explanation Wolfgang! Now it is clear to me. The issue is that the generator calls the close() method of the subgenerator, but if the subgenerator is a file, the close() method closes (surprise!) the file. Two different protocols use the same method.
Interesting, how many similar bugs was introduced by blindly replacing "for/yield" with "yield from"?
msg238697 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015年03月20日 15:51
Isn't there some discussion somewhere that if iter(x) returns x you probably have buggy code? Maybe it is io that is broken, design-wise. I think there was another issue related to iter(file) recently...someone surprised by the fact that you can't iterate a file twice without reopening it...the source of that problem is similar. Not that it necessarily soluble, but it is certainly *interesting* :)
msg238698 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015年03月20日 16:02
> Isn't there some discussion somewhere that if iter(x) returns x you probably have buggy code?
I agree that the issue comes from TextIOWrapper.__iter__(),
BufferedReader.__iter__() and FileIO.__iter__() returns simply return
self *and* have a close method. The issue is not "yield from".
msg238699 - (view) Author: Wolfgang Maier (wolma) * Date: 2015年03月20日 16:13
You are probably right that the io classes are broken.
From https://docs.python.org/3/library/stdtypes.html#iterator-types:
Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.
One consequence of __iter__ returning self is that the above is not guaranteed:
>>> with open('somefile', 'w') as f:
	f.write('some text')
	
9
>>> with open('somefile', 'r') as f:
	i = iter(f)
	assert f is i
	for line in i:
		print(line)
	try:
		next(i)
	except StopIteration:
		print('exhausted iterator')
	f.seek(0)
	print(next(i))
	
some text
exhausted iterator
0
some text
So the io classes are *officially* broken.
msg238700 - (view) Author: Wolfgang Maier (wolma) * Date: 2015年03月20日 16:24
OTOH, it would be very hard to change the way fileobjects compared to designing yield from differently so I'd still blame it partly.
Maybe it is unfortunate that generators have a close method instead of, say, __close__ ?
msg238757 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015年03月21日 04:12
There's actually an issue about exactly that broken-per-docs issue for io. IMO if the goal of calling close is to close only things that are generator objects or pretending to be one, the method should have been named close_generator or something.
msg238895 - (view) Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) Date: 2015年03月22日 13:08
Comment in Lib/tempfile.py mentions issue #23000, but should mention issue #23700.
msg238911 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月22日 14:55
Indeed. And all the comment could be better.
Does anyone want to write better comment in the light of recent estimations?
msg238917 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015年03月22日 16:22
How's this?
msg238918 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年03月22日 16:27
LGTM
msg238919 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015年03月22日 16:35
New changeset e9f03315d66c by R David Murray in branch '3.4':
#23700: fix/improve comment
https://hg.python.org/cpython/rev/e9f03315d66c
New changeset 64f4dbac9d07 by R David Murray in branch 'default':
Merge: #23700: fix/improve comment
https://hg.python.org/cpython/rev/64f4dbac9d07 
msg238929 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015年03月22日 19:28
Yeah, the new comment is better :-) Thanks.
History
Date User Action Args
2022年04月11日 14:58:14adminsetgithub: 67888
2015年03月22日 19:28:57vstinnersetmessages: + msg238929
2015年03月22日 16:35:07python-devsetmessages: + msg238919
2015年03月22日 16:27:02serhiy.storchakasetmessages: + msg238918
2015年03月22日 16:22:39r.david.murraysetfiles: + csv_iter_commemt.patch

messages: + msg238917
2015年03月22日 14:55:09serhiy.storchakasetmessages: + msg238911
2015年03月22日 13:08:45Arfreversetnosy: + Arfrever
messages: + msg238895
2015年03月21日 04:12:59r.david.murraysetmessages: + msg238757
2015年03月20日 16:24:00wolmasetmessages: + msg238700
2015年03月20日 16:13:38wolmasetmessages: + msg238699
2015年03月20日 16:02:55vstinnersetmessages: + msg238698
2015年03月20日 15:51:42r.david.murraysetnosy: + r.david.murray
messages: + msg238697
2015年03月20日 15:22:24serhiy.storchakasetstatus: open -> closed
resolution: fixed
2015年03月20日 15:22:08serhiy.storchakasetmessages: + msg238691
2015年03月20日 14:38:16wolmasetmessages: + msg238683
2015年03月20日 14:13:44python-devsetmessages: + msg238678
2015年03月20日 14:13:16serhiy.storchakasetmessages: + msg238677
2015年03月20日 13:15:30vstinnersetmessages: + msg238664
2015年03月20日 09:08:49wolmasetmessages: + msg238623
2015年03月20日 09:06:22bkabrdasetmessages: + msg238622
2015年03月20日 09:04:13wolmasetmessages: + msg238621
2015年03月20日 08:56:27serhiy.storchakasetmessages: + msg238619
2015年03月20日 08:40:33vstinnersetmessages: + msg238617
2015年03月20日 08:02:17wolmasetmessages: + msg238615
2015年03月20日 07:53:52wolmasetmessages: + msg238614
2015年03月20日 07:53:36serhiy.storchakasetmessages: + msg238613
2015年03月20日 07:24:05vstinnersetmessages: + msg238611
2015年03月20日 06:18:30serhiy.storchakasetfiles: + tempfile_iter_fix.patch

messages: + msg238606
2015年03月19日 22:47:19vstinnersetstatus: closed -> open

nosy: + vstinner
messages: + msg238575

resolution: fixed -> (no value)
2015年03月19日 13:37:38paul.mooresetmessages: + msg238518
2015年03月19日 13:31:34bkabrdasetmessages: + msg238513
2015年03月19日 13:27:42serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg238512

stage: commit review -> resolved
2015年03月19日 13:25:28python-devsetnosy: + python-dev
messages: + msg238511
2015年03月19日 13:22:10paul.mooresetnosy: + paul.moore
messages: + msg238510
2015年03月19日 13:07:18serhiy.storchakasetassignee: serhiy.storchaka
messages: + msg238506
stage: patch review -> commit review
2015年03月19日 12:50:36bkabrdasetfiles: + python-3.4-tempfile-iter-v2.patch

messages: + msg238505
2015年03月19日 07:54:03wolmasetnosy: + wolma
2015年03月18日 15:48:51serhiy.storchakasetnosy: + serhiy.storchaka
stage: patch review

versions: - Python 3.6
2015年03月18日 15:24:18ethan.furmansetnosy: + georg.brandl, ncoghlan, ethan.furman
2015年03月18日 15:23:28sYnfosetnosy: + sYnfo
2015年03月18日 15:17:06bkabrdacreate

AltStyle によって変換されたページ (->オリジナル) /