Issue 26158: File truncate() not defaulting to current position as documented


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/70346

classification

Title:	File truncate() not defaulting to current position as documented
Type:	behavior	Stage:	needs patch
Components:	Documentation, IO	Versions:	Python 3.10, Python 3.9, Python 3.8

process

Dependencies:	Superseder:
Status:	open	Resolution:
Assigned To:	docs@python	Nosy List:	Zero, benjamin.peterson, docs@python, eryksun, fornax, martin.panter, pitrou, random832, serhiy.storchaka, socketpair, steve.dower, stutzbach
Priority:	normal	Keywords:

Created on 2016-01-19 15:19 by fornax, last changed 2022-04-11 14:58 by admin.

Messages (15)
msg258600	Author: Fornax (fornax)	Date: 2016-01-19 15:19
io.IOBase.truncate() documentation says:

"Resize the stream to the given size in bytes (or the current position if size is not specified). The current stream position isn’t changed. This resizing can extend or reduce the current file size. In case of extension, the contents of the new file area depend on the platform (on most systems, additional bytes are zero-filled). The new file size is returned."

However:

    >>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
    32
    >>> f = open('temp.txt', 'r+')
    >>> f.readline()
    'ABCDE\n'
    >>> f.tell()
    6    # As expected, current position is 6 after the readline
    >>> f.truncate()
    32   # ?!

Verified that the document does not get truncated to 6 bytes as expected. Adding an explicit f.seek(6) before the truncate causes it to work properly (truncate to 6). It also works as expected using a StringIO rather than a file, or in Python 2 (used 2.7.9).

Tested in 3.4.3/Windows, 3.4.1/Linux, 3.5.1/Linux.
msg258611	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2016-01-19 17:24
This is because the file is buffered.

    >>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
    32
    >>> f = open('temp.txt', 'r+')
    >>> f.readline()
    'ABCDE\n'
    >>> f.tell()
    6
    >>> f.buffer.tell()
    32
    >>> f.buffer.raw.tell()
    32

The documentation needs a clarification.
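The three positions listed in this message can be gathered with a small helper. This is a minimal sketch, assuming CPython's default buffered text file stack (TextIOWrapper over BufferedRandom over FileIO). `layer_positions` and `demo_layer_positions` are hypothetical names, and `newline=''` with an ASCII encoding is used only so that the byte counts are the same on every platform.

```python
import os
import tempfile


def layer_positions(f):
    """Return the position reported by each layer of a text file object."""
    return {
        "text": f.tell(),               # TextIOWrapper: logical text position
        "buffered": f.buffer.tell(),    # buffered layer: after the read-ahead
        "raw": f.buffer.raw.tell(),     # FileIO: OS-level file pointer
    }


def demo_layer_positions():
    path = os.path.join(tempfile.mkdtemp(), "temp.txt")
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write("ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n")
    with open(path, "r+", encoding="ascii", newline="") as f:
        f.readline()                    # the text layer reads ahead a whole chunk
        return layer_positions(f)
```

Because the text layer reads a whole chunk at a time, a short file is consumed entirely by the first readline(), which is why the two lower layers report the end of the file.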
msg258612	Author: Fornax (fornax)	Date: 2016-01-19 17:46
To clarify... the intended behavior is for truncate to default to the current position of the buffer, rather than the current position as reported directly from the stream by tell? That seems... surprising.
msg258613	Author: Eryk Sun (eryksun) * (Python triager)	Date: 2016-01-19 17:46
Serhiy, why doesn't truncate do a seek(0, SEEK_CUR) to synchronize the buffer's file pointer before calling its truncate method? This also affects writing in "+" modes when the two file pointers are out of sync.
msg258614	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2016-01-19 18:04
This is not always possible. Consider the following example:

    >>> open('temp.txt', 'wb').write(b'+BDAEMQQyBDMENA-')
    16
    >>> f = open('temp.txt', 'r+', encoding='utf-7')
    >>> f.read(2)
    'аб'

What should be the result of truncating?

I think it would be better to not implement truncate() for text files at all.
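One way to see why there is no obvious answer: the bytes that encode only the first two characters are not a prefix of the bytes already in the file, so cutting the file at a byte offset cannot be enough in general. A small sketch using only the built-in UTF-7 codec (the variable names are illustrative, not from the thread):

```python
original = b"+BDAEMQQyBDMENA-"      # file contents from the example above
text = original.decode("utf-7")     # the five characters stored in the file

# Re-encode only the two characters that read(2) returned.
kept = text[:2].encode("utf-7")

# Truncating by byte offset would only work if `kept` were a prefix of the
# original bytes. It is not: the base64 run has to be terminated early.
is_prefix = original.startswith(kept)
```

Here `kept` is shorter than the original and ends with its own terminator, so producing a file that contains just the first two characters means rewriting bytes rather than only discarding the tail.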
msg258615	Author: Fornax (fornax)	Date: 2016-01-19 18:23
Heh... building on Serhiy's example:

    >>> f.tell()
    680564735109527527154978616360239628288

I'm way out of my depth here. The results seem surprising, but I lack the experience to be able to say what they "should" look like. So I guess if it's working as intended and just needs clarification in the documentation, so be it. Thanks for pointing out what's actually causing the behavior.
msg258617	Author: Fornax (fornax)	Date: 2016-01-19 19:24
After taking a little time to let this sink in, I'm going to play Devil's Advocate just a little more. It sounds like you're basically saying that any read-write text-based modes (e.g. r+, w+) should be used at your own peril. While I understand your UTF-7 counterexample, and it's a fair point, is it out of line to expect that for encodings that operate on full bytes, file positioning should work a bit more intuitively? (Which is to say, a write/truncate after a read should take place in the position immediately following the end of the read.)
msg258618	Author: Fornax (fornax)	Date: 2016-01-19 19:38
Another surprising result:

    >>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
    32
    >>> f = open('temp.txt', 'r+')
    >>> f.write('test')
    4
    >>> f.close()
    >>> open('temp.txt').read()
    'testE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n'

    >>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
    32
    >>> f = open('temp.txt', 'r+')
    >>> f.write('test')
    4
    >>> f.read(1)
    'A'
    >>> f.close()
    >>> open('temp.txt').read()
    'ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\ntest'

The position of the write in the file depends on whether or not there is a subsequent read.
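A possible way to make the interleaved case predictable is to flush explicitly before switching from writing to reading, in the spirit of the C rule that a write must be followed by a flush or seek before a read. The following is a sketch of that workaround, not a fix for the underlying behavior. `write_then_read` and `demo_write_then_read` are hypothetical helpers, and `newline=''` with an ASCII encoding only keeps the byte layout identical across platforms.

```python
import os
import tempfile

CONTENT = "ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n"


def write_then_read(path):
    """Overwrite the first four characters, then read the next one."""
    with open(path, "r+", encoding="ascii", newline="") as f:
        f.write("test")
        f.flush()           # push the pending write down before reading
        return f.read(1)


def demo_write_then_read():
    path = os.path.join(tempfile.mkdtemp(), "temp.txt")
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(CONTENT)
    char = write_then_read(path)
    with open(path, encoding="ascii", newline="") as f:
        return char, f.read()
```

With the flush in place, the write lands at the start of the file and the read continues right after it, so the outcome no longer depends on whether a read follows.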
msg258619	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2016-01-19 20:00
Maybe. Looking at the code, both the Python and C implementations of TextIOWrapper look incorrect.

Python implementation:

    def truncate(self, pos=None):
        self.flush()
        if pos is None:
            pos = self.tell()
        return self.buffer.truncate(pos)

If pos is not specified, self.tell() is used as the truncating position for the underlying binary file. But self.tell() is not an offset in bytes, as seen from the UTF-7 example. It is a complex cookie that includes the starting position in the binary file, the number of bytes that should be read and fed to the decoder, and other decoder flags. At the least, we need to unpack the cookie returned by self.tell() and raise an exception if it doesn't unambiguously point to a binary file position.

The C implementation is equivalent to:

    def truncate(self, pos=None):
        self.flush()
        return self.buffer.truncate(pos)

It just ignores the decoder buffer.
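The idea of rejecting ambiguous positions can be sketched from user code. This is only an illustration under a stated assumption, not the patch discussed here: it assumes that, in CPython, a tell() value that fits in 64 bits carries no decoder state and is a plain byte offset. `truncate_at_text_position` and `demo_truncate` are hypothetical names.

```python
import os
import tempfile


def truncate_at_text_position(f):
    """Truncate a seekable text file at the position reported by tell().

    Assumption: a tell() value that fits in 64 bits is a plain byte
    offset. Anything larger is treated as ambiguous and rejected.
    """
    cookie = f.tell()
    if cookie >> 64:
        raise ValueError("tell() does not map to a single byte offset")
    f.seek(cookie)          # bring the buffered layer to that offset
    return f.truncate()     # truncates at the now-synchronized position


def demo_truncate():
    path = os.path.join(tempfile.mkdtemp(), "temp.txt")
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write("ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n")
    with open(path, "r+", encoding="ascii", newline="") as f:
        f.readline()
        size = truncate_at_text_position(f)
    with open(path, encoding="ascii", newline="") as f:
        return size, f.read()
```

The explicit seek is the same workaround reported at the top of this issue. The size check is deliberately conservative, so a position like the one in the UTF-7 example would be refused rather than guessed at.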
msg258622	Author: Martin Panter (martin.panter) * (Python committer)	Date: 2016-01-19 21:38
In theory, TextIOWrapper could rewrite the last bit of the file (or the whole file) to have the requested number of characters. But I wonder if it is worth it; maybe deprecation is better. Do you have a use case for any of these bugs, or are you just playing around to see what the methods do? In Issue 12922, seek() and tell() were (re-)defined for TextIOBase, but the situation with truncate() was apparently not considered. Perhaps the write()–read() bug is related to Issue 12215.
msg258624	Author: Fornax (fornax)	Date: 2016-01-19 21:56
I don't have a specific use case. This spawned from a tangentially-related StackOverflow question (http://stackoverflow.com/questions/34858088), where in the answers a behavior difference between Python 2 and 3 was noted. I couldn't find any documentation to explain it, so I opened a follow-up question (http://stackoverflow.com/questions/34879318), and based on some feedback I got there, I opened up this issue. Just to be sure I understand, you're suggesting deprecating truncate on text-mode file objects? And the interleaved read-writes are likely related to an issue that will be dealt with elsewhere?
msg258626	Author: Марк Коренберг (socketpair) *	Date: 2016-01-19 22:11
text files and seek() offset: issue25849
msg258634	Author: Eryk Sun (eryksun) * (Python triager)	Date: 2016-01-19 23:34
FYI, you can parse the cookie using struct or ctypes. For example:

    import ctypes
    import sys

    class Cookie(ctypes.Structure):
        _fields_ = (('start_pos',     ctypes.c_longlong),
                    ('dec_flags',     ctypes.c_int),
                    ('bytes_to_feed', ctypes.c_int),
                    ('chars_to_skip', ctypes.c_int),
                    ('need_eof',      ctypes.c_byte))

In the simple case only the buffer start_pos is non-zero, and the result of tell() is just the 64-bit file pointer. In Serhiy's UTF-7 example it needs to also convey the bytes_to_feed and chars_to_skip values:

    >>> f.tell()
    680564735109527527154978616360239628288
    >>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
    >>> state = Cookie.from_buffer_copy(cookie_bytes)
    >>> state.start_pos
    0
    >>> state.dec_flags
    0
    >>> state.bytes_to_feed
    16
    >>> state.chars_to_skip
    2
    >>> state.need_eof
    0

So a seek(0, SEEK_CUR) in this case has to seek the buffer to 0, read and decode 16 bytes, and skip 2 characters.

Isn't this solvable at least for the case of truncating, Martin? It could do a tell(), seek to the start_pos, read and decode the bytes_to_feed, re-encode the chars_to_skip, seek back to the start_pos, write the encoded characters, and then truncate.

    >>> f = open('temp.txt', 'w+', encoding='utf-7')
    >>> f.write(b'+BDAEMQQyBDMENA-'.decode('utf-7'))
    5
    >>> _ = f.seek(0); f.read(2)
    'аб'
    >>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
    >>> state = Cookie.from_buffer_copy(cookie_bytes)
    >>> f.buffer.seek(state.start_pos)
    0
    >>> buf = f.buffer.read(state.bytes_to_feed)
    >>> s = buf.decode(f.encoding)[:state.chars_to_skip]
    >>> f.buffer.seek(state.start_pos)
    0
    >>> f.buffer.write(s.encode(f.encoding))
    8
    >>> f.buffer.truncate()
    8
    >>> f.close()
    >>> open('temp.txt', encoding='utf-7').read()
    'аб'

Rewriting the encoded bytes is necessary to properly terminate the UTF-7 sequence, which makes me doubt whether this simple approach will work for all codecs. But something like this is possible, no?
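The same fields can also be pulled out of the integer with shifts and masks, which avoids depending on native byte order or structure padding. This is a sketch that assumes the field widths of the ctypes structure above (a 64-bit start_pos in the low bits, then three 32-bit integers, then one flag byte). That layout is an implementation detail rather than a documented format, and `TellCookie` and `unpack_tell_cookie` are hypothetical names.

```python
from typing import NamedTuple


class TellCookie(NamedTuple):
    start_pos: int
    dec_flags: int
    bytes_to_feed: int
    chars_to_skip: int
    need_eof: int


def unpack_tell_cookie(cookie):
    """Split a TextIOWrapper.tell() value into its assumed fields."""
    mask32 = (1 << 32) - 1
    return TellCookie(
        start_pos=cookie & ((1 << 64) - 1),       # bits 0..63
        dec_flags=(cookie >> 64) & mask32,        # bits 64..95
        bytes_to_feed=(cookie >> 96) & mask32,    # bits 96..127
        chars_to_skip=(cookie >> 128) & mask32,   # bits 128..159
        need_eof=(cookie >> 160) & 0xFF,          # bits 160..167
    )


# The value reported earlier in this thread for the UTF-7 example.
state = unpack_tell_cookie(680564735109527527154978616360239628288)
```

For that value the result is start_pos 0, bytes_to_feed 16 and chars_to_skip 2, matching the ctypes output above.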
msg258637	Author: Martin Panter (martin.panter) * (Python committer)	Date: 2016-01-20 03:04
Fornax: Yes, I was suggesting the idea of deprecating truncate() for text files! Although a blanket deprecation of all cases may not be realistic. Quickly reading the Stack Overflow pages, it seems like there is demand for this to work in some cases. Deprecating it in the more awkward situations, such as after reading, and with specific kinds of codecs, might be an option though.

Now I think Issue 12215 (read then write) is more closely related to the read-then-truncate problem. For the write-then-read bug, it might be a separate problem with an easy fix: call flush() before changing to reader mode.

Eryk: If there is no decoder state, and the file data hasn’t changed, maybe it is solvable. But I realize now it won’t work in general. We would have to construct the encoder state from the decoder state. The same problem as Issue 12215.
msg258639	Author: (random832)	Date: 2016-01-20 04:31
In the analogous C operations, ftell (analogous to .tell) actually causes the underlying file descriptor's position (analogous to the raw stream's position) to be reset to the same value that ftell has returned. Which means, yes, that you lose the benefits of buffering if you're so foolish as to call ftell after every read. But in this case the sequence "read / tell / truncate" would be analogous to "fread(f) / ftell(f) / ftruncate(fileno(f))".

Though, the fact that fread operates on the FILE * whereas truncate operates on a file descriptor serves as a red flag to C programmers... arguably since this is not the case with Python, truncate on a buffered stream should implicitly include this same "reset underlying position" operation before actually performing the truncate.

History
Date	User	Action	Args
2022-04-11 14:58:26	admin	set	github: 70346
2021-02-26 17:25:06	eryksun	set	type: behavior versions: + Python 3.8, Python 3.9, Python 3.10, - Python 2.7, Python 3.5, Python 3.6
2019-03-13 03:41:28	josh.r	link	issue36278 superseder
2018-11-24 01:11:06	martin.panter	link	issue35304 superseder
2016-01-20 04:31:30	random832	set	nosy: + random832 messages: + msg258639
2016-01-20 03:04:51	martin.panter	set	messages: + msg258637
2016-01-19 23:34:46	eryksun	set	messages: + msg258634
2016-01-19 22:11:11	socketpair	set	nosy: + socketpair messages: + msg258626
2016-01-19 22:07:34	vstinner	set	nosy: - vstinner
2016-01-19 21:56:22	fornax	set	type: behavior -> (no value) messages: + msg258624
2016-01-19 21:38:29	martin.panter	set	nosy: + martin.panter messages: + msg258622
2016-01-19 20:00:14	serhiy.storchaka	set	type: behavior messages: + msg258619 stage: needs patch
2016-01-19 19:38:15	fornax	set	messages: + msg258618
2016-01-19 19:24:28	fornax	set	messages: + msg258617
2016-01-19 18:23:00	fornax	set	messages: + msg258615
2016-01-19 18:04:26	serhiy.storchaka	set	messages: + msg258614
2016-01-19 17:46:56	eryksun	set	nosy: + eryksun messages: + msg258613
2016-01-19 17:46:14	fornax	set	messages: + msg258612
2016-01-19 17:24:43	serhiy.storchaka	set	versions: + Python 2.7, Python 3.6, - Python 3.4 nosy: + stutzbach, benjamin.peterson, pitrou, docs@python, vstinner, serhiy.storchaka, steve.dower messages: + msg258611 assignee: docs@python components: + Documentation
2016-01-19 16:41:17	Zero	set	nosy: + Zero
2016-01-19 15:19:44	fornax	create
