Created on 2016年01月19日 15:19 by fornax, last changed 2022年04月11日 14:58 by admin.
| Messages (15) | |||
|---|---|---|---|
| msg258600 - (view) | Author: Fornax (fornax) | Date: 2016年01月19日 15:19 | |
io.IOBase.truncate() documentation says:
"Resize the stream to the given size in bytes (or the current position if size is not specified). The current stream position isn’t changed. This resizing can extend or reduce the current file size. In case of extension, the contents of the new file area depend on the platform (on most systems, additional bytes are zero-filled). The new file size is returned."
However:
>>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
32
>>> f = open('temp.txt', 'r+')
>>> f.readline()
'ABCDE\n'
>>> f.tell()
6 # As expected, current position is 6 after the readline
>>> f.truncate()
32 # ?!
Verified that the file does not get truncated to 6 bytes as expected. Adding an explicit f.seek(6) before the truncate causes it to work properly (truncate to 6). It also works as expected using a StringIO rather than a file, or in Python 2 (used 2.7.9).
Tested in 3.4.3/Windows, 3.4.1/Linux, 3.5.1/Linux.
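The explicit-seek workaround described above can be sketched as follows. This is only an illustration, assuming an encoding for which tell() returns a plain byte offset (for example ASCII or UTF-8 text read line by line). The helper name `truncate_at_current_position` and the temporary-file setup in `demo` are hypothetical and not part of the io module.

```python
import os
import tempfile


def truncate_at_current_position(f):
    """Truncate text file f at its logical position (hypothetical helper).

    Assumes f.tell() is a plain byte offset, which holds for simple
    encodings such as ASCII or UTF-8 but not in general (e.g. UTF-7).
    """
    pos = f.tell()   # logical position as seen by the text layer
    f.seek(pos)      # explicit seek brings the underlying buffer in sync
    return f.truncate()


def demo():
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        # newline='' disables newline translation on every platform
        with open(path, 'w', newline='') as f:
            f.write('ABCDE\nFGHIJ\nKLMNO\n')
        with open(path, 'r+', newline='') as f:
            f.readline()                          # consumes 'ABCDE\n'
            size = truncate_at_current_position(f)
        with open(path, newline='') as f:
            return size, f.read()
    finally:
        os.remove(path)
```

Calling `demo()` returns the size reported by truncate() together with the remaining file contents, so the effect of the explicit seek can be checked directly.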
|
|||
| msg258611 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年01月19日 17:24 | |
This is because the file is buffered.
>>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
32
>>> f = open('temp.txt', 'r+')
>>> f.readline()
'ABCDE\n'
>>> f.tell()
6
>>> f.buffer.tell()
32
>>> f.buffer.raw.tell()
32
The documentation needs a clarification.
|
|||
| msg258612 - (view) | Author: Fornax (fornax) | Date: 2016年01月19日 17:46 | |
To clarify... the intended behavior is for truncate to default to the current position of the buffer, rather than the current position as reported directly from the stream by tell? That seems... surprising. |
|||
| msg258613 - (view) | Author: Eryk Sun (eryksun) * (Python triager) | Date: 2016年01月19日 17:46 | |
Serhiy, why doesn't truncate do a seek(0, SEEK_CUR) to synchronize the buffer's file pointer before calling its truncate method? This also affects writing in "+" modes when the two file pointers are out of sync. |
|||
| msg258614 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年01月19日 18:04 | |
This is not always possible. Consider the following example:
>>> open('temp.txt', 'wb').write(b'+BDAEMQQyBDMENA-')
16
>>> f = open('temp.txt', 'r+', encoding='utf-7')
>>> f.read(2)
'аб'
What should be the result of truncating?
I think it would be better to not implement truncate() for text files at all.
|
|||
| msg258615 - (view) | Author: Fornax (fornax) | Date: 2016年01月19日 18:23 | |
Heh... building on Serhiy's example:
>>> f.tell()
680564735109527527154978616360239628288
I'm way out of my depth here. The results seem surprising, but I lack the experience to be able to say what they "should" look like. So I guess if it's working as intended and just needs clarification in the documentation, so be it. Thanks for pointing out what's actually causing the behavior. |
|||
| msg258617 - (view) | Author: Fornax (fornax) | Date: 2016年01月19日 19:24 | |
After taking a little time to let this sink in, I'm going to play Devil's Advocate just a little more. It sounds like you're basically saying that any read-write text-based modes (e.g. r+, w+) should be used at your own peril. While I understand your UTF-7 counterexample, and it's a fair point, is it out of line to expect that for encodings that operate on full bytes, file positioning should work a bit more intuitively? (Which is to say, a write/truncate after a read should take place in the position immediately following the end of the read.) |
|||
| msg258618 - (view) | Author: Fornax (fornax) | Date: 2016年01月19日 19:38 | |
Another surprising result:
>>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
32
>>> f = open('temp.txt', 'r+')
>>> f.write('test')
4
>>> f.close()
>>> open('temp.txt').read()
'testE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n'
>>> open('temp.txt', 'w').write('ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\n')
32
>>> f = open('temp.txt', 'r+')
>>> f.write('test')
4
>>> f.read(1)
'A'
>>> f.close()
>>> open('temp.txt').read()
'ABCDE\nFGHIJ\nKLMNO\nPQRST\nUVWXY\nZ\ntest'
The position of the write in the file depends on whether or not there is a subsequent read.
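One way to sidestep this, borrowed from the usual C stdio rule of repositioning between a write and a read, is an explicit seek to the current position before reading. The sketch below assumes a simple encoding where tell() is a plain byte offset; `write_then_read` and `demo` are hypothetical helper names, and this is a user-level workaround, not a fix for the underlying behavior.

```python
import os
import tempfile


def write_then_read(path, text):
    """Overwrite the start of the file with text, then read one character."""
    with open(path, 'r+', newline='') as f:
        f.write(text)
        f.seek(f.tell())   # flush the pending write and reposition first
        return f.read(1)


def demo():
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w', newline='') as f:
            f.write('ABCDE\nFGHIJ\n')
        char = write_then_read(path, 'test')
        with open(path, newline='') as f:
            return char, f.read()
    finally:
        os.remove(path)
```

With the seek in place, the read continues right after the written text and the write lands at the start of the file rather than at the end.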
|
|||
| msg258619 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2016年01月19日 20:00 | |
Maybe. Looking at the code, both the Python and C implementations of TextIOWrapper look incorrect.
Python implementation:
def truncate(self, pos=None):
    self.flush()
    if pos is None:
        pos = self.tell()
    return self.buffer.truncate(pos)
If pos is not specified, self.tell() is used as the truncating position for the underlying binary file. But self.tell() is not an offset in bytes, as seen from the UTF-7 example. It is a complex cookie that includes the starting position in the binary file, the number of bytes that should be read and fed to the decoder, and other decoder flags. At the very least, truncate() needs to unpack the cookie returned by self.tell() and raise an exception if it doesn't unambiguously point to a binary file position.
The C implementation is equivalent to:
def truncate(self, pos=None):
    self.flush()
    return self.buffer.truncate(pos)
It just ignores the decoder buffer. |
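The "unpack the cookie and raise" idea could look roughly like this at user level. It is a sketch under a stated assumption: the byte offset sits in the low 64 bits of the cookie and any additional decoder state lives in higher bits. `is_plain_byte_offset`, `checked_truncate`, and `demo` are hypothetical names, not io APIs.

```python
import os
import tempfile


def is_plain_byte_offset(cookie):
    """Return True if a tell() cookie carries nothing beyond a byte offset.

    Assumption: the byte offset occupies the low 64 bits of the cookie and
    any extra decoder state is stored in higher bits.
    """
    return 0 <= cookie < (1 << 64)


def checked_truncate(f):
    """Truncate text file f at its current position, or raise ValueError."""
    cookie = f.tell()
    if not is_plain_byte_offset(cookie):
        raise ValueError('tell() cookie %d is not a plain byte offset' % cookie)
    f.flush()
    return f.buffer.truncate(cookie)


def demo():
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w', encoding='ascii', newline='') as f:
            f.write('ABCDE\nFGHIJ\n')
        with open(path, 'r+', encoding='ascii', newline='') as f:
            f.readline()
            checked_truncate(f)
        with open(path, encoding='ascii', newline='') as f:
            return f.read()
    finally:
        os.remove(path)
```

Refusing to truncate when the cookie carries decoder state trades convenience for safety: the UTF-7 cookie from earlier in the thread is rejected instead of being passed to the binary layer as if it were a size.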
|||
| msg258622 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2016年01月19日 21:38 | |
In theory, TextIOWrapper could rewrite the last bit of the file (or the whole file) to have the requested number of characters. But I wonder if it is worth it; maybe deprecation is better. Do you have a use case for any of these bugs, or are you just playing around to see what the methods do? In Issue 12922, seek() and tell() were (re-)defined for TextIOBase, but the situation with truncate() was apparently not considered. Perhaps the write()–read() bug is related to Issue 12215. |
|||
| msg258624 - (view) | Author: Fornax (fornax) | Date: 2016年01月19日 21:56 | |
I don't have a specific use case. This spawned from a tangentially-related StackOverflow question (http://stackoverflow.com/questions/34858088), where in the answers a behavior difference between Python 2 and 3 was noted. I couldn't find any documentation to explain it, so I opened a follow-up question (http://stackoverflow.com/questions/34879318), and based on some feedback I got there, I opened up this issue. Just to be sure I understand, you're suggesting deprecating truncate on text-mode file objects? And the interleaved read-writes are likely related to an issue that will be dealt with elsewhere? |
|||
| msg258626 - (view) | Author: Марк Коренберг (socketpair) * | Date: 2016年01月19日 22:11 | |
text files and seek() offset: issue25849 |
|||
| msg258634 - (view) | Author: Eryk Sun (eryksun) * (Python triager) | Date: 2016年01月19日 23:34 | |
FYI, you can parse the cookie using struct or ctypes. For example:
import ctypes
import sys

class Cookie(ctypes.Structure):
    _fields_ = (('start_pos', ctypes.c_longlong),
                ('dec_flags', ctypes.c_int),
                ('bytes_to_feed', ctypes.c_int),
                ('chars_to_skip', ctypes.c_int),
                ('need_eof', ctypes.c_byte))
In the simple case only the buffer start_pos is non-zero, and the result of tell() is just the 64-bit file pointer. In Serhiy's UTF-7 example it needs to also convey the bytes_to_feed and chars_to_skip values:
>>> f.tell()
680564735109527527154978616360239628288
>>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
>>> state = Cookie.from_buffer_copy(cookie_bytes)
>>> state.start_pos
0
>>> state.dec_flags
0
>>> state.bytes_to_feed
16
>>> state.chars_to_skip
2
>>> state.need_eof
0
So a seek(0, SEEK_CUR) in this case has to seek the buffer to 0, read and decode 16 bytes, and skip 2 characters.
Isn't this solvable at least for the case of truncating, Martin? It could do a tell(), seek to the start_pos, read and decode the bytes_to_feed, re-encode the chars_to_skip, seek back to the start_pos, write the encoded characters, and then truncate.
>>> f = open('temp.txt', 'w+', encoding='utf-7')
>>> f.write(b'+BDAEMQQyBDMENA-'.decode('utf-7'))
5
>>> _ = f.seek(0); f.read(2)
'аб'
>>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
>>> state = Cookie.from_buffer_copy(cookie_bytes)
>>> f.buffer.seek(state.start_pos)
0
>>> buf = f.buffer.read(state.bytes_to_feed)
>>> s = buf.decode(f.encoding)[:state.chars_to_skip]
>>> f.buffer.seek(state.start_pos)
0
>>> f.buffer.write(s.encode(f.encoding))
8
>>> f.buffer.truncate()
8
>>> f.close()
>>> open('temp.txt', encoding='utf-7').read()
'аб'
Rewriting the encoded bytes is necessary to properly terminate the UTF-7 sequence, which makes me doubt whether this simple approach will work for all codecs. But something like this is possible, no?
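For comparison, the struct route mentioned at the top of this message might look like the sketch below. It assumes the same field layout as the ctypes structure (a 64-bit start_pos, three 32-bit ints, and a one-byte flag, packed little-endian with no padding). That layout is an implementation detail and may differ between implementations or versions. `TextCookie` and `parse_cookie` are hypothetical names.

```python
import struct
from collections import namedtuple

TextCookie = namedtuple(
    'TextCookie',
    'start_pos dec_flags bytes_to_feed chars_to_skip need_eof')

# Assumed layout: little-endian, no padding; q = 64-bit start_pos,
# i = 32-bit int (three of them), b = one-byte need_eof flag.
_COOKIE_STRUCT = struct.Struct('<qiiib')


def parse_cookie(cookie):
    """Split an integer returned by tell() into its assumed fields."""
    raw = cookie.to_bytes(_COOKIE_STRUCT.size, 'little')
    return TextCookie(*_COOKIE_STRUCT.unpack(raw))
```

Applied to the value 680564735109527527154978616360239628288 from the UTF-7 session, this yields bytes_to_feed 16 and chars_to_skip 2, matching the ctypes output shown earlier.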
|
|||
| msg258637 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2016年01月20日 03:04 | |
Fornax: Yes, I was suggesting the idea of deprecating truncate() for text files! Although a blanket deprecation of all cases may not be realistic. Quickly reading the Stack Overflow pages, it seems like there is demand for this to work in some cases. Deprecating it in the more awkward situations, such as after reading, and with specific kinds of codecs, might be an option though.
Now I think Issue 12215 (read then write) is more closely related to the read-then-truncate problem. For the write-then-read bug, it might be a separate problem with an easy fix: call flush() before changing to reader mode.
Eryk: If there is no decoder state, and the file data hasn’t changed, maybe it is solvable. But I realize now it won’t work in general. We would have to construct the encoder state from the decoder state. The same problem as Issue 12215. |
|||
| msg258639 - (view) | Author: (random832) | Date: 2016年01月20日 04:31 | |
In the analogous C operations, ftell (analogous to .tell) actually causes the underlying file descriptor's position (analogous to the raw stream's position) to be reset to the same value that ftell has returned. Which means, yes, that you lose the benefits of buffering if you're so foolish as to call ftell after every read. But in this case the sequence "read / tell / truncate" would be analogous to "fread(f) / ftell(f) / ftruncate(fileno(f))". Though the fact that fread operates on the FILE * whereas ftruncate operates on a file descriptor serves as a red flag to C programmers... Arguably, since this is not the case with Python, truncate on a buffered stream should implicitly include this same "reset underlying position" operation before actually performing the truncate. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:26 | admin | set | github: 70346 |
| 2021年02月26日 17:25:06 | eryksun | set | type: behavior versions: + Python 3.8, Python 3.9, Python 3.10, - Python 2.7, Python 3.5, Python 3.6 |
| 2019年03月13日 03:41:28 | josh.r | link | issue36278 superseder |
| 2018年11月24日 01:11:06 | martin.panter | link | issue35304 superseder |
| 2016年01月20日 04:31:30 | random832 | set | nosy: + random832 messages: + msg258639 |
| 2016年01月20日 03:04:51 | martin.panter | set | messages: + msg258637 |
| 2016年01月19日 23:34:46 | eryksun | set | messages: + msg258634 |
| 2016年01月19日 22:11:11 | socketpair | set | nosy: + socketpair messages: + msg258626 |
| 2016年01月19日 22:07:34 | vstinner | set | nosy: - vstinner |
| 2016年01月19日 21:56:22 | fornax | set | type: behavior -> (no value) messages: + msg258624 |
| 2016年01月19日 21:38:29 | martin.panter | set | nosy: + martin.panter messages: + msg258622 |
| 2016年01月19日 20:00:14 | serhiy.storchaka | set | type: behavior messages: + msg258619 stage: needs patch |
| 2016年01月19日 19:38:15 | fornax | set | messages: + msg258618 |
| 2016年01月19日 19:24:28 | fornax | set | messages: + msg258617 |
| 2016年01月19日 18:23:00 | fornax | set | messages: + msg258615 |
| 2016年01月19日 18:04:26 | serhiy.storchaka | set | messages: + msg258614 |
| 2016年01月19日 17:46:56 | eryksun | set | nosy: + eryksun messages: + msg258613 |
| 2016年01月19日 17:46:14 | fornax | set | messages: + msg258612 |
| 2016年01月19日 17:24:43 | serhiy.storchaka | set | versions: + Python 2.7, Python 3.6, - Python 3.4 nosy: + stutzbach, benjamin.peterson, pitrou, docs@python, vstinner, serhiy.storchaka, steve.dower messages: + msg258611 assignee: docs@python components: + Documentation |
| 2016年01月19日 16:41:17 | Zero | set | nosy: + Zero |
| 2016年01月19日 15:19:44 | fornax | create | |