
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: io.Buffer*.seek() doesn't seek if "seeking leaves us inside the current buffer"
Type: Stage:
Components: IO Versions: Python 3.3
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: pitrou, vstinner
Priority: normal Keywords:

Created on 2011-05-19 16:04 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg136296 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-05-19 16:04
Example:
with open("setup.py", "rb") as f:
    # read smaller than the file size to fill the readahead buffer
    f.read(1)
    # seek doesn't seek
    f.seek(0)
    print("f pos=", f.tell())
    print("f.raw pos=", f.raw.tell())
Output:
f pos= 0
f.raw pos= 4096
I expect f.raw.tell() to be 0.
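The report above can be reproduced without depending on a particular setup.py. The following is a minimal sketch, assuming a scratch file created with tempfile and an explicit 4096-byte buffer so that the result does not depend on the platform's default buffer size; the helper name positions_after_seek is hypothetical.

```python
import os
import tempfile


def positions_after_seek(buffer_size=4096):
    """Return (buffered position, raw position) after read(1) then seek(0)."""
    # Hypothetical scratch file, three buffers long, standing in for setup.py.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"x" * (3 * buffer_size))
        with open(path, "rb", buffering=buffer_size) as f:
            f.read(1)   # fills the read-ahead buffer from the raw file
            f.seek(0)   # the target lies inside the current buffer
            return f.tell(), f.raw.tell()
    finally:
        os.remove(path)


buffered_pos, raw_pos = positions_after_seek()
print("f pos=", buffered_pos)
print("f.raw pos=", raw_pos)
```

The buffered position is 0 while the raw position stays ahead of it, which is the discrepancy described in this message.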
Extract of Modules/_io/buffered.c:
    if (whence != 2 && self->readable) {
        Py_off_t current, avail;
        /* Check if seeking leaves us inside the current buffer,
           so as to return quickly if possible. Also, we needn't take the
           lock in this fast path.
           Don't know how to do that when whence == 2, though. */
        /* NOTE: RAW_TELL() can release the GIL but the object is in a stable
           state at this point. */
        current = RAW_TELL(self);
        avail = READAHEAD(self);
        printf("current=%" PY_PRIdOFF ", avail=%" PY_PRIdOFF "\n", current, avail);
        if (avail > 0) {
            Py_off_t offset;
            if (whence == 0)
                offset = target - (current - RAW_OFFSET(self));
            else
                offset = target;
            printf("offset=%" PY_PRIdOFF "\n", offset);
            if (offset >= -self->pos && offset <= avail) {
                printf("NO SEEK!\n");
                self->pos += offset;
                return PyLong_FromOff_t(current - avail + offset);
            }
        }
    }
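The check in this extract can be modelled in a few lines of Python. This is only a sketch of the quoted arithmetic, not the C implementation: the function name fast_path_seek is hypothetical, the values of RAW_TELL(), READAHEAD() and RAW_OFFSET() are passed in as plain integers rather than computed, the self->readable test is omitted, and the numbers in the example calls are an assumed buffer state after read(1) with a 4096-byte buffer.

```python
def fast_path_seek(target, whence, current, avail, raw_offset, pos):
    """Model of the quoted check.

    Returns (new_pos, reported_position) when the seek stays inside the
    buffer, or None when a real seek on the raw file would be needed.
    """
    if whence == 2 or avail <= 0:
        return None
    if whence == 0:
        # Convert the absolute target into an offset from the logical position.
        offset = target - (current - raw_offset)
    else:
        offset = target
    if -pos <= offset <= avail:
        # Only the position inside the buffer moves; the raw file is untouched.
        return pos + offset, current - avail + offset
    return None


# Assumed state after read(1): raw file at 4096, one byte consumed,
# 4095 bytes still buffered.
inside = fast_path_seek(0, 0, current=4096, avail=4095, raw_offset=4095, pos=1)
outside = fast_path_seek(5000, 0, current=4096, avail=4095, raw_offset=4095, pos=1)
print(inside)
print(outside)
```

In this model, seek(0) is answered from the buffer, and only a target past the buffered data falls through to a raw seek.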
I found this weird behaviour when trying to understand why:
    with open("setup.py", 'rb') as f:
        encoding, lines = tokenize.detect_encoding(f.readline)
    with open("setup.py", 'r', encoding=encoding) as f:
        imp.load_module("setup", f, "setup.py", (".py", "r", imp.PY_SOURCE))
behaves differently from:
    with tokenize.open("setup.py") as f:
        imp.load_module("setup", f, "setup.py", (".py", "r", imp.PY_SOURCE))
imp.load_module() clones the file using something like fd = os.dup(f.fileno()); clone = os.fdopen(fd, "r").
For tokenize.open(), a workaround is to replace:
    buffer.seek(0)
by
    buffer.seek(0); buffer.raw.seek(0)
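The effect on a cloned file object can be illustrated as follows. This is a sketch under assumptions: it uses a scratch file and an explicit 4096-byte buffer instead of setup.py, it imitates the os.dup()/os.fdopen() cloning described above rather than calling imp.load_module(), and the helper name clone_start_position is hypothetical.

```python
import os
import tempfile


def clone_start_position(buffer_size=4096):
    """Return (f.tell(), clone.tell()) for a clone made after f.seek(0)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"x" * (3 * buffer_size))
        with open(path, "rb", buffering=buffer_size) as f:
            f.read(1)
            f.seek(0)
            # The duplicate shares the file offset of the original descriptor,
            # so it starts wherever the raw file currently is.
            with os.fdopen(os.dup(f.fileno()), "rb") as clone:
                return f.tell(), clone.tell()
    finally:
        os.remove(path)


original_pos, clone_pos = clone_start_position()
print("original pos=", original_pos)
print("clone pos=", clone_pos)
```

The original object reports position 0, but the clone starts at the raw offset, past the bytes that were read ahead.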
msg136297 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-05-19 16:07
Note: _pyio.BufferedReader(), _pyio.BufferedWriter(), _pyio.BufferedRandom() don't use this optimization. They might be patched too.
msg136298 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-05-19 16:16
This is by design.
msg136306 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-05-19 16:39
And how can I seek the raw file to zero?
Using buffer.raw.seek(0), buffer.tell() becomes inconsistent:
$ ./python 
Python 3.2.1b1 (3.2:bd5e4d8c8080, May 15 2011, 10:22:54) 
>>> buffer=open('setup.py', 'rb')
>>> buffer.read(1)
>>> buffer.tell()
1
>>> buffer.raw.tell()
4096
>>> buffer.raw.seek(0)
0
>>> buffer.raw.tell()
0
>>> buffer.tell()
-4095
Same problem with os.lseek():
$ ./python 
Python 3.2.1b1 (3.2:bd5e4d8c8080, May 15 2011, 10:22:54) 
>>> import os
>>> buffer=open("setup.py", "rb")
>>> buffer.read(1)
>>> os.lseek(buffer.fileno(), 0, 0)
0
>>> buffer.raw.tell()
0
>>> buffer.tell()
-4095
msg136309 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-05-19 16:44
Simple: you are not supposed to use the raw file if you wrapped it inside a buffered file.
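One approach consistent with this advice, though not proposed in the thread itself, is to avoid the buffered wrapper when the OS-level position matters, by opening the file with buffering=0. The sketch below assumes a scratch file, and the helper name unbuffered_seek_position is hypothetical. With no read-ahead buffer, seek() acts directly on the file descriptor, at the cost of losing buffered reads.

```python
import os
import tempfile


def unbuffered_seek_position():
    """Return (f.tell(), OS-level offset) after read(1) then seek(0)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"x" * 8192)
        # buffering=0 is only allowed in binary mode and returns the raw file.
        with open(path, "rb", buffering=0) as f:
            f.read(1)
            f.seek(0)
            # lseek with offset 0 from SEEK_CUR queries the descriptor's offset.
            return f.tell(), os.lseek(f.fileno(), 0, os.SEEK_CUR)
    finally:
        os.remove(path)


file_pos, fd_pos = unbuffered_seek_position()
print("f pos=", file_pos)
print("fd pos=", fd_pos)
```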
History
Date                 User      Action  Args
2022-04-11 14:57:17  admin     set     github: 56325
2011-05-19 16:44:17  pitrou    set     messages: + msg136309
2011-05-19 16:39:34  vstinner  set     messages: + msg136306
2011-05-19 16:16:41  pitrou    set     status: open -> closed
                                       resolution: not a bug
                                       messages: + msg136298
2011-05-19 16:07:19  vstinner  set     messages: + msg136297
2011-05-19 16:04:45  vstinner  create
