homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: TextIOWrapper.tell extremely slow
Type: performance Stage: resolved
Components: IO Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Laurens, amaury.forgeotdarc, dsm001, eric.smith, pitrou
Priority: normal Keywords: patch

Created on 2011年02月04日 08:52 by Laurens, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
tell v01.py Laurens, 2011年02月04日 10:40 example program
textiotell.patch pitrou, 2011年02月04日 18:10
textiotell2.patch pitrou, 2011年02月04日 20:06
Messages (15)
msg127874 - (view) Author: Laurens (Laurens) Date: 2011年02月04日 08:52
file.tell() has become extremely slow in version 3.2, both rc1 and rc2. This problem did not exist in version 2.7.1, nor in version 3.1. It could be reproduced both on mac and windows xp.
msg127877 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2011年02月04日 09:39
Do you have a benchmark program you can post?
msg127881 - (view) Author: Laurens (Laurens) Date: 2011年02月04日 10:40
Correction: the problem also exists in version 3.1. I created a benchmark program an ran it on my machine (iMac, snow leopard 10.6), with the following results:
------------------------------------------
2.6.6 (r266:84292, Dec 30 2010, 09:20:14) 
[GCC 4.2.1 (Apple Inc. build 5664)]
result: 0.0009 s.
------------------------------------------
2.7.1 (r271:86832, Jan 13 2011, 07:38:03) 
[GCC 4.2.1 (Apple Inc. build 5664)]
result: 0.0008 s.
------------------------------------------
3.1.3 (r313:86882M, Nov 30 2010, 09:55:56) 
[GCC 4.0.1 (Apple Inc. build 5494)]
result: 9.5682 s.
------------------------------------------
3.2rc2 (r32rc2:88269, Jan 30 2011, 14:30:28) 
[GCC 4.2.1 (Apple Inc. build 5664)]
result: 8.3531 s.
Removing the line containing "tell" gives the following results:
------------------------------------------
2.6.6 (r266:84292, Dec 30 2010, 09:20:14) 
[GCC 4.2.1 (Apple Inc. build 5664)]
result: 0.0007 s.
------------------------------------------
2.7.1 (r271:86832, Jan 13 2011, 07:38:03) 
[GCC 4.2.1 (Apple Inc. build 5664)]
result: 0.0006 s.
------------------------------------------
3.1.3 (r313:86882M, Nov 30 2010, 09:55:56) 
[GCC 4.0.1 (Apple Inc. build 5494)]
result: 0.0093 s.
------------------------------------------
3.2rc2 (r32rc2:88269, Jan 30 2011, 14:30:28) 
[GCC 4.2.1 (Apple Inc. build 5664)]
result: 0.0007 s.
(Apparently, reading a file became a lot faster from 3.1 to 3.2.)
Conclusion: Execution of file.tell() makes the program about 10000 times slower.
Remark: the file mdutch.txt is a dummy text file containing 1000 lines with one word on each line.
msg127889 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011年02月04日 13:53
I found that adding "infile._CHUNK_SIZE = 20" makes the test much faster - 'only' 5 times slower than 2.7.
msg127892 - (view) Author: DSM (dsm001) Date: 2011年02月04日 14:02
With a similar setup (OS X 10.6) I see the same problem. It seems to go away if the file is opened in binary mode for reading. @Laurens, can you confirm?
msg127893 - (view) Author: DSM (dsm001) Date: 2011年02月04日 14:04
(By "go away" I mean "stop being pathological", not "stop differing": I still see a factor of 2.)
msg127894 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月04日 14:10
That's expected. seek() and tell() on text (unicode) files are slow by construction. You should open your file in binary mode instead, if you want to do any seeking.
Maybe I should add a note in http://docs.python.org/dev/library/io.html#performance 
msg127901 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月04日 16:26
That said, I think it is possible to make algorithmic improvements to TextIOWrapper.tell() so that at least performance becomes acceptable.
msg127907 - (view) Author: Laurens (Laurens) Date: 2011年02月04日 16:54
First of all, thanks to all for your cooperation, it is very much appreciated.
I made some minor changes to the benchmark program. Conclusions are:
 * setting file._CHUNK_SIZE to 20 has a dramatic effect, changing execution time in 3.2rc2 from 8.4s to 0.06s, so more than a factor 100 improvement. It's a bit clumsy, but acceptable as a performance work around.
 * opening file binary has a dramatic effect as well, I would say. After 2 minutes I stopped execution of the program, concluding that this change made the program at least a factor 10 *slower* instead of faster. So I cannot confirm DSM's statement that the performance hit would be a factor 2. Instead, I see a performance hit of at least a factor 100000 (10e5) compared tot 2.7.1, which is presumably not by construction ;-).
msg127908 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月04日 16:57
> opening file binary has a dramatic effect as well, I would say. After 2 
> minutes I stopped execution of the program
Hint: b'' is not equal to '' ;)
msg127914 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月04日 18:10
Here is a proof-of-concept patch for the pure Python version of TextIOWrapper.tell(). It turns the O(CHUNK_SIZE) operation into an O(1) operation most of time (still O(CHUNK_SIZE) worst-case - weird decoders and/or crazy input).
msg127915 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月04日 18:19
> Here is a proof-of-concept patch for the pure Python version of
> TextIOWrapper.tell(). It turns the O(CHUNK_SIZE) operation into an
> O(1) operation most of time (still O(CHUNK_SIZE) worst-case - weird
> decoders and/or crazy input).
Actually, that's wrong. The patch is still O(CHUNK_SIZE) but with a
vastly different multiplier (and, optimistically, much smaller, as
common codecs are codecs in C).
msg127933 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月04日 20:06
New patch also optimizing the C version. tell() can be more than 100x faster now (still much slower than binary tell()).
msg127934 - (view) Author: Laurens (Laurens) Date: 2011年02月04日 20:09
All,
thanks for your help. Opening the file in binary mode worked immediately in the toy program (that is, the benchmark code I sent you). (Antoine, thanks for the hint.) In my real world program, I solved the problem by reading a line from a binary input file, and decode it explicitly and immediately to a string. The performance has become acceptable now, and I propose to close this issue.
Thanks again, cheers,
Laurens
msg129418 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011年02月25日 20:29
Committed in r88607 (3.3).
History
Date User Action Args
2022年04月11日 14:57:12adminsetgithub: 55323
2011年02月25日 20:29:37pitrousetstatus: open -> closed
nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
messages: + msg129418

resolution: accepted -> fixed
stage: patch review -> resolved
2011年02月04日 20:09:29Laurenssetresolution: accepted
messages: + msg127934
nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
2011年02月04日 20:06:01pitrousetfiles: + textiotell2.patch

messages: + msg127933
nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
stage: needs patch -> patch review
2011年02月04日 18:20:00pitrousetnosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
messages: + msg127915
2011年02月04日 18:10:04pitrousetfiles: + textiotell.patch

messages: + msg127914
keywords: + patch
nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
2011年02月04日 16:57:11pitrousetnosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
messages: + msg127908
2011年02月04日 16:54:35Laurenssetnosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
messages: + msg127907
2011年02月04日 16:26:08pitrousetversions: + Python 3.3, - Python 3.2
nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
title: file.tell extremely slow -> TextIOWrapper.tell extremely slow
messages: + msg127901

type: performance
stage: needs patch
2011年02月04日 14:10:49pitrousetnosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
messages: + msg127894
2011年02月04日 14:04:58dsm001setnosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens
messages: + msg127893
2011年02月04日 14:02:28dsm001setnosy: + dsm001
messages: + msg127892
2011年02月04日 13:53:36amaury.forgeotdarcsetnosy: + amaury.forgeotdarc, pitrou
messages: + msg127889
2011年02月04日 10:40:18Laurenssetfiles: + tell v01.py
nosy: eric.smith, Laurens
messages: + msg127881
2011年02月04日 09:39:55eric.smithsetnosy: + eric.smith
messages: + msg127877
2011年02月04日 08:52:36Laurenscreate

AltStyle によって変換されたページ (->オリジナル) /