This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011年02月04日 08:52 by Laurens, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| tell v01.py | Laurens, 2011年02月04日 10:40 | example program | ||
| textiotell.patch | pitrou, 2011年02月04日 18:10 | |||
| textiotell2.patch | pitrou, 2011年02月04日 20:06 | |||
| Messages (15) | |||
|---|---|---|---|
| msg127874 - (view) | Author: Laurens (Laurens) | Date: 2011年02月04日 08:52 | |
file.tell() has become extremely slow in version 3.2, both rc1 and rc2. This problem did not exist in version 2.7.1, nor in version 3.1. It could be reproduced both on mac and windows xp. |
|||
| msg127877 - (view) | Author: Eric V. Smith (eric.smith) * (Python committer) | Date: 2011年02月04日 09:39 | |
Do you have a benchmark program you can post? |
|||
| msg127881 - (view) | Author: Laurens (Laurens) | Date: 2011年02月04日 10:40 | |
Correction: the problem also exists in version 3.1. I created a benchmark program an ran it on my machine (iMac, snow leopard 10.6), with the following results: ------------------------------------------ 2.6.6 (r266:84292, Dec 30 2010, 09:20:14) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0009 s. ------------------------------------------ 2.7.1 (r271:86832, Jan 13 2011, 07:38:03) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0008 s. ------------------------------------------ 3.1.3 (r313:86882M, Nov 30 2010, 09:55:56) [GCC 4.0.1 (Apple Inc. build 5494)] result: 9.5682 s. ------------------------------------------ 3.2rc2 (r32rc2:88269, Jan 30 2011, 14:30:28) [GCC 4.2.1 (Apple Inc. build 5664)] result: 8.3531 s. Removing the line containing "tell" gives the following results: ------------------------------------------ 2.6.6 (r266:84292, Dec 30 2010, 09:20:14) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0007 s. ------------------------------------------ 2.7.1 (r271:86832, Jan 13 2011, 07:38:03) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0006 s. ------------------------------------------ 3.1.3 (r313:86882M, Nov 30 2010, 09:55:56) [GCC 4.0.1 (Apple Inc. build 5494)] result: 0.0093 s. ------------------------------------------ 3.2rc2 (r32rc2:88269, Jan 30 2011, 14:30:28) [GCC 4.2.1 (Apple Inc. build 5664)] result: 0.0007 s. (Apparently, reading a file became a lot faster from 3.1 to 3.2.) Conclusion: Execution of file.tell() makes the program about 10000 times slower. Remark: the file mdutch.txt is a dummy text file containing 1000 lines with one word on each line. |
|||
| msg127889 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2011年02月04日 13:53 | |
I found that adding "infile._CHUNK_SIZE = 20" makes the test much faster - 'only' 5 times slower than 2.7. |
|||
| msg127892 - (view) | Author: DSM (dsm001) | Date: 2011年02月04日 14:02 | |
With a similar setup (OS X 10.6) I see the same problem. It seems to go away if the file is opened in binary mode for reading. @Laurens, can you confirm? |
|||
| msg127893 - (view) | Author: DSM (dsm001) | Date: 2011年02月04日 14:04 | |
(By "go away" I mean "stop being pathological", not "stop differing": I still see a factor of 2.) |
|||
| msg127894 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月04日 14:10 | |
That's expected. seek() and tell() on text (unicode) files are slow by construction. You should open your file in binary mode instead, if you want to do any seeking. Maybe I should add a note in http://docs.python.org/dev/library/io.html#performance |
|||
| msg127901 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月04日 16:26 | |
That said, I think it is possible to make algorithmic improvements to TextIOWrapper.tell() so that at least performance becomes acceptable. |
|||
| msg127907 - (view) | Author: Laurens (Laurens) | Date: 2011年02月04日 16:54 | |
First of all, thanks to all for your cooperation, it is very much appreciated. I made some minor changes to the benchmark program. Conclusions are: * setting file._CHUNK_SIZE to 20 has a dramatic effect, changing execution time in 3.2rc2 from 8.4s to 0.06s, so more than a factor 100 improvement. It's a bit clumsy, but acceptable as a performance work around. * opening file binary has a dramatic effect as well, I would say. After 2 minutes I stopped execution of the program, concluding that this change made the program at least a factor 10 *slower* instead of faster. So I cannot confirm DSM's statement that the performance hit would be a factor 2. Instead, I see a performance hit of at least a factor 100000 (10e5) compared tot 2.7.1, which is presumably not by construction ;-). |
|||
| msg127908 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月04日 16:57 | |
> opening file binary has a dramatic effect as well, I would say. After 2 > minutes I stopped execution of the program Hint: b'' is not equal to '' ;) |
|||
| msg127914 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月04日 18:10 | |
Here is a proof-of-concept patch for the pure Python version of TextIOWrapper.tell(). It turns the O(CHUNK_SIZE) operation into an O(1) operation most of time (still O(CHUNK_SIZE) worst-case - weird decoders and/or crazy input). |
|||
| msg127915 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月04日 18:19 | |
> Here is a proof-of-concept patch for the pure Python version of > TextIOWrapper.tell(). It turns the O(CHUNK_SIZE) operation into an > O(1) operation most of time (still O(CHUNK_SIZE) worst-case - weird > decoders and/or crazy input). Actually, that's wrong. The patch is still O(CHUNK_SIZE) but with a vastly different multiplier (and, optimistically, much smaller, as common codecs are codecs in C). |
|||
| msg127933 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月04日 20:06 | |
New patch also optimizing the C version. tell() can be more than 100x faster now (still much slower than binary tell()). |
|||
| msg127934 - (view) | Author: Laurens (Laurens) | Date: 2011年02月04日 20:09 | |
All, thanks for your help. Opening the file in binary mode worked immediately in the toy program (that is, the benchmark code I sent you). (Antoine, thanks for the hint.) In my real world program, I solved the problem by reading a line from a binary input file, and decode it explicitly and immediately to a string. The performance has become acceptable now, and I propose to close this issue. Thanks again, cheers, Laurens |
|||
| msg129418 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年02月25日 20:29 | |
Committed in r88607 (3.3). |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:12 | admin | set | github: 55323 |
| 2011年02月25日 20:29:37 | pitrou | set | status: open -> closed nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg129418 resolution: accepted -> fixed stage: patch review -> resolved |
| 2011年02月04日 20:09:29 | Laurens | set | resolution: accepted messages: + msg127934 nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens |
| 2011年02月04日 20:06:01 | pitrou | set | files:
+ textiotell2.patch messages: + msg127933 nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens stage: needs patch -> patch review |
| 2011年02月04日 18:20:00 | pitrou | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127915 |
| 2011年02月04日 18:10:04 | pitrou | set | files:
+ textiotell.patch messages: + msg127914 keywords: + patch nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens |
| 2011年02月04日 16:57:11 | pitrou | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127908 |
| 2011年02月04日 16:54:35 | Laurens | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127907 |
| 2011年02月04日 16:26:08 | pitrou | set | versions:
+ Python 3.3, - Python 3.2 nosy: amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens title: file.tell extremely slow -> TextIOWrapper.tell extremely slow messages: + msg127901 type: performance stage: needs patch |
| 2011年02月04日 14:10:49 | pitrou | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127894 |
| 2011年02月04日 14:04:58 | dsm001 | set | nosy:
amaury.forgeotdarc, pitrou, dsm001, eric.smith, Laurens messages: + msg127893 |
| 2011年02月04日 14:02:28 | dsm001 | set | nosy:
+ dsm001 messages: + msg127892 |
| 2011年02月04日 13:53:36 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc, pitrou messages: + msg127889 |
| 2011年02月04日 10:40:18 | Laurens | set | files:
+ tell v01.py nosy: eric.smith, Laurens messages: + msg127881 |
| 2011年02月04日 09:39:55 | eric.smith | set | nosy:
+ eric.smith messages: + msg127877 |
| 2011年02月04日 08:52:36 | Laurens | create | |