
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Optimize new io library
Type: performance
Stage: test needed
Components: Extension Modules, Library (Lib)
Versions: Python 3.0, Python 3.1

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: Rewrite the IO stack in C (issue 4565)
Assigned To:
Nosy List: amaury.forgeotdarc, beazley, christian.heimes, donmez, georg.brandl, giampaolo.rodola, pitrou, rhettinger, wplappert
Priority: critical
Keywords: patch

Created on 2008-12-06 12:14 by christian.heimes, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name                  Uploaded                            Description
count_linenendings2.patch  christian.heimes, 2008-12-06 22:05
test_write.txt             christian.heimes, 2008-12-06 22:31
Messages (30)
msg77116 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-06 12:14
The new io library needs some serious profiling and optimization work.
I've already fixed a severe slowdown in _fileio.FileIO's read buffer
allocation algorithm (#4533).
More profiling tests have shown a speed problem with write() for files
opened in text mode. For example, three str.count() calls take up 20% of
the time. The str.count() calls can be replaced with an optimized C
function that returns the counts of \r\n, \n and \r in one pass instead
of three.
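A rough pure-Python sketch of the one-pass idea (the actual patch does this
in C; the function name and signature below are illustrative only):

    def count_line_endings(s):
        # Count '\r\n', lone '\r' and lone '\n' in a single pass over s,
        # instead of three separate str.count() scans.
        crlf = cr = lf = 0
        i, n = 0, len(s)
        while i < n:
            c = s[i]
            if c == '\r':
                if i + 1 < n and s[i + 1] == '\n':
                    crlf += 1
                    i += 2
                    continue
                cr += 1
            elif c == '\n':
                lf += 1
            i += 1
        return crlf, cr, lf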
msg77117 - (view) Author: David M. Beazley (beazley) Date: 2008-12-06 12:30
I've done some profiling and the performance of reading line-by-line is 
considerably worse in Python 3 than in Python 2. For example, this 
code:
for line in open("somefile.txt"):
 pass
Ran 35 times slower in Python 3.0 than Python 2.6 when I tested it on a 
big text file (100 Megabytes). If you disable Unicode by opening the 
file in binary mode, it runs even slower.
This slowdown is really unacceptable for anyone who uses Python for 
parsing big non-Unicode text files (and I would claim that there are 
many such people).
msg77118 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-06 12:32
Your issue is most likely caused by #4533. Please download the latest svn
version of Python 3.0 (branches/release30-maint) and try again.
msg77120 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-06 13:22
Here is a patch against the py3k branch that reduces the time for the
line ending detection from 0.55s to 0.22s for a 50MB file on my test
system.
msg77124 - (view) Author: David M. Beazley (beazley) Date: 2008-12-06 13:42
Tried this using projects/python/branches/release30-maint and using the 
patch that was just attached. With a 66MB input file, here are the 
results of this code fragment:
for line in open("BIGFILE):
 pass
Python 2.6: 0.67s
Python 3.0: 32.687s (48 times slower)
This is running on a MacBook with a warm disk cache. For what it's 
worth, I didn't see any improvement with that patch.
msg77125 - (view) Author: David M. Beazley (beazley) Date: 2008-12-06 13:45
Just as one other followup, if you change the code in the last example 
to use binary mode like this:
for line in open("BIG","rb"):
 pass
You get the following results:
Python 2.6: 0.64s
Python 3.0: 42.26s (66 times slower)
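A minimal harness for reproducing this kind of comparison (the file name is
a placeholder; run the same script under each interpreter being compared):

    import time

    FILENAME = "BIG"  # placeholder: any large text file

    def time_line_iteration(mode):
        # Iterate over the file line by line and return the wall-clock time.
        start = time.time()
        with open(FILENAME, mode) as f:
            for line in f:
                pass
        return time.time() - start

    print("text mode:   %.2fs" % time_line_iteration("r"))
    print("binary mode: %.2fs" % time_line_iteration("rb"))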
msg77129 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-12-06 14:33
David, the reading bug fix/optimization is not (yet?) on
release30-maint, only on branches/py3k.
msg77131 - (view) Author: David M. Beazley (beazley) Date: 2008-12-06 14:57
Just checked it with branches/py3k and the performance is the same.
msg77145 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-06 16:51
What's your OS, David? Please post the output of "uname -r" and ./python
-c "import sys; print(sys.version)"
msg77160 - (view) Author: David M. Beazley (beazley) Date: 2008-12-06 18:45
bash-3.2$ uname -a
Darwin david-beazleys-macbook.local 9.5.1 Darwin Kernel Version 9.5.1: Fri 
Sep 19 16:19:24 PDT 2008; root:xnu-1228.8.30~1/RELEASE_I386 i386
bash-3.2$ ./python.exe -c "import sys; print(sys.version)"
3.1a0 (py3k:67609, Dec 6 2008, 08:47:06) 
[GCC 4.0.1 (Apple Inc. build 5465)]
bash-3.2$
msg77172 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-06 22:05
I've updated the patch with proper formatting, some minor cleanups and a
unit test.
msg77176 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-06 22:19
I don't think this is a public API, so the function should probably be
renamed _count_lineendings.
Also, are there some benchmark numbers?
msg77177 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-06 22:30
I'll come up with some reading benchmarks tomorrow. For now here is a
benchmark of write(). You can clearly see the excessive usage of closed,
len() and isinstance().
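One way such numbers can be gathered is with cProfile; a sketch along those
lines (the file name and line count below are arbitrary):

    import cProfile
    import pstats

    def write_lines(path, n=100000):
        # Write many short lines through the text I/O layer so per-call
        # overhead (closed checks, len(), isinstance()) dominates the profile.
        with open(path, "w") as f:
            for i in range(n):
                f.write("line %d\r\n" % i)

    cProfile.run("write_lines('test_write_demo.txt')", "write.prof")
    pstats.Stats("write.prof").sort_stats("cumulative").print_stats(20)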
msg77178 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-06 22:31
Roundup doesn't display .log files as plain text files.
msg77762 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-13 19:49
Christian, by benchmarks I meant a measurement of text reading with and
without the patch.
msg77884 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-15 22:27
I've written a small file IO benchmark, available here:
http://svn.python.org/view/sandbox/trunk/iobench/
It runs under both 2.6 and 3.x, so that we can compare speeds of
respective implementations.
msg77894 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-16 01:23
Without Christian's patch:
[400KB.txt] read one byte/char at a time... 0.2685 MB/s (100% CPU)
[400KB.txt] read 20 bytes/chars at a time... 4.536 MB/s (98% CPU)
[400KB.txt] read one line at a time... 3.805 MB/s (99% CPU)
[400KB.txt] read 4096 bytes/chars at a time... 29.23 MB/s (100% CPU)
[ 20KB.txt] read whole contents at once... 52.42 MB/s (99% CPU)
[400KB.txt] read whole contents at once... 45.83 MB/s (100% CPU)
[ 10MB.txt] read whole contents at once... 48.78 MB/s (99% CPU)
With the patch:
[400KB.txt] read one byte/char at a time... 0.2761 MB/s (100% CPU)
[400KB.txt] read 20 bytes/chars at a time... 4.656 MB/s (99% CPU)
[400KB.txt] read one line at a time... 3.956 MB/s (98% CPU)
[400KB.txt] read 4096 bytes/chars at a time... 33.85 MB/s (100% CPU)
[ 20KB.txt] read whole contents at once... 66.17 MB/s (99% CPU)
[400KB.txt] read whole contents at once... 56.65 MB/s (99% CPU)
[ 10MB.txt] read whole contents at once... 63.69 MB/s (99% CPU)
Python 2.6's builtin file object:
[400KB.txt] read one byte/char at a time... 1.347 MB/s (97% CPU)
[400KB.txt] read 20 bytes/chars at a time... 26.65 MB/s (99% CPU)
[400KB.txt] read one line at a time... 184.4 MB/s (100% CPU)
[400KB.txt] read 4096 bytes/chars at a time... 1163 MB/s (99% CPU)
[ 20KB.txt] read whole contents at once... 1072 MB/s (100% CPU)
[400KB.txt] read whole contents at once... 889.1 MB/s (100% CPU)
[ 10MB.txt] read whole contents at once... 600 MB/s (100% CPU)
msg77895 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-12-16 05:12
I'm getting caught up with the IO changes in 3.0 and am a bit confused.
 The PEP says, "programmers who don't want to muck about in the new I/O
world can expect that the open() factory method will produce an object
backwards-compatible with old-style file objects." So, I would have
expected that the old implementation could have remained in-place and
the resultant object registered as matching the appropriate IO ABC. If
that had been done, the performance would be unchanged. Does anyone
know why the entire old implementation had to be thrown out in cases
where the API was unchanged? Is there anything about New IO that is
fundamentally different so that the old implementation had to be tossed
in all cases?
msg77896 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-12-16 08:55
The previous implementation only returns bytes and does not translate
newlines. For this particular case, indeed, the plain old FILE* based
object is faster.
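A small illustration of the difference under py3k (the file name is made up;
the text layer decodes and translates newlines, the binary layer returns raw
bytes):

    with open("demo.txt", "wb") as f:
        f.write(b"one\r\ntwo\r\n")

    with open("demo.txt", "rb") as f:   # binary: raw bytes, no translation
        print(f.read())                 # b'one\r\ntwo\r\n'

    with open("demo.txt", "r") as f:    # text: decoded, universal newlines
        print(f.read())                 # 'one\ntwo\n'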
msg77900 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-16 13:19
I seem to recall one of the design principles of the new IO stack was to
avoid relying on the C stdlib's buffered API, which has too many
platform-dependent behaviours.
In any case, binary reading has acceptable performance in py3k (although
3x-4x slower than in 2.x); it's text I/O that is truly horrendous.
msg77901 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-12-16 13:26
I don't agree that that was a worthy design goal. Tons of code
(including the old CPython) successfully used the stdlib functions. IMO,
a 3x or 4x falloff for binary reads/writes is a significant disincentive
for adopting Py3.0. For binary reads/writes, I would like to see the
open() factory function return the old, fast object instead of trying to
slowly simulate it without adding any benefits noticeable to an end-user.
IMO, it's a case of practicality beating purity.
msg77903 - (view) Author: David M. Beazley (beazley) Date: 2008-12-16 13:47
I agree with Raymond. For binary reads, I'll go farther and say that 
even a 10% slowdown in performance would be surprising if not 
unacceptable to some people. I know that as hard as it might be for 
everyone to believe, there are a lot of people who crank lots of non-
Unicode data with Python. In fact, Python 2.X is pretty good at it. 
It's fine that text mode now uses Unicode, but if I don't want that, I 
would certainly expect the binary file modes to run at virtually the 
same speed as Python 2 (e.g., okay, they work with bytes instead of 
strings, but is the bytes type really all that different from the old 
Python 2 str type?).
msg77904 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-16 13:50
> I don't agree that that was a worthy design goal.
I don't necessarily agree either, but it's probably too late now.
The py3k buffered IO object has additional methods (e.g. peek(),
read1()) which can be used by upper layers (text IO) and so can't be
replaced with the old 2.x file object.
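For illustration, the extra methods on the py3k buffered layer can be used
like this (the file name is a placeholder):

    with open("somefile.bin", "rb") as f:   # f is an io.BufferedReader
        head = f.peek(4)     # inspect buffered bytes without consuming them
        first = f.read1(4)   # at most one raw read(); returns up to 4 bytes
        rest = f.read()      # normal buffered read of the remainder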
In any case, Amaury has started rewriting the IO lib in C (*) and
getting good binary IO performance shouldn't be too difficult.
(*) http://svn.python.org/view/sandbox/trunk/io-c/ 
msg77906 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-16 13:56
> I know that as hard as it might be for 
> everyone to believe, there are a lot of people who crank lots of non-
> Unicode data with Python.
But "cranking data" implies you'll do something useful with it, and
therefore spend CPU time doing those useful things (likely much more CPU
time than you spent read()ing the data in the first place).
In any case, you can try to open your file in unbuffered mode:
 open("foobar", "rb", buffering=0)
it will bypass the Python buffering layer and will go directly to the
raw C unbuffered object.
> (e.g., okay, they work with bytes instead of 
> strings, but is the bytes type really all that different from the old 
> Python 2 str type?)
No. It's a bit more limited, doesn't support autoconversion to/from
unicode, but that's all.
msg77910 - (view) Author: David M. Beazley (beazley) Date: 2008-12-16 14:07
Good luck with that. Most people who get bright ideas such as "gee, 
maybe I'll write my own version of X" where "X" is some part of the 
standard C library pertaining to I/O, end up fighting a losing battle. 
Of course, I'd love to be proven wrong, but I don't think I will in this 
case. 
As for cranking data, that does not necessarily imply heavy-duty CPU 
processing. Someone might be reading large datafiles simply to perform 
some kind of data extraction, filtering, minor translation, or other 
operation that is easily carried out in Python, but where the programs 
are still I/O bound. For example, the kinds of processing one might 
otherwise do using awk, sed, perl, etc.
msg77912 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-12-16 14:20
David:
Amaury's work is going to be part of the standard library as soon as it
is done. I'm confident that we can reach the old speed of the
2.x file type by carefully moving code to C modules.
msg77915 - (view) Author: David M. Beazley (beazley) Date: 2008-12-16 15:58
I wish I shared your optimism about this, but I don't. Here's a short 
explanation why.
The problem of I/O and the associated interface between hardware, the 
operating system kernel, and user applications is one of the most 
fundamental and carefully studied problems in all of computer systems. 
The C library and its associated I/O functionality provide the user-
space implementation of this interface. However, if you peel the covers 
off of the C library, you're going to find a lot of really hairy stuff 
in there. Examples might include:
1. Low-level optimization related to the system hardware (processor 
architecture, caching, I/O bus, etc.). 
2. Hand-written finely tuned assembly code. 
3. Low-level platform-specific system calls such as ioctl().
4. System calls related to shared memory regions, kernel buffers, etc. 
(i.e., optimizations that try to eliminate buffer copies). 
5. Undocumented vendor-specific "proprietary" system calls (i.e., 
unknown "magic").
So, you'll have to forgive me for being skeptical, but I just don't 
think any programmer is going to sit down and bang out a new 
implementation of buffered I/O that is going to match the performance of 
what's provided by the C library.
Again, I would love to be proven wrong.
msg77917 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-16 18:23
[...]
Although I agree all this is important, I'd challenge the assumption it
has its place in the buffered IO library rather than in lower-level
layers (i.e. kernel & userspace unbuffered IO).
In any case, it will be difficult to undo the current design decisions
(however misguided they may or may not be) of the py3k IO library and
we'll have to make the best out of them!
msg78097 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-12-20 12:47
We can't solve this for 3.0.1, downgrading to critical.
msg80094 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-18 12:33
Marking this as a duplicate of #4565 "Rewrite the IO stack in C".
History
Date                 User                Action  Args
2022-04-11 14:56:42  admin               set     github: 48811
2009-01-18 12:33:27  pitrou              set     status: open -> closed; resolution: duplicate; superseder: Rewrite the IO stack in C; messages: + msg80094
2008-12-20 12:47:04  pitrou              set     priority: release blocker -> critical; messages: + msg78097
2008-12-20 02:42:06  loewis              set     priority: deferred blocker -> release blocker
2008-12-16 18:23:08  pitrou              set     messages: + msg77917
2008-12-16 15:58:23  beazley             set     messages: + msg77915
2008-12-16 14:20:55  christian.heimes    set     messages: + msg77912
2008-12-16 14:07:39  beazley             set     messages: + msg77910
2008-12-16 13:56:15  pitrou              set     messages: + msg77906
2008-12-16 13:50:35  pitrou              set     messages: + msg77904
2008-12-16 13:47:53  beazley             set     messages: + msg77903
2008-12-16 13:26:59  rhettinger          set     messages: + msg77901
2008-12-16 13:19:58  pitrou              set     messages: + msg77900
2008-12-16 08:55:19  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg77896
2008-12-16 05:12:05  rhettinger          set     nosy: + rhettinger; messages: + msg77895
2008-12-16 01:23:11  pitrou              set     messages: + msg77894
2008-12-15 22:27:59  pitrou              set     messages: + msg77884
2008-12-13 19:49:49  pitrou              set     messages: + msg77762
2008-12-10 20:40:53  donmez              set     nosy: + donmez
2008-12-10 08:24:53  loewis              set     priority: release blocker -> deferred blocker
2008-12-07 08:04:56  wplappert           set     nosy: + wplappert
2008-12-06 23:35:47  barry               set     priority: high -> release blocker
2008-12-06 22:31:18  christian.heimes    set     files: + test_write.txt; messages: + msg77178
2008-12-06 22:30:48  christian.heimes    set     files: - test_write.log
2008-12-06 22:30:28  christian.heimes    set     files: + test_write.log; messages: + msg77177
2008-12-06 22:19:33  pitrou              set     messages: + msg77176
2008-12-06 22:05:25  christian.heimes    set     files: - count_linenendings.patch
2008-12-06 22:05:18  christian.heimes    set     files: + count_linenendings2.patch; messages: + msg77172
2008-12-06 21:40:50  pitrou              set     nosy: + pitrou
2008-12-06 21:36:39  giampaolo.rodola    set     nosy: + giampaolo.rodola
2008-12-06 18:45:33  beazley             set     messages: + msg77160
2008-12-06 16:51:29  christian.heimes    set     messages: + msg77145
2008-12-06 14:57:25  beazley             set     messages: + msg77131
2008-12-06 14:33:02  georg.brandl        set     nosy: + georg.brandl; messages: + msg77129
2008-12-06 13:45:46  beazley             set     messages: + msg77125
2008-12-06 13:42:18  beazley             set     messages: + msg77124
2008-12-06 13:22:08  christian.heimes    set     files: + count_linenendings.patch; keywords: + patch; messages: + msg77120
2008-12-06 12:32:29  christian.heimes    set     messages: + msg77118
2008-12-06 12:30:26  beazley             set     nosy: + beazley; messages: + msg77117
2008-12-06 12:14:51  christian.heimes    create
