This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: timeit accuracy could be better
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: Nosy List: rbcollins, serhiy.storchaka, tim.peters, vstinner
Priority: normal Keywords:

Created on 2015-03-17 22:34 by rbcollins, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg238353 - Author: Robert Collins (rbcollins) (Python committer) Date: 2015-03-17 22:34
In #6422 Haypo suggested making the timeit reports much better. This is a new ticket just for that. See https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py and http://bugs.python.org/issue6422#msg164216
msg238361 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-03-17 23:29
See also issue21988.
msg238364 - Author: STINNER Victor (vstinner) (Python committer) Date: 2015-03-17 23:53
Not only am I too lazy to compute the number of loops and repeats manually, but I also don't trust myself. It's even worse when someone publishes the results of a micro-benchmark: I don't trust how the benchmark was calibrated. In my experience, micro-benchmarks are polluted by noise in timings, so results are not reliable.
benchmark.py's calibration is based on time, whereas timeit uses hardcoded constants (loops=1000000, repeat=3) which can be modified on the command line.
benchmark.py has 3 main parameters:
- minimum duration of a single run (--min-time): 100 ms by default
- maximum total duration of the benchmark: 1 second by default; benchmark.py does its best to respect this limit, but a run can exceed it
- minimum number of repeats: 5 by default
The minimum duration is increased if the clock resolution is bad (1 ms or more); this is the case on Windows with time.clock() on Python 2, for example. Extract of benchmark.py:
 min_time = max(self.config.min_time, timer_precision * 100)
The estimation of the number of loops is not reliable, but it's written to be fast: since I run a micro-benchmark many times, I don't want to wait too long. The loop count is not a power of 10 but an arbitrary integer, so running benchmark.py multiple times usually gives a different number of loops each time. That's not really a big issue, but it probably makes results more difficult to compare.
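For illustration, here is a minimal sketch of this time-based calibration idea in Python (the helper name and structure are my own, not benchmark.py's actual code):

    import time

    def calibrate_loops(func, min_time=0.1):
        # Hypothetical helper: grow the loop count until one timed run
        # of func() lasts at least min_time seconds.
        loops = 1
        while True:
            t0 = time.perf_counter()
            for _ in range(loops):
                func()
            elapsed = time.perf_counter() - t0
            if elapsed >= min_time:
                # Scale the count so a run lands near min_time; the result
                # is an arbitrary integer, not a power of 10.
                return max(1, int(loops * min_time / elapsed))
            loops *= 10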
My constraint is max_time. The tested function may not have a linear duration (time = time_one_iteration * loops).
https://bitbucket.org/haypo/misc/src/348bfd6108e9985b3c2298d2745eb5ddfe7042e6/python/benchmark.py?at=default#cl-416
Repeating a test at least 5 times is a compromise between the stability of the result and the total duration of the benchmark.
Feel free to reuse my code to enhance timeit.py.
msg268067 - Author: STINNER Victor (vstinner) (Python committer) Date: 2016-06-09 23:07
Hi,
I am developing a new implementation of timeit which should be more reliable:
http://perf.readthedocs.io/en/latest/
* Run 25 processes instead of just 1
* Compute average and standard deviation rather than the minimum
* Don't disable the garbage collector
* Skip the first timing to "warm up" the benchmark
Using the minimum and disabling the garbage collector are bad practices; they are not reliable:
* multiple processes are needed to test different hash randomization seeds, since Python's hash function is randomized by default in Python 3
* Linux also randomizes the address space by default (ASLR), so the exact timing of memory accesses is different in each process
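A minimal sketch of the multi-process averaging idea, using only the stdlib (the timed statement and run counts are arbitrary examples, not perf's actual code):

    import statistics
    import subprocess
    import sys

    # Each child process gets its own hash seed and ASLR layout, so we
    # time one run per process and aggregate with mean/stdev instead of
    # taking the minimum of a single process.
    CMD = [sys.executable, "-c",
           "import timeit; print(timeit.timeit('sorted(range(100))', number=10000))"]

    timings = [float(subprocess.check_output(CMD).decode()) for _ in range(25)]
    print("mean: %.6f s, stdev: %.6f s"
          % (statistics.mean(timings), statistics.stdev(timings)))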
My blog post "My journey to stable benchmark, part 3 (average)" explains in depth the multiple issues with using the minimum:
https://haypo.github.io/journey-to-stable-benchmark-average.html
My perf module is very young and still a work in progress, but it should already be better than timeit. It works on Python 2.7 and 3 (I tested 3.4).
We may pick the best ideas and fold them into the timeit module.
See also my article explaining how to tune Linux to reduce the "noise" of the operating system on microbenchmarks:
https://haypo.github.io/journey-to-stable-benchmark-system.html 
msg279959 - Author: STINNER Victor (vstinner) (Python committer) Date: 2016-11-03 01:59
I wrote a whole new project, "perf", to fix the root causes of this issue. It includes a timeit command. I suggest using "perf timeit" rather than "timeit" because perf is more reliable:
http://perf.readthedocs.io/en/latest/cli.html#timeit 
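For example, a typical invocation looks like this (the statement is an arbitrary example; see the linked page for the exact options):

    python3 -m perf timeit '" ".join(str(n) for n in range(100))'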
History
Date | User | Action | Args
2022-04-11 14:58:14 | admin | set | github: 67881
2016-11-03 01:59:53 | vstinner | set | status: open -> closed; resolution: third party; messages: + msg279959
2016-06-10 23:54:52 | gvanrossum | set | nosy: - gvanrossum
2016-06-10 17:06:43 | rhettinger | set | nosy: + gvanrossum, tim.peters
2016-06-09 23:07:33 | vstinner | set | messages: + msg268067
2015-03-17 23:53:27 | vstinner | set | messages: + msg238364
2015-03-17 23:29:50 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg238361
2015-03-17 22:34:37 | rbcollins | create |
