I am learning about timing analysis in Python. I used the time module before it was pointed out that the timeit module would be better for timing analysis.
My main focus is conducting many test cases on 2 variations of the same function. In the example code below I used 2 variations of a function for reversing a number.
Can the timing analysis be done in a better way? Is the method scalable to many test cases?
def reverse_num(num):
    '''returns the reverse of an integer'''
    if num < 0:
        return -int(str(-num)[::-1])
    else:
        return int(str(num)[::-1])
def reverse_num2(num):
    rev = 0
    while num > 0:
        rev *= 10
        rev += num % 10
        num //= 10  # floor division (behaves the same on Python 2 and 3)
    return rev
if __name__ == "__main__":
    from timeit import Timer

    def test(f):
        for i in xrange(1000):
            f(i)

    print Timer(lambda: test(reverse_num)).timeit(number = 100)
    print Timer(lambda: test(reverse_num2)).timeit(number = 100)
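Since I asked about scalability: one variation I considered (just a sketch, reusing the Timer import and the test helper defined above) is to drive the comparison with a loop instead of one print per function, so adding another variant only means adding one name to the tuple:
for f in (reverse_num, reverse_num2):
    # f.__name__ labels each timing; Timer is called exactly as before
    print f.__name__, Timer(lambda: test(f)).timeit(number = 100)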
I am posting sample test runs of my usage of the timeit module for timing the following code, which I wrote for this question.
def mysort(words):
    mylist1 = sorted([i for i in words if i[:1] == "s"])
    mylist2 = sorted([i for i in words if i[:1] != "s"])
    return mylist1 + mylist2
def mysort3(words):
    ans = []
    p = ans.append
    q = words.remove
    words.sort()
    for i in words[:]:
        if i[0] == 's':
            p(i)
            q(i)
    return ans + words
def mysort4(words):
    ans1 = []
    ans2 = []
    p = ans1.append
    q = ans2.append
    for i in words:
        if i[0] == 's':
            p(i)
        else:
            q(i)
    ans1.sort()
    ans2.sort()
    return ans1 + ans2
def mysort6(words):
    return (sorted([i for i in words if i[:1] == "s"]) +
            sorted([i for i in words if i[:1] != "s"]))
if __name__ == "__main__":
    from timeit import Timer

    def test(f):
        f(['a', 'b', 'c', 'abcd', 's', 'se', 'ee', 'as'])

    print Timer(lambda: test(mysort)).timeit(number = 10000)
    print Timer(lambda: test(mysort3)).timeit(number = 10000)
    print Timer(lambda: test(mysort4)).timeit(number = 10000)
    print Timer(lambda: test(mysort6)).timeit(number = 10000)
The timing results are
>>> ================================ RESTART ================================
>>>
0.0643831414457
0.0445699517515
0.0446611241927
0.0616633662243
>>> ================================ RESTART ================================
>>>
0.0625827896417
0.045101588386
0.0443568108447
0.0607123363607
>>> ================================ RESTART ================================
>>>
0.0647336488305
0.0596154305912
0.0445711673841
0.0614101094434
>>> ================================ RESTART ================================
>>>
0.0689924148581
0.0502542495475
0.0443466805735
0.0903267660811
>>> ================================ RESTART ================================
>>>
0.0695374234506
0.0579001730656
0.0443790974415
0.0835670386907
>>> ================================ RESTART ================================
>>>
0.0675612101379
0.05079925814
0.044170413854
0.0681050030978
As you can see, the results vary from run to run. I understand that the individual times are small, but that is the whole purpose of number = 10000 in timeit: to measure consistently by repeating the timing many times and taking the average or the best. In most runs the 2nd function takes more time than the 3rd, but sometimes it doesn't. The 4th function sometimes takes more time than the 1st and sometimes less. I think my usage is correct, so why these variable results? Is this the proper way of doing timing or not? I am really confused about this.
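One mitigation I found in the timeit documentation (a sketch, reusing the test helper and mysort from above): Timer.repeat() re-runs the whole timeit() measurement several times, and the docs suggest looking at the min of the results rather than the mean, since higher values are usually caused by other processes rather than by Python itself.
from timeit import Timer

# repeat() returns a list of 5 timings; the min is the run that
# was least disturbed by other activity on the machine.
t = Timer(lambda: test(mysort))
print(min(t.repeat(repeat=5, number=10000)))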
After some discussion in the chat room and some more thinking, I came up with the following for testing functions that take integer inputs. Additional nested functions can be defined for more tests, and the number of repetitions can be increased. It may take some time, but it gives good results as far as I can see. For variations of the same function, only the corresponding function calls need to be added.
def testing():
    from timeit import Timer
    import random

    def tests(f, times):
        def test1(f):
            f(random.randint(1, 1000))

        def test2(f):
            f(random.randint(100000, 1000000))

        print(f.__name__)
        print(Timer(lambda: test1(f)).timeit(number = times))
        print(Timer(lambda: test2(f)).timeit(number = times // 10))
        print()

    tests(reverse_num, 10000)
    tests(reverse_num2, 10000)
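One caveat I noticed with this harness: random.randint is called inside test1 and test2, so its cost is included in every measured call. A variant that pre-generates the inputs (a sketch; tests_pregen is just my own name for it) keeps the random number generation out of the timing:
def tests_pregen(f, times):
    from timeit import Timer
    import random
    # Generate all inputs up front so random.randint itself is not
    # part of what the Timer measures.
    small = [random.randint(1, 1000) for _ in range(times)]
    large = [random.randint(100000, 1000000) for _ in range(times // 10)]

    def run(nums):
        for n in nums:
            f(n)

    print(f.__name__)
    print(Timer(lambda: run(small)).timeit(number = 10))
    print(Timer(lambda: run(large)).timeit(number = 10))
    print()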
Please note that I started using Python 3 while I was still discussing the question. I won't be using anything specific to Python 3, so all the code can be run on Python 2; anyone wanting to run the newer code will just need to change the print statements. Nothing else.
- "My usage of timeit module gave variable results on every run." How many tests/how long runs were they? What was the relative difference? – Lstor, Jul 17, 2013 at 11:47
- Updated the question. – Aseem Bansal, Jul 17, 2013 at 14:20
1 Answer
Software timing benchmarks are notoriously noisy because your computer isn't a very good controlled experimental platform.
- Some algorithms are nondeterministic. For instance, some quicksort implementations choose a random pivot element.
- While you're running your benchmarks, your Python process may get preempted by other running applications, operating system threads, etc.
- Even if your process got the same amount of CPU during every run, the timing may be different because a different mix of applications is running so your processor's memory caches have different hit/miss patterns.
- Some runs may get better branch prediction than others (again, due to the mix of other processes running). This will cause more pipeline stalls.
I suspect the effects above are listed in order of decreasing strength (except that I believe Python's sort algorithm is deterministic, so the first effect is not a factor here). I also suspect that I've just scratched the surface.
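You can see this noise directly by repeating an identical measurement several times and looking at the spread; a sketch using timeit.repeat (the statement being timed is an arbitrary example):
from timeit import repeat

# Time the same statement 10 times; the gap between min and max is
# the measurement noise from the effects listed above, and the min
# is the least-disturbed run.
results = repeat(stmt="int(str(12345)[::-1])", repeat=10, number=100000)
print min(results), max(results)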
EDIT:
OK, for reversing a number: suppose that your typical input will be 5 digits; then you can use this.
if __name__ == "__main__":
    from timeit import Timer
    import random

    def test(f):
        f(random.randint(10000, 99999))

    print Timer(lambda: test(reverse_num)).timeit(number = 10000000)
    print Timer(lambda: test(reverse_num2)).timeit(number = 10000000)
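And if you want to compare behavior on small vs. large inputs, the same pattern extends naturally; a sketch with arbitrary 2-digit and 10-digit ranges:
from timeit import Timer
import random

def test_small(f):
    f(random.randint(10, 99))             # 2-digit inputs

def test_large(f):
    f(random.randint(10**9, 10**10 - 1))  # 10-digit inputs

for f in (reverse_num, reverse_num2):
    print f.__name__
    print Timer(lambda: test_small(f)).timeit(number = 100000)
    print Timer(lambda: test_large(f)).timeit(number = 100000)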
- Thanks for answering. I thought it was never going to be answered by anyone. Just one thing: if I comment, could you please reply? You never replied to any of my comments on your last answer. – Aseem Bansal, Jul 18, 2013 at 6:52
- Does this mean there is no hope of any proper timing analysis for finding better implementations of an algorithm? Large differences are obviously noticeable, but I wanted to know about timing many test cases. How can I find which one is better across test cases? How do I test small functions? How does anyone do proper timing for various test cases? How do people find that this function works better for small values and that function works better for large values? I think these questions explain my confusion; if not, please reply and I'll try to elaborate. – Aseem Bansal, Jul 18, 2013 at 6:55
- I think that "no hope" is a bit strong. If your benchmarks are so close that they sometimes cross one another, you can probably conclude that neither approach is significantly more efficient than the other. – ruds, Jul 18, 2013 at 7:39
- As for small functions, it helps to run the function many times, as you do. To determine how different algorithms do on small vs. large inputs, the usual practice is to come up with a random function that produces small values and test against many different small inputs, and then come up with a random function that produces large values and test against many different large values. – ruds, Jul 18, 2013 at 7:41
- Can you come into this chat room? Otherwise the comments will be deleted due to long discussions. – Aseem Bansal, Jul 18, 2013 at 7:55