[Python-ideas] Sampling Profiler for Python

Torsten Landschoff t.landschoff at gmx.net
Sun Jan 8 02:01:04 CET 2012


Hi Sümer,
On 01/05/2012 07:45 AM, Sümer Cip wrote:
> Have you looked at yappi? yappi.start() accepts a sampling parameter.
no, I hadn't, because I googled for "sampling profiler" and "statistical
profiler". I have now compared running pybench with yappi vs. my sampling
profiler (at 100 samples/sec).
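By the way, the general idea of such a sampling profiler fits in a few
lines of pure Python. The following is only an illustrative sketch of the
technique (Unix-only), not my actual implementation, which also walks the
whole stack on each tick to get the cumulative numbers:

    import signal
    from collections import defaultdict

    samples = defaultdict(int)

    def _sample(signum, frame):
        # Runs about 100 times per second of consumed CPU time and
        # records the source line the interpreter was interrupted at.
        code = frame.f_code
        samples[(code.co_filename, frame.f_lineno, code.co_name)] += 1

    signal.signal(signal.SIGPROF, _sample)
    # ITIMER_PROF counts CPU time, so time spent blocking is not sampled.
    signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)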
First, the run time increases in both cases vs. the original performance
of Python. The figures are different because I backported to Python 2
again for the comparison (is yappi available for Python 3?). In the
pybench totals below, the columns are the minimum run time (profiled,
baseline, overhead) followed by the average run time (profiled, baseline,
overhead):
yappi:
    Totals:   4692ms  3317ms  +41.5%   4831ms  3448ms  +40.1%
sampling_profiler:
    Totals:   3491ms  3269ms   +6.8%   3638ms  3435ms   +5.9%
These are the hot spots as reported by yappi:

name                               #n  tsub      ttot      tavg
Tools/pybench/Calls.py.test:439    10  1.709853  1.709853  0.170985
Tools/pybench/Strings.py.test:467  10  1.383400  2.133651  0.213365
Tools/pybench/With.py.test:16      10  1.179130  1.614330  0.161433
Tools/pybench/Calls.py.test:176    10  1.088372  1.885379  0.188538
Tools/pybench/Unicode.py.test:388  10  1.062785  1.729250  0.172925
Tools/pybench/Calls.py.test:118    10  1.008680  1.284282  0.128428
Tools/pybench/With.py.test:64      10  0.999503  1.431072  0.143107
Tools/pybench/Lists.py.test:9      10  0.995866  1.394482  0.139448
Tools/pybench/Calls.py.test:9      10  0.979050  1.505686  0.150569
Tools/pybench/Dict.py.test:351     10  0.959278  1.290247  0.129025
Compared with the sampling_profiler:

cpu_time  (cum)   ticks  (cum)  samples  (cum)  filename:lineno function
   0.552  4.224      55    419       55    419  Tools/pybench/Calls.py:446 f
   0.512  0.512      52     52       52     52  Tools/pybench/Unicode.py:452 calibrate
   0.436  0.436      44     44       44     44  Tools/pybench/Unicode.py:542 calibrate
   0.436  0.436      44     44       44     44  Tools/pybench/Strings.py:562 calibrate
   0.372  0.372      37     37       37     37  Tools/pybench/Constructs.py:485 test
   0.348  0.348      36     36       36     36  Tools/pybench/Calls.py:122 f
   0.272  0.272      27     27       27     27  Tools/pybench/Lists.py:339 test
   0.248  0.248      25     25       25     25  Tools/pybench/Calls.py:445 f
   0.228  0.228      22     22       22     22  Tools/pybench/Strings.py:247 calibrate
   0.212  0.212      21     21       21     21  Tools/pybench/With.py:14 __exit__
Let's look at the leading hot spots:
1) yappi reports Calls.py line 439 as the hot spot. This refers to this
code:

    def test(self):            # <-- line 439
        global f
        def f(x):
            if x > 1:
                return f(x-1)  # <-- line 446
            return 1
        for i in xrange(self.rounds):
            f(10)
            f(10)
            ...
Here yappi identified only the outer function, while the sampling profiler
points directly to the line where most of the time is spent.
2) Most of the other hot spots reported by the sampling_profiler lie in
the calibrate functions. I wondered why that might be and noticed that
pybench calls these 20 times by default. Since the sampling_profiler
reports hot spots per line while yappi reports them per function, we
cannot really compare these results directly.
So let's compare the behaviour of the different profilers on this code:
from time import time

def fib(n):
    if n > 1:
        return fib(n-1) + fib(n-2)
    else:
        return n

def fib2(n):
    stack = [n]
    while stack:
        n = stack.pop()
        if n > 1:
            stack.append(n-1)
            stack.append(n-2)

s = time()
fib(34)
print "fib: %s seconds" % (time() - s)
s = time()
fib2(34)
print "fib2: %s seconds" % (time() - s)
Here are two variants of the Fibonacci function. The first works using
recursion and returns the correct result. The second just simulates the
same behaviour without recursion, using an explicit stack. Its result is
not computed since it is not needed here - I just want to compare how the
two variants behave with respect to our profilers.
When I run this without any profiler, I get
fib: 2.87187790871 seconds
fib2: 5.26651716232 seconds
With yappi, I get this:

fib: 6.42756795883 seconds
fib2: 14.6631779671 seconds

name                                  #n        tsub       ttot       tavg
demo.py.<module>:1                    1         6.427689   21.090864  21.090864
demo.py.fib2:9                        1         10.353305  14.663173  14.663173
..atprof/Lib/posixpath.py.dirname:11  1         0.000004   0.000005   0.000005
..prof/Lib/threading.py.setprofile:8  1         0.000001   0.000001   0.000001
..n2.6/site-packages/yappi.py.stop:4  1         0.000000   0.000000   0.000000
demo.py.fib:3                         184549..  0.000000   0.000000   0.000000
The slowdown for fib is about what I expected, but I have no idea why
fib2 is slowed down that much. AFAIK, Python's profiling hooks are called
on entry and exit of each function, not on each line?!
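That assumption is easy to check with a trivial counting hook (a quick
sketch of mine, independent of both profilers, using fib2 from the demo
above):

    import sys
    from collections import defaultdict

    events = defaultdict(int)

    def hook(frame, event, arg):
        # Profile hooks fire only for 'call', 'return', 'c_call',
        # 'c_return' and 'c_exception' - there is no per-line event.
        events[event] += 1

    sys.setprofile(hook)
    fib2(25)
    sys.setprofile(None)
    print dict(events)

Note, though, that fib2's millions of list.pop and list.append calls are
C calls and trigger c_call/c_return events, so a profiler callback still
fires tens of millions of times in fib2 - that may well account for the
surprising slowdown.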
The results for the fib function are incorrect for reasons unknown to me
- yappi thinks that no time is spent in that function, while the program
output makes it obvious that more than 6 seconds are spent there.
I also tried the timing_sample parameter to yappi.start. When set to 2,
tsub, ttot and tavg are all reported as 0 for this example. With a high
timing_sample value of 200 or more I can speed the run up to 4.97 seconds
for fib and 9.33 seconds for fib2. That is still quite some overhead if
one wants to keep the profiler active at all times.
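For reference, the yappi runs boil down to this (a sketch; timing_sample
is the parameter discussed above, while the exact name of the reporting
call may differ between yappi versions):

    import yappi

    # As I understand it, timing_sample=N makes yappi take an actual
    # timing measurement only on every Nth profiler event.
    yappi.start(timing_sample=200)
    fib(34)
    fib2(34)
    yappi.stop()
    yappi.print_stats()  # reporting entry point in the version I used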
With the sampling profiler, the output is:

fib: 3.10812282562 seconds
fib2: 5.91388201714 seconds

Thread MainThread (902 samples, 9.036564 seconds)
------------------------------------------------------------------------
cpu_time  (cum)   ticks  (cum)  samples  (cum)  filename:lineno function
   0.000  9.005       0    902        0    902  ython-work/statprof/Lib/runpy.py:122 _run_module_as_main
   0.000  9.005       0    902        0    902  tatprof/Lib/sampling_profiler.py:203 main
   0.000  9.005       0    902        0    902  tatprof/Lib/sampling_profiler.py:209 <module>
   0.000  9.005       0    902        0    902  ython-work/statprof/Lib/runpy.py:34 _run_code
   0.000  5.912       0    592        0    592  demo.py:21 <module>
   2.068  3.092     211    310      211    310  demo.py:5 fib
   0.000  3.092       0    310        0    310  demo.py:18 <module>
   2.196  2.196     217    217      217    217  demo.py:12 fib2
   1.576  1.576     157    157      157    157  demo.py:15 fib2
   0.980  0.980     101    101      101    101  demo.py:14 fib2
   0.900  0.900      87     87       87     87  demo.py:4 fib
   0.644  0.644      67     67       67     67  demo.py:11 fib2
   0.516  0.516      50     50       50     50  demo.py:13 fib2
   0.124  0.124      12     12       12     12  demo.py:7 fib
Just a small slowdown here. And this even tells you on which lines the
most CPU time is spent. Looking at the output I do miss per-function
totals, but those should be easy to add.
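A few lines of post-processing would do (a sketch, assuming the collected
samples map (filename, lineno, funcname) tuples to hit counts, one hit per
10 ms tick):

    from collections import defaultdict

    def function_totals(samples, interval=0.01):
        # Fold per-line sample counts into per-function counts.
        totals = defaultdict(int)
        for (filename, lineno, funcname), hits in samples.items():
            totals[(filename, funcname)] += hits
        # Hottest functions first, hits converted to CPU seconds.
        for (filename, funcname), hits in sorted(totals.items(),
                                                 key=lambda kv: -kv[1]):
            print "%8.3f  %s %s" % (hits * interval, filename, funcname)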
As you can see, the profiler correctly detects that lines 21 and 18 of
the demo code take most of the time (these are the module-level calls to
fib2 and fib).
For completeness, let's look at what cProfile finds out:

torsten at pulsar:~/python-work/statprof$ ./python run_cprofile.py demo.py
fib: 5.83289813995 seconds
fib2: 11.2586638927 seconds
         55364794 function calls (36909866 primitive calls) in 17.092 CPU seconds

   Ordered by: standard name

     ncalls    tottime  percall  cumtime  percall  filename:lineno(function)
          1      0.000    0.000   17.092   17.092  <string>:1(<module>)
          1      0.000    0.000   17.092   17.092  demo.py:1(<module>)
 18454929/1      5.833    0.000    5.833    5.833  demo.py:3(fib)
          1      7.831    7.831   11.259   11.259  demo.py:9(fib2)
   18454928      1.421    0.000    1.421    0.000  {method 'append' of 'list' objects}
          1      0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
   18454929      2.007    0.000    2.007    0.000  {method 'pop' of 'list' objects}
          4      0.000    0.000    0.000    0.000  {time.time}
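run_cprofile.py is only a tiny wrapper; I am sketching it here rather than
quoting it, but the <string>:1(<module>) row above shows that the script
source ends up in cProfile.run as a string:

    import cProfile
    import sys

    # Execute the given script under the profiler; cProfile.run prints
    # the statistics table once the command finishes.
    source = open(sys.argv[1]).read()
    cProfile.run(source)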
I am impressed by how cProfile detects the recursive calls of the fib
function (that is the information in the ncalls column: total calls and
primitive calls, where primitive means not reached through recursion).
Anyway, the slowdown here is much more noticeable for comparable
information.
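The ncalls figure is plausible: naive Fibonacci recursion makes
2*F(n+1) - 1 calls for fib(n) (with F(1) = F(2) = 1), which a quick sanity
check of mine confirms:

    def calls(n):
        # Total number of fib invocations triggered by fib(n),
        # counting the initial call itself.
        if n > 1:
            return 1 + calls(n-1) + calls(n-2)
        return 1

    print calls(34)   # -> 18454929, matching cProfile's ncalls for fib

The 18454929 pops in fib2 match this number as well, so fib2 really does
mimic the call pattern of fib.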
So, why would I prefer the sampling profiler?
a) Less slowdown (and it could still use some optimization).
b) It points directly to the code lines where the hot spots are (based on
equidistant time samples).
c) It hardly distorts the run-time behaviour of the profiled code while
active, since no per-call hook overhead is added.
Greetings, Torsten

