skip to main | skip to sidebar

Friday, August 04, 2006

Profile and ruby-prof

I've been playing with profile (from the Standard Library) and ruby-prof (the profile replacement written by Shugo Maeda and Charlie Savage. I really like ruby-prof — it's faster (from 5x on some short runs to 100x on longer runs), it provides more detail, and it will generate call graphs. I've noticed some differences in the way they run. I'll show you what I mean.

ruby-prof doesn't sort it's output in terms of total time used, so you'll need to sort it. After a bit of munging, the (cropped) output of running ruby-prof call_rotator test_file looks something like this:


%self cumulative total self children calls self/call total/call name
 23.64 0.13 0.13 0.13 0.00 10002 0.00 0.00 String#split
 21.82 0.30 0.39 0.12 0.27 10002 0.00 0.00 CallRot#read_line
 18.18 0.55 0.54 0.10 0.44 1 0.10 0.54 #foreach
 12.73 0.42 0.07 0.07 0.00 10002 0.00 0.00 #local
 5.45 0.33 0.05 0.03 0.02 10002 0.00 0.00 CallRot#line_is_future?
 3.64 0.44 0.02 0.02 0.00 10003 0.00 0.00 #allocate
 3.64 0.35 0.02 0.02 0.00 20004 0.00 0.00 Array#pop
 3.64 0.17 0.02 0.02 0.00 10002 0.00 0.00 Comparable#>
 3.64 0.15 0.02 0.02 0.00 10002 0.00 0.00 OnCall#initialize
 1.82 0.45 0.01 0.01 0.00 1 0.01 0.01 #open

Since ruby-prof sends it's output to stdout, it's easy to munge with standard tools (at least on Unix/Linus). It does mean that you'll need to be careful about comingling any output from your original program though. I used ruby-prof call-rotator test_file | tail +3 | sort -rn to print the methods in order of time spent in them. Since the header doesn't change, you can add that back in to make reading the table easier if you like.

Interestingly, ruby-prof runs so quickly that system activity can interfere quite a bit with the recorded timings (even to the point of changing the order of the sorted methods). Over the course of a longer running program, this should even out. I've not seen it change the order of methods actually defined in your program, but that doesn't mean it won't.

Here's the output of ruby -r profile call_rotator test_file:


 % cumulative self self total
 time seconds seconds calls ms/call ms/call name
 34.79 2.31 2.31 10002 0.23 0.42 CallRot#read_line
 13.40 3.20 0.89 1 890.00 6630.00 IO#foreach
 12.95 4.06 0.86 10002 0.09 0.15 CallRot#line_is_future?
 11.14 4.80 0.74 10003 0.07 0.11 Class#new
 7.98 5.33 0.53 10002 0.05 0.07 Comparable.>
 5.42 5.69 0.36 10002 0.04 0.04 OnCall#initialize
 4.67 6.00 0.31 20004 0.02 0.02 Array#pop
 3.61 6.24 0.24 10002 0.02 0.02 String#split
 3.46 6.47 0.23 10002 0.02 0.02 Time#local
 2.41 6.63 0.16 10002 0.02 0.02 Time#<=>

You can see that the ruby-prof output has two columns that are missing from the stock profile output. self and children break the total time into more discrete measures. self shows how much time is spent in the method itself, exclusive calls out to child methods. children shows how much time is spent in calls to those child methods. (total is the same as self in the stock output.

It's also interesting that the two methods of profiling don't pick up the same set of method calls in the program being profiled, nor is one a superset of the other. Here's a side by side comparison of the methods found by each in profiling runs of another program:


 ruby-prof profile
Array#<< Array#<< Array#[] Array#[] Array#each Array#each <Class::IO>#allocate <Class::IO>#open <Class::Time>#allocate <Class::Time>#now Enumerable#each_with_index Enumerable.each_with_index File#initialize File#initialize Fixnum#* Fixnum#* Fixnum#% Fixnum#% Fixnum#+ Fixnum#+ Fixnum#to_s Fixnum#to_s Integer#downto Integer#downto Integer#upto Integer#upto IO#open IO#puts IO#puts IO#write IO#write Kernel#load Time#- Time#- Time#+ Time#+ Time#day Time#day Time#hour Time#hour Time#initialize Time#initialize Time#min Time#min Time#mon Time#mon Time#now Time#year Time#year #toplevel #toplevel 

The four methods not caught by profile take up significant amounts of time, but won't change your basic profiling approach. Likewise the one method missed by ruby-prof isn't a game breaker, but it's good to be aware of things like this.

In both sets of output, you can see that the read_line is the more expensive of the methods, and the one where you can focus if you need to speed things up. The ruby-prof output also shows you that more time is spent in its children than locally, so you gains will be minimized. You'll get more information from a call graph, but that's whole 'nother blog entry — I'll get around to it next time.

If you found this post helpful, you might want to look at my ruby-prof post collection.

Posted by gnupate
Labels: ,

2 comments:

Anonymous said...

Hey Pat - Great article. Ruby-prof sorts by total-time for a method, not self time, which it looks like you are after. Total time is used since it makes more sense in the call graph reports.

It would be easy to change this though. For flat profiles, take a look at line 53 in flat_printer.rb:

methods.sort.reverse.each do |pair|

Would change to something like this:

methods.sort_by {|method| method.self_time} ... etc.

Since sorting by self time or total time is useful, I think the best solution might be to add a command line parameter? Patches gladly accepted :)

Also interesting that IO.open doesn't get profiled. Will have to look into that.

8/15/06, 10:54 PM
Anonymous said...

GraphHtmlPrinter is pretty awesome. Unless you need to programmatically process the output, html printer is a much better report format than the flat printer.

1/28/11, 9:44 AM

Post a Comment

Subscribe to: Post Comments (Atom)
 

AltStyle によって変換されたページ (->オリジナル) /