Showing posts with label performance. Show all posts
Friday, October 26, 2007
Current state of Regular Expressions
As I've made clear earlier, the current regular expression situation has once again become impractical. To reiterate the history: we began with regular Java regex support. This started to cave in when we found out that the algorithm used is actually recursive, and fails for some common regexps used inside Rails, among others. To fix that, we integrated JRegex instead. That's the engine 1.0 was released with, and it is still the engine in use. It works fairly well, and is fast for a Java engine. But not fast enough. In particular, there is no support for searching for exact strings and failing fast, and the engine requires us to transform our byte[]-based strings to char[] or String. Not exactly optimal. Another problem is that compatibility with MRI suffers, especially in multibyte support.
There are two solutions currently on the way. Core developer Marcin is working on a port of Oniguruma, the 1.9 regexp engine. That port still has some way to go, and is not integrated with JRuby. The other effort is called REJ, and is a port of the MRI engine I did a few months back. I've freshened up that work and integrated it with JRuby in a branch. At the moment this work actually seems to be going quite well, but there are some snags.
First of all, let me point out that this approach gives us more or less total multibyte compatibility for 1.8, which is quite nice.
When doing benchmarking, I'm generally using Rails as the bar. I have a series of regular expressions that Petstore uses for each request, and I'm using these to check performance. As a first data point, JRuby+REJ is faster at parsing regexps than JRuby trunk for basically all regexps. This ranges from slightly faster to twice as fast.
Most of the Rails regexen are actually faster in REJ than in JRuby trunk, but the problem is that some of them are quite a bit slower. 4 of the 22 Rails regexps are slower, by between 20 and 250 percent. There is also this one: /.*_f/ =~ "_fxxxxxxxxxxxxxxxxxxxxxxx", which runs about 10x slower than JRuby trunk. Not nice at all.
In the end, the problem is backtracking. Since REJ is a straight port of the MRI code, the backtracking is also ported. But it seems that Java is unusually bad at handling that specific algorithm, and it performs quite badly. At the moment I'm continuing to look at it and trying to improve performance in every way possible, so we'll see what happens. Charles Nutter has also started to look at it.
But what's really interesting is that I reran my Petstore benchmarks with the current REJ code. To rehash, my last results with JRuby trunk looked like this:
controller : 1.804000 0.000000 1.804000 ( 1.804000)
view : 5.510000 0.000000 5.510000 ( 5.510000)
full action: 13.876000 0.000000 13.876000 ( 13.876000)
But the results from rerunning with REJ were interesting, to say the least. I expected bad results because of the poor backtracking performance, but it seems the other speed improvements make up for it:
controller : 1.782000 0.000000 1.782000 ( 1.782000)
view : 4.735000 0.000000 4.735000 ( 4.735000)
full action: 12.727000 0.000000 12.727000 ( 12.727000)
As you can see, the improvement is quite large in the view numbers. It is also almost on par with MRI, which had 4.57. Finally, the full action is better by a full second too. Again, MRI is 9.57s and JRuby 12.72s. It's getting closer. I am quite optimistic right now: provided that we manage to fix the remaining problems with backtracking, our regexp engine might well be a great boon to performance.
Labels:
jruby,
performance,
regular expressions
Interesting times in JRuby land
The last week or two have been quite interesting. We are finally starting to see performance numbers that seem good enough. Take a look at Nick's posts here and here for more information on this. The second post contains some valuable tips. In particular, make sure to turn off ObjectSpace and run with -server. Both of these will improve your performance and scalability quite a lot.
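For reference, that is the same combination used for the benchmarks further down this page: jruby -J-server -O script/console_bench ar_jdbc, where -J-server passes the -server flag through to the JVM and -O turns off ObjectSpace.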
Secondly, JRuby on Rails on Oracle Application Server. Nice, huh? I would imagine more interesting things coming out of all this.
But the end message seems to be that JRuby is really ready now. The 1.1 release looks like it's going to be something really amazing. I can't wait!
Tuesday, October 16, 2007
Updated JRuby on Rails performance numbers
So, after my last post, several of us have spent time looking at different parts of Rails and JRuby performance. We have managed to improve things quite nicely since my last performance note. Among the things changed are JRuby's each_line implementation, JRuby's split implementation, a few other improvements, and some small fixes to AR-JDBC. With that done, here are some new numbers for Petstore. Remember, the MRI numbers are for MRI 1.8.6 with the native C MySQL driver. The JRuby numbers are with ActiveRecord-JDBC trunk and MySQL. (I'm only showing the best numbers for each.) The most important numbers for comparison are the full action ones.
MRI controller : 1.000000 0.070000 1.070000 ( 1.430260)
JRuby controller : 1.804000 0.000000 1.804000 ( 1.804000)
MRI view : 4.410000 0.150000 4.560000 ( 4.575399)
JRuby view : 5.510000 0.000000 5.510000 ( 5.510000)
MRI full action: 8.260000 0.410000 8.670000 ( 9.574460)
JRuby full action: 13.876000 0.000000 13.876000 ( 13.876000)
As you can see, we are talking about 9.5s for MRI against 13.8s for JRuby, which I find is quite a nice achievement if you look at the numbers from Friday. We are inching closer and closer. Both the view and the controller numbers are looking very nice. This is actually indicative of a nice trend: since general JRuby primitive performance is really good, the slowness in our regular expression engine is offset by much faster execution speed.
Once the port of Oniguruma lands, this story will almost certainly look very different. But even so, this is looking good.
Labels:
jruby,
jruby on rails,
performance,
petstore
Sunday, October 14, 2007
JRuby discovery number one
After my last entry I've spent lots of time checking different parts of JRuby, trying to find the one true bottleneck for Rails. Of course, I still haven't found it (otherwise I would have said YAY in the subject for this blog). But I have found a few things - for example, symbols are slow right now, but Bill's work will make them better. And it doesn't affect Rails performance at all.
But the discovery I made was when I looked at the performance of the regular expressions used in Rails. There are exactly 50 of them for each request, so I wrote a script that checked the performance of each of them against MRI. And I found that there was one in particular that had really interesting performance when comparing MRI to JRuby. In fact, it was between 200 and 1,000 times slower. What's worse, the performance wasn't linear.
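A rough sketch of what such a per-regexp timing script might look like (the expressions and inputs below are placeholders, not the actual 50 Rails regexps):
require 'benchmark'

# Placeholder [regexp, sample input] pairs; the real script would use the 50
# expressions collected from a Rails request, with representative inputs.
CASES = [
  [/.*?\n/m, "Content-Type: text/html; charset=utf-8\r\n" * 10],
  [/\A\s*\z/, "   "]
]

CASES.each do |re, input|
  time = Benchmark.realtime { 10_000.times { re =~ input } }
  puts "#{re.inspect}: #{'%.4f' % time}s"
end
Running the same script under both MRI and JRuby and comparing the output is enough to make an outlier like the one below stand out.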
So which regular expression was the culprit? Well, /.*?\n/m. That doesn't look too bad. And in fact, this expression exposed not one, but two problems in JRuby. The first one is that any regular expression engine should be able to fail fast on something like this, simply because there is a literal substring (the newline) that always needs to be present in the input for this expression to match. In MRI, that part of the engine is called bm_search, and it is a very fast way to fail. JRuby doesn't have that. Marcin is working on a port of Oniguruma though, so that will fix that part of the problem.
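To make the fail-fast idea concrete, here is a tiny illustration of the principle in plain Ruby (this is not how MRI's bm_search works internally, just the intuition of rejecting on a required literal before running the matcher):
# /.*?\n/m can only ever match if the input contains "\n", so a cheap
# substring check lets us skip the expensive backtracking search entirely.
def match_line_ending(str)
  return nil unless str.include?("\n")  # fast rejection on the required literal
  /.*?\n/m.match(str)                   # only run the real matcher when it can succeed
end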
But wait, if you grep for this regexp in the Rails sources you won't find it. So where was it actually used? Here is the kicker: it was used in JRuby's implementation of String#each_line. So, let's take some time to look at a quick benchmark for each_line:
require 'benchmark'
str = "Content-Type: text/html; charset=utf-8\r\nSet-Cookie: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa "
TIMES=100_000
puts "each_line on small string with several lines"
10.times do
puts(Benchmark.measure{TIMES.times { str.each_line{} }})
end
str = "abc" * 15
puts "each_line on short string with no line divisions"
10.times do
puts(Benchmark.measure{TIMES.times { str.each_line{} }})
end
str = "abc" * 4000
puts "each_line on large string with no line divisions"
10.times do
puts(Benchmark.measure{TIMES.times { str.each_line{} }})
end
As you can see, we simply measure the performance of doing 100 000 each_line calls on three different strings. The first one is a short string with several newlines, the second is a short string with no newlines, and the last is a long string with no newlines. How does MRI run this benchmark?
each_line on small string with several lines
0.160000 0.000000 0.160000 ( 0.157664)
0.150000 0.000000 0.150000 ( 0.160450)
0.160000 0.000000 0.160000 ( 0.171563)
0.150000 0.000000 0.150000 ( 0.157854)
0.150000 0.000000 0.150000 ( 0.154578)
0.150000 0.000000 0.150000 ( 0.154547)
0.160000 0.000000 0.160000 ( 0.158894)
0.150000 0.000000 0.150000 ( 0.158064)
0.150000 0.010000 0.160000 ( 0.156975)
0.160000 0.000000 0.160000 ( 0.156857)
each_line on short string with no line divisions
0.080000 0.000000 0.080000 ( 0.086789)
0.090000 0.000000 0.090000 ( 0.084559)
0.080000 0.000000 0.080000 ( 0.093477)
0.090000 0.000000 0.090000 ( 0.084700)
0.080000 0.000000 0.080000 ( 0.089917)
0.090000 0.000000 0.090000 ( 0.084176)
0.080000 0.000000 0.080000 ( 0.086735)
0.090000 0.000000 0.090000 ( 0.085536)
0.080000 0.000000 0.080000 ( 0.084668)
0.090000 0.000000 0.090000 ( 0.090176)
each_line on large string with no line divisions
3.350000 0.020000 3.370000 ( 3.404514)
3.330000 0.020000 3.350000 ( 3.690576)
3.320000 0.040000 3.360000 ( 3.851804)
3.320000 0.020000 3.340000 ( 3.651748)
3.340000 0.020000 3.360000 ( 3.478186)
3.340000 0.020000 3.360000 ( 3.447704)
3.330000 0.020000 3.350000 ( 3.448651)
3.350000 0.010000 3.360000 ( 3.489842)
3.350000 0.020000 3.370000 ( 3.429135)
3.350000 0.010000 3.360000 ( 3.372925)
OK, this looks reasonable. The large string is obviously taking more time to search, but not an unreasonable amount of time. What about trunk JRuby?
each_line on small string with several lines
32.668000 0.000000 32.668000 ( 32.668000)
30.785000 0.000000 30.785000 ( 30.785000)
30.824000 0.000000 30.824000 ( 30.824000)
30.878000 0.000000 30.878000 ( 30.877000)
30.904000 0.000000 30.904000 ( 30.904000)
30.826000 0.000000 30.826000 ( 30.826000)
30.550000 0.000000 30.550000 ( 30.550000)
32.331000 0.000000 32.331000 ( 32.331000)
30.971000 0.000000 30.971000 ( 30.971000)
30.537000 0.000000 30.537000 ( 30.537000)
each_line on short string with no line divisions
7.472000 0.000000 7.472000 ( 7.472000)
7.350000 0.000000 7.350000 ( 7.350000)
7.516000 0.000000 7.516000 ( 7.516000)
7.252000 0.000000 7.252000 ( 7.252000)
7.313000 0.000000 7.313000 ( 7.313000)
7.262000 0.000000 7.262000 ( 7.262000)
7.383000 0.000000 7.383000 ( 7.383000)
7.786000 0.000000 7.786000 ( 7.786000)
7.583000 0.000000 7.583000 ( 7.583000)
7.529000 0.000000 7.529000 ( 7.529000)
each_line on large string with no line divisions
Ooops. That doesn't look so good... And also, where are the last ten lines? Eh... it's still running. It's been running for two hours to produce the first line. That means it's taking at least 7200 seconds, which is more than 2400 times slower than MRI. But in fact, since the matching of the regular expression above is not linear but exponential in performance, I don't expect this to ever finish.
There are a few interesting lessons to take away from this exercise:
- There may still be implementation problems like this in many parts of JRuby; performance will improve quite a lot every time we find something like this. I haven't measured Rails performance after this fix, and I don't expect it to solve the whole problem, but I think I'll see better numbers.
- Understand regular expressions. Why is /.*?\n/ so incredibly bad for strings over a certain length? In this case it's the combination of .* and ?. What would be a better implementation in almost all cases? /[^\n]*\n/. Notice that there is no backtracking in this version, and because of that, this regexp will have performance O(n) while the earlier one was O(n^2). Learn and know these things. They are the difference between usage and expertise. (A quick way to see the difference for yourself is sketched below.)
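A minimal sketch (assumed, not from the original post) for timing both patterns against a long string with no newline, which is exactly the case that hurts:
require 'benchmark'

str = "abc" * 4000   # large string with no line divisions
Benchmark.bm(14) do |bm|
  bm.report("/.*?\\n/m:")    { 10_000.times { /.*?\n/m =~ str } }
  bm.report("/[^\\n]*\\n/:") { 10_000.times { /[^\n]*\n/ =~ str } }
end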
Labels:
jruby,
performance,
regular expressions
Friday, October 12, 2007
Mystery: An exposé on JRuby performance
So, after Charles's awesome work yesterday (documented here), I felt it was time I put the story straight on general JRuby performance. It's something of a mystery. I'll begin by showing you the results of lots of different benchmarks and comparisons to MRI. All these benchmarks have been run on my MacBook Pro, dual 2.33GHz cores with 2GB of memory. The JRuby revision is 4578, Java is Apple's 1.6 beta, and the Ruby version is 1.8.6 (2007-03-13 patchlevel 0). JRuby was run with -J-server and -O.
Now, let's begin with the YARV benchmarks. The first value is always MRI, the second is JRuby. I've made the benchmark red when MRI is faster, and green when JRuby is faster.
bm_app_answer.rb:
0.480000 0.010000 0.490000 ( 0.511641)
0.479000 0.000000 0.479000 ( 0.478000)
bm_app_factorial.rb:
ERROR stack level too deep
2.687000 0.000000 2.687000 ( 2.687000)
bm_app_fib.rb:
5.880000 0.030000 5.910000 ( 6.151324)
2.687000 0.000000 2.687000 ( 2.687000)
bm_app_mandelbrot.rb:
1.930000 0.010000 1.940000 ( 1.992984)
2.752000 0.000000 2.752000 ( 2.876000)
bm_app_pentomino.rb:
84.160000 0.450000 84.610000 ( 88.205519)
77.117000 0.000000 77.117000 ( 77.117000)
bm_app_raise.rb:
2.600000 0.430000 3.030000 ( 3.169156)
2.162000 0.000000 2.162000 ( 2.162000)
bm_app_strconcat.rb:
1.390000 0.010000 1.400000 ( 1.427766)
1.003000 0.000000 1.003000 ( 1.003000)
bm_app_tak.rb:
7.570000 0.060000 7.630000 ( 7.888381)
2.676000 0.000000 2.676000 ( 2.676000)
bm_app_tarai.rb:
6.020000 0.030000 6.050000 ( 6.186971)
2.236000 0.000000 2.236000 ( 2.236000)
bm_loop_times.rb:
4.240000 0.020000 4.260000 ( 4.404826)
3.354000 0.000000 3.354000 ( 3.354000)
bm_loop_whileloop.rb:
9.450000 0.050000 9.500000 ( 9.678552)
5.037000 0.000000 5.037000 ( 5.037000)
bm_loop_whileloop2.rb:
1.890000 0.010000 1.900000 ( 1.936502)
1.039000 0.000000 1.039000 ( 1.039000)
bm_so_ackermann.rb:
ERROR stack level too deep
4.928000 0.000000 4.928000 ( 4.927000)
bm_so_array.rb:
5.580000 0.020000 5.600000 ( 5.709101)
5.552000 0.000000 5.552000 ( 5.552000)
bm_so_concatenate.rb:
1.580000 0.040000 1.620000 ( 1.647592)
1.602000 0.000000 1.602000 ( 1.602000)
bm_so_exception.rb:
4.170000 0.390000 4.560000 ( 4.597234)
4.683000 0.000000 4.683000 ( 4.683000)
bm_so_lists.rb:
0.970000 0.030000 1.000000 ( 1.036678)
0.814000 0.000000 0.814000 ( 0.814000)
bm_so_matrix.rb:
1.700000 0.010000 1.710000 ( 1.765739)
1.878000 0.000000 1.878000 ( 1.879000)
bm_so_nested_loop.rb:
5.130000 0.020000 5.150000 ( 5.258066)
4.661000 0.000000 4.661000 ( 4.661000)
bm_so_object.rb:
5.480000 0.030000 5.510000 ( 5.615154)
3.095000 0.000000 3.095000 ( 3.095000)
bm_so_random.rb:
1.760000 0.010000 1.770000 ( 1.806116)
1.495000 0.000000 1.495000 ( 1.495000)
bm_so_sieve.rb:
0.680000 0.010000 0.690000 ( 0.705296)
0.853000 0.000000 0.853000 ( 0.853000)
bm_vm1_block.rb:
19.920000 0.110000 20.030000 ( 20.547236)
12.698000 0.000000 12.698000 ( 12.698000)
bm_vm1_const.rb:
15.720000 0.080000 15.800000 ( 16.426734)
7.654000 0.000000 7.654000 ( 7.654000)
bm_vm1_ensure.rb:
14.530000 0.070000 14.600000 ( 15.137106)
7.588000 0.000000 7.588000 ( 7.589000)
bm_vm1_length.rb:
17.230000 0.090000 17.320000 ( 20.406438)
6.415000 0.000000 6.415000 ( 6.416000)
bm_vm1_rescue.rb:
11.520000 0.040000 11.560000 ( 11.736435)
5.604000 0.000000 5.604000 ( 5.603000)
bm_vm1_simplereturn.rb:
17.560000 0.080000 17.640000 ( 18.178065)
5.413000 0.000000 5.413000 ( 5.413000)
bm_vm1_swap.rb:
22.160000 0.110000 22.270000 ( 22.698746)
6.836000 0.000000 6.836000 ( 6.835000)
bm_vm2_array.rb:
5.600000 0.020000 5.620000 ( 5.675354)
1.844000 0.000000 1.844000 ( 1.844000)
bm_vm2_method.rb:
9.800000 0.030000 9.830000 ( 9.918884)
5.152000 0.000000 5.152000 ( 5.151000)
bm_vm2_poly_method.rb:
13.570000 0.050000 13.620000 ( 13.803066)
10.289000 0.000000 10.289000 ( 10.289000)
bm_vm2_poly_method_ov.rb:
3.990000 0.010000 4.000000 ( 4.071277)
1.750000 0.000000 1.750000 ( 1.749000)
bm_vm2_proc.rb:
5.670000 0.020000 5.690000 ( 5.723124)
3.267000 0.000000 3.267000 ( 3.267000)
bm_vm2_regexp.rb:
0.380000 0.000000 0.380000 ( 0.387671)
0.961000 0.000000 0.961000 ( 0.961000)
bm_vm2_send.rb:
3.720000 0.010000 3.730000 ( 3.748266)
2.135000 0.000000 2.135000 ( 2.136000)
bm_vm2_super.rb:
4.100000 0.010000 4.110000 ( 4.138355)
1.781000 0.000000 1.781000 ( 1.781000)
bm_vm2_unif1.rb:
3.320000 0.010000 3.330000 ( 3.348069)
1.385000 0.000000 1.385000 ( 1.385000)
bm_vm2_zsuper.rb:
4.810000 0.020000 4.830000 ( 4.856368)
2.920000 0.000000 2.920000 ( 2.921000)
bm_vm3_thread_create_join.rb:
0.000000 0.000000 0.000000 ( 0.006621)
0.368000 0.000000 0.368000 ( 0.368000)
What is interesting about these numbers is that almost all of them are way faster, and the ones that are slower are so by quite a narrow margin (except for the regexp and thread tests).
So, this is the baseline against MRI. Let's take a look at a few other benchmarks. These can all be found in JRuby, in the test/bench directory. When you run them, they generate 5 or 10 runs of each measure, but I've simply taken the best one for each. The repetition is to allow Java to warm up. Here are a few different benchmarks, same convention and running parameters as above:
bench_fib_recursive.rb
----------------------
1.390000 0.000000 1.390000 ( 1.412710)
0.532000 0.000000 0.532000 ( 0.532000)
bench_method_dispatch.rb
------------------------
Control: 1m loops accessing a local variable 100 times
2.830000 0.010000 2.840000 ( 2.864822)
0.105000 0.000000 0.105000 ( 0.105000)
Test STI: 1m loops accessing a fixnum var and calling to_i 100 times
10.000000 0.030000 10.030000 ( 10.100846)
2.111000 0.000000 2.111000 ( 2.111000)
Test ruby method: 1m loops calling self's foo 100 times
16.130000 0.060000 16.190000 ( 16.359876)
7.971000 0.000000 7.971000 ( 7.971000)
bench_method_dispatch_only.rb
-----------------------------
Test ruby method: 100k loops calling self's foo 100 times
1.570000 0.000000 1.570000 ( 1.588715)
0.587000 0.000000 0.587000 ( 0.587000)
bench_block_invocation.rb
-------------------------
1m loops yielding a fixnum 10 times to a block that just retrieves dvar
2.800000 0.010000 2.810000 ( 2.822425)
1.194000 0.000000 1.194000 ( 1.194000)
1m loops yielding two fixnums 10 times to block accessing one
6.550000 0.030000 6.580000 ( 6.623452)
2.064000 0.000000 2.064000 ( 2.064000)
1m loops yielding three fixnums 10 times to block accessing one
7.390000 0.020000 7.410000 ( 7.467841)
2.120000 0.000000 2.120000 ( 2.120000)
1m loops yielding three fixnums 10 times to block splatting and accessing them
9.250000 0.040000 9.290000 ( 9.339131)
2.451000 0.000000 2.451000 ( 2.451000)
1m loops yielding a fixnums 10 times to block with just a fixnum (no vars)
1.890000 0.000000 1.890000 ( 1.908501)
1.278000 0.000000 1.278000 ( 1.277000)
1m loops calling a method with a fixnum that just returns it
2.740000 0.010000 2.750000 ( 2.766255)
1.426000 0.000000 1.426000 ( 1.426000)
bench_string_ops.rb
----
Measure string array sort time
5.950000 0.060000 6.010000 ( 6.055483)
8.061000 0.000000 8.061000 ( 8.061000)
Measure hash put time
0.390000 0.010000 0.400000 ( 0.398593)
0.208000 0.000000 0.208000 ( 0.209000)
Measure hash get time (note: not same scale as put test)
1.620000 0.000000 1.620000 ( 1.646155)
0.740000 0.000000 0.740000 ( 0.740000)
Measure string == comparison time
2.340000 0.010000 2.350000 ( 2.368579)
0.812000 0.000000 0.812000 ( 0.812000)
Measure string == comparison time, different last pos
2.690000 0.000000 2.690000 ( 2.724772)
0.860000 0.000000 0.860000 ( 0.860000)
Measure string <=> comparison time
2.340000 0.010000 2.350000 ( 2.369915)
0.824000 0.000000 0.824000 ( 0.824000)
Measure 'string'.index(fixnum) time
0.790000 0.010000 0.800000 ( 0.808189)
1.113000 0.000000 1.113000 ( 1.113000)
Measure 'string'.index(str) time
2.860000 0.010000 2.870000 ( 2.892730)
0.956000 0.000000 0.956000 ( 0.956000)
Measure 'string'.rindex(fixnum) time
0.800000 0.000000 0.800000 ( 0.817300)
0.631000 0.000000 0.631000 ( 0.631000)
Measure 'string'.rindex(str) time
12.190000 0.040000 12.230000 ( 12.310492)
1.247000 0.000000 1.247000 ( 1.247000)
bench_ivar_access.rb
----
100k * 100 ivar gets, 1 ivar
0.500000 0.000000 0.500000 ( 0.582682)
0.340000 0.000000 0.340000 ( 0.340000)
100k * 100 ivar sets, 1 ivar
0.700000 0.010000 0.710000 ( 0.816724)
0.402000 0.000000 0.402000 ( 0.401000)
100k * 100 attr gets, 1 ivar
0.970000 0.000000 0.970000 ( 0.988212)
0.875000 0.000000 0.875000 ( 0.874000)
100k * 100 attr sets, 1 ivar
1.390000 0.010000 1.400000 ( 1.406535)
1.114000 0.000000 1.114000 ( 1.114000)
100k * 100 ivar gets, 2 ivars
0.490000 0.000000 0.490000 ( 0.506206)
0.344000 0.000000 0.344000 ( 0.344000)
100k * 100 ivar sets, 2 ivars
0.680000 0.000000 0.680000 ( 0.693064)
0.388000 0.000000 0.388000 ( 0.388000)
100k * 100 attr gets, 2 ivars
0.970000 0.000000 0.970000 ( 0.989313)
0.878000 0.000000 0.878000 ( 0.878000)
100k * 100 attr sets, 2 ivars
1.400000 0.000000 1.400000 ( 1.434206)
1.129000 0.000000 1.129000 ( 1.128000)
100k * 100 ivar gets, 4 ivars
0.490000 0.000000 0.490000 ( 0.502097)
0.340000 0.000000 0.340000 ( 0.340000)
100k * 100 ivar sets, 4 ivars
0.690000 0.000000 0.690000 ( 0.696852)
0.389000 0.000000 0.389000 ( 0.389000)
100k * 100 attr gets, 4 ivars
0.970000 0.010000 0.980000 ( 0.986163)
0.872000 0.000000 0.872000 ( 0.872000)
100k * 100 attr sets, 4 ivars
1.370000 0.010000 1.380000 ( 1.394921)
1.128000 0.000000 1.128000 ( 1.128000)
100k * 100 ivar gets, 8 ivars
0.500000 0.000000 0.500000 ( 0.519511)
0.344000 0.000000 0.344000 ( 0.344000)
100k * 100 ivar sets, 8 ivars
0.690000 0.000000 0.690000 ( 0.710896)
0.389000 0.000000 0.389000 ( 0.389000)
100k * 100 attr gets, 8 ivars
0.970000 0.000000 0.970000 ( 0.987582)
0.870000 0.000000 0.870000 ( 0.870000)
100k * 100 attr sets, 8 ivars
1.380000 0.000000 1.380000 ( 1.400542)
1.132000 0.000000 1.132000 ( 1.132000)
100k * 100 ivar gets, 16 ivars
0.500000 0.000000 0.500000 ( 0.523690)
0.342000 0.000000 0.342000 ( 0.343000)
100k * 100 ivar sets, 16 ivars
0.680000 0.000000 0.680000 ( 0.707385)
0.391000 0.000000 0.391000 ( 0.391000)
100k * 100 attr gets, 16 ivars
0.970000 0.010000 0.980000 ( 1.017880)
0.879000 0.000000 0.879000 ( 0.879000)
100k * 100 attr sets, 16 ivars
1.370000 0.010000 1.380000 ( 1.387713)
1.128000 0.000000 1.128000 ( 1.128000)
bench_for_loop.rb
----
100k calls to a method containing 5x a for loop over a 10-element range
0.890000 0.000000 0.890000 ( 0.917563)
0.654000 0.000000 0.654000 ( 0.654000)
All of this is really good, of course. Lots of green. And the red parts are not that bad either.
But here is the mystery. General Rails performance sucks. This test case can be run by checking out petstore from tw-commons at RubyForge. There is a file called script/console_bench that will run the benchmarks. The two commands run were: ruby script/console_bench production, and jruby -J-server -O script/console_bench ar_jdbc. Without further ado, here are the numbers:
controller : 1.000000 0.070000 1.070000 ( 1.472701)
controller : 2.144000 0.000000 2.144000 ( 2.144000)
view : 4.470000 0.160000 4.630000 ( 4.794336)
view : 6.335000 0.000000 6.335000 ( 6.335000)
full action: 8.300000 0.400000 8.700000 ( 9.597351)
full action: 16.055000 0.000000 16.055000 ( 16.055000)
These numbers plainly stink. And we can't find the reason for it. As you can see, most benchmarks are much better, and there is nothing we can think of that would add up to a general degradation like this. It can't be IO, since these tests use app.get directly. So what is it? There are very likely one or two bottlenecks causing this problem, and when we have found them, Rails will most likely blaze along. But profiling hasn't helped yet. We found a few things, but there is still stuff missing. It's an interesting problem.
My current thesis is that either symbols or regexps are responsible. I'll spend the day checking that.
Tuesday, April 3, 2007
On ActiveRecord-JDBC performance
I have been a bit concerned about the performance of our component that connects ActiveRecord with JDBC. Since ActiveRecord demands that every result of a select should be turned into a big array of hashes of strings to strings, I suspected we would be quite inefficient at this, and I wasn't sure I could put all my faith in JDBC either.
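To make the shape of that data concrete, a select through the connection comes back roughly like this (values abbreviated; the column names match the test table defined below):
rows = conn.select_all("SELECT * FROM test_perf")
# => [{"id" => "1", "one" => "one", "two" => "two"},
#     {"id" => "2", "one" => "one", "two" => "two"},
#     ...]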
So, as a good developer, I decided to test this, with a very small microbenchmark, to see how bad the situation actually was.
Since I really wanted to check the raw database and unmarshalling performance, I decided to not use ActiveRecord classes, but do executions directly. The inner part of my benchmark execution looks like this:
conn.create_table :test_perf, :force => true do |t|
t.column :one, :string
t.column :two, :string
end
100.times do
conn.insert("INSERT INTO test_perf(one, two) VALUES('one','two')")
end
1000.times do
conn.select_all("SELECT * FROM test_perf")
end
conn.drop_table :test_perf
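For completeness, the timing wrapper around that snippet might look something like this (assumed; the post only shows the inner part, and conn is taken to be an already established ActiveRecord connection):
require 'benchmark'

# Wrap the create/insert/select/drop cycle in a single measurement, which
# prints in the same user/system/total/real format as the numbers below.
puts(Benchmark.measure {
  conn.create_table :test_perf, :force => true do |t|
    t.column :one, :string
    t.column :two, :string
  end
  100.times do
    conn.insert("INSERT INTO test_perf(one, two) VALUES('one','two')")
  end
  1000.times do
    conn.select_all("SELECT * FROM test_perf")
  end
  conn.drop_table :test_perf
})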
It is executed with a recent MySQL Community Edition 5 server, locally, with matching JDBC drivers. The MRI tests are run with 1.8.6, and both use ActiveRecord 1.15.3. ActiveRecord-JDBC is a prerelease of 0.2.4, available from trunk. My machine is an IBM Thinkpad T43p, running Debian. It's 32-bit, with Java 6.
The results were highly interesting. First, let's see the baseline: the Ruby results:
user system total real
7.730000 0.020000 7.750000 ( 8.531013)
Frankly, I wasn't that impressed with these numbers. I thought Ruby database performance was better. Oh well. The interesting part is the JRuby AR-JDBC results:
user system total real
6.948000 0.000000 6.948000 ( 6.948000)
WOW! We're actually faster in this quite interesting test. Not what I had expected at all, but very welcome news indeed. Note that there is still a lot of block overhead in the interpreter, so the results are a little bit skewed in MRI's favour by this, too.
Labels:
active record,
jdbc,
jruby,
performance
Sunday, February 11, 2007
Ragel performance
I did some performance testing on the old and new Resolver implementations. The test set includes some stupid cases that exercise bad parts of both implementations (like longest-match situations, where it can't be decided what type something is until we have backtracked about 20 characters). I placed the 24 test strings in an array and pounded on it with an instance of the ResolverImpl, used in exactly the same way as on all scalar values in a YAML document. The objective is to find out whether the value is an implicit type or not. So basically, we give it a String, and get back a tag URI. So it's not like I'm parsing a language or anything. I'm just doing some recognizing here.
The old implementation was based on a Map from the first letter of the string to resolve to a list of patterns to try sequentially. This worked fine and made it extensible, but it was not very fast. This is the baseline: for 24 different strings, iterated 100 000 times for 2 400 000 resolves, it takes 7879ms. That's OK, but not great.
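The shape of that old approach, sketched in Ruby rather than the actual Java (the patterns and tags here are just representative examples, not the real table):
# Bucket candidate patterns by the first character of the scalar, then try the
# bucket's patterns in order; fall back to the plain string tag if none match.
PATTERNS = {
  'n' => [[/\Anull\z/, 'tag:yaml.org,2002:null']],
  't' => [[/\Atrue\z/, 'tag:yaml.org,2002:bool']],
  '1' => [[/\A\d+\z/,  'tag:yaml.org,2002:int']]
  # ... one bucket for each possible first character ...
}

def resolve(scalar)
  (PATTERNS[scalar[0, 1]] || []).each do |pattern, tag|
    return tag if pattern =~ scalar
  end
  'tag:yaml.org,2002:str'   # default: the plain string tag
end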
Now, the new Ragel implementation is dead simple. It's just a translation of the regexps in the aforementioned Patterns into a state machine. In EOF leaving actions (%/ for people in the Ragel know), I execute an action that sets a local variable to a tag, and at the end the resolve method returns that tag. Dead simple, and not exercising the full strength of Ragel, of course.
So, for the same number of resolves, this ResolverImpl takes 1288ms. That's about six times faster. Ain't it nice to have a friend such as Ragel? And the best part is, for harder tasks, these improvements would be even larger.
Finite State Machines are your friends. All your base are belongs to us.
Labels:
jruby,
jvyaml,
jvyamlb,
performance,
ragel
Results of jvYAMLb
Well, the byte-based YAML loading is in JRuby trunk. On the way, some parts of the codebase got seriously simplified. Very nice. The final result, with regard to performance, is a gain of about 20-30% in speed. But the important gain is in memory usage. The new implementation takes only about one fourth of the memory the original used. So that's great.
Regarding the Resolver, as I mentioned in the last post, it required a different approach, since regular JvYAML uses regular expressions to recognize implicit tags. Since that approach isn't good with byte arrays, I decided to use Ragel to generate a recognizer. That approach was very successful. As soon as I got that working it was the obvious approach. Ragel is good. Ragel is great. Ragel is wonderful. I will use the same approach for regular JvYAML to get away from all those Java regexps.
So, the next step will be to do the same conversion of the emitter. At that point performance isn't that important, of course; it's more about memory usage and getting away from another external dependency in JRuby.
Saturday, February 10, 2007
Faster YAML with byte processing
As noted in my last post, I have started work on converting JvYAML into JvYAMLb. Right now I have finished the work on the Scanner and the Parser, and it's looking quite good. The numbers I reported in the last post for regular JvYAML performance were wrong, though. We're looking at about 7.8s to 10.0s for scanning that 3.5MB gemspec file. (And that's only the scanning, not file IO.) But with the Scanner converted to use bytes and ByteList, the same processing takes 2.8s. That's a substantial difference. But it doesn't end there.
As I said I also converted the Parser. It doesn't do any String processing at all, so I didn't expect either a speedup or slowdown except for that from the Scanner. But... Before, parsing the gemspec took 18.515s, but after, it runs in 4s. That's a dramatic speedup, and I don't really know where it comes from. Unless the earlier implementation generated so much more garbage, and used more memory, that it was noticeable in speed. Anyway, this looks good for JRuby YAML processing, since I expect big reductions in complexity in the callpath and generation of objects after the YAML processor is byted all the way through.
But tomorrow it's time to work on the Resolver, and that's going to be hard. Ideally, it would be nice to have a byte-based Regexp engine. And maybe that would be something for JRuby too, you know? Our Regular Expressions must be dead slow now that they have to convert to strings all the time.
Labels:
byte arrays,
jruby,
jvyaml,
performance
Friday, February 9, 2007
Announcing JvYAMLb, a fork
The conversion to using byte arrays as the basis of our String work in JRuby has led me to realize that JvYAML just doesn't cut it anymore. The performance wasn't good to begin with, and it's even worse having to convert EVERY SINGLE STRING read into bytes. That's no good. As an example of why something needs to be done, I'm going to describe the transformations that happen to data in JRuby when executing this code:
YAML.load_file "gems.yml"
First, the file is opened and wrapped inside a RandomAccessFile. Then data is read from it by YAML. Reading will proceed like this:
1. Bytes are read through the RAF, hopefully in chunks.
2. Those bytes are wrapped in a RubyString so they can be returned from the IO#read method.
3. An IOReader wraps that RubyIO object, gets the RubyString and converts it from bytes into a String, and this String gets converted into a char array.
4. That char array is returned to the YAML Scanner.
5. The chars from the char array are collected in a StringBuffer, and saved in various Strings as token values.
6. The parser, resolver and constructor work on these Strings in various ways.
7. The JRubyConstructor takes these Strings and creates RubyString objects from them and in the process converting the String back to a byte array.
Is there any doubt that this process is slow? Well, it hasn't been that big of a problem until now, since we are doing so well on performance in other parts of the system.
So, the radical decision is to rewrite JvYAML, making it more SYCK-compliant, working with InputStreams and byte-arrays, and in the process get away from several of the steps above. So that's what I'm going to do. I hereby create JvYAMLb. It will only be a part of the JRuby codebase, but it will be reasonably separate, so it can be extracted for other purposes. I will not stop work on regular JvYAML, but will maintain both projects.
Since the objective of this new project is blazing speed, I will post some numbers on this now and again. But first I will show you the speed of the regular system. JvYAML's Scanner can scan an old gem source index (about 3.5MB) of 435654 tokens in about 1654ms. This is the baseline I'm going to use to test performance, and I'll post more on this as soon as the byte-based Scanner is ready to try out.
Labels:
byte arrays,
jruby,
jvyaml,
jvyamlb,
performance