Concurrent stack implementations in Ruby (relative performance of mutexes/CAS?)

Question 1

This code is very simple, but it is intended as an experiment in the relative performance of mutexes/CAS on different platforms. The latest version can always be found at:

https://github.com/alexdowad/showcase/blob/master/ruby-threads/concurrent_stack.rb

It comes with a built-in benchmarking script. Please try running and post the results, along with your OS, version of Ruby, and CPU. Please also post if you can find any way to improve style or performance.

For convenience the first version of the code is reproduced below:

# 3 thread-safe stack implementations
# Written by Alex Dowad
# Usage:
# stack.push(1)
# stack.peek => 1 (1 is not removed from stack)
# stack.pop => 1 (now 1 is removed from stack)
require 'rubygems' # for compatibility with MRI 1.8, JRuby
require 'thread'
require 'atomic' # atomic gem must be installed
# The easy one first
class ThreadSafeStack
 def initialize
 @s,@m = [],Mutex.new
 end
 def push(value)
 @m.synchronize { @s.push(value) }
 end
 def pop
 @m.synchronize { @s.pop }
 end
 def peek
 @m.synchronize { @s.last }
 end
end
# a non-locking version which uses compare-and-swap to update stack
class ConcurrentStack
 Node = Struct.new(:value,:next)
 def initialize
 @top = Atomic.new(nil)
 end
 def push(value)
 node = Node.new(value,nil)
 @top.update { |current| node.next = current; node }
 end
 def pop
 node = nil
 @top.update do |current|
 node = current
 return if node.nil?
 node.next
 end
 node.value
 end
 def peek
 node = @top.value
 return if node.nil?
 node.value
 end
end
# same as ConcurrentStack, but additionally recycles popped nodes
# (to reduce load on GC)
# a global free list is used, and is also updated using CAS,
# in exactly the same way as the stacks themselves
class RecyclingConcurrentStack
 Node = Struct.new(:value,:next)
 FREE_LIST = Atomic.new(nil)
 def initialize
 @top = Atomic.new(nil)
 end
 def push(value)
 node = get_node(value)
 @top.update { |current| node.next = current; node }
 end
 def pop
 node = nil
 @top.update do |current|
 return if (node = current).nil?
 node.next
 end
 result = node.value
 FREE_LIST.update do |current|
 node.next = current
 node
 end
 result
 end
 def peek
 node = @top.value
 return if node.nil?
 node.value
 end
 private
 def get_node(val)
 # if contention causes the CAS to fail, just allocate a new node
 node = FREE_LIST.value
 if node && FREE_LIST.compare_and_swap(node,node.next)
 node.value = val
 node
 else
 Node.new(val,nil)
 end
 end
end
# Test driver
if __FILE__ == 0ドル
 require 'benchmark'
 ITERATIONS_PER_TEST = 1000000
 QUEUE,MUTEX = ConditionVariable.new,Mutex.new
 def wait_for_signal
 MUTEX.synchronize { QUEUE.wait(MUTEX) }
 end
 def send_signal
 MUTEX.synchronize { QUEUE.broadcast }
 end
 def test(klass)
 test_with_threads(klass,1)
 test_with_threads(klass,5)
 test_with_threads(klass,25)
 end
 def test_with_threads(klass,n_threads)
 stack = klass.new
 iterations = ITERATIONS_PER_TEST / n_threads
 puts "Testing #{klass} with #{n_threads} thread#{'s' if n_threads>1}, iterating #{iterations}x each"
 threads = n_threads.times.collect do
 Thread.new do
 wait_for_signal
 iterations.times do
 stack.push(rand(100))
 stack.peek
 stack.pop
 end
 end
 end
 n_gc = GC.count
 sleep(0.001)
 result = Benchmark.measure do
 send_signal
 threads.each { |t| t.join }
 end
 puts result
 puts "Garbage collector ran #{GC.count - n_gc} times"
 end
 test(ThreadSafeStack)
 test(ConcurrentStack)
 test(RecyclingConcurrentStack)
end

Question 2

The bounty on this question is going to expire; if you want it, run the benchmark and post the results, even if you can't find any way to improve the code!

Question 3

I took the latest source code from the github and have started review of this project. Here are my benchmarks results:

# Intel(R) Core(TM)2 Duo CPU P7570 @ 2.26GHz
# ruby 1.9.2p290 (2011年07月09日 revision 32553) [i686-linux]
Testing ThreadSafeStack with 1 thread, iterating 1000000x each
 2.250000 0.000000 2.250000 ( 2.254392)
Garbage collector ran 0 times
Testing ThreadSafeStack with 5 threads, iterating 200000x each
 2.350000 0.020000 2.370000 ( 2.533645)
Garbage collector ran 0 times
Testing ThreadSafeStack with 25 threads, iterating 40000x each
 2.890000 0.780000 3.670000 ( 3.170657)
Garbage collector ran 0 times
Testing ConcurrentStack with 1 thread, iterating 1000000x each
 2.840000 0.000000 2.840000 ( 2.836545)
Garbage collector ran 54 times
Testing ConcurrentStack with 5 threads, iterating 200000x each
 2.850000 0.010000 2.860000 ( 2.871540)
Garbage collector ran 54 times
Testing ConcurrentStack with 25 threads, iterating 40000x each
 2.880000 0.000000 2.880000 ( 2.881745)
Garbage collector ran 56 times
Testing RecyclingConcurrentStack with 1 thread, iterating 1000000x each
 3.910000 0.000000 3.910000 ( 3.906049)
Garbage collector ran 0 times
Testing RecyclingConcurrentStack with 5 threads, iterating 200000x each
 3.900000 0.000000 3.900000 ( 3.910919)
Garbage collector ran 0 times
Testing RecyclingConcurrentStack with 25 threads, iterating 40000x each
 3.960000 0.000000 3.960000 ( 4.024026)
Garbage collector ran 0 times

With jruby 1.6.5 (ruby-1.8.7-p330) I encountered a deadlock. With JRruby it is reproduced each time I ran the benchmark, but with MRI I've seed it only once so far. I will try to find the cause and post an update here.

UPDATE

I reviewed the implementation of concurrent stacks and did not found anything that may cause deadlocks or other issues. Looks like deadlock is caused by benchmarking code. With JRuby it just hangs (very frequently). With MRI I received 'deadlock detected' exceptions:

Testing ConcurrentStack with 25 threads, iterating 40000x each
/home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `join': deadlock detected (fatal)
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block (2 levels) in test_with_threads'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `each'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block in test_with_threads'
 from /usr/share/ruby-rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/benchmark.rb:295:in `measure'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:141:in `test_with_threads'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:120:in `test'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:150:in `<top (required)>'
 from -e:1:in `load'
 from -e:1:in `<main>'

And

/home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `join': deadlock detected (fatal)
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block (2 levels) in test_with_threads'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `each'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block in test_with_threads'
Testing RecyclingConcurrentStack with 1 thread, iterating 1000000x each
 from /usr/share/ruby-rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/benchmark.rb:295:in `measure'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:141:in `test_with_threads'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:118:in `test'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:152:in `block in <top (required)>'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:151:in `loop'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:151:in `<top (required)>'
 from -e:1:in `load'
 from -e:1:in `<main>'

As far as I can see, the only thing that may cause deadlock is situation when main thread does QUEUE.broadcast and then does Thread#join on the thread but this thread did not executed QUEUE.wait yet. Then main thread infinitely waits for this thread and MRI may detect such deadlock situation.

When I increased sleep time from 0.001 to 0.01 and more it seemed that deadlock has gone. Looks like this issue is caused by long thread initialization time - sometimes it cannot start block execution for all threads before QUEUE.broadcast call.

Also, as a small improvement, the following code may be extracted into private function like cache_node to improve pop method readability:

FREE_LIST.update do |current|
 node.next = current
 node
end

Stack implementation looks correct and clear, I did not found other issues.

With sleep == 0.05 benchmarks for JRuby are these:

# jruby 1.6.5 (ruby-1.8.7-p330) (2011年10月25日 9dcd388) (Java HotSpot(TM) Server VM 1.6.0_26) [linux-i386-java]
Testing ThreadSafeStack with 1 thread, iterating 1000000x each
 2.881000 0.000000 2.881000 ( 2.881000)
Testing ThreadSafeStack with 5 threads, iterating 200000x each
 2.658000 0.000000 2.658000 ( 2.658000)
Testing ThreadSafeStack with 25 threads, iterating 40000x each
 2.885000 0.000000 2.885000 ( 2.885000)
Testing ConcurrentStack with 1 thread, iterating 1000000x each
 2.142000 0.000000 2.142000 ( 2.142000)
Testing ConcurrentStack with 5 threads, iterating 200000x each
 1.084000 0.000000 1.084000 ( 1.084000)
Testing ConcurrentStack with 25 threads, iterating 40000x each
 0.969000 0.000000 0.969000 ( 0.970000)
Testing RecyclingConcurrentStack with 1 thread, iterating 1000000x each
 2.628000 0.000000 2.628000 ( 2.628000)
Testing RecyclingConcurrentStack with 5 threads, iterating 200000x each
 1.436000 0.000000 1.436000 ( 1.436000)
Testing RecyclingConcurrentStack with 25 threads, iterating 40000x each
 1.473000 0.000000 1.473000 ( 1.473000)

Question 4

Thanks!!! I am puzzled about the deadlock. Where does it deadlock? How many of the tests does it finish successfully?

Question 5

@AlexD, I updated my answer

Question 6

Thanks, good suggestion about cache_node. What benchmarking results do you get on JRuby?

Aliaksei Kliuchnikau Aliaksei Kliuchnikau 3611 silver badge6 bronze badges · Accepted Answer · 2012-03-09 15:59:24Z

I took the latest source code from the github and have started review of this project. Here are my benchmarks results:

# Intel(R) Core(TM)2 Duo CPU P7570 @ 2.26GHz
# ruby 1.9.2p290 (2011年07月09日 revision 32553) [i686-linux]
Testing ThreadSafeStack with 1 thread, iterating 1000000x each
 2.250000 0.000000 2.250000 ( 2.254392)
Garbage collector ran 0 times
Testing ThreadSafeStack with 5 threads, iterating 200000x each
 2.350000 0.020000 2.370000 ( 2.533645)
Garbage collector ran 0 times
Testing ThreadSafeStack with 25 threads, iterating 40000x each
 2.890000 0.780000 3.670000 ( 3.170657)
Garbage collector ran 0 times
Testing ConcurrentStack with 1 thread, iterating 1000000x each
 2.840000 0.000000 2.840000 ( 2.836545)
Garbage collector ran 54 times
Testing ConcurrentStack with 5 threads, iterating 200000x each
 2.850000 0.010000 2.860000 ( 2.871540)
Garbage collector ran 54 times
Testing ConcurrentStack with 25 threads, iterating 40000x each
 2.880000 0.000000 2.880000 ( 2.881745)
Garbage collector ran 56 times
Testing RecyclingConcurrentStack with 1 thread, iterating 1000000x each
 3.910000 0.000000 3.910000 ( 3.906049)
Garbage collector ran 0 times
Testing RecyclingConcurrentStack with 5 threads, iterating 200000x each
 3.900000 0.000000 3.900000 ( 3.910919)
Garbage collector ran 0 times
Testing RecyclingConcurrentStack with 25 threads, iterating 40000x each
 3.960000 0.000000 3.960000 ( 4.024026)
Garbage collector ran 0 times

With jruby 1.6.5 (ruby-1.8.7-p330) I encountered a deadlock. With JRruby it is reproduced each time I ran the benchmark, but with MRI I've seed it only once so far. I will try to find the cause and post an update here.

UPDATE

I reviewed the implementation of concurrent stacks and did not found anything that may cause deadlocks or other issues. Looks like deadlock is caused by benchmarking code. With JRuby it just hangs (very frequently). With MRI I received 'deadlock detected' exceptions:

Testing ConcurrentStack with 25 threads, iterating 40000x each
/home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `join': deadlock detected (fatal)
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block (2 levels) in test_with_threads'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `each'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block in test_with_threads'
 from /usr/share/ruby-rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/benchmark.rb:295:in `measure'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:141:in `test_with_threads'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:120:in `test'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:150:in `<top (required)>'
 from -e:1:in `load'
 from -e:1:in `<main>'

And

/home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `join': deadlock detected (fatal)
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block (2 levels) in test_with_threads'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `each'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block in test_with_threads'
Testing RecyclingConcurrentStack with 1 thread, iterating 1000000x each
 from /usr/share/ruby-rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/benchmark.rb:295:in `measure'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:141:in `test_with_threads'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:118:in `test'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:152:in `block in <top (required)>'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:151:in `loop'
 from /home/alex/Projects/read-write-lock/concurrent-stack.rb:151:in `<top (required)>'
 from -e:1:in `load'
 from -e:1:in `<main>'

As far as I can see, the only thing that may cause deadlock is situation when main thread does QUEUE.broadcast and then does Thread#join on the thread but this thread did not executed QUEUE.wait yet. Then main thread infinitely waits for this thread and MRI may detect such deadlock situation.

When I increased sleep time from 0.001 to 0.01 and more it seemed that deadlock has gone. Looks like this issue is caused by long thread initialization time - sometimes it cannot start block execution for all threads before QUEUE.broadcast call.

Also, as a small improvement, the following code may be extracted into private function like cache_node to improve pop method readability:

FREE_LIST.update do |current|
 node.next = current
 node
end

Stack implementation looks correct and clear, I did not found other issues.

With sleep == 0.05 benchmarks for JRuby are these:

# jruby 1.6.5 (ruby-1.8.7-p330) (2011年10月25日 9dcd388) (Java HotSpot(TM) Server VM 1.6.0_26) [linux-i386-java]
Testing ThreadSafeStack with 1 thread, iterating 1000000x each
 2.881000 0.000000 2.881000 ( 2.881000)
Testing ThreadSafeStack with 5 threads, iterating 200000x each
 2.658000 0.000000 2.658000 ( 2.658000)
Testing ThreadSafeStack with 25 threads, iterating 40000x each
 2.885000 0.000000 2.885000 ( 2.885000)
Testing ConcurrentStack with 1 thread, iterating 1000000x each
 2.142000 0.000000 2.142000 ( 2.142000)
Testing ConcurrentStack with 5 threads, iterating 200000x each
 1.084000 0.000000 1.084000 ( 1.084000)
Testing ConcurrentStack with 25 threads, iterating 40000x each
 0.969000 0.000000 0.969000 ( 0.970000)
Testing RecyclingConcurrentStack with 1 thread, iterating 1000000x each
 2.628000 0.000000 2.628000 ( 2.628000)
Testing RecyclingConcurrentStack with 5 threads, iterating 200000x each
 1.436000 0.000000 1.436000 ( 1.436000)
Testing RecyclingConcurrentStack with 25 threads, iterating 40000x each
 1.473000 0.000000 1.473000 ( 1.473000)

Thanks!!! I am puzzled about the deadlock. Where does it deadlock? How many of the tests does it finish successfully?
Thanks, good suggestion about cache_node. What benchmarking results do you get on JRuby?

Stack Exchange Network

Concurrent stack implementations in Ruby (relative performance of mutexes/CAS?)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Concurrent stack implementations in Ruby (relative performance of mutexes/CAS?)

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions