This code is very simple, but it is intended as an experiment in the relative performance of mutexes/CAS on different platforms. The latest version can always be found at:
https://github.com/alexdowad/showcase/blob/master/ruby-threads/concurrent_stack.rb
It comes with a built-in benchmarking script. Please try running and post the results, along with your OS, version of Ruby, and CPU. Please also post if you can find any way to improve style or performance.
For convenience the first version of the code is reproduced below:
# 3 thread-safe stack implementations
# Written by Alex Dowad
# Usage:
# stack.push(1)
# stack.peek => 1 (1 is not removed from stack)
# stack.pop => 1 (now 1 is removed from stack)
require 'rubygems' # for compatibility with MRI 1.8, JRuby
require 'thread'
require 'atomic' # atomic gem must be installed
# The easy one first
class ThreadSafeStack
def initialize
@s,@m = [],Mutex.new
end
def push(value)
@m.synchronize { @s.push(value) }
end
def pop
@m.synchronize { @s.pop }
end
def peek
@m.synchronize { @s.last }
end
end
# a non-locking version which uses compare-and-swap to update stack
class ConcurrentStack
Node = Struct.new(:value,:next)
def initialize
@top = Atomic.new(nil)
end
def push(value)
node = Node.new(value,nil)
@top.update { |current| node.next = current; node }
end
def pop
node = nil
@top.update do |current|
node = current
return if node.nil?
node.next
end
node.value
end
def peek
node = @top.value
return if node.nil?
node.value
end
end
# same as ConcurrentStack, but additionally recycles popped nodes
# (to reduce load on GC)
# a global free list is used, and is also updated using CAS,
# in exactly the same way as the stacks themselves
class RecyclingConcurrentStack
Node = Struct.new(:value,:next)
FREE_LIST = Atomic.new(nil)
def initialize
@top = Atomic.new(nil)
end
def push(value)
node = get_node(value)
@top.update { |current| node.next = current; node }
end
def pop
node = nil
@top.update do |current|
return if (node = current).nil?
node.next
end
result = node.value
FREE_LIST.update do |current|
node.next = current
node
end
result
end
def peek
node = @top.value
return if node.nil?
node.value
end
private
def get_node(val)
# if contention causes the CAS to fail, just allocate a new node
node = FREE_LIST.value
if node && FREE_LIST.compare_and_swap(node,node.next)
node.value = val
node
else
Node.new(val,nil)
end
end
end
# Test driver
if __FILE__ == 0ドル
require 'benchmark'
ITERATIONS_PER_TEST = 1000000
QUEUE,MUTEX = ConditionVariable.new,Mutex.new
def wait_for_signal
MUTEX.synchronize { QUEUE.wait(MUTEX) }
end
def send_signal
MUTEX.synchronize { QUEUE.broadcast }
end
def test(klass)
test_with_threads(klass,1)
test_with_threads(klass,5)
test_with_threads(klass,25)
end
def test_with_threads(klass,n_threads)
stack = klass.new
iterations = ITERATIONS_PER_TEST / n_threads
puts "Testing #{klass} with #{n_threads} thread#{'s' if n_threads>1}, iterating #{iterations}x each"
threads = n_threads.times.collect do
Thread.new do
wait_for_signal
iterations.times do
stack.push(rand(100))
stack.peek
stack.pop
end
end
end
n_gc = GC.count
sleep(0.001)
result = Benchmark.measure do
send_signal
threads.each { |t| t.join }
end
puts result
puts "Garbage collector ran #{GC.count - n_gc} times"
end
test(ThreadSafeStack)
test(ConcurrentStack)
test(RecyclingConcurrentStack)
end
-
\$\begingroup\$ The bounty on this question is going to expire; if you want it, run the benchmark and post the results, even if you can't find any way to improve the code! \$\endgroup\$Alex D– Alex D2012年03月07日 18:51:43 +00:00Commented Mar 7, 2012 at 18:51
1 Answer 1
I took the latest source code from the github and have started review of this project. Here are my benchmarks results:
# Intel(R) Core(TM)2 Duo CPU P7570 @ 2.26GHz
# ruby 1.9.2p290 (2011年07月09日 revision 32553) [i686-linux]
Testing ThreadSafeStack with 1 thread, iterating 1000000x each
2.250000 0.000000 2.250000 ( 2.254392)
Garbage collector ran 0 times
Testing ThreadSafeStack with 5 threads, iterating 200000x each
2.350000 0.020000 2.370000 ( 2.533645)
Garbage collector ran 0 times
Testing ThreadSafeStack with 25 threads, iterating 40000x each
2.890000 0.780000 3.670000 ( 3.170657)
Garbage collector ran 0 times
Testing ConcurrentStack with 1 thread, iterating 1000000x each
2.840000 0.000000 2.840000 ( 2.836545)
Garbage collector ran 54 times
Testing ConcurrentStack with 5 threads, iterating 200000x each
2.850000 0.010000 2.860000 ( 2.871540)
Garbage collector ran 54 times
Testing ConcurrentStack with 25 threads, iterating 40000x each
2.880000 0.000000 2.880000 ( 2.881745)
Garbage collector ran 56 times
Testing RecyclingConcurrentStack with 1 thread, iterating 1000000x each
3.910000 0.000000 3.910000 ( 3.906049)
Garbage collector ran 0 times
Testing RecyclingConcurrentStack with 5 threads, iterating 200000x each
3.900000 0.000000 3.900000 ( 3.910919)
Garbage collector ran 0 times
Testing RecyclingConcurrentStack with 25 threads, iterating 40000x each
3.960000 0.000000 3.960000 ( 4.024026)
Garbage collector ran 0 times
With jruby 1.6.5 (ruby-1.8.7-p330) I encountered a deadlock. With JRruby it is reproduced each time I ran the benchmark, but with MRI I've seed it only once so far. I will try to find the cause and post an update here.
UPDATE
I reviewed the implementation of concurrent stacks and did not found anything that may cause deadlocks or other issues. Looks like deadlock is caused by benchmarking code. With JRuby it just hangs (very frequently). With MRI I received 'deadlock detected' exceptions:
Testing ConcurrentStack with 25 threads, iterating 40000x each
/home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `join': deadlock detected (fatal)
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block (2 levels) in test_with_threads'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `each'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block in test_with_threads'
from /usr/share/ruby-rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/benchmark.rb:295:in `measure'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:141:in `test_with_threads'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:120:in `test'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:150:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
And
/home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `join': deadlock detected (fatal)
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block (2 levels) in test_with_threads'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `each'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:143:in `block in test_with_threads'
Testing RecyclingConcurrentStack with 1 thread, iterating 1000000x each
from /usr/share/ruby-rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/benchmark.rb:295:in `measure'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:141:in `test_with_threads'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:118:in `test'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:152:in `block in <top (required)>'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:151:in `loop'
from /home/alex/Projects/read-write-lock/concurrent-stack.rb:151:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
As far as I can see, the only thing that may cause deadlock is situation when main thread does QUEUE.broadcast
and then does Thread#join
on the thread but this thread did not executed QUEUE.wait
yet. Then main thread infinitely waits for this thread and MRI may detect such deadlock situation.
When I increased sleep time from 0.001
to 0.01
and more it seemed that deadlock has gone. Looks like this issue is caused by long thread initialization time - sometimes it cannot start block execution for all threads before QUEUE.broadcast
call.
Also, as a small improvement, the following code may be extracted into private function like cache_node
to improve pop
method readability:
FREE_LIST.update do |current|
node.next = current
node
end
Stack implementation looks correct and clear, I did not found other issues.
With sleep == 0.05
benchmarks for JRuby are these:
# jruby 1.6.5 (ruby-1.8.7-p330) (2011年10月25日 9dcd388) (Java HotSpot(TM) Server VM 1.6.0_26) [linux-i386-java]
Testing ThreadSafeStack with 1 thread, iterating 1000000x each
2.881000 0.000000 2.881000 ( 2.881000)
Testing ThreadSafeStack with 5 threads, iterating 200000x each
2.658000 0.000000 2.658000 ( 2.658000)
Testing ThreadSafeStack with 25 threads, iterating 40000x each
2.885000 0.000000 2.885000 ( 2.885000)
Testing ConcurrentStack with 1 thread, iterating 1000000x each
2.142000 0.000000 2.142000 ( 2.142000)
Testing ConcurrentStack with 5 threads, iterating 200000x each
1.084000 0.000000 1.084000 ( 1.084000)
Testing ConcurrentStack with 25 threads, iterating 40000x each
0.969000 0.000000 0.969000 ( 0.970000)
Testing RecyclingConcurrentStack with 1 thread, iterating 1000000x each
2.628000 0.000000 2.628000 ( 2.628000)
Testing RecyclingConcurrentStack with 5 threads, iterating 200000x each
1.436000 0.000000 1.436000 ( 1.436000)
Testing RecyclingConcurrentStack with 25 threads, iterating 40000x each
1.473000 0.000000 1.473000 ( 1.473000)
-
\$\begingroup\$ Thanks!!! I am puzzled about the deadlock. Where does it deadlock? How many of the tests does it finish successfully? \$\endgroup\$Alex D– Alex D2012年03月09日 18:31:13 +00:00Commented Mar 9, 2012 at 18:31
-
\$\begingroup\$ @AlexD, I updated my answer \$\endgroup\$Aliaksei Kliuchnikau– Aliaksei Kliuchnikau2012年03月09日 20:07:42 +00:00Commented Mar 9, 2012 at 20:07
-
\$\begingroup\$ Thanks, good suggestion about
cache_node
. What benchmarking results do you get on JRuby? \$\endgroup\$Alex D– Alex D2012年03月10日 05:33:48 +00:00Commented Mar 10, 2012 at 5:33
Explore related questions
See similar questions with these tags.