I have written a program in Ruby which calculates the median of a streaming set of numbers dynamically. For each number that arrives, the median is calculated for all the numbers that have arrived so far.
The issue I am facing is that the program times out on some inputs, and, most likely, the reason is that I return a variable that is passed to a routine. I need to do this because this variable changes inside the routine and I can't pass by reference in Ruby.
Is there any workaround for this inability to pass by reference? I have seen the code involving "binding" but I found it clumsy. Why does Ruby not have a simple, straightforward pass-by-reference as do C++, C# and Java? I have pasted my code below.
I maintain two heaps max and min. The value in any node in the max (min) heap is at least as large (small) as the value in any node below it. Any arriving number is inserted into the max heap or min heap with the proviso that the difference in the number of nodes in the two heaps does not exceed 1, and that the heap property be preserved for both heaps. With this proviso, the median can be calculated from the root elements in the two heaps.
Node=Struct.new(:v,:l,:r)do
end
max=0
min=0
def count(m)
m==0 ? 0 : 1 + count(m.l) + count(m.r)
end
def delete(d,type)
if d==0
return 0
end
if type=="max"
if d.l==0 && d.r==0
return 0
elsif d.l!=0 && d.r!=0
if d.l.v > d.r.v
d.v= d.l.v
d.l=delete(d.l,type)
else
d.v= d.r.v
d.r= delete(d.r,type)
end
elsif d.l == 0
d.v= d.r.v
d.r=delete(d.r,type)
else
d.v= d.l.v
d.l=delete(d.l,type)
end
else
if d.l==0 && d.r==0
return 0
elsif d.l!=0 && d.r!=0
if d.l.v < d.r.v
d.v= d.l.v
d.l=delete(d.l,type)
else
d.v= d.r.v
d.r= delete(d.r,type)
end
elsif d.l == 0
d.v= d.r.v
d.r=delete(d.r,type)
else
d.v= d.l.v
d.l=delete(d.l,type)
end
end
return d
end
def insert(v,type,t)
if t==0
return t=Node.new(v,0,0)
end
if type=="max"
if v>t.v
tmp=t.v
t.v=v
t.l=insert(tmp, type,t.l)
else
t.l=insert(v,type,t.l)
end
else
if v<t.v
tmp=t.v
t.v=v
t.l=insert(tmp, type,t.l)
else
t.l=insert(v,type,t.l)
end
end
return t
end
n=gets.chomp.to_i
n.times do
v=gets.chomp.to_i
if (k=count(max)-count(min))==1
if v<max.v
tmp=max.v
max=delete(max,"max")
min=insert(tmp,"min",min)
max=insert(v,"max",max)
else
min=insert(v,"min",min)
end
p (max.v+min.v)/2.0
elsif k==0
if max==0 || v<=max.v
max=insert(v,"max",max)
p max.v/1.0
else
min=insert(v,"min",min)
p min.v/1.0
end
else
if v>min.v
tmp=min.v
min=delete(min,"min")
max=insert(tmp,"max",max)
min=insert(v,"min",min)
else
max=insert(v,"max",max)
end
p (max.v+min.v)/2.0
end
end
-
\$\begingroup\$ If you don't want to mess with binding, you can instead wrap your variable in an object and pass the object, which will enable pass by reference. All you need is something really simple like: "Class Wrapper \n attr_accessor :value \n end" \$\endgroup\$Zack– Zack2016年10月04日 17:26:14 +00:00Commented Oct 4, 2016 at 17:26
-
3\$\begingroup\$ "Why does Ruby not have a simple, straightforward pass-by-reference as do C++, C# and Java?" – Java doesn't support pass-by-reference at all, it is strictly pass-by-value, always, no exceptions. (It has, in fact, the exact same semantics as Ruby in this regard.) C++ and C♯ support pass-by-reference, but you have to explicitly ask for it; the default is pass-by-value. \$\endgroup\$Jörg W Mittag– Jörg W Mittag2016年10月08日 23:22:23 +00:00Commented Oct 8, 2016 at 23:22
-
\$\begingroup\$ arrays and hashes are by default mutable, is this what you mean? \$\endgroup\$max pleaner– max pleaner2016年10月23日 20:20:57 +00:00Commented Oct 23, 2016 at 20:20
2 Answers 2
Median vs. Mean
mean
in math is commonly referred to as 'average'. Median
means 'the number in the middle'.
The mean for (1, 4, 11) would be (1+4+11)/3.0 == 5.3333... The median for (1, 4, 11) would be [1,4,11][(3/2.0).floor] == 4
Using your code, when I enter 2 2 3
( take a list of 2 entries. Give me the value for 2 and 3) I get back 2.5
. That's a mean, not a median. I guess there really is no median for 2 and 3, but for the sake of conversation let's say we use the logic above, it'd be 2.
I'm going to assume you want a mean instead of a median, because implementations speak more to intent than technical words.
Return Value Causing Timeout?
The issue I am facing is that the program times out on some inputs, and, most likely, the reason is that I return a variable that is passed to a routine. I need to do this because this variable changes inside the routine and I can't pass by reference in Ruby.
It's not clear to me why you think a timeout is caused when you "return a variable that is passed to a routine." I'm pretty sure that is not the reason.
Passing by Reference doesn't include Immutables
Is there any workaround for this inability to pass by reference? I have seen the code involving "binding" but I found it clumsy. Why does Ruby not have a simple, straightforward pass-by-reference as do C++, C# and Java?
Ruby uses pass-by-reference. However, if you "throw away" the referenced object, it will appear that Ruby used pass-by-value. Immutable items like numbers, symbols, strings, etc will always act this way because if they are changed, their reference will always be lost. If you increment or reassign a number, you aren't changing an object. You are point to a different object.
For a trivial example, just drop into IRB:
>> x = 8
=> 8
>> x.object_id
=> 17
>> x+=1
=> 9
>> x.object_id
=> 19
Without going to far afield of your question, this is because when you change from 8 to 9, you aren't changing the internal state of some object. You're actually pointing to a new object. And if you replaced a mutable object reference in ruby, you'd lose that reference, too.
def change_number(num)
num = num + 1
end
def change_object(obj)
obj = {}
end
def add_to_object(obj)
obj[:bar] = :baz
end
>> score = 8
>> score.object_id
=> 17
>> out = change_number(score)
=> 9
>> out.object_id
=> 19
>> score.object_id
=> 17 # number 9 is object 19. A new object.
>> mutable = {fu: 'bar'}
=> {:fu=>"bar"}
>> mutable.object_id
=> 6553100
>> out = change_object(mutable) # this is pass-by-reference
=> {}
>> mutable.object_id
=> 6553100 # same object we passed in
>> out.object_id
=> 7488280 # a new object was created, despite pass-by-reference.
>> out = add_to_object(mutable)
=> :baz
>> mutable.object_id
=> 6553100 # same object
>> mutable
=> {:fu=>"bar", :bar=>:baz} # object was passed by reference!
Mean Value
This is Code Review. If this were posted on Stack Overflow, I would say, "Don't do it that way. Just keep a single sorted list of the values you've seen. Then use the sum and count."
But since this is a code review site, I'll give you some thoughts on your code.
Where are your unit tests?
I highly recommend using full variable names instead of v, k, m, d, etc. We live in a time where computers have more than enough memory to remember an entire word. And there's typically autocomplete. So just type out the full word so when you go back 3 months later you won't have to figure out what each letter meant.
This is an object-oriented language, so using classes would be helpful. For instance, let's create a NumberList class.
Refactor using OO
(I'm assuming that insert(v,"max",max)
should be insert(value,"max",max)
)
Node=Struct.new(:v,:l,:r)do
end
class NumberList
def initialize()
@max = 0
@min = 0
end
def count(m)
m==0 ? 0 : 1 + count(m.l) + count(m.r)
end
def delete(d,type)
if d==0
return 0
end
if type=="max"
if d.l==0 && d.r==0
return 0
elsif d.l!=0 && d.r!=0
if d.l.v > d.r.v
d.v= d.l.v
d.l=delete(d.l,type)
else
d.v= d.r.v
d.r= delete(d.r,type)
end
elsif d.l == 0
d.v= d.r.v
d.r=delete(d.r,type)
else
d.v= d.l.v
d.l=delete(d.l,type)
end
else
if d.l==0 && d.r==0
return 0
elsif d.l!=0 && d.r!=0
if d.l.v < d.r.v
d.v= d.l.v
d.l=delete(d.l,type)
else
d.v= d.r.v
d.r= delete(d.r,type)
end
elsif d.l == 0
d.v= d.r.v
d.r=delete(d.r,type)
else
d.v= d.l.v
d.l=delete(d.l,type)
end
end
return d
end
def insert(v,type,t)
if t==0
return t=Node.new(v,0,0)
end
if type=="max"
if v>t.v
tmp=t.v
t.v=v
t.l=insert(tmp, type,t.l)
else
t.l=insert(v,type,t.l)
end
else
if v<t.v
tmp=t.v
t.v=v
t.l=insert(tmp, type,t.l)
else
t.l=insert(v,type,t.l)
end
end
return t
end
def add_value( value )
if (k=count(@max)-count(@min))==1
if value<@max.v
[email protected]
@max=delete(@max,"max")
@min=insert(tmp,"min",@min)
@max=insert(value,"max",@max)
else
@min=insert(value,"min",@min)
end
p (@[email protected])/2.0
elsif k==0
if @max==0 || value<[email protected]
@max=insert(value,"max",@max)
p @max.v/1.0
else
@min=insert(value,"min",@min)
p @min.v/1.0
end
else
if value>@min.v
[email protected]
@min=delete(@min,"min")
@max=insert(tmp,"max",@max)
@min=insert(value,"min",@min)
else
@max=insert(value,"max",@max)
end
p (@[email protected])/2.0
end
end
end
keeper = NumberList.new
n=gets.chomp.to_i
n.times do
keeper.add_value( gets.chomp.to_i )
end
Now that we have it in an object, I could write some tests. But since you posted this 3 months ago, I'm not sure you're even going to read it.
Other things you should change
This is more complicated than necessary. Think harder about your problem rather than jumping in!
Ruby has loose typing, so it is possible to put either a number or an object in a variable. But if you're going to do that, your code shouldn't have to know about it. The object should have basically the same interface as a number. In this case, you initialize with an integer 0 and then later use a Node, which response to v, l and r. Not at all the same. I'm a little confused about how this works but I'd think an empty Node would be your best choice.
Your methods are hard to read. Method names should be "executable comments" ... in other words, you should know on a granular level what's happening just by reading the method name. And they should be written with the idea that a stranger should understand them.
Use more, shorter methods
Getting the Mean, the Ruby Way
Finally, here's how I'd implement a mean value in Ruby. Since Ruby allows you to add methods to core classes like Array, it makes some sense to add mean
and sum
to those classes. Of course, this will break if you try to do ['Jim', 'Spock', 'McCoy'].sum
. So don't do that.
require 'bigdecimal'
class Array
def sum
self.inject(0){|sum,x| sum + x }
end
def mean
BigDecimal.new(self.sum) / self.count
end
end
in the console, you'd do this:
>> [2,4,66,32,3].mean.to_f
=> 21.4
-
1\$\begingroup\$ Nice answer, instead of adding directly in the
Array
class, you could use refinements ruby-doc.org/core-2.1.1/doc/syntax/refinements_rdoc.html ( I just learned from it from WayneConrad see : chat.stackoverflow.com/transcript/message/35152333#35152333) \$\endgroup\$Marc-Andre– Marc-Andre2017年01月17日 20:13:52 +00:00Commented Jan 17, 2017 at 20:13
MustModify did a great review of your code as a whole. I won't retread all that but rather look at the overall cleanliness and style of the code.
Your indentation is all over the place, making it really hard to read at times. Ruby convention is to use 2 spaces of indentation. Spaces; not tabs.
You could do with more horizontal whitespace, again for readability. I.e.
x = 2
notx=2
, andx == foo
notx==foo
. The whitespace you do have is also all over the place.return
is implicit at a method's exit point. You only need explicitreturn
s if you're returning early. And for early returns you have post-fixed contionals, such asreturn 0 if d.zero?
(notice also the use of#zero?
instead of== 0
, but read on why you shouldn't do that either)Don't use zero as a NULL value. Ruby has
nil
and theObject#nil?
check method.nil
is also false'y, so in many cases you can skip the explicitnil?
check. So yourmin
andmax
variables should be initialized tonil
- not integer zero.Don't use "magic" strings when you have
Symbol
s. I.e."max"
and"min"
should rather be:max
and:min
Don't do assignments in conditions (e.g.
if (x = foo()) == bar
; there's really no reason to, and it makes it harder to follow what's happening. Compact code ≠ faster code.Don't abbreviate variables (yes, I'm referring to the
Node
ivars in particular). Again, hard to read and follow. Just call 'emleft
,right
, andvalue
.Don't specify an empty block for
Struct.new
- it's optional, so leave it out.
And finally, you just have a lot of duplication. For instance, insert
does the same thing in both its main branches apart for one byte: >
vs <
. And the story's the same for #delete
: The only difference between the main branches is >
vs <
- the rest of the code is identical and pure duplication.
Really basic refactoring of #insert
might be:
def insert(value, type, node = nil)
return Node.new(value) if node.nil? # struct ivars are auto-initialized to nil
# could also say:
# return Node.new(value) unless node
if (type == :max && value > node.value) || (type == :min && value < node.value)
tmp = node.value
node.value = value
node.left = insert(tmp, type, node.left)
else
node.left = insert(value, type, node.left)
end
node
end
Since the count is given as the first number, another way to calculate the mean (not the median) without storing the entire input would be:
mean = 0
count = gets.chomp.to_i
count.times do
mean += gets.chomp.to_i
end
puts mean.fdiv(count)
-
\$\begingroup\$ All good thoughts. I got so bogged down in the forest I forgot the trees. \$\endgroup\$MustModify– MustModify2017年01月19日 16:38:27 +00:00Commented Jan 19, 2017 at 16:38