5
\$\begingroup\$

I have written a program in Ruby which calculates the median of a streaming set of numbers dynamically. For each number that arrives, the median is calculated for all the numbers that have arrived so far.

The issue I am facing is that the program times out on some inputs, and, most likely, the reason is that I return a variable that is passed to a routine. I need to do this because this variable changes inside the routine and I can't pass by reference in Ruby.

Is there any workaround for this inability to pass by reference? I have seen the code involving "binding" but I found it clumsy. Why does Ruby not have a simple, straightforward pass-by-reference as do C++, C# and Java? I have pasted my code below.

I maintain two heaps max and min. The value in any node in the max (min) heap is at least as large (small) as the value in any node below it. Any arriving number is inserted into the max heap or min heap with the proviso that the difference in the number of nodes in the two heaps does not exceed 1, and that the heap property be preserved for both heaps. With this proviso, the median can be calculated from the root elements in the two heaps.

Node=Struct.new(:v,:l,:r)do
end
max=0
min=0
def count(m)
 m==0 ? 0 : 1 + count(m.l) + count(m.r)
end
def delete(d,type)
 if d==0 
 return 0
 end
 if type=="max"
 if d.l==0 && d.r==0
 return 0
 elsif d.l!=0 && d.r!=0
 if d.l.v > d.r.v 
 d.v= d.l.v 
 d.l=delete(d.l,type)
 else
 d.v= d.r.v
 d.r= delete(d.r,type)
 end
 elsif d.l == 0
 d.v= d.r.v
 d.r=delete(d.r,type)
 else
 d.v= d.l.v
 d.l=delete(d.l,type)
 end
 else
 if d.l==0 && d.r==0
 return 0
 elsif d.l!=0 && d.r!=0
 if d.l.v < d.r.v 
 d.v= d.l.v 
 d.l=delete(d.l,type)
 else
 d.v= d.r.v
 d.r= delete(d.r,type)
 end
 elsif d.l == 0
 d.v= d.r.v
 d.r=delete(d.r,type)
 else
 d.v= d.l.v
 d.l=delete(d.l,type)
 end
 end
return d
 end
def insert(v,type,t)
 if t==0
 return t=Node.new(v,0,0)
 end
 if type=="max"
 if v>t.v 
 tmp=t.v
 t.v=v
 t.l=insert(tmp, type,t.l)
 else
 t.l=insert(v,type,t.l)
 end
 else
 if v<t.v 
 tmp=t.v
 t.v=v
 t.l=insert(tmp, type,t.l)
 else
 t.l=insert(v,type,t.l)
 end
 end 
 return t
end
n=gets.chomp.to_i
n.times do
 v=gets.chomp.to_i
 if (k=count(max)-count(min))==1 
 if v<max.v
 tmp=max.v
 max=delete(max,"max")
 min=insert(tmp,"min",min)
 max=insert(v,"max",max)
 else
 min=insert(v,"min",min)
 end
 p (max.v+min.v)/2.0
 elsif k==0
 if max==0 || v<=max.v
 max=insert(v,"max",max)
 p max.v/1.0
 else
 min=insert(v,"min",min)
 p min.v/1.0
 end
 else
 if v>min.v
 tmp=min.v
 min=delete(min,"min")
 max=insert(tmp,"max",max)
 min=insert(v,"min",min)
 else
 max=insert(v,"max",max)
 end
 p (max.v+min.v)/2.0
 end
end
200_success
145k22 gold badges190 silver badges478 bronze badges
asked Oct 4, 2016 at 5:57
\$\endgroup\$
3
  • \$\begingroup\$ If you don't want to mess with binding, you can instead wrap your variable in an object and pass the object, which will enable pass by reference. All you need is something really simple like: "Class Wrapper \n attr_accessor :value \n end" \$\endgroup\$ Commented Oct 4, 2016 at 17:26
  • 3
    \$\begingroup\$ "Why does Ruby not have a simple, straightforward pass-by-reference as do C++, C# and Java?" – Java doesn't support pass-by-reference at all, it is strictly pass-by-value, always, no exceptions. (It has, in fact, the exact same semantics as Ruby in this regard.) C++ and C♯ support pass-by-reference, but you have to explicitly ask for it; the default is pass-by-value. \$\endgroup\$ Commented Oct 8, 2016 at 23:22
  • \$\begingroup\$ arrays and hashes are by default mutable, is this what you mean? \$\endgroup\$ Commented Oct 23, 2016 at 20:20

2 Answers 2

7
\$\begingroup\$

Median vs. Mean

mean in math is commonly referred to as 'average'. Median means 'the number in the middle'.

The mean for (1, 4, 11) would be (1+4+11)/3.0 == 5.3333... The median for (1, 4, 11) would be [1,4,11][(3/2.0).floor] == 4

Using your code, when I enter 2 2 3 ( take a list of 2 entries. Give me the value for 2 and 3) I get back 2.5. That's a mean, not a median. I guess there really is no median for 2 and 3, but for the sake of conversation let's say we use the logic above, it'd be 2.

I'm going to assume you want a mean instead of a median, because implementations speak more to intent than technical words.

Return Value Causing Timeout?

The issue I am facing is that the program times out on some inputs, and, most likely, the reason is that I return a variable that is passed to a routine. I need to do this because this variable changes inside the routine and I can't pass by reference in Ruby.

It's not clear to me why you think a timeout is caused when you "return a variable that is passed to a routine." I'm pretty sure that is not the reason.

Passing by Reference doesn't include Immutables

Is there any workaround for this inability to pass by reference? I have seen the code involving "binding" but I found it clumsy. Why does Ruby not have a simple, straightforward pass-by-reference as do C++, C# and Java?

Ruby uses pass-by-reference. However, if you "throw away" the referenced object, it will appear that Ruby used pass-by-value. Immutable items like numbers, symbols, strings, etc will always act this way because if they are changed, their reference will always be lost. If you increment or reassign a number, you aren't changing an object. You are point to a different object.

For a trivial example, just drop into IRB:

>> x = 8
=> 8
>> x.object_id
=> 17
>> x+=1
=> 9
>> x.object_id
=> 19

Without going to far afield of your question, this is because when you change from 8 to 9, you aren't changing the internal state of some object. You're actually pointing to a new object. And if you replaced a mutable object reference in ruby, you'd lose that reference, too.

 def change_number(num)
 num = num + 1
 end
 def change_object(obj)
 obj = {}
 end
 def add_to_object(obj)
 obj[:bar] = :baz
 end
>> score = 8
>> score.object_id
=> 17
>> out = change_number(score)
=> 9
>> out.object_id
=> 19
>> score.object_id
=> 17 # number 9 is object 19. A new object.
>> mutable = {fu: 'bar'}
=> {:fu=>"bar"}
>> mutable.object_id
=> 6553100
>> out = change_object(mutable) # this is pass-by-reference
=> {}
>> mutable.object_id
=> 6553100 # same object we passed in
>> out.object_id
=> 7488280 # a new object was created, despite pass-by-reference.
>> out = add_to_object(mutable)
=> :baz
>> mutable.object_id
=> 6553100 # same object
>> mutable
=> {:fu=>"bar", :bar=>:baz} # object was passed by reference!

Mean Value

This is Code Review. If this were posted on Stack Overflow, I would say, "Don't do it that way. Just keep a single sorted list of the values you've seen. Then use the sum and count."

But since this is a code review site, I'll give you some thoughts on your code.

  • Where are your unit tests?

  • I highly recommend using full variable names instead of v, k, m, d, etc. We live in a time where computers have more than enough memory to remember an entire word. And there's typically autocomplete. So just type out the full word so when you go back 3 months later you won't have to figure out what each letter meant.

  • This is an object-oriented language, so using classes would be helpful. For instance, let's create a NumberList class.

Refactor using OO

(I'm assuming that insert(v,"max",max) should be insert(value,"max",max))

Node=Struct.new(:v,:l,:r)do
end
class NumberList
 def initialize()
 @max = 0
 @min = 0
 end
 def count(m)
 m==0 ? 0 : 1 + count(m.l) + count(m.r)
 end
 def delete(d,type)
 if d==0
 return 0
 end
 if type=="max"
 if d.l==0 && d.r==0
 return 0
 elsif d.l!=0 && d.r!=0
 if d.l.v > d.r.v
 d.v= d.l.v
 d.l=delete(d.l,type)
 else
 d.v= d.r.v
 d.r= delete(d.r,type)
 end
 elsif d.l == 0
 d.v= d.r.v
 d.r=delete(d.r,type)
 else
 d.v= d.l.v
 d.l=delete(d.l,type)
 end
 else
 if d.l==0 && d.r==0
 return 0
 elsif d.l!=0 && d.r!=0
 if d.l.v < d.r.v
 d.v= d.l.v
 d.l=delete(d.l,type)
 else
 d.v= d.r.v
 d.r= delete(d.r,type)
 end
 elsif d.l == 0
 d.v= d.r.v
 d.r=delete(d.r,type)
 else
 d.v= d.l.v
 d.l=delete(d.l,type)
 end
 end
 return d
 end
 def insert(v,type,t)
 if t==0
 return t=Node.new(v,0,0)
 end
 if type=="max"
 if v>t.v
 tmp=t.v
 t.v=v
 t.l=insert(tmp, type,t.l)
 else
 t.l=insert(v,type,t.l)
 end
 else
 if v<t.v
 tmp=t.v
 t.v=v
 t.l=insert(tmp, type,t.l)
 else
 t.l=insert(v,type,t.l)
 end
 end
 return t
 end
 def add_value( value )
 if (k=count(@max)-count(@min))==1
 if value<@max.v
 [email protected]
 @max=delete(@max,"max")
 @min=insert(tmp,"min",@min)
 @max=insert(value,"max",@max)
 else
 @min=insert(value,"min",@min)
 end
 p (@[email protected])/2.0
 elsif k==0
 if @max==0 || value<[email protected]
 @max=insert(value,"max",@max)
 p @max.v/1.0
 else
 @min=insert(value,"min",@min)
 p @min.v/1.0
 end
 else
 if value>@min.v
 [email protected]
 @min=delete(@min,"min")
 @max=insert(tmp,"max",@max)
 @min=insert(value,"min",@min)
 else
 @max=insert(value,"max",@max)
 end
 p (@[email protected])/2.0
 end
 end
end
keeper = NumberList.new
n=gets.chomp.to_i
n.times do
 keeper.add_value( gets.chomp.to_i )
end

Now that we have it in an object, I could write some tests. But since you posted this 3 months ago, I'm not sure you're even going to read it.

Other things you should change

  • This is more complicated than necessary. Think harder about your problem rather than jumping in!

  • Ruby has loose typing, so it is possible to put either a number or an object in a variable. But if you're going to do that, your code shouldn't have to know about it. The object should have basically the same interface as a number. In this case, you initialize with an integer 0 and then later use a Node, which response to v, l and r. Not at all the same. I'm a little confused about how this works but I'd think an empty Node would be your best choice.

  • Your methods are hard to read. Method names should be "executable comments" ... in other words, you should know on a granular level what's happening just by reading the method name. And they should be written with the idea that a stranger should understand them.

  • Use more, shorter methods

Getting the Mean, the Ruby Way

Finally, here's how I'd implement a mean value in Ruby. Since Ruby allows you to add methods to core classes like Array, it makes some sense to add mean and sum to those classes. Of course, this will break if you try to do ['Jim', 'Spock', 'McCoy'].sum. So don't do that.

require 'bigdecimal'
class Array
 def sum
 self.inject(0){|sum,x| sum + x }
 end
 def mean
 BigDecimal.new(self.sum) / self.count
 end
end

in the console, you'd do this:

>> [2,4,66,32,3].mean.to_f
=> 21.4
mkrieger1
1,7841 gold badge14 silver badges26 bronze badges
answered Jan 17, 2017 at 19:25
\$\endgroup\$
1
5
\$\begingroup\$

MustModify did a great review of your code as a whole. I won't retread all that but rather look at the overall cleanliness and style of the code.

  • Your indentation is all over the place, making it really hard to read at times. Ruby convention is to use 2 spaces of indentation. Spaces; not tabs.

  • You could do with more horizontal whitespace, again for readability. I.e. x = 2 not x=2, and x == foo not x==foo. The whitespace you do have is also all over the place.

  • return is implicit at a method's exit point. You only need explicit returns if you're returning early. And for early returns you have post-fixed contionals, such as return 0 if d.zero? (notice also the use of #zero? instead of == 0, but read on why you shouldn't do that either)

  • Don't use zero as a NULL value. Ruby has nil and the Object#nil? check method. nil is also false'y, so in many cases you can skip the explicit nil? check. So your min and max variables should be initialized to nil - not integer zero.

  • Don't use "magic" strings when you have Symbols. I.e. "max" and "min" should rather be :max and :min

  • Don't do assignments in conditions (e.g. if (x = foo()) == bar; there's really no reason to, and it makes it harder to follow what's happening. Compact code ≠ faster code.

  • Don't abbreviate variables (yes, I'm referring to the Node ivars in particular). Again, hard to read and follow. Just call 'em left, right, and value.

  • Don't specify an empty block for Struct.new - it's optional, so leave it out.

And finally, you just have a lot of duplication. For instance, insert does the same thing in both its main branches apart for one byte: > vs <. And the story's the same for #delete: The only difference between the main branches is > vs < - the rest of the code is identical and pure duplication.

Really basic refactoring of #insert might be:

def insert(value, type, node = nil)
 return Node.new(value) if node.nil? # struct ivars are auto-initialized to nil
 # could also say:
 # return Node.new(value) unless node
 if (type == :max && value > node.value) || (type == :min && value < node.value)
 tmp = node.value
 node.value = value
 node.left = insert(tmp, type, node.left)
 else
 node.left = insert(value, type, node.left)
 end
 node
end

Since the count is given as the first number, another way to calculate the mean (not the median) without storing the entire input would be:

mean = 0
count = gets.chomp.to_i
count.times do
 mean += gets.chomp.to_i
end
puts mean.fdiv(count)
answered Jan 17, 2017 at 23:46
\$\endgroup\$
1
  • \$\begingroup\$ All good thoughts. I got so bogged down in the forest I forgot the trees. \$\endgroup\$ Commented Jan 19, 2017 at 16:38

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.