I've written the following code to find the connected components of a graph in ruby, but it is quite slow since I don't use any sensible data structure for the disjoint sets, and hence find_set is probably \$O(n^2)\$ (don't know the complexity for enumerable.find in Ruby).
Can anybody help me improve it? I am relatively new to ruby, and so would appreciate any insights. This code is meant to solve the problem posed Here:
My solution, in plain English, is the find the number of connected components in the graph given, then either calculate the cost of building a library in every node or the cost of building a library in each connected component plus the cost of building the roads for the minimum spanning tree of each component.
The Code
#!/bin/ruby
require 'set'
def roadsAndLibraries(n, c_lib, c_road, cities)
vertices = make_set((1..n).to_a)
#find connected components
cities.each do |a, b|
first_set = find_set(vertices, a)
second_set = find_set(vertices, b)
if first_set != second_set
vertices.add(first_set | second_set)
vertices.delete(first_set)
vertices.delete(second_set)
end
end
#calculate the cost of building the edges required for a MST of each component
roads_cost = c_road * vertices.to_a.map{|x| x.size - 1}.inject(:+)
if c_road > c_lib
return n * c_lib
else
return vertices.size * c_lib + roads_cost
end
end
#takes an array of integers, and maps them into a set of singleton sets containing each node number
def make_set(arr_n)
vertices = Set.new
for i in 1..arr_n.size
vertices.add([i])
end
vertices.map!{|x| x.to_set}
end
#finds set containing a given node value
def find_set(set, node)
set.find{|x| x.include?(node)}
end
# q is number of queries, n is the number of vertices in the graph
q = gets.strip.to_i
for a0 in (0..q-1)
n, m, c_lib, c_road = gets.strip.split(' ')
n = n.to_i
m = m.to_i
c_lib = c_lib.to_i
c_road = c_road.to_i
cities = Array.new(m)
for cities_i in (0..m-1)
cities_t = gets.strip
cities[cities_i] = cities_t.split(' ').map(&:to_i)
end
result = roadsAndLibraries(n, c_lib, c_road, cities)
puts result
end
1 Answer 1
It Your solution being slow is not too much a matter of language here. I will try highlighting ruby related things you could improve to make it more readable on one hand, and algorithmic flaws on the other hand.
Ruby
Here I will point out some of the ruby code that could be improved. Note that this will not help your solution to be much faster since it is mostly an algorithmic problem you have here.
Find Set
Your find_set
method seems fine. It is a O(n)
method. You have to go through every set to find
the one which includes the node you want. The #include?
method of a set is constant in time, and
that is a good use of set here.
Make Set
Your make_set
method could be made shorter and cleaner. For instance:
def make_set(n)
Set.new((1..n).map { |i| Set.new([i]) })
end
Map + Inject
I would not recommand using #map
and then #inject
as it will create an intermediate array, and
because you can have all in the #inject
, like:
vertices.inject(0) { |sum, set| set.size + 1 + sum }
And by the way, you can get rid of to_a
since sets accept the #inject
and #map
methods too.
Method name
You should use snake case: roadsAndLibraries
becomes roads_and_libraries
.
Algorithmic enhancements
Unneeded computation
Itseems to me you don't need any computation if the road is more expensive than the library.
You can move this to the top of your roadsAndLibraries
method.
return n * c_lib if c_road > c_lib
Sets lookup
The main issue with your solution here is probably the lookup you do for each edge. You parse the
whole set of vertices for each city. This is expensive. You could try having a constant lookup
here that would help getting the connected component of a vertice on O(1)
.
You can use 2 arrays here. One for storing the connected components, like you already do. But
instead of storing them in a set, you can store them in an array, so they have an associated index.
Let's call it components
.
And then the second array can be the lookup, we will call it component_index
. For each node, you
store the index of its connected component. That way, all find_set
has to do is to take the
stored index: components[component_index[a]]
for city a
.
Your main loop could look like:
cities.each do |a, b|
first_set = component_index[a]
second_set = component_index[b]
next if first_set == second_set
merge_components(component_index, components, first_set, second_set)
end
Merging components
Where the algorithm will spend time is when we merge components together. Here you will have to
update component indexes. What you have to see here is that it will be performed much less often
than the lookup, especially in case of strongly connected components, where first_set
and
second_set
of the main loop will be equal.
To merge 2 components, I would put the nodes of the smallest component into the largest one, to avoid updating too many indexes. And then I would update the indexed component so that it contains all nodes.
def merge_components(component_index, components, first_set, second_set)
to = first_set
from = second_set
to, from = from, to if components[to].size < components[from].size
components[from].each do |i|
component_index[i] = to
end
components[to].merge components[from]
end
In case memory is a problem, you probably don't need to store the sets themselves, but only their size, since you know that they will never overlap, and that the size of the merged set is always the sum of the sizes of the 2 sets. But by doing so you have to loop through all nodes when updating components, which can be much longer.
Building Cost
Last thing to change: how we compute the cost of the solution. We will need the final connected components. This can be extracted like:
final_component_index = component_index.uniq
Then the MST cost would be
roads_cost = c_road * final_component_index.inject(0) { |sum, x| components[x].size - 1 + sum }
And the final cost
final_component_index.size * c_lib + roads_cost