I'm attempting this simple HackerRank counting sort problem but testcase #3, which has 100,000 test cases, is timing out.
Is there something particularly inefficient about this code?
input = []
$stdin.each do |line|
input << line
end
input.shift
arr = input.map { |line| line.split(' ')[0].to_i }
answer = (0..99).map do |i|
counter = 0
arr.uniq.each do |j|
counter += arr.count(j) if j <= i
end
counter
end
puts answer.join ' '
3 Answers 3
arr.count(j)
scans the array, so it takes linear time. Doing it once for each element of arr.uniq
takes quadratic time. Doing it 100 times doesn't help either. :)
The arr.uniq.each
loop is unnecessary, though (as is the call to uniq
). The problem is to count the entries less than i
, and you can express that directly:
counter = arr.count {|n| n <= i}
This takes linear time, so it should be much faster.
For another speedup, you can solve the whole problem with only one pass over the input, not 100. As the problem description hints, you'll need an auxiliary array to store the counts.
By the way, the first four lines can be replaced by one, using one of Ruby's convenient utilities:
input = $stdin.readlines
The whole program can be simplified to two lines.
The performance issues have already been discussed by other users, so I'll just show an alternative way of tackling the problem. When dealing with mathematical-related problems, a functional approach is usually the way to go. That means identifying the abstractions, implementing them and leaving the code as declarative as possible. Chances are all those abstractions will prove useful somewhere else. That's how I'd write it:
nlines = $stdin.readline.to_i
numbers = $stdin.lines.take(nlines).map { |line| line.split.first.to_i }
cumulative_sums = numbers.frequency.values_at(*(0..99)).accumulate(0, &:+)
puts(cumulative_sums.join(" "))
But where did these abstractions come from? let's implement them:
module Enumerable
def accumulate(initial_value)
acc = initial_value
map { |x| acc = yield(acc, x) }
end
def frequency
Hash.new(0).tap { |f| each { |v| f[v] += 1 } }
end
end
Note that now, the non-generic code that is really doing something interesting is a one-liner (numbers.frequency.values_at(*(0..99)).accumulate(0, &:+)
). Also, it's O(n) space/time.
You've written a straightforward solution, but botched the implementation. It takes O(m + n) space to store n items of input
and distinct values up to m=100. It takes O(m n) time to do arr.uniq
in a loop, and O(m n2) when including the count()
operation. arr.uniq
does not vary, and therefore should have been lifted out of the loop, in which case it would be O(n) to do arr.uniq
plus O(m) to build the answer
array. Moving arr.uniq
out of the loop and doing a smarter count()
might be enough to bring performance down to a reasonable level.
However, your performance challenge is for the case where \$n = 100000\,ドル and \$m = 100\$. It would be nice to be able to take advantage of the fact that \$n \gg m\$. While \$O(n)\$ time is the theoretical minimum, \$O(n)\$ space could be improved upon.
An alternate solution is to use a data structure known as a binary indexed tree, also known as a Fenwick tree. The tree can be represented as an array of \$m\$ elements. The tradeoff is that it takes \$\log m\$ time to process each data item. When \$n \gg m\,ドル you can pretend that \$m\$ is a largish constant.
$$ \newcommand{smallm}[0]{\overset{n\ \gg\ m}{\longrightarrow}} \begin{array}{|l|c|c|} \hline \\ & \textrm{Straightforward approach, done well} & \textrm{Fenwick tree} \\ \hline \\ \textrm{Space} & O(m + n) \smallm O(n) & O(m) \smallm O(1) \\ \hline \\ \textrm{Time} & O(m + n) \smallm O(n) & O((m + n) \log m) \smallm O(n) \\ \hline \end{array}$$
class FenwickTree
def initialize(limit)
@arr = [0] * limit
end
# O(log m)
def incr(item, count=1)
loop do
break if item >= @arr.length
@arr[item] += count
item |= item + 1
end
end
# O(log m)
def count_le(item)
count = 0
while item >= 0
count += @arr[item]
item = (item & (item + 1)) - 1
end
count
end
# O(m log m)
def report
([email protected]).map { |i| count_le(i) }
end
end
f = FenwickTree.new(100) # O(m)
gets.to_i.times do # O(n log m)
f.incr(gets.to_i)
end
puts f.report.join(' ') # O(m log m)
-
\$\begingroup\$ This answer is really interesting. Thanks! I'm afraid I'm not the right person to evaluate it, so I've accepted @Anonymous's answer. I will definitely spend some time thinking about this though :) \$\endgroup\$niftygrifty– niftygrifty2014年05月14日 11:43:37 +00:00Commented May 14, 2014 at 11:43
-
2\$\begingroup\$ The straightforward solution (
counts = Array.new(100, 0); $stdin.each{|line| counts[line.to_i] += 1}; puts (0..99).map{|i| counts.take(i).reduce(:+)}.join(' ')
) is \$O(m)\$ space. There's no need for a tree. \$\endgroup\$Anonymous– Anonymous2014年05月14日 16:12:56 +00:00Commented May 14, 2014 at 16:12
Explore related questions
See similar questions with these tags.