
In Daily Coding Problem, Miller & Wu pose the following problem: given an array of integers, return a new array where each element is the number of smaller elements to the right of that element in the original input array.

Their suggested solution is (in Python):

import bisect

def smaller_counts(lst):
    result = []
    seen = []
    for num in reversed(lst):
        i = bisect.bisect_left(seen, num)
        result.append(i)
        bisect.insort(seen, num)
    return list(reversed(result))

While the algorithm is correct, the authors then claim that it runs in O(n log n), which seems wrong to me: I'd expect each call to insort to be O(n), since we're inserting into an array and need to shift every element after the insertion point one position to the right. That's also confirmed by the Python documentation.

As a result, I think the total cost is O(n^2).

Am I missing something, or did the authors make a mistake here (probably assuming that insort would have the same O(log n) cost as bisect_left)?
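For what it's worth, a rough timing check (numbers are machine-dependent; only the growth rate matters) seems to confirm the super-linear behaviour of repeated insort calls:

import bisect
import random
import timeit

# Time n insort calls into an initially empty list, for growing n.
# If each call were O(log n) the total would grow roughly like n log n;
# the element shifting makes it grow much closer to n^2.
for n in (1_000, 10_000, 100_000):
    data = [random.random() for _ in range(n)]

    def fill(data=data):
        seen = []
        for x in data:
            bisect.insort(seen, x)

    print(n, timeit.timeit(fill, number=1))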

I also thought of posting this on Code Golf, but wasn't sure it belonged there.

asked Nov 23, 2021 at 17:13

2 Answers


Your reasoning seems correct. Making n iterations over an O(n) operation means O(n^2) total complexity in general. Whether the insertion actually has O(n) time complexity depends on the data distribution. For example, if we were to insert elements in ascending order (so that new elements are only appended, without shifting existing values) then each insertion could be as cheap as O(1). However, this problem does not allow us to make any assumptions about the data distribution.
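To illustrate that distribution dependence, here is a small sketch (timings are machine-dependent; only the gap between the two cases matters):

import bisect
import timeit

def fill(values):
    seen = []
    for v in values:
        bisect.insort(seen, v)

n = 50_000
# Ascending input: every insert lands at the end, nothing shifts -> ~O(1) each.
print("ascending :", timeit.timeit(lambda: fill(range(n)), number=1))
# Descending input: every insert lands at index 0, shifting all -> O(n) each.
print("descending:", timeit.timeit(lambda: fill(range(n, 0, -1)), number=1))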

Note that an O(n log n) solution is possible if seen is an ordered set data structure with O(log n) insertion, such as a balanced binary tree. Python has no suitable data structure in the standard library. The heapq module is similar but cannot provide the necessary operations here. A possible manual implementation might look like this:

from dataclasses import dataclass
from typing import Optional, Tuple

def smaller_counts(lst: list[int]) -> list[int]:
    result = []
    seen = None  # root of the tree of values already seen (to the right)
    for num in reversed(lst):
        location, seen = treeset_insert_and_get_location(seen, num)
        result.append(location)
    return list(reversed(result))

@dataclass
class Node:
    value: int
    size: int  # number of values stored in this subtree, counting duplicates
    left: 'Optional[Node]' = None
    right: 'Optional[Node]' = None

    @property
    def left_size(self) -> int:
        if self.left is not None:
            return self.left.size
        return 0

    @property
    def right_size(self) -> int:
        if self.right is not None:
            return self.right.size
        return 0

def treeset_insert_and_get_location(
    tree: Optional[Node], value: int,
) -> Tuple[int, Node]:
    if tree is None:
        return 0, Node(value=value, size=1)
    if value < tree.value:
        location, tree.left = treeset_insert_and_get_location(
            tree.left, value)
        tree.size += 1
        return location, tree
    if value > tree.value:
        location, tree.right = treeset_insert_and_get_location(
            tree.right, value)
        tree.size += 1
        # everything outside the right subtree (the left subtree plus all
        # copies of this node's value) is smaller than the new value
        return tree.size - tree.right_size + location, tree
    assert tree.value == value
    tree.size += 1
    # duplicate: the strictly smaller values are exactly the left subtree
    return tree.left_size, tree

full code incl. tests
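A quick sanity check on a small, hand-worked example:

print(smaller_counts([3, 4, 9, 6, 1]))  # -> [1, 1, 2, 1, 0]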

In practice, the presented solution in the question is still going to be reasonably efficient at smaller input sizes. The O(n) insertion is the kind of operation that modern computers are fairly good at. I'd expect that the presented solution, or even a naive O(n^2) solution, would outperform any tree-based data structure until about a few hundred or a few thousand elements.
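For reference, the naive O(n^2) baseline is just a double scan (a minimal sketch; the function name is my own):

def smaller_counts_naive(lst):
    # For each element, scan everything to its right and count smaller values.
    return [sum(1 for y in lst[i + 1:] if y < x) for i, x in enumerate(lst)]

assert smaller_counts_naive([3, 4, 9, 6, 1]) == [1, 1, 2, 1, 0]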

answered Nov 23, 2021 at 17:57
  • The problem is that we want the position at which we insert num, which won't be returned by any data structure with O(log n) insertion cost that I know of. So I don't think there's even any algorithm that works in O(n log n) for the whole problem. Commented Nov 23, 2021 at 19:37
  • @molyss I've updated the answer to include a crude implementation of a tree-set which can query/insert with O(log n) typical complexity. However, this tree is not self-balancing, so it will degrade to O(n) when dealing with sorted input. This style of tree can efficiently determine the ordered location because each node tracks its size. Commented Nov 23, 2021 at 20:58

An obvious solution is sorting a copy of the original array while keeping track of the location of each item in the original array; then all the information needed to calculate the result is readily available. This can be done in O(n log n).
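One concrete way to realize this in O(n log n) (a sketch of one possible bookkeeping, not necessarily the exact scheme intended here; the function name is mine) is a merge sort over (value, original index) pairs, where each left-half element, at the moment it is merged, counts the strictly smaller right-half elements that were merged before it:

def smaller_counts_mergesort(lst):
    # counts[i] = number of elements to the right of lst[i] that are smaller.
    counts = [0] * len(lst)

    def sort(pairs):
        # pairs are (value, original index); returns them sorted by value.
        if len(pairs) <= 1:
            return pairs
        mid = len(pairs) // 2
        left, right = sort(pairs[:mid]), sort(pairs[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if right[j][0] < left[i][0]:
                merged.append(right[j])
                j += 1
            else:
                # The j right-half values merged so far are all strictly
                # smaller than left[i] and came from its right originally.
                counts[left[i][1]] += j
                merged.append(left[i])
                i += 1
        for pair in left[i:]:  # right half exhausted: all of it was smaller
            counts[pair[1]] += j
            merged.append(pair)
        merged.extend(right[j:])
        return merged

    sort([(value, index) for index, value in enumerate(lst)])
    return counts

assert smaller_counts_mergesort([3, 4, 9, 6, 1]) == [1, 1, 2, 1, 0]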

Experimentally, you can run their algorithm on arrays of 1,000, 10,000, 100,000 and a million random items. Execution time should grow by a factor of less than 15 each time if it is O(n log n), and by a factor of 100 if it is quadratic.

answered Nov 26, 2021 at 13:46
  • I think that'd be an O(n^2) solution. The sorting would be O(n log n), but then you'd have to iterate over the sorted array and count how many items to the right have an original position greater than the current one. Sorting the array would do nothing here, unless I'm again missing something. Commented Nov 27, 2021 at 5:12
