Algorithm Improvement - element differences in list

Question 1

The task is, you have an array of n numbers called nums, and an array of m numbers called maxes.

For each element in maxes, find how many members of nums are smaller than or equal to that element.

Example

nums = [4, 2, 5, 10, 8]
maxes = [3, 1, 7, 8]
output = [1, 0, 3, 4]

The first element, 3, in maxes, is higher than only 2 in nums, hence the first element in output is 1.

The initial solution I could think of was trivial, and it scales as O(n2):

def sol_naive(nums,maxes):
 output = []
 for entry in maxes:
 higher_numbers = len([x for x in nums if x <= entry])
 output.append(higher_numbers)
 return output

I'm wondering if there's a better approach to solve this problem.

Question 2

One simple observation is that this problem gets much easier if you sort nums O(n log(n). Then you can use bisect_left which uses binary search to find where an element would go in a sorted list and is O(log(n)) for each element in m. This yields a total time of O((n+m) log(n))

from bisect import bisect_left
def sol_sorted(nums,maxes):
 output = []
 nums.sort()
 for m in maxes:
 ind = bisect_left(nums, m)
 output.append(len(nums) - ind)
 return output

Question 3

You might be able to do this slightly faster involving sorting maxes, but then you have to futz around a bunch with keeping output ordering, and id didn't seem worth it to me.

Question 4

Very nice! thank you for making me a little smarter today

Question 5

If you sort maxes as well, you can get your time to O(n log(n) + m log(m)) , which benefits us the larger the length difference is between n & m. so if you want to do len(n) == 10000, len(m) == 10, it makes more sense to sort each once and then essentially use the combine step of merge-sort to find the indexes all in one go (at O(n + m)), as opposed to doing separate bisections for each max.

Question 6

Yeah, that's what I was saying. The downside is that you have to keep track of the original order of maxes so output ends up in the right order. It's not algorithmically slower, but it requires more python, and adds a fair amount of overhead. I would be interested to see if it ever ends up faster in practice.

Question 7

The base idea looks solid. Using bisect_left() instead of bisect_right() ruins it, as does subtracting the index found from the length of nums.

Question 8

You could turn your for-loop into a list comprehension:

nums = [4, 2, 5, 10, 8]
maxes = [3, 1, 7, 8]
output = [len([x for x in nums if x <= entry]) for entry in maxes]

Question 9

There are two promising alternatives to ordering nums and finding the index of the smallest element therein larger than each element of maxes:

order maxes (ascendingly) and
tally elements num in nums (ordered[m-1] <)
num <= ordered[m] and accumulate (finally reordering output) - O((n+m) log(m))

order both and take turns finding, for an element in one sequence, the first exceeding it in the other, continuing with the successor thereof
Looks promising with len(nums) and len(maxes) in the same ballpark.

finger exercise:

from bisect import bisect_right
def oscar(nums, maxes):
 """ Return an iterable containing for each element m in maxes 
 the number of elements in nums no larger than m. """
 ordered = sorted(nums)
 return (bisect_right(ordered, m) for m in maxes)
# for a list, I'd have to decide whether to make it immutable - and how
_empty = ()
# tries to reduce the ranges searched
def smith(nums, maxes):
 """ Return an iterable containing for each element m in maxes 
 the number of elements in nums no larger than m. """
 if not maxes or not nums:
 return _empty
 output = list()
 ordered = sorted(nums)
 previous = ordered[0]
 index = bisect_right(nums, previous)
 for m in maxes:
 if previous < m:
 index = bisect_right(nums, m, index)
 elif m < previous:
 index = bisect_right(nums, m, 0, index)
 output.append(index)
 previous = m
 return output

Oscar Smith Oscar Smith 3,71718 silver badges31 bronze badges · Accepted Answer · 2017-10-19 15:56:19Z

5

\$\begingroup\$

One simple observation is that this problem gets much easier if you sort nums O(n log(n). Then you can use bisect_left which uses binary search to find where an element would go in a sorted list and is O(log(n)) for each element in m. This yields a total time of O((n+m) log(n))

from bisect import bisect_left
def sol_sorted(nums,maxes):
 output = []
 nums.sort()
 for m in maxes:
 ind = bisect_left(nums, m)
 output.append(len(nums) - ind)
 return output

Share

answered Oct 19, 2017 at 15:56

Oscar Smith's user avatar

Oscar Smith Oscar Smith

3,71718 silver badges31 bronze badges

\$\endgroup\$

5

1

\$\begingroup\$ You might be able to do this slightly faster involving sorting maxes, but then you have to futz around a bunch with keeping output ordering, and id didn't seem worth it to me. \$\endgroup\$

Oscar Smith
– Oscar Smith

2017年10月19日 16:14:04 +00:00
Commented Oct 19, 2017 at 16:14
\$\begingroup\$ Very nice! thank you for making me a little smarter today \$\endgroup\$

Wboy
– Wboy

2017年10月19日 16:47:29 +00:00
Commented Oct 19, 2017 at 16:47
\$\begingroup\$ If you sort maxes as well, you can get your time to O(n log(n) + m log(m)) , which benefits us the larger the length difference is between n & m. so if you want to do len(n) == 10000, len(m) == 10, it makes more sense to sort each once and then essentially use the combine step of merge-sort to find the indexes all in one go (at O(n + m)), as opposed to doing separate bisections for each max. \$\endgroup\$

TemporalWolf
– TemporalWolf

2017年10月19日 20:08:20 +00:00
Commented Oct 19, 2017 at 20:08
\$\begingroup\$ Yeah, that's what I was saying. The downside is that you have to keep track of the original order of maxes so output ends up in the right order. It's not algorithmically slower, but it requires more python, and adds a fair amount of overhead. I would be interested to see if it ever ends up faster in practice. \$\endgroup\$

Oscar Smith
– Oscar Smith

2017年10月19日 20:52:41 +00:00
Commented Oct 19, 2017 at 20:52
\$\begingroup\$ The base idea looks solid. Using bisect_left() instead of bisect_right() ruins it, as does subtracting the index found from the length of nums. \$\endgroup\$

greybeard
– greybeard

2021年11月24日 06:38:59 +00:00
Commented Nov 24, 2021 at 6:38

Add a comment |

Stack Exchange Network

Algorithm Improvement - element differences in list

Example

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Algorithm Improvement - element differences in list

Example

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions