I am currently working on an algorithm, which involves finding the intersection of two lists. That is, given the following inputs [1,2,2,1]
,[2,2]
the result would be [2,2]
.
I have implemented it in two ways which hopefully should be easy to understand
Method 1
def intersect(nums1, nums2):
dict_count = {}
for num1 in nums1:
if num1 not in dict_count:
dict_count[num1] = 1
else:
dict_count[num1] += 1
inter_arr = []
for num2 in nums2:
if num2 in dict_count and dict_count[num2] > 0:
inter_arr.append(num2)
dict_count[num2] -= 1
return inter_arr
Method 2
def intersect_v2(nums1, nums2):
arr_one = sorted(nums1)
arr_two = sorted(nums2)
p1, p2 = 0, 0
inter_arr = []
while p1 < len(arr_one) and p2 < len(arr_two):
if arr_one[p1] < arr_two[p2]:
p1 += 1
elif arr_one[p1] > arr_two[p2]:
p2 += 1
else:
inter_arr.append(arr_one[p1])
p1 += 1
p2 += 1
return inter_arr
I have two questions about time complexity:
- What if nums1's size is small compared to nums2's size? Which algorithm is better? (The original algorithm problem is from Leetcode and this is a follow up question to the problem)
It feels like the pointer method always win, but I can't articulate why.
- If the inputs always came in sorted, which algorithm would be more efficient with respect to time and space? (This would mean that first two lines to
intersect_v2
would not be needed)
From my understanding intersect
would be O(M+N) since you would have to iterate through the two arrays not matter what. The second one is a bit confusing since it feels like the it would be O(Z)
where Z is less than Max(M,N) and greater than Min(M,N), is that about right?
1 Answer 1
Blockquote
- What if nums1's size is small compared to nums2's size? Which algorithm is better? (The original algorithm problem is from Leetcode and this is a follow up question to the problem)
It feels like the pointer method always win, but I can't articulate why.
Granted, in many cases, the pointer method might be faster because it might break out of the while
after exhausting the elements of the smaller list without having to consider the vast majority of the elements of the larger list.
However, think of a pathological worst case where max(nums1) > max(nums2)
such as nums1 = [1_000_000]
and nums2 = list(range(100_000))
. The pointer approach as shown would have to examine all 100_000
elements of nums2
.
What are some ways you can mitigate this?
Taking advantage of the fact that nums1
and nums2
are sorted, you can get the min
and max
of each list in O(1)
and use those values as additional conditions to break out of the while
loop earlier.
Given that the input is sorted, binary search might speed things up considerably. bisect.bisect_left
is python's own implementation of binary search. Given that sorted input, binary search would allow you to potentially skip over larges swaths of the input as follows:
while p1 < len(nums1) and p2 < len(nums2):
if nums1[p1] < nums2[p2]:
# advance p1 to an index s.t. nums1[p1] >= nums2[p2]
p1 = bisect.bisect_left(nums1, nums2[p2])
elif nums1[p1] > nums2[p2]:
# advance p2 to an index s.t. nums2[p2] >= nums1[p1]
p2 = bisect.bisect_left(nums2, nums1[p1])
else:
inter_arr.append(arr_one[p1])
p1 += 1
p2 += 1
P.S.:
bisect_left
has two optional keyword arguments, lo
and hi
which might be useful here as well.
-
3\$\begingroup\$ I believe
min()
andmax()
are O(n), not O(1). But they run at "C" speed, not Python speed. \$\endgroup\$RootTwo– RootTwo2021年04月01日 00:18:41 +00:00Commented Apr 1, 2021 at 0:18 -
\$\begingroup\$ In general, yes, but I think we were assuming the input was sorted to begin with, in which case,
min
andmax
can be found inO(1)
asnums[0]
andnums[-1]
respectively. \$\endgroup\$joseville– joseville2021年04月13日 15:37:36 +00:00Commented Apr 13, 2021 at 15:37
(Counter(a) & Counter(b)).elements()
solves the problem.) \$\endgroup\$