Intersection of two lists in Python

Question 1

I am currently working on an algorithm, which involves finding the intersection of two lists. That is, given the following inputs [1,2,2,1],[2,2] the result would be [2,2].

I have implemented it in two ways which hopefully should be easy to understand

Method 1

def intersect(nums1, nums2):
 dict_count = {}
 for num1 in nums1:
 if num1 not in dict_count:
 dict_count[num1] = 1
 else:
 dict_count[num1] += 1
 inter_arr = []
 for num2 in nums2:
 if num2 in dict_count and dict_count[num2] > 0:
 inter_arr.append(num2)
 dict_count[num2] -= 1
 return inter_arr

Method 2

def intersect_v2(nums1, nums2):
 arr_one = sorted(nums1)
 arr_two = sorted(nums2)
 p1, p2 = 0, 0
 inter_arr = []
 while p1 < len(arr_one) and p2 < len(arr_two):
 if arr_one[p1] < arr_two[p2]:
 p1 += 1
 elif arr_one[p1] > arr_two[p2]:
 p2 += 1
 else:
 inter_arr.append(arr_one[p1])
 p1 += 1
 p2 += 1
 return inter_arr

I have two questions about time complexity:

What if nums1's size is small compared to nums2's size? Which algorithm is better? (The original algorithm problem is from Leetcode and this is a follow up question to the problem)

It feels like the pointer method always win, but I can't articulate why.

If the inputs always came in sorted, which algorithm would be more efficient with respect to time and space? (This would mean that first two lines to intersect_v2 would not be needed)

From my understanding intersect would be O(M+N) since you would have to iterate through the two arrays not matter what. The second one is a bit confusing since it feels like the it would be O(Z) where Z is less than Max(M,N) and greater than Min(M,N), is that about right?

Question 2

What are the requirements on the order of the elements in the output? (If there are no requirements on the order, then (Counter(a) & Counter(b)).elements() solves the problem.)

Question 3

Related: stackoverflow.com/questions/40527584/… And here a Python implementation of the accepted answer: gist.github.com/fjsj/9c9f7f36cfd3205343e333d86778433c

Question 4

Blockquote

What if nums1's size is small compared to nums2's size? Which algorithm is better? (The original algorithm problem is from Leetcode and this is a follow up question to the problem)

It feels like the pointer method always win, but I can't articulate why.

Granted, in many cases, the pointer method might be faster because it might break out of the while after exhausting the elements of the smaller list without having to consider the vast majority of the elements of the larger list.

However, think of a pathological worst case where max(nums1) > max(nums2) such as nums1 = [1_000_000] and nums2 = list(range(100_000)). The pointer approach as shown would have to examine all 100_000 elements of nums2.

What are some ways you can mitigate this?

Taking advantage of the fact that nums1 and nums2 are sorted, you can get the min and max of each list in O(1) and use those values as additional conditions to break out of the while loop earlier.

Given that the input is sorted, binary search might speed things up considerably. bisect.bisect_left is python's own implementation of binary search. Given that sorted input, binary search would allow you to potentially skip over larges swaths of the input as follows:

while p1 < len(nums1) and p2 < len(nums2):
 if nums1[p1] < nums2[p2]:
 # advance p1 to an index s.t. nums1[p1] >= nums2[p2]
 p1 = bisect.bisect_left(nums1, nums2[p2])
 elif nums1[p1] > nums2[p2]:
 # advance p2 to an index s.t. nums2[p2] >= nums1[p1]
 p2 = bisect.bisect_left(nums2, nums1[p1])
 else:
 inter_arr.append(arr_one[p1])
 p1 += 1
 p2 += 1

P.S.: bisect_left has two optional keyword arguments, lo and hi which might be useful here as well.

Question 5

I believe min() and max() are O(n), not O(1). But they run at "C" speed, not Python speed.

Question 6

In general, yes, but I think we were assuming the input was sorted to begin with, in which case, min and max can be found in O(1) as nums[0] and nums[-1] respectively.

joseville joseville 5162 silver badges10 bronze badges · Answer 1 · 2021-03-31 23:17:52Z

Blockquote

What if nums1's size is small compared to nums2's size? Which algorithm is better? (The original algorithm problem is from Leetcode and this is a follow up question to the problem)

It feels like the pointer method always win, but I can't articulate why.

Granted, in many cases, the pointer method might be faster because it might break out of the while after exhausting the elements of the smaller list without having to consider the vast majority of the elements of the larger list.

However, think of a pathological worst case where max(nums1) > max(nums2) such as nums1 = [1_000_000] and nums2 = list(range(100_000)). The pointer approach as shown would have to examine all 100_000 elements of nums2.

What are some ways you can mitigate this?

Taking advantage of the fact that nums1 and nums2 are sorted, you can get the min and max of each list in O(1) and use those values as additional conditions to break out of the while loop earlier.

Given that the input is sorted, binary search might speed things up considerably. bisect.bisect_left is python's own implementation of binary search. Given that sorted input, binary search would allow you to potentially skip over larges swaths of the input as follows:

while p1 < len(nums1) and p2 < len(nums2):
 if nums1[p1] < nums2[p2]:
 # advance p1 to an index s.t. nums1[p1] >= nums2[p2]
 p1 = bisect.bisect_left(nums1, nums2[p2])
 elif nums1[p1] > nums2[p2]:
 # advance p2 to an index s.t. nums2[p2] >= nums1[p1]
 p2 = bisect.bisect_left(nums2, nums1[p1])
 else:
 inter_arr.append(arr_one[p1])
 p1 += 1
 p2 += 1

P.S.: bisect_left has two optional keyword arguments, lo and hi which might be useful here as well.

I believe min() and max() are O(n), not O(1). But they run at "C" speed, not Python speed.
In general, yes, but I think we were assuming the input was sorted to begin with, in which case, min and max can be found in O(1) as nums[0] and nums[-1] respectively.

Stack Exchange Network

Intersection of two lists in Python

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions