\$\begingroup\$
\$\endgroup\$
1
Here is a problem I am trying to solve: trying to find missing photographs from a sequence of filenames. The problem boils down to: given an unsorted list of integers, return a sorted list of missing numbers. Below is my code.
What I am looking for are:
- Are there more efficient algorithms?
- Is there any performance problem if the list is large (tens of thousands)
Corner cases which does not work?
def find_missing_items(int_list): ''' Finds missing integer within an unsorted list and return a list of missing items >>> find_missing_items([1, 2, 5, 6, 7, 10]) [3, 4, 8, 9] >>> find_missing_items([3, 1, 2]) [] ''' # Put the list in a set, find smallest and largest items original_set = set(int_list) smallest_item = min(original_set) largest_item = max(original_set) # Create a super set of all items from smallest to largest full_set = set(xrange(smallest_item, largest_item + 1)) # Missing items are the ones that are in the full_set, but not in # the original_set return sorted(list(full_set - original_set)) if __name__ == '__main__': import doctest doctest.testmod()
asked Mar 29, 2013 at 23:39
1 Answer 1
\$\begingroup\$
\$\endgroup\$
1
For this
full_set = set(xrange(smallest_item, largest_item + 1))
I'd suggest inlining the min and max:
full_set = set(xrange(min(original), max(original) + 1))
You don't need to convert to a list before using sorted so:
sorted(list(full_set - original_set))
Can be written as:
sorted(full_set - original_set)
answered Mar 29, 2013 at 23:57
-
\$\begingroup\$ Beautiful. I especially like the last one: sorted() without list(). \$\endgroup\$Hai Vu– Hai Vu2013年03月30日 00:00:30 +00:00Commented Mar 30, 2013 at 0:00
lang-py
find_missing_items([-20000,0,20000])
returned correctly all 39998 items in less than 2s running on a old dual-core. \$\endgroup\$