4
\$\begingroup\$

Pythonic way of expressing the simple problem:

Tell if the list needle is sublist of haystack


#!/usr/bin/env python3
def sublist (haystack, needle):
 def start ():
 i = iter(needle)
 return next(i), i
 try:
 n0, i = start()
 for h in haystack:
 if h == n0:
 n0 = next(i)
 else:
 n0, i = start()
 except StopIteration:
 return True
 return False
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Dec 14, 2012 at 22:54
\$\endgroup\$
2
  • \$\begingroup\$ So, what's the question? \$\endgroup\$ Commented Dec 15, 2012 at 3:27
  • \$\begingroup\$ @JeffVanzella: Fair point. That's "code review". The question is: "what do you think in general?". \$\endgroup\$ Commented Dec 15, 2012 at 12:30

1 Answer 1

5
\$\begingroup\$

1. Bug

Here's a case where your function fails:

>>> sublist([1, 1, 2], [1, 2])
False

this is because in the else: case you go back to the beginning of the needle, but you keep going forward in the haystack, possibly skipping over a match. In the test case, your function tries matching with the alignment shown below, which fails at the position marked X:

 X
haystack 1,1,2
needle 1,2

Then it starts over from the beginning of the needle, but keeps going forward in the haystack, thus missing the match:

 X
haystack 1,1,2
needle 1,2

So after a mismatch you need to go backward an appropriate distance in the haystack before starting over again from the beginning of the needle.

2. A better algorithm

It turns out to be better to start matching from the end of the needle. If this fails to match, we can skip forward several steps: possibly the whole length of the needle. For example, in this situation:

 X
haystack 1,2,3,4,6,1,2,3,4,5
needle 1,2,3,4,5

we can skip forward by the whole length of the needle (because 6 does not appear in the needle). The next alignment we need to try is this:

 O O O O O
haystack 1,2,3,4,6,1,2,3,4,5
needle 1,2,3,4,5

However, we can't always skip forward the whole length of the needle. The distance we can skip depends on the item that fails to match. In this situation:

 X
haystack 1,2,3,4,1,2,3,4,5
needle 1,2,3,4,5

we should skip forward by 4, to bring the 1s into alignment.

Making these ideas precise leads to the Boyer–Moore–Horspool algorithm:

def find(haystack, needle):
 """Return the index at which the sequence needle appears in the
 sequence haystack, or -1 if it is not found, using the Boyer-
 Moore-Horspool algorithm. The elements of needle and haystack must
 be hashable.
 >>> find([1, 1, 2], [1, 2])
 1
 """
 h = len(haystack)
 n = len(needle)
 skip = {needle[i]: n - i - 1 for i in range(n - 1)}
 i = n - 1
 while i < h:
 for j in range(n):
 if haystack[i - j] != needle[-j - 1]:
 i += skip.get(haystack[i], n)
 break
 else:
 return i - n + 1
 return -1
answered Dec 15, 2012 at 2:00
\$\endgroup\$
5
  • \$\begingroup\$ @Lattyware: Generally I'd agree that you should use iterators wherever it makes sense, but this is one of the exceptional cases where iterators don't make much sense. A pure-iterator search runs in O(nm) whereas Boyer–Moore-Horspool is O(n). (In the average case on random text.) Sometimes you just have to get your hands grubby. \$\endgroup\$ Commented Dec 15, 2012 at 2:11
  • \$\begingroup\$ Scratch all my arguments, it's impossible to do this correctly the way I was doing it. \$\endgroup\$ Commented Dec 15, 2012 at 2:29
  • \$\begingroup\$ Accepted. The source is strong with this one. \$\endgroup\$ Commented Dec 15, 2012 at 12:38
  • \$\begingroup\$ Does the dict comprehension overwrite previous values? If not you'll skip too far when you see e while searching for seven. I'd bet it does but am not sure. I do miss Python for its elegance. \$\endgroup\$ Commented Feb 6, 2014 at 3:14
  • \$\begingroup\$ Yes, later items in the dict comprehension override previous items, and so find('seven', 'even')1 as required. \$\endgroup\$ Commented Feb 6, 2014 at 10:17

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.