This function returns a list of all the beginning indices of a substring in a string. After finding the index of a substring, the search for the next one begins just after this index.
def find_substring(substring, string):
"""
Returns list of indices where substring begins in string
>>> find_substring('me', "The cat says meow, meow")
[13, 19]
"""
indices = []
index = -1 # Begin at -1 so index + 1 is 0
while True:
# Find next index of substring, by starting search from index + 1
index = string.find(substring, index + 1)
if index == -1:
break # All occurrences have been found
indices.append(index)
return indices
-
1\$\begingroup\$ This seems quite reasonable when you are not super-concerned with performance and are searching for small patterns. Otherwise, it's best to go to more advanced algorithms like Knuth-Morris-Pratt or Boyer-Moore. These work by preprocessing the search pattern, so you can use results from previous search failures to skip searching from several positions. \$\endgroup\$dr jimbob– dr jimbob2016年11月13日 02:21:01 +00:00Commented Nov 13, 2016 at 2:21
2 Answers 2
I would turn this function into a generator so that an iteration over its values does not unnecessarily build a list into memory. If the caller trully needs a list, they can still call list(find_substring(...))
. I would also rename this function substring_indexes
as I feel it better convey what this function is about; also index
could be something like last_known_position
or last_found
. But naming is hard and I might be wrong on this one.
def substring_indexes(substring, string):
"""
Generate indices of where substring begins in string
>>> list(find_substring('me', "The cat says meow, meow"))
[13, 19]
"""
last_found = -1 # Begin at -1 so the next position to search from is 0
while True:
# Find next index of substring, by starting after its last known position
last_found = string.find(substring, last_found + 1)
if last_found == -1:
break # All occurrences have been found
yield last_found
The example in the docstring should illustrate that the returned indexes may overlap, since that is not apparent from the function name, and since other find_all
functions behave differently.
Also, don't mix single and double quotes unless you have good reason. The example should therefore be
substring_indexes("ana", "Canadian banana")
-
\$\begingroup\$ I got the idea to mix the single and double quotes from stackoverflow.com/questions/56011/…. The top answer used single quotes for single words, and double quotes for sentences. \$\endgroup\$Vermillion– Vermillion2016年11月12日 13:00:05 +00:00Commented Nov 12, 2016 at 13:00
-
2\$\begingroup\$ Ah, ok. But still, both the string and the substring are then of the same category, therefore I think they should use the same quotes. After reading that answer, I switched to using double quotes, thanks for the hint. \$\endgroup\$Roland Illig– Roland Illig2016年11月12日 13:08:27 +00:00Commented Nov 12, 2016 at 13:08