Function to find all occurrences of substring

Question 1

This function returns a list of all the beginning indices of a substring in a string. After finding the index of a substring, the search for the next one begins just after this index.

def find_substring(substring, string):
 """ 
 Returns list of indices where substring begins in string
 >>> find_substring('me', "The cat says meow, meow")
 [13, 19]
 """
 indices = []
 index = -1 # Begin at -1 so index + 1 is 0
 while True:
 # Find next index of substring, by starting search from index + 1
 index = string.find(substring, index + 1)
 if index == -1: 
 break # All occurrences have been found
 indices.append(index)
 return indices

Question 2

This seems quite reasonable when you are not super-concerned with performance and are searching for small patterns. Otherwise, it's best to go to more advanced algorithms like Knuth-Morris-Pratt or Boyer-Moore. These work by preprocessing the search pattern, so you can use results from previous search failures to skip searching from several positions.

Question 3

I would turn this function into a generator so that an iteration over its values does not unnecessarily build a list into memory. If the caller trully needs a list, they can still call list(find_substring(...)). I would also rename this function substring_indexes as I feel it better convey what this function is about; also index could be something like last_known_position or last_found. But naming is hard and I might be wrong on this one.

def substring_indexes(substring, string):
 """ 
 Generate indices of where substring begins in string
 >>> list(find_substring('me', "The cat says meow, meow"))
 [13, 19]
 """
 last_found = -1 # Begin at -1 so the next position to search from is 0
 while True:
 # Find next index of substring, by starting after its last known position
 last_found = string.find(substring, last_found + 1)
 if last_found == -1: 
 break # All occurrences have been found
 yield last_found

Question 4

The example in the docstring should illustrate that the returned indexes may overlap, since that is not apparent from the function name, and since other find_all functions behave differently.

Also, don't mix single and double quotes unless you have good reason. The example should therefore be

substring_indexes("ana", "Canadian banana")

Question 5

I got the idea to mix the single and double quotes from stackoverflow.com/questions/56011/…. The top answer used single quotes for single words, and double quotes for sentences.

Question 6

Ah, ok. But still, both the string and the substring are then of the same category, therefore I think they should use the same quotes. After reading that answer, I switched to using double quotes, thanks for the hint.

score 14 · Accepted Answer · 2016-11-12 12:41:45Z

I would turn this function into a generator so that an iteration over its values does not unnecessarily build a list into memory. If the caller trully needs a list, they can still call list(find_substring(...)). I would also rename this function substring_indexes as I feel it better convey what this function is about; also index could be something like last_known_position or last_found. But naming is hard and I might be wrong on this one.

def substring_indexes(substring, string):
 """ 
 Generate indices of where substring begins in string
 >>> list(find_substring('me', "The cat says meow, meow"))
 [13, 19]
 """
 last_found = -1 # Begin at -1 so the next position to search from is 0
 while True:
 # Find next index of substring, by starting after its last known position
 last_found = string.find(substring, last_found + 1)
 if last_found == -1: 
 break # All occurrences have been found
 yield last_found

Stack Exchange Network

Function to find all occurrences of substring

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Function to find all occurrences of substring

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions