2

I have many substrings(2-5 words each) which I would like to search in some text of about 40-50 words length. What is the most efficient way to flag matching substrings.

Currently I am simply using:

for substring in substrings:
 if substring in fullText:
 return True

substrings - the list of strings to be searched

fullText - text to be searched on.

Worst case for this solution is all substrings are searched on fullText repeatedly.

asked Apr 15, 2018 at 6:05
0

1 Answer 1

5

Create a regular expression from your list (something like "word1|word2|word3") and use the regular expression functions available for your language. It will hopefully create a data structure optimized for matching, maybe a finite state machine or something equivalent.

answered Apr 15, 2018 at 8:40
4
  • 1
    There's no "hopefully". Regular expressions are invariably converted to finite state machines. Commented Apr 15, 2018 at 8:46
  • I've seen incredibly naive and stupid solutions to problems where known good algorithms exist, so I'm extra careful with words, but of course you are correct. Commented Apr 15, 2018 at 9:38
  • For python on the test cases I ran, regular expression match worked slower compared to simple string search. Also wanted to mention, I am trying to search sentences within a sentence. Commented Apr 17, 2018 at 11:50
  • 1
    Compiling the regular expression consumes some time, so your results will depend on the number of patterns and the number of files. A simple string search might be faster for a single file, especially when you have an early match. To really evaluate the relative performance, measure realistic test cases. If your initial nested loop is fastest, you obviously don't need to optimize. Commented Apr 17, 2018 at 12:29

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.