Given a string length $S$, find a substring length $k$ that has the most occurrences in the given string.
We want $O(S)$ time complexity in an average case.
I think the solution lies in sophisticated use of Rabin-Karp algorithm.
I've tried to use the sliding window method, but that results in $O(S * k)$ time complexity, because when inserting into the hash table we have to actually compare the substrings in $O(k)$ time.
So how can I improve it to the desired time complexity?
1 Answer 1
Each substring occurrence of your length $k$ pattern is the suffix of a of a substring. As such, a suffix tree will provide a fast way to store all possible suffixes and - in particular in your case - all substrings of length $k$, as well as provide a way to detect when you are reading a $k$-substring that has been seen before.
In your particular case of only caring about length $k$-substrings, you can improve the suffix tree to only keep nodes up to depth $k$ as you don't need to go beyond that.
Suffix trees give $\Theta(n)$ runtimes for a lot of String-processing problems. See a list of such problems on the wiki page: https://en.wikipedia.org/wiki/Suffix_tree
Explore related questions
See similar questions with these tags.