From what I have come to understand, the best way to implement it is to use the suffix array $S$ of the string $w$ and its LCP-array (Longest Common Prefix) $L$.
The answer can be obtained by
$$ \sum_{i=1}^{|w|} \left( |S[i]| -L[i-1] \right).$$
What I don't get is how and why is this working?
I would be very grateful if someone explained this.
-
2$\begingroup$ What's LCP? What do you mean by "iteration of ...": iteration over what? Please take some time to edit your question carefully and make sure it is comprehensible and self-contained; right now it is a bit too "stream of consciousness" for me to follow. Also, I suggest that you edit the question to explain what you have tried, to answer your question on your own. Have you tried running the algorithm by hand on a few small examples? Where did you get this proposed algorithm from, and why do you think it is a correct solution? $\endgroup$D.W.– D.W. ♦2013年07月07日 20:14:51 +00:00Commented Jul 7, 2013 at 20:14
1 Answer 1
Instead of a formal proof, I want to give some intuition behind the formula. The suffix array contains all the suffixes of the string $w$. A substring is nothing else than a prefix of a suffix. So if you count $\sum_i |S[i]|,ドル you will get all the substrings, but of course you overcount the number of different substrings.
Let's have a closer look. Assume the $S[i-1]=xyz$ and $S[i]=xyxyz$. By the above counting method the entry $S[i-1]$ counted the substrings $x,xy$ and $xyz$ and the entry $S[i]$ counted $x,xy,xyx,xyxy,xyxyz$. You will notice that since the prefixes of length 2 of both entries are the same we have double counted $x,xy$. But the length of the longest common prefix is stored in $L[i]$ so we subtract it to compensate for the overcounting.
-
$\begingroup$ What is the formula to find the exact number of occurrences of a pattern from the SA and the LCP? $\endgroup$curious– curious2016年04月28日 21:15:14 +00:00Commented Apr 28, 2016 at 21:15
-
$\begingroup$ Count of distinct substrings of a string using Suffix Array geeksforgeeks.org/… $\endgroup$mosh– mosh2017年07月14日 12:18:46 +00:00Commented Jul 14, 2017 at 12:18
Explore related questions
See similar questions with these tags.