4
$\begingroup$

From what I have come to understand, the best way to implement it is to use the suffix array $S$ of the string $w$ and its LCP-array (Longest Common Prefix) $L$.

The answer can be obtained by

$$ \sum_{i=1}^{|w|} \left( |S[i]| -L[i-1] \right).$$

What I don't get is how and why is this working?

I would be very grateful if someone explained this.

A.Schulz
12.3k1 gold badge43 silver badges64 bronze badges
asked Jul 7, 2013 at 19:30
$\endgroup$
1
  • 2
    $\begingroup$ What's LCP? What do you mean by "iteration of ...": iteration over what? Please take some time to edit your question carefully and make sure it is comprehensible and self-contained; right now it is a bit too "stream of consciousness" for me to follow. Also, I suggest that you edit the question to explain what you have tried, to answer your question on your own. Have you tried running the algorithm by hand on a few small examples? Where did you get this proposed algorithm from, and why do you think it is a correct solution? $\endgroup$ Commented Jul 7, 2013 at 20:14

1 Answer 1

3
$\begingroup$

Instead of a formal proof, I want to give some intuition behind the formula. The suffix array contains all the suffixes of the string $w$. A substring is nothing else than a prefix of a suffix. So if you count $\sum_i |S[i]|,ドル you will get all the substrings, but of course you overcount the number of different substrings.

Let's have a closer look. Assume the $S[i-1]=xyz$ and $S[i]=xyxyz$. By the above counting method the entry $S[i-1]$ counted the substrings $x,xy$ and $xyz$ and the entry $S[i]$ counted $x,xy,xyx,xyxy,xyxyz$. You will notice that since the prefixes of length 2 of both entries are the same we have double counted $x,xy$. But the length of the longest common prefix is stored in $L[i]$ so we subtract it to compensate for the overcounting.

rici
12.2k22 silver badges40 bronze badges
answered Jul 12, 2013 at 7:54
$\endgroup$
2
  • $\begingroup$ What is the formula to find the exact number of occurrences of a pattern from the SA and the LCP? $\endgroup$ Commented Apr 28, 2016 at 21:15
  • $\begingroup$ Count of distinct substrings of a string using Suffix Array geeksforgeeks.org/… $\endgroup$ Commented Jul 14, 2017 at 12:18

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.