Simon algorithm ESMAJ Morris-Pratt algorithm Contents
Next: Simon algorithm Up: ESMAJ Prev: Morris-Pratt algorithm

Knuth-Morris-Pratt algorithm


Main features
Description

The design of the Knuth-Morris-Pratt algorithm follows a tight analysis of the Morris and Pratt algorithm. Let us look more closely at the Morris-Pratt algorithm. It is possible to improve the length of the shifts.

Consider an attempt at a left position j, that is when the the window is positioned on the text factor y[j .. j+m-1]. Assume that the first mismatch occurs between x[i] and y[i+j] with 0 < i < m. Then, x[0 .. i-1] = y[j .. i+j-1] =u and a = x[i] neq y[i+j]=b.

When shifting, it is reasonable to expect that a prefix v of the pattern matches some suffix of the portion u of the text. Moreover, if we want to avoid another immediate mismatch, the character following the prefix v in the pattern must be different from a. The longest such prefix v is called the tagged border of u (it occurs at both ends of u followed by different characters in x).

This introduces the notation: let kmpNext[i] be the length of the longest border of x[0 .. i-1] followed by a character c different from x[i] and -1 if no such tagged border exits, for 0 < i leq m. Then, after a shift, the comparisons can resume between characters x[kmpNext[i]] and y[i+j] without missing any occurrence of x in y, and avoiding a backtrack on the text (see figure 7.1). The value of kmpNext[0] is set to -1.

figure 7.1
Figure 7.1: Shift in the Knuth-Morris-Pratt algorithm (v border of u and c neq b).

The table kmpNext can be computed in O(m) space and time before the searching phase, applying the same searching algorithm to the pattern itself, as if x=y.

The searching phase can be performed in O(m+n) time. The Knuth-Morris-Pratt algorithm performs at most 2n-1 text character comparisons during the searching phase. The delay (maximal number of comparisons for a single text character) is bounded by logPhi(m) where Phi is the golden ratio ( golden ratio ).

The C code
/*
This program is an implementation of the KMP string matching algorithm from:
KNUTH D.E. MORRIS J.H., PRATT V.R., 1977, Fast pattern matching in strings,
SIAM Journal on Computing 6(1):323-350.
Authors: Christian Charras an Thierry Lecroq
This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation, either version 3 of the License, or
 (at your option) any later version.
 This program is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU General Public License for more details.
 You should have received a copy of the GNU General Public License
 along with this program. If not, see .
*/
void preKmp(char *x, int m, int kmpNext[]) {
 int i, j;
 i = 0;
 j = kmpNext[0] = -1;
 while (i < m) {
 while (j > -1 && x[i] != x[j])
 j = kmpNext[j];
 i++;
 j++;
 if (x[i] == x[j])
 kmpNext[i] = kmpNext[j];
 else
 kmpNext[i] = j;
 }
}
void KMP(char *x, int m, char *y, int n) {
 int i, j, kmpNext[XSIZE];
 /* Preprocessing */
 preKmp(x, m, kmpNext);
 /* Searching */
 i = j = 0;
 while (j < n) {
 while (i > -1 && x[i] != y[j])
 i = kmpNext[i];
 i++;
 j++;
 if (i >= m) {
 OUTPUT(j - i);
 i = kmpNext[i];
 }
 }
}
The example

Preprocessing phase

Knuth-Morris-Pratt kmpNext table
The kmpNext table

Searching phase


References


Simon algorithm ESMAJ Morris-Pratt algorithm Contents
Next: Simon algorithm Up: ESMAJ Prev: Morris-Pratt algorithm

e-mails: {Christian.Charras, Thierry.Lecroq }@laposte.net
Tue Jan 14 15:03:31 MET 1997

AltStyle によって変換されたページ (->オリジナル) /