Commit a13e70f

Author: 杨世超

Commit message: Update content related to "Suffix Array"

1 parent 7301ed3, commit a13e70f

1 file changed (136 additions, 0 deletions): `docs/04_string/04_12_suffix_array.md`

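## 1. Introduction to Suffix Arrays

> Suffix Array: a data structure for efficiently handling problems about the suffixes of a string. It sorts all suffixes of a string in lexicographic order and records where each suffix starts in the original string. This supports efficient substring search, longest repeated substring, longest common substring, and similar operations.

Suffix arrays are often paired with the LCP (Longest Common Prefix) array, which records how many leading characters neighboring suffixes in sorted order share. Many of the operations above depend on that information.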
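
Substring search is the most direct use. Every occurrence of a pattern $P$ is a prefix of some suffix, and the suffixes that start with $P$ form one contiguous block of the sorted array. Two binary searches therefore find all occurrences using $O(|P| \log n)$ character comparisons. The following is a minimal sketch that assumes a naively built suffix array. The names `suffix_array` and `find_occurrences` are illustrative only.

```python
def suffix_array(s: str) -> list[int]:
    # Naive construction: sort start positions by the suffix they begin
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_occurrences(s: str, sa: list[int], p: str) -> list[int]:
    m = len(p)
    # Lower bound: first suffix whose length-m prefix is >= p
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > p
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    # Every suffix in sa[start:lo] begins with p; return positions in text order
    return sorted(sa[start:lo])


s = "banana"
sa = suffix_array(s)
print(find_occurrences(s, sa, "ana"))  # [1, 3]
print(find_occurrences(s, sa, "nab"))  # []
```

Cutting every suffix down to its first $|P|$ characters keeps the sorted order, so both binary searches stay valid.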

---

## 2. Basic Principles and Definition

Let $S$ be a string of length $n$. Its suffix array $SA$ is an integer array of length $n,ドル where $SA[i]$ is the starting index in $S$ of the $i$-th smallest suffix of $S$ in lexicographic order. Indices start at 0, so $SA[0]$ points to the smallest suffix.

For example:

> $S = \texttt{banana}$
>
> All suffixes of $S$ and their starting indices:
> - 0: banana
> - 1: anana
> - 2: nana
> - 3: ana
> - 4: na
> - 5: a
>
> After sorting in lexicographic order:
> 1. a (5)
> 2. ana (3)
> 3. anana (1)
> 4. banana (0)
> 5. na (4)
> 6. nana (2)
>
> Therefore $SA = [5, 3, 1, 0, 4, 2]$.
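
The LCP construction in Section 4 also uses the inverse of $SA,ドル usually called the rank array. It satisfies $rank[SA[i]] = i,ドル which means $rank[p]$ is the sorted position of the suffix that starts at $p$. Below is a minimal sketch for the banana example. It builds $SA$ with Python's built-in sort, and the helper names `naive_sa` and `inverse` are illustrative only.

```python
def naive_sa(s: str) -> list[int]:
    # Sort starting positions by the suffix each one begins
    return sorted(range(len(s)), key=lambda i: s[i:])


def inverse(sa: list[int]) -> list[int]:
    # rank[p] = position of the suffix starting at p in sorted order
    rank = [0] * len(sa)
    for i, p in enumerate(sa):
        rank[p] = i
    return rank


s = "banana"
sa = naive_sa(s)
rank = inverse(sa)
print(sa)    # [5, 3, 1, 0, 4, 2]
print(rank)  # [3, 2, 5, 1, 4, 0]
```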
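
---

## 3. Constructing the Suffix Array

A suffix array can be built in several ways. Common methods include:

- Naive sorting: generate every suffix and sort them directly. A single comparison may take $O(n)$ time, so the total cost is $O(n^2 \log n)$. This suits short strings.
- Prefix doubling: rank the suffixes by their first 1ドル, 2, 4, \dots$ characters, using radix sort in each round. This takes $O(n \log n)$ time.
- DC3/Skew algorithm: builds the array in linear time, $O(n)$. This suits large inputs.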
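
### 3.1 Naive Method (for Understanding the Principle)

Because this method stores every suffix as its own string, it needs $O(n^2)$ memory in addition to $O(n^2 \log n)$ time.

```python
# Naive construction of the suffix array
S = "banana"
# Pair each suffix with its starting index; sorting the pairs orders the suffixes lexicographically
suffixes = [(S[i:], i) for i in range(len(S))]
suffixes.sort()
SA = [idx for (suf, idx) in suffixes]
print(SA)  # Output: [5, 3, 1, 0, 4, 2]
```
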
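### 3.2 Doubling Algorithm (Common Efficient Implementation)

In the round with step $k$ ($k = 1, 2, 4, \dots$), each suffix gets a key made of two ranks: the rank of its first $k$ characters and the rank of the $k$ characters after them. Sorting by this key orders the suffixes by their first 2ドルk$ characters. The loop ends once every rank is distinct. The implementation below uses Python's comparison-based `sort` in each round, so it runs in $O(n \log^2 n)$. Switching to radix sort brings this down to $O(n \log n)$.

```python
def build_suffix_array(s):
    n = len(s)
    if n == 0:
        return []
    k = 1
    rank = [ord(c) for c in s]  # initial rank: character code
    tmp = [0] * n
    sa = list(range(n))

    def key(x):
        # (rank of s[x:x+k], rank of s[x+k:x+2k]); -1 when the second half is empty
        return (rank[x], rank[x + k] if x + k < n else -1)

    while True:
        sa.sort(key=key)
        # Re-rank: adjacent suffixes with equal keys share a rank
        tmp[sa[0]] = 0
        for i in range(1, n):
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i]) != key(sa[i - 1]))
        rank = tmp[:]
        # All ranks distinct: the suffixes are fully sorted
        if rank[sa[-1]] == n - 1:
            break
        k <<= 1
    return sa

# Example
S = "banana"
print(build_suffix_array(S))  # Output: [5, 3, 1, 0, 4, 2]
```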
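
The code above sorts with Python's comparison-based `list.sort`, which makes each round cost $O(n \log n)$. The sketch below is the radix-sort variant from the method list. Each round uses counting sort, first on the second key and then, stably, on the first key. That gives $O(n)$ per round and $O(n \log n)$ overall. It is an illustrative rewrite rather than the document's own code, and the names `counting_sort` and `build_sa_radix` are hypothetical. Characters are first mapped to dense ranks so the counting arrays stay at size $O(n)$.

```python
def counting_sort(idx: list[int], keys: list[int], max_key: int) -> list[int]:
    # Stable counting sort of the indices in idx by keys[i], with 0 <= keys[i] <= max_key
    count = [0] * (max_key + 1)
    for i in idx:
        count[keys[i]] += 1
    # Turn counts into starting output positions (exclusive prefix sums)
    total = 0
    for key in range(max_key + 1):
        count[key], total = total, total + count[key]
    out = [0] * len(idx)
    for i in idx:
        out[count[keys[i]]] = i
        count[keys[i]] += 1
    return out


def build_sa_radix(s: str) -> list[int]:
    n = len(s)
    if n == 0:
        return []
    # Map characters to dense ranks 0..sigma-1 so every key is at most n
    code = {c: r for r, c in enumerate(sorted(set(s)))}
    rank = [code[c] for c in s]
    k = 1
    while True:
        # Second key: rank of the block starting k later, plus 1; 0 means "past the end"
        second = [rank[i + k] + 1 if i + k < n else 0 for i in range(n)]
        # LSD radix sort on (rank, second): by second key, then stably by first key
        sa = counting_sort(list(range(n)), second, n)
        sa = counting_sort(sa, rank, n)
        # Re-rank: equal (rank, second) pairs share a rank
        new_rank = [0] * n
        for j in range(1, n):
            prev, cur = sa[j - 1], sa[j]
            same = rank[prev] == rank[cur] and second[prev] == second[cur]
            new_rank[cur] = new_rank[prev] + (0 if same else 1)
        rank = new_rank
        # All ranks distinct: the suffixes are fully sorted
        if rank[sa[-1]] == n - 1:
            return sa
        k <<= 1


print(build_sa_radix("banana"))  # [5, 3, 1, 0, 4, 2]
```

Radix sort works on least significant digits first. Because counting sort is stable, sorting by the second key and then by the first leaves the pairs in full lexicographic order.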
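
---

## 4. LCP (Longest Common Prefix) Array

> LCP array: for $i \ge 1,ドル $LCP[i]$ is the length of the longest common prefix of the suffixes starting at $SA[i]$ and $SA[i-1],ドル which are neighbors in sorted order. By convention, $LCP[0] = 0$.

Common uses of the LCP array include:
- Quickly finding the longest repeated substring
- Counting the number of distinct substrings
- String compression and similar tasks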
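
The first two uses come straight from the definition. The longest repeated substring is the common prefix at the position where $LCP$ is largest. Every substring is a prefix of some suffix, so the number of distinct non-empty substrings is $\frac{n(n+1)}{2} - \sum_i LCP[i]$. The sketch below is self-contained and illustrative. It builds $SA$ naively and computes each LCP value directly from the definition in $O(n^2)$ time, not with the linear-time method that follows. The helper names are hypothetical.

```python
def suffix_array(s: str) -> list[int]:
    # Naive construction: sort start positions by suffix
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_by_definition(s: str, sa: list[int]) -> list[int]:
    # lcp[i] = common prefix length of suffixes sa[i-1] and sa[i]; lcp[0] = 0
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b = sa[i - 1], sa[i]
        h = 0
        while a + h < len(s) and b + h < len(s) and s[a + h] == s[b + h]:
            h += 1
        lcp[i] = h
    return lcp


def longest_repeated_substring(s: str) -> str:
    sa = suffix_array(s)
    lcp = lcp_by_definition(s, sa)
    if not lcp:
        return ""
    best = max(range(len(lcp)), key=lambda i: lcp[i])
    return s[sa[best]:sa[best] + lcp[best]]


def count_distinct_substrings(s: str) -> int:
    n = len(s)
    sa = suffix_array(s)
    # Each suffix adds its length minus the prefix it shares with its sorted predecessor
    return n * (n + 1) // 2 - sum(lcp_by_definition(s, sa))


print(longest_repeated_substring("banana"))  # ana
print(count_distinct_substrings("banana"))   # 15
```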
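
### Building the LCP Array

The function below implements Kasai's algorithm. It visits suffixes in order of starting position and carries the previous match length forward. If the suffix at $i$ shares $h$ characters with its sorted predecessor, then the suffix at $i+1$ shares at least $h-1$ characters with its own predecessor. So each comparison can begin at $h-1$ instead of 0.

```python
def build_lcp(s, sa):
    n = len(s)
    # rank[p] = position of suffix p in sa (inverse of sa)
    rank = [0] * n
    for i in range(n):
        rank[sa[i]] = i
    h = 0
    lcp = [0] * n
    # Visit suffixes in order of starting position
    for i in range(n):
        if rank[i] == 0:
            # The smallest suffix has no predecessor; lcp[0] stays 0
            h = 0
            continue
        j = sa[rank[i] - 1]  # predecessor of suffix i in sorted order
        # Extend the match; the first h characters are already known to agree
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        # Suffix i+1 shares at least h-1 characters with its predecessor
        if h > 0:
            h -= 1
    return lcp

# Example (build_suffix_array is defined in Section 3.2)
S = "banana"
SA = build_suffix_array(S)
LCP = build_lcp(S, SA)
print(LCP)  # Output: [0, 1, 3, 0, 0, 2]
```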
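
The introduction also mentions the longest common substring. One common approach is sketched below, assuming a separator character (here `"\x00"`) that appears in neither input. Join the two strings with the separator, sort the suffixes, and take the largest LCP among neighboring suffixes that come from different strings. To stay short, the sketch builds $SA$ naively and computes each LCP directly. `longest_common_substring` is an illustrative name.

```python
def longest_common_substring(a: str, b: str, sep: str = "\x00") -> str:
    # Assumption: sep occurs in neither a nor b, so no common prefix can cross it
    t = a + sep + b
    n, boundary = len(t), len(a)  # start positions < boundary belong to a
    sa = sorted(range(n), key=lambda i: t[i:])
    best_len, best_start = 0, 0
    for k in range(1, n):
        p, q = sa[k - 1], sa[k]
        # Only neighboring pairs with one suffix from each string count
        if (p < boundary) == (q < boundary):
            continue
        h = 0
        while p + h < n and q + h < n and t[p + h] == t[q + h]:
            h += 1
        if h > best_len:
            best_len, best_start = h, p
    return t[best_start:best_start + best_len]


print(longest_common_substring("banana", "ananas"))  # anana
```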

## 5. Complexity Analysis

- Naive method: $O(n^2 \log n)$
- Doubling: $O(n \log n)$ with radix sort; $O(n \log^2 n)$ with a comparison sort, as in the Section 3.2 implementation
- DC3/Skew: $O(n)$
- LCP construction: $O(n)$
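
The bounds above can be derived as follows. This is a worked sketch. Doubling needs $O(\log n)$ rounds, because each round doubles the compared prefix length $k$ until $k \ge n$. In Kasai's algorithm, $h$ never exceeds $n$ and falls by at most 1 per step.

$$
\begin{aligned}
T_{\text{naive}} &= O(n \log n) \text{ comparisons} \times O(n) \text{ per comparison} = O(n^2 \log n) \\
T_{\text{doubling}} &= O(\log n) \text{ rounds} \times \begin{cases} O(n) & \text{(radix sort)} \\ O(n \log n) & \text{(comparison sort)} \end{cases} = \begin{cases} O(n \log n) \\ O(n \log^2 n) \end{cases} \\
T_{\text{LCP}} &= O(n), \text{ since } h \le n \text{ and } h \text{ decreases at most } n \text{ times, so it increases at most } 2n \text{ times}
\end{aligned}
$$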

0 commit comments

Comments

(0)

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commit a13e70f

File tree

1 file changed

1 file changed

`‎docs/04_string/04_12_suffix_array.md`

0 commit comments