Commit c46627f

committed Mar 4, 2025
Add z-algorithm and regular-expression-matching Chinese translation for README
1 parent e40a67b commit c46627f

4 files changed: +260 -14 lines changed


‎src/algorithms/string/regular-expression-matching/README.md

+8 -3
@@ -1,6 +1,9 @@
 # Regular Expression Matching
 
-Given an input string `s` and a pattern `p`, implement regular
+_Read this in other languages:_
+[_简体中文_](README.zh-CN.md)
+
+Given an input string `s` and a pattern `p`, implement regular
 expression matching with support for `.` and `*`.
 
 - `.` Matches any single character.
@@ -18,6 +21,7 @@ The matching should cover the **entire** input string (not partial).
 **Example #1**
 
 Input:
+
 ```
 s = 'aa'
 p = 'a'
@@ -30,14 +34,15 @@ Explanation: `a` does not match the entire string `aa`.
 **Example #2**
 
 Input:
+
 ```
 s = 'aa'
 p = 'a*'
 ```
 
 Output: `true`
 
-Explanation: `*` means zero or more of the preceding element, `a`.
+Explanation: `*` means zero or more of the preceding element, `a`.
 Therefore, by repeating `a` once, it becomes `aa`.
 
 **Example #3**
@@ -64,7 +69,7 @@ p = 'c*a*b'
 
 Output: `true`
 
-Explanation: `c` can be repeated 0 times, `a` can be repeated
+Explanation: `c` can be repeated 0 times, `a` can be repeated
 1 time. Therefore it matches `aab`.
 
 ## References
@@ -0,0 +1,153 @@
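# Regular Expression Matching

## Problem Description

Given an input string `s` and a pattern `p`, implement regular expression matching with support for `.` and `*`:

- `.` matches any single character
- `*` matches zero or more of the preceding element

The match must cover the entire input string (not a partial match).

**Constraints**

- `s` may be empty and contains only lowercase letters `a-z`
- `p` may be empty and contains only lowercase letters `a-z`, `.`, and `*`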

## Examples

<details>
<summary>Expand to view the examples</summary>

**Example 1**
Input: `s = "aa"`, `p = "a"`
Output: `false`
Explanation: the pattern matches only a single `a`, so it cannot cover the entire string `aa`.

**Example 2**
Input: `s = "aa"`, `p = "a*"`
Output: `true`
Explanation: `*` means zero or more of the preceding element `a`; repeating `a` once gives `aa`.

**Example 3**
Input: `s = "ab"`, `p = ".*"`
Output: `true`
Explanation: `.*` means zero or more of any character, so it matches any two characters.

**Example 4**
Input: `s = "aab"`, `p = "c*a*b"`
Output: `true`
Explanation: `c*` matches the empty string, `a*` matches `a` twice, and `b` matches `b`.

</details>
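
---

## Core Solutions

### 1. Recursion (Backtracking)

**Core idea**: split the pattern and handle the possibilities for `*` separately:

- **Case 1**: `*` matches zero occurrences of the preceding character, so the `x*` pair in the pattern is skipped
- **Case 2**: `*` matches one or more occurrences, so the recursion consumes one character of the input string and keeps the pattern
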
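```python
class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        # An empty pattern matches only an empty string.
        if not p:
            return not s

        # Does the first character of s match the first pattern character?
        first_match = bool(s) and p[0] in {s[0], '.'}

        if len(p) >= 2 and p[1] == '*':
            # Either skip "x*" entirely, or consume one character and keep "x*".
            return self.isMatch(s, p[2:]) or (first_match and self.isMatch(s[1:], p))
        else:
            return first_match and self.isMatch(s[1:], p[1:])
```

**Complexity**: exponential time in the worst case, on the order of $O(2^{m+n})$, where $m$ and $n$ are the lengths of `s` and `p`; space complexity $O(m^2 + n^2)$, because each level of the recursion holds its own sliced copies of the strings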
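
To make the worst-case growth concrete, the sketch below wraps the same recursion in a call counter. The helper name `count_calls` and the stress input (`n` copies of `a` against `n` copies of `a*` followed by `b`) are illustrative assumptions, not part of the original README.

```python
def count_calls(s: str, p: str) -> tuple[bool, int]:
    """Run the plain recursion and report (result, number of recursive calls)."""
    calls = 0

    def is_match(s: str, p: str) -> bool:
        nonlocal calls
        calls += 1
        if not p:
            return not s
        first_match = bool(s) and p[0] in {s[0], '.'}
        if len(p) >= 2 and p[1] == '*':
            return is_match(s, p[2:]) or (first_match and is_match(s[1:], p))
        return first_match and is_match(s[1:], p[1:])

    return is_match(s, p), calls


# The trailing 'b' can never match, so every branch is explored before failing.
for n in (4, 6, 8):
    matched, calls = count_calls("a" * n, "a*" * n + "b")
    print(n, matched, calls)
```

The call count climbs steeply as `n` grows, which is the repeated work the memoized version avoids.
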

---

### 2. Memoized Search

**Optimization**: cache intermediate results to avoid recomputing the same subproblems:

```python
from functools import lru_cache

class Solution:
    # The cache is keyed on the (self, s, p) arguments.
    @lru_cache(maxsize=None)
    def isMatch(self, s: str, p: str) -> bool:
        if not p:
            return not s
        first_match = bool(s) and p[0] in {s[0], '.'}

        if len(p) >= 2 and p[1] == '*':
            return self.isMatch(s, p[2:]) or (first_match and self.isMatch(s[1:], p))
        else:
            return first_match and self.isMatch(s[1:], p[1:])
```

**Complexity**: the number of distinct subproblems drops to $O(mn)$, so time complexity is $O(mn)$ and space complexity is $O(mn)$, not counting the cost of slicing and hashing the string arguments
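
One way to avoid slicing the strings altogether is to memoize on index pairs instead. The following is a minimal sketch under that assumption; the names `is_match` and `match_from` are illustrative and not taken from the original README.

```python
from functools import lru_cache


def is_match(s: str, p: str) -> bool:
    """Top-down matching memoized on (i, j) instead of on string slices."""

    @lru_cache(maxsize=None)
    def match_from(i: int, j: int) -> bool:
        # True if the suffix s[i:] matches the pattern suffix p[j:].
        if j == len(p):
            return i == len(s)
        first_match = i < len(s) and p[j] in (s[i], '.')
        if j + 1 < len(p) and p[j + 1] == '*':
            # Skip "x*", or consume one character and stay on "x*".
            return match_from(i, j + 2) or (first_match and match_from(i + 1, j))
        return first_match and match_from(i + 1, j + 1)

    return match_from(0, 0)


print(is_match("aab", "c*a*b"))
```

There are at most `(m + 1) * (n + 1)` index pairs and each is resolved with constant extra work, so this form matches the $O(mn)$ bound more directly.
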
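
---

### 3. Dynamic Programming (DP)

**State definition**: `dp[i][j]` indicates whether `s[0..i)` matches `p[0..j)`.

**State transition**: let $\text{match}(i, k)$ hold when `p[k]` is `.` or equals `s[i]`. Then:

```math
dp[i][j] =
\begin{cases}
dp[i-1][j-1] \land \text{match}(i-1,\, j-1) & \text{if } p[j-1] \neq \text{'*'} \\
dp[i][j-2] \lor \bigl(dp[i-1][j] \land \text{match}(i-1,\, j-2)\bigr) & \text{if } p[j-1] = \text{'*'}
\end{cases}
```

In the `*` case, the term `dp[i][j-2]` corresponds to matching the preceding element zero times.

**Implementation**
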
```java
public boolean isMatch(String s, String p) {
    int m = s.length(), n = p.length();
    boolean[][] dp = new boolean[m + 1][n + 1];
    dp[0][0] = true;

    // Initialize the empty-string row: an "x*" pair can match zero characters
    for (int j = 1; j <= n; j++) {
        if (p.charAt(j - 1) == '*' && j >= 2)
            dp[0][j] = dp[0][j - 2];
    }

    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            char sc = s.charAt(i - 1), pc = p.charAt(j - 1);
            if (pc != '*') {
                dp[i][j] = (pc == '.' || pc == sc) && dp[i - 1][j - 1];
            } else if (j >= 2) { // guard: a leading '*' has no preceding element
                dp[i][j] = dp[i][j - 2] ||
                    ((p.charAt(j - 2) == '.' || p.charAt(j - 2) == sc) && dp[i - 1][j]);
            }
        }
    }
    return dp[m][n];
}
```

**Complexity**: time complexity $O(mn)$, space complexity $O(mn)$ (can be reduced to $O(n)$)
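
The $O(n)$ space bound follows from the fact that row `i` of the table depends only on row `i - 1` and on earlier cells of row `i`. Below is a minimal sketch that keeps just two rows; the function name `is_match_rolling` is an assumption for illustration.

```python
def is_match_rolling(s: str, p: str) -> bool:
    """Bottom-up DP that keeps only the previous and the current row."""
    m, n = len(s), len(p)

    # prev[j] plays the role of dp[i-1][j]; it starts as the empty-string row.
    prev = [False] * (n + 1)
    prev[0] = True
    for j in range(2, n + 1):
        if p[j - 1] == '*':
            prev[j] = prev[j - 2]

    for i in range(1, m + 1):
        cur = [False] * (n + 1)  # cur[0] stays False: non-empty s vs empty p
        for j in range(1, n + 1):
            if p[j - 1] != '*':
                cur[j] = prev[j - 1] and p[j - 1] in (s[i - 1], '.')
            elif j >= 2:
                # Zero occurrences of "x*", or one more occurrence of x.
                cur[j] = cur[j - 2] or (prev[j] and p[j - 2] in (s[i - 1], '.'))
        prev = cur

    return prev[n]


print(is_match_rolling("aab", "c*a*b"))
```

Time stays $O(mn)$; only the table shrinks from `(m + 1) * (n + 1)` cells to two rows of `n + 1` cells.
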
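
---

## Key Difficulties

1. **Semantics of `*`**: both the zero-occurrence and the multiple-occurrence cases must be considered
   - Example: the pattern `a*` matches the empty string, `a`, `aa`, and so on
2. **Wildcard behavior of `.`**: the any-character match needs special handling in both the recursion and the dynamic programming
3. **Boundary conditions**: how an empty string relates to the pattern (for example, `s = ""`, `p = "a*"` should return `true`)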

---

## References

1. [LeetCode regular expression matching: algorithm analysis (CSDN)](https://blog.csdn.net/xx_123_1_rj/article/details/130455123)
2. [Detailed explanation of the dynamic programming solution (CSDN)](https://blog.csdn.net/qq_40280096/article/details/100177992)
3. [Recursion and memoized search optimization in practice (LeetCode China, official)](https://leetcode.cn/problems/regular-expression-matching/solution/)
4. [Regular expression engine implementation principles (original English problem)](https://leetcode.com/problems/regular-expression-matching/)

‎src/algorithms/string/z-algorithm/README.md

+14 -11
@@ -1,32 +1,35 @@
 # Z Algorithm
 
-The Z-algorithm finds occurrences of a "word" `W`
+_Read this in other languages:_
+[_简体中文_](README.zh-CN.md)
+
+The Z-algorithm finds occurrences of a "word" `W`
 within a main "text string" `T` in linear time `O(|W| + |T|)`.
 
-Given a string `S` of length `n`, the algorithm produces
-an array, `Z` where `Z[i]` represents the longest substring
+Given a string `S` of length `n`, the algorithm produces
+an array, `Z` where `Z[i]` represents the longest substring
 starting from `S[i]` which is also a prefix of `S`. Finding
-`Z` for the string obtained by concatenating the word, `W`
+`Z` for the string obtained by concatenating the word, `W`
 with a nonce character, say `$` followed by the text, `T`,
 helps with pattern matching, for if there is some index `i`
 such that `Z[i]` equals the pattern length, then the pattern
 must be present at that point.
 
 While the `Z` array can be computed with two nested loops in `O(|W| * |T|)` time, the
-following strategy shows how to obtain it in linear time, based
-on the idea that as we iterate over the letters in the string
+following strategy shows how to obtain it in linear time, based
+on the idea that as we iterate over the letters in the string
 (index `i` from `1` to `n - 1`), we maintain an interval `[L, R]`
-which is the interval with maximum `R` such that `1 ≤ L ≤ i ≤ R`
-and `S[L...R]` is a prefix that is also a substring (if no such
-interval exists, just let `L = R =  - 1`). For `i = 1`, we can
+which is the interval with maximum `R` such that `1 ≤ L ≤ i ≤ R`
+and `S[L...R]` is a prefix that is also a substring (if no such
+interval exists, just let `L = R =  - 1`). For `i = 1`, we can
 simply compute `L` and `R` by comparing `S[0...]` to `S[1...]`.
 
 **Example of Z array**
 
 ```
-Index 0 1 2 3 4 5 6 7 8 9 10 11
+Index 0 1 2 3 4 5 6 7 8 9 10 11
 Text a a b c a a b x a a a z
-Z values X 1 0 0 3 1 0 0 2 2 1 0
+Z values X 1 0 0 3 1 0 0 2 2 1 0
 ```
 
 Other examples
@@ -0,0 +1,85 @@
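# Z Algorithm

The Z algorithm (Z-Algorithm) is a linear-time string pattern matching algorithm used to efficiently locate all occurrences of a pattern `W` in a text `T`. Its core idea is a preprocessed Z array, which records, for each position in the string, the length of the longest common prefix between the suffix starting there and the whole string.

---

### **Core Concepts**

1. **Definition of the Z array**
   Given a string `S` of length `n`, `Z[i]` is the length of the longest substring starting at position `i` that is also a prefix of `S`. For example:

   ```
   String S: a a b c a a b x a a a z
   Z values: X 1 0 0 3 1 0 0 2 2 1 0
   ```

2. **Z-box (matching interval)**
   The algorithm maintains a sliding interval `[L, R]` with the largest right boundary `R` found so far such that the substring `S[L...R]` matches a prefix of `S`. Reusing this interval avoids repeated comparisons.

3. **Pattern matching application**
   Concatenate the pattern `W` and the text `T` into `W$T` (where `$` is a separator that occurs in neither string) and compute its Z array. If `Z[i] = |W|` at some position `i`, then `W` occurs in `T`, starting at index `i - |W| - 1` of `T`.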
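
The definition and the `W$T` trick can be checked with a deliberately simple sketch. `z_naive` below computes the Z array straight from the definition in quadratic time, so it illustrates the concepts rather than the linear algorithm; both function names and the choice to leave `Z[0]` as `0` are assumptions made for this example.

```python
def z_naive(s: str) -> list[int]:
    """Z array computed directly from the definition (quadratic, for clarity only)."""
    n = len(s)
    z = [0] * n  # z[0] is left as 0 here; it is not used for matching
    for i in range(1, n):
        # Extend the match between the prefix and the suffix starting at i.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
    return z


def find_occurrences(word: str, text: str, sep: str = "$") -> list[int]:
    """Return the start indices in `text` at which `word` occurs."""
    if not word or sep in word or sep in text:
        raise ValueError("word must be non-empty and sep must not occur in the inputs")
    combined = word + sep + text
    z = z_naive(combined)
    offset = len(word) + 1  # index in `combined` where `text` begins
    return [i - offset for i in range(offset, len(combined)) if z[i] == len(word)]


print(z_naive("aabcaabxaaaz")[1:])   # [1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0]
print(find_occurrences("aab", "aabcaabxaaaz"))  # [0, 4]
```

Because the separator never matches, no Z value inside the text part can exceed `|W|`, which is why equality with `|W|` marks a full occurrence.
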
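
---

### **Computing the Z Array**

1. **Initialization**
   Set `L = R = 0`. `Z[0]` is not meaningful (it is usually set to `n` or ignored).
2. **Scan the string**
   For each position `i` (starting from `1`):
   - **Case 1** (`i > R`): compare `S[0...]` with `S[i...]` character by character, then update `L` and `R`
   - **Case 2** (`i ≤ R`): reuse the existing Z-box:
     - If `Z[i-L] < R-i+1`, then `Z[i] = Z[i-L]`
     - Otherwise, continue the character-by-character comparison from `R+1`, then update `L` and `R`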
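
The steps above can be sketched as follows. This is one possible rendering, assuming an inclusive Z-box `[left, right]` and the convention `Z[0] = n`; the function name `z_array` is illustrative.

```python
def z_array(s: str) -> list[int]:
    """Linear-time Z array using an inclusive Z-box [left, right]."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n  # convention chosen here; Z[0] is otherwise ignored
    left = right = 0
    for i in range(1, n):
        if i <= right and z[i - left] < right - i + 1:
            # Case 2, first branch: the mirrored value fits inside the Z-box.
            z[i] = z[i - left]
        else:
            # Case 1 (i > right) starts from 0 matched characters;
            # Case 2, second branch resumes just past the Z-box.
            k = max(0, right - i + 1)
            while i + k < n and s[k] == s[i + k]:
                k += 1
            z[i] = k
            left, right = i, i + k - 1
    return z


print(z_array("abababab"))  # [8, 0, 6, 0, 4, 0, 2, 0]
```

Each pass through the `while` loop that succeeds pushes `right` further along the string, so the total work across all positions stays linear.
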
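
---

### **Complexity Analysis**

- **Time complexity**: `O(|W| + |T|)`. Every successful character comparison moves `R` to the right, and each position `i` adds at most one failed comparison, so the total number of comparisons is linear.
- **Space complexity**: `O(|W| + |T|)` for the Z array of the concatenated string `W$T`. This can be reduced to `O(|W|)` by storing Z values only for the pattern and matching the text on the fly.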

---

### **Examples and Visualization**

1. **Example 1**
   ```
   String:   a a a a a a
   Z values: X 5 4 3 2 1
   ```
2. **Example 2**
   ```
   String:   a b a b a b a b
   Z values: X 0 6 0 4 0 2 0
   ```
3. **Z-box in action**
   ![Z-box diagram](https://ivanyu.me/wp-content/uploads/2014/09/zalg1.png)
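
---

### **Comparison with Other Algorithms**

Here `n` is the length of the text and `m` is the length of the pattern.

| Algorithm | Time complexity | Space complexity | Core idea |
| ---------- | ---------- | ---------- | ---------------------- |
| **Z algorithm** | O(n+m) | O(n+m) | Z array with Z-box interval reuse |
| **KMP** | O(n+m) | O(m) | Partial match table (prefix function) |
| **BM** (Boyer-Moore) | O(n/m) best case, O(nm) worst case | O(m) | Bad-character and good-suffix rules |
| **Brute force** | O(nm) | O(1) | Two nested loops |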
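
---

### **Applications**

1. **Pattern matching**: efficiently locate all occurrences of a pattern in a text.
2. **Longest palindromic substring**: typically solved with Manacher's algorithm, which reuses a previously matched interval in much the same way as the Z-box.
3. **Repeated substring detection**: use the Z array to quickly identify periodic patterns.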

---

### **References**

1. [Z algorithm explained in detail - Zhihu column](https://zhuanlan.zhihu.com/p/403256247)
2. [Z algorithm visual tutorial - Bilibili](https://www.bilibili.com/video/BV1qK4y1h7qk)
3. [Applications of the Z algorithm in string processing - cnblogs](https://www.cnblogs.com/zyb993963526/p/10600775.html)
