Add Chinese translation #2044

Open
wants to merge 1 commit into
base: master
11 changes: 8 additions & 3 deletions src/algorithms/string/regular-expression-matching/README.md
@@ -1,6 +1,9 @@
# Regular Expression Matching

_Read this in other languages:_
[_简体中文_](README.zh-CN.md)

Given an input string `s` and a pattern `p`, implement regular
expression matching with support for `.` and `*`.

- `.` Matches any single character.
@@ -18,6 +21,7 @@ The matching should cover the **entire** input string (not partial).
**Example #1**

Input:

```
s = 'aa'
p = 'a'
```
@@ -30,14 +34,15 @@ Explanation: `a` does not match the entire string `aa`.
**Example #2**

Input:

```
s = 'aa'
p = 'a*'
```

Output: `true`

Explanation: `*` means zero or more of the preceding element, `a`.
Therefore, by repeating `a` once, it becomes `aa`.

**Example #3**
@@ -64,7 +69,7 @@ p = 'c*a*b'

Output: `true`

Explanation: `c` can be repeated 0 times, `a` can be repeated
1 time. Therefore it matches `aab`.

## References
153 changes: 153 additions & 0 deletions src/algorithms/string/regular-expression-matching/README.zh-CN.md
@@ -0,0 +1,153 @@
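# Regular Expression Matching

## Problem Description

Given an input string `s` and a pattern `p`, implement regular expression matching with support for `.` and `*`:

- `.` matches any single character.
- `*` matches zero or more of the preceding element.

The match must cover the **entire** input string (not a partial match).

**Constraints**

- `s` may be empty and contains only lowercase letters `a-z`.
- `p` may be empty and contains only lowercase letters `a-z`, `.`, and `*`.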

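## Examples

<details>
<summary>Expand to view the examples</summary>

**Example 1**

Input: `s = "aa"`, `p = "a"`

Output: `false`

Explanation: the pattern matches only a single `a`, so it cannot cover the entire string `aa`.

**Example 2**

Input: `s = "aa"`, `p = "a*"`

Output: `true`

Explanation: `*` means zero or more of the preceding element `a`; repeating `a` once more gives `aa`.

**Example 3**

Input: `s = "ab"`, `p = ".*"`

Output: `true`

Explanation: `.*` means zero or more of any character, so it can match any two characters.

**Example 4**

Input: `s = "aab"`, `p = "c*a*b"`

Output: `true`

Explanation: `c*` matches the empty string, `a*` matches `a` twice, and `b` matches `b`.

</details>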

---

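## Core Solutions

### 1. Recursion (Backtracking)

**Key idea**: peel off the front of the pattern and branch on the two ways `*` can be used:

- **Case 1**: `*` matches zero occurrences of the preceding character, so skip the `x*` pair in the pattern.
- **Case 2**: `*` matches one or more occurrences, so consume one character of the input and recurse with the same pattern.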

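```python
class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        # An empty pattern matches only an empty string.
        if not p:
            return not s

        # Does the first character of s match the first pattern element?
        first_match = bool(s) and p[0] in {s[0], '.'}

        if len(p) >= 2 and p[1] == '*':
            # Skip "x*" (zero occurrences), or consume one character and keep "x*".
            return self.isMatch(s, p[2:]) or (first_match and self.isMatch(s[1:], p))
        else:
            return first_match and self.isMatch(s[1:], p[1:])
```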

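**Complexity**: worst-case time is exponential, roughly $O(2^{m+n})$, where $m$ and $n$ are the lengths of `s` and `p`; space is $O(m^2 + n^2)$, because every level of the recursion holds its own string slices.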

---

### 2. Memoized Search

**Optimization**: cache intermediate results to avoid recomputing the same subproblems:

```python
from functools import lru_cache

class Solution:
    # Cache results per (self, s, p) so each distinct pair of suffixes is solved once.
    @lru_cache(maxsize=None)
    def isMatch(self, s: str, p: str) -> bool:
        if not p:
            return not s
        first_match = bool(s) and p[0] in {s[0], '.'}

        if len(p) >= 2 and p[1] == '*':
            return self.isMatch(s, p[2:]) or (first_match and self.isMatch(s[1:], p))
        else:
            return first_match and self.isMatch(s[1:], p[1:])
```

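**Complexity**: the number of distinct subproblems drops to $O(mn)$, and the cache holds $O(mn)$ entries. Because this version slices and hashes strings on every call, each subproblem costs more than constant time; memoizing on index pairs removes that overhead and gives $O(mn)$ time overall.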
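As an illustration of the index-based variant, here is a minimal sketch that memoizes on `(i, j)` positions instead of string slices. The names `is_match` and `match_from` are hypothetical and not part of this PR; the logic mirrors the recursive solution above.

```python
from functools import lru_cache


def is_match(s: str, p: str) -> bool:
    """Top-down matcher memoized on index pairs (hypothetical helper names)."""

    @lru_cache(maxsize=None)
    def match_from(i: int, j: int) -> bool:
        # True if the suffix s[i:] matches the pattern suffix p[j:].
        if j == len(p):
            return i == len(s)
        first_match = i < len(s) and p[j] in (s[i], ".")
        if j + 1 < len(p) and p[j + 1] == "*":
            # Skip "x*" entirely, or consume one character and stay on "x*".
            return match_from(i, j + 2) or (first_match and match_from(i + 1, j))
        return first_match and match_from(i + 1, j + 1)

    return match_from(0, 0)
```

Each `(i, j)` pair is evaluated at most once and does constant work, so this sketch does $O(mn)$ work in total. It still recurses, so very long inputs could hit Python's recursion limit.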

---

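### 3. Dynamic Programming (DP)

**State definition**: `dp[i][j]` indicates whether `s[0..i)` matches `p[0..j)`.

**State transition** (a pattern character matches `s[i-1]` if it equals `s[i-1]` or is `.`):

```math
dp[i][j] =
\begin{cases}
dp[i-1][j-1] & \text{if } p[j-1] \neq '*' \text{ and } p[j-1] \text{ matches } s[i-1] \\
dp[i][j-2] \;\lor\; \bigl(dp[i-1][j] \land p[j-2] \text{ matches } s[i-1]\bigr) & \text{if } p[j-1] = '*' \\
\text{false} & \text{otherwise}
\end{cases}
```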

**Implementation**

```java
public boolean isMatch(String s, String p) {
    int m = s.length(), n = p.length();
    // dp[i][j]: does s[0..i) match p[0..j)?
    boolean[][] dp = new boolean[m + 1][n + 1];
    dp[0][0] = true;

    // Empty string against pattern prefixes: only "x*" pairs can match nothing.
    for (int j = 1; j <= n; j++) {
        if (p.charAt(j - 1) == '*' && j >= 2)
            dp[0][j] = dp[0][j - 2];
    }

    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            char sc = s.charAt(i - 1), pc = p.charAt(j - 1);
            if (pc != '*') {
                dp[i][j] = (pc == '.' || pc == sc) && dp[i - 1][j - 1];
            } else {
                // Assumes every '*' follows an element, so j >= 2 here.
                dp[i][j] = dp[i][j - 2] ||
                    ((p.charAt(j - 2) == '.' || p.charAt(j - 2) == sc) && dp[i - 1][j]);
            }
        }
    }
    return dp[m][n];
}
```

**Complexity**: time $O(mn)$, space $O(mn)$ (can be reduced to $O(n)$).
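One way the $O(n)$ space reduction might look is to keep only the previous and current rows of the table. This is a sketch in Python rather than a port of the Java code; the function name `is_match_rolling` is hypothetical, and it assumes every `*` in `p` follows an element.

```python
def is_match_rolling(s: str, p: str) -> bool:
    """Bottom-up DP that stores two rows instead of the full table."""
    n = len(p)
    # prev[j] plays the role of dp[i-1][j]; start with the row for the empty string.
    prev = [False] * (n + 1)
    prev[0] = True
    for j in range(2, n + 1):
        if p[j - 1] == "*":
            prev[j] = prev[j - 2]

    for ch in s:
        cur = [False] * (n + 1)  # dp[i][0] is False for every i >= 1
        for j in range(1, n + 1):
            if p[j - 1] != "*":
                cur[j] = prev[j - 1] and p[j - 1] in (ch, ".")
            elif j >= 2:
                # Zero occurrences of p[j-2], or one more occurrence consuming ch.
                cur[j] = cur[j - 2] or (prev[j] and p[j - 2] in (ch, "."))
        prev = cur
    return prev[n]
```

Row `i` depends only on row `i - 1` and on earlier cells of row `i`, which is why two rows of length `n + 1` are enough.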

---

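## Key Difficulties

1. **Semantics of `*`**: both the zero-occurrence and the multiple-occurrence cases must be considered.
   - Example: the pattern `a*` matches the empty string, `a`, and `aa`.
2. **Wildcard behavior of `.`**: the character check in the recursion or DP must treat `.` as matching any character.
3. **Boundary conditions**: how empty strings and patterns interact (for example, `s = ""` with `p = "a*"` should return `true`).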
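These edge cases can be sanity-checked against Python's standard `re.fullmatch`, which also requires the whole string to match. This is only a reference oracle, not a solution to the exercise; the helper name `matches_reference` is hypothetical, and the comparison is assumed to hold only for patterns limited to `a-z`, `.`, and `*` in which every `*` follows an element.

```python
import re


def matches_reference(s: str, p: str) -> bool:
    # re.fullmatch succeeds only if the entire string matches the pattern.
    return re.fullmatch(p, s) is not None


cases = [
    ("", "a*", True),    # `*` may match zero occurrences
    ("aa", "a*", True),  # or several occurrences
    ("ab", ".*", True),  # `.` matches any single character
    ("aa", "a", False),  # a partial match does not count
]
for s, p, expected in cases:
    assert matches_reference(s, p) == expected
```

Comparing a hand-written matcher with this oracle on small generated inputs is one way to catch mistakes in the boundary handling.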

---

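## References

1. [LeetCode regular expression matching: algorithm walkthrough (CSDN)](https://blog.csdn.net/xx_123_1_rj/article/details/130455123)
2. [Detailed explanation of the dynamic programming solution (CSDN)](https://blog.csdn.net/qq_40280096/article/details/100177992)
3. [Recursion and memoized search in practice (LeetCode CN official)](https://leetcode.cn/problems/regular-expression-matching/solution/)
4. [How regular expression engines work (original problem in English)](https://leetcode.com/problems/regular-expression-matching/)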
25 changes: 14 additions & 11 deletions src/algorithms/string/z-algorithm/README.md
@@ -1,32 +1,35 @@
# Z Algorithm

_Read this in other languages:_
[_简体中文_](README.zh-CN.md)

The Z-algorithm finds occurrences of a "word" `W`
within a main "text string" `T` in linear time `O(|W| + |T|)`.

Given a string `S` of length `n`, the algorithm produces
an array `Z`, where `Z[i]` is the length of the longest substring
starting at `S[i]` that is also a prefix of `S`. Computing
`Z` for the string obtained by concatenating the word `W`,
a sentinel character such as `$` that occurs in neither string, and the text `T`
helps with pattern matching: if there is some index `i`
such that `Z[i]` equals the pattern length, then the pattern
occurs at that position.

While the `Z` array can be computed with two nested loops in `O(|W| * |T|)` time, the
following strategy shows how to obtain it in linear time. As we iterate
over the letters in the string
(index `i` from `1` to `n - 1`), we maintain an interval `[L, R]`
with maximum `R` such that `1 ≤ L ≤ i ≤ R`
and `S[L...R]` is a substring that is also a prefix of `S` (if no such
interval exists, just let `L = R = -1`). For `i = 1`, we can
simply compute `L` and `R` by comparing `S[0...]` to `S[1...]`.

**Example of Z array**

```
Index     0  1  2  3  4  5  6  7  8  9  10 11
Text      a  a  b  c  a  a  b  x  a  a  a  z
Z values  X  1  0  0  3  1  0  0  2  2  1  0
```

Other examples
85 changes: 85 additions & 0 deletions src/algorithms/string/z-algorithm/README.zh-CN.md
@@ -0,0 +1,85 @@
# Z Algorithm

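The Z algorithm (Z-Algorithm) is a linear-time string pattern matching algorithm that efficiently locates all occurrences of a pattern `W` in a text `T`. Its core idea is a precomputed Z array, which records, for each position of a string, the length of the longest common prefix between the string and its suffix starting at that position.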

---

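### **Core Concepts**

1. **Definition of the Z array**
   Given a string `S` of length `n`, `Z[i]` is the length of the longest substring starting at position `i` that is also a prefix of `S`. For example:

   ```
   String S: a a b c a a b x a a a z
   Z array:  X 1 0 0 3 1 0 0 2 2 1 0
   ```

2. **Z-box (matching interval)**
   The algorithm maintains a sliding interval `[L, R]` with the largest right boundary `R` found so far such that the substring `S[L...R]` is a prefix of `S`. Reusing this interval avoids repeated comparisons.

3. **Application to pattern matching**
   Concatenate the pattern `W` and the text `T` into `W$T` (`$` is a separator that appears in neither string) and compute its Z array. If `Z[i] = |W|` at some position, then `W` occurs in `T` starting at index `i - |W| - 1`.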
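To make the definition and the `W$T` construction concrete, here is a small sketch that computes the Z array directly from its definition (quadratic in the worst case, not the linear-time method) and uses it to search. The names `z_naive` and `find_occurrences` are hypothetical, and the separator is assumed not to occur in either string.

```python
from typing import List


def z_naive(text: str) -> List[int]:
    """Z array straight from the definition; z[0] is left as 0."""
    n = len(text)
    z = [0] * n
    for i in range(1, n):
        # Extend while the substring at i keeps agreeing with the prefix.
        while i + z[i] < n and text[z[i]] == text[i + z[i]]:
            z[i] += 1
    return z


def find_occurrences(word: str, text: str, sep: str = "$") -> List[int]:
    """Start indices in `text` where `word` occurs (sep must not appear in either)."""
    if not word:
        return []
    combined = word + sep + text
    z = z_naive(combined)
    offset = len(word) + 1  # position of text[0] inside `combined`
    return [i - offset for i in range(offset, len(combined)) if z[i] == len(word)]
```

For instance, searching for `aab` in `aabxaabaab` with this sketch reports the start indices 0, 4, and 7.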

---

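### **Computing the Z Array**

1. **Initialization**
   Set `L = R = 0`. `Z[0]` is not meaningful (it is usually set to `n` or ignored).
2. **Scan the string**
   For each position `i` (starting from `1`):
   - **Case 1** (`i > R`): compare `S[0...]` with `S[i...]` character by character, then update `L` and `R`.
   - **Case 2** (`i ≤ R`): reuse the existing Z-box:
     - If `Z[i-L] < R-i+1`, then `Z[i] = Z[i-L]`.
     - Otherwise, continue comparing character by character from `R+1`, then update `L` and `R`.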
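The steps above can be sketched in Python as follows. The function name `z_array` and the variable names `left` and `right` (standing for `L` and `R`, with an inclusive right boundary) are choices made for this illustration; `z[0]` is left as 0.

```python
def z_array(s):
    """Linear-time Z array using the Z-box [left, right] (inclusive)."""
    n = len(s)
    z = [0] * n
    left = right = 0
    for i in range(1, n):
        if i <= right:
            # Case 2: copy the mirrored value, capped at the edge of the Z-box.
            z[i] = min(z[i - left], right - i + 1)
        # Case 1, or Case 2 when the copy reached the box edge: compare directly.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        # Keep the box with the largest right boundary seen so far.
        if i + z[i] - 1 > right:
            left, right = i, i + z[i] - 1
    return z
```

When the copied value is strictly inside the box, the `while` loop stops on its first comparison, so both cases can share the same extension code.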

---

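### **Complexity Analysis**

- **Time complexity**: `O(|W| + |T|)`. Every successful character comparison moves `R` to the right, and `R` never moves back, so the total work is linear.
- **Space complexity**: `O(|W| + |T|)` when the Z array is stored for the whole concatenated string `W$T`.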

---

### **Examples and Visualization**

1. **Example 1**
   ```
   String:  a a a a a a
   Z array: X 5 4 3 2 1
   ```
2. **Example 2**
   ```
   String:  a b a b a b a b
   Z array: X 0 6 0 4 0 2 0
   ```
3. **Z-box in action**
   ![Z-box diagram](https://ivanyu.me/wp-content/uploads/2014/09/zalg1.png)

---

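### **Comparison with Other Algorithms**

Here `n` is the text length and `m` is the pattern length.

| Algorithm       | Time complexity                    | Space complexity | Core idea                             |
| --------------- | ---------------------------------- | ---------------- | ------------------------------------- |
| **Z algorithm** | O(n+m)                             | O(n+m)           | Z array with a sliding interval       |
| **KMP**         | O(n+m)                             | O(m)             | Partial match table (prefix function) |
| **BM**          | O(n/m) best case, O(nm) worst case | O(m)             | Bad-character and good-suffix rules   |
| **Brute force** | O(nm)                              | O(1)             | Two nested loops                      |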

---

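### **Applications**

1. **Pattern matching**: efficiently locate every occurrence of a pattern in a text.
2. **Longest palindromic substring**: used alongside Manacher's algorithm, which maintains a related rightmost interval.
3. **Repeated substring detection**: use the Z array to quickly identify periodic patterns.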
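As one possible illustration of the periodicity use case, the sketch below finds the smallest period of a string: `k` is a period exactly when the suffix starting at `k` is also a prefix, that is, when `k + Z[k]` equals the string length. The function name `smallest_period` is hypothetical, and the last repetition is allowed to be truncated.

```python
def smallest_period(s):
    """Length of the shortest prefix whose repetition (possibly cut short) yields s."""
    n = len(s)
    z = [0] * n
    left = right = 0
    for i in range(1, n):
        if i <= right:
            z[i] = min(z[i - left], right - i + 1)
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > right:
            left, right = i, i + z[i] - 1
    # The first k whose suffix is a full prefix match is the smallest period.
    for k in range(1, n):
        if k + z[k] == n:
            return k
    return n
```

For `abababab` this returns 2, and for a string with no shorter period, such as `abcd`, it returns the full length.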

---

### **References**

1. [Z algorithm explained (Zhihu column)](https://zhuanlan.zhihu.com/p/403256247)
2. [Z algorithm visual tutorial (Bilibili)](https://www.bilibili.com/video/BV1qK4y1h7qk)
3. [Applications of the Z algorithm in string processing (cnblogs)](https://www.cnblogs.com/zyb993963526/p/10600775.html)