Add z-algorithm and regular-expression-matching Chinese translation for README

letianpailove · letianpailove · commit c46627f9dab3 · 2025-03-04T17:35:56.000+08:00
diff --git a/src/algorithms/string/regular-expression-matching/README.md b/src/algorithms/string/regular-expression-matching/README.md
@@ -1,6 +1,9 @@
 # Regular Expression Matching
 
-Given an input string `s` and a pattern `p`, implement regular 
+_Read this in other languages:_
+[_简体中文_](README.zh-CN.md)
+
+Given an input string `s` and a pattern `p`, implement regular
 expression matching with support for `.` and `*`.
 
 - `.` Matches any single character.
@@ -18,6 +21,7 @@ The matching should cover the **entire** input string (not partial).
 **Example #1**
 
 Input:
+
 ```
 s = 'aa'
 p = 'a'
@@ -30,14 +34,15 @@ Explanation: `a` does not match the entire string `aa`.
 **Example #2**
 
 Input:
+
 ```
 s = 'aa'
 p = 'a*'
 ```
 
 Output: `true`
 
-Explanation: `*` means zero or more of the preceding element, `a`. 
+Explanation: `*` means zero or more of the preceding element, `a`.
 Therefore, by repeating `a` once, it becomes `aa`.
 
 **Example #3**
@@ -64,7 +69,7 @@ p = 'c*a*b'
 
 Output: `true`
 
-Explanation: `c` can be repeated 0 times, `a` can be repeated 
+Explanation: `c` can be repeated 0 times, `a` can be repeated
 1 time. Therefore it matches `aab`.
 
 ## References
diff --git a/src/algorithms/string/regular-expression-matching/README.zh-CN.md b/src/algorithms/string/regular-expression-matching/README.zh-CN.md
@@ -0,0 +1,153 @@
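+# Regular Expression Matching
+
+## Problem Description
+
+Given an input string `s` and a pattern `p`, implement regular expression matching with support for `.` and `*`:
+- `.` matches any single character.
+- `*` matches zero or more of the preceding element.
+
+The match must cover the entire input string (not a partial match).
+
+**Constraints**
+- `s` may be empty and contains only lowercase letters `a-z`.
+- `p` may be empty and contains only lowercase letters `a-z`, `.`, or `*`.
+
+## Examples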
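+
+<details>
+<summary>Expand to view examples</summary>
+
+**Example 1**  
+Input: `s = "aa"`, `p = "a"`  
+Output: `false`  
+Explanation: the pattern matches only a single `a`, so it cannot cover the entire string `aa`.
+
+**Example 2**  
+Input: `s = "aa"`, `p = "a*"`  
+Output: `true`  
+Explanation: `*` means zero or more of the preceding element `a`; repeating `a` once gives `aa`.
+
+**Example 3**  
+Input: `s = "ab"`, `p = ".*"`  
+Output: `true`  
+Explanation: `.*` means zero or more of any character, so it matches any two characters.
+
+**Example 4**  
+Input: `s = "aab"`, `p = "c*a*b"`  
+Output: `true`  
+Explanation: `c*` matches the empty string, `a*` matches `aa`, and `b` matches `b`.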
+
+</details>
+
+---
+
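+## Core Solutions
+
+### 1. Recursion (Backtracking)
+
+**Key idea**: split the pattern to handle the possible meanings of `*`:
+- **Case 1**: `*` matches the preceding character zero times, so skip the `x*` pair in the pattern.
+- **Case 2**: `*` matches one or more times, so consume one character of the input and recurse.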
+
+```python
+class Solution:
+    def isMatch(self, s: str, p: str) -> bool:
+        if not p:
+            return not s
+
+        first_match = bool(s) and p[0] in {s[0], '.'}
+
+        if len(p) >= 2 and p[1] == '*':
+            return self.isMatch(s, p[2:]) or (first_match and self.isMatch(s[1:], p))
+        else:
+            return first_match and self.isMatch(s[1:], p[1:])
+```
+
+**Complexity**: worst-case time complexity $O(2^{m+n})$ (where m and n are the lengths of `s` and `p`), space complexity $O(m^2 + n^2)$
+
+---
+
+### 2. Memoized Search
+
+**Optimization**: cache intermediate results to avoid recomputing the same subproblems:
+
+```python
+from functools import lru_cache
+
+class Solution:
+    @lru_cache(maxsize=None)
+    def isMatch(self, s: str, p: str) -> bool:
+        if not p: return not s
+        first_match = bool(s) and p[0] in {s[0], '.'}
+
+        if len(p) >= 2 and p[1] == '*':
+            return self.isMatch(s, p[2:]) or (first_match and self.isMatch(s[1:], p))
+        else:
+            return first_match and self.isMatch(s[1:], p[1:])
+```
+
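+**Complexity**: the number of distinct subproblems drops to $O(mn)$, giving $O(mn)$ time (ignoring the cost of slicing and hashing the strings) and $O(mn)$ space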
+
+---
+
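+### 3. Dynamic Programming (DP)
+
+**State definition**: `dp[i][j]` indicates whether `s[0..i)` matches `p[0..j)`  
+**State transition** (here $\text{match}(c, x)$ holds when $x$ is `.` or $x = c$):
+
+```math
+dp[i][j] =
+\begin{cases}
+dp[i-1][j-1] \land \text{match}(s[i-1], p[j-1]) & \text{if } p[j-1] \neq \text{'*'} \\
+dp[i][j-2] \text{ (zero occurrences)} \quad \lor \quad \bigl(dp[i-1][j] \land \text{match}(s[i-1], p[j-2])\bigr) & \text{if } p[j-1] = \text{'*'}
+\end{cases}
+```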
+
+**Implementation**:
+
+```java
+public boolean isMatch(String s, String p) {
+    int m = s.length(), n = p.length();
+    boolean[][] dp = new boolean[m+1][n+1];
+    dp[0][0] = true;
+
+    // Initialize the row for the empty string: a prefix of p matches "" only if it consists of x* pairs
+    for(int j=1; j<=n; j++){
+        if(p.charAt(j-1) == '*' && j>=2)
+            dp[0][j] = dp[0][j-2];
+    }
+
+    for(int i=1; i<=m; i++){
+        for(int j=1; j<=n; j++){
+            char sc = s.charAt(i-1), pc = p.charAt(j-1);
+            if(pc != '*'){
+                dp[i][j] = (pc == '.' || pc == sc) && dp[i-1][j-1];
+            } else {
+                dp[i][j] = dp[i][j-2] ||
+                           ((p.charAt(j-2) == '.' || p.charAt(j-2) == sc) && dp[i-1][j]);
+            }
+        }
+    }
+    return dp[m][n];
+}
+```
+
+**Complexity**: time complexity $O(mn)$, space complexity $O(mn)$ (can be reduced to $O(n)$)
+
+---
+
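+## Key Difficulties
+
+1. **Semantics of `*`**: both the zero-occurrence and the multiple-occurrence cases must be considered
+   - Example: the pattern `a*` matches the empty string, `a`, `aa`, and so on
+2. **Wildcard behavior of `.`**: the recursion or DP needs a special case for the match-any-character logic
+3. **Boundary conditions**: how the empty string relates to the pattern (e.g. `s=""` with `p="a*"` should return `true`)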
+
+---
+
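+## References
+
+1. [LeetCode regular expression matching algorithm analysis (CSDN)](https://blog.csdn.net/xx_123_1_rj/article/details/130455123)
+2. [Dynamic programming solution explained (CSDN)](https://blog.csdn.net/qq_40280096/article/details/100177992)
+3. [Recursion and memoized search optimization in practice (LeetCode China, official)](https://leetcode.cn/problems/regular-expression-matching/solution/)
+4. [Regular expression engine implementation principles (original English problem)](https://leetcode.com/problems/regular-expression-matching/)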
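The DP section of the added README notes that the table can be reduced to O(n) space without showing how. A minimal sketch of that idea follows, assuming the same `dp[i][j]` definition and keeping only two rows; the function name `is_match` and the row names `prev` and `cur` are illustrative and not part of the commit.

```python
def is_match(s: str, p: str) -> bool:
    """Full-string match of s against p, where p supports '.' and '*'."""
    m, n = len(s), len(p)

    # prev[j] plays the role of dp[i-1][j]; start with the row for the empty string.
    prev = [False] * (n + 1)
    prev[0] = True
    for j in range(2, n + 1):
        if p[j - 1] == '*':
            prev[j] = prev[j - 2]  # an x* pair may match zero characters

    for i in range(1, m + 1):
        # cur[0] stays False: an empty pattern cannot match a non-empty string.
        cur = [False] * (n + 1)
        for j in range(1, n + 1):
            if p[j - 1] != '*':
                cur[j] = prev[j - 1] and p[j - 1] in (s[i - 1], '.')
            elif j >= 2:
                # Zero occurrences, or one more occurrence of the preceding element.
                cur[j] = cur[j - 2] or (prev[j] and p[j - 2] in (s[i - 1], '.'))
        prev = cur

    return prev[n]
```

Each row depends only on the row above it and on earlier cells of the same row, which is why two arrays of length `n + 1` are enough.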
diff --git a/src/algorithms/string/z-algorithm/README.md b/src/algorithms/string/z-algorithm/README.md
@@ -1,32 +1,35 @@
 # Z Algorithm
 
-The Z-algorithm finds occurrences of a "word" `W` 
+_Read this in other languages:_
+[_简体中文_](README.zh-CN.md)
+
+The Z-algorithm finds occurrences of a "word" `W`
 within a main "text string" `T` in linear time `O(|W| + |T|)`.
 
-Given a string `S` of length `n`, the algorithm produces 
-an array, `Z` where `Z[i]` represents the longest substring 
+Given a string `S` of length `n`, the algorithm produces
+an array, `Z` where `Z[i]` represents the longest substring
 starting from `S[i]` which is also a prefix of `S`. Finding
-`Z` for the string obtained by concatenating the word, `W` 
+`Z` for the string obtained by concatenating the word, `W`
 with a nonce character, say `$` followed by the text, `T`,
 helps with pattern matching, for if there is some index `i`
 such that `Z[i]` equals the pattern length, then the pattern
 must be present at that point.
 
 While the `Z` array can be computed with two nested loops in `O(|W| * |T|)` time, the
-following strategy shows how to obtain it in linear time, based 
-on the idea that as we iterate over the letters in the string 
+following strategy shows how to obtain it in linear time, based
+on the idea that as we iterate over the letters in the string
 (index `i` from `1` to `n - 1`), we maintain an interval `[L, R]`
-which is the interval with maximum `R` such that `1 ≤ L ≤ i ≤ R` 
-and `S[L...R]` is a prefix that is also a substring (if no such 
-interval exists, just let `L = R =  - 1`). For `i = 1`, we can 
+which is the interval with maximum `R` such that `1 ≤ L ≤ i ≤ R`
+and `S[L...R]` is a prefix that is also a substring (if no such
+interval exists, just let `L = R = -1`). For `i = 1`, we can
 simply compute `L` and `R` by comparing `S[0...]` to `S[1...]`.
 
 **Example of Z array**
 
 ```
-Index            0   1   2   3   4   5   6   7   8   9  10  11 
+Index            0   1   2   3   4   5   6   7   8   9  10  11
 Text             a   a   b   c   a   a   b   x   a   a   a   z
-Z values         X   1   0   0   3   1   0   0   2   2   1   0 
+Z values         X   1   0   0   3   1   0   0   2   2   1   0
 ```
 
 Other examples
diff --git a/src/algorithms/string/z-algorithm/README.zh-CN.md b/src/algorithms/string/z-algorithm/README.zh-CN.md
@@ -0,0 +1,85 @@
+# Z Algorithm
+
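+The Z algorithm is a linear-time string pattern matching algorithm that efficiently locates all occurrences of a pattern `W` in a text `T`. It is built on a preprocessed Z array, which records, for each position of a string, the length of the longest common prefix between the suffix starting there and the string itself.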
+
+---
+
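+### **Core Concepts**
+
+1. **Definition of the Z array**  
+   Given a string `S` of length `n`, `Z[i]` is the length of the longest substring starting at position `i` that is also a prefix of `S`. For example:
+
+   ```
+   String S: a a b c a a b x a a a z
+   Z values: X 1 0 0 3 1 0 0 2 2 1 0
+   ```
+
+2. **Z-box (matching interval)**  
+   The algorithm maintains an interval `[L, R]` with the largest right boundary `R` found so far such that the substring `S[L...R]` is a prefix of `S`. This interval lets the computation reuse earlier comparisons instead of repeating them.
+
+3. **Application to pattern matching**  
+   Concatenate the pattern `W` and the text `T` into `W$T` (`$` is a separator that occurs in neither string) and compute its Z array. If `Z[i] = |W|` at some position `i`, then `W` occurs in `T` starting at index `i - |W| - 1`.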
+
+---
+
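+### **Computing the Z Array**
+
+1. **Initialization**  
+   Set `L = R = 0`; `Z[0]` is not meaningful (it is usually set to `n` or ignored).
+2. **Scan the string**  
+   For each position `i` (starting from `1`):
+   - **Case 1** (`i > R`): compare `S[0...]` with `S[i...]` character by character, then update `L` and `R`.
+   - **Case 2** (`i ≤ R`): reuse the existing Z-box:
+     - If `Z[i-L] < R-i+1`, then `Z[i] = Z[i-L]`.
+     - Otherwise, continue comparing character by character from `R+1`, then update `L` and `R`.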
+
+---
+
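+### **Complexity Analysis**
+
+- **Time complexity**: `O(|W| + |T|)`. Every successful character comparison advances `R`, which never decreases, and each position `i` causes at most one failed comparison.
+- **Space complexity**: `O(|W| + |T|)`, for the Z array of the concatenated string `W$T`.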
+
+---
+
+### **Examples and Visualization**
+
+1. **Example 1**
+   ```
+   String:   a a a a a a
+   Z values: X 5 4 3 2 1
+   ```
+2. **Example 2**
+   ```
+   String:   a b a b a b a b
+   Z values: X 0 6 0 4 0 2 0
+   ```
+3. **Z-box in action**  
+   ![Z-box diagram](https://ivanyu.me/wp-content/uploads/2014/09/zalg1.png)
+
+---
+
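+### **Comparison with Other Algorithms**
+
+| Algorithm            | Time (text length n, pattern length m) | Space  | Key idea                              |
+| -------------------- | -------------------------------------- | ------ | ------------------------------------- |
+| **Z algorithm**      | O(n+m)                                 | O(n+m) | Z array with Z-box interval reuse     |
+| **KMP**              | O(n+m)                                 | O(m)   | Partial match table (prefix function) |
+| **Boyer-Moore (BM)** | O(n/m) best case, O(nm) worst case     | O(m)   | Bad-character and good-suffix rules   |
+| **Brute force**      | O(nm)                                  | O(1)   | Two nested loops                      |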
+
+---
+
+### **Applications**
+
+1. **Pattern matching**: efficiently locate every occurrence of a pattern in a text.
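+2. **Palindrome problems**: for example, the longest palindromic prefix of `S` can be read off the Z array of `S$reverse(S)`; the general longest palindromic substring is usually solved with Manacher's algorithm, which relies on a similar interval-reuse idea.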
+3. **Repeated substring detection**: use the Z array to quickly identify periodic patterns.
+
+---
+
+### **References**
+
+1. [The Z algorithm explained - Zhihu column](https://zhuanlan.zhihu.com/p/403256247)
+2. [Z algorithm visual tutorial - Bilibili](https://www.bilibili.com/video/BV1qK4y1h7qk)
+3. [Applications of the Z algorithm in string processing - cnblogs](https://www.cnblogs.com/zyb993963526/p/10600775.html)
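The added Z-algorithm README describes the Z-box update rules in prose only. The steps can be sketched roughly as follows, assuming an inclusive interval `[left, right]` and `Z[0] = n`; the names `z_array` and `find_occurrences` and the default separator are illustrative choices, not taken from the commit.

```python
def z_array(s: str) -> list[int]:
    """Z[i] = length of the longest prefix of s that also starts at index i."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n  # by convention; the README marks this entry as X
    left = right = 0  # current Z-box is s[left..right], inclusive
    for i in range(1, n):
        if i <= right and z[i - left] < right - i + 1:
            # The match copied from the prefix ends strictly inside the Z-box.
            z[i] = z[i - left]
        else:
            # Keep what the Z-box already guarantees, then compare from right + 1
            # (or from i itself when i lies outside the box).
            k = max(0, right - i + 1)
            while i + k < n and s[k] == s[i + k]:
                k += 1
            z[i] = k
            left, right = i, i + k - 1
    return z


def find_occurrences(word: str, text: str, sep: str = "$") -> list[int]:
    """Start indices of word in text; assumes sep occurs in neither string."""
    if not word:
        return []
    w = len(word)
    z = z_array(word + sep + text)
    return [i - w - 1 for i in range(w + 1, len(z)) if z[i] == w]
```

For the README's example string, `z_array("aabcaabxaaaz")` yields the listed values with `Z[0]` set to the string length, and `find_occurrences("aab", "aabcaabxaaaz")` reports the matches at indices 0 and 4.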