From c46627f9dab370789d98d6204f794ed7888a5661 Mon Sep 17 00:00:00 2001
From: letianpailove <1326907803@qq.com>
Date: Tue, 4 Mar 2025 17:35:56 +0800
Subject: [PATCH] Add z-algorithm and regular-expression-matching Chinese translation for README

---
 .../regular-expression-matching/README.md     |  11 +-
 .../README.zh-CN.md                           | 153 ++++++++++++++++++
 src/algorithms/string/z-algorithm/README.md   |  25 +--
 .../string/z-algorithm/README.zh-CN.md        |  85 ++++++++++
 4 files changed, 260 insertions(+), 14 deletions(-)
 create mode 100644 src/algorithms/string/regular-expression-matching/README.zh-CN.md
 create mode 100644 src/algorithms/string/z-algorithm/README.zh-CN.md

diff --git a/src/algorithms/string/regular-expression-matching/README.md b/src/algorithms/string/regular-expression-matching/README.md
index e04c071e40..7e648e4ad1 100644
--- a/src/algorithms/string/regular-expression-matching/README.md
+++ b/src/algorithms/string/regular-expression-matching/README.md
@@ -1,6 +1,9 @@
 # Regular Expression Matching
 
-Given an input string `s` and a pattern `p`, implement regular 
+_Read this in other languages:_
+[_简体中文_](README.zh-CN.md)
+
+Given an input string `s` and a pattern `p`, implement regular
 expression matching with support for `.` and `*`.
 
 - `.` Matches any single character.
@@ -18,6 +21,7 @@ The matching should cover the **entire** input string (not partial).
 **Example #1**
 
 Input:
+
 ```
 s = 'aa'
 p = 'a'
@@ -30,6 +34,7 @@ Explanation: `a` does not match the entire string `aa`.
 **Example #2**
 
 Input:
+
 ```
 s = 'aa'
 p = 'a*'
@@ -37,7 +42,7 @@ p = 'a*'
 
 Output: `true`
 
-Explanation: `*` means zero or more of the preceding element, `a`. 
+Explanation: `*` means zero or more of the preceding element, `a`.
 Therefore, by repeating `a` once, it becomes `aa`.
 
 **Example #3**
@@ -64,7 +69,7 @@ p = 'c*a*b'
 
 Output: `true`
 
-Explanation: `c` can be repeated 0 times, `a` can be repeated 
+Explanation: `c` can be repeated 0 times, `a` can be repeated
 1 time. Therefore it matches `aab`.
 
 ## References
diff --git a/src/algorithms/string/regular-expression-matching/README.zh-CN.md b/src/algorithms/string/regular-expression-matching/README.zh-CN.md
new file mode 100644
index 0000000000..f6b6bd22da
--- /dev/null
+++ b/src/algorithms/string/regular-expression-matching/README.zh-CN.md
@@ -0,0 +1,153 @@
+# Regular Expression Matching
+
+## Problem Description
+
+Given an input string `s` and a pattern `p`, implement regular expression matching with support for `.` and `*`:
+- `.` matches any single character
+- `*` matches zero or more of the preceding element
+
+The match must cover the entire input string (not a partial match).
+
+**Constraints**
+- `s` may be empty and contains only lowercase letters `a-z`
+- `p` may be empty and contains only lowercase letters `a-z`, `.`, or `*`
+
+## Examples
+
+<details>
+<summary>Show examples</summary>
+
+**Example 1**
+Input: `s = "aa"`, `p = "a"`
+Output: `false`
+Explanation: the pattern matches only a single `a`, so it cannot cover the entire string `aa`
+
+**Example 2**
+Input: `s = "aa"`, `p = "a*"`
+Output: `true`
+Explanation: `*` means zero or more of the preceding element `a`; repeating `a` once gives `aa`
+
+**Example 3**
+Input: `s = "ab"`, `p = ".*"`
+Output: `true`
+Explanation: `.*` means zero or more of any character, so it matches any two characters
+
+**Example 4**
+Input: `s = "aab"`, `p = "c*a*b"`
+Output: `true`
+Explanation: `c*` matches the empty string, `a*` matches `a` twice, and `b` matches `b`
+
+</details>
+
+---
+
+## Core Solutions
+
+### 1. Recursion (Backtracking)
+
+**Key idea**: peel off the front of the pattern and branch on the possible meanings of `*`:
+- **Case 1**: `*` matches the preceding character zero times, so skip the `x*` pair in the pattern
+- **Case 2**: `*` matches one or more times, so consume one character of the input and recurse
+
+```python
+class Solution:
+    def isMatch(self, s: str, p: str) -> bool:
+        if not p:
+            return not s  # an empty pattern matches only an empty string
+
+        first_match = bool(s) and p[0] in {s[0], '.'}
+
+        if len(p) >= 2 and p[1] == '*':
+            return self.isMatch(s, p[2:]) or (first_match and self.isMatch(s[1:], p))
+        else:
+            return first_match and self.isMatch(s[1:], p[1:])
+```
+
+**Complexity**: worst-case time $O(2^{m+n})$ (where m and n are the lengths of `s` and `p`), space $O(m^2 + n^2)$
+
+---
+
+### 2. Memoized Search
+
+**Optimization**: cache intermediate results to avoid recomputing the same subproblems:
+
+```python
+from functools import lru_cache
+
+class Solution:
+    @lru_cache(maxsize=None)
+    def isMatch(self, s: str, p: str) -> bool:
+        if not p: return not s
+        first_match = bool(s) and p[0] in {s[0], '.'}
+
+        if len(p) >= 2 and p[1] == '*':
+            return self.isMatch(s, p[2:]) or (first_match and self.isMatch(s[1:], p))
+        else:
+            return first_match and self.isMatch(s[1:], p[1:])
+```
+
+**Complexity**: time improves to $O(mn)$, space $O(mn)$
+
+---
+
+### 3. Dynamic Programming (DP)
+
+**State definition**: `dp[i][j]` is true when `s[0..i)` matches `p[0..j)`
+**State transition**:
+
+```math
+dp[i][j] =
+\begin{cases}
+dp[i-1][j-1] \text{ and the current characters match} & \text{if } p[j-1] \neq '*' \\
+dp[i][j-2] \text{ (zero occurrences)} \quad \lor \quad (dp[i-1][j] \text{ and the current character matches}) & \text{if } p[j-1] = '*'
+\end{cases}
+```
+
+**Implementation**:
+
+```java
+public boolean isMatch(String s, String p) {
+    int m = s.length(), n = p.length();
+    boolean[][] dp = new boolean[m+1][n+1];
+    dp[0][0] = true;
+
+    // Initialize the first row: the empty string against each pattern prefix
+    for(int j=1; j<=n; j++){
+        if(p.charAt(j-1) == '*' && j>=2)
+            dp[0][j] = dp[0][j-2];
+    }
+
+    for(int i=1; i<=m; i++){
+        for(int j=1; j<=n; j++){
+            char sc = s.charAt(i-1), pc = p.charAt(j-1);
+            if(pc != '*'){
+                dp[i][j] = (pc == '.' || pc == sc) && dp[i-1][j-1];
+            } else {
+                dp[i][j] = dp[i][j-2] ||
+                    ((p.charAt(j-2) == '.' || p.charAt(j-2) == sc) && dp[i-1][j]);
+            }
+        }
+    }
+    return dp[m][n];
+}
+```
+
+**Complexity**: time $O(mn)$, space $O(mn)$ (can be reduced to $O(n)$)
+
+---
+
+## Key Difficulties
+
+1. **Semantics of `*`**: both the zero-occurrence and the multiple-occurrence cases must be considered
+   - Example: the pattern `a*` matches the empty string, `a`, `aa`, and so on
+2. **Wildcard behavior of `.`**: the recursion/DP needs a special case for the match-anything logic
+3. **Boundary conditions**: how empty strings and patterns match (for example, `s=""` and `p="a*"` should return `true`)
+
+---
+
+## References
+
+1. [LeetCode regular expression matching algorithm walkthrough (CSDN)](https://blog.csdn.net/xx_123_1_rj/article/details/130455123)
+2. [Dynamic programming solution explained (CSDN)](https://blog.csdn.net/qq_40280096/article/details/100177992)
+3. [Recursion and memoized search in practice (LeetCode China, official)](https://leetcode.cn/problems/regular-expression-matching/solution/)
+4. [Regular expression matching (original English problem)](https://leetcode.com/problems/regular-expression-matching/)
diff --git a/src/algorithms/string/z-algorithm/README.md b/src/algorithms/string/z-algorithm/README.md
index 0f12bc7333..cf2baf75ad 100644
--- a/src/algorithms/string/z-algorithm/README.md
+++ b/src/algorithms/string/z-algorithm/README.md
@@ -1,32 +1,35 @@
 # Z Algorithm
 
-The Z-algorithm finds occurrences of a "word" `W` 
+_Read this in other languages:_
+[_简体中文_](README.zh-CN.md)
+
+The Z-algorithm finds occurrences of a "word" `W`
 within a main "text string" `T` in linear time `O(|W| + |T|)`.
 
-Given a string `S` of length `n`, the algorithm produces 
-an array, `Z` where `Z[i]` represents the longest substring 
+Given a string `S` of length `n`, the algorithm produces
+an array, `Z` where `Z[i]` represents the longest substring
 starting from `S[i]` which is also a prefix of `S`. Finding
-`Z` for the string obtained by concatenating the word, `W` 
+`Z` for the string obtained by concatenating the word, `W`
 with a nonce character, say `$` followed by the text, `T`,
 helps with pattern matching, for if there is some index `i`
 such that `Z[i]` equals the pattern length, then the pattern
 must be present at that point.
 
 While the `Z` array can be computed with two nested loops in `O(|W| * |T|)` time, the
-following strategy shows how to obtain it in linear time, based 
-on the idea that as we iterate over the letters in the string 
+following strategy shows how to obtain it in linear time, based
+on the idea that as we iterate over the letters in the string
 (index `i` from `1` to `n - 1`), we maintain an interval `[L, R]`
-which is the interval with maximum `R` such that `1 ≤ L ≤ i ≤ R` 
-and `S[L...R]` is a prefix that is also a substring (if no such 
-interval exists, just let `L = R = - 1`). For `i = 1`, we can 
+which is the interval with maximum `R` such that `1 ≤ L ≤ i ≤ R`
+and `S[L...R]` is a prefix that is also a substring (if no such
+interval exists, just let `L = R = - 1`). For `i = 1`, we can
 simply compute `L` and `R` by comparing `S[0...]` to `S[1...]`.
 
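 **Example of Z array**
 
 ```
-Index            0   1   2   3   4   5   6   7   8   9  10  11 
+Index            0   1   2   3   4   5   6   7   8   9  10  11
 Text             a   a   b   c   a   a   b   x   a   a   a   z
-Z values         X   1   0   0   3   1   0   0   2   2   1   0 
+Z values         X   1   0   0   3   1   0   0   2   2   1   0
 ```
 
 Other examples
diff --git a/src/algorithms/string/z-algorithm/README.zh-CN.md b/src/algorithms/string/z-algorithm/README.zh-CN.md
new file mode 100644
index 0000000000..d15ee0c7fa
--- /dev/null
+++ b/src/algorithms/string/z-algorithm/README.zh-CN.md
@@ -0,0 +1,85 @@
+# Z Algorithm
+
+The Z algorithm is a linear-time string pattern-matching algorithm that locates all occurrences of a pattern `W` in a text `T`. It is built on a precomputed Z array, which records, for each position of a string, the length of the longest common prefix between the suffix starting there and the whole string.
+
+---
+
+### **Core Concepts**
+
+1. **Z array definition**
+   Given a string `S` of length `n`, `Z[i]` is the length of the longest substring starting at position `i` that is also a prefix of `S`. For example:
+
+   ```
+   String S: a a b c a a b x a a a z
+   Z values: X 1 0 0 3 1 0 0 2 2 1 0
+   ```
+
+2. **Z-box (matching interval)**
+   The algorithm maintains an interval `[L, R]` with the largest right boundary `R` found so far such that the substring `S[L...R]` matches a prefix of `S`. Reusing this interval avoids repeated comparisons.
+
+3. **Application to pattern matching**
+   Concatenate the pattern `W` and the text `T` into `W$T` (`$` is a separator that occurs in neither string) and compute its Z array. If `Z[i] = |W|` at some position `i`, then `W` occurs in `T` starting at offset `i - |W| - 1`.
+
+---
+
+### **Computing the Z Array**
+
+1. **Initialization**
+   Set `L = R = 0`. `Z[0]` is not meaningful (it is usually set to `n` or ignored).
+2. **Scan the string**
+   For each position `i` (starting from `1`):
+   - **Case 1** (`i > R`): extend by directly comparing `S[0...]` with `S[i...]`, then update `L` and `R`.
+   - **Case 2** (`i ≤ R`): reuse the information in the current Z-box:
+     - If `Z[i-L] < R-i+1`, then `Z[i] = Z[i-L]`.
+     - Otherwise, continue comparing directly from `R+1`, then update `L` and `R`.
+
+---
+
+### **Complexity Analysis**
+
+- **Time complexity**: `O(|W| + |T|)`. Each position causes at most one failed comparison, and every successful comparison advances `R`, which never decreases.
+- **Space complexity**: `O(|W| + |T|)`, for the Z array of the concatenated string `W$T`.
+
+---
+
+### **Examples and Visualization**
+
+1. **Example 1**
+   ```
+   String:   a a a a a a
+   Z values: X 5 4 3 2 1
+   ```
+2. **Example 2**
+   ```
+   String:   a b a b a b a b
+   Z values: X 0 6 0 4 0 2 0
+   ```
+3. **Z-box in motion**
+   ![Illustration of the Z-box moving along the string](https://example.com/z-box-animation.gif)
+
+---
+
+### **Comparison with Other Algorithms**
+
+| Algorithm       | Time complexity (n = text length, m = pattern length) | Space complexity | Core idea                               |
+| --------------- | ----------------------------------------------------- | ---------------- | --------------------------------------- |
+| **Z algorithm** | O(n+m)                                                | O(n+m)           | Z array with a sliding `[L, R]` interval |
+| **KMP**         | O(n+m)                                                | O(m)             | Partial match table (prefix function)   |
+| **BM**          | O(n/m) best case, O(nm) worst case                    | O(m)             | Bad-character and good-suffix rules     |
+| **Brute force** | O(nm)                                                 | O(1)             | Two nested loops                        |
+
+---
+
+### **Applications**
+
+1. **Pattern matching**: locate every occurrence of a pattern in a text.
+2. **Longest palindromic substring**: Manacher's algorithm relies on a closely related interval-reuse idea.
+3. **Repeated substring detection**: use the Z array to identify periodic patterns quickly.
+
+---
+
+### **References**
+
+1. [Z algorithm explained - Zhihu column](https://zhuanlan.zhihu.com/p/403256247)
+2. [Z algorithm visual tutorial - Bilibili](https://www.bilibili.com/video/BV1qK4y1h7qk)
+3. [Applications of the Z algorithm in string processing - cnblogs](https://www.cnblogs.com/zyb993963526/p/10600775.html)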
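
The patch describes the Z-array computation and the `W$T` matching trick only in prose, so here is a minimal Python sketch of both, offered as an illustration rather than as the repository's implementation. The names `z_array` and `z_search` and the `sep` parameter are hypothetical, the Z-box is tracked as a half-open interval `[left, right)` instead of the inclusive `[L, R]` used in the README, and `z[0]` is set to `0` where the README tables show `X`.

```python
def z_array(s: str) -> list[int]:
    """Return the Z array of s; z[0] is left as 0 (shown as X in the README tables)."""
    n = len(s)
    z = [0] * n
    left = right = 0  # current Z-box is s[left:right], right exclusive
    for i in range(1, n):
        if i < right:
            # Inside the Z-box: reuse the value from the mirrored prefix position.
            z[i] = min(right - i, z[i - left])
        # Extend by direct comparison past what is already known.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        # Keep the Z-box with the largest right boundary.
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def z_search(word: str, text: str, sep: str = "$") -> list[int]:
    """Return start indices of word in text; assumes sep occurs in neither string."""
    if not word:
        return []
    z = z_array(word + sep + text)
    offset = len(word) + 1  # index in the combined string where text begins
    return [i - offset for i in range(offset, len(z)) if z[i] == len(word)]
```

For instance, `z_array("aabcaabxaaaz")[1:]` reproduces the Z values from the README example, and `z_search("aab", "aabcaabxaaaz")` reports the two occurrences at offsets `0` and `4`.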