Python 3 [Regular Expressions]: A Reference Manual of Classic Examples

李智 - 重庆

Published on 2025-01-25 15:24:09


License: CC 4.0 BY-SA


Article link: https://blog.csdn.net/weixin_47267103/article/details/145355942


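Summary

This article has two parts:

Basics crash course: a condensed review of the key points.
Classic examples: 15 classic examples to imitate and learn from.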

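I. Basics Crash Course

A regular expression (regex or regexp for short) is a powerful tool for matching and processing text. Python supports regular expressions through the re module. Regular expressions can be used to search, replace, split, and validate strings.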

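1. Basic Concepts

Pattern: the core of a regular expression; it defines the rule that the text must match.
Metacharacters: characters with a special meaning in a regular expression, such as ., *, +, ?, ^, $, \, |, {, }, [, ], (, ).
Ordinary characters: all other characters, such as letters and digits, which simply match themselves.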
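The difference between a metacharacter and an ordinary character is easiest to see side by side. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

text = "1.5 1x5 125"

# Unescaped, "." is a metacharacter and matches any character except a newline.
any_char = re.findall(r"1.5", text)
print(any_char)      # ['1.5', '1x5', '125']

# Escaped with a backslash, it matches only a literal dot.
literal_dot = re.findall(r"1\.5", text)
print(literal_dot)   # ['1.5']

# re.escape() escapes the metacharacters in a string, which helps when
# the text to match comes from user input.
escaped = re.escape("a+b")
print(re.search(escaped, "a+b=c").group())   # a+b
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the re module sees them.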

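2. Common Metacharacters

.: matches any single character except a newline (with the re.DOTALL flag it matches newlines as well).
^: matches the start of the string (with the re.MULTILINE flag, also the start of each line).
$: matches the end of the string or just before a trailing newline (with the re.MULTILINE flag, also the end of each line).
*: matches the preceding element (a character, set, or group) zero or more times.
+: matches the preceding element one or more times.
?: matches the preceding element zero or one time.
{n}: matches the preceding element exactly n times.
{n,}: matches the preceding element at least n times.
{n,m}: matches the preceding element at least n times and at most m times.
\: escape character; matches a metacharacter literally and introduces special sequences such as \d.
|: alternation; matches the expression on its left or the one on its right.
[]: character set; matches any one of the characters inside.
(): grouping; treats several characters as a single unit and captures the text they match.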
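A few of the metacharacters above in action. This is only an illustrative sketch with made-up sample strings, not an exhaustive tour.

```python
import re

# ^ and $ anchor the pattern to the start and end of the string.
all_digits = re.search(r"^\d+$", "2025") is not None
partly_digits = re.search(r"^\d+$", "2025-01") is not None
print(all_digits, partly_digits)   # True False

# {n,m} bounds the repetition count. Quantifiers are greedy, so the
# first four digits of "12345" are taken and the lone "5" cannot match.
runs = re.findall(r"\d{2,4}", "1 12 12345")
print(runs)      # ['12', '1234']

# [] matches any one character from the set.
spellings = re.findall(r"gr[ae]y", "gray grey groy")
print(spellings)  # ['gray', 'grey']

# | matches either alternative.
pets = re.findall(r"cat|dog", "cat dog bird")
print(pets)      # ['cat', 'dog']

# () makes the quantifier apply to the whole group "ab".
repeated = re.search(r"(ab)+", "ababab!").group()
print(repeated)  # ababab
```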

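3. Common Character Classes

\d: matches any decimal digit. For str patterns this includes Unicode digits; it is equivalent to [0-9] only with the re.ASCII flag or in bytes patterns.
\D: matches any non-digit character (the complement of \d).
\w: matches any word character: letters, digits, and the underscore. For str patterns this includes Unicode letters and digits; it is equivalent to [a-zA-Z0-9_] only with the re.ASCII flag or in bytes patterns.
\W: matches any non-word character (the complement of \w).
\s: matches any whitespace character, including spaces, tabs, and newlines.
\S: matches any non-whitespace character.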
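The character classes can be tried out on a short string. The sketch below also shows that, for str patterns in Python 3, \d accepts non-ASCII digits unless the re.ASCII flag is set; the sample text and the choice of U+0663 (ARABIC-INDIC DIGIT THREE) are just for illustration.

```python
import re

text = "Room 42,\tfloor_3\n"

digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
blanks = re.findall(r"\s", text)
print(digits)   # ['42', '3']
print(words)    # ['Room', '42', 'floor_3']
print(blanks)   # [' ', '\t', '\n']

# In a str pattern, \d also matches Unicode decimal digits such as U+0663.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None
# With re.ASCII, \d is restricted to [0-9].
ascii_only = re.fullmatch(r"\d", "\u0663", flags=re.ASCII) is not None
print(unicode_digit, ascii_only)   # True False
```

If input must be limited to the ASCII digits 0 to 9 (for example when validating an ID), writing [0-9] explicitly or passing re.ASCII avoids surprises.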

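4. Common `re` Module Functions

re.match(pattern, string): matches the pattern only at the start of the string; returns a match object on success, otherwise None.
re.search(pattern, string): scans the string for the first position where the pattern matches; returns a match object on success, otherwise None.
re.findall(pattern, string): returns a list of all non-overlapping matches in the string; if the pattern contains groups, the list holds the group contents instead of the whole matches.
re.finditer(pattern, string): returns an iterator that yields a match object for each non-overlapping match.
re.sub(pattern, repl, string): returns a new string in which every match of the pattern is replaced with repl.
re.split(pattern, string): splits the string at each match of the pattern and returns a list.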
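The functions listed above can be compared on a single input. This is a small sketch with a made-up sample string; it mainly highlights that re.match only looks at the start of the string and that re.finditer yields match objects rather than plain strings.

```python
import re

text = "cat bat rat"

# re.match only succeeds at position 0, so "bat" is not found.
at_start = re.match(r"bat", text)
print(at_start)           # None

# re.search scans the whole string.
anywhere = re.search(r"bat", text)
print(anywhere.group())   # bat

# re.finditer yields match objects, which carry positions as well as text.
spans = [m.span() for m in re.finditer(r"\w+", text)]
print(spans)              # [(0, 3), (4, 7), (8, 11)]

# re.sub returns a new string; the original is unchanged.
replaced = re.sub(r"[cbr]at", "hat", text)
print(replaced)           # hat hat hat

# re.split cuts the string at every match of the pattern.
parts = re.split(r"\s+", text)
print(parts)              # ['cat', 'bat', 'rat']
```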

5. Examples

5.1 Matching Numbers

import re

text = "The price is 123.45 dollars."
pattern = r'\d+\.\d+'
match = re.search(pattern, text)
if match:
    print("Found:", match.group())

5.2 Replacing Strings

import re

text = "Hello, world!"
pattern = r'world'
repl = 'Python'
new_text = re.sub(pattern, repl, text)
print(new_text)  # Output: Hello, Python!

5.3 Splitting Strings

import re

text = "apple,banana,cherry"
pattern = r','
result = re.split(pattern, text)
print(result)  # Output: ['apple', 'banana', 'cherry']

5.4 Finding All Matches

import re

text = "The rain in Spain falls mainly in the plain."
pattern = r'\bin\b'  # \b is a word boundary, so only the standalone word "in" matches
matches = re.findall(pattern, text)
print(matches)  # Output: ['in', 'in']

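6. Groups and Capturing

Groups are defined with () and capture the substring they match. When the pattern contains several groups, re.findall returns a list of tuples with one item per group.

import re

text = "John Doe, Jane Doe"
pattern = r'(\w+) (\w+)'
matches = re.findall(pattern, text)
for first_name, last_name in matches:
    print(f"First: {first_name}, Last: {last_name}")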
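Captured groups can also be read from a single match object, by index or by name. Named groups, written (?P<name>...), go slightly beyond what this section covers; the sketch below uses a made-up date string to show how they are commonly used.

```python
import re

date_pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(date_pattern, "Published on 2025-01-25")

whole = match.group(0)            # the entire match
year = match.group("year")        # access by group name
month = match.group(2)            # access by group index
fields = match.groupdict()        # all named groups as a dict
print(whole)    # 2025-01-25
print(year)     # 2025
print(month)    # 01
print(fields)   # {'year': '2025', 'month': '01', 'day': '25'}

# Groups can be referenced in the replacement string of re.sub.
reordered = re.sub(date_pattern, r"\g<day>/\g<month>/\g<year>", "2025-01-25")
print(reordered)  # 25/01/2025
```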

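7. Non-Greedy Matching

By default, quantifiers such as * and + are greedy and match as many characters as possible. Appending ? to a quantifier makes it non-greedy, so it matches as few characters as possible.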

import re

text = "<html><head><title>Title</title></head></html>"
pattern = r'<.*?>'
matches = re.findall(pattern, text)
print(matches)  # Output: ['<html>', '<head>', '<title>', '</title>', '</head>', '</html>']
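To see what the ? changes, it helps to run the greedy and non-greedy versions against the same input. A minimal sketch with a made-up string; the negated character set at the end is a common alternative, not something this article prescribes.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the end and backtracks to the last ">".
greedy = re.findall(r"<.*>", text)
print(greedy)    # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first ">" it can.
lazy = re.findall(r"<.*?>", text)
print(lazy)      # ['<b>', '</b>', '<i>', '</i>']

# A negated set gives the same result here without relying on laziness.
negated = re.findall(r"<[^>]*>", text)
print(negated)   # ['<b>', '</b>', '<i>', '</i>']
```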

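8. Compiling Regular Expressions

If the same regular expression is used many times, you can compile it into a pattern object with re.compile and reuse it. The re module already caches recently used patterns, so the speed gain is usually modest; the larger benefit is cleaner, reusable code.

import re

pattern = re.compile(r'\d+')
text = "There are 3 apples and 5 oranges."
matches = pattern.findall(text)
print(matches)  # Output: ['3', '5']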
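A compiled pattern object offers the same operations as the module-level functions (findall, search, sub, and so on) as methods, which makes it convenient to define once and reuse. The sketch below assumes a few made-up input lines; the names NUMBER and WORD are illustrative.

```python
import re

# Compile once, typically at module level, then reuse.
NUMBER = re.compile(r"\d+")

lines = ["There are 3 apples and 5 oranges.", "No digits here.", "Room 101"]

per_line = [NUMBER.findall(line) for line in lines]
print(per_line)   # [['3', '5'], [], ['101']]

first = NUMBER.search(lines[0]).group()
print(first)      # 3

masked = NUMBER.sub("#", lines[2])
print(masked)     # Room #

# Flags are passed at compile time and apply to every later call.
WORD = re.compile(r"python", re.IGNORECASE)
hits = WORD.findall("Python PYTHON python")
print(hits)       # ['Python', 'PYTHON', 'python']
```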
