Modular Programming Concepts (Modules, Packages, Imports), Common System Modules, and Third-Party Module Management

Article link: https://blog.csdn.net/lgcloveself1/article/details/149489483

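I. Modular Programming Concepts

1.1 Module Basics

Definition: a module is a Python file (a .py file) used to organize code, avoid duplication, and improve reuse.
Purpose: to keep related functionality in its own file so that it is easier to maintain and reuse.
Usage: bring the module's contents into scope with the import statement.

Example: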

# math_utils.py (the module file)
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

# In another file in the same directory: import and use the module
import math_utils

result = math_utils.add(5, 3)
print(result)  # Output: 8

1.2 Ways to Import a Module

There are three main import forms:

Import the whole module
Import specific attributes
Import all attributes

Example:

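# 1. Import the whole module (here under the alias mu)
import math_utils as mu
print(mu.subtract(10, 4))  # Output: 6

# 2. Import a specific function
from math_utils import add
print(add(3, 5))  # Output: 8

# 3. Import everything (not recommended: it can silently overwrite existing names)
from math_utils import *
print(subtract(8, 2))  # Output: 6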

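1.3 Module Attributes

Two important built-in attributes:

__file__: the path of the module file
__name__: the module name (__main__ when the file is run as the main program)

Example: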

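# Add these lines to a module
print(f"Module path: {__file__}")
print(f"Module name: {__name__}")

# Result:
# Run directly as the main program: __name__ == '__main__'
# Imported from another file: __name__ == the module's name (for example 'math_utils')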
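
The most common use of __name__ is a guard that keeps demo or script code from running on import. A minimal sketch, assuming a hypothetical file named greeting.py (the file and function names are made up for illustration):

```python
# greeting.py (hypothetical module, for illustration only)
def greet(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"


def main():
    # Demo code that should run only when the file is executed directly.
    print(greet("world"))


if __name__ == "__main__":
    # True for `python greeting.py`, false for `import greeting`.
    main()
```

With this guard, another file can import greet() without triggering the demo output.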

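1.4 Module Categories

Python modules fall into three categories:

System modules: part of the standard library that ships with Python (for example math, os)
Third-party modules: installed with pip (for example numpy)
Custom modules: .py files written by the user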
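
One rough way to see which category a module belongs to is to ask the import system where it comes from. The sketch below is a heuristic, not an official classification: the helper name describe_module is made up, and treating a "site-packages" or "dist-packages" path as "third-party" is an assumption about a typical installation layout.

```python
import importlib.util
import sys


def describe_module(name):
    """Return a rough description of where the top-level module `name` comes from."""
    if name in sys.builtin_module_names:
        return "built-in (compiled into the interpreter)"
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    # Assumption: installed packages live under site-packages or dist-packages.
    if "site-packages" in origin or "dist-packages" in origin:
        return "third-party (installed package)"
    return f"file-based module at {origin}"


for module_name in ("sys", "json", "numpy"):
    print(module_name, "->", describe_module(module_name))
```

Standard-library modules written in Python and your own .py files both show up as file-based; the path tells them apart.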

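II. System Modules in Detail

2.1 Common System Modules

Module	Purpose	Common functions
`random`	Random number generation	randint(), choice(), shuffle()
`time`	Time operations	time(), sleep(), ctime()
`os`	Operating system interface	listdir(), mkdir(), remove()
`datetime`	Date and time handling	datetime.now(), timedelta()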

2.2 Usage Examples

The random module:

import random

# Random integer from 1 to 10 (both ends included)
print(random.randint(1, 10))

# Pick a random element from a sequence
fruits = ['apple', 'banana', 'cherry']
print(random.choice(fruits))

# Shuffle a list in place
cards = list(range(1, 11))
random.shuffle(cards)
print(cards)  # e.g. [7, 2, 9, ...]

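The os module:

import os

# Get the current working directory
print(os.getcwd())

# List the contents of the current directory
print(os.listdir())

# Create a new directory (raises FileExistsError if it already exists)
os.mkdir("new_folder")

# Create an empty file, then delete it
open("temp.txt", "w").close()
os.remove("temp.txt")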
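
The table in 2.1 also lists time and datetime, which the examples above do not cover. A short sketch of the listed functions follows; the dates are arbitrary values chosen so the arithmetic is reproducible.

```python
import time
from datetime import datetime, timedelta

# time.time() returns the seconds since the epoch as a float.
start = time.time()
time.sleep(0.1)  # pause for about 0.1 seconds
elapsed = time.time() - start
print(f"Slept for about {elapsed:.2f} s")

# time.ctime() formats a timestamp as a readable string.
print(time.ctime(start))

# datetime arithmetic with timedelta, using a fixed date.
release = datetime(2023, 5, 1, 10, 30)
deadline = release + timedelta(days=7, hours=2)
print(deadline)                   # 2023-05-08 12:30:00
print((deadline - release).days)  # 7

# datetime.now() gives the current local date and time.
print(datetime.now().strftime("%Y-%m-%d %H:%M"))
```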

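III. Managing Third-Party Modules

3.1 Installation and Use

Third-party modules are installed and managed with pip:

# Install modules
pip install numpy pandas requests

# List installed modules
pip list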
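
From inside Python, the standard library's importlib.metadata (Python 3.8 and later) can report the installed version of a package, which is handy for checking an environment before importing. A minimal sketch; the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("numpy", "pandas", "requests"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```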

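3.2 Common Third-Party Modules

Module	Purpose	Typical use
`numpy`	Numerical computing	Scientific computing, matrix operations
`pandas`	Data analysis	Data processing, CSV handling
`requests`	HTTP requests	Network requests, API calls
`matplotlib`	Data visualization	Plotting charts

Example: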

import numpy as np
import pandas as pd
import requests

# NumPy array operations
arr = np.array([1, 2, 3])
print(arr * 2)  # [2 4 6]

# Pandas data handling
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)

# HTTP request with Requests (the timeout avoids waiting forever)
response = requests.get("https://api.github.com", timeout=10)
print(response.status_code)  # 200 if the request succeeds

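IV. Package Management

4.1 Package Basics

Definition: a package is a directory that contains an __init__.py file and is used to organize multiple modules. (Since Python 3.3 a directory without __init__.py can also be imported as a namespace package, but a regular package has this file.)
Structure: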

my_package/
    __init__.py
    module1.py
    module2.py
    subpackage/
        __init__.py
        module3.py

4.2 Importing Packages and Subpackages

# Import a module from a package
import my_package.module1

# Import a module from a subpackage
from my_package.subpackage import module3

# Import a specific function
from my_package.module1 import my_function

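4.3 What __init__.py Does

Marks the directory as a package
Runs initialization code when the package is first imported
Defines __all__ to control what `from my_package import *` imports
Provides a package-level namespace

Example: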

# my_package/__init__.py
__all__ = ['module1']  # names imported by `from my_package import *`

print("Package initialized")

# Package-level variable
version = "1.0"
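
To watch these roles in action without touching a real project, the sketch below builds a throwaway package in a temporary directory and imports it. The package name demo_pkg and its contents are hypothetical; exec() is used only so the star-import lands in a scratch dictionary that can be inspected.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (all names here are hypothetical).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text(
    "__all__ = ['module1']\n"
    "version = '1.0'\n"
)
(pkg / "module1.py").write_text("def hello():\n    return 'hello from module1'\n")
(pkg / "module2.py").write_text("def hidden():\n    return 'not exported by *'\n")

sys.path.insert(0, str(root))   # make the temporary directory importable
importlib.invalidate_caches()   # let the import system notice the new files

import demo_pkg
print(demo_pkg.version)         # 1.0 (package-level variable)

namespace = {}
exec("from demo_pkg import *", namespace)   # star-import into a scratch namespace
print("module1" in namespace)   # True: listed in __all__
print("module2" in namespace)   # False: not listed in __all__
print(namespace["module1"].hello())  # hello from module1
```

Note that version is reachable as demo_pkg.version but is not pulled in by the star-import, because it is not listed in __all__.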

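V. How Python Finds Modules

5.1 Search Order

The sys.modules cache (a module that is already imported is reused)
Built-in modules (sys.builtin_module_names)
The directory of the script being run (or the current working directory in an interactive session)
The directories listed in the PYTHONPATH environment variable
The standard library directories
site-packages (third-party libraries)

The last four are the entries of sys.path, searched in order. __pycache__ is not a search location: it only stores the compiled bytecode of modules that were found as source files.

View the search path: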

import sys
print(sys.path)

5.2 Adding a Custom Path

import sys
sys.path.insert(0, "/custom/module/path")
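
To check which source a given module actually resolves to, importlib.util.find_spec can be queried. A small sketch using only standard-library modules; the nonexistent module name is made up for the demonstration.

```python
import importlib.util
import sys

# A module that is already imported is served from the sys.modules cache.
import json
print("json" in sys.modules)            # True

# find_spec reports where a module comes from.
spec = importlib.util.find_spec("json")
print(spec.origin)                      # path to json/__init__.py in the standard library

# Built-in modules have no file; their origin is the string "built-in".
print(importlib.util.find_spec("sys").origin)   # built-in

# A name that cannot be found yields None instead of raising.
print(importlib.util.find_spec("no_such_module_for_demo"))  # None
```

This is a useful first check when a custom entry added to sys.path seems to be ignored or a different copy of a module is being picked up.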

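VI. Core Regular Expression Syntax

6.1 Character Matching

Pattern	Meaning	Example
.	Any character except a newline	`a.c` → "abc", "a c"
\d	A digit	`\d\d` → "42"
\w	A letter, digit, or underscore	`\w+` → "hello_123"
[abc]	Any one of the listed characters	`[aeiou]` → "e" in "hello"
[^a]	Any character not listed	`[^0-9]` → "a" in "a1"

Example: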

import re

text = "Contact: phone 12345, email abc@example.com"
numbers = re.findall(r'\d+', text)
print(numbers)  # ['12345']

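6.2 Quantifiers

Quantifier	Meaning	Example
*	Zero or more times	`a*b` → "b", "aaab"
+	One or more times	`a+b` → "ab", "aaab"
?	Zero or one time	`a?b` → "b", "ab"
{n}	Exactly n times	`a{3}` → "aaa"
{n,}	At least n times	`a{2,}` → "aaa"
{n,m}	Between n and m times	`a{2,4}` → "aaa"

Example: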

text = "Error: 404, Warning: 30303, Info: 123"
codes = re.findall(r'\d{3}', text)
print(codes)  # ['404', '303', '123'] ('303' is only the first three digits of 30303)
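
Two details of quantifiers are easy to miss: a pattern such as \d{3} can match inside a longer run of digits, and quantifiers are greedy unless followed by ?. The sketch below illustrates both with made-up sample strings.

```python
import re

text = "Error: 404, Warning: 30303, Info: 123"

# Without boundaries, \d{3} also matches inside the longer number 30303.
print(re.findall(r'\d{3}', text))        # ['404', '303', '123']

# With \b on both sides, only standalone three-digit numbers match.
print(re.findall(r'\b\d{3}\b', text))    # ['404', '123']

# Quantifiers are greedy by default; adding ? makes them match as little as possible.
tags = "<b>bold</b> and <i>italic</i>"
print(re.findall(r'<.+>', tags))         # ['<b>bold</b> and <i>italic</i>']
print(re.findall(r'<.+?>', tags))        # ['<b>', '</b>', '<i>', '</i>']
```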

6.3 Groups and Capturing

Use () to create a capturing group:

text = "Name: Zhang, Age: 30, Name: Li, Age: 25"
matches = re.findall(r'Name: (\w+), Age: (\d+)', text)
print(matches)  # [('Zhang', '30'), ('Li', '25')]
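
When a pattern has several groups, naming them with (?P<name>...) often reads better than relying on tuple positions. A minimal sketch with a made-up sample string; the group names name and age are chosen for this example.

```python
import re

text = "Name: Zhang, Age: 30, Name: Li, Age: 25"
pattern = re.compile(r'Name: (?P<name>\w+), Age: (?P<age>\d+)')

people = []
for m in pattern.finditer(text):
    # groupdict() maps each group name to the text it captured.
    person = m.groupdict()
    person["age"] = int(person["age"])
    people.append(person)

print(people)
# [{'name': 'Zhang', 'age': 30}, {'name': 'Li', 'age': 25}]
```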

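6.4 Boundary Matching

Anchor	Meaning	Example
^	Start of the string	`^Hello` → matches at the start
$	End of the string	`end$` → matches at the end
\b	Word boundary	`\bcat\b` → "cat" in "a cat"

Example: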

text = "start message end"
match = re.search(r'^start.*end$', text)
print(match.group() if match else "no match")  # start message end
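
By default ^ and $ refer to the whole string. For multi-line input, the re.MULTILINE flag makes them apply to each line as well. A short sketch with made-up sample text:

```python
import re

log = "start one end\nmiddle line\nstart two end"

# By default ^ and $ anchor to the whole string, so nothing matches here
# (. does not cross the newlines).
print(re.findall(r'^start.*end$', log))                 # []

# With re.MULTILINE they also match at the start and end of each line.
print(re.findall(r'^start.*end$', log, re.MULTILINE))   # ['start one end', 'start two end']

# \b keeps "cat" from matching inside longer words.
print(re.findall(r'\bcat\b', "a cat in a category"))    # ['cat']
```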

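VII. Advanced Regular Expression Usage

7.1 Comparing the Common Functions

Function	Returns	What it does
findall()	list	All matching strings (tuples of groups if the pattern has several groups)
search()	Match object or None	The first match anywhere in the string
match()	Match object or None	A match only at the start of the string
finditer()	iterator	An iterator of Match objects
sub()	string	Replaces the matched text
split()	list	Splits the string at each match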
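
The differences in the table are easiest to see by running every function against the same input. The sample string below is made up for the comparison.

```python
import re

text = "order 66 shipped, order 7 pending"

print(re.match(r'\d+', text))                  # None: the string does not start with a digit
print(re.search(r'\d+', text).group())         # 66
print(re.findall(r'\d+', text))                # ['66', '7']
print([m.span() for m in re.finditer(r'\d+', text)])  # [(6, 8), (24, 25)]
print(re.sub(r'\d+', '#', text))               # order # shipped, order # pending
print(re.split(r',\s*', text))                 # ['order 66 shipped', 'order 7 pending']
```

Because match() and search() return None when nothing matches, check the result before calling .group() on real input.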

7.2 Matching Complex Patterns

Matching email addresses:

emails = "Contact: a@b.com, Support: support@company.org"
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
found = re.findall(pattern, emails)
print(found)  # ['a@b.com', 'support@company.org']

Extracting HTML content:

html = "<div>content<p>paragraph text</p></div>"
match = re.search(r'<p>(.*?)</p>', html)
print(match.group(1) if match else None)  # paragraph text

7.3 Wrapping Patterns in Utility Functions

def extract_phone_numbers(text):
    """Extract mainland China mobile phone numbers."""
    pattern = r'(?<!\d)(1[3-9]\d{9})(?!\d)'
    return re.findall(pattern, text)

text = "Contact: 13800138000, backup: 13912345678"
print(extract_phone_numbers(text))  # ['13800138000', '13912345678']

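VIII. Regular Expression Performance

Precompile patterns:

pattern = re.compile(r'\d{4}-\d{2}-\d{2}')  # compile once, use many times
dates = pattern.findall("Dates: 2023-01-01, 2023-02-15")

Use non-capturing groups (?:...):

# (?:...) groups without capturing, so it skips the bookkeeping of a capturing group
# and findall returns only the number captured by the remaining ( )
phones = re.findall(r'(?:\+86)?(1\d{10})', text)

Avoid backtracking traps:

# Bad: an unrestricted .* can cause excessive backtracking
# Good: restrict the range with something like [^>]*
html_pattern = re.compile(r'<div[^>]*>.*?</div>')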
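
Besides any speed difference, a non-capturing group changes what findall returns, which is often the more practical reason to use one. The sketch below shows that effect and the reuse of a compiled pattern; the sample strings and the name date_re are made up.

```python
import re

text = "Call +8613800138000 or 13912345678"

# With two capturing groups, findall returns a tuple per match.
print(re.findall(r'(\+86)?(1\d{10})', text))
# [('+86', '13800138000'), ('', '13912345678')]

# With (?:...) for the prefix, only the number itself is captured.
print(re.findall(r'(?:\+86)?(1\d{10})', text))
# ['13800138000', '13912345678']

# A compiled pattern can be reused across many inputs.
date_re = re.compile(r'\d{4}-\d{2}-\d{2}')
results = [date_re.findall(line)
           for line in ("2023-01-01 start", "no date here", "ends 2023-02-15")]
print(results)
# [['2023-01-01'], [], ['2023-02-15']]
```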

IX. Putting It All Together

9.1 Log Analysis

import re
from collections import Counter

log_data = """
[2023-05-01 10:23:45] INFO: User login
[2023-05-01 10:24:12] ERROR: Database connection failed
[2023-05-01 11:30:22] WARNING: High memory usage
[2023-05-01 11:45:03] ERROR: File not found
"""

# Extract the log levels
levels = re.findall(r'\] (\w+):', log_data)
level_count = Counter(levels)

# Extract the error messages
errors = re.findall(r'ERROR: (.+)', log_data)

print("Level counts:", level_count.most_common())
print("Error details:", errors)
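
The same log can be parsed into structured records by combining named groups, re.MULTILINE, and finditer. This is a sketch under the assumption that every line follows the `[date time] LEVEL: message` layout of the sample; the group names and the by_hour grouping are choices made for this example.

```python
import re
from collections import defaultdict

log_data = """
[2023-05-01 10:23:45] INFO: User login
[2023-05-01 10:24:12] ERROR: Database connection failed
[2023-05-01 11:30:22] WARNING: High memory usage
[2023-05-01 11:45:03] ERROR: File not found
"""

line_re = re.compile(
    r'^\[(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\] '
    r'(?P<level>\w+): (?P<message>.+)$',
    re.MULTILINE,
)

# Group the messages by the hour in which they were logged.
by_hour = defaultdict(list)
for m in line_re.finditer(log_data):
    by_hour[m["hour"]].append((m["level"], m["message"]))

for hour, entries in sorted(by_hour.items()):
    print(hour, entries)
# 10 [('INFO', 'User login'), ('ERROR', 'Database connection failed')]
# 11 [('WARNING', 'High memory usage'), ('ERROR', 'File not found')]
```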

9.2 Data-Cleaning Pipeline

import re

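def clean_text(text):
    # Remove control characters, but keep tab, newline, and carriage return
    # so the whitespace step below turns them into spaces instead of gluing words together
    text = re.sub(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]', '', text)
    
    # Mask the middle four digits of phone numbers
    text = re.sub(r'(?<!\d)(1\d{2})[ -]?(\d{4})[ -]?(\d{4})(?!\d)', r'\1****\3', text)
    
    # Collapse runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text

dirty_text = "Contact: 138 1234 5678 \t or 139-8765-4321 \x0B"
print(clean_text(dirty_text))
# Output: "Contact: 138****5678 or 139****4321"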