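Introduction
In today's data-driven world, data visualization has become an indispensable part of data analysis and scientific research. Python is one of the most popular languages for data science, and its visualization ecosystem gives data scientists and developers a wide choice of tools. Within that ecosystem, Matplotlib is one of the most fundamental and important libraries. It underpins many higher-level visualization libraries, such as seaborn and the pandas plotting API, and its flexibility and deep customizability have made it a cornerstone of data visualization in Python. This article explores Matplotlib from basic concepts to advanced techniques, using practical cases and complete code examples to help readers master the tool.
Chapter 1: Matplotlib Architecture and Core Concepts
Matplotlib's design was inspired by MATLAB's plotting system, but within the Python ecosystem it has developed its own characteristics and strengths. Understanding its core architecture is essential to mastering the library. Matplotlib uses a layered architecture with three main layers: the scripting layer (pyplot), the artist layer, and the backend layer. The scripting layer offers a simple MATLAB-like interface suited to quick plots and prototyping; the artist layer offers an object-oriented interface that gives full control over every element of a figure; the backend layer renders figures to different output targets, such as an on-screen window or a file. This layering lets Matplotlib serve both quick, simple plotting and complex custom visualizations.
In practice, the two interfaces we use most are pyplot and the object-oriented interface. pyplot is simple and direct, which suits quick plots and interactive exploration; the object-oriented interface gives finer-grained control, which suits complex layouts and custom components. Knowing the strengths of each helps us choose appropriately for a given task. Matplotlib's object model is also worth understanding: Figure (the whole canvas), Axes (a single plotting area with its own coordinate system), Axis (one axis of an Axes, with its ticks and labels), and Artist (the base class of everything drawn on a figure). These concepts form the framework of every Matplotlib figure.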
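As a rough illustration of the two interfaces and the object model described above, the following sketch draws the same curve both ways and then inspects the resulting objects. The data and titles are placeholders, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# pyplot (scripting layer): calls act on an implicit "current" figure and axes
plt.figure()
plt.plot(x, np.sin(x))
plt.title("pyplot interface")
implicit_ax = plt.gca()

# Object-oriented interface: the Figure and Axes are held explicitly
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))
ax.set_title("object-oriented interface")

# Object model: a Figure contains Axes; an Axes owns Axis objects and other Artists
print(type(fig).__name__)
print(type(ax.xaxis).__name__)
print(line in ax.get_children())
print(matplotlib.get_backend())

plt.close("all")
```

Holding `fig` and `ax` explicitly tends to pay off once a figure has more than one panel, because every call names the object it modifies.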
1.1 Basic Environment Setup and Chinese Text Display
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')
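# Configure Chinese text rendering and minus-sign display
plt.rcParams['font.sans-serif'] = ['SimHei']  # font used for Chinese labels (must be installed)
plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly
plt.rcParams['figure.dpi'] = 100              # figure resolution
plt.rcParams['figure.figsize'] = (12, 8)      # default figure size
# Create sample data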
np.random.seed(42)
x = np.linspace(0, 10, 100)
y1 = np.sin(x) + np.random.normal(0, 0.1, 100)
y2 = np.cos(x) + np.random.normal(0, 0.1, 100)
# Create the figure with two stacked subplots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))
# First subplot: noisy sine wave
ax1.plot(x, y1, 'b-', label='正弦波 + 噪声', linewidth=2, alpha=0.7)
ax1.plot(x, np.sin(x), 'r--', label='理论正弦波', linewidth=1.5)
ax1.fill_between(x, y1, np.sin(x), alpha=0.3, color='gray')
ax1.set_xlabel('时间 (秒)', fontsize=12)
ax1.set_ylabel('振幅', fontsize=12)
ax1.set_title('正弦波信号分析', fontsize=14, fontweight='bold')
ax1.legend(loc='upper right', fontsize=10)
ax1.grid(True, alpha=0.3, linestyle='--')
ax1.set_xlim([0, 10])
ax1.set_ylim([-1.5, 1.5])
# Second subplot: noisy cosine wave
ax2.plot(x, y2, 'g-', label='余弦波 + 噪声', linewidth=2, alpha=0.7)
ax2.plot(x, np.cos(x), 'm--', label='理论余弦波', linewidth=1.5)
ax2.fill_between(x, y2, np.cos(x), alpha=0.3, color='yellow')
ax2.set_xlabel('时间 (秒)', fontsize=12)
ax2.set_ylabel('振幅', fontsize=12)
ax2.set_title('余弦波信号分析', fontsize=14, fontweight='bold')
ax2.legend(loc='upper right', fontsize=10)
ax2.grid(True, alpha=0.3, linestyle='--')
ax2.set_xlim([0, 10])
ax2.set_ylim([-1.5, 1.5])
# Adjust subplot spacing
plt.tight_layout()
# Add a figure-level title
fig.suptitle('Matplotlib基础配置示例:信号处理可视化', fontsize=16, fontweight='bold', y=1.02)
# Save the figure
plt.savefig('matplotlib_basic_config.png', dpi=300, bbox_inches='tight')
plt.show()
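The configuration above assumes that the SimHei font is installed, which is often not the case outside Windows. A minimal sketch of a fallback, assuming a hypothetical preference list of CJK fonts (`cjk_candidates`) and a small helper (`pick_available_font`) that is not part of Matplotlib, could look like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import font_manager


def pick_available_font(candidates, installed):
    """Return the first candidate name present in `installed`, or None."""
    for name in candidates:
        if name in installed:
            return name
    return None


# Names of all fonts Matplotlib has discovered on this machine
installed_fonts = {font.name for font in font_manager.fontManager.ttflist}

# Hypothetical preference list; adjust it to the fonts you actually have
cjk_candidates = ["SimHei", "Microsoft YaHei", "PingFang SC",
                  "Noto Sans CJK SC", "WenQuanYi Micro Hei"]
chosen = pick_available_font(cjk_candidates, installed_fonts)

if chosen is not None:
    # Put the chosen font first and keep the existing fallbacks behind it
    plt.rcParams["font.sans-serif"] = [chosen] + list(plt.rcParams["font.sans-serif"])
    plt.rcParams["axes.unicode_minus"] = False
else:
    print("No CJK font from the candidate list was found; Chinese labels may render as boxes.")
```

Checking the installed fonts first makes a missing font visible at setup time rather than as empty boxes in the saved image.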
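Chapter 2: A Closer Look at Basic Chart Types
Matplotlib offers a wide range of chart types, each with its own use cases and best practices. Choosing the right chart is essential to communicating information effectively. Line charts suit trends in time-series data; bar charts suit comparisons of values across categories; scatter plots suit relationships between variables; histograms suit the distribution of a single variable; pie charts suit each part's share of a whole. Beyond these basics, Matplotlib also supports box plots, violin plots, contour plots, 3D charts, and other more advanced types, each with its own expressive strengths.
In practice, the choice depends on the characteristics of the data and the goal of the analysis. When analyzing stock prices, we might combine a candlestick chart with a volume bar chart; when presenting geographic data, we might use a heatmap or a contour plot; in statistical analysis, we might use box plots to show a distribution and its outliers. Knowing the traits of each chart type and how to use it makes our visualizations more professional and more effective.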
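To make the pairing of chart type and purpose concrete, here is a minimal sketch that draws the five basic types side by side. The data is synthetic and purely illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # synthetic data, for illustration only
t = np.arange(12)
trend = t + rng.normal(0, 1, 12)
groups = ["A", "B", "C"]
totals = [5, 8, 3]
samples = rng.normal(0, 1, 300)

fig, axes = plt.subplots(1, 5, figsize=(20, 3.5))
axes[0].plot(t, trend)                                 # line: trend over time
axes[1].bar(groups, totals)                            # bar: compare categories
axes[2].scatter(samples[:150], samples[150:])          # scatter: relation between two variables
counts, bin_edges, _ = axes[3].hist(samples, bins=20)  # histogram: distribution of one variable
axes[4].pie(totals, labels=groups, autopct="%1.0f%%")  # pie: share of a whole
for ax, title in zip(axes, ["Line", "Bar", "Scatter", "Histogram", "Pie"]):
    ax.set_title(title)
fig.tight_layout()
plt.close(fig)
```

To keep the image, call `fig.savefig(...)` before `plt.close(fig)`.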
2.1 Advanced Line Charts
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import signal
from datetime import datetime, timedelta
# Configure Chinese text rendering
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Create the time-series data
np.random.seed(42)
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
n_points = len(dates)
# Build the series from trend, seasonal, and noise components
trend = np.linspace(100, 150, n_points)
seasonal = 10 * np.sin(np.linspace(0, 4*np.pi, n_points))
noise = np.random.normal(0, 5, n_points)
sales = trend + seasonal + noise
# Compute the moving average
window_size = 30
moving_avg = pd.Series(sales).rolling(window=window_size, center=True).mean()
# Compute the 95% band (moving average ± 1.96 rolling standard deviations)
std = pd.Series(sales).rolling(window=window_size, center=True).std()
upper_bound = moving_avg + 1.96 * std
lower_bound = moving_avg - 1.96 * std
# Create the figure
fig, axes = plt.subplots(3, 1, figsize=(15, 12))
# First subplot: raw data and trend
ax1 = axes[0]
ax1.plot(dates, sales, 'b-', alpha=0.3, linewidth=0.5, label='原始销售数据')
ax1.plot(dates, moving_avg, 'r-', linewidth=2, label=f'{window_size}天移动平均')
ax1.fill_between(dates, lower_bound, upper_bound, alpha=0.2, color='gray', label='95% 置信区间')
ax1.set_xlabel('日期', fontsize=12)
ax1.set_ylabel('销售额 (万元)', fontsize=12)
ax1.set_title('2023年销售数据时间序列分析', fontsize=14, fontweight='bold')
ax1.legend(loc='upper left', fontsize=10)
ax1.grid(True, alpha=0.3)
ax1.set_xlim([dates[0], dates[-1]])
# Annotate key events
important_dates = ['2023-03-15', '2023-06-18', '2023-11-11']
important_events = ['春季促销', '618购物节', '双十一']
for date_str, event in zip(important_dates, important_events):
    date = pd.to_datetime(date_str)
    idx = dates.get_loc(date)
    ax1.annotate(event, xy=(date, sales[idx]), xytext=(date, sales[idx] + 20),
                 arrowprops=dict(arrowstyle='->', color='red', lw=1.5),
                 fontsize=10, color='red', ha='center')
# Second subplot: the components of the series
ax2 = axes[1]
ax2.plot(dates, trend, 'g-', linewidth=2, label='长期趋势')
ax2.plot(dates, seasonal + 125, 'b-', linewidth=1.5, label='季节性成分')
ax2.plot(dates, noise + 125, 'r-', alpha=0.3, linewidth=0.5, label='随机噪声')
ax2.set_xlabel('日期', fontsize=12)
ax2.set_ylabel('分解值', fontsize=12)
ax2.set_title('时间序列分解:趋势、季节性和噪声', fontsize=14, fontweight='bold')
ax2.legend(loc='upper left', fontsize=10)
ax2.grid(True, alpha=0.3)
ax2.set_xlim([dates[0], dates[-1]])
# Third subplot: monthly summary
# 'M' means month-end frequency; pandas >= 2.2 deprecates it in favor of 'ME'
monthly_sales = pd.Series(sales, index=dates).resample('M').agg(['mean', 'std', 'min', 'max'])
months = monthly_sales.index
ax3 = axes[2]
ax3.plot(months, monthly_sales['mean'], 'o-', linewidth=2, markersize=8, label='月平均销售额')
ax3.fill_between(months, monthly_sales['min'], monthly_sales['max'],
alpha=0.3, color='lightblue', label='月度范围')
ax3.errorbar(months, monthly_sales['mean'], yerr=monthly_sales['std'],
fmt='none', ecolor='red', alpha=0.5, capsize=5, label='标准差')
ax3.set_xlabel('月份', fontsize=12)
ax3.set_ylabel('销售额 (万元)', fontsize=12)
ax3.set_title('月度销售统计汇总', fontsize=14, fontweight='bold')
ax3.legend(loc='upper left', fontsize=10)
ax3.grid(True, alpha=0.3)
# Add a value label above each monthly mean
for month, value in zip(months, monthly_sales['mean']):
    ax3.text(month, value + 5, f'{value:.1f}', ha='center', fontsize=9)
plt.tight_layout()
plt.savefig('advanced_line_plot.png', dpi=300, bbox_inches='tight')
plt.show()
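A centered rolling window needs a full window of data by default, so the moving average and its band are undefined near both ends of the series. The sketch below, on synthetic data, contrasts that default with `min_periods=1`, which lets partial windows contribute. Note also that a band of ±1.96 rolling standard deviations describes the spread of the observations around the rolling mean; treating it as a confidence interval is an approximation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", periods=120, freq="D")
sales = pd.Series(100 + rng.normal(0, 5, len(dates)), index=dates)  # synthetic series

window = 30
# Default: a full window is required, so both edges of a centered window are NaN
strict_mean = sales.rolling(window=window, center=True).mean()

# min_periods=1 lets partial windows contribute, so the curve spans the whole range
relaxed = sales.rolling(window=window, center=True, min_periods=1)
relaxed_mean = relaxed.mean()
relaxed_std = relaxed.std()  # NaN only where a window holds a single point

upper = relaxed_mean + 1.96 * relaxed_std
lower = relaxed_mean - 1.96 * relaxed_std

print("NaN count, strict window: ", int(strict_mean.isna().sum()))
print("NaN count, relaxed window:", int(relaxed_mean.isna().sum()))
```

The trade-off is that edge values are averaged over fewer points and are therefore noisier than the interior of the curve.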
2.2 Creative Bar Chart Designs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Rectangle
import matplotlib.patches as mpatches
# Configure Chinese text rendering
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Create the sales data
np.random.seed(42)
categories = ['电子产品', '服装配饰', '食品饮料', '家居用品', '图书音像', '运动户外', '美容护肤', '母婴用品']
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
# Generate multidimensional data
sales_data = np.random.randint(50, 200, size=(len(categories), len(quarters)))
growth_rate = np.random.uniform(-0.2, 0.3, size=len(categories))
market_share = np.random.uniform(0.05, 0.25, size=len(categories))
market_share = market_share / market_share.sum()
# Create a multi-panel bar chart layout
fig = plt.figure(figsize=(16, 12))
gs = fig.add_gridspec(3, 2, height_ratios=[2, 1.5, 1.5], width_ratios=[2, 1])
# Main panel: grouped bar chart
ax1 = fig.add_subplot(gs[0, :])
x = np.arange(len(categories))
width = 0.2
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4']
for i, quarter in enumerate(quarters):
    offset = (i - 1.5) * width
    bars = ax1.bar(x + offset, sales_data[:, i], width, label=quarter, color=colors[i], alpha=0.8)
    # Add a value label above each bar
    for bar, value in zip(bars, sales_data[:, i]):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height + 2,
                 f'{value}', ha='center', va='bottom', fontsize=8)
ax1.set_xlabel('产品类别', fontsize=12, fontweight='bold')
ax1.set_ylabel('销售额 (万元)', fontsize=12, fontweight='bold')
ax1.set_title('2023年各季度产品类别销售额对比分析', fontsize=14, fontweight='bold')
ax1.set_xticks(x)
ax1.set_xticklabels(categories, rotation=45, ha='right')
ax1.legend(loc='upper left', ncol=4, fontsize=10)
ax1.grid(True, axis='y', alpha=0.3)
ax1.set_ylim(0, max(sales_data.flatten()) * 1.15)
# Add the annual total as a line on a secondary y-axis
annual_total = sales_data.sum(axis=1)
ax1_twin = ax1.twinx()
ax1_twin.plot(x, annual_total, 'ko-', linewidth=2, markersize=8, label='年度总计')
ax1_twin.set_ylabel('年度总计 (万元)', fontsize=12, fontweight='bold', color='black')
ax1_twin.legend(loc='upper right', fontsize=10)
# Subplot 2: horizontal bar chart of growth rates
ax2 = fig.add_subplot(gs[1, 0])
colors_growth = ['green' if g > 0 else 'red' for g in growth_rate]
bars2 = ax2.barh(categories, growth_rate * 100, color=colors_growth, alpha=0.7)
# Add value labels and a zero reference line
for bar, value in zip(bars2, growth_rate * 100):
    bar_width = bar.get_width()  # separate name, so the grouped-bar `width` above is not overwritten
    label_x = bar_width + 1 if bar_width > 0 else bar_width - 1
    ha = 'left' if bar_width > 0 else 'right'
    ax2.text(label_x, bar.get_y() + bar.get_height()/2.,
             f'{value:.1f}%', ha=ha, va='center', fontsize=9)
ax2.axvline(x=0, color='black', linewidth=1, linestyle='-')
ax2.set_xlabel('同比增长率 (%)', fontsize=11, fontweight='bold')
ax2.set_title('各类别年度同比增长率', fontsize=12, fontweight='bold')
ax2.grid(True, axis='x', alpha=0.3)
ax2.set_xlim(-30, 35)
# Subplot 3: stacked bars showing each category's share per quarter
ax3 = fig.add_subplot(gs[1, 1])
bottom = np.zeros(len(quarters))
for i, category in enumerate(categories):
    values = sales_data[i, :] / sales_data.sum(axis=0) * 100
    ax3.bar(quarters, values, bottom=bottom, label=category, alpha=0.8)
    bottom += values
ax3.set_ylabel('占比 (%)', fontsize=11, fontweight='bold')
ax3.set_title('各季度类别销售占比', fontsize=12, fontweight='bold')
ax3.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
ax3.grid(True, axis='y', alpha=0.3)
# Subplot 4: market share as sorted bars (instead of a pie chart)
ax4 = fig.add_subplot(gs[2, :])
sorted_indices = np.argsort(market_share)[::-1]
sorted_categories = [categories[i] for i in sorted_indices]
sorted_shares = market_share[sorted_indices]
bars4 = ax4.bar(range(len(sorted_categories)), sorted_shares * 100,
color=plt.cm.Set3(np.linspace(0, 1, len(categories))), alpha=0.8)
# Add the cumulative share line
cumulative = np.cumsum(sorted_shares) * 100
ax4_twin = ax4.twinx()
ax4_twin.plot(range(len(sorted_categories)), cumulative, 'ro-', linewidth=2, markersize=6)
ax4_twin.set_ylabel('累积市场份额 (%)', fontsize=11, fontweight='bold', color='red')
ax4_twin.set_ylim(0, 105)
ax4_twin.grid(True, axis='y', alpha=0.2, color='red', linestyle='--')
ax4.set_xlabel('产品类别', fontsize=11, fontweight='bold')
ax4.set_ylabel('市场份额 (%)', fontsize=11, fontweight='bold')
ax4.set_title('产品类别市场份额分析(帕累托图)', fontsize=12, fontweight='bold')
ax4.set_xticks(range(len(sorted_categories)))
ax4.set_xticklabels(sorted_categories, rotation=45, ha='right')
# Add value labels
for bar, value in zip(bars4, sorted_shares * 100):
    height = bar.get_height()
    ax4.text(bar.get_x() + bar.get_width()/2., height + 0.5,
             f'{value:.1f}%', ha='center', va='bottom', fontsize=8)
plt.suptitle('综合销售数据分析仪表板', fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('advanced_bar_charts.png', dpi=300, bbox_inches='tight')
plt.show()
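The listing above positions each value label by hand with `ax.text`. As an alternative sketch, assuming Matplotlib 3.4 or newer, `Axes.bar_label` can produce one label per bar from the container returned by `ax.bar`. The categories and values here are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

categories = ["A", "B", "C", "D"]      # placeholder categories
values = np.array([120, 95, 143, 78])  # placeholder values

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(categories, values, color="#4ECDC4", alpha=0.8)

# Axes.bar_label (Matplotlib >= 3.4) places one text label per bar
labels = ax.bar_label(bars, fmt="%d", padding=2, fontsize=8)

ax.set_ylim(0, values.max() * 1.15)  # headroom so the labels are not clipped
plt.close(fig)
```

Manual `ax.text` calls remain useful when a label needs custom placement, such as the left/right switch used for negative growth rates.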
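Chapter 3: Statistical Charts and Distribution Visualization
Statistical charts play a vital role in data analysis: they show a dataset's distribution, spread, central tendency, and outliers at a glance. Matplotlib supports a rich set of statistical charts, including histograms, density plots, box plots, and violin plots, each with its own strengths and use cases. A histogram shows the frequency distribution of the data; a box plot shows the five-number summary (minimum, lower quartile, median, upper quartile, maximum) together with outliers; a violin plot combines the strengths of the box plot and the density plot to show the shape of the distribution more fully.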
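The five-number summary and the outlier rule behind a box plot can be computed directly. The sketch below uses NumPy's default percentile interpolation and the common 1.5 × IQR rule; the helper names `five_number_summary` and `iqr_outliers` are illustrative, not library functions.

```python
import numpy as np


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using NumPy's default linear interpolation."""
    arr = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    return arr.min(), q1, median, q3, arr.max()


def iqr_outliers(values, whis=1.5):
    """Return the points outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    arr = np.asarray(values, dtype=float)
    _, q1, _, q3, _ = five_number_summary(arr)
    iqr = q3 - q1
    return arr[(arr < q1 - whis * iqr) | (arr > q3 + whis * iqr)]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(five_number_summary(data))
print(iqr_outliers(data))
```

With Matplotlib's default settings the whiskers of a box plot stop at the most extreme points inside these fences, and points beyond them are drawn individually, so the whisker ends equal the minimum and maximum only when there are no outliers.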
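In practical data analysis, statistical charts are used not only for exploratory analysis but also for hypothesis testing, model diagnostics, and presenting results. For example, a Q-Q plot checks whether data follows a normal distribution; a residual plot diagnoses how well a regression model fits; a heatmap displays the correlation matrix of a set of variables. Knowing how to draw and read these charts greatly strengthens our ability to analyze data.
3.1 Advanced Visualization of Statistical Distributions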
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats
import seaborn as sns
plt.rcParams['axes.unicode_minus'] = False
plt.style.use('seaborn-v0_8-darkgrid')
# ===== Simulated Data =====
np.random.seed(42)
n_samples = 1000
normal_data = np.random.normal(100, 15, n_samples)
skewed_data = np.random.gamma(2, 2, n_samples) * 10 + 50
bimodal_data = np.concatenate([
np.random.normal(70, 10, n_samples // 2),
np.random.normal(120, 15, n_samples // 2)
])
uniform_data = np.random.uniform(50, 150, n_samples)
df = pd.DataFrame({
'Normal Dist.': normal_data,
'Skewed Dist.': skewed_data,
'Bimodal Dist.': bimodal_data,
'Uniform Dist.': uniform_data
})
# ===== Plot Layout =====
fig = plt.figure(figsize=(18, 14))
gs = fig.add_gridspec(4, 3, hspace=0.35, wspace=0.35)
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4']
# 1. Histogram & KDE
ax1 = fig.add_subplot(gs[0, :])
for i, col in enumerate(df.columns):
    data = df[col]
    ax1.hist(data, bins=30, alpha=0.3, color=colors[i],
             density=True, label=f'{col} (Hist)')
    density = stats.gaussian_kde(data)
    xs = np.linspace(data.min(), data.max(), 200)
    ax1.plot(xs, density(xs), color=colors[i], linewidth=2, label=f'{col} (KDE)')
ax1.set_xlabel('Value', fontsize=12)
ax1.set_ylabel('Probability Density', fontsize=12)
ax1.set_title('Histograms & KDE of Various Distributions', fontsize=14, fontweight='bold')
ax1.legend(loc='upper right', ncol=2, fontsize=10)
ax1.grid(alpha=0.3)
# 2. Boxplot
ax2 = fig.add_subplot(gs[1, 0])
# Note: Matplotlib 3.9 renamed the `labels` argument of boxplot to `tick_labels`
bp = ax2.boxplot([df[col] for col in df.columns], labels=df.columns,
                 patch_artist=True, notch=True, showmeans=True,
                 meanprops=dict(marker='D', markerfacecolor='red', markersize=8))
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)
ax2.set_ylabel('Value', fontsize=12)
ax2.set_title('Boxplot Comparison', fontsize=13, fontweight='bold')
ax2.tick_params(axis='x', rotation=30)
ax2.grid(True, axis='y', alpha=0.3)
# 3. Violin Plot
ax3 = fig.add_subplot(gs[1, 1])
parts = ax3.violinplot([df[col] for col in df.columns], positions=range(len(df.columns)),
showmeans=True, showmedians=True, showextrema=True)
for pc, color in zip(parts['bodies'], colors):
    pc.set_facecolor(color)
    pc.set_alpha(0.7)
ax3.set_xticks(range(len(df.columns)))
ax3.set_xticklabels(df.columns, rotation=30)
ax3.set_ylabel('Value', fontsize=12)
ax3.set_title('Violin Plot: Distribution Shapes', fontsize=13, fontweight='bold')
ax3.grid(True, axis='y', alpha=0.3)
# 4. Q-Q Plot
ax4 = fig.add_subplot(gs[1, 2])
for i, col in enumerate(df.columns):
    # Each probplot call adds two lines to the axes: the sample points and the fitted line
    stats.probplot(df[col], dist="norm", plot=ax4)
    ax4.get_lines()[i * 2].set_color(colors[i])
    ax4.get_lines()[i * 2].set_alpha(0.6)
    ax4.get_lines()[i * 2 + 1].set_color(colors[i])
    ax4.get_lines()[i * 2 + 1].set_linewidth(2)
ax4.set_title('Q-Q Plot: Normality Check', fontsize=13, fontweight='bold')
ax4.grid(alpha=0.3)
# 5. CDF
ax5 = fig.add_subplot(gs[2, 0])
for i, col in enumerate(df.columns):
    sorted_data = np.sort(df[col])
    cumulative = np.arange(1, len(sorted_data) + 1) / len(sorted_data)
    ax5.plot(sorted_data, cumulative, color=colors[i], linewidth=2, label=col)
ax5.set_xlabel('Value', fontsize=12)
ax5.set_ylabel('Cumulative Probability', fontsize=12)
ax5.set_title('Cumulative Distribution Function (CDF)', fontsize=13, fontweight='bold')
ax5.legend(fontsize=10)
ax5.grid(alpha=0.3)
# 6. 2D Histogram
ax6 = fig.add_subplot(gs[2, 1:])
x = normal_data
y = 0.5 * normal_data + np.random.normal(50, 10, n_samples)
h, xedges, yedges, im = ax6.hist2d(x, y, bins=30, cmap='YlOrRd', density=True)
ax6.set_xlabel('X Variable (Normal Dist.)', fontsize=12)
ax6.set_ylabel('Y Variable (Correlated)', fontsize=12)
ax6.set_title('2D Histogram: Joint Distribution', fontsize=13, fontweight='bold')
cbar = plt.colorbar(im, ax=ax6)
cbar.set_label('Probability Density', fontsize=11)
X_grid, Y_grid = np.meshgrid(xedges[:-1], yedges[:-1])
ax6.contour(X_grid.T, Y_grid.T, h, colors='black', alpha=0.4, linewidths=1)
# 7. Statistics Table
ax7 = fig.add_subplot(gs[3, :])
ax7.axis('tight')
ax7.axis('off')
stats_data = []
for col in df.columns:
    stats_data.append([
        col,
        f'{df[col].mean():.2f}',
        f'{df[col].median():.2f}',
        f'{df[col].std():.2f}',
        f'{stats.skew(df[col]):.2f}',
        f'{stats.kurtosis(df[col]):.2f}'
    ])
table = ax7.table(cellText=stats_data,
colLabels=['Distribution', 'Mean', 'Median', 'Std Dev', 'Skewness', 'Kurtosis'],
cellLoc='center',
loc='center',
colWidths=[0.15]*6)
table.auto_set_font_size(False)
table.set_fontsize(11)
table.scale(1, 2)
for key, cell in table.get_celld().items():
    if key[0] == 0:
        cell.set_fontsize(11)
        cell.set_facecolor('#40466e')
        cell.set_text_props(color='white', weight='bold')
    elif key[0] % 2 == 0:
        cell.set_facecolor('#f0f0f0')
ax7.set_title('Statistical Summary Table', fontsize=13, fontweight='bold', y=0.95)
plt.suptitle('Comprehensive Statistical Distribution Dashboard', fontsize=16, fontweight='bold', y=0.98)
plt.savefig('statistical_distributions_EN.png', dpi=300, bbox_inches='tight')
plt.show()
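To show what the Q-Q panel measures, here is a minimal sketch that builds the quantile pairs by hand with NumPy and the standard library's `statistics.NormalDist`. The plotting positions `(i - 0.5) / n` are one common convention and an assumption of this sketch; `scipy.stats.probplot` uses a slightly different one. The helper name `normal_qq` is illustrative.

```python
from statistics import NormalDist

import numpy as np


def normal_qq(sample):
    """Return (theoretical, ordered) quantile pairs for a normal Q-Q plot."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    n = len(ordered)
    # Plotting positions (i - 0.5) / n; one common convention among several
    probs = (np.arange(1, n + 1) - 0.5) / n
    std_normal = NormalDist()
    theoretical = np.array([std_normal.inv_cdf(p) for p in probs])
    return theoretical, ordered


rng = np.random.default_rng(42)
normal_sample = rng.normal(100, 15, 1000)
skewed_sample = rng.gamma(2, 2, 1000) * 10 + 50

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    theo, obs = normal_qq(sample)
    r = np.corrcoef(theo, obs)[0, 1]
    print(f"{name}: Q-Q correlation = {r:.4f}")
```

Points that fall close to a straight line, that is, a correlation near 1, are consistent with normality, while a skewed sample bends away from the line in the tails.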
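Chapter 4: Scatter Plots and Correlation Analysis
The scatter plot is a key tool for exploring relationships between variables: it shows correlation, clusters, and outliers among two or more variables directly. In data science and machine learning, scatter plots are commonly used for feature selection, cluster analysis, and regression diagnostics. By reading the pattern of points we can quickly spot linear relationships, nonlinear relationships, cluster structure, and outlying points. Matplotlib supports not only the basic two-dimensional scatter plot but also additional dimensions encoded through color, size, and marker shape, and it can draw three-dimensional scatter plots to show the relationship among three variables.
In practice, scatter plots are often combined with statistical elements such as regression lines, confidence intervals, and density estimates to support deeper analysis. In financial analysis, a scatter plot can show the relationship between asset returns and risk; in biomedical research, the correlation between gene expression levels; in marketing, customer segments and behavior patterns. Mastering advanced scatter plot techniques helps us understand the complex relationships in our data.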
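As a sketch of the regression line and confidence band mentioned above, the following computes an ordinary least-squares fit and a pointwise band for the fitted mean using the textbook formula se(x0) = s * sqrt(1/n + (x0 - mean(x))^2 / Sxx). The data is synthetic, and the factor 1.96 is a normal approximation to the t quantile that is reasonable only for large samples.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(100, 20, 500)            # synthetic predictor
y = 0.8 * x + rng.normal(50, 15, 500)   # synthetic response

n = len(x)
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
s = np.sqrt(np.sum(residuals**2) / (n - 2))  # residual standard error
sxx = np.sum((x - x.mean())**2)

x_line = np.linspace(x.min(), x.max(), 100)
y_line = slope * x_line + intercept
# Standard error of the fitted mean at each x; smallest near x.mean()
se_fit = s * np.sqrt(1.0 / n + (x_line - x.mean())**2 / sxx)
z = 1.96  # normal approximation to the t quantile, reasonable for large n
lower, upper = y_line - z * se_fit, y_line + z * se_fit

r = np.corrcoef(x, y)[0, 1]
print(f"slope={slope:.3f}, intercept={intercept:.2f}, r={r:.3f}")
```

The arrays `x_line`, `lower`, and `upper` can be passed to `ax.fill_between` to draw the band, which widens toward the ends of the data range.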
4.1 Multidimensional Scatter Plots and Correlation Matrices
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats
from matplotlib.patches import Ellipse
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.offsetbox import AnnotationBbox, OffsetImage
import os
plt.rcParams['axes.unicode_minus'] = False
# ===== Data generation =====
np.random.seed(42)
n_points = 500
x1 = np.random.normal(100, 20, n_points)
x2 = 0.8 * x1 + np.random.normal(50, 15, n_points)
x3 = -0.5 * x1 + np.random.normal(80, 10, n_points)
x4 = 0.3 * x2 + 0.4 * x3 + np.random.normal(60, 12, n_points)
x5 = np.random.uniform(0, 100, n_points)
categories = np.random.choice(['A', 'B', 'C', 'D'], n_points, p=[0.3, 0.3, 0.2, 0.2])
cat_colors = {'A': '#FF6B6B', 'B': '#4ECDC4', 'C': '#45B7D1', 'D': '#96CEB4'}
data = pd.DataFrame({
'Var1': x1,
'Var2': x2,
'Var3': x3,
'Var4': x4,
'Var5': x5,
'Category': categories
})
# ===== Main figure layout =====
fig = plt.figure(figsize=(22, 18))
gs = fig.add_gridspec(3, 3, hspace=0.6, wspace=0.5)  # extra spacing between panels
# -------- 1. Scatter plot with regression line --------
ax1 = fig.add_subplot(gs[0, 0])
colors = [cat_colors[cat] for cat in data['Category']]
ax1.scatter(data['Var1'], data['Var2'], c=colors, alpha=0.6, s=30, edgecolors='black', linewidth=0.5)
z = np.polyfit(data['Var1'], data['Var2'], 1)
p = np.poly1d(z)
x_line = np.linspace(data['Var1'].min(), data['Var1'].max(), 100)
ax1.plot(x_line, p(x_line), 'r--', linewidth=2, label=f'Fit: y={z[0]:.2f}x+{z[1]:.2f}')
predict_mean_se = stats.sem(data['Var2'] - p(data['Var1']))
margin = 1.96 * predict_mean_se
ax1.fill_between(x_line, p(x_line) - margin, p(x_line) + margin, alpha=0.2, color='gray')
corr = data['Var1'].corr(data['Var2'])
ax1.text(0.05, 0.95, f'Corr: {corr:.3f}', transform=ax1.transAxes, fontsize=10,
verticalalignment='top', bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
ax1.set_xlabel('Var1')
ax1.set_ylabel('Var2')
ax1.set_title('Scatter Plot with Linear Regression', fontsize=12, fontweight='bold')
ax1.legend(fontsize=9)
ax1.grid(True, alpha=0.3)
# -------- 2. Bubble chart --------
ax2 = fig.add_subplot(gs[0, 1])
scatter = ax2.scatter(data['Var3'], data['Var4'],
c=data['Var5'], s=data['Var1'] * 2,
alpha=0.6, cmap='viridis', edgecolors='black', linewidth=0.5)
cbar = plt.colorbar(scatter, ax=ax2)
cbar.set_label('Var5 Value', fontsize=9)
ax2.set_xlabel('Var3')
ax2.set_ylabel('Var4')
ax2.set_title('Bubble Chart: 4D info', fontsize=12, fontweight='bold')
ax2.grid(True, alpha=0.3)
# -------- 3. Scatter by category with covariance ellipses --------
ax3 = fig.add_subplot(gs[0, 2])
for cat in data['Category'].unique():
    mask = data['Category'] == cat
    ax3.scatter(data.loc[mask, 'Var1'], data.loc[mask, 'Var3'],
                color=cat_colors[cat], label=f'Cat {cat}', alpha=0.6, s=50)
    subset = data.loc[mask, ['Var1', 'Var3']]
    mean = subset.mean().to_numpy()
    cov = subset.cov().to_numpy()
    v, w = np.linalg.eigh(cov)         # eigenvalues ascending; eigenvectors are the columns of w
    v = 2. * np.sqrt(2.) * np.sqrt(v)  # axis lengths
    u = w[:, 0]                        # unit eigenvector paired with v[0]
    angle = np.arctan2(u[1], u[0])
    ell = Ellipse(mean, v[0], v[1], angle=180. + np.degrees(angle),
                  color=cat_colors[cat], alpha=0.2)
    ax3.add_patch(ell)
ax3.set_xlabel('Var1')
ax3.set_ylabel('Var3')
ax3.set_title('Categorical Scatter with Covariance Ellipse', fontsize=12, fontweight='bold')
ax3.legend(fontsize=8)
ax3.grid(True, alpha=0.3)
# -------- 4. 3D scatter plot --------
ax4 = fig.add_subplot(gs[1, :2], projection='3d')
for cat in data['Category'].unique():
    mask = data['Category'] == cat
    ax4.scatter(data.loc[mask, 'Var1'], data.loc[mask, 'Var2'], data.loc[mask, 'Var3'],
                c=cat_colors[cat], label=f'Cat {cat}', alpha=0.6, s=30)
ax4.set_xlabel('Var1')
ax4.set_ylabel('Var2')
ax4.set_zlabel('Var3')
ax4.set_title('3D Scatter', fontsize=12, fontweight='bold')
ax4.legend(fontsize=8)
ax4.view_init(elev=20, azim=45)
# -------- 5. Correlation matrix --------
ax5 = fig.add_subplot(gs[1, 2])
corr_matrix = data[['Var1', 'Var2', 'Var3', 'Var4', 'Var5']].corr()
im = ax5.imshow(corr_matrix, cmap='RdBu_r', aspect='auto', vmin=-1, vmax=1)
for i in range(len(corr_matrix)):
    for j in range(len(corr_matrix)):
        ax5.text(j, i, f'{corr_matrix.iloc[i, j]:.2f}', ha="center", va="center", color="black", fontsize=9)
ax5.set_xticks(np.arange(len(corr_matrix.columns)))
ax5.set_yticks(np.arange(len(corr_matrix.columns)))
ax5.set_xticklabels(corr_matrix.columns, rotation=45, ha='right', fontsize=9)
ax5.set_yticklabels(corr_matrix.columns, fontsize=9)
ax5.set_title('Correlation Matrix Heatmap', fontsize=12, fontweight='bold')
cbar = plt.colorbar(im, ax=ax5, fraction=0.046, pad=0.04)
cbar.set_label('Correlation', fontsize=9)
# -------- 6. Embedded small scatter matrix --------
ax6 = fig.add_subplot(gs[2, :])
ax6.axis('off')
n_vars = 3
vars_to_plot = ['Var1', 'Var2', 'Var3']
fig_sub = plt.figure(figsize=(5, 5))
for i in range(n_vars):
    for j in range(n_vars):
        ax_sub = fig_sub.add_subplot(n_vars, n_vars, i * n_vars + j + 1)
        if i == j:
            ax_sub.hist(data[vars_to_plot[i]], bins=20, alpha=0.6, color='steelblue')
        else:
            ax_sub.scatter(data[vars_to_plot[j]], data[vars_to_plot[i]], alpha=0.3, s=10, color='darkblue')
        ax_sub.tick_params(labelsize=6)
fig_sub.tight_layout()
temp_img = 'scatter_matrix_temp.png'
fig_sub.savefig(temp_img, dpi=100, bbox_inches='tight', pad_inches=0.1)
plt.close(fig_sub)
img = plt.imread(temp_img)
imagebox = OffsetImage(img, zoom=0.75)
ab = AnnotationBbox(imagebox, (0.5, 0.5), xycoords='axes fraction', frameon=False)
ax6.add_artist(ab)
# ===== Layout adjustment, so the saved image is complete =====
plt.suptitle('Scatter & Correlation Analysis Dashboard', fontsize=18, fontweight='bold', y=0.99)
fig.subplots_adjust(top=0.94, bottom=0.05, left=0.05, right=0.95)
save_name = 'scatter_correlation_analysis_complete.png'
plt.savefig(save_name, dpi=300, bbox_inches='tight', pad_inches=0.1)
plt.show()
if os.path.exists(temp_img):
    os.remove(temp_img)
print(f"✅ Figure saved as: {save_name}")
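The covariance ellipse in panel 3 depends on reading the eigenvectors correctly: `np.linalg.eigh` returns them as the columns of its second result. The following sketch isolates that step in a hypothetical helper, `covariance_ellipse_params`, and checks it on synthetic points scattered along the line y = x.

```python
import numpy as np
from matplotlib.patches import Ellipse


def covariance_ellipse_params(points, n_std=2.0):
    """Return (center, width, height, angle_deg) of an n_std covariance ellipse."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues ascending; eigenvectors are columns
    major = eigvecs[:, -1]                    # direction of largest variance
    angle_deg = np.degrees(np.arctan2(major[1], major[0]))
    width = 2 * n_std * np.sqrt(eigvals[-1])  # full length along the major axis
    height = 2 * n_std * np.sqrt(eigvals[0])  # full length along the minor axis
    return center, width, height, angle_deg


# Synthetic points spread along y = x, so the major axis should sit near 45 degrees
rng = np.random.default_rng(0)
t = rng.normal(0, 3, 2000)
points = np.column_stack([t + rng.normal(0, 0.3, 2000),
                          t + rng.normal(0, 0.3, 2000)])

center, width, height, angle_deg = covariance_ellipse_params(points)
ellipse = Ellipse(xy=center, width=width, height=height, angle=angle_deg, alpha=0.2)
print(f"angle (mod 180): {angle_deg % 180:.1f} degrees")
```

The resulting patch can be added to an existing Axes with `ax.add_patch(ellipse)`. An ellipse is unchanged by a 180-degree rotation, so the angle is only meaningful modulo 180.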
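Chapter 5: Time Series and Financial Data Visualization
Time-series data is central to finance, economics, meteorology, and many other fields. Matplotlib provides strong support for visualizing it, including date and time axes, views at multiple time scales, and financial charts. In financial visualization, the candlestick chart (K-line chart) is one of the most widely used chart types: a single figure shows the open, close, high, and low prices together. Alongside candlesticks, volume charts and technical indicators such as moving averages, Bollinger Bands, and MACD are indispensable parts of financial analysis.
The challenge in time-series visualization is showing patterns and trends at different time scales effectively. We may need to show intraday fluctuations, periodic patterns, long-term trends, and anomalous events at the same time. With sensible use of subplots, twin axes, zooming, and annotation, we can build time-series charts that are information-rich yet easy to read. Interactive time-series charts are also becoming more important, because they let users explore the details of different time periods dynamically.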
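As a small sketch of date-axis handling at two time scales, the example below plots one synthetic daily series twice, once for the full year and once zoomed to a single month, and lets `AutoDateLocator` with `ConciseDateFormatter` (available since Matplotlib 3.1) choose tick positions and labels for each range.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range("2023-01-01", "2023-12-31", freq="D")
values = np.cumsum(np.random.default_rng(0).normal(0, 1, len(dates)))  # synthetic random walk

fig, (ax_full, ax_zoom) = plt.subplots(2, 1, figsize=(10, 6))
for ax in (ax_full, ax_zoom):
    ax.plot(dates, values)
    locator = mdates.AutoDateLocator()  # picks a tick spacing that suits the visible range
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

ax_full.set_title("Full year")
ax_zoom.set_xlim(pd.Timestamp("2023-06-01"), pd.Timestamp("2023-06-30"))
ax_zoom.set_title("One month")
fig.tight_layout()
plt.close(fig)
```

A fixed `MonthLocator` with a `'%Y-%m'` format works well for a known one-year range; the automatic pair is more convenient when the visible range changes.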
5.1 A Combined Financial Data Visualization
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.patches import Rectangle
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
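# No Chinese font is configured here; all labels are in English to avoid garbled glyphs
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
# Generate simulated stock data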
np.random.seed(42)
dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
n_days = len(dates)
# Generate the price path (a simulated random walk)
initial_price = 100
returns = np.random.normal(0.001, 0.02, n_days)
returns[::20] += np.random.normal(0, 0.03, len(returns[::20]))  # inject some larger moves
price = initial_price * np.exp(np.cumsum(returns))
# Generate OHLC data
daily_volatility = 0.02
open_prices = price * (1 + np.random.normal(0, daily_volatility / 2, n_days))
high_prices = np.maximum(open_prices, price) * (1 + np.abs(np.random.normal(0, daily_volatility, n_days)))
low_prices = np.minimum(open_prices, price) * (1 - np.abs(np.random.normal(0, daily_volatility, n_days)))
close_prices = price
# Generate volume data
base_volume = 1000000
volume = base_volume * (1 + np.random.normal(0, 0.3, n_days))
volume = np.abs(volume)
# Compute technical indicators
# Moving averages
ma_20 = pd.Series(close_prices).rolling(window=20).mean()
ma_60 = pd.Series(close_prices).rolling(window=60).mean()
# Bollinger Bands
rolling_std = pd.Series(close_prices).rolling(window=20).std()
upper_band = ma_20 + 2 * rolling_std
lower_band = ma_20 - 2 * rolling_std
# RSI calculation (simple rolling-mean variant)
def calculate_rsi(prices, period=14):
    delta = pd.Series(prices).diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=period).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=period).mean()
    rs = gain / loss
    rsi = 100 - (100 / (1 + rs))
    return rsi
rsi = calculate_rsi(close_prices)
# MACD calculation
exp1 = pd.Series(close_prices).ewm(span=12, adjust=False).mean()
exp2 = pd.Series(close_prices).ewm(span=26, adjust=False).mean()
macd = exp1 - exp2
signal = macd.ewm(span=9, adjust=False).mean()
macd_histogram = macd - signal
# Create the figure
fig = plt.figure(figsize=(18, 14))
gs = fig.add_gridspec(5, 2, height_ratios=[3, 1, 1, 1, 1], width_ratios=[3, 1],
hspace=0.05, wspace=0.2)
# 1. Main panel: candlestick chart
ax1 = fig.add_subplot(gs[0, 0])
# Draw the candlesticks (green: close at or above open; red: close below open)
colors = ['g' if close_price >= open_price else 'r'
          for close_price, open_price in zip(close_prices, open_prices)]
for i in range(len(dates)):
    # Wick: from the low to the high
    ax1.plot([dates[i], dates[i]], [low_prices[i], high_prices[i]],
             color=colors[i], linewidth=1, alpha=0.8)
    # Body: between open and close
    height = close_prices[i] - open_prices[i]
    bottom = open_prices[i] if height >= 0 else close_prices[i]
    ax1.bar(dates[i], np.abs(height), bottom=bottom, width=0.8,
            color=colors[i], alpha=0.8, edgecolor=colors[i])
# Add moving averages and Bollinger Bands
ax1.plot(dates, ma_20, 'b-', linewidth=1.5, label='MA20', alpha=0.7)
ax1.plot(dates, ma_60, 'orange', linewidth=1.5, label='MA60', alpha=0.7)
ax1.fill_between(dates, upper_band, lower_band, alpha=0.1, color='gray', label='Bollinger Bands')
ax1.plot(dates, upper_band, 'gray', linewidth=0.5, alpha=0.5, linestyle='--')
ax1.plot(dates, lower_band, 'gray', linewidth=0.5, alpha=0.5, linestyle='--')
# Annotate the highest and lowest closing prices
max_idx = np.argmax(close_prices)
min_idx = np.argmin(close_prices)
ax1.annotate(f'High: {close_prices[max_idx]:.2f}',
xy=(dates[max_idx], high_prices[max_idx]),
xytext=(dates[max_idx] + timedelta(days=10), high_prices[max_idx] + 5),
arrowprops=dict(arrowstyle='->', color='red'),
fontsize=9, color='red')
ax1.annotate(f'Low: {close_prices[min_idx]:.2f}',
xy=(dates[min_idx], low_prices[min_idx]),
xytext=(dates[min_idx] + timedelta(days=10), low_prices[min_idx] - 5),
arrowprops=dict(arrowstyle='->', color='green'),
fontsize=9, color='green')
ax1.set_title('Stock Price Candlestick Chart & Technical Analysis', fontsize=14, fontweight='bold')
ax1.set_ylabel('Price (CNY)', fontsize=11)
ax1.legend(loc='upper left', fontsize=9)
ax1.grid(True, alpha=0.3)
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax1.xaxis.set_major_locator(mdates.MonthLocator())
# 2. Volume chart
ax2 = fig.add_subplot(gs[1, 0], sharex=ax1)
colors_vol = ['g' if close_price >= open_price else 'r'
              for close_price, open_price in zip(close_prices, open_prices)]
ax2.bar(dates, volume / 1_000_000, color=colors_vol, alpha=0.6)
ax2.set_ylabel('Volume\n(Million)', fontsize=10)
ax2.grid(True, alpha=0.3)
ax2.set_ylim(0, max(volume / 1_000_000) * 1.1)
# 3. RSI indicator
ax3 = fig.add_subplot(gs[2, 0], sharex=ax1)
ax3.plot(dates, rsi, 'purple', linewidth=1.5, label='RSI(14)')
ax3.axhline(y=70, color='r', linestyle='--', linewidth=0.5, alpha=0.5)
ax3.axhline(y=30, color='g', linestyle='--', linewidth=0.5, alpha=0.5)
ax3.fill_between(dates, 30, 70, alpha=0.1, color='gray')
ax3.set_ylabel('RSI', fontsize=10)
ax3.set_ylim(0, 100)
ax3.legend(loc='upper left', fontsize=8)
ax3.grid(True, alpha=0.3)
# 4. MACD indicator
ax4 = fig.add_subplot(gs[3, 0], sharex=ax1)
ax4.plot(dates, macd, 'b-', linewidth=1, label='MACD')
ax4.plot(dates, signal, 'r-', linewidth=1, label='Signal')
colors_macd = ['g' if h > 0 else 'r' for h in macd_histogram]
ax4.bar(dates, macd_histogram, color=colors_macd, alpha=0.6, label='Histogram')
ax4.set_ylabel('MACD', fontsize=10)
ax4.legend(loc='upper left', fontsize=8)
ax4.grid(True, alpha=0.3)
ax4.axhline(y=0, color='black', linewidth=0.5)
# 5. Daily returns
ax5 = fig.add_subplot(gs[4, 0], sharex=ax1)
returns_pct = pd.Series(close_prices).pct_change() * 100
colors_ret = ['g' if r > 0 else 'r' for r in returns_pct]
ax5.bar(dates, returns_pct, color=colors_ret, alpha=0.6)
ax5.set_ylabel('Daily Return\n(%)', fontsize=10)
ax5.set_xlabel('Date', fontsize=11)
ax5.grid(True, alpha=0.3)
ax5.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
# 6. Summary statistics panel
ax6 = fig.add_subplot(gs[:2, 1])
ax6.axis('off')
# Compute summary statistics
total_return = (close_prices[-1] - close_prices[0]) / close_prices[0] * 100
annual_return = total_return * 365 / n_days
volatility = np.std(returns_pct.dropna()) * np.sqrt(252)
sharpe_ratio = annual_return / volatility
max_drawdown = np.min(close_prices / np.maximum.accumulate(close_prices) - 1) * 100
# Build the statistics text (in English)
stats_text = f'''
📊 Summary Statistics
{'=' * 25}
Current Price: {close_prices[-1]:.2f} CNY
Total Return: {total_return:.2f}%
Annual Return: {annual_return:.2f}%
Annual Volatility: {volatility:.2f}%
Sharpe Ratio: {sharpe_ratio:.2f}
Max Drawdown: {max_drawdown:.2f}%
📈 Price Range
{'=' * 25}
Highest Price: {np.max(high_prices):.2f} CNY
Lowest Price: {np.min(low_prices):.2f} CNY
Average Price: {np.mean(close_prices):.2f} CNY
📊 Volume Statistics
{'=' * 25}
Average Volume: {np.mean(volume) / 1_000_000:.2f}M
Max Volume: {np.max(volume) / 1_000_000:.2f}M
Min Volume: {np.min(volume) / 1_000_000:.2f}M
'''
ax6.text(0.1, 0.9, stats_text, transform=ax6.transAxes, fontsize=10,
verticalalignment='top', family='monospace',
bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
# 7. Monthly return heatmap
ax7 = fig.add_subplot(gs[2:, 1])
# 'M' means month-end frequency; pandas >= 2.2 deprecates it in favor of 'ME'
monthly_returns = pd.Series(close_prices, index=dates).resample('M').last().pct_change() * 100
months = monthly_returns.index.month
monthly_values = monthly_returns.values[1:]  # the first month has no previous month to compare with
heatmap_data = np.zeros((1, 12))
for i, month in enumerate(months[1:]):
    heatmap_data[0, month - 1] = monthly_values[i]
im = ax7.imshow(heatmap_data, cmap='RdYlGn', aspect='auto', vmin=-10, vmax=10)
ax7.set_xticks(range(12))
ax7.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], fontsize=8)
ax7.set_yticks([])
ax7.set_title('Monthly Return Heatmap (%)', fontsize=11, fontweight='bold')
for i in range(12):
    if heatmap_data[0, i] != 0:
        ax7.text(i, 0, f'{heatmap_data[0, i]:.1f}',
                 ha="center", va="center",
                 color="white" if abs(heatmap_data[0, i]) > 5 else "black",
                 fontsize=8, fontweight='bold')
cbar = plt.colorbar(im, ax=ax7, orientation='horizontal', pad=0.1)
cbar.set_label('Return (%)', fontsize=9)
# Hide the redundant x tick labels on the upper panels
for ax in [ax1, ax2, ax3, ax4]:
    plt.setp(ax.xaxis.get_majorticklabels(), visible=False)
plt.suptitle('Financial Data Analysis Dashboard', fontsize=16, fontweight='bold', y=0.98)
plt.tight_layout()
plt.savefig('financial_dashboard_en.png', dpi=300, bbox_inches='tight')
plt.show()
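Two of the summary statistics in the dashboard, maximum drawdown and annualized volatility, can be checked in isolation on a tiny price series. The sketch below is self-contained; the function names are illustrative, and the default of 252 periods per year is a trading-day convention that is only an assumption here, since the simulated series above is generated on calendar days.

```python
import numpy as np


def max_drawdown(prices):
    """Largest peak-to-trough decline as a negative fraction (e.g. -0.25 for -25%)."""
    p = np.asarray(prices, dtype=float)
    running_peak = np.maximum.accumulate(p)
    drawdown = p / running_peak - 1.0
    return drawdown.min()


def annualized_volatility(prices, periods_per_year=252):
    """Sample std of simple per-period returns, scaled by sqrt(periods_per_year)."""
    p = np.asarray(prices, dtype=float)
    simple_returns = p[1:] / p[:-1] - 1.0
    return simple_returns.std(ddof=1) * np.sqrt(periods_per_year)


prices = [100, 120, 90, 130, 80]
# The running peak is [100, 120, 120, 130, 130]; the worst drop is 130 -> 80
print(f"max drawdown: {max_drawdown(prices):.2%}")
print(f"annualized volatility: {annualized_volatility(prices):.2%}")
```

Whatever convention is chosen, using the same annualization basis for both return and volatility keeps a ratio such as Sharpe internally consistent.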
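Chapter 6: Advanced Visualization Techniques and Composite Charts
In practice, a single chart type often cannot show the full picture of a dataset. Composite charts combine several chart types to present richer and more complete information. Matplotlib's flexible architecture makes complex composites possible: one figure can mix different coordinate systems, different chart types, and even different data dimensions. This is especially useful for dashboards, reports, and summary presentations.
Advanced visualization also covers custom graphical elements, animation, and interactivity. With these techniques we can build visualizations that are both attractive and practical. For example, a radar chart on polar axes can show multidimensional scores; a Sankey diagram can show flows and conversions; a treemap can show hierarchy; a network graph can show relationships. These chart types help present complex relationships and patterns in a more intuitive way.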
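The core of a radar chart is a closed polygon on polar axes: the categories are spread evenly around the circle and the first point is repeated at the end. A minimal sketch of that step, using a hypothetical helper `radar_coordinates` that builds new lists instead of modifying its input, might be:

```python
import numpy as np


def radar_coordinates(scores):
    """Return (angles, closed_scores): evenly spaced angles with the first point repeated."""
    scores = list(scores)
    n = len(scores)
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False).tolist()
    # Repeat the first point so the outline closes; the input list is left untouched
    return angles + angles[:1], scores + scores[:1]


original = [85, 90, 75, 88]  # placeholder scores for four categories
angles, closed = radar_coordinates(original)
print(len(angles), len(closed), len(original))
```

On an Axes created with `projection='polar'`, `ax.plot(angles, closed)` draws the outline and `ax.fill(angles, closed, alpha=0.15)` shades it.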
6.1 Radar Charts and Multidimensional Analysis
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from math import pi
import matplotlib.patches as patches
from matplotlib.patches import Wedge, Circle
# Configure Chinese text rendering
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# Create the evaluation data
np.random.seed(42)
# Product performance dimensions
categories_radar = ['性能', '可靠性', '易用性', '功能性', '兼容性', '安全性', '效率', '可维护性']
n_cats = len(categories_radar)
# Scores for several products
products = {
'产品A': [85, 90, 75, 88, 82, 95, 78, 85],
'产品B': [78, 85, 88, 75, 90, 80, 85, 75],
'产品C': [92, 78, 80, 85, 75, 88, 90, 82],
'竞品均值': [80, 82, 78, 80, 80, 82, 80, 78]
}
# Create the composite figure
fig = plt.figure(figsize=(20, 14))
gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)
# 1. Radar chart
ax1 = fig.add_subplot(gs[0, 0], projection='polar')
angles = [n / n_cats * 2 * pi for n in range(n_cats)]
angles += angles[:1]
# Configure the radar axes
ax1.set_xticks(angles[:-1])
ax1.set_xticklabels(categories_radar, size=10)
ax1.set_ylim(0, 100)
ax1.set_yticks([20, 40, 60, 80, 100])
ax1.set_yticklabels(['20', '40', '60', '80', '100'], size=8)
ax1.grid(True)
# Plot each product's scores
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#95A5A6']
for (product, values), color in zip(products.items(), colors):
    closed_values = values + values[:1]  # close the polygon without mutating the lists in `products`
    ax1.plot(angles, closed_values, 'o-', linewidth=2, label=product, color=color)
    ax1.fill(angles, closed_values, alpha=0.15, color=color)
ax1.set_title('产品综合性能雷达图', size=12, fontweight='bold', pad=20)
ax1.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1), fontsize=9)
# 2. Donut chart
ax2 = fig.add_subplot(gs[0, 1])
sizes = [30, 25, 20, 15, 10]
labels = ['R&D', 'Marketing', 'Sales', 'Operations', 'Management']
colors_donut = ['#FF9999', '#66B2FF', '#99FF99', '#FFD700', '#FF99CC']
explode = (0.05, 0.05, 0.05, 0.05, 0.05)
# Draw a pie chart as the base of the donut
wedges, texts, autotexts = ax2.pie(sizes, labels=labels, colors=colors_donut,
                                   autopct='%1.1f%%', startangle=90,
                                   explode=explode, pctdistance=0.85)
# Cover the center with a white circle to turn the pie into a donut
centre_circle = Circle((0, 0), 0.60, fc='white')
ax2.add_artist(centre_circle)
# Style the label and percentage text
for text in texts:
    text.set_fontsize(10)
for autotext in autotexts:
    autotext.set_color('white')
    autotext.set_fontsize(9)
    autotext.set_fontweight('bold')
ax2.set_title('Department Budget Allocation', fontsize=12, fontweight='bold')
# Show the total in the center of the donut
ax2.text(0, 0, 'Total budget\n10 million', ha='center', va='center', fontsize=11, fontweight='bold')
# 3. Waterfall chart
ax3 = fig.add_subplot(gs[0, 2])
categories_waterfall = ['Initial value', 'Sales growth', 'Cost reduction', 'New markets',
                        'FX loss', 'Marketing spend', 'Final value']
values_waterfall = [100, 30, 20, 15, -10, -25, 0]
# The final value is the initial value plus all changes
values_waterfall[-1] = sum(values_waterfall[:-1])
# Running total after each step
cumulative = [values_waterfall[0]]
for v in values_waterfall[1:-1]:
    cumulative.append(cumulative[-1] + v)
cumulative.append(values_waterfall[-1])
# Colors: green for gains, red for losses, blue for the initial and final totals
colors_waterfall = ['#4CAF50' if v >= 0 else '#F44336' for v in values_waterfall]
colors_waterfall[0] = '#2196F3'   # initial value
colors_waterfall[-1] = '#2196F3'  # final value
x_pos = range(len(categories_waterfall))
for i in range(len(categories_waterfall)):
    if i == 0:
        # Initial total: full bar from zero
        ax3.bar(i, values_waterfall[i], color=colors_waterfall[i], alpha=0.8)
        ax3.text(i, values_waterfall[i]/2, f'{values_waterfall[i]}',
                 ha='center', va='center', fontweight='bold', color='white')
    elif i == len(categories_waterfall) - 1:
        # Final total: full bar from zero
        ax3.bar(i, cumulative[i], color=colors_waterfall[i], alpha=0.8)
        ax3.text(i, cumulative[i]/2, f'{cumulative[i]}',
                 ha='center', va='center', fontweight='bold', color='white')
    else:
        # Floating bar: starts at the lower of the previous and current running totals
        if values_waterfall[i] >= 0:
            bottom = cumulative[i-1]
        else:
            bottom = cumulative[i]
        ax3.bar(i, abs(values_waterfall[i]), bottom=bottom,
                color=colors_waterfall[i], alpha=0.8)
        ax3.text(i, bottom + abs(values_waterfall[i])/2, f'{values_waterfall[i]:+d}',
                 ha='center', va='center', fontweight='bold', color='white')
    # Connector across the gap between this bar and the next (default bar width is 0.8)
    if i < len(categories_waterfall) - 1:
        ax3.plot([i+0.4, i+0.6], [cumulative[i], cumulative[i]],
                 'k--', alpha=0.5, linewidth=1)
ax3.set_xticks(x_pos)
ax3.set_xticklabels(categories_waterfall, rotation=45, ha='right', fontsize=9)
ax3.set_ylabel('Value', fontsize=10)
ax3.set_title('Profit Waterfall Analysis', fontsize=12, fontweight='bold')
ax3.grid(True, axis='y', alpha=0.3)
# 4. Treemap
ax4 = fig.add_subplot(gs[1, :2])
ax4.set_xlim(0, 10)
ax4.set_ylim(0, 10)
ax4.axis('off')
# Market share data
sizes_tree = [40, 25, 20, 15]
labels_tree = ['Product A\n40%', 'Product B\n25%', 'Product C\n20%', 'Other\n15%']
colors_tree = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4']
# Hand-placed rectangles as (x, y, width, height); areas are not computed from sizes_tree
positions = [(2, 7, 4, 2), (6, 7, 3, 2), (2, 4, 4, 2), (6, 4, 3, 2)]
for (x, y, w, h), label, color in zip(positions, labels_tree, colors_tree):
    rect = patches.Rectangle((x, y), w, h, linewidth=2,
                             edgecolor='black', facecolor=color, alpha=0.7)
    ax4.add_patch(rect)
    ax4.text(x + w/2, y + h/2, label, ha='center', va='center',
             fontsize=11, fontweight='bold', color='white')
ax4.set_title('Market Share Treemap', fontsize=12, fontweight='bold', y=1.02)
# 5. Sankey diagram (simplified)
ax5 = fig.add_subplot(gs[1, 2])
ax5.set_xlim(0, 10)
ax5.set_ylim(0, 10)
ax5.axis('off')
# Source nodes
sources = ['Raw materials', 'Labor', 'Equipment']
source_y = [7, 5, 3]
source_values = [40, 35, 25]
# Target nodes
targets = ['Product 1', 'Product 2']
target_y = [6, 3]
target_values = [60, 40]
# Draw the nodes as labeled rectangles
for source, y, val in zip(sources, source_y, source_values):
    ax5.add_patch(patches.Rectangle((1, y-0.3), 1, 0.6,
                                    facecolor='lightblue', edgecolor='black'))
    ax5.text(1.5, y, f'{source}\n{val}%', ha='center', va='center', fontsize=9)
for target, y, val in zip(targets, target_y, target_values):
    ax5.add_patch(patches.Rectangle((7, y-0.3), 1, 0.6,
                                    facecolor='lightgreen', edgecolor='black'))
    ax5.text(7.5, y, f'{target}\n{val}%', ha='center', va='center', fontsize=9)
# Draw the flows as thick cubic Bezier curves (an approximation of Sankey bands)
from matplotlib.patches import PathPatch
from matplotlib.path import Path
# Each connection: (source y, target y, relative width, color)
connections = [
    (source_y[0], target_y[0], 0.3, 'lightblue'),
    (source_y[0], target_y[1], 0.2, 'lightcoral'),
    (source_y[1], target_y[0], 0.25, 'lightyellow'),
    (source_y[2], target_y[1], 0.15, 'lightgreen')
]
for source_y_pos, target_y_pos, width, color in connections:
    # Start point, two control points, end point of one cubic Bezier segment
    vertices = [(2, source_y_pos), (5, source_y_pos),
                (5, target_y_pos), (7, target_y_pos)]
    codes = [Path.MOVETO, Path.CURVE4, Path.CURVE4, Path.CURVE4]
    path = Path(vertices, codes)
    patch = PathPatch(path, facecolor='none', edgecolor=color, linewidth=width*50, alpha=0.5)
    ax5.add_patch(patch)
ax5.set_title('Resource Flow Sankey Diagram', fontsize=12, fontweight='bold')
# 6. Heatmap matrix
ax6 = fig.add_subplot(gs[2, :])
# Build a random symmetric collaboration matrix (illustrative data)
departments = ['R&D', 'Sales', 'Marketing', 'Operations', 'Finance', 'HR', 'Technology', 'Customer service']
n_deps = len(departments)
collaboration_matrix = np.random.rand(n_deps, n_deps)
collaboration_matrix = (collaboration_matrix + collaboration_matrix.T) / 2
np.fill_diagonal(collaboration_matrix, 1)
# Draw the heatmap
im = ax6.imshow(collaboration_matrix, cmap='YlOrRd', aspect='auto', vmin=0, vmax=1)
# Annotate each cell; YlOrRd is dark at high values, so use white text there
for i in range(n_deps):
    for j in range(n_deps):
        ax6.text(j, i, f'{collaboration_matrix[i, j]:.2f}',
                 ha="center", va="center",
                 color="white" if collaboration_matrix[i, j] > 0.5 else "black",
                 fontsize=8)
ax6.set_xticks(np.arange(n_deps))
ax6.set_yticks(np.arange(n_deps))
ax6.set_xticklabels(departments, rotation=45, ha='right', fontsize=9)
ax6.set_yticklabels(departments, fontsize=9)
ax6.set_title('Department Collaboration Intensity Heatmap', fontsize=12, fontweight='bold')
# Add a colorbar
cbar = plt.colorbar(im, ax=ax6, fraction=0.046, pad=0.04)
cbar.set_label('Collaboration intensity', fontsize=10)
plt.suptitle('Advanced Visualization Techniques Showcase', fontsize=16, fontweight='bold', y=0.98)
plt.tight_layout()
plt.savefig('advanced_visualization_techniques.png', dpi=300, bbox_inches='tight')
plt.show()
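The Sankey panel above is a hand-drawn approximation built from rectangles and Bezier curves. As an alternative, Matplotlib ships a `matplotlib.sankey.Sankey` class. The sketch below feeds it the same percentages; the orientations and the `scale=0.01` setting are choices made here for illustration, not something the article prescribes. Note that this class draws inflows and outflows through a single trunk, so unlike the hand-drawn version it does not show which input feeds which product.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.sankey import Sankey

fig, ax = plt.subplots(figsize=(8, 5))
ax.axis("off")

# scale=0.01 maps percentage values onto a total of about 1.0 for the layout
sankey = Sankey(ax=ax, scale=0.01, unit="%", format="%.0f")
sankey.add(
    # Positive values are inputs, negative values are outputs; they sum to zero
    flows=[40, 35, 25, -60, -40],
    labels=["Raw materials", "Labor", "Equipment", "Product 1", "Product 2"],
    # 1 = from/to the top, 0 = horizontal, -1 = from/to the bottom
    orientations=[1, 0, -1, 0, -1],
)
diagrams = sankey.finish()  # list with one entry per add() call
```

Each entry in `diagrams` exposes the drawn patch and its text labels, so colors and fonts can be adjusted after `finish()` if needed.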
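Conclusion
In this article we have worked through Matplotlib systematically, from basic configuration to advanced techniques and from simple charts to complex composite visualizations. As the foundation of data visualization in Python, Matplotlib offers the power and flexibility to support a very wide range of work. Whether the task is exploratory data analysis, professional report figures, or a complex visualization dashboard, Matplotlib can meet the need.
Data visualization is an art as much as a technique. It requires not only knowing how to use the tools but also understanding the patterns in the data, choosing a suitable visual encoding, and designing a clear, attractive layout. In practice, we need to apply these techniques flexibly according to the characteristics of the data, the goal of the analysis, and the needs of the audience, so that the resulting data story is both accurate and insightful.
As data science develops, visualization techniques keep evolving. Interactive visualization, real-time data display, 3D visualization, and other newer approaches continue to expand what is possible. However the technology changes, solid fundamentals and sound design principles remain the basis of good visualization work. We hope this article helps readers build a complete working knowledge of Matplotlib and go further in data visualization.
Finally, data visualization is a field that rewards continued practice and exploration. Every dataset has its own story, and every analysis scenario has its own requirements. With steady practice and experimentation, we can improve our skills and produce more professional and more persuasive visualizations. May every reader find their own way of expressing data and tell compelling stories with charts.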