
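Applying a CatBoost-SHAP Ensemble Model to Regression Problems, with Variable Interpretation Analysis
Here is the complete code up front (a hands-on Boston housing price prediction example):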
```python
from catboost import CatBoostRegressor, Pool
import shap
import pandas as pd
import matplotlib.pyplot as plt
# Load the data (the last column is the regression target)
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
X = data.iloc[:,:-1]
y = data.iloc[:,-1]
# Train CatBoost
model = CatBoostRegressor(
    iterations=300,
    depth=5,
    learning_rate=0.1,
    verbose=0,
)
cat_features = list(X.select_dtypes(include='object').columns)
model.fit(X, y, cat_features=cat_features)
# The SHAP magic begins
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(Pool(X, cat_features=cat_features))
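# Global feature-importance overview
plt.figure(figsize=(10, 6))
# show=False keeps the figure open so the title below lands on this plot;
# plot_size=None stops summary_plot from overriding the figsize set above
shap.summary_plot(shap_values, X, plot_type="bar", show=False, plot_size=None)
plt.title('Feature Impact Ranking')
plt.tight_layout()
plt.show()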
# Single-sample decision walkthrough