Published: 2025-08-23


From Curve Fitting to Machine Learning (Intelligent Systems Reference Library)
# Input Component Relevance, Pattern Recognition, and Technique Optimization in Machine Learning
## 1. Input Component Relevance Analysis
Input is the necessary basis of every machine learning task. An input is itself a mathematical vector that encodes information in a form suited to the task. From a practitioner's point of view, the input should contain all useful information, i.e. every input component should have a precise scientific meaning. Whether an individual input component actually matters for the machine learning task, however, is hard to judge in advance; this raises the question of input component relevance.
### 1.1 Iris Flower Classification Example
Consider iris flower classification: each input consists of four components, namely sepal length (component 1), sepal width (component 2), petal length (component 3), and petal width (component 4). With a small perceptron of only two hidden neurons and an optimized training set (training fraction 24%), a satisfying classification result is obtained:
```mathematica
Clear["Global`*"];
<<CIP`ExperimentalData`
<<CIP`Perceptron`
<<CIP`Graphics`
<<CIP`DataTransformation`
classificationDataSet = CIP`ExperimentalData`GetIrisFlowerClassificationDataSet[];
numberOfHiddenNeurons = 2;
trainingFraction = 0.24;
numberOfTrainingSetOptimizationSteps = 20;
blackListLength = 20;
perceptronTrainOptimization = CIP`Perceptron`GetPerceptronTrainOptimization[
  classificationDataSet, numberOfHiddenNeurons, trainingFraction,
  numberOfTrainingSetOptimizationSteps,
  UtilityOptionBlackListLength -> blackListLength];
CIP`Perceptron`ShowPerceptronTrainOptimization[perceptronTrainOptimization];
bestIndex = CIP`Perceptron`GetBestPerceptronClassOptimization[perceptronTrainOptimization];
trainingAndTestSet = perceptronTrainOptimization[[3, bestIndex]];
trainingSet = trainingAndTestSet[[1]];
testSet = trainingAndTestSet[[2]];
perceptronInfo = perceptronTrainOptimization[[4, bestIndex]];
CIP`Perceptron`ShowPerceptronClassificationResult[
  {"CorrectClassification"}, trainingAndTestSet, perceptronInfo];
```
The training set is classified 100% correctly, and the test set 99.1% correctly.
### 1.2 Determining Input Component Relevance
Input component relevance can be determined by removing the input components one at a time and reevaluating the machine learning task. For the iris classification the removal order is {2, 1, 3}: component 2 (sepal width) contributes least to the classification task, followed by component 1 (sepal length) and component 3 (petal length); the most influential component is component 4 (petal width).
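The training-set optimization above repeatedly recomposes the train/test split and keeps the split whose trained model performs best on the test data. A minimal pure-Python sketch of that idea, with a nearest-centroid classifier on toy one-dimensional data standing in for the CIP perceptron (all names and data here are illustrative, not part of CIP):

```python
# Illustrative sketch: draw random train/test splits and keep the split
# whose trained model scores best on its test set. A nearest-centroid
# classifier stands in for the perceptron used in the text.
import random

def nearest_centroid_fit(train):
    # train: list of (value, label); centroid = mean value per class
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def accuracy(centroids, data):
    correct = 0
    for x, y in data:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += (pred == y)
    return correct / len(data)

def optimize_split(data, training_fraction, steps, seed=0):
    rng = random.Random(seed)
    n_train = int(len(data) * training_fraction)
    best = None
    for _ in range(steps):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        if len({y for _, y in train}) < 2:
            continue  # skip splits that miss a whole class
        score = accuracy(nearest_centroid_fit(train), test)
        if best is None or score > best[0]:
            best = (score, train, test)
    return best

# Two well-separated classes: any valid split classifies perfectly.
data = [(x, "A") for x in (1.0, 1.2, 1.4, 1.6)] + \
       [(x, "B") for x in (5.0, 5.2, 5.4, 5.6)]
best_score, best_train, best_test = optimize_split(data, 0.5, 20)
```

CIP additionally maintains a black list of already-tried splits (the `blackListLength` option above); this sketch omits that bookkeeping.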
```mathematica
perceptronInputRelevanceClass = CIP`Perceptron`GetPerceptronInputRelevanceClass[
  trainingAndTestSet, numberOfHiddenNeurons];
CIP`Perceptron`ShowPerceptronInputRelevanceClass[perceptronInputRelevanceClass];
```
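The remove-and-reevaluate procedure can also be sketched outside of CIP. The following pure-Python illustration drops one component at a time from toy two-component data, refits a nearest-centroid model (a stand-in for the perceptron), and records the accuracy lost; components whose removal costs little accuracy are the least relevant. All names and data are illustrative:

```python
# Illustrative sketch of input component relevance by removal:
# component 0 of the toy data is pure noise, component 1 separates
# the classes, so removing component 0 should cost no accuracy.

def fit_centroids(data):
    # data: list of (vector, label) -> per-class mean vector
    acc = {}
    for x, y in data:
        s, n = acc.get(y, ([0.0] * len(x), 0))
        acc[y] = ([a + b for a, b in zip(s, x)], n + 1)
    return {y: [v / n for v in s] for y, (s, n) in acc.items()}

def accuracy(data):
    centroids = fit_centroids(data)
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    hits = sum(min(centroids, key=lambda c: dist(x, centroids[c])) == y
               for x, y in data)
    return hits / len(data)

def relevance_by_removal(data):
    # drop each component in turn; a small accuracy loss means low relevance
    base = accuracy(data)
    n_components = len(data[0][0])
    return {i: base - accuracy([(x[:i] + x[i + 1:], y) for x, y in data])
            for i in range(n_components)}

data = [([0.5, 1.0], "A"), ([0.4, 1.2], "A"), ([0.6, 1.1], "A"),
        ([0.5, 5.0], "B"), ([0.4, 5.2], "B"), ([0.6, 5.1], "B")]
losses = relevance_by_removal(data)
```

Here `losses[0]` is zero (the noise component is irrelevant) while `losses[1]` is large, mirroring how sepal width ranks below petal width in the iris example.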
### 1.3 Visualizing the Petal Width
The petal width values of the three iris species can be visualized to show their high predictive power.
```mathematica
classIndex = 1;
class1TrainingSet = CIP`DataTransformation`GetSpecificClassDataSubSet[
trainingSet, classIndex];
class1TrainingInputs = class1TrainingSet[[All, 1]];
class1TrainingPetalWidths = class1TrainingInputs[[All, 4]];
class1TestSet = CIP`DataTransformation`GetSpecificClassDataSubSet[
testSet, classIndex];
class1TestInputs = class1TestSet[[All, 1]];
class1TestPetalWidths = class1TestInputs[[All, 4]];
classIndex = 2;
class2TrainingSet = CIP`DataTransformation`GetSpecificClassDataSubSet[
trainingSet, classIndex];
class2TrainingInputs = class2TrainingSet[[All, 1]];
class2TrainingPetalWidths = class2TrainingInputs[[All, 4]];
class2TestSet = CIP`DataTransformation`GetSpecificClassDataSubSet[
testSet, classIndex];
class2TestInputs = class2TestSet[[All, 1]];
class2TestPetalWidths = class2TestInputs[[All, 4]];
classIndex = 3;
class3TrainingSet = CIP`DataTransformation`GetSpecificClassDataSubSet[
trainingSet, classIndex];
class3TrainingInputs = class3TrainingSet[[All, 1]];
class3TrainingPetalWidths = class3TrainingInputs[[All, 4]];
class3TestSet = CIP`DataTransformation`GetSpecificClassDataSubSet[
testSet, classIndex];
class3TestInputs = class3TestSet[[All, 1]];
class3TestPetalWidths = class3TestInputs[[All, 4]];
class1TrainingPetalWidthPoints = Table[{class1TrainingPetalWidths[[i]], 1.2},
{i, Length[class1TrainingPetalWidths]}];
class1TestPetalWidthPoints = Table[{class1TestPetalWidths[[i]], 1.1},
{i, Length[class1TestPetalWidths]}];
class2TrainingPetalWidthPoints = Table[{class2TrainingPetalWidths[[i]], 1.2},
{i, Length[class2TrainingPetalWidths]}];
class2TestPetalWidthPoints = Table[{class2TestPetalWidths[[i]], 1.1},
{i, Length[class2TestPetalWidths]}];
class3TrainingPetalWidthPoints = Table[{class3TrainingPetalWidths[[i]], 1.2},
{i, Length[class3TrainingPetalWidths]}];
class3TestPetalWidthPoints = Table[{class3TestPetalWidths[[i]], 1.1},
{i, Length[class3TestPetalWidths]}];
class1TrainingPoints2DWithPlotStyle = {class1TrainingPetalWidthPoints,
{PointSize[0.03], Opacity[0.3, Black]}};
class1TestPoints2DWithPlotStyle = {class1TestPetalWidthPoints,
{PointSize[0.03], Opacity[0.3, Black]}};
class2TrainingPoints2DWithPlotStyle = {class2TrainingPetalWidthPoints,
{PointSize[0.03], Opacity[0.3, Blue]}};
class2TestPoints2DWithPlotStyle = {class2TestPetalWidthPoints,
{PointSize[0.03], Opacity[0.3, Blue]}};
class3TrainingPoints2DWithPlotStyle = {class3TrainingPetalWidthPoints,
{PointSize[0.03], Opacity[0.3, Red]}};
class3TestPoints2DWithPlotStyle = {class3TestPetalWidthPoints,
{PointSize[0.03], Opacity[0.3, Red]}};
points2DWithPlotStyleList = {class1TrainingPoints2DWithPlotStyle,
class1TestPoints2DWithPlotStyle,
class2TrainingPoints2DWithPlotStyle,
class2TestPoints2DWithPlotStyle,
class3TrainingPoints2DWithPlotStyle,
class3TestPoints2DWithPlotStyle};
labels = {"Petal width [mm]", "Output",
"Class 1 (black), 2 (blue), 3 (red)"};
argumentRange = {0.0, 26.0};
functionValueRange = {-0.1, 1.3};
pointGraphics = CIP`Graphics`PlotMultiple2dPoints[
points2DWithPlotStyleList, labels,
GraphicsOptionArgumentRange2D -> argumentRange,
GraphicsOptionFunctionValueRange2D -> functionValueRange];
```
### 1.4 Classification After Removing Input Components
After removing input components 1 to 3, a perceptron fit with the training set still achieves a high predictivity:
```mathematica
inputComponentsToBeRemoved = {1, 2, 3};
reducedTrainingSet = CIP`DataTransformation`Remo
```
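The CIP call above is cut off in the source, so only its intent is recoverable: it removes the listed components from every input vector of the data set. A hypothetical pure-Python stand-in for that reduction step (the function name and data below are illustrative, not CIP's API):

```python
# Illustrative stand-in for a "remove input components" transformation:
# keep only the components not listed, preserving the outputs.

def remove_components(dataset, components_to_remove):
    # dataset: list of (input_vector, output); components are 1-indexed
    # to match the numbering used in the text.
    drop = set(components_to_remove)
    return [([v for j, v in enumerate(x, start=1) if j not in drop], y)
            for x, y in dataset]

iris_like = [([5.1, 3.5, 1.4, 0.2], "setosa"),
             ([6.7, 3.1, 4.7, 1.5], "versicolor")]
reduced = remove_components(iris_like, [1, 2, 3])
# Only component 4 (petal width) remains in each input.
```

With only the petal width left, the iris classification still works well, which is exactly what the relevance ranking {2, 1, 3} with component 4 last predicts.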