Discriminant Analysis in R: Lab Summary

Original post, last modified 2022-10-18 12:47:00


License: CC 4.0 BY-SA

Tags: #R language #programming languages #machine learning

First published 2018-12-04 21:27:17


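This post covers basic statistical analysis, a logistic model, and Fisher, distance and Bayes discriminant analysis. Worked examples evaluate each method through plots, interpretation of model coefficients, and prediction accuracy.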


1 Basic Statistical Analysis

------Graphical analysis

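d6.1 = read.table('clipboard', header = TRUE)  # read data copied to the clipboard (Windows); columns x1, x2, G
boxplot(x1 ~ G, data = d6.1)  # distribution of x1 within each group
t.test(x1 ~ G, data = d6.1)   # Welch two-sample t-test for x1
boxplot(x2 ~ G, data = d6.1)
t.test(x2 ~ G, data = d6.1)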


Welch Two Sample t-test

data:  x1 by G
t = 0.59897, df = 11.671, p-value = 0.5606
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.443696  6.043696
sample estimates:
mean in group 1 mean in group 2 
           0.92           -0.38 
           
Welch Two Sample t-test

data:  x2 by G
t = -3.2506, df = 17.655, p-value = 0.004527
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.118792  -2.381208
sample estimates:
mean in group 1 mean in group 2 
           2.10            8.85
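The Welch statistic and its fractional degrees of freedom can be reproduced by hand. A minimal sketch, assuming simulated stand-in data (the d6.1 values are not listed in this post; the seed, group sizes and distributions below are arbitrary), checked against t.test():

```r
# Simulated stand-in data: 10 observations per group (an assumption, not d6.1)
set.seed(1)
d <- data.frame(x = c(rnorm(10, mean = 2), rnorm(10, mean = 8, sd = 3)),
                G = rep(1:2, each = 10))

a <- d$x[d$G == 1]
b <- d$x[d$G == 2]
va <- var(a) / length(a)  # squared standard error of the group-1 mean
vb <- var(b) / length(b)  # squared standard error of the group-2 mean

t_manual <- (mean(a) - mean(b)) / sqrt(va + vb)
# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

res <- t.test(x ~ G, data = d)  # Welch test is the default (var.equal = FALSE)
c(t = t_manual, df = df_manual)
c(res$statistic, res$parameter)
```

This is why the df in the output above (11.671, 17.655) is not an integer.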

2 Logistic Model Analysis

# G is coded 1/2; I(G - 1) recodes it to 0/1, so the model describes P(G = 2)
summary(glm(I(G - 1) ~ x1 + x2, family = binomial, data = d6.1))

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.81637  -0.63629   0.04472   0.54520   2.13957  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  -2.0761     1.1082  -1.873   0.0610 .
x1           -0.1957     0.1457  -1.344   0.1791  
x2            0.3813     0.1681   2.269   0.0233 *
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 27.726  on 19  degrees of freedom
Residual deviance: 17.036  on 17  degrees of freedom
AIC: 23.036

Number of Fisher Scoring iterations: 5
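To see what the coefficients mean for a single sample, the linear predictor can be pushed through the logistic function. A minimal sketch using the rounded estimates printed above and, as an illustrative choice, the same point (x1 = 8.1, x2 = 2.0) that is classified with qda in 4.1.5:

```r
# Coefficients copied from the summary() output above
b <- c(intercept = -2.0761, x1 = -0.1957, x2 = 0.3813)
x_new <- c(x1 = 8.1, x2 = 2.0)  # illustrative point, reused from 4.1.5

eta <- b[["intercept"]] + b[["x1"]] * x_new[["x1"]] + b[["x2"]] * x_new[["x2"]]
p2 <- 1 / (1 + exp(-eta))  # P(G = 2 | x), because the response was coded G - 1
p2
plogis(eta)             # same value via R's logistic distribution function
ifelse(p2 > 0.5, 2, 1)  # classify with a 0.5 cutoff
```

This gives eta of about -2.899 and a probability of about 0.052 for group 2, so the point goes to group 1. With the fitted glm object, predict(fit, newdata = data.frame(x1 = 8.1, x2 = 2.0), type = "response") returns the same probability up to the rounding of the printed coefficients.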

3 Fisher Discriminant Analysis

3.1 Using the lda function

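lda(formula, data, ...)
formula: a formula of the form y ~ x1 + x2 + ..., where y is the grouping variable; data: the data frame containing the variables. lda() is provided by the MASS package.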

3.2 Marking the classes

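# visual inspection
attach(d6.1)  # attach the data frame so x1, x2 and G can be used by name
plot(x1, x2)
text(x1, x2, G, adj = -0.5)  # label each point with its class G
library(MASS)
ld = lda(G ~ x1 + x2)  # Fisher linear discriminant
ld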

Call:
lda(G ~ x1 + x2)

Prior probabilities of groups:
  1   2 
0.5 0.5 

Group means:
     x1   x2
1  0.92 2.10
2 -0.38 8.85

Coefficients of linear discriminants:
          LD1
x1 -0.1035305
x2  0.2247957

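The Fisher discriminant function is $y = a_1 x_1 + a_2 x_2$; with the LD1 coefficients above, $y = -0.1035 x_1 + 0.2248 x_2$ (predict() centers the data before scoring, so its LD1 values differ from this by a constant).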
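For two groups, Fisher's coefficient vector is proportional to $S_w^{-1}(\mu_1 - \mu_2)$, where $S_w$ is the pooled within-group scatter matrix. A minimal sketch checking that lda()'s LD1 scaling points in that direction; it assumes two species from the built-in iris data as a stand-in for d6.1:

```r
library(MASS)

# Stand-in data (assumption): two iris species, two measurements
d <- droplevels(subset(iris, Species != "setosa"))
vars <- c("Sepal.Length", "Sepal.Width")
X1 <- as.matrix(d[d$Species == "versicolor", vars])
X2 <- as.matrix(d[d$Species == "virginica", vars])

# Pooled within-group scatter matrix S_w = S_1 + S_2
Sw <- cov(X1) * (nrow(X1) - 1) + cov(X2) * (nrow(X2) - 1)
# Fisher direction: proportional to S_w^{-1} (mu_1 - mu_2)
a <- solve(Sw, colMeans(X1) - colMeans(X2))

fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = d)
ratio <- a / fit$scaling[, 1]  # element-wise ratio; constant if the directions agree
ratio
```

The two ratios are equal (possibly negative), which shows that lda() only rescales the Fisher direction; the overall scale and sign of the coefficients carry no meaning of their own.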

3.3 Predicting class membership

lp = predict(ld)   # back-substitute the training samples
G1 = lp$class      # predicted class of each sample
data.frame(G, G1)  # actual vs. predicted

3.4 Evaluating the result

# cross-tabulate actual (G) against predicted (G1) classes
tab1 = table(G, G1)
tab1

#   G1
#G   1 2
#  1 9 1
#  2 1 9

# agreement rate (proportion correctly classified)
sum(diag(prop.table(tab1)))
#0.9
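The agreement rate is the sum of the diagonal of the confusion matrix divided by the total count. A minimal sketch that rebuilds tab1 from the counts printed above and also reports the error rate and the per-group correct rates (the latter two are extra summaries, not part of the original analysis):

```r
# Confusion matrix rebuilt from the printed counts (rows: actual G, columns: predicted G1)
tab <- as.table(matrix(c(9, 1,
                         1, 9), nrow = 2, byrow = TRUE,
                       dimnames = list(G = c("1", "2"), G1 = c("1", "2"))))

acc <- sum(diag(tab)) / sum(tab)       # same as sum(diag(prop.table(tab)))
err <- 1 - acc                         # resubstitution error rate
per_class <- diag(tab) / rowSums(tab)  # correct rate within each actual group
acc; err; per_class
```

Because these rates are computed on the same samples used to fit the model, they tend to be optimistic.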

4 Distance Discriminant Analysis

4.1 Distance discrimination for two populations

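**Mahalanobis distance** (squared):
$D^2(X,G_i) = (X-\mu_i)'\Sigma_i^{-1}(X-\mu_i)$

X is assigned to the population with the smaller distance.

Equal covariance matrices-------linear discriminant (straight-line boundary)
Unequal covariance matrices-------quadratic discriminant (curved boundary)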
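The distance rule can be written directly with stats::mahalanobis(), which returns the squared distance. A minimal sketch of the unequal-covariance case, assuming two iris species as stand-in data and a hypothetical new observation x0:

```r
# Stand-in data (assumption): two iris species, two measurements
d <- droplevels(subset(iris, Species != "setosa"))
vars <- c("Sepal.Length", "Petal.Length")
x0 <- c(Sepal.Length = 6.0, Petal.Length = 4.9)  # hypothetical new observation

# Squared Mahalanobis distance from x0 to each group, using each group's own
# mean and covariance matrix (the unequal-covariance case)
d2 <- sapply(levels(d$Species), function(l) {
  Xi <- as.matrix(d[d$Species == l, vars])
  mahalanobis(x0, center = colMeans(Xi), cov = cov(Xi))
})

# The same quantity written out as (x - mu)' Sigma^{-1} (x - mu) for the first group
X1 <- as.matrix(d[d$Species == levels(d$Species)[1], vars])
diff1 <- x0 - colMeans(X1)
d2_manual <- drop(t(diff1) %*% solve(cov(X1)) %*% diff1)

d2
names(which.min(d2))  # assign x0 to the group with the smaller distance
```

Replacing each group's cov(Xi) with one pooled covariance matrix gives the equal-covariance (linear) version of the same rule.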

4.1.1 Using qda, the quadratic discriminant function

qda(formula, data, ...)
formula: a formula of the form groups ~ x1 + x2 + ...; data: a data frame. qda() is also provided by MASS.

4.1.2 Discrimination

# nonlinear (quadratic) discriminant model
qd = qda(G ~ x1 + x2); qd

Call:
qda(G ~ x1 + x2)

Prior probabilities of groups:
  1   2 
0.5 0.5 

Group means:
     x1   x2
1  0.92 2.10
2 -0.38 8.85

4.1.3 Back-substitution

qp = predict(qd)
G2 = qp$class          # quadratic-discriminant predictions
data.frame(G, G1, G2)  # actual, linear and quadratic predictions side by side

4.1.4 Evaluating the result

tab2 = table(G, G2); tab2
#   G2
#G   1 2
#  1 9 1
#  2 2 8
sum(diag(prop.table(tab2)))
#[1] 0.85

4.1.5 Prediction

predict(qd,data.frame(x1=8.1,x2=2.0))
#$`class`
#[1] 1
#Levels: 1 2
#
#$posterior
#          1           2
#1 0.9939952 0.006004808
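The posterior printed by predict() for qda follows from the quadratic discriminant score $\log \pi_k - \frac{1}{2}\log|\Sigma_k| - \frac{1}{2}D_k^2(x)$, normalized over groups. A minimal sketch checking this against qda(), assuming two iris species as stand-in data and a hypothetical new point:

```r
library(MASS)

# Stand-in data (assumption): two iris species, two measurements
d <- droplevels(subset(iris, Species != "setosa"))
vars <- c("Sepal.Length", "Petal.Length")
fit <- qda(Species ~ Sepal.Length + Petal.Length, data = d)
new <- data.frame(Sepal.Length = 6.0, Petal.Length = 4.9)  # hypothetical point
x0 <- unlist(new[1, vars])

# Quadratic discriminant score for each group k:
#   log(prior_k) - 0.5 * log|Sigma_k| - 0.5 * D_k^2(x0)
score <- sapply(levels(d$Species), function(l) {
  Xk <- as.matrix(d[d$Species == l, vars])
  Sk <- cov(Xk)
  log(fit$prior[[l]]) - 0.5 * log(det(Sk)) -
    0.5 * mahalanobis(x0, colMeans(Xk), Sk)
})
# Softmax of the scores (subtracting the max for numerical stability)
post_manual <- exp(score - max(score)) / sum(exp(score - max(score)))

post_qda <- predict(fit, new)$posterior[1, ]
rbind(manual = post_manual, qda = post_qda)
```

The log-determinant term is what makes the boundary curved: it differs between groups only when their covariance matrices differ.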

4.2 Distance discrimination for multiple populations

4.2.1 Training

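> d6.3 = read.table('clipboard', header = TRUE)
> rm(G2)  # drop the G2 predictions from 4.1.3; a global G2 would mask the G2 column of the attached d6.3
> attach(d6.3)
> ld3 = lda(G2~Q+C+P)
> ld3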
Call:
lda(G2 ~ Q + C + P)

Prior probabilities of groups:
   1    2    3 
0.25 0.40 0.35 

Group means:
         Q        C      P
1 8.400000 5.900000 48.200
2 7.712500 7.250000 69.875
3 5.957143 3.714286 34.000

Coefficients of linear discriminants:
          LD1         LD2
Q -0.81173396  0.88406311
C -0.63090549  0.20134565
P  0.01579385 -0.08775636

Proportion of trace:
   LD1    LD2 
0.7403 0.2597
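The "Proportion of trace" line is derived from the singular values stored in the fitted object (fit$svd, the ratio of between- to within-group standard deviation on each discriminant). As I understand the MASS print method, it is svd^2 / sum(svd^2). A minimal sketch, assuming the built-in iris data as a stand-in for d6.3:

```r
library(MASS)

# Stand-in data (assumption): iris, 3 species, 4 measurements
fit <- lda(Species ~ ., data = iris)

# Share of between-group separation carried by each discriminant
prop_trace <- fit$svd^2 / sum(fit$svd^2)
names(prop_trace) <- colnames(fit$scaling)
round(prop_trace, 4)
```

Read the same way, LD1 in ld3 carries about 74% of the separation among the three groups and LD2 the remaining 26%.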

4.2.2 Comparing predictions

> lp3 = predict(ld3)
> lG3 = lp3$class
> data.frame(G2,lG3)
   G2 lG3
1   1   1
2   1   1
3   1   1
4   1   1
5   1   1
6   2   1
7   2   2
8   2   2
9   2   2
10  2   2
11  2   2
12  2   2
13  2   3
14  3   3
15  3   3
16  3   3
17  3   3
18  3   3
19  3   3
20  3   3

4.2.3 Evaluating the result

> ltab3 = table(G2, lG3)
> ltab3
   lG3
G2  1 2 3
  1 5 0 0
  2 1 6 1
  3 0 0 7

4.2.4 Prediction

> predict(ld3, data.frame(Q = 8, C = 7.5, P = 6.5))
	$`class`
	[1] 1
	Levels: 1 2 3
	
	$posterior
	          1            2            3
	1 0.9999632 3.640207e-05 4.438143e-07
	
	$x
	        LD1      LD2
	1 -2.461009 4.996961
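The $x component holds the sample's coordinates on LD1 and LD2. As I understand predict.lda, these are obtained by centering the data at the prior-weighted average of the group means and multiplying by the scaling matrix. A minimal sketch checking that, assuming iris as stand-in data and three of its rows treated as "new" samples:

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)  # stand-in data (assumption)
new <- iris[c(1, 51, 101), 1:4]       # three existing rows treated as new samples

# Center at the prior-weighted mean of the group means, then project
center <- colSums(fit$prior * fit$means)
scores_manual <- scale(as.matrix(new), center = center, scale = FALSE) %*% fit$scaling

scores_pred <- predict(fit, new)$x
scores_manual
scores_pred
```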

5 Bayes Discriminant Analysis

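5.1 Discriminant analysis

The prior probabilities are passed explicitly through the prior argument. Here they are set to the group proportions 5/20, 8/20 and 7/20, which is what lda() uses by default (compare the priors 0.25, 0.40, 0.35 printed in 4.2.1), so the classification below matches 4.2.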

> ld42 = lda(G2~Q+C+P,prior = c(5,8,7)/20)
> Z = predict(ld42)
> data.frame(G2,ld42G=Z$class)
   G2 ld42G
1   1     1
2   1     1
3   1     1
4   1     1
5   1     1
6   2     1
7   2     2
8   2     2
9   2     2
10  2     2
11  2     2
12  2     2
13  2     3
14  3     3
15  3     3
16  3     3
17  3     3
18  3     3
19  3     3
20  3     3
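Under normality with a common covariance matrix, the Bayes posterior is proportional to prior times $\exp(-D_k^2/2)$, where $D_k^2$ is the squared Mahalanobis distance based on the pooled within-group covariance. A minimal sketch checking this against lda() with non-default priors; iris, the priors c(0.2, 0.5, 0.3) and the new point are all illustrative assumptions:

```r
library(MASS)

# Stand-in data (assumption): iris, three species, two measurements
vars <- c("Sepal.Length", "Sepal.Width")
X <- as.matrix(iris[, vars])
g <- iris$Species
prior <- c(0.2, 0.5, 0.3)  # hypothetical prior probabilities
new <- data.frame(Sepal.Length = 6.0, Sepal.Width = 3.0)  # hypothetical point
x0 <- unlist(new[1, ])

# Group means and pooled within-group covariance (divisor n - k)
means <- t(sapply(levels(g), function(l) colMeans(X[g == l, ])))
S <- crossprod(X - means[as.integer(g), ]) / (nrow(X) - nlevels(g))

# Bayes rule: posterior_k proportional to prior_k * exp(-D_k^2 / 2)
d2 <- apply(means, 1, function(m) mahalanobis(x0, m, S))
w <- prior * exp(-0.5 * d2)
post_manual <- w / sum(w)

fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris, prior = prior)
post_lda <- predict(fit, new)$posterior[1, ]
rbind(manual = post_manual, lda = post_lda)
```

The normalizing constants of the normal densities cancel because every group shares the same covariance matrix.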

5.2 Evaluating the result

> tab4 = table(G2, Z$class)  # avoid the name T, which masks the TRUE shorthand
> tab4
   
G2  1 2 3
  1 5 0 0
  2 1 6 1
  3 0 0 7
> sum(diag(tab4))/sum(tab4)
[1] 0.9
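The 0.9 matches the accuracy in 4.2 because the explicit priors equal the observed group proportions, which lda() uses when prior is omitted. A minimal sketch confirming this on stand-in data with unequal group sizes (rows of iris chosen to give 20, 40 and 50 samples, an illustrative assumption):

```r
library(MASS)

# Stand-in data (assumption): unequal group sizes 20, 40, 50
d <- iris[c(1:20, 51:90, 101:150), ]
fit_default <- lda(Species ~ Sepal.Length + Sepal.Width, data = d)
fit_prior <- lda(Species ~ Sepal.Length + Sepal.Width, data = d,
                 prior = c(20, 40, 50) / 110)

fit_default$prior  # default prior: the observed group proportions
same_class <- identical(predict(fit_default, d)$class, predict(fit_prior, d)$class)
same_post <- isTRUE(all.equal(predict(fit_default, d)$posterior,
                              predict(fit_prior, d)$posterior))
c(same_class = same_class, same_post = same_post)
```

The priors only change the result when they differ from the sample proportions, for example equal priors with unequal group sizes.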