1 Basic statistical analysis
------ Graphical analysis
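# Read the data copied to the clipboard; the first row holds the column names
d6.1 = read.table('clipboard', header = TRUE)
# Box plot and Welch two-sample t-test of x1 by group G
boxplot(x1~G,d6.1)
t.test(x1~G,d6.1)
# Same for x2
boxplot(x2~G,d6.1)
t.test(x2~G,d6.1)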
Welch Two Sample t-test
data: x1 by G
t = 0.59897, df = 11.671, p-value = 0.5606
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.443696 6.043696
sample estimates:
mean in group 1 mean in group 2
0.92 -0.38
Welch Two Sample t-test
data: x2 by G
t = -3.2506, df = 17.655, p-value = 0.004527
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.118792 -2.381208
sample estimates:
mean in group 1 mean in group 2
2.10 8.85
2 Logistic model analysis
# G-1 recodes groups 1/2 as 0/1, so the model estimates P(G = 2)
summary(glm(G-1~x1+x2,family=binomial,d6.1))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.81637 -0.63629 0.04472 0.54520 2.13957
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.0761 1.1082 -1.873 0.0610 .
x1 -0.1957 0.1457 -1.344 0.1791
x2 0.3813 0.1681 2.269 0.0233 *
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 27.726 on 19 degrees of freedom
Residual deviance: 17.036 on 17 degrees of freedom
AIC: 23.036
Number of Fisher Scoring iterations: 5
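As a reading aid for the coefficient table above: because the response is coded as G-1 (0 for group 1, 1 for group 2), the fitted model estimates the probability of group 2. Plugging in the rounded estimates gives, approximately:

$$\hat{P}(G=2 \mid x_1, x_2) = \frac{1}{1+\exp\{-(-2.0761 - 0.1957\,x_1 + 0.3813\,x_2)\}}$$

Under this fit, a larger x2 raises the estimated probability of group 2, while the x1 coefficient is not significant at the 5% level (p = 0.1791).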
3 Fisher discriminant analysis
3.1 Using the lda function
lda(formula,data,...)
formula: a formula of the form y~x1+x2+...; data: the data set
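The call pattern above can be tried without the clipboard data. The following is a minimal sketch that assumes the built-in iris data set (not the data used in this post) and passes data = explicitly instead of relying on attach():

```r
# Minimal sketch: lda() with the formula interface on the built-in iris data.
# iris is an assumption for illustration; it is not the data set used in this post.
library(MASS)

# Passing data = ... keeps variable lookup explicit and avoids attach()
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)

# Predicted classes for the training rows
pred <- predict(fit)$class

# Confusion table: rows are true classes, columns are predicted classes
tab <- table(actual = iris$Species, predicted = pred)
print(tab)
```

Supplying data = also sidesteps name clashes between attached columns and objects already in the workspace.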
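3.2 Labeling points by class
# Visual inspection
attach(d6.1) # attach the data so its columns can be referenced by name
plot(x1, x2);
text(x1, x2, G, adj=-0.5)
# Label each point with its class G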
library(MASS)
ld=lda(G~x1+x2)
ld
Call:
lda(G ~ x1 + x2)
Prior probabilities of groups:
1 2
0.5 0.5
Group means:
x1 x2
1 0.92 2.10
2 -0.38 8.85
Coefficients of linear discriminants:
LD1
x1 -0.1035305
x2 0.2247957
$y = a_1 x_1 + a_2 x_2$
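A worked reading of the discriminant function, using the coefficients and group means printed above (values rounded; this is a sketch, not output from the original session):

$$y = -0.1035\,x_1 + 0.2248\,x_2$$

$$\bar{y}_1 = -0.1035(0.92) + 0.2248(2.10) \approx 0.377, \qquad \bar{y}_2 = -0.1035(-0.38) + 0.2248(8.85) \approx 2.029$$

With equal priors (0.5, 0.5), the two-group linear rule assigns an observation to the group whose mean score is closer, that is, to group 2 roughly when $y > (0.377 + 2.029)/2 \approx 1.203$.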
3.3 Predicting classes
lp = predict(ld)
G1 = lp$class
data.frame(G,G1)
G G1
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 1 2
7 1 1
8 1 1
9 1 1
10 1 1
11 2 2
12 2 2
13 2 2
14 2 2
15 2 1
16 2 2
17 2 2
18 2 2
19 2 2
20 2 2
3.4 Evaluating performance
# A quick look at the results
tab1 = table(G,G1);
tab1
# G1
#G 1 2
# 1 9 1
# 2 1 9
# Compute the agreement rate (accuracy)
sum(diag(prop.table(tab1)))
#0.9
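The agreement rate can also be broken down by class. The sketch below rebuilds the confusion table from the counts printed above, so it runs without the original data; the names accuracy and per_class are introduced here for illustration.

```r
# Confusion table copied from the printed output (rows: true G, columns: predicted G1)
tab1 <- matrix(c(9, 1,
                 1, 9),
               nrow = 2, byrow = TRUE,
               dimnames = list(G = c("1", "2"), G1 = c("1", "2")))

# Overall agreement rate: correctly classified / total
accuracy <- sum(diag(tab1)) / sum(tab1)

# Per-class agreement rate: diagonal count divided by the row total
per_class <- diag(tab1) / rowSums(tab1)

print(accuracy)
print(per_class)
```

sum(diag(tab)) / sum(tab) gives the same number as sum(diag(prop.table(tab))); the per-class rates show whether the errors are concentrated in one group.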
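4 Distance discriminant analysis
4.1 Distance discrimination for two populations
**Mahalanobis distance** (squared form):
$D^2(X,G_i) = (X-\mu_i)'\Sigma_i^{-1}(X-\mu_i)$
Equal covariance matrices ------- linear (straight-line) discrimination
Unequal covariance matrices ------- quadratic (curved) discrimination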
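To make the distance concrete, here is a small sketch that evaluates the squared Mahalanobis distance two ways. The mean vector, covariance matrix, and observation are hypothetical values chosen only for illustration.

```r
# Hypothetical group mean, covariance matrix, and observation (illustration only)
mu    <- c(0, 0)
Sigma <- matrix(c(2, 0,
                  0, 8), nrow = 2, byrow = TRUE)
x     <- c(2, 4)

# Squared Mahalanobis distance written out: (x - mu)' Sigma^{-1} (x - mu)
d2_manual <- as.numeric(t(x - mu) %*% solve(Sigma) %*% (x - mu))

# The same quantity via stats::mahalanobis(), which returns the squared distance
d2_builtin <- mahalanobis(x, center = mu, cov = Sigma)

print(c(manual = d2_manual, builtin = d2_builtin))
```

In a distance-based rule, this quantity is computed against each group's mean and covariance matrix, and the observation is assigned to the nearest group.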
4.1.1 Using the quadratic discriminant function qda
qda(formula, data, ...)
formula: a formula of the form groups~x1+x2+...; data: a data frame
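A minimal, self-contained sketch of the same call pattern, again assuming the built-in iris data rather than the data from this post. The new observation new_obs is a made-up value for illustration.

```r
# Minimal sketch: qda() on the built-in iris data (an assumption for illustration)
library(MASS)

fit_q <- qda(Species ~ Petal.Length + Petal.Width, data = iris)

# Hypothetical new observation; column names must match the predictors in the formula
new_obs <- data.frame(Petal.Length = 1.5, Petal.Width = 0.2)

out <- predict(fit_q, new_obs)
print(out$class)                 # predicted class
print(round(out$posterior, 4))   # posterior probability for each class
```

predict() returns both the predicted class and the posterior probabilities, so the confidence of an assignment can be inspected alongside the label.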
4.1.2 Discrimination
# Nonlinear (quadratic) discriminant model
qd = qda(G~x1+x2);qd
Call:
qda(G ~ x1 + x2)
Prior probabilities of groups:
1 2
0.5 0.5
Group means:
x1 x2
1 0.92 2.10
2 -0.38 8.85
4.1.3 Analysis
qp = predict(qd)
G2 = qp$class
data.frame(G,G1,G2)
#
G G1 G2
1 1 1 2
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
6 1 2 1
7 1 1 1
8 1 1 1
9 1 1 1
10 1 1 1
11 2 2 2
12 2 2 2
13 2 2 2
14 2 2 2
15 2 1 1
16 2 2 1
17 2 2 2
18 2 2 2
19 2 2 2
20 2 2 2
4.1.4 Evaluating performance
tab2 = table(G,G2);tab2
# G2
#G 1 2
# 1 9 1
# 2 2 8
sum(diag(prop.table(tab2)))
#[1] 0.85
4.1.5 Prediction
predict(qd,data.frame(x1=8.1,x2=2.0))
#$`class`
#[1] 1
#Levels: 1 2
#
#$posterior
# 1 2
#1 0.9939952 0.006004808
4.2 Distance discrimination for multiple populations
4.2.1 Training
> d6.3 = read.table('clipboard', header = T)
> attach(d6.3)
> ld3 = lda(G2~Q+C+P)
> ld3
Call:
lda(G2 ~ Q + C + P)
Prior probabilities of groups:
1 2 3
0.25 0.40 0.35
Group means:
Q C P
1 8.400000 5.900000 48.200
2 7.712500 7.250000 69.875
3 5.957143 3.714286 34.000
Coefficients of linear discriminants:
LD1 LD2
Q -0.81173396 0.88406311
C -0.63090549 0.20134565
P 0.01579385 -0.08775636
Proportion of trace:
LD1 LD2
0.7403 0.2597
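The "Proportion of trace" row can be reproduced from the fitted object. The sketch below assumes the built-in iris data (not the data from this post) and uses the svd component that lda() stores.

```r
# Minimal sketch: recompute "Proportion of trace" from an lda fit.
# iris is an assumption for illustration; it is not the data set used in this post.
library(MASS)

fit <- lda(Species ~ ., data = iris)

# fit$svd holds one singular value per discriminant; the squared values,
# normalized to sum to 1, are what print(fit) reports as "Proportion of trace"
prop <- fit$svd^2 / sum(fit$svd^2)
print(round(prop, 4))
```

A proportion close to 1 for LD1 would mean that most of the between-group separation lies along the first discriminant.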
4.2.2 Comparing predictions
> lp3 = predict(ld3)
> lG3 = lp3$class
> data.frame(G2,lG3)
G2 lG3
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 2 1
7 2 2
8 2 2
9 2 2
10 2 2
11 2 2
12 2 2
13 2 3
14 3 3
15 3 3
16 3 3
17 3 3
18 3 3
19 3 3
20 3 3
4.2.3 Evaluating performance
> ltab3 = table(G2,lG3)
> ltab3
lG3
G2 1 2 3
1 5 0 0
2 1 6 1
3 0 0 7
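The table above can be summarized further. As a sketch that runs without the clipboard data, the label vectors below are reconstructed from the comparison listing in 4.2.2; the names accuracy and misclassified are introduced here for illustration.

```r
# True and predicted labels reconstructed from the listing in 4.2.2
G2  <- rep(1:3, times = c(5, 8, 7))
lG3 <- G2
lG3[6]  <- 1   # observation 6: true class 2, predicted 1
lG3[13] <- 3   # observation 13: true class 2, predicted 3

# Confusion table and overall agreement rate
ltab3    <- table(G2, lG3)
accuracy <- sum(diag(ltab3)) / sum(ltab3)

# Row numbers of the misclassified observations
misclassified <- which(G2 != lG3)

print(ltab3)
print(accuracy)
print(misclassified)
```

Both errors fall in class 2, which the overall rate alone does not reveal.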
4.2.4 Prediction
> predict(ld3, data.frame(Q = 8, C = 7.5, P = 6.5))
$`class`
[1] 1
Levels: 1 2 3
$posterior
1 2 3
1 0.9999632 3.640207e-05 4.438143e-07
$x
LD1 LD2
1 -2.461009 4.996961
5 Bayes discriminant analysis
5.1 Discriminant analysis
> ld42 = lda(G2~Q+C+P,prior = c(5,8,7)/20)
> Z = predict(ld42)
> data.frame(G2,ld42G=Z$class)
G2 ld42G
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 2 1
7 2 2
8 2 2
9 2 2
10 2 2
11 2 2
12 2 2
13 2 3
14 3 3
15 3 3
16 3 3
17 3 3
18 3 3
19 3 3
20 3 3
5.2 Evaluating performance
> tab42 = table(G2,Z$class)
> tab42
G2 1 2 3
1 5 0 0
2 1 6 1
3 0 0 7
> sum(diag(tab42))/sum(tab42)
[1] 0.9
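For reference, the Bayes rule behind the prior argument can be written as follows. This is the standard normal-theory form that lda assumes (a common covariance matrix $\Sigma$); the symbols $\pi_k$ and $f_k$ are notation introduced here, not taken from the original session:

$$P(G=k \mid x) = \frac{\pi_k f_k(x)}{\sum_{j=1}^{3} \pi_j f_j(x)}, \qquad f_k(x) = N(x;\, \mu_k, \Sigma)$$

An observation is assigned to the class with the largest posterior. Note that prior = c(5,8,7)/20 equals (0.25, 0.40, 0.35), the same group proportions reported as prior probabilities in 4.2.1, which is consistent with the identical classification table and the 0.9 agreement rate obtained here.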