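Watermelon dataset 2.0
ID,color,root,sound,texture,umbilicus,surface,good melon
1,green,curly,muffled,clear,hollow,hard,yes
2,dark,curly,dull,clear,hollow,hard,yes
3,dark,curly,muffled,clear,hollow,hard,yes
4,green,curly,dull,clear,hollow,hard,yes
5,light,curly,muffled,clear,hollow,hard,yes
6,green,slightly curly,muffled,clear,slightly hollow,soft,yes
7,dark,slightly curly,muffled,slightly blurry,slightly hollow,soft,yes
8,dark,slightly curly,muffled,clear,slightly hollow,hard,yes
9,dark,slightly curly,dull,slightly blurry,slightly hollow,hard,no
10,green,straight,crisp,clear,flat,soft,no
11,light,straight,crisp,blurry,flat,hard,no
12,light,curly,muffled,blurry,flat,soft,no
13,green,slightly curly,muffled,slightly blurry,hollow,hard,no
14,light,slightly curly,dull,slightly blurry,hollow,hard,no
15,dark,slightly curly,muffled,clear,slightly hollow,soft,no
16,light,curly,muffled,blurry,flat,hard,no
17,green,curly,dull,slightly blurry,slightly hollow,hard,no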
Computing information entropy
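import math


def ent(*ps: float) -> float:
    """Return the information entropy, in bits, of the probabilities ps."""
    total = 0.0  # named total to avoid shadowing the built-in sum()
    for p in ps:
        # A zero-probability term contributes nothing (0 * log 0 is taken as 0).
        if p > 0.0:
            total += -p * math.log(p, 2)
    return total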
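The function above takes probabilities, so each call first needs the class counts divided by the subset size. As a minimal sketch, assuming it is acceptable to pass raw counts instead, the same quantity can be computed directly from counts. The name `entropy_from_counts` is a hypothetical helper and does not appear in the original notes.

```python
import math
from typing import Sequence


def entropy_from_counts(counts: Sequence[int]) -> float:
    """Entropy in bits of a class distribution given as raw counts.

    Hypothetical helper for illustration; not part of the original notes.
    """
    n = sum(counts)
    if n == 0:
        return 0.0  # an empty subset is treated as having zero entropy
    total = 0.0
    for c in counts:
        if c > 0:  # skip empty classes, since 0 * log 0 is taken as 0
            p = c / n
            total -= p * math.log2(p)
    return total


whole = entropy_from_counts([8, 9])  # 8 good melons, 9 bad melons
pure = entropy_from_counts([5, 0])   # a pure subset
even = entropy_from_counts([3, 3])   # an even two-class split
```

Here `whole` should agree with the value obtained from `ent(8 / 17, 9 / 17)`, while `pure` and `even` illustrate the two extremes for a two-class problem: 0 bits and 1 bit.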
First split
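# Whole dataset: 8 good melons, 9 bad melons
p1 = 8 / 17
p2 = 9 / 17
Ent = ent(p1, p2)  # 0.9975025463691153

# color = green (青绿): 6 samples, 3 good, 3 bad
p1 = 3 / 6
p2 = 3 / 6
Ent1_1 = ent(p1, p2)  # 1.0

# color = dark (乌黑): 6 samples, 4 good, 2 bad
p1 = 4 / 6
p2 = 2 / 6
Ent1_2 = ent(p1, p2)  # 0.9182958340544896

# color = light (浅白): 5 samples, 1 good, 4 bad
p1 = 1 / 5
p2 = 4 / 5
Ent1_3 = ent(p1, p2)  # 0.7219280948873623

# Weighted entropy after splitting on color
Ent1 = 6 / 17 * Ent1_1 + 6 / 17 * Ent1_2 + 5 / 17 * Ent1_3  # 0.88937738110375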
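The weighted entropy for color is usually compared against the entropy of the whole dataset to obtain the information gain of the split. The notes do not show that step at this point, so the following is only a sketch under that assumption. The function names `entropy` and `information_gain` are hypothetical, and the (good, bad) counts are read off the table above.

```python
import math
from typing import Sequence


def entropy(counts: Sequence[int]) -> float:
    """Entropy in bits of a class distribution given as raw counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def information_gain(parent: Sequence[int],
                     subsets: Sequence[Sequence[int]]) -> float:
    """Parent entropy minus the size-weighted entropy of the subsets."""
    n = sum(parent)
    conditional = sum(sum(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - conditional


# (good, bad) counts per color value: green, dark, light
color_subsets = [(3, 3), (4, 2), (1, 4)]
gain_color = information_gain((8, 9), color_subsets)
print(round(gain_color, 4))  # about 0.1081, i.e. Ent - Ent1
```

Passing counts rather than probabilities keeps the subset weights (6/17, 6/17, 5/17) out of the calling code, so the same call can be repeated for the remaining attributes by changing only the list of counts.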
p1 = 5 / 8