Plotting a Line Chart for a Multi-Category Variable in Python

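This article describes a way to handle duplicate age values encountered when plotting. Averaging the incomes of men and of women at each age resolves the problem of one x-axis value mapping to several y-axis values and gives a clearer plot.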

When plotting, the data may contain points that share the same x value but have different y values.

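The solution is to replace the y values at each repeated x value with their mean.
Because the source data file (creditcard_exp.csv) cannot be attached here, the two key columns of data1 are listed below.
Gender: male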
age =[22, 22, 26, 27, 28, 28, 29, 31, 32, 33, 33, 34, 36, 36, 37, 37, 40, 40, 40, 41, 41, 43, 46, 54]

income=[3.84, 5.1, 10.311, 7.62235, 13.40915, 9.81175, 7.79915, 8.5383, 15.8475, 6.14075, 11.70575, 16.90015, 7.4332, 8.4, 11.81885, 10.9641, 16.28885, 8.3686, 16.03515, 11.47285, 10.03015, 8.2129, 9.3126, 4.18]

Gender: female
age =[20, 21, 21, 21, 22, 22, 22, 22, 23, 23, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 33, 33, 33, 33, 33, 34, 34, 34, 34, 35, 35, 35, 35, 36, 36, 37, 38, 38, 38, 40, 41, 41, 42, 42, 43, 43, 43, 45, 45, 46, 50, 55]

income=[3.89305, 3.4939, 4.0552, 7.7886, 3.8159, 1.56, 5.04755, 4.19345, 5.02, 4.91825, 5.13065, 4.626, 2.3, 4.1, 7.3733, 6.2727, 3.15, 1.88, 8.53235, 5.729, 7.02025, 5.6223, 4.93595, 2.5, 7.6097, 5.0954, 6.20535, 4.8189, 5.33175, 4.9, 5.06555, 6.62415, 1.8, 5.9229, 5.1578, 6.35895, 6.07935, 5.22925, 9.571, 1.73, 4.3796, 3.0, 5.80935, 7.30525, 5.6684, 6.45945, 9.84195, 4.0, 2.42, 6.4276, 7.91, 2.5, 2.49, 5.8917, 5.4481, 1.5, 3.3, 6.92275, 6.17745, 5.91685, 3.26, 8.3799, 2.6, 3.71, 4.39, 7.93215, 6.6727, 1.98, 2.28, 9.1315, 4.9019, 2.81, 1.8, 3.4, 3.6, 3.99125]

1. Unprocessed:

import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib

os.chdir(r'D:\pycharm程序文件\练习1')
data = pd.read_csv('creditcard_exp.csv')

matplotlib.rcParams['axes.unicode_minus'] = False  # Keep the minus sign '-' from rendering as a box in saved figures
plt.rcParams['font.sans-serif'] = ['SimHei']  # Set the default font (SimHei renders the Chinese labels)


# Use copy() to take out the three columns 'gender', 'Age' and 'Income'
data1 = data[['gender','Age','Income']].copy()

# The x-axis is age, so sort data1 by age in ascending order
data2 = data1.sort_values(by=['Age'], ascending=True)  # True sorts ascending (the default), False sorts descending

income_man = list(data2[data2['gender']==1]['Income'])
income_woman = list(data2[data2['gender']==0]['Income'])

age_man = list(data2[data2.gender==1]['Age'])
age_woman = list(data2[data2.gender==0]['Age'])

plt.plot(age_man, income_man, label="男人的收入")  # label: "Men's income"
plt.plot(age_woman, income_woman, label="女人的收入")  # label: "Women's income"
plt.legend()
plt.xlabel('年龄')  # "Age"
plt.ylabel('年收入/万元')  # "Annual income (10,000 yuan)"
plt.show()


Plot result:
[Figure: income versus age for men and women, unprocessed data]

2. After processing:

import os
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt


os.chdir(r'D:\pycharm程序文件\练习1')

data = pd.read_csv('creditcard_exp.csv')

matplotlib.rcParams['axes.unicode_minus'] = False  # Keep the minus sign '-' from rendering as a box in saved figures
plt.rcParams['font.sans-serif'] = ['SimHei']  # Set the default font (SimHei renders the Chinese labels)


# Take out the three columns that are needed
data1 = data[['gender','Age','Income']].copy()

# Split data1 by gender into two tables
data_man = data1[data1['gender']==1]
data_women = data1[data1['gender']==0]

# Sort data_man and data_women by age in ascending order
data_man_sort = data_man.sort_values(by=['Age'],ascending=True)
data_women_sort = data_women.sort_values(by=['Age'],ascending=True)

# Take the unique values of the Age column from data_man_sort and data_women_sort
man_age_unique = data_man_sort['Age'].unique()
woman_age_unique = data_women_sort['Age'].unique()


# Convert man_age_unique and woman_age_unique to lists for the loops below
list_man_age = list(man_age_unique)
list_women_age = list(woman_age_unique)


# Loop over the unique ages and collect the mean income at each age
list_man_income = []
list_women_income = []

for i in list_man_age:
    list1 = data_man_sort[data_man_sort['Age']==i]['Income']
    list_man_income.append( list1.mean() )

for i in list_women_age:
    list2 = data_women_sort[data_women_sort['Age']==i]['Income']
    list_women_income.append( list2.mean() )


x = list_man_age
y = list_man_income
x1 = list_women_age
y1 = list_women_income

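# Draw the line plots and the scatter plots
plt.plot(x, y, label='男性:年龄~收入')  # label: "Men: age vs. income"
plt.scatter(x, y)
plt.plot(x1, y1, label='女性:年龄~收入')  # label: "Women: age vs. income"
plt.scatter(x1, y1)
plt.legend()
plt.xlabel('年龄')  # "Age"
plt.ylabel('年收入/万元')  # "Annual income (10,000 yuan)"
plt.title('不同性别年收入图')  # "Annual income by gender"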
plt.show()
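As an alternative to the explicit loops, the same per-age means could be computed with pandas `groupby`. This is a sketch under assumptions rather than the article's method: `sample` is a hypothetical miniature of data1 built from the first few rows of the lists given earlier, with gender coded 1 for men and 0 for women as in the code above.

```python
import pandas as pd

# Hypothetical miniature of data1: a few rows taken from the lists above
# (gender: 1 = male, 0 = female, as in the article's code)
sample = pd.DataFrame({
    "gender": [1, 1, 1, 0, 0, 0, 0],
    "Age": [22, 22, 26, 20, 21, 21, 21],
    "Income": [3.84, 5.1, 10.311, 3.89305, 3.4939, 4.0552, 7.7886],
})

# One mean income per (gender, Age) pair; groupby sorts the keys by default,
# so no separate sort_values call is needed
mean_income = sample.groupby(["gender", "Age"])["Income"].mean()

# Selecting one gender leaves a Series indexed by Age
men = mean_income.loc[1]
women = mean_income.loc[0]

x, y = men.index.tolist(), men.tolist()
x1, y1 = women.index.tolist(), women.tolist()

print(x, y)
print(x1, y1)
```

The resulting `x`, `y`, `x1` and `y1` lists play the same role as in the code above and can be passed to `plt.plot` and `plt.scatter` unchanged.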






Plot result:
[Figure: mean income by age for men and women, drawn as lines with scatter markers]
