python数据清洗的三个常用的处理方式！_python数据清洗csv文件中的空格-CSDN博客

本文链接：https://blog.csdn.net/chengxuyuan_110/article/details/128376133

关于python数据处理过程中三个主要的数据清洗说明，分别是缺失值/空格/重复值的数据清洗。

这里还是使用pandas来获取excel或者csv的数据源来进行数据处理。若是没有pandas的非标准库需要使用pip的方式安装一下。

pip install pandas

准备一下需要处理的脏数据，这里选用的是excel数据，也可以选择其他的格式数据，下面是源数据截图。

在这里插入图片描述

使用pandas的read_excel()函数读取出我们需要处理的data.xlsx文件。

# Importing the pandas library and giving it an alias of pd.
import pandas as pd

# Reading the excel file and storing it in a variable called `result_`
result_ = pd.read_excel('D:/test/data.xlsx')

# Printing the dataframe.
print(result_)

注意，若是新的python环境直接安装pandas模块后执行上面的读取excel数据代码可能会出现没有openpyxl模块的情况。

这时候，我们使用pip的方式再次安装一下openpyxl即可。

pip install openpyxl

完成后再次执行读取excel数据的代码块会成功的返回结果。

#           姓名    年龄    班级   成绩 表现
# 0   Python 集中营  10  1210   99  A
# 1   Python 集中营  11  1211  100  A
# 2   Python 集中营  12  1212  101  A
# 3   Python 集中营  13  1213  102  A
# 4   Python 集中营  14  1214  103  A
# 5   Python 集中营  15  1215  104  A
# 6   Python 集中营  16  1216  105  A
# 7   Python 集中营  17  1217  106  A
# 8   Python 集中营  18  1218  107  A
# 9   Python 集中营  19  1219  108  A
# 10  Python 集中营  20  1220  109  A
# 11  Python 集中营  21  1221  110  A
# 12  Python 集中营  22  1222  111  A
# 13  Python 集中营  23  1223  112  A
# 14  Python 集中营  24  1224  113  A
# 15  Python 集中营  25  1225  114  A
# 16  Python 集中营  26  1226  115  A
# 17  Python 集中营  27  1227  116  A
# 18  Python 集中营  28  1228  117  A
#
# Process finished with exit code 0

准备好数据源之后，我们使用三个方式来完成对源数据的数据清洗。

1.strip函数清除空格

首先，将所有的列名称提取出来，使用DataFrame对象的columns函数进行提取。

# Extracting the column names from the dataframe and storing it in a variable called `columns_`.
columns_