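Pandas is a powerful Python library for data analysis and manipulation. It provides efficient, flexible, and easy-to-use data structures that simplify data cleaning, analysis, and visualization. Its two main data structures are Series and DataFrame.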
All of the exercises, together with the source data, are listed below.
First, install the pandas library:
pip install pandas
1. Create a DataFrame from the following dictionary
data = {"grammer":["Python","C","Java","GO",np.nan,"SQL","PHP","Python"], "score":[1,2,np.nan,4,5,6,7,10]}
import pandas as pd
import numpy as np
data = {"grammer":["Python","C","Java","GO",np.nan,"SQL","PHP","Python"], "score":[1,2,np.nan,4,5,6,7,10]}
df=pd.DataFrame(data)
df
2. Extract the rows containing the string "Python"
df[df['grammer']=='Python']
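The comparison above keeps only cells that are exactly equal to "Python". If substring matching is what is wanted, a minimal sketch using `Series.str.contains` might look like the following. The sample frame is rebuilt so the snippet runs on its own, and `na=False` is an assumption made here so that the missing value counts as a non-match.

```python
import numpy as np
import pandas as pd

data = {"grammer": ["Python", "C", "Java", "GO", np.nan, "SQL", "PHP", "Python"],
        "score": [1, 2, np.nan, 4, 5, 6, 7, 10]}
df = pd.DataFrame(data)

# Exact match: keeps rows whose cell equals "Python"
exact = df[df["grammer"] == "Python"]

# Substring match: na=False treats the NaN entry as "no match"
partial = df[df["grammer"].str.contains("Python", na=False)]

print(exact)
print(partial)
```

On this data both selections return the same two rows; they would differ only if a cell held a longer string that merely includes "Python".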
3. Print all column names of df
df.columns
4. Rename the second column to 'popularity'
df.rename(columns={'score':'popularity'},inplace=True)
5. Count how many times each programming language appears in the grammer column
df['grammer'].value_counts()
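6. Fill missing values with the average of the values above and below
# Linear interpolation gives the mean of the two neighbours for a single gap
df['popularity']=df['popularity'].interpolate()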
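To see why interpolation satisfies "the average of the values above and below", here is a small self-contained check. It assumes the same numbers as the score column and uses a standalone Series named `s`, which is introduced only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, 5, 6, 7, 10], name="popularity")

# Linear interpolation fills the gap at position 2 from its neighbours 2 and 4
filled = s.interpolate()

# For a single missing value this equals the mean of the values on either side
neighbour_mean = (s.iloc[1] + s.iloc[3]) / 2

print(filled.iloc[2], neighbour_mean)
```

For a run of several consecutive missing values, linear interpolation spaces the fills evenly between the surrounding values rather than reusing one shared average.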
7. Extract the rows where popularity is greater than 3
df[df['popularity']>3]
8. Remove duplicate rows based on the grammer column
df.drop_duplicates(['grammer'],inplace=True)
9. Compute the mean of the popularity column
df['popularity'].mean()
10. Convert the grammer column to a list
df['grammer'].to_list()
11. Save the DataFrame as an Excel file (writing .xlsx requires an Excel engine such as openpyxl to be installed)
df.to_excel('test.xlsx')
12. Check the number of rows and columns
df.shape
13. Extract the rows where popularity is greater than 3 and less than 7
df[(df['popularity']>3) & (df['popularity']<7)]
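The parentheses around each comparison matter, because `&` binds more tightly than `>` and `<` in Python. As a hedged alternative, `Series.between` can express the same open interval. The sketch below assumes pandas 1.3 or newer (where `inclusive` accepts the string "neither") and uses a small illustrative frame rather than the exercise data.

```python
import pandas as pd

df = pd.DataFrame({"grammer": ["Python", "C", "Java", "GO", "SQL", "PHP"],
                   "popularity": [1.0, 2.0, 3.0, 4.0, 6.0, 7.0]})

# Two comparisons combined element-wise with &
mask_ops = (df["popularity"] > 3) & (df["popularity"] < 7)

# Same open interval (3, 7) expressed with between()
mask_between = df["popularity"].between(3, 7, inclusive="neither")

print(df[mask_between])
```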
14. Swap the positions of the two columns
colu=df.columns[[1,0]]
df=df[colu]
df
15. Extract the row holding the maximum popularity value
df[df['popularity']==df['popularity'].max()]
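Comparing against `max()` returns every row that ties for the maximum. If only the first such row is needed, `idxmax` is one option. The following sketch uses a made-up frame with a deliberate tie to show the difference.

```python
import pandas as pd

df = pd.DataFrame({"grammer": ["C", "PHP", "SQL"],
                   "popularity": [2.0, 7.0, 7.0]})

# Boolean comparison: keeps both rows that share the maximum
all_max = df[df["popularity"] == df["popularity"].max()]

# idxmax returns the label of the first maximum; the list keeps a DataFrame result
first_max = df.loc[[df["popularity"].idxmax()]]

print(all_max)
print(first_max)
```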
16. View the last 5 rows
df.tail()
17. Delete the last row
df.drop(df.tail(1).index,axis=0,inplace=True)
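18. Append a row of data: ['Perl', 6.6]
row={'popularity':6.6,'grammer':'Perl'}
# DataFrame.append was removed in pandas 2.0; pd.concat works across versions
df=pd.concat([df,pd.DataFrame([row])],ignore_index=True)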
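19. Sort the data by the values in the "popularity" column
# sort_values returns a new DataFrame, so assign the result back to df
df=df.sort_values(by='popularity',ascending=True)
df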
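By default `sort_values` returns a new DataFrame and leaves the original untouched, so the result has to be assigned (or `inplace=True` passed) for the new order to persist. A small sketch with made-up values illustrates this:

```python
import pandas as pd

df = pd.DataFrame({"grammer": ["SQL", "C", "Perl"],
                   "popularity": [6.0, 2.0, 6.6]})

sorted_df = df.sort_values(by="popularity", ascending=True)

# The original keeps its row order; only the returned frame is sorted
print(df["grammer"].tolist())         # ['SQL', 'C', 'Perl']
print(sorted_df["grammer"].tolist())  # ['C', 'SQL', 'Perl']
```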
20. Compute the length of each string in the grammer column
df['grammer'].str.len()