PythonForDataScience Cheat Sheet
Pandas Basics
Learn Python for Data Science Interactively at www.DataCamp.com
Pandas
The Pandas library is built on NumPy and provides easy-to-use
data structures and data analysis tools for the Python
programming language.
Use the following import convention:
>>> import pandas as pd
Pandas Data Structures
Series
A one-dimensional labeled array capable of holding any data type
(Figure: a Series with index labels a, b, c, d and values 3, -5, 7, 4.)
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
DataFrame
A two-dimensional labeled data structure with columns of potentially different types
(Figure: the DataFrame df below, with row index 0, 1, 2 and columns Country, Capital, Population.)
>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasília'],
...         'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
...                   columns=['Country', 'Capital', 'Population'])
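A quick way to confirm what the two constructors above build is to inspect the resulting index, columns and column types. This is a minimal sketch that rebuilds the same s and df; the exact dtype names printed by df.dtypes can differ between pandas versions.

```python
import pandas as pd

# Series: values plus an explicit index of labels
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# DataFrame: a dict of equal-length columns; `columns` fixes the column order
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

print(list(s.index))      # ['a', 'b', 'c', 'd']
print(list(df.columns))   # ['Country', 'Capital', 'Population']
print(df.index.tolist())  # [0, 1, 2]: a default integer index when none is given
print(df.dtypes)          # each column keeps its own type (text vs. integers)
```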
Selection (also see NumPy Arrays)
Getting
>>> s['b']                  Get one element
-5
>>> df[1:]                  Get subset of a DataFrame
  Country    Capital  Population
1   India  New Delhi  1303171035
2  Brazil   Brasília   207847528
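The two "Getting" forms behave differently: a label in [] on a Series returns a scalar, while a slice in [] on a DataFrame selects rows by position and keeps their original labels. A small sketch, rebuilding s and df from the Data Structures section:

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

one = s['b']        # a label lookup returns a single value
subset = df[1:]     # a slice inside [] selects rows by position

print(one)                      # -5
print(subset.index.tolist())    # [1, 2]: rows keep their original labels
print(subset.shape)             # (2, 3)
```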
Selecting, Boolean Indexing & Setting
By Position
>>> df.iloc[0, 0]           Select single value by row & column position
'Belgium'
>>> df.iat[0, 0]
'Belgium'
By Label
>>> df.loc[0, 'Country']    Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
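The four accessors above return the same value here only because df has the default 0, 1, 2 row labels, so label 0 and position 0 coincide. The sketch below re-indexes the frame by country (set_index is used purely for illustration) to show where label-based and position-based lookups diverge:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

# Position-based: row 0, column 0
by_iloc = df.iloc[0, 0]
by_iat = df.iat[0, 0]             # scalar-only variant of iloc

# Label-based: row label 0, column label 'Country'
by_loc = df.loc[0, 'Country']
by_at = df.at[0, 'Country']       # scalar-only variant of loc

# With country names as row labels, labels and positions no longer coincide
by_country = df.set_index('Country')
label_hit = by_country.loc['India', 'Capital']   # row label 'India'
position_hit = by_country.iloc[1, 0]             # second row, first remaining column

print(by_iloc, by_iat, by_loc, by_at)   # Belgium Belgium Belgium Belgium
print(label_hit, position_hit)          # New Delhi New Delhi
```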
By Label/Position (the old .ix indexer was removed in pandas 1.0; use .iloc or .loc)
>>> df.iloc[2]              Select a single row
Country          Brazil
Capital        Brasília
Population    207847528
>>> df.loc[:, 'Capital']    Select a single column
0     Brussels
1    New Delhi
2     Brasília
>>> df.loc[1, 'Capital']    Select a value by row and column
'New Delhi'
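Because .ix accepted a row position and a column label in the same call, code that relied on it needs one side converted. A minimal sketch of two common replacements, using Index.get_loc to turn a column label into a position, or df.index to turn a row position into a label:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

row = df.iloc[2]                 # a single row comes back as a Series
col = df.loc[:, 'Capital']       # a single column, also a Series
cell = df.loc[1, 'Capital']      # a single value

# Mixing a row position with a column label:
cell_by_position = df.iloc[1, df.columns.get_loc('Capital')]  # label -> position
cell_by_label = df.loc[df.index[1], 'Capital']                # position -> label

print(row['Country'], col.tolist(), cell)
print(cell_by_position, cell_by_label)
```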
Boolean Indexing
>>> s[~(s > 1)]                          Series s where value is not > 1
>>> s[(s < -1) | (s > 2)]                s where value is < -1 or > 2
>>> df[df['Population'] > 1200000000]    Keep only rows where Population exceeds 1,200,000,000
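Each filter above is a boolean Series used as a mask. The parentheses in the combined conditions are required because | and & bind more tightly than the comparison operators, and ~ (not Python's `not`) negates a mask element-wise. A sketch with s from above and a two-column version of df trimmed for brevity:

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

mask = s > 1                      # a boolean Series aligned with s
not_gt_1 = s[~mask]               # element-wise negation of the mask
outside = s[(s < -1) | (s > 2)]   # element-wise OR of two masks
big = df[df['Population'] > 1200000000]

print(mask.tolist())              # [True, False, True, True]
print(not_gt_1.index.tolist())    # ['b']
print(outside.index.tolist())     # ['a', 'b', 'c', 'd']
print(big['Country'].tolist())    # ['India']
```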
Setting
>>> s['a'] = 6              Set the value at index label 'a' of Series s to 6
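Assignment through [] or .loc modifies the Series in place. The sketch below also shows that a boolean mask works on the left-hand side and that assigning to a label that does not exist yet appends it (the label 'e' is a hypothetical example):

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

s['a'] = 6            # overwrite the value at label 'a' in place
s.loc[s < 0] = 0      # set every negative value to 0 via a boolean mask
s['e'] = 1            # a new label enlarges the Series

print(s.tolist())     # [6, 0, 7, 4, 1]
```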
Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f)             Apply function to each column
>>> df.map(f)               Apply function element-wise (df.applymap(f) before pandas 2.1)
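The difference between the two calls is what f receives: apply passes one whole column (a Series) at a time, while the element-wise method passes each individual value. The sketch uses a small hypothetical numeric frame, since x*2 on the text columns of df would simply repeat the strings, and falls back to applymap on pandas versions older than 2.1:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})   # hypothetical numeric frame
f = lambda x: x * 2

# apply: the function receives a column (Series) at a time
doubled = df.apply(f)
col_sums = df.apply(lambda col: col.sum())   # one result per column

# element-wise: DataFrame.map in pandas >= 2.1, applymap in older versions
elementwise = df.map if hasattr(df, 'map') else df.applymap
doubled_cells = elementwise(f)

print(col_sums.tolist())               # [6, 60]
print(doubled.equals(doubled_cells))   # True: for x * 2 both give the same frame
```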
Retrieving Series/DataFrame Information
Basic Information
>>> df.shape                (rows, columns)
>>> df.index                Describe index
>>> df.columns              Describe DataFrame columns
>>> df.info()               Info on DataFrame
>>> df.count()              Number of non-NA values
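Applied to the example df, the basic-information attributes give the values below. To show that count() skips missing values (unlike len()), the sketch blanks out one Capital in a copy; that missing value is a hypothetical change made only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

print(df.shape)              # (3, 3)
print(df.index.tolist())     # [0, 1, 2]
print(df.columns.tolist())   # ['Country', 'Capital', 'Population']
df.info()                    # prints dtypes, non-null counts and memory use

# count() reports non-missing values per column; len() counts all rows
with_gap = df.copy()
with_gap.loc[1, 'Capital'] = np.nan
print(with_gap.count().tolist())   # [3, 2, 3]
print(len(with_gap))               # 3
```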
Summary
>>> df.sum()                 Sum of values
>>> df.cumsum()              Cumulative sum of values
>>> df.min()/df.max()        Minimum/maximum values
>>> df.idxmin()/df.idxmax()  Index label of the minimum/maximum value
>>> df.describe()            Summary statistics
>>> df.mean()                Mean of values
>>> df.median()              Median of values
(On frames with text columns, such as df, pass numeric_only=True to mean() and median().)
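The summary methods are easiest to check on the numeric Series s. Note that idxmin()/idxmax() return index labels rather than positions, and that numeric_only=True restricts a DataFrame reduction to its numeric columns. A sketch with the same s and a trimmed df:

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

print(s.sum())                 # 9
print(s.cumsum().tolist())     # [3, -2, 5, 9]
print(s.min(), s.max())        # -5 7
print(s.idxmin(), s.idxmax())  # b c  (labels, not positions)
print(s.mean(), s.median())    # 2.25 3.5

# Skip the text column when averaging a mixed-type frame
pop = df.mean(numeric_only=True)
print(pop['Population'])
```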
Dropping
>>> s.drop(['a', 'c']) Drop values from rows (axis=0)
>>> df.drop('Country', axis=1) Drop values from columns (axis=1)
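drop() returns a new object and leaves the original untouched unless you reassign the result. The sketch below also shows the columns= keyword, which is equivalent to passing axis=1:

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

trimmed = s.drop(['a', 'c'])             # drop row labels (axis=0 is the default)
no_country = df.drop('Country', axis=1)  # drop a column
same = df.drop(columns='Country')        # equivalent keyword form

print(trimmed.index.tolist())       # ['b', 'd']
print(no_country.columns.tolist())  # ['Capital', 'Population']
print(len(s), df.shape)             # 4 (3, 3): the originals are unchanged
```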
Data Alignment
Internal Data Alignment
NA values are introduced in the indices that don’t overlap:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0
Arithmetic Operations with Fill Methods
You can also do the data alignment yourself with the help of the fill methods, which substitute fill_value for a value that is missing on one side before the operation:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)
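Arithmetic between two Series matches values by label, not by position. In this example only 'b' is missing from s3, so only that entry depends on fill_value. A sketch that checks the results with the original s (before the Setting example changed s['a']):

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

plain = s + s3                      # 'b' has no partner in s3, so it becomes NaN
filled = s.add(s3, fill_value=0)    # missing s3['b'] treated as 0
diff = s.sub(s3, fill_value=2)      # missing s3['b'] treated as 2
prod = s.mul(s3, fill_value=3)      # missing s3['b'] treated as 3

print(plain.isna().tolist())   # [False, True, False, False]
print(filled.tolist())         # [10.0, -5.0, 5.0, 7.0]
print(diff.tolist())           # [-4.0, -7.0, 9.0, 1.0]
print(prod.tolist())           # [21.0, -15.0, -14.0, 12.0]
```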
Sort & Rank
>>> df.sort_index()                 Sort by labels along an axis
>>> df.sort_values(by='Country')    Sort by the values along an axis
>>> s.sort_values()                 Sort a Series by its values
>>> df.rank()                       Assign ranks to entries
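sort_values() orders rows by column contents, sort_index() restores label order, and rank() assigns each value its position in sorted order (as floats, so ties can share an average rank). A sketch on the example data:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

by_country = df.sort_values(by='Country')   # alphabetical by a column's values
by_label = by_country.sort_index()          # back to row-label order 0, 1, 2

print(by_country['Country'].tolist())    # ['Belgium', 'Brazil', 'India']
print(by_label.index.tolist())           # [0, 1, 2]
print(s.sort_values().tolist())          # [-5, 3, 4, 7]
print(df['Population'].rank().tolist())  # [1.0, 3.0, 2.0]
```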
I/O
Read and Write to CSV
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')
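A round trip through CSV shows what header=None and nrows do. To keep the sketch runnable without touching disk, it uses an in-memory io.StringIO buffer in place of 'file.csv'; calling to_csv() with no path returns the CSV text as a string.

```python
import io

import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

# No path: to_csv returns the text; index=False omits the 0, 1, 2 row labels
text = df.to_csv(index=False)

back = pd.read_csv(io.StringIO(text))                       # first line is the header
raw = pd.read_csv(io.StringIO(text), header=None, nrows=2)  # no header; 2 lines only

print(back.equals(df))   # True
print(raw.shape)         # (2, 3): the header line is read as an ordinary data row
```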
Read and Write to Excel
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')
Asking For Help
>>> help(pd.Series.loc)
Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
>>> df.to_sql('myDf', engine)
read_sql() is a convenience wrapper around read_sql_table() and
read_sql_query().
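The SQL lines above assume SQLAlchemy. For a dependency-free sketch, pandas also accepts a plain sqlite3 connection for to_sql() and read_sql_query(); read_sql_table() is the exception and needs a SQLAlchemy engine. The table name and query are illustrative only.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

con = sqlite3.connect(':memory:')          # throwaway in-memory database
df.to_sql('myDf', con, index=False)        # write the frame as a table
result = pd.read_sql_query(
    'SELECT Country FROM myDf WHERE Population > 200000000 ORDER BY Country', con)
con.close()

print(result['Country'].tolist())   # ['Brazil', 'India']
```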

