Introduction to Pandas
(and Time Series Analysis)
Alexander C. S. Hendorf
@hendorf
Best-of Version
Alexander C. S. Hendorf
Königsweg GmbH
Königsweg connects high-tech startups and industry
EuroPython organizer + program chair
mongoDB master 2016, MUG Leader
Speaker mongoDB days, EuroPython, PyData…
@hendorf
Introduction to Pandas and Time Series Analysis [PyCon DE]
Origin and Goals
-open source Python library
-practical real-world data analysis: fast, efficient & easy
-seamless workflow (no switching to e.g. the R language)
-started in 2008 by Wes McKinney, now part of the PyData stack at Continuum Analytics ("Anaconda")
-very stable project with regular updates
-https://github.com/pydata/pandas
Main Features
-Support for CSV, Excel, JSON, SQL, SAS, clipboard, HDF5,…
-Data cleansing
-Re-shape & merge data (joins & merge) & pivoting
-Data Visualisation
-Well integrated in Jupyter (IPython) notebooks
-Database-like operations
-Performant
Today
Part 1:
Basic functionality of Pandas
Part 2:
A deeper look at the index with the TimeSeries Index
Git repository with this presentation's code examples:
https://github.com/Koenigsweg/data-timeseries-analysis-with-pandas
2014-08-21T22:50:00,12.0	
2014-08-17T13:20:00,16.0	
2014-08-06T01:20:00,14.0	
2014-09-27T06:50:00,11.0	
2014-08-25T21:50:00,13.0	
2014-08-14T05:20:00,13.0	
2014-09-14T05:20:00,16.0	
2014-08-03T02:50:00,21.0	
2014-09-29T03:00:00,13	
2014-09-06T08:20:00,16.0	
2014-08-19T07:20:00,13.0	
2014-09-27T22:50:00,10.0	
2014-08-28T08:20:00,12.0	
2014-08-17T01:00:00,14	
2014-09-27T14:00:00,17	
2014-09-10T18:00:00,18	
2014-09-22T23:00:00,8	
2014-09-20T03:00:00,9	
2014-08-29T09:50:00,16.0	
2014-08-16T01:50:00,13.0	
2014-08-28T22:00:00,14
I/O and viewing data
-convention: import pandas as pd
-example: pd.read_csv()
-very flexible, ~40 optional parameters (delimiter, header, dtype, parse_dates,…)
-preview data with .head(n) and .tail(n), where n is the number of lines
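A minimal sketch of the I/O step, assuming the sample rows shown above are the CSV being read. The rows are held in memory so the example runs without a file, and the column names `timestamp` and `value` are assumptions, since the sample has no header row.

```python
import io

import pandas as pd

# Four rows copied from the sample data; kept in memory instead of a file.
raw = io.StringIO(
    "2014-08-21T22:50:00,12.0\n"
    "2014-08-17T13:20:00,16.0\n"
    "2014-08-06T01:20:00,14.0\n"
    "2014-09-27T06:50:00,11.0\n"
)

df = pd.read_csv(
    raw,
    header=None,                    # the sample has no header row
    names=["timestamp", "value"],   # assumed column names
    parse_dates=["timestamp"],      # convert the first column to datetimes
)

print(df.head(2))  # first two rows
print(df.tail(1))  # last row
```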
ax = df[:100].plot()
ax.axhline(16, color='r', linestyle='-')
df.plot(kind='bar')
Visualisation
-matplotlib (http://matplotlib.org) integrated via .plot()
-customizable and extensible, plot() returns the matplotlib axes (ax)
-bar, area, scatter, box plots, among others
-Alternatives:
-Bokeh (http://bokeh.pydata.org/en/latest/)
-Seaborn (https://stanford.edu/~mwaskom/software/seaborn/index.html)
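The plotting calls from the slide can be tried out as follows. This is a sketch on made-up data (the values, the column name `value` and the date range are assumptions), and it selects the non-interactive `Agg` backend so that it also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no window needed

import numpy as np
import pandas as pd

# Made-up data for illustration: 200 daily values between 8 and 21.
idx = pd.date_range("2014-08-01", periods=200, freq="D")
df = pd.DataFrame({"value": np.linspace(8, 21, 200)}, index=idx)

ax = df[:100].plot()                      # line plot of the first 100 rows
ax.axhline(16, color="r", linestyle="-")  # horizontal reference line at 16

bar_ax = df[:10].plot(kind="bar")         # bar plot of the first 10 rows
```

Because `plot()` returns the axes object, any further matplotlib call (labels, reference lines, saving the figure) can be applied to `ax` afterwards.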
Structure
[Diagram: pd.Series and pd.DataFrame, each made up of an Index and Data]
Structure: Series
-one-dimensional, labeled series, may contain any data type
-the labels of the series are usually called the index
-index automatically created if not given
-one data type per series; the data type can be set explicitly or converted dynamically in a pythonic fashion
simple series, data type auto, index auto
simple series, data type set, index auto
simple series, data type set, numerical index given
simple series, data type set, text-label index given
access via index / label
access via index / position
access multiple via index / label
access multiple via index / position range
access multiple via index / multiple positions
access via boolean index / lambda function
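The captions above describe code screenshots that are not part of this transcript. A sketch of what such calls might look like, with made-up values and variable names:

```python
import pandas as pd

# Creating series: data type and index inferred, or given explicitly.
s_auto = pd.Series([1, 2, 3])                                  # dtype auto, index auto
s_float = pd.Series([1, 2, 3], dtype="float64")                # dtype set, index auto
s_num = pd.Series([1, 2, 3], dtype="float64", index=[10, 20, 30])
s_txt = pd.Series([1, 2, 3], dtype="float64", index=["a", "b", "c"])

# Accessing values.
one_by_label = s_txt["b"]             # via label
one_by_pos = s_txt.iloc[1]            # via position
many_by_label = s_txt[["a", "c"]]     # several labels
pos_range = s_txt.iloc[0:2]           # position range (end excluded)
many_by_pos = s_txt.iloc[[0, 2]]      # several positions
by_mask = s_txt[s_txt > 1]            # boolean index
by_lambda = s_txt[lambda x: x > 1]    # callable that returns a boolean index
```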
.loc[]: index label
.iloc[]: index position
.ix[]: index guessing, label/position fallback (removed in pandas 1.0; use .loc[] or .iloc[])
.name: (column) names
.sample(): sampling the data set
Selecting Data
-Slicing
-Boolean indexing
series[x], series[[x, y]]
series[2], series[[2, 3]], series[2:3]
series.ix[] / .iloc[] / .loc[] (.ix[] was removed in pandas 1.0)
series.sample()
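One detail worth seeing in code is that label slices include their end point while position slices do not. A small sketch with a made-up series (names and values are illustrative only):

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = series.loc["b":"d"]   # label slice: end label "d" is included
by_pos = series.iloc[1:3]        # position slice: end position 3 is excluded
picked = series.loc[["a", "e"]]  # explicit list of labels
masked = series[series >= 30]    # boolean indexing

# Random sample of two rows; random_state makes the draw reproducible.
subset = series.sample(n=2, random_state=0)
```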
Structure: DataFrame
-two-dimensional, labeled data structure built from e.g.
-Series
-2-D numpy.ndarray
-other DataFrames
-index automatically created if not given
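A short sketch of the three construction routes listed above, using made-up column names and values:

```python
import numpy as np
import pandas as pd

# From Series: each Series becomes one column.
from_series = pd.DataFrame({
    "a": pd.Series([1, 2, 3]),
    "b": pd.Series([4.0, 5.0, 6.0]),
})

# From a 2-D NumPy array: the index 0..2 is created automatically.
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# From other DataFrames: concatenate column-wise on the shared index.
combined = pd.concat([from_series, from_array], axis=1)
```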
Structure: Index
-Index
-automatically created if not given
-can be reset or replaced
-types: position, timestamp, time range, labels,…
-one or more dimensions
-may contain a value more than once (NOT UNIQUE!)
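The points about replacing, resetting and non-unique indexes can be illustrated like this. The data and the column names `city` and `value` are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Berlin", "Munich", "Berlin"], "value": [1, 2, 3]})

by_city = df.set_index("city")         # replace the automatic index with labels
is_unique = by_city.index.is_unique    # False: "Berlin" appears twice
berlin = by_city.loc["Berlin"]         # a non-unique label returns all matching rows

restored = by_city.reset_index()       # back to the automatic position index

multi = df.set_index(["city", "value"])  # index with more than one dimension
levels = multi.index.nlevels
```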
Examples
-work with series / calculation
-create and add a new series
-how to deal with null (NaN) values
-method calls directly from Series/ DataFrames
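The example slides themselves are not in this transcript. As a stand-in, here is a sketch of a calculation on a series, a new series added to a DataFrame, and method calls made directly on the objects; reading the values as degrees Celsius is an assumption made purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"celsius": [12.0, 16.0, 14.0]})

# Calculations on a Series are element-wise; assigning the result
# to a new column name adds it to the DataFrame.
df["fahrenheit"] = df["celsius"] * 9 / 5 + 32

# Methods can be called directly on a Series or DataFrame.
warmest = df["celsius"].max()
mean_f = df["fahrenheit"].mean()
```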
Modifying Series/DataFrames
-methods applied to Series or DataFrames do not change them, but return the result as a new Series or DataFrame
-with the parameter inplace=True the result is written directly into the Series / DataFrame
-Series (columns) can be removed from a DataFrame with drop()
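A small sketch of the difference between returning a result and modifying in place, using `sort_values()` as the example method (the choice of method and the data are assumptions, not taken from the slides):

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2], "b": [30, 10, 20]})

sorted_copy = df.sort_values("a")    # returns a new DataFrame
unchanged = list(df["a"])            # df itself still holds the original order

df.sort_values("a", inplace=True)    # modifies df directly and returns None

without_b = df.drop(columns=["b"])   # drop() also returns a new DataFrame
```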
Data Aggregation
-describe()
-groupby()
-groupby([]) & unstack()
-mean(), sum(), median(),…
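The aggregation calls above might be combined as in the following sketch. The `station` and `month` grouping columns and all values are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "station": ["A", "A", "B", "B"],
    "month": [8, 9, 8, 9],
    "value": [12.0, 16.0, 14.0, 10.0],
})

summary = df["value"].describe()                      # count, mean, std, min, quartiles, max
per_station = df.groupby("station")["value"].mean()   # one mean per group

# Grouping by two keys gives a two-level index;
# unstack() pivots the inner level (month) into columns.
table = df.groupby(["station", "month"])["value"].sum().unstack()
```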
NaN Values & Replacing
-NaN is the representation of null values
-series.describe() ignores NaN
-NaNs:
-remove with dropna()
-replace with a default value via fillna()
-forward- or backward-fill, interpolate
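The options for handling NaN can be compared side by side on a made-up series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

count = s.describe()["count"]  # NaN values are not counted
dropped = s.dropna()           # remove NaN entries
filled = s.fillna(0)           # replace NaN with a default value
ffilled = s.ffill()            # forward-fill: carry the last valid value forward
bfilled = s.bfill()            # backward-fill: use the next valid value
interp = s.interpolate()       # linear interpolation between valid neighbours
```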
End Part 1
-Series & DataFrame
-I/O
-Data analysis & aggregation
-Indexes
-Visualisation
-Interacting with the data
Part 2
A deeper look at the index with the TimeSeries Index
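-TimeSeries Index (DatetimeIndex in pandas)
-pd.to_datetime() (caution: ambiguous dates such as 01/02/2014 are read US-style, month first, unless dayfirst=True is given)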
-Data Aggregation examples
before TimeSeries Index: unordered
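A sketch of moving from unordered rows to a sorted time index, using three rows from the sample data. The column names `timestamp` and `value` are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": [
        "2014-08-21T22:50:00",
        "2014-08-17T13:20:00",
        "2014-08-06T01:20:00",
    ],
    "value": [12.0, 16.0, 14.0],
})

# Convert the strings to datetimes, use them as index, sort chronologically.
df["timestamp"] = pd.to_datetime(df["timestamp"])
ts = df.set_index("timestamp").sort_index()

# With a DatetimeIndex, a date string selects every row of that day.
august_17 = ts.loc["2014-08-17"]

# Ambiguous day/month order: month first by default, day first on request.
us_style = pd.to_datetime("01/02/2014")                 # 2 January 2014
eu_style = pd.to_datetime("01/02/2014", dayfirst=True)  # 1 February 2014
```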
Resampling
-H hourly frequency
-T minutely frequency
-S secondly frequency
-L milliseconds
-U microseconds
-N nanoseconds
-D calendar day frequency
-W weekly frequency
-M month end frequency
-Q quarter end frequency
-A year end frequency
-B business day frequency
-C custom business day frequency (experimental)
-BM business month end frequency
-CBM custom business month end frequency
-MS month start frequency
-BMS business month start frequency
-CBMS custom business month start frequency
-BQ business quarter end frequency
-QS quarter start frequency
-BQS business quarter start frequency
-BA business year end frequency
-AS year start frequency
-BAS business year start frequency
-BH business hour frequency
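A sketch of resampling with two of these aliases, using six rows from the sample data. It sticks to `D` and `W` on purpose: newer pandas releases have renamed several of the other aliases, so those may raise deprecation warnings depending on the installed version. The aggregation functions (`mean`, `max`) are choices made for illustration.

```python
import pandas as pd

idx = pd.to_datetime([
    "2014-08-03T02:50:00",
    "2014-08-06T01:20:00",
    "2014-08-14T05:20:00",
    "2014-08-16T01:50:00",
    "2014-08-17T01:00:00",
    "2014-08-17T13:20:00",
])
ts = pd.Series([21.0, 14.0, 13.0, 13.0, 14.0, 16.0], index=idx)

# Calendar-day bins: days without readings become NaN.
daily = ts.resample("D").mean()

# Weekly bins (weeks ending on Sunday by default).
weekly = ts.resample("W").max()
```

Resampling always needs an aggregation step, since each bin can hold zero, one or many of the original values.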
Attributions
Panda Picture
By Ailuropoda at en.wikipedia (Transferred from en.wikipedia) [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/) or CC BY-SA 2.5-2.0-1.0 (http://creativecommons.org/licenses/by-sa/2.5-2.0-1.0)], from Wikimedia Commons
Alexander C. S. Hendorf
ah@koenigsweg.com
@hendorf
Code-Examples
https://github.com/Koenigsweg/data-timeseries-analysis-with-pandas
