Introduction to Data Analytics
with Pandas
Alexander C. S. Hendorf
@hendorf
Alexander C. S. Hendorf
Königsweg GmbH
Strategic data consulting for startups and the industry.
EuroPython & PyConDE
Organizer + Program Chair
MongoDB Master, PSF Managing Member
Speaker at MongoDB Days, EuroPython, PyData…
@hendorf
Introduction to Data Analytics with Pandas [PyCon CZ]
Origin and Goals
-Open-source Python library
-practical, real-world data analysis: fast, efficient & easy
-seamless workflow (no switching to e.g. the R language)
-started in 2008 by Wes McKinney, now part of the PyData stack at Continuum Analytics ("Anaconda")
-very stable project with regular releases
-https://github.com/pydata/pandas
Main Features
-Support for CSV, Excel, JSON, SQL, SAS, clipboard, HDF5,…
-Data cleansing
-Reshaping, merging & joining data, pivoting
-Data visualisation
-Well integrated with Jupyter (IPython) notebooks
-Database-like operations
-High performance
Today
Basic functionality of pandas
Git repository with this presentation's code examples:
https://github.com/Koenigsweg/data-timeseries-analysis-with-pandas
2014-08-21T22:50:00,12.0
2014-08-17T13:20:00,16.0
2014-08-06T01:20:00,14.0
2014-09-27T06:50:00,11.0
2014-08-25T21:50:00,13.0
2014-08-14T05:20:00,13.0
2014-09-14T05:20:00,16.0
2014-08-03T02:50:00,21.0
2014-09-29T03:00:00,13
2014-09-06T08:20:00,16.0
2014-08-19T07:20:00,13.0
2014-09-27T22:50:00,10.0
2014-08-28T08:20:00,12.0
2014-08-17T01:00:00,14
2014-09-27T14:00:00,17
2014-09-10T18:00:00,18
2014-09-22T23:00:00,8
2014-09-20T03:00:00,9
2014-08-29T09:50:00,16.0
2014-08-16T01:50:00,13.0
2014-08-28T22:00:00,14
I/O and viewing data
-Convention: import pandas as pd
-Example: pd.read_csv()
-very flexible, ~40 optional parameters (delimiter, header, dtype, parse_dates,…)
-preview data with .head(n) and .tail(n): the first / last n rows (default 5)
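ax = df[:100].plot()                      # line plot of the first 100 rows; returns a matplotlib Axes
ax.axhline(16, color='r', linestyle='-')  # red horizontal reference line at y=16
df.plot(kind='bar')                       # bar chart of the DataFrame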
Visualisation
-matplotlib (http://matplotlib.org) integrated via .plot()
-customisable and extendable; plot() returns a matplotlib Axes object (ax)
-bar, area, scatter and box plots, among others
-Alternatives:
Bokeh (http://bokeh.pydata.org/en/latest/)
Seaborn (https://stanford.edu/~mwaskom/software/seaborn/index.html)
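A self-contained sketch of the plotting calls above, assuming hypothetical temperature readings. The Agg backend is chosen only so the sketch runs without a display; each plot gets its own Axes so that the returned object can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical temperature readings, for illustration only
s = pd.Series([12.0, 16.0, 14.0, 11.0, 13.0, 17.0, 18.0, 8.0], name="temperature")

fig1, ax1 = plt.subplots()
ax = s[:5].plot(ax=ax1)                   # line plot of the first 5 values; returns the Axes it drew on
ax.axhline(16, color="r", linestyle="-")  # red horizontal reference line at y=16
ax.set_title("First five readings")       # the returned Axes can be customised further

fig2, ax2 = plt.subplots()
s.plot(kind="bar", ax=ax2)                # same data as a bar chart, one bar per value
```

Because plot() hands back a plain matplotlib Axes, anything matplotlib offers (titles, reference lines, annotations) can be added afterwards.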
Structure
[Figure: pd.Series and pd.DataFrame, each shown as Index + Data]
Structure: Series
-one-dimensional, labeled array; may contain any data type
-the labels are collectively called the index
-index automatically created if not given
-one data type (dtype) per Series; the dtype can be set explicitly or converted later in a pythonic fashion (e.g. with .astype())
simple series: data type and index created automatically
simple series: data type set explicitly, index automatic
simple series: data type set, numerical index given
simple series: data type set, text-label index given
access a single value via index label
access a single value via index position
access multiple values via index labels
access multiple values via a position range
access multiple values via several positions
access via boolean index / lambda function
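The creation and access variants listed above, as a minimal sketch with made-up values:

```python
import pandas as pd

# dtype and index inferred automatically (index 0, 1, 2)
s1 = pd.Series([1, 2, 3])

# dtype set explicitly, index automatic
s2 = pd.Series([1, 2, 3], dtype="float64")

# dtype set, numerical index given
s3 = pd.Series([1, 2, 3], index=[10, 20, 30], dtype="float64")

# dtype set, text-label index given
s4 = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype="float64")

# the dtype can also be converted later; astype() returns a new Series
s5 = s4.astype("int64")

# a single value by label / by position
by_label = s4.loc["b"]                  # 2.0
by_position = s4.iloc[1]                # 2.0

# several values
several_labels = s4.loc[["a", "c"]]     # labels "a" and "c"
position_range = s4.iloc[0:2]           # positions 0 and 1 (end excluded)
several_positions = s4.iloc[[0, 2]]     # positions 0 and 2

# boolean index, either as a mask or as a function that returns a mask
mask_selection = s4[s4 > 1]             # labels "b" and "c"
lambda_selection = s4[lambda x: x > 1]  # same result
```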
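-.loc[]: access by index label
-.iloc[]: access by index position
-.ix[]: guesses label or position, with fallback (crossed out: deprecated since pandas 0.20 and removed in pandas 1.0; use .loc[] or .iloc[])
-.name: name of the Series (becomes the column name in a DataFrame)
-.sample(): random sample of the data set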
Selecting Data
-Slicing
-Boolean indexing
series[x], series[[x, y]]
series[2], series[[2, 3]], series[2:3]
series.iloc[] / series.loc[]
series.sample()
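A short sketch of these selection forms on a labeled Series with hypothetical values. Note the difference between a label slice (both ends included) and a position slice (end excluded).

```python
import pandas as pd

# Hypothetical readings, indexed by text labels
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"], name="temperature")

single = s["b"]                 # series[x]: one value by label
several = s[["b", "d"]]         # series[[x, y]]: several labels
label_slice = s["b":"c"]        # label slice: BOTH ends are included
position_slice = s.iloc[1:3]    # position slice: end excluded, like Python lists

# .name becomes the column name when the Series is turned into a DataFrame
frame = s.to_frame()

# random sample of 2 values; random_state makes the draw reproducible
sampled = s.sample(n=2, random_state=0)
```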
Structure: DataFrame
-Two-dimensional, labeled data structure, built e.g. from
-Series (e.g. a dict of Series)
-a 2-D numpy.ndarray
-other DataFrames
-index automatically created if not given
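A minimal sketch of the three construction routes listed above, using hypothetical data. When Series are combined, they are aligned on their index labels, so a missing label becomes NaN.

```python
import numpy as np
import pandas as pd

# From a dict of Series: the Series are aligned on their index labels
temps = pd.Series([12.0, 16.0, 14.0], index=["mon", "tue", "wed"])
rain = pd.Series([0.0, 2.5], index=["mon", "wed"])   # no value for "tue"
df1 = pd.DataFrame({"temperature": temps, "rain": rain})
# "tue" gets NaN in the rain column

# From a 2-D numpy array: index and column labels are created automatically unless given
df2 = pd.DataFrame(np.arange(6).reshape(3, 2))
df3 = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# From other DataFrames, e.g. by concatenating them row-wise with a fresh index
df4 = pd.concat([df3, df3], ignore_index=True)
```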
Structure: Index
-automatically created if not given
-can be reset or replaced
-types: position, timestamp, time range, labels,…
-one or more dimensions (levels)
-values need not be unique: a label may occur more than once!
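A sketch of replacing, resetting and stacking index levels, with a deliberately non-unique index. The city names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Prague", "Brno", "Prague"],
                   "temperature": [12.0, 14.0, 16.0]})

# Replace the automatic index by a column; "Prague" occurs twice, so the index is not unique
by_city = df.set_index("city")
print(by_city.index.is_unique)   # False
print(by_city.loc["Prague"])     # a duplicated label returns all matching rows

# Move the index back into a column and get a fresh 0..n-1 index
restored = by_city.reset_index()

# Two-level (hierarchical) index
multi = df.set_index(["city", "temperature"])
print(multi.index.nlevels)       # 2
```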
Examples
-working with Series: calculations
-creating and adding a new Series
-dealing with null (NaN) values
-calling methods directly on Series / DataFrames
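The four example topics in one minimal sketch, assuming a hypothetical column of Celsius readings with one missing value:

```python
import pandas as pd

df = pd.DataFrame({"celsius": [12.0, 16.0, None, 11.0]})  # None becomes NaN

# Vectorised calculation on a whole Series, no explicit loop
fahrenheit = df["celsius"] * 9 / 5 + 32

# Create and add a new Series (column); it is aligned on the index
df["fahrenheit"] = fahrenheit

# Arithmetic with NaN yields NaN; aggregation methods skip NaN by default
missing = df["fahrenheit"].isna().sum()   # 1
mean_celsius = df["celsius"].mean()       # 13.0, mean of the three non-missing values

# Methods are called directly on the Series / DataFrame
max_celsius = df["celsius"].max()         # 16.0
rounded = df.round(1)
```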
Modifying Series/DataFrames
-Most methods applied to Series or DataFrames do not change them, but return the result as a new Series or DataFrame
-With the parameter inplace=True, the result is applied directly to the Series / DataFrame itself
-Columns (Series) can be removed from a DataFrame with drop()
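A minimal sketch of the difference between returning a new object and modifying in place, using hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({"celsius": [12.0, 16.0], "fahrenheit": [53.6, 60.8]})

# Most methods return a new object and leave the original unchanged
sorted_df = df.sort_values("celsius", ascending=False)
print(df["celsius"].tolist())         # [12.0, 16.0], original order kept

# Keep the result by assigning it ...
without_f = df.drop(columns=["fahrenheit"])

# ... or apply the change to the object itself with inplace=True (the call returns None)
result = df.drop(columns=["fahrenheit"], inplace=True)
print(result)                         # None
print(list(df.columns))               # ['celsius']
```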
Data Aggregation
-describe()
-groupby()
-groupby() with a list of keys & unstack()
-mean(), sum(), median(),…
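A sketch of the aggregation methods listed above, assuming hypothetical readings from two weather stations over two months:

```python
import pandas as pd

# Hypothetical readings from two stations
df = pd.DataFrame({
    "station": ["A", "A", "B", "B", "A", "B"],
    "month": ["Aug", "Sep", "Aug", "Sep", "Aug", "Sep"],
    "temperature": [14.0, 12.0, 16.0, 10.0, 16.0, 13.0],
})

# count, mean, std, min, quartiles, max in one call
summary = df["temperature"].describe()

# one mean per station
per_station = df.groupby("station")["temperature"].mean()

# group by two keys, then pivot the inner level ("month") into columns
per_station_month = df.groupby(["station", "month"])["temperature"].mean().unstack()

# several aggregations at once
totals = df.groupby("station")["temperature"].agg(["sum", "median"])
```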
NaN Values & Replacing
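-NaN (not a number) represents null / missing values
-series.describe() ignores NaN
-NaNs can be:
-removed with dropna()
-replaced with a default value with fillna()
-forward- or backward-filled with ffill() / bfill(), or interpolated with interpolate()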
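Each NaN strategy listed above, applied to one small hypothetical Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([12.0, np.nan, 14.0, np.nan, 18.0])

stats = s.describe()            # "count" only includes the non-NaN values
dropped = s.dropna()            # remove NaNs
filled = s.fillna(0.0)          # replace with a default value
forward = s.ffill()             # propagate the last valid value forward
backward = s.bfill()            # use the next valid value
interpolated = s.interpolate()  # linear interpolation between neighbours
```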
End Part 1
-Series & DataFrame
-I/O
-Data analysis & aggregation
-Indexes
-Visualisation
-Interacting with the data
Example: Indexes
A deeper look at the index, using a time series index
-DatetimeIndex
-pd.to_datetime(): beware, ambiguous dates are parsed US-style (month first) by default
-Data aggregation examples
Before setting the DatetimeIndex: rows are unordered
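A sketch of the US-style default in pd.to_datetime() and of turning unordered rows (taken from the sample data at the start) into a sorted DatetimeIndex. The column names are assumptions.

```python
import pandas as pd

# Ambiguous day/month order: by default the first number is read as the month (US style)
us = pd.to_datetime("03/04/2014")                 # 2014-03-04 (March 4)
eu = pd.to_datetime("03/04/2014", dayfirst=True)  # 2014-04-03 (April 3)

# Unordered readings from the sample data; column names are hypothetical
df = pd.DataFrame({
    "timestamp": ["2014-08-21T22:50:00", "2014-08-17T13:20:00", "2014-08-06T01:20:00"],
    "temperature": [12.0, 16.0, 14.0],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])  # ISO 8601 strings are unambiguous
df = df.set_index("timestamp").sort_index()        # DatetimeIndex, now in chronological order

# partial-string indexing: a date string selects all rows of that day
august_17 = df.loc["2014-08-17"]
```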
Resampling
Frequency aliases for resample() (pandas 2.2 renamed several of them, e.g. H → h, T → min, M → ME)
-H hourly frequency
-T minutely frequency
-S secondly frequency
-L millisecond frequency
-U microsecond frequency
-N nanosecond frequency
-D calendar day frequency
-W weekly frequency
-M month end frequency
-Q quarter end frequency
-A year end frequency
-B business day frequency
-C custom business day frequency (experimental)
-BM business month end frequency
-CBM custom business month end frequency
-MS month start frequency
-BMS business month start frequency
-CBMS custom business month start frequency
-BQ business quarter end frequency
-QS quarter start frequency
-BQS business quarter start frequency
-BA business year end frequency
-AS year start frequency
-BAS business year start frequency
-BH business hour frequency
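A minimal resampling sketch with irregular readings taken from the sample data. It only uses the aliases "D" and "W", which are unchanged across pandas versions; "W" is anchored on Sunday by default.

```python
import pandas as pd

# Irregular readings (values taken from the sample data at the start)
idx = pd.to_datetime([
    "2014-08-14T05:20:00", "2014-08-16T01:50:00",
    "2014-08-21T22:50:00", "2014-08-25T21:50:00",
])
s = pd.Series([13.0, 13.0, 12.0, 13.0], index=idx)

daily = s.resample("D").mean()      # one row per calendar day; days without data become NaN
weekly_max = s.resample("W").max()  # weeks ending on Sunday, labeled with that Sunday
```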
Attributions
Panda Picture
By Ailuropoda at en.wikipedia (Transferred from en.wikipedia) [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/) or CC BY-SA 2.5-2.0-1.0 (http://creativecommons.org/licenses/by-sa/2.5-2.0-1.0)], from Wikimedia Commons
#16
180+ sessions
18 free trainings
panels, open spaces
5 days of talks & trainings
2 days of sprints
beginners' day
Tickets start @ 375€
Rimini (map showing Venice, Bologna and Florence nearby)
Armin Ronacher • Katharine Jarmul • Tracy Osborn
Jan Willem Tulp • Aisha Bello & Daniele Procida
interactive sessions
Extra discounts for students & postdocs.
Django Girls
25-27 October 2017
PyCon.de
Alexander C. S. Hendorf
ah@koenigsweg.com
@hendorf
Code examples
https://github.com/Koenigsweg/data-timeseries-analysis-with-pandas