Intro to Python Data Analysis in
Wakari
Karissa McKelvey
Software Developer
Continuum Analytics
@karissamck
November 8, 2013
PyData NYC
$ WHOAMI

karissamck.com
@karissamck
truthy.indiana.edu
More Tweets, More Votes
MY GOALS
Get you excited about data analysis in Wakari
Walk through some basic analysis packages
and Wakari workflows
Kick-start your journey
WHO ARE YOU?
Putting Science back in Comp Sci
• Much of the software stack is for systems
programming --- C++, Java, .NET, ObjC, web
- Complex numbers?
- Vectorized primitives?
• Software stack for scientists is not as helpful
as it should be
• Fortran is still where many scientists end up
Why Python?
High Performance with BIG DATA
Packages for data analysis and visualization
Syntax – Gets out of your way
Community Driven
Ready for web applications, too.
• “Python is good for data cleanup, R for
statistical models”
• “R is quirky and weird but the statisticians love
it and there really isn’t any compelling reason
to switch”
• “You’re running an MCMC simulation on a
laptop? Perhaps you should write it in
C++/FORTRAN”

“Which is the better Data Analysis language? R or Python?” Quora.
http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python
“You’re running an MCMC simulation on
a laptop? Perhaps you should write it in
C++/FORTRAN”

Ready for DATA, and then some
Numba: just-in-time compiler to LLVM
through @decorators*
numba.pydata.org
*aka, fast. easy.
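
To make the decorator idea concrete, here is a minimal sketch of what "JIT through @decorators" can look like. It assumes Numba's `njit` decorator; the function name `sum_of_squares` and the sample data are illustrative only and are not taken from the talk. The fallback branch is an assumption added so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    # Assumption for this sketch: run as plain Python if Numba is absent.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    """Explicit loop; with Numba it is compiled via LLVM on the first call."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


# Hypothetical sample data: one million float64 values.
data = np.arange(1_000_000, dtype=np.float64)
result = sum_of_squares(data)  # the first call triggers compilation
```

The function body is an ordinary Python loop; the only change needed to request compilation is the decorator line. Any speedup depends on the workload and should be measured rather than assumed.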
Basic packages for data analysis and visualization
NumPy: The foundation of the
Python Data Analysis stack
NumPy: Array-oriented
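
As a rough illustration of "array-oriented", the sketch below applies arithmetic and a comparison to whole arrays at once, with no explicit loop. The variable names and values are made up for the example.

```python
import numpy as np

# A homogeneous array: every element shares one dtype (float64 here).
temps_c = np.array([12.5, 18.0, 21.5, 9.0])

# Vectorized arithmetic: the expression applies to every element.
temps_f = temps_c * 9 / 5 + 32

# Boolean masking: keep only the readings above 60 degrees Fahrenheit.
warm = temps_f[temps_f > 60]

# Complex numbers are a built-in dtype as well.
z = np.array([3 + 4j, 1 - 1j])
magnitudes = np.abs(z)
```

The same expressions work unchanged on arrays with millions of elements, which is the main reason the rest of the stack builds on NumPy.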
Pandas: Builds upon NumPy
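
One way to see the relationship is that a DataFrame column can be created from a NumPy array and converted back to one. The sketch below uses made-up city and temperature data; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: labeled columns, one of them built from a NumPy array.
df = pd.DataFrame({
    "city": ["NYC", "NYC", "Austin", "Austin"],
    "temp_c": np.array([12.5, 18.0, 21.5, 29.5]),
})

# Label-based grouping and aggregation on top of the numeric data.
mean_by_city = df.groupby("city")["temp_c"].mean()

# The underlying values are still available as a NumPy array.
raw = df["temp_c"].to_numpy()
```

pandas adds labels, grouping, and alignment, while the numeric data can still be handed to NumPy when needed.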
Matplotlib: 2D plotting library
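
A basic 2D line plot might look like the sketch below. The choice of the non-interactive `Agg` backend and the in-memory buffer are assumptions made so the example runs without a display; in a notebook the figure would normally render inline instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A basic 2D line plot")
ax.legend()

# Render the figure to PNG bytes in memory instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```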
IPython: Interactive Python
(+ in the Web)
tab completion
magic %-commands
Inline plots
Anaconda: pulls it all together
wakari.io

Browser-based Python & Linux environment
IPython Notebook

Scientific Packages
Terminal

Share files, IPython notebooks, and plots with pay-as-you-go compute
Sharing in Wakari
• Packages IPython notebooks, files, folders, data, and environment
• Get a link
• Share that link.
Reproducible Research
“A rule of thumb among biotechnology venture
capitalists is that half of published research
cannot be replicated”
How do we replicate research today?
How do we collaborate on data analysis today?
????????
wakari.io

Browser-based Python & Linux environment
Enterprise or Cloud

Online at wakari.io or install locally for access to your hardware and data
wakari.io

Browser-based Python & Linux environment
Coming Soon
Project-based interaction

Projects starting at $10/month with unlimited team members
Interactive Plotting

Next-generation collaborative data manipulation, analysis, and presentation
Talks to see
• Jake VanderPlas (Washington)
– Efficient computing with NumPy
• 29th Floor combo 3pm (Right now, next door!)

• Julia Evans (N/A)
– A practical introduction to IPython Notebook &
pandas
• Here, 4:45pm.
Talks to see
• Sarah Guido (Michigan)
– A Beginner’s Guide to Machine Learning with
scikit-learn

• Imran Haque (Counsyl)
– Beyond the dict

• Peter Wang (Continuum)
– Bokeh Workshop
Special Thanks
Ben Zaitlin
Mark Florisson
Clayton Davis
Bryan Van de Ven
Travis Oliphant
Karissa McKelvey
@karissamck

Editor's Notes

  • #3: I do web programming
  • #7: How many of you use Python on a daily basis for data analysis? Raise your hand if you've worked primarily in Python in the past year.
  • #19: Domain-specific libraries:
      – Statsmodels => statistical computing
      – Scikit-image => image manipulation
      – OpenCV => image processing, with an interface that accepts NumPy arrays
      – PyTables => HDF5 integration
      – Numexpr => cache-aware evaluation of expressions on your data; it's very efficient
    There are more packages in the Python scientific stack than just these. But it's good to know NumPy so you can get down and dirty with your data and manipulate it if need be.
  • #20: PACKAGES! Occasional programmers can jump on
  • #21: PACKAGES!
  • #27: THIS SHOULD NEVER HAPPEN. At Continuum Analytics, we never want these words to be uttered again.
  • #31: Python in 60 seconds: NumPy, SciPy, pandas, Matplotlib, scikit-learn
  • #33: Homogeneous
  • #41: We’re going to pull it all together in Wakari.
  • #57: And this is why sharing in Wakari is so important