© 2016 Continuum Analytics - Confidential & Proprietary
Python for Data: Past, Present, and Future
Peter Wang
CTO, Co-founder
Anaconda / Continuum Analytics
© 2017 Anaconda, Inc.
Agenda
• Our Journey with Anaconda
• Why Python for Data?
• The Future
My Journey with Anaconda
About Peter
• Degree in Physics (Cornell Univ.)
• Computer graphics developer (C, C++)
• Scientific Python developer and consultant (Chaco, Traits, …)
• Founded Continuum Analytics in 2012 with Travis Oliphant
• Launched / Created: PyData conferences and community, Anaconda distribution, conda package manager, Bokeh web visualization, Blaze data library
• Thinks a lot about the future of Python for data + science and machine learning
When we started 5 years ago…
The birth of conda…
“Guido, please help convince core dev to work with us to solve the packaging problem!”
“Meh. Feel free to solve it yourselves.”
• 500+ Popular Python Packages
• Optimized & Compiled
• Free for Everyone
• Extensible via Conda Package Manager
• Sandbox Packages & Libraries
• Cross-Platform – Windows, Linux, Mac
• Not just Python - over 230 R packages
Chart: Anaconda and Miniconda downloads (thousands) by month, 2015/1 to 2017/7
Over 20 Million Downloads
The Growth of Data Science - Python Leading the Way
https://stackoverflow.blog/2017/09/06/incredible-growth-python/
Other Problems in 2012…
• Performance: you had to choose between a vectorized system like NumPy, or dropping down to Cython or wrapped C code. There was no convenient JIT compiler like Julia's.
  • We created Numba
• No system for building simple data-driven web apps, like Shiny for R.
  • We created Bokeh, to serve as both Shiny and D3 for Python
• No easy parallelism, or intrinsic parallel primitives like Spark.
  • We created Dask, which has parallel arrays and dataframes.
  • Dask also solves the “data doesn't fit in RAM” problem.
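The idea behind Dask's parallel arrays and dataframes is the blocked algorithm: split the data into chunks, compute a small partial result per chunk, then combine the partials. The stdlib-only sketch below shows that map-then-combine shape for a mean. It is not Dask's API; the names `chunks`, `partial_sum_count`, and `blocked_mean` are hypothetical. The thread pool only stands in for a scheduler: for CPU-bound pure Python the GIL prevents a real speedup, and in a true out-of-core setting each chunk would be loaded from disk on demand instead of sliced from an in-memory list.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(seq, size):
    """Yield consecutive slices of `seq` with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def partial_sum_count(chunk):
    """Map step: reduce one chunk to a small (sum, count) partial result."""
    return sum(chunk), len(chunk)


def blocked_mean(data, chunk_size=1000, workers=4):
    """Mean of `data`, computed per chunk and then combined.

    Each worker only touches `chunk_size` items at a time; only the small
    (sum, count) pairs are gathered for the final combine step.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks(data, chunk_size)))
    # Combine step: merge the per-chunk partials into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("mean of empty data")
    return total / count


values = list(range(10_001))
print(blocked_mean(values, chunk_size=1000))  # 5000.0
```

Because the partial results are tiny compared with the chunks, the same pattern extends to data larger than RAM: only one chunk per worker and the list of partials need to be resident at once.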
Python in 2017
• Everyone is learning it, and major universities are teaching it
• Proven in production at Serious Places, not merely hip startups
• Vastly outstrips scripting-language rivals like Ruby and Perl
• Growing faster than pure analysis languages like R, SAS, and Matlab
• Data science and machine learning applications are taking off like a rocket
• Python is the most popular language for deep learning, the most rapidly innovating area of machine learning
• The Python 2 vs. 3 rift is less of an issue for most people
https://www.youtube.com/watch?v=nU09j2gGHYg
Why Python for Data?
Timeline figure: languages and tools for data from 1968 to 2005, including SQL and Numeric
Python & ABC
It is interactive, structured, high-level, and intended to be used instead of BASIC, Pascal, or AWK. It is not meant to be a systems-programming language but is intended for teaching or prototyping.
Analyst
• Uses graphical tools
• Can call functions, cut & paste code
• Can change some variables
Gets paid for: Insight
Tools: Excel, VB, Tableau, Python

Analyst / Data Developer
• Builds simple apps & workflows
• Used to be "just an analyst"
• Likes coding to solve problems
• Doesn't want to be a "full-time programmer"
Gets paid (like a rock star) for: Code that produces insight
Tools: SAS, R, Matlab, Python

Programmer
• Creates frameworks & compilers
• Uses IDEs
• Degree in CompSci
• Knows multiple languages
Gets paid for: Code
Tools: C, C++, Java, JS, Python
Data science != Software Development
• Equating the two is a VERY common misconception
• Python is probably the most misunderstood language
• There are “tribes” and ecosystems in Python: web dev, SciPy, PyData, embedded, scripting, 3D graphics, etc.
• But businesses tend to pigeonhole it:
  • IT/software/data engineering view: competes with Java, C#, Ruby…
  • Analytics, stats, data science view: competes with R, SAS, Matlab, SPSS, BI systems
Era of Data Literacy
• Data exploration and analysis are going to be a new kind of literacy, required to do great work in any field.
• Language is a human instinct and a natural path to insight. We see this in our interactions with Python/PyData users, whose passion stems chiefly from this expressiveness and agility.
• An analytical language is “thoughtware”, not “software”.
What’s Next?
A Few Predictions
• Python will become a preferred way to develop cognitive applications: online model learning and training
• There will be a steady income stream for people who want to maintain Python 2.x codebases
• Multi-language interoperability will be greatly improved once people adopt the Apache Arrow columnar format for in-memory data. Python code running alongside Java/Scala/JVM code will then no longer be a second-class citizen.
• Constant improvements in memory and storage, as well as GPUs, mean that people will continue doing lots of Python locally on big workstations
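The Arrow prediction rests on a columnar layout: each column lives in one contiguous, typed buffer that any language agreeing on the layout can read without converting row by row. The stdlib-only sketch below illustrates just that idea with `array` and `memoryview`; `rows_to_columns` is a hypothetical helper, this is not the pyarrow API, and the real Arrow specification adds validity bitmaps, offset buffers for variable-length data, alignment rules, and IPC metadata that are not modeled here.

```python
from array import array


def rows_to_columns(rows, schema):
    """Pivot row records into one contiguous typed buffer per column.

    `schema` maps column name -> array typecode ('q' = int64, 'd' = float64).
    """
    columns = {name: array(code) for name, code in schema.items()}
    for row in rows:
        for name in schema:
            columns[name].append(row[name])
    return columns


rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]
cols = rows_to_columns(rows, {"id": "q", "price": "d"})

# memoryview exposes the column's raw bytes without copying them; another
# runtime that agreed on this layout could, in principle, read the same buffer.
price_view = memoryview(cols["price"])
print(price_view.format, price_view.itemsize, price_view.nbytes)  # d 8 24
print(sum(cols["price"]))  # 28.75
```

Storing values column by column also suits analytical scans such as the `sum` above, which touch one column and skip the rest.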
Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)
Open Source and Developers
• Not about licenses
• Empowering people & communities to innovate
• Aligns us with users, customers, innovators
• “Software is eating the world”
• Open source is eating software
Open Source and Businesses
• Not about cost of software (“capital expense”)
• Not even about maintenance of software (“operational expense”)
• Core business goals:
  • Avoid lock-in
  • Harness innovation
5 Years
25+ Conferences
100s of talks
Questions?