Apache Zeppelin
The (very) short field trip
by G. Alléon & G. Dupont, TDS (Toulouse Data Science) meetup - 2016.06.30
Who are we?
Guillaume Alleon - AIRBUS Group Innovation (corporate research center)
Research leader for a team of more than 30 people, from the UK to China, tackling problems in massive data processing and information extraction.
Was already in “big data” when it was still called HPC…
Gerard Dupont - AIRBUS Defence & Space (space systems)
Technical coordinator for R&T studies on distributed processing systems.
Spent way too much time processing web data for intelligence; now looking to the sky (satellite data ;-)
Zeppelin motto
“A web-based notebook that enables interactive data analytics.”
Origins & history
Missing piece in the Hadoop landscape: a modern analytic playground.
2012.12 - Data analytics solution (NFLabs)
2013.10 - Open-sourced
2014.12 - ASF incubation
2015 - 3 stable releases
2016.05 - Graduated to Apache top-level project
3000 feet view
What’s cool about Zeppelin
⊕ interactive
⊕ out-of-the-box Spark integration
⊕ out-of-the-box visualization options
⊕ direct access to the DOM for customized visualizations
⊕ nice UI (Bootstrap & AngularJS)
⊕ notebook run scheduler
⊕ easy to configure
⊕ extensibility, extensibility and extensibility...
… the dark side
⊝ hard to install
⊝ need to build from source (for a customized version)
⊝ not (yet) multi-user
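The "out-of-the-box visualization" and "direct access to the DOM" points both rest on Zeppelin's display system. As far as we know, interpreter output that begins with `%table` is parsed as tab-separated columns and newline-separated rows (first row = header) and offered as a table or chart, while output that begins with `%html` is rendered as HTML inside the notebook page. The sketch below builds both kinds of output from plain Python data; the helper names `to_zeppelin_table` and `to_zeppelin_html_bars` are hypothetical and not part of any Zeppelin API.

```python
import html


def to_zeppelin_table(header, rows):
    """Build a '%table' result: tab-separated columns, newline-separated rows,
    with the first row used as the column header."""
    def cell(value):
        # A tab or newline inside a value would shift columns/rows, so flatten it.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(cell(h) for h in header)]
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(row)}: {row!r}")
        lines.append("\t".join(cell(v) for v in row))
    return "%table " + "\n".join(lines)


def to_zeppelin_html_bars(pairs, px_per_unit=20):
    """Build a '%html' result: one inline-styled bar per (label, value) pair.
    Labels are HTML-escaped because the fragment ends up in the page DOM."""
    bars = []
    for label, value in pairs:
        width = int(value * px_per_unit)
        bars.append(
            f"<div>{html.escape(str(label))} "
            f'<span style="display:inline-block;background:steelblue;width:{width}px">'
            "&nbsp;</span></div>"
        )
    return "%html " + "".join(bars)


if __name__ == "__main__":
    data = [("Earth", 1), ("Mars", 2)]  # number of natural moons
    print(to_zeppelin_table(["planet", "moons"], data))
    print(to_zeppelin_html_bars(data))
```

Under these assumptions, printing either string from a Python paragraph should be enough for Zeppelin to pick the matching renderer; escaping matters for the `%html` variant precisely because that output is injected into the page rather than displayed as text.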
Overview / look & feel
Interpreter text (aka your code)
Interpreter config
Interactive results
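The "interpreter text" area is where each paragraph names its interpreter with a leading directive such as `%md`, `%sql` or `%spark.sql`; without one, the note's default interpreter is used. The following is a toy sketch of that routing step only, assuming a plain dictionary of interpreters: `split_paragraph`, `run_paragraph` and the `echo`/`upper` interpreters are hypothetical and do not reflect Zeppelin's actual JVM-based implementation.

```python
import re
from typing import Callable, Dict, Tuple

# A paragraph may start with "%name" or "%group.name", e.g. "%md" or "%spark.sql".
_DIRECTIVE = re.compile(r"^\s*%(\w+(?:\.\w+)?)[ \t]*\n?")


def split_paragraph(text: str, default: str) -> Tuple[str, str]:
    """Return (interpreter name, code body); fall back to `default`
    when the paragraph has no leading directive."""
    match = _DIRECTIVE.match(text)
    if match is None:
        return default, text
    return match.group(1), text[match.end():]


def run_paragraph(text: str, registry: Dict[str, Callable[[str], str]], default: str) -> str:
    """Send the body to the selected interpreter and return its display-prefixed output."""
    name, body = split_paragraph(text, default)
    if name not in registry:
        raise KeyError(f"no interpreter bound as {name!r}")
    return registry[name](body)


# Two toy "interpreters" (hypothetical, for illustration only).
TOY_REGISTRY = {
    "echo": lambda body: "%text " + body,
    "upper": lambda body: "%text " + body.upper(),
}

if __name__ == "__main__":
    print(run_paragraph("%upper\nhello zeppelin", TOY_REGISTRY, default="echo"))
```

Keeping the directive parsing separate from execution mirrors the idea on the slide: the paragraph text is just code plus a routing hint, and the interpreter configuration decides what actually runs it.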
DEMO time
credits: https://www.weasyl.com/~uszatyarbuz
Under the hood
○ Interpreter isolation: each interpreter runs in its own JVM
○ Dynamic dependency loading
○ REST & WebSocket on the front end
○ Thrift on the back end (or whatever you add)
○ Process scheduler (cron-like)
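For the cron-like scheduler, Zeppelin (as far as we know, through the Quartz library) accepts Quartz-style cron expressions, which start with a seconds field, e.g. `0 0/5 * * * ?` for every five minutes. As a minimal sketch of what such an expression means, the hypothetical `cron_matches` below checks whether a timestamp matches a six-field expression. It supports only a simplified subset (`*`, `?`, numbers, ranges, lists and `/` steps, numeric day-of-week with 1 = Sunday), not Quartz's `L`, `W`, `#` or month/day names, and it is not Zeppelin's code.

```python
from datetime import datetime

# Allowed (min, max) per field, in Quartz order:
# seconds, minutes, hours, day-of-month, month, day-of-week (1 = Sunday ... 7 = Saturday)
_RANGES = [(0, 59), (0, 59), (0, 23), (1, 31), (1, 12), (1, 7)]


def _field_values(field: str, lo: int, hi: int) -> set:
    """Expand one field such as '*', '?', '5', '1-5', '0/15' or '1,3,5' into the set it allows."""
    values = set()
    for part in field.split(","):
        base, has_step, step_text = part.partition("/")
        step = int(step_text) if has_step else 1
        if step <= 0:
            raise ValueError(f"step must be positive in {part!r}")
        if base in ("*", "?"):
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = int(base)
            # 'a/n' means: start at a, then every n units up to the field maximum.
            end = hi if has_step else start
        if not lo <= start <= end <= hi:
            raise ValueError(f"{part!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, when: datetime) -> bool:
    """True if `when` matches a six-field, Quartz-style cron expression (simplified subset)."""
    fields = expression.split()
    if len(fields) != 6:
        raise ValueError("expected: seconds minutes hours day-of-month month day-of-week")
    # Python: Monday=0 ... Sunday=6  ->  Quartz: Sunday=1 ... Saturday=7
    quartz_dow = (when.weekday() + 1) % 7 + 1
    actual = [when.second, when.minute, when.hour, when.day, when.month, quartz_dow]
    # '?' is treated as "any value", so day-of-month and day-of-week are simply ANDed.
    return all(
        value in _field_values(field, lo, hi)
        for field, (lo, hi), value in zip(fields, _RANGES, actual)
    )
```

A real scheduler works the other way round, computing the next fire time from the expression, but the meaning of each field is the same as in this matcher.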
Roadmap
Enterprise Ready
○ Multi-tenancy
○ Job scheduler
○ HA (high availability)
Usability Improvement
○ UX improvement
○ Table data support
○ Dynamic interpreter integration
○ Reusable analytic application catalog
Thx
Official website: https://zeppelin.apache.org/
Notebook sample: https://www.zeppelinhub.com/viewer
Source code: https://github.com/apache/incubator-zeppelin
Mailing lists: http://zeppelin.apache.org/community.html
This TDS notebook: http://tinyurl.com/zeppelin-tds
Sources for this presentation:
○ http://www.slideshare.net/FlinkForward/moon-soo-lee-data-science-lifecycle-with-apache-flink-and-apache-zeppelin/23
○ http://www.slideshare.net/HadoopSummit/apache-zeppelin-helium-and-beyond
○ http://www.slideshare.net/felixcss/interactive-data-science-from-scratch-with-apache-zeppelin-and-apache-spark
○ http://www.slideshare.net/BrunoBonnin/explorez-vos-donnes-avec-apache-zeppelin
BACKUP
Origins & history
Active core teams
Decent number of external contributors
Plenty of interpreters (official and external)
0.6.0-SNAPSHOT (pending stabilization)
3000 feet view

Toulouse Data Science meetup - Apache zeppelin


Editor's Notes

  • #4: Interactive & extensible: ingestion, discovery, analytics, visualization, collaboration, data products. Toward better reuse of analytical applications (Helium).
  • #5: ~4 years from creation to Apache top-level project, after less than 18 months of incubation.
  • #7: Scala & Spark integration; direct DOM access for super cool visualizations.