APACHE TOREE
ASIM JALIS
GALVANIZE
INTRO
ASIM JALIS
Galvanize/Zipfian, Data
Engineering
Cloudera, Microsoft,
Salesforce
MS in Computer Science
from University of
Virginia
WHAT IS GALVANIZE’S DATA
ENGINEERING IMMERSIVE?
Immersive Peer Learning
Environment
Master High-Demand
Skills and Technologies
Heart of San Francisco in
SOMA
YOU GET TO . . .
Play with Terabytes of
Data
Spark, Hadoop, Hive,
Kafka, Storm, HBase
Data Science at Scale
Level UP your Career
Apache Toree
FOR MORE INFORMATION
asim.jalis@galvanize.com
http://galvanize.com
TALK OVERVIEW
WHAT IS THIS TALK ABOUT?
What is Apache Toree?
How can I create
IPython/Jupyter
notebooks for
Spark/Scala?
HOW MANY PEOPLE HERE ARE
FAMILIAR WITH
IPYTHON/JUPYTER NOTEBOOKS?
HOW MANY PEOPLE HERE ARE
FAMILIAR WITH APACHE SPARK?
HOW MANY PEOPLE HERE ARE
FAMILIAR WITH SCALA?
LITERATE PROGRAMMING
WHAT IS LITERATE
PROGRAMMING?
Proposed by Don Knuth
Write programs for
humans, not machines
Programs communicate
ideas to others
Default text is
documentation or
thoughts
Code is explicitly
marked out
LITERATE PROGRAM
This program prints hello world.
<<hello.c>>=
<<includes>>
<<main>>
@
Some includes.
<<includes>>=
#include <stdio.h>
@
Print hello world, then exit.
<<main>>=
int main(int argc, char *argv[]) {
    printf("Hello World!\n");
    return 0;
}
@
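To turn the literate program above into a compilable file, the named chunks have to be stitched together (noweb calls this "tangling"). The sketch below is a simplified illustration of that idea in Python, not the real noweb tool: it assumes each chunk starts with `<<name>>=`, ends with a line containing only `@`, and that a reference sits alone on its line. The names `parse_chunks` and `tangle` are made up for this example.

```python
import re

# The literate program from the slide, in noweb-style syntax.
SOURCE = '''This program prints hello world.
<<hello.c>>=
<<includes>>
<<main>>
@
Some includes.
<<includes>>=
#include <stdio.h>
@
Print hello world, then exit.
<<main>>=
int main(int argc, char *argv[]) {
    printf("Hello World!\\n");
    return 0;
}
@
'''

CHUNK_DEF = re.compile(r"^<<(.+)>>=$")       # line that starts a named chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>$")   # line that refers to another chunk


def parse_chunks(text):
    """Collect the code lines of every named chunk; prose lines are skipped."""
    chunks, current = {}, None
    for line in text.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = chunks.setdefault(definition.group(1), [])
        elif line.strip() == "@":
            current = None            # end of chunk, back to documentation
        elif current is not None:
            current.append(line)
    return chunks


def tangle(chunks, name):
    """Expand chunk `name`, recursively replacing references with chunk bodies."""
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            indent = ref.group(1)
            out.extend(indent + inner for inner in tangle(chunks, ref.group(2)))
        else:
            out.append(line)
    return out


hello_c = "\n".join(tangle(parse_chunks(SOURCE), "hello.c"))
print(hello_c)
```

The printed text is plain C with the documentation stripped out and the `<<includes>>` and `<<main>>` references replaced by their bodies.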
WHAT PROBLEM DOES JUPYTER
SOLVE?
Suppose you want to share a programming idea or tutorial
You write an article and embed code in it
Now imagine being able to execute the code in the article
HOW IS THIS DIFFERENT FROM
CODE COMMENTS?
Commented code is not technical literature
It cannot be published or read as an article
A literate program is an executable article
JUPYTER/IPYTHON
WHAT IS JUPYTER?
Create executable
documents
Originally for Python
Supports other systems
through kernels
JUPYTER DEMO
Write Markdown text
Write Scala Spark code
Execute
Repeat
Tab-completion
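What the demo loop produces on disk is a notebook file: JSON holding a list of Markdown cells and code cells. The sketch below builds a minimal notebook by hand with the standard library, following the version 4 notebook format. The kernel name `apache_toree_scala`, its display name, and the Scala snippet using a predefined `sc` are assumptions for illustration; check them against your own install.

```python
import json


def markdown_cell(text):
    """A cell holding documentation written in Markdown."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """A cell holding code that has not been executed yet."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,   # filled in by the kernel after a run
        "outputs": [],             # filled in by the kernel after a run
        "source": code,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {
        "kernelspec": {
            # Assumed names for a Toree Scala kernel; yours may differ.
            "name": "apache_toree_scala",
            "display_name": "Apache Toree - Scala",
            "language": "scala",
        }
    },
    "cells": [
        markdown_cell("# Word lengths\nCount the characters in each word with Spark."),
        # Scala source kept as a plain string; assumes the kernel provides `sc`.
        code_cell('sc.parallelize(Seq("hello", "toree")).map(_.length).collect()'),
    ],
}

notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Saving `notebook_json` to a file ending in `.ipynb` gives something the Jupyter server can open, provided a kernel with the assumed name is installed.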
JUPYTER ARCH
Jupyter server
Displays notebook in
browser
Executes code on
Python runtime
Displays output back
into notebook
FERNANDO PÉREZ
IPython/Jupyter
inventor
Particle Physics PhD,
University of Colorado—
Boulder
Now at UC Berkeley
Started IPython in 2001
IS IT IPYTHON OR JUPYTER?
Started out as IPython
notebook
Not specific to Python
Can work with Scala and
other languages
Jupyter captures its
language independence
APACHE TOREE
WHAT IS TOREE?
Toree is a Jupyter Kernel
Executes Scala
Runs Spark
Driver/Context
HOW DOES TOREE WORK?
TOREE ARCHITECTURE
Jupyter Server talks to
Toree Kernel
Toree Kernel talks to
Spark Driver
Spark Driver talks to
Spark Executors
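The first hop, Jupyter server to kernel, uses the Jupyter messaging protocol: the server sends an `execute_request` message and the kernel answers with results. The sketch below only builds the message dictionary, as a rough picture of what crosses that boundary. It leaves out the ZeroMQ transport and message signing entirely, and the `username` value and the helper name `execute_request` are placeholders chosen for this example.

```python
import uuid
from datetime import datetime, timezone


def execute_request(code, session_id):
    """Build the dictionary form of a Jupyter `execute_request` message."""
    header = {
        "msg_id": uuid.uuid4().hex,          # unique per message
        "session": session_id,               # shared by all messages in a session
        "username": "demo",                  # placeholder value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.0",                    # messaging protocol version
    }
    content = {
        "code": code,                # source text the kernel should run
        "silent": False,             # False: broadcast output to the front end
        "store_history": True,       # record this run in the kernel history
        "user_expressions": {},
        "allow_stdin": False,
    }
    return {
        "header": header,
        "parent_header": {},         # empty: this message starts a conversation
        "metadata": {},
        "content": content,
    }


message = execute_request("sc.version", session_id=uuid.uuid4().hex)
print(message["header"]["msg_type"], "->", message["content"]["code"])
```

Because the kernel only sees messages like this one, the server does not need to know that the code is Scala or that a Spark driver sits behind it.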
HOW IS TOREE DIFFERENT FROM
ZEPPELIN?
Toree is compliant with the Jupyter protocol
Toree is easy to install and use
Zeppelin does not use Jupyter protocol
Zeppelin wants to be a platform like Jupyter
TOREE COMMITS
A lot of activity last year
Stabilizing
WHO WROTE TOREE?
Top 2 contributors responsible for 50% of commits
chipsenkbeil has 318 commits
Lull3rSkat3r has 72 commits
ROBERT “CHIP” SENKBEIL AND
COREY STUBBS
WHY IS IT CALLED TOREE?
Nothing special about the name. Some
people in the group just picked it out.
Some facts though, it is a purposeful
misspelling of the Japanese word torii.
A torii is a traditional gate for Shinto
shrines in Japan. —Corey Stubbs (personal
email)
ACTUAL TORII
TOREE HANDS-ON DEMO
WHAT WE WILL COVER
How to install Toree
How to automatically pull Java libraries in your notebook
How to publish notebooks on http://nbviewer.jupyter.org/
How to turn notebooks into slide shows
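For the slide show item: Jupyter decides where slides begin from a `slideshow` entry in each cell's metadata, and `jupyter nbconvert --to slides` reads that entry when exporting. The sketch below tags cells in a notebook dictionary. The rule it uses (a Markdown cell that starts with a heading opens a new slide, everything else continues the current one) is just one possible convention picked for this example, and `mark_as_slides` is a made-up helper name.

```python
import json


def mark_as_slides(notebook):
    """Return a copy of the notebook with slideshow metadata on every cell."""
    nb = json.loads(json.dumps(notebook))    # deep copy via a JSON round trip
    for cell in nb["cells"]:
        source = cell["source"]
        text = source if isinstance(source, str) else "".join(source)
        starts_slide = (
            cell["cell_type"] == "markdown" and text.lstrip().startswith("#")
        )
        # "slide" opens a new slide; "-" continues the current one.
        slide_type = "slide" if starts_slide else "-"
        cell.setdefault("metadata", {})["slideshow"] = {"slide_type": slide_type}
    return nb


example = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "1 + 1"},
        {"cell_type": "markdown", "metadata": {}, "source": "Some notes"},
        {"cell_type": "markdown", "metadata": {}, "source": "## Next topic"},
    ],
}

slides = mark_as_slides(example)
slide_types = [c["metadata"]["slideshow"]["slide_type"] for c in slides["cells"]]
print(slide_types)
```

After writing the tagged notebook back to an `.ipynb` file, running `jupyter nbconvert --to slides` on it produces the slide show.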
QUICKSTART TUTORIAL
For details on how to do all these things
See https://github.com/asimjalis/apache-toree-quickstart
CONCLUSION
REFERENCES
Apache Toree Quickstart Tutorial
https://github.com/asimjalis/apache-toree-quickstart
Apache Toree on GitHub
https://github.com/apache/incubator-toree
Apache Toree Home
https://toree.incubator.apache.org/
QUESTIONS
