

Databricks @ Strata SJ

  • 1. Making Big Data Simple with Databricks Cloud. February 2015.
  • 2. Your company starts a Big Data initiative. You have data, you have questions, now what?
  • 3. You have to navigate your data pipeline. Get a cluster up and running. Build a production pipeline. Import and explore data.
  • 4. Problem: the journey is complex and costly. Expensive to build and hard to manage. Disparate and difficult tools. Months of re-engineering to deploy. Get a cluster up and running. Build a production pipeline. Import and explore data.
  • 5. The Solution: Databricks Cloud. A unified, cloud-hosted data platform powered by Spark.
  • 6. From Challenges to Solutions. Clusters are expensive to build and hard to manage. Tools are disparate and difficult to integrate and use. Long deployment cycles. Hosted clusters in the Cloud. Unified Interactive Workspace. Instant deployment.
  • 7. What is Databricks Cloud. Managed Spark Clusters in the Cloud. Notebook Environment. 3rd Party Applications. Production Pipeline Scheduler.
  • 8. Managed Spark Clusters in the Cloud. No lead time. No DevOps. Instant Spark clusters. The best place to run Spark. No expensive hardware.
  • 9. Explore and visualize data interactively. Program in Python, SQL, and Scala. Collaborate on-line in real-time. Notebook environment.
  • 10. WYSIWYG builder. Interactive plots. One-click publishing. Custom dashboards.
  • 11. Schedule production Spark jobs without complicated scripts. Production pipeline scheduler.
  • 12. Real-time Decisions: Learn, work and collaborate in a single, easy to use environment. Zero Management: Focus on your data, not operations. Instant Results: From insights to products in just hours. Open Platform: Use your favorite tools to leverage a growing ecosystem. Why Databricks Cloud.
  • 13. About Databricks. Founded by the creators of Spark in 2013. Largest organization contributing to Spark. End-to-end hosted platform, Databricks Cloud.
  • 14. Get Started Now! Sign up for Databricks Cloud: https://databricks.com/registration

Editor's Notes

  • #7: We do this by addressing each of the three challenges: 1) To remove the need to set up and maintain clusters, we provide a hosted solution that makes it very easy to instantiate and manage clusters. 2) To avoid dealing with a zoo of tools, we leverage Apache Spark, which integrates the functionality of many existing big data tools and systems. 3) Finally, to dramatically simplify the use of these tools, we are introducing a web Workspace that lets users interactively query and visualize data.
  • #9: Notebooks provide interactive query processing and visualization, and support on-line collaboration, allowing multiple users to explore data jointly. Currently, notebooks allow users to query and analyze data using Python, SQL, and Scala.
  • #10: Notebooks provide interactive query processing and visualization, and support on-line collaboration, allowing multiple users to explore data jointly. Currently, notebooks allow users to query and analyze data using Python, SQL, and Scala.
  • #11: Once you create one or more notebooks, you can take the most interesting results from them and build sophisticated dashboards. You do this through a simple and intuitive dashboard builder, and then publish a dashboard with the click of a button. The dashboards are interactive in that every plot can depend on one or more variables. When these variables are updated, the query behind each plot is automatically re-executed, and the plot is regenerated.
  • #12: Finally, the Workspace includes a job launcher that enables you to run arbitrary Spark jobs programmatically. For example, you can schedule jobs to run automatically, either periodically or whenever their input changes.
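
The two triggers mentioned in note #12 (run periodically, or run when the input changes) can be illustrated with a small sketch. This is not the Databricks job launcher or any part of its API. `JobState`, `should_run`, and `mark_ran` are hypothetical names, and the "input version" is assumed to be any string that changes when the input data changes (for example a checksum or a modification timestamp).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobState:
    """Hypothetical bookkeeping for one scheduled job (not a Databricks API)."""
    name: str
    interval_seconds: Optional[float] = None  # run periodically if set
    watch_input: bool = False                 # run when the input changes if True
    last_run_at: Optional[float] = None       # timestamp of the previous run
    last_input_version: Optional[str] = None  # input version seen at that run


def should_run(job: JobState, now: float, input_version: str) -> bool:
    """Return True if the job is due under either trigger."""
    if job.last_run_at is None:
        return True  # the job has never run
    if job.watch_input and input_version != job.last_input_version:
        return True  # the input changed since the previous run
    if job.interval_seconds is not None:
        # periodic trigger: enough time has passed since the previous run
        return now - job.last_run_at >= job.interval_seconds
    return False


def mark_ran(job: JobState, now: float, input_version: str) -> None:
    """Record a completed run so later checks compare against it."""
    job.last_run_at = now
    job.last_input_version = input_version
```

A scheduler loop would call `should_run` on each tick, launch the job when it returns True, and then call `mark_ran`. Keeping the decision in a pure function of the current time and input version makes both triggers easy to test without a cluster.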