Democratizing Access to Spark
Ali Ghodsi
Main Challenge
Big Data is Hard
Databricks Goal
Democratize Big Data
Databricks Cloud Platform
• Hosted model: we ensure everything works end-to-end
• Rapid releases: iterate quickly based on customer feedback
• Dynamic use: customers scale dynamically based on their needs
Databricks Cloud Platform
[Architecture diagram]
• Your storage: FS, cloud, Hadoop, data warehouse, DBMS
• Databricks Platform, open source: SQL, Machine Learning, R, Graph, Streaming
• Databricks Platform, management: Managed Clusters, Production Jobs, Multi-tenancy, 24x7 SLAs, BI connectivity, Security Controls
• Integrated Workspace: Notebooks, Collaboration, REST APIs, Dashboards
How is this used so far?
Just-in-Time Data Warehouse Use Case
• Separate compute from storage
• 3 top-10 mass media companies shortened the time from idea to app
Advanced Analytics Use Case
• Machine learning and graph processing
• Top 2 gaming companies; Radius modeling of 20M companies
Real-time Use Case
• Data product using Spark Streaming
• A top-5 credit card company is doing loan approvals in real time
Main Lesson
Many companies struggle with big data projects
• Steep learning curve for many developers
Getting trained on big data is costly and time-consuming
• Acquiring machines
• Setting up and configuring infrastructure
• Building systems without much documentation
How do we empower more developers?
Trained 2,000 on Spark in 2014
Launched two Massive Open Online Courses (MOOCs) in 2015
• ~125,000 took our courses
• ~20,000 finished the course
• ~500,000 hours spent learning Spark
How do we multiply this to democratize access to Spark?
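The training figures on this slide can be turned into a few rough ratios that show the scale jump from in-person training to MOOCs. The Python sketch below is illustrative only: the constant and function names are hypothetical, it treats the approximate ("~") figures as exact, and it assumes the ~500,000 hours are spread across all ~125,000 enrollees, which the slide does not state.

```python
# Approximate figures quoted on the slide (marked "~" in the original deck).
IN_PERSON_TRAINED_2014 = 2_000
MOOC_ENROLLED = 125_000
MOOC_FINISHED = 20_000
MOOC_HOURS = 500_000


def training_reach_summary(in_person, enrolled, finished, hours):
    """Hypothetical helper: derive simple ratios from the slide's training figures."""
    return {
        # How many times more people the 2015 MOOCs reached than 2014 in-person training.
        "reach_multiplier": enrolled / in_person,
        # Share of enrollees who finished.
        "completion_rate": finished / enrolled,
        # Average hours per enrollee (assumes hours are spread over all enrollees).
        "hours_per_enrollee": hours / enrolled,
    }


summary = training_reach_summary(
    IN_PERSON_TRAINED_2014, MOOC_ENROLLED, MOOC_FINISHED, MOOC_HOURS
)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the sketch reports a reach multiplier of 62.5, a completion rate of 0.16 (16%), and 4.00 hours per enrollee.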
Announcing
Databricks Community Edition (beta)
Free edition of the Databricks Platform
• Mini Spark clusters
• Notebooks, dashboards
• REST APIs
Continuous delivery of content
• Course and MOOC material
• Spark how-tos and documentation
Every attendee gets access today!
Democratizing Big Data for Organizations
Will provide a seamless transition to production
• Large clusters
• Production pipelines
• Security and governance
Databricks Community Edition Demo
Michael Armbrust

2016 Spark Summit East Keynote: Ali Ghodsi and Databricks Community Edition demo
