© 2016 Mesosphere, Inc. All Rights Reserved.
@joerg_schad @dcos #smack
Powering Predictive Mapping at Scale with Spark, Kafka, and Elasticsearch
Spark Summit East
February 08, 2017
Jörg Schad
Distributed Systems Engineer
@joerg_schad
HYPERSCALE MEANS VOLUME AND VELOCITY
Batch, Micro-Batch, Event Processing
Days, Hours, Minutes, Seconds, Microseconds
Reports what has happened using descriptive analytics; solves problems using predictive and prescriptive analytics
Predictive User Interface; Real-time Pricing and Routing; Real-time Advertising; Billing, Chargeback; Product Recommendations
SMACK stack
EVENTS: ubiquitous data streams from connected devices (sensors, devices, clients)
INGEST: Apache Kafka, ingests millions of events per second
STORE: Apache Cassandra, a distributed and highly scalable database
ANALYZE: Apache Spark, real-time and batch data processing
ACT: Akka, visualize data and build data-driven applications
Platform: DC/OS
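To make the division of labor concrete, here is a minimal, in-process Python sketch of the four stages. It uses no Kafka, Spark, Cassandra, or Akka APIs: a `deque` stands in for a Kafka topic, a plain aggregation function for a Spark job, a dict for a Cassandra table, and a callback for an Akka actor. The function names and the `Event` fields are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass(frozen=True)
class Event:
    """Hypothetical sensor reading: device id, timestamp (s), measured value."""
    device: str
    ts: float
    value: float


def ingest(topic: Deque[Event], events: List[Event]) -> None:
    """INGEST (Kafka stand-in): append events to an ordered log."""
    topic.extend(events)


def analyze(topic: Deque[Event]) -> Dict[str, float]:
    """ANALYZE (Spark stand-in): drain the log and average values per device."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    while topic:
        e = topic.popleft()
        sums[e.device] = sums.get(e.device, 0.0) + e.value
        counts[e.device] = counts.get(e.device, 0) + 1
    return {d: sums[d] / counts[d] for d in sums}


def store(table: Dict[str, float], results: Dict[str, float]) -> None:
    """STORE (Cassandra stand-in): upsert the latest aggregate per device key."""
    table.update(results)


def act(table: Dict[str, float], threshold: float,
        alert: Callable[[str, float], None]) -> None:
    """ACT (Akka stand-in): react to stored results, here by raising alerts."""
    for device, avg in sorted(table.items()):
        if avg > threshold:
            alert(device, avg)


if __name__ == "__main__":
    topic: Deque[Event] = deque()
    table: Dict[str, float] = {}
    alerts: List[str] = []
    ingest(topic, [Event("a", 0.0, 10.0), Event("a", 1.0, 30.0),
                   Event("b", 0.5, 5.0)])
    store(table, analyze(topic))
    act(table, threshold=15.0, alert=lambda d, v: alerts.append(f"{d}:{v}"))
    print(table)   # {'a': 20.0, 'b': 5.0}
    print(alerts)  # ['a:20.0']
```

The point of the split is that each stage can be scaled and replaced independently; in the real stack the stages communicate over the network rather than through shared Python objects.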
NAIVE APPROACH
Typical datacenter: siloed, over-provisioned servers with low utilization
Industry average: 12-15% utilization
Siloed workloads: MySQL, microservice, Cassandra, Spark/Hadoop, Kafka
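The 12-15% figure is what makes consolidation attractive. As a back-of-the-envelope check (not a capacity-planning method from the talk), N siloed servers at average utilization u carry about N·u servers' worth of work, which fits on roughly ⌈N·u / t⌉ shared machines run at a target utilization t. The 100-server fleet and the 60% target below are assumptions, and the estimate ignores load peaks, CPU/memory imbalance, and placement constraints.

```python
def servers_needed(n_servers: int, avg_util_pct: int, target_util_pct: int) -> int:
    """Lower-bound estimate of machines needed after consolidation.

    n_servers siloed machines each average avg_util_pct percent utilization;
    the shared cluster is run at target_util_pct percent. Integer percentages
    keep the arithmetic exact (no floating-point rounding before the ceiling).
    """
    if not (0 < avg_util_pct <= 100 and 0 < target_util_pct <= 100):
        raise ValueError("percentages must be in 1..100")
    busy = n_servers * avg_util_pct        # total work in "server-percent"
    return -(-busy // target_util_pct)     # ceiling division on integers


if __name__ == "__main__":
    for pct in (12, 15):
        print(f"{pct}% -> {servers_needed(100, pct, 60)} servers")
    # 12% -> 20 servers
    # 15% -> 25 servers
```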
Mesos & DC/OS
MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS
Typical datacenter: siloed, over-provisioned servers with low utilization
Mesos/DC/OS: automated schedulers multiplex workloads onto the same machines
Workloads: MySQL, microservice, Cassandra, Spark/Hadoop, Kafka
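Mesos implements this multiplexing with two-level scheduling: the Mesos master offers each agent's free resources to framework schedulers, and each framework decides which offers to accept for its own tasks. The Python below is a toy model of that offer/accept loop, not the Mesos API; the agent sizes, task sizes, and the first-fit acceptance policy are illustrative assumptions (a real master uses an allocator such as DRF to decide who gets offers).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Agent:
    """A machine with free CPUs and memory (MB)."""
    name: str
    cpus: float
    mem: int


@dataclass
class Framework:
    """A scheduler with pending tasks given as (task_id, cpus, mem)."""
    name: str
    pending: List[Tuple[str, float, int]]
    launched: Dict[str, str] = field(default_factory=dict)  # task -> agent

    def on_offer(self, agent: Agent) -> None:
        """Accept every pending task that still fits the offer (first fit);
        tasks that do not fit stay pending for a later offer."""
        still_pending = []
        for task_id, cpus, mem in self.pending:
            if cpus <= agent.cpus and mem <= agent.mem:
                agent.cpus -= cpus
                agent.mem -= mem
                self.launched[task_id] = agent.name
            else:
                still_pending.append((task_id, cpus, mem))
        self.pending = still_pending


def offer_round(agents: List[Agent], frameworks: List[Framework]) -> None:
    """Toy master: offer each agent's remaining resources to every framework."""
    for agent in agents:
        for fw in frameworks:
            fw.on_offer(agent)


if __name__ == "__main__":
    agents = [Agent(f"node{i}", 4.0, 8192) for i in (1, 2, 3)]
    spark = Framework("spark", [("exec-1", 2.0, 4096), ("exec-2", 2.0, 4096),
                                ("exec-3", 2.0, 4096)])
    kafka = Framework("kafka", [("broker-1", 1.0, 2048), ("broker-2", 1.0, 2048)])
    cassandra = Framework("cassandra", [("node-1", 1.0, 2048)])
    offer_round(agents, [spark, kafka, cassandra])
    for fw in (spark, kafka, cassandra):
        print(fw.name, fw.launched)
```

In this run Spark and Kafka end up sharing node2 instead of each owning a dedicated silo, which is the utilization gain the slide describes.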
DC/OS ENABLES MODERN DISTRIBUTED APPS
Modern App Components (ecosystem of frameworks & apps):
  Big Data + Analytics Engines; Microservices (in containers)
  Streaming, Batch, Machine Learning, Analytics, Functions & Logic, Search, Time Series, SQL / NoSQL Databases
Datacenter Operating System (DC/OS), a consistent architecture running on top of the kernel:
  User Interface (GUI & CLI)
  Core system services (e.g., distributed init, cron, service discovery, package management & installer, storage)
Distributed Systems Kernel (Mesos), which abstracts resources
Any Infrastructure (Physical, Virtual, Cloud)
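On DC/OS, long-running services such as the microservice in the earlier diagrams are typically launched through Marathon, which accepts a JSON app definition. The sketch below builds a minimal one in Python; `id`, `cmd`, `cpus`, `mem`, and `instances` are standard Marathon app fields, while the app id, the start script, and the resource sizes are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def marathon_app(app_id: str, cmd: str, cpus: float, mem: int,
                 instances: int = 1) -> Dict[str, Any]:
    """Return a minimal Marathon app definition as a dict.

    Only id, cmd, cpus, mem (MiB) and instances are set; real services
    usually also need health checks, ports, constraints, and so on.
    """
    if cpus <= 0 or mem <= 0 or instances < 0:
        raise ValueError("cpus and mem must be positive; instances >= 0")
    return {"id": app_id, "cmd": cmd, "cpus": cpus, "mem": mem,
            "instances": instances}


if __name__ == "__main__":
    # Hypothetical tracking service; the script name is a placeholder.
    app = marathon_app("/tracking-api", "./run-tracking-service.sh", 0.5, 256, 2)
    print(json.dumps(app, indent=2))
```

Saved as `app.json`, the output could be deployed with `dcos marathon app add app.json`; Marathon then keeps two instances running and relaunches them if tasks are lost.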
EXAMPLE: REAL-TIME TRACKING
GEO-ENABLED IoT
DATA FLOW
DEMO
THANK YOU! ANY QUESTIONS?
@dcos
users@dcos.io
/groups/8295652
/dcos
/dcos/examples
/dcos/demos
chat.dcos.io
Keep it running!
SERVICE OPERATIONS
● Configuration updates (e.g., scaling, reconfiguration)
● Binary upgrades
● Cluster maintenance (e.g., backup, restore, restart)
● Monitoring the progress of operations
● Debugging runtime blockages
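Binary upgrades and restarts of a stateful service are commonly rolled out one node at a time, so the service stays available and progress can be monitored step by step. A minimal Python sketch of such a rolling plan follows; the `upgrade_node` and `is_healthy` callbacks are hypothetical, and this is a toy model rather than the plan API of any DC/OS service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    node: str
    status: str = "PENDING"   # PENDING -> COMPLETE or FAILED


def rolling_upgrade(nodes: List[str],
                    upgrade_node: Callable[[str], None],
                    is_healthy: Callable[[str], bool]) -> List[Step]:
    """Upgrade nodes one at a time and stop at the first node that is
    unhealthy afterwards, so the remaining nodes keep serving on the old
    version. Returns the plan so callers can inspect per-step progress."""
    plan = [Step(n) for n in nodes]
    for step in plan:
        upgrade_node(step.node)
        if is_healthy(step.node):
            step.status = "COMPLETE"
        else:
            step.status = "FAILED"
            break                      # halt; later steps stay PENDING
    return plan


if __name__ == "__main__":
    upgraded: List[str] = []
    plan = rolling_upgrade(["kafka-0", "kafka-1", "kafka-2"],
                           upgraded.append,
                           lambda n: n != "kafka-1")  # simulate one bad node
    print([(s.node, s.status) for s in plan])
    # [('kafka-0', 'COMPLETE'), ('kafka-1', 'FAILED'), ('kafka-2', 'PENDING')]
```

Halting on the first failure keeps the blast radius to one node and leaves a plan that shows exactly where the operation is blocked.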
APACHE SPARK (STREAMING)
Typical use: distributed, large-scale data processing; micro-batching
Why Spark Streaming?
● Micro-batching delivers low, near-real-time latency (on the order of the batch interval), although event-at-a-time engines can be faster still
● Its well-defined role means it fits in well with the other pieces of the pipeline
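Micro-batching cuts the incoming stream into small time-based batches (the batch interval) and processes each batch as a small batch job, which is why latency is on the order of the batch interval rather than per event. A toy Python sketch of that grouping, assuming each event carries a timestamp in seconds; it only mimics the idea behind Spark Streaming and does not use Spark.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_batches(events: Iterable[Tuple[float, str]],
                  interval: float) -> Dict[int, List[str]]:
    """Group (timestamp_seconds, payload) events into batches of `interval`
    seconds, keyed by batch index = floor(timestamp / interval)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    batches: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        batches[int(ts // interval)].append(payload)
    return dict(batches)


def process(batches: Dict[int, List[str]]) -> List[Tuple[int, int]]:
    """'Process' each batch in time order; here, just count its events."""
    return [(idx, len(batches[idx])) for idx in sorted(batches)]


if __name__ == "__main__":
    events = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (2.9, "d"), (2.95, "e")]
    print(process(micro_batches(events, 1.0)))  # [(0, 2), (1, 1), (2, 2)]
```

An event arriving just after a batch boundary waits almost a full interval before it is processed, so shrinking the interval lowers latency at the cost of more per-batch scheduling overhead.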
