Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: Spark Summit East talk by Jorg Schad

The talk covers powering predictive mapping at scale with Spark, Kafka, and Elasticsearch, built on the SMACK stack (Spark, Mesos, Akka, Cassandra, Kafka) running on DC/OS. It describes how the stack ingests millions of events per second from connected devices with Apache Kafka, stores the data in Apache Cassandra, and processes it in real time and in batch with Apache Spark. It also presents an example of real-time tracking of geo-enabled IoT devices, including the data flow and a demo of the system.

© 2016 Mesosphere, Inc. All Rights Reserved. 1
@joerg_schad @dcos #smack
Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search
Spark Summit East
February 08, 2017

Jörg Schad
Distributed Systems Engineer
@joerg_schad

HYPERSCALE MEANS VOLUME AND VELOCITY
Processing modes: Batch, Micro-Batch, Event Processing
Time scale: Days, Hours, Minutes, Seconds, Microseconds
● Reports what has happened using descriptive analytics
● Solves problems using predictive and prescriptive analytics
Example uses: Billing, Chargeback; Product Recommendations; Real-time Pricing and Routing; Real-time Advertising; Predictive User Interface

SMACK stack
EVENTS: Ubiquitous data streams from connected devices (Sensors, Devices, Clients)
INGEST: Apache Kafka. Ingest millions of events per second
STORE: Apache Cassandra. Distributed & highly scalable database
ANALYZE: Apache Spark. Real-time and batch data processing
ACT: Akka. Visualize data and build data-driven applications
DC/OS
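
The ingest, store, analyze, and act stages on this slide can be sketched as a toy, in-memory pipeline. This is only an illustration of the data flow, not the talk's demo: the deque, dictionary, and functions below are hypothetical stand-ins for Kafka, Cassandra, Spark, and Akka rather than those systems' APIs, and the event fields (`device_id`, `speed`) and the speed limit are assumed for the example.

```python
from collections import defaultdict, deque
from statistics import mean


def ingest(topic, event):
    """INGEST: append an incoming device event to the topic (Kafka stand-in)."""
    topic.append(event)


def store(topic, table):
    """STORE: drain the topic and persist events keyed by device (Cassandra stand-in)."""
    while topic:
        event = topic.popleft()
        table[event["device_id"]].append(event)


def analyze(table):
    """ANALYZE: average speed per device over the stored events (Spark stand-in)."""
    return {dev: mean(e["speed"] for e in events) for dev, events in table.items()}


def act(averages, limit):
    """ACT: return the devices whose average speed exceeds the limit (Akka stand-in)."""
    return sorted(dev for dev, avg in averages.items() if avg > limit)


def run_pipeline(events, limit):
    """Push events through all four stages using fresh in-memory state."""
    topic = deque()
    table = defaultdict(list)
    for event in events:
        ingest(topic, event)
    store(topic, table)
    return act(analyze(table), limit)


if __name__ == "__main__":
    # Hypothetical sample events from two devices.
    sample = [
        {"device_id": "bus-1", "speed": 40.0},
        {"device_id": "bus-2", "speed": 70.0},
        {"device_id": "bus-1", "speed": 50.0},
    ]
    print(run_pipeline(sample, limit=60.0))
```

In a real deployment each stage would be a separate distributed service, so the stages scale independently instead of sharing one process as they do here.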

NAIVE APPROACH
Typical Datacenter: siloed, over-provisioned servers, low utilization
Industry Average: 12-15% utilization
Workloads: MySQL, microservice, Cassandra, Spark/Hadoop, Kafka

Mesos & DC/OS

MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS
Typical Datacenter: siloed, over-provisioned servers, low utilization
Mesos / DC/OS: automated schedulers, workload multiplexing onto the same machines
Workloads: MySQL, microservice, Cassandra, Spark/Hadoop, Kafka
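
The effect of multiplexing workloads onto the same machines can be illustrated with a little arithmetic. The sketch below is a minimal example under stated assumptions: the per-workload core demands and the 16-core machine size are hypothetical numbers, and first-fit-decreasing packing is a simple stand-in chosen for brevity, not how Mesos schedules (Mesos hands resource offers to framework schedulers).

```python
import math


def siloed_machines(demands, capacity):
    """Siloed: every workload gets its own machines, rounded up to whole machines."""
    return sum(math.ceil(d / capacity) for d in demands.values())


def multiplexed_machines(demands, capacity):
    """Shared pool: pack workloads onto machines with first-fit decreasing."""
    free = []  # remaining cores on each machine already in use
    for demand in sorted(demands.values(), reverse=True):
        if demand > capacity:
            raise ValueError("this sketch assumes each workload fits on one machine")
        for i, room in enumerate(free):
            if demand <= room:
                free[i] = room - demand
                break
        else:
            # No existing machine has room, so start a new one.
            free.append(capacity - demand)
    return len(free)


def utilization(demands, machines, capacity):
    """Fraction of provisioned cores that the workloads actually use."""
    return sum(demands.values()) / (machines * capacity)


if __name__ == "__main__":
    # Hypothetical core demands for the workloads named on the slide.
    demands = {"mysql": 2, "microservice": 1, "cassandra": 3, "spark": 4, "kafka": 2}
    capacity = 16  # assumed cores per machine

    for label, count in [
        ("siloed", siloed_machines(demands, capacity)),
        ("multiplexed", multiplexed_machines(demands, capacity)),
    ]:
        print(label, count, f"{utilization(demands, count, capacity):.0%}")
```

With these assumed numbers the siloed layout needs five machines and the shared pool needs one; real gains depend on workload mix, peak demand, and isolation requirements.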

DC/OS ENABLES MODERN DISTRIBUTED APPS
Modern App Components: Microservices (in containers); Big Data + Analytics Engines
Streaming, Batch, Machine Learning, Analytics, Functions & Logic, Search, Time Series, SQL / NoSQL Databases
Datacenter Operating System (DC/OS)
Distributed Systems Kernel (Mesos)
Any Infrastructure (Physical, Virtual, Cloud)
● Distributed systems kernel to abstract resources
● Ecosystem of frameworks & apps
● Consistent architecture to run on top of kernel
● User Interface (GUI & CLI)
● Core system services (e.g., distributed init, cron, service discovery, package mgt & installer, storage)

EXAMPLE: REAL-TIME TRACKING

GEO-ENABLED IoT

DATA FLOW

DEMO

THANK YOU! ANY QUESTIONS?
@dcos
users@dcos.io
/groups/8295652
/dcos
/dcos/examples
/dcos/demos
chat.dcos.io

Keep it running!

SERVICE OPERATIONS
● Configuration updates (e.g., scaling, re-configuration)
● Binary upgrades
● Cluster maintenance (e.g., backup, restore, restart)
● Monitor progress of operations
● Debug any runtime blockages

APACHE SPARK (STREAMING)
Typical Use: distributed, large-scale data processing; micro-batching
Why Spark Streaming?
● Micro-batching keeps latency low compared with traditional batch processing
● Well-defined role means it fits in well with other pieces of the pipeline
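
The micro-batching idea on this slide can be sketched in a few lines. This is a plain-Python illustration of the technique, not the Spark Streaming API: the function names, the `(timestamp, value)` event shape, and the one-second interval are assumptions made for the example.

```python
from collections import defaultdict


def micro_batches(events, interval):
    """Group (timestamp, value) events into consecutive windows of `interval` seconds.

    Windows that received no events are skipped in this sketch.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // interval)].append(value)
    return [batches[key] for key in sorted(batches)]


def process(batch):
    """Run one small batch job: here, the count and sum of the batch."""
    return len(batch), sum(batch)


if __name__ == "__main__":
    # Hypothetical events: (seconds since start, reading).
    events = [(0.2, 1), (0.9, 2), (1.1, 3), (2.5, 4), (2.7, 5)]
    for batch in micro_batches(events, interval=1.0):
        print(batch, process(batch))
```

In this model an event waits at most one interval before its batch is processed, so the interval is the main knob trading latency against per-batch overhead.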
