Mantis in Action
Neeraj Joshi and Justin Becker
6/12/2015
Managing a complex
operational environment
is hard
Mantis qcon nyc_2015
Developing an understanding
of what is going on
• Knowing what works
• Identifying what doesn’t work
• Determining impact when something doesn’t work
Developing a deep understanding
is hard, due to complexity
• Hundreds of software services
• Processing billions of requests
• For millions of users
• Operating in multiple data centers
• Across the globe
To help us (operators) make sense
out of complexity, we need tools.
Specifically, we need…
Insight tools
…to help comprehend what is going on in our
operational environments
We make the case that a relationship exists
between complexity and comprehension
So, in order to manage complex
environments, we need to rethink insights and shift
the curve
We identified three insight
‘patterns’ to help shift the curve
• Long tail analysis
• Real time tracking and trending
• Ad hoc investigation
We grouped the three patterns
into a new effort
Scalable Insights
Initiative
The goal is to help us manage
(comprehend) our environments
given an increase in complexity
Specific insights
tools that help shift the curve
• Realtime Data Explorer
• Realtime Search
• Realtime Application Monitoring
Other generic insights
jobs that help shift the curve
• Short term historical anomaly detection
• Threshold-based anomaly detection
• Realtime metrics generation
Overview of scalable
insights in action
• 65-75 total jobs running in 3 regions
• Global access to data
• Processing 4.7 million events per second at peak
• 20 specific data sources and generic adapters
• Ability to start up jobs in ~5 seconds
Demo
Mantis, a reactive stream
processing system
Some Basic Concepts
Stream
Sequence of Events
time
1 2 3 4
Higher Order Functions
Transformations applied to a Stream
to create a new stream
Input stream (over time): 1 4 9 16
map (x => sqrt(x))
Output stream (over time): 1 2 3 4
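The transformation on this slide can be sketched in a few lines. This is a minimal illustration using a plain Python generator, not Mantis code; the helper name `map_stream` is an assumption made for this example.

```python
import math
from typing import Callable, Iterable, Iterator


def map_stream(fn: Callable[[float], float], stream: Iterable[float]) -> Iterator[float]:
    """Apply fn to each event as it arrives, yielding a new stream."""
    for event in stream:
        yield fn(event)


source = [1, 4, 9, 16]  # events in arrival order, as on the slide
result = list(map_stream(math.sqrt, source))
print(result)  # [1.0, 2.0, 3.0, 4.0]
```

The input stream is never modified; the higher-order function produces a new stream, which is what lets transformations be composed.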
SOURCE (Stream IN) → Stage 1 → Stage 2 → … → Stage n → SINK (Result OUT)
A sequence of functions applied to a stream
map window reduce merge
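As a rough sketch of the idea, a job can be modeled as a source, an ordered list of stage functions, and a sink. The names `run_job`, `map_stage`, `window_stage`, and `reduce_stage` are hypothetical and are not the Mantis API; merge is left out to keep the example short.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, List


def map_stage(fn: Callable) -> Callable[[Iterable], Iterator]:
    """Stage that transforms each event."""
    def stage(stream: Iterable) -> Iterator:
        for event in stream:
            yield fn(event)
    return stage


def window_stage(size: int) -> Callable[[Iterable], Iterator]:
    """Stage that groups events into count-based windows of `size`."""
    def stage(stream: Iterable) -> Iterator:
        batch: List = []
        for event in stream:
            batch.append(event)
            if len(batch) == size:
                yield batch
                batch = []
        if batch:  # emit the final, partially filled window
            yield batch
    return stage


def reduce_stage(fn: Callable) -> Callable[[Iterable], Iterator]:
    """Stage that collapses each window to a single value."""
    def stage(stream: Iterable) -> Iterator:
        for batch in stream:
            yield reduce(fn, batch)
    return stage


def run_job(source: Iterable, stages: List[Callable], sink: Callable) -> None:
    """Chain the stages over the source and push every result into the sink."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    for item in stream:
        sink(item)


results: List[int] = []
run_job(
    source=range(1, 7),
    stages=[map_stage(lambda x: x * x), window_stage(3), reduce_stage(lambda a, b: a + b)],
    sink=results.append,
)
print(results)  # [14, 77]
```

Each stage only sees the stream produced by the stage before it, so stages can be added, removed, or reordered without touching the source or the sink.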
Mantis Job
Named Jobs
• Job Name
• Job Version
Are Parameterized
Parameter
Parameter
Parameter
Have SLAs
• Min/Max instances of the job
• Min/Max runtime
• Perpetual or Transient
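One way to picture a named job is as a small record holding a name, a version, parameters, and an SLA. The sketch below is an illustration only: the class and field names (`NamedJob`, `JobSla`, `min_instances`, and so on) are assumptions and do not come from Mantis, and treating a missing maximum runtime as "perpetual" is one possible reading of the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class JobSla:
    min_instances: int
    max_instances: int
    min_runtime_secs: int = 0
    max_runtime_secs: Optional[int] = None  # assumption: None models a perpetual job

    def __post_init__(self) -> None:
        if not 0 <= self.min_instances <= self.max_instances:
            raise ValueError("need 0 <= min_instances <= max_instances")

    @property
    def perpetual(self) -> bool:
        return self.max_runtime_secs is None


@dataclass(frozen=True)
class NamedJob:
    name: str
    version: str
    sla: JobSla
    parameters: Dict[str, str] = field(default_factory=dict)

    def needs_launch(self, running_instances: int) -> bool:
        """True when fewer instances are running than the SLA minimum."""
        return running_instances < self.sla.min_instances

    def may_launch(self, running_instances: int) -> bool:
        """True when another instance would stay within the SLA maximum."""
        return running_instances < self.sla.max_instances


# Hypothetical job: at least one and at most three instances, transient (10 minutes).
top_n = NamedJob(
    name="TopN",
    version="1.0",
    sla=JobSla(min_instances=1, max_instances=3, max_runtime_secs=600),
    parameters={"n": "10"},
)
```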
Can be chained
Diagram nodes: Device Logs Source Job, TopN Job, Server Logs Source Job, Anomaly Detector Job, Metrics Aggregator Job, Alert Service
That is fine but...
How does Mantis meet
the Scalable Insights
challenge?
Key Requirements
• Cost (Utilization) Sensitive
• Optimize for low latency
• High Throughput
• Resilient
Minimizing Costs
• Elastic Clusters
• Elastic Jobs
Job Autoscaling
• Scaling Config
• CPU Strategy
• Network Strategy
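A scaling strategy can be thought of as a pair of thresholds on one resource plus bounds on the worker count. The sketch below assumes a simple step-by-one policy; `ScalingConfig`, its fields, and `desired_workers` are hypothetical names and are not Mantis configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingConfig:
    min_workers: int
    max_workers: int
    scale_up_above: float    # utilization (0 to 1) above which a worker is added
    scale_down_below: float  # utilization (0 to 1) below which a worker is removed


def desired_workers(current: int, utilization: float, cfg: ScalingConfig) -> int:
    """Step the worker count by one in the indicated direction, within bounds."""
    if utilization > cfg.scale_up_above:
        target = current + 1
    elif utilization < cfg.scale_down_below:
        target = current - 1
    else:
        target = current
    return max(cfg.min_workers, min(cfg.max_workers, target))


# Assumed CPU strategy: add a worker above 75% CPU, remove one below 30%.
cpu_strategy = ScalingConfig(min_workers=1, max_workers=10,
                             scale_up_above=0.75, scale_down_below=0.30)
print(desired_workers(4, 0.90, cpu_strategy))  # 5
print(desired_workers(4, 0.10, cpu_strategy))  # 3
```

A network strategy would use the same shape with network utilization as the input instead of CPU.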
Filtering at Source
Diagram: Data Producers → Source Job → Consumer Job
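Filtering at the source means the consumer's predicate is evaluated where the data originates, so events nobody asked for are never sent downstream. A minimal sketch, assuming events are plain dictionaries; the function `source_job` and the event fields are made up for this example.

```python
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, str]


def source_job(producer_events: Iterable[Event],
               predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Forward only the events the consumer asked for; the rest never leave the source."""
    return (event for event in producer_events if predicate(event))


events = [
    {"device": "tv", "status": "error"},
    {"device": "phone", "status": "ok"},
    {"device": "tv", "status": "ok"},
]

# The consumer job supplies the filter; the source job applies it.
tv_errors = list(source_job(events, lambda e: e["device"] == "tv" and e["status"] == "error"))
print(tv_errors)  # [{'device': 'tv', 'status': 'error'}]
```

Because unwanted events are discarded before they cross the network, the cost of a job tracks what its consumers actually request rather than everything the producers emit.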
Low latency - High Throughput
To block or not to block?
RxNetty (non-blocking) vs Tomcat (blocking)
by Brendan Gregg (@brendangregg)
https://docs.google.com/presentation/d/18i-d72m7tD4wKlzm-1PCR8g62l66_9Btbg5-fuFRqf0/edit#slide=id.g761289dab_0_77
CPU consumption
• RxNetty consumes less CPU / request
• Reduced thread migration
• Lower object allocation rate
CPU consumed reduces as load increases
Lower latency
• RxNetty has lower latency under high load
• Fewer lock contentions
• Fewer thread migrations
Latency knee for Tomcat ~400
Latency knee for Netty ~700
Async Processing
Non-blocking I/O
Designed for Resilience
http://signsofpolitics.blogspot.com/2009/03/around-and-about-resilience.html
Server Resilience
• Server crashes are inevitable
• Server health is constantly monitored with heartbeats
• Crashed servers are replaced
• Lost jobs are relaunched
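Heartbeat monitoring can be sketched as a table of last-seen timestamps plus a timeout. The class below is an illustration under that assumption and not how Mantis implements it; `HeartbeatMonitor`, its methods, and the 10-second timeout are hypothetical.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per server and reports the ones that went silent."""

    def __init__(self, timeout_secs: float) -> None:
        self.timeout_secs = timeout_secs
        self._last_seen: Dict[str, float] = {}

    def record(self, server_id: str, now: float) -> None:
        """Store the time of the latest heartbeat from a server."""
        self._last_seen[server_id] = now

    def dead_servers(self, now: float) -> List[str]:
        """Servers whose last heartbeat is older than the timeout."""
        return sorted(
            server_id
            for server_id, seen in self._last_seen.items()
            if now - seen > self.timeout_secs
        )


monitor = HeartbeatMonitor(timeout_secs=10.0)
monitor.record("server-a", now=0.0)
monitor.record("server-b", now=0.0)
monitor.record("server-a", now=8.0)   # server-a keeps reporting, server-b goes silent
print(monitor.dead_servers(now=12.0))  # ['server-b']
```

A scheduler would replace each server in that list and relaunch the jobs that were running on it.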
Network Resilience
• Long lived connections can fail
• Connection topology is constantly monitored and corrected
Backpressure
Cold Source
Amazon SQS
Hot Source
Reactive push-pull
(Cold Source)
Diagram: Cold Source → f1 → f2 → f3 → f4
Labels: 1024 (×4), 100 (×4)
Push mode
Diagram: Cold Source → f1 → f2 → f3 → f4
Labels: Max (×4), 100 (×4)
Pull mode
Diagram: Cold Source → f1 → f2 → f3 → f4
Labels: 1 (×4), 1 (×4)
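The three diagrams can be read as one mechanism with different request sizes, assuming the first row of labels is the number of items each stage requests from upstream and the second row is the number of items delivered. The sketch below models that reading with a hypothetical `ColdSource` class; it is not RxJava or Mantis code.

```python
from itertools import islice
from typing import Iterable, List


class ColdSource:
    """Hypothetical cold source: it emits only what the consumer requests."""

    def __init__(self, items: Iterable[int]) -> None:
        self._items = iter(items)

    def request(self, n: int) -> List[int]:
        """Return up to n items; an empty list means the source is exhausted."""
        return list(islice(self._items, n))


def batch_sizes(source: ColdSource, n: int) -> List[int]:
    """Drain the source by requesting n items at a time; return each batch size."""
    sizes: List[int] = []
    while True:
        batch = source.request(n)
        if not batch:
            return sizes
        sizes.append(len(batch))


# Request exceeds supply: all 100 items arrive at once, which behaves like push.
print(batch_sizes(ColdSource(range(100)), 1024))  # [100]
# Request below supply: the consumer paces the source.
print(batch_sizes(ColdSource(range(100)), 40))    # [40, 40, 20]
# Request of 1: pure pull, one item per round trip.
print(len(batch_sizes(ColdSource(range(100)), 1)))  # 100
```

The request size is the only knob: a very large request degenerates to push, a request of one degenerates to pull, and anything in between trades round trips against buffering.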
Backpressure Strategies
(Hot Source)
Diagram: Hot Source → Strategy Function → f2 → f3 → f4
Labels: 100, 90, 10 (×6)
Strategies:
• Drop
• Buffer
• Scale-up
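A hot source keeps emitting regardless of downstream demand, so something has to decide what happens to the excess. The sketch below illustrates the drop and buffer strategies, assuming the diagram shows a source emitting 100 events while downstream asks for 10. The names `drop_strategy` and `BufferStrategy` are hypothetical, and scale-up is not shown because it changes the number of consumers rather than the handling of a single burst.

```python
from collections import deque
from typing import Deque, List, Tuple


def drop_strategy(burst: List[int], demand: int) -> Tuple[List[int], int]:
    """Deliver up to `demand` events and discard the rest.

    Returns (delivered_events, dropped_count).
    """
    return burst[:demand], max(0, len(burst) - demand)


class BufferStrategy:
    """Queue excess events for later delivery, up to a fixed capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.overflowed = 0  # events rejected because the buffer was full
        self._queue: Deque[int] = deque()

    def offer(self, burst: List[int]) -> None:
        """Accept events from the source until the buffer is full."""
        for event in burst:
            if len(self._queue) < self.capacity:
                self._queue.append(event)
            else:
                self.overflowed += 1

    def poll(self, demand: int) -> List[int]:
        """Hand downstream at most `demand` buffered events, oldest first."""
        count = min(demand, len(self._queue))
        return [self._queue.popleft() for _ in range(count)]


burst = list(range(100))  # the hot source emits 100 events

delivered, dropped = drop_strategy(burst, demand=10)
print(len(delivered), dropped)  # 10 90

buffer = BufferStrategy(capacity=50)
buffer.offer(burst)
print(buffer.poll(10) == delivered, buffer.overflowed)  # True 50
```

Dropping keeps latency and memory flat at the cost of data loss; buffering preserves data until the buffer fills, after which it also has to shed load.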
Questions
