Cortex: Prometheus as a Service, One Year On
Tom Wilkie, PromCon 2017
tom.wilkie@gmail.com
https://github.com/weaveworks/cortex
https://www.youtube.com/watch?v=3Tb4Wc0kfCM
Cortex: Prometheus as a Service
• Natively multi-tenant; isolate different customers in the same services.
• Different story around scaling & HA
• “Virtually infinite” retention and durability
• Opportunities for performance enhancements
[Diagram: Your Jobs, Prometheus (HA), Cortex, Grafana, Alertmanager]
Cortex Architecture
[Diagram: Your Jobs, Prometheus, Frontend, Distributor, Ingester, Consul, DynamoDB, Memcache, S3; arrows keyed as write requests, read requests, and control requests]
A Year’s Evolution
Problem #1: DynamoDB Write Throughput
https://github.com/weaveworks/cortex/issues/254
Cortex Architecture
[Diagram: Your Jobs, Prometheus, Frontend, Distributor, Ingester, Consul, Table Manager, DynamoDB, Memcache, S3; arrows keyed as write requests, read requests, and control requests]
Problem #2: DynamoDB Write Throughput, again
Original schema:
• Hash Key: <user ID>:<hour>:<metric name>
• Range Key: <label name>:<label value>:<chunk ID>
New schema:
• Hash Key: <user ID>:<day>:<metric name>:<label name>
• Range Key: <chunk ID>:<chunk end time>
https://github.com/weaveworks/cortex/pull/262
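The two key layouts above can be compared with a small sketch. This is not Cortex's code: the function names, the encoding of <hour> and <day> as bucket numbers since the Unix epoch, and the sample values are all assumptions made for illustration. It only shows what follows from the key shapes on the slide: the original schema puts every label entry of a chunk under one hash key, while the new schema uses one hash key per label name.

```python
from typing import Dict, List, Tuple

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400

Entry = Tuple[str, str]  # (hash key, range key)


def original_entries(user_id: str, metric: str, labels: Dict[str, str],
                     chunk_id: str, ts: int) -> List[Entry]:
    """Original schema: one hash key per user, hour and metric name."""
    hour = ts // SECONDS_PER_HOUR  # assumed bucket encoding
    hash_key = f"{user_id}:{hour}:{metric}"
    return [(hash_key, f"{name}:{value}:{chunk_id}")
            for name, value in sorted(labels.items())]


def new_entries(user_id: str, metric: str, labels: Dict[str, str],
                chunk_id: str, chunk_end: int, ts: int) -> List[Entry]:
    """New schema: the label name moves into the hash key."""
    day = ts // SECONDS_PER_DAY  # assumed bucket encoding
    range_key = f"{chunk_id}:{chunk_end}"
    return [(f"{user_id}:{day}:{metric}:{name}", range_key)
            for name in sorted(labels)]


# Hypothetical sample values, chosen only for the demonstration.
labels = {"job": "api", "instance": "a"}
old = original_entries("u1", "http_requests_total", labels, "c42", 1_500_000_000)
new = new_entries("u1", "http_requests_total", labels, "c42",
                  1_500_003_600, 1_500_000_000)

print(len({h for h, _ in old}), "distinct hash key(s) in the original schema")
print(len({h for h, _ in new}), "distinct hash key(s) in the new schema")
```

With two labels, the original layout writes both entries under a single hash key and the new layout writes them under two.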
Problem #3: Queries of Death
Cortex Architecture
[Diagram: Your Jobs, Prometheus, Frontend, Distributor, Querier, Ingester, Consul, Table Manager, DynamoDB, Memcache, S3; arrows keyed as write requests, read requests, and control requests]
Problem #3: Recording rules and alerts
Cortex Architecture
[Diagram: Your Jobs, Prometheus, Frontend, Distributor, Querier, Ruler, Ingester, Consul, Table Manager, DynamoDB, Memcache, S3; arrows keyed as write requests, read requests, and control requests]
Problem #4: Long tail
https://www.weave.works/blog/the-long-tail-tools-to-investigate-high-long-tail-latency/
Problem #5: Cost
                             S3         DynamoDB
IOP Cost ($/IOP)             5x10^-6    2x10^-7
Storage Cost ($/GB/Month)    0.023      0.250
https://github.com/weaveworks/cortex/issues/141
[Chart: Cost ($) against object size (GB), from 0 to 0.06 GB, for S3 and DynamoDB]
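The trade-off in the table can be worked through with a rough cost model. This is a sketch under stated assumptions, not the calculation behind the chart: it takes the slide's per-IOP and per-GB prices at face value, charges one IOP per object regardless of size, and stores the object for one month. The function names are hypothetical.

```python
# Prices copied from the slide; everything else here is an assumption.
S3 = {"iop": 5e-6, "storage": 0.023}        # $/IOP, $/GB/month
DYNAMODB = {"iop": 2e-7, "storage": 0.250}  # $/IOP, $/GB/month


def object_cost(prices: dict, size_gb: float,
                iops: int = 1, months: float = 1.0) -> float:
    """Cost in dollars to write an object and keep it for `months`."""
    return prices["iop"] * iops + prices["storage"] * size_gb * months


def break_even_size_gb(a: dict, b: dict,
                       iops: int = 1, months: float = 1.0) -> float:
    """Object size at which stores `a` and `b` cost the same."""
    return (a["iop"] - b["iop"]) * iops / ((b["storage"] - a["storage"]) * months)


size = break_even_size_gb(S3, DYNAMODB)
print(f"break-even object size: {size * 1e6:.1f} KB")  # 1 GB taken as 1e6 KB
```

Under these assumptions the break-even lands at roughly 21 KB: smaller objects are cheaper in DynamoDB because the per-IOP price dominates, and larger ones are cheaper in S3 because the storage price dominates.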
Cortex Architecture
[Diagram: Your Jobs, Prometheus, Frontend, Distributor, Querier, Ruler, Ingester, Consul, Table Manager, DynamoDB, Memcache; arrows keyed as write requests, read requests, and control requests]
Problem #6: DynamoDB, again
Cortex Architecture
[Diagram: Your Jobs, Prometheus, Frontend, Distributor, Querier, Ruler, Ingester, Consul, Table Manager, BigTable, Memcache; arrows keyed as write requests, read requests, and control requests]
                                      DynamoDB    BigTable
99th Percentile Write Latency (ms)    70-100      50-150
99th Percentile Read Latency (ms)     100-2500    ~250
LOC                                   ~2000       ~400
DynamoDB numbers courtesy of Weaveworks
Closing thoughts
1. DynamoDB Write Throughput
2. DynamoDB Write Throughput, again
3. Recording rules and alerts
4. Long tail
5. Cost
6. DynamoDB, again
Running for >12 months
• Availability: querier unavailable for <12 hrs (~99.9%)
• Durability: lost <2 days of data (>99.5%)
• 99th percentile write performance ~60ms
• 99th percentile query performance <200ms
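The availability and durability percentages are ratios of time lost to time running. The sketch below assumes a window of exactly 365 days, which the slide does not state (it only says >12 months), and treats the slide's upper bounds on time lost as if they were exact. The function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumed window; the slide only says ">12 months"


def fraction_ok(lost_hours: float, window_hours: float = HOURS_PER_YEAR) -> float:
    """Share of the window that was not lost."""
    return 1.0 - lost_hours / window_hours


availability = fraction_ok(12)      # querier unavailable for 12 hours
durability = fraction_ok(2 * 24)    # 2 days of data lost

print(f"availability: {availability:.3%}")
print(f"durability:   {durability:.3%}")
```

Since the actual losses were below these bounds and the actual window was longer than a year, the real figures sit somewhat above what this sketch prints.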
Future
• Direct chunk writes from Prometheus to Cortex Chunk Store
• Separate ingester index for better load balancing
• Use prometheus/tsdb for the ingesters
• Etcd & gossip for ring storage
• Chunks in Google Cloud Storage
One more thing…
I left Weaveworks at the beginning of June to focus on Prometheus & Cortex development.
Since then I’ve teamed up with David to develop some ideas around Prometheus, logging, and tracing.
We’re available for Prometheus hosting, consulting, training and support.
email: hello@kausal.co
Metrics
Logs
Traces
Thank you!
Questions?
