Brian Brazil
Founder
Monitoring Hadoop with
Prometheus
Making batch jobs manageable
Who am I?
Engineer passionate about running software reliably in production.
● TCD CS Degree
● Google SRE for 7 years, working on high-scale reliable systems such as
AdWords, AdSense, Ad Exchange, Billing, Database
● Boxever TL Systems & Infrastructure, applied processes and technology to
let the company scale and reduce operational load
● Contributor to many open source projects, including Prometheus, Ansible,
Python, Aurora and Zookeeper.
● Founder of Robust Perception, making scalability and efficiency available to
everyone
Prometheus
Inspired by Google’s Borgmon monitoring system.
Started in 2012 by ex-Googlers working at SoundCloud as an open source project.
Mainly written in Go. Publicly launched in early 2015.
100+ companies using it including DigitalOcean, GoPro, Apple, Red Hat and
Google.
Why monitor?
● Know when things go wrong
○ To call in a human to prevent a business-level issue, or prevent an issue in advance
● Be able to debug and gain insight
● Trending to see changes over time, and drive technical/business decisions
● To feed into other systems/processes (e.g. QA, security, automation)
Your Services Shouldn’t be a Black Box
Services have Internals
Monitor the Internals
Monitor as a Service, not as Machines
Inclusive Monitoring
Don’t monitor just at the edges:
● Instrument client libraries
● Instrument server libraries (e.g. HTTP/RPC)
● Instrument business logic
Library authors get information about usage.
Application developers get monitoring of common components for free.
Dashboards and alerting can be provided out of the box, customised for your
organisation!
Prometheus is About Metrics, not Events
Event-based monitoring such as logging is limited in how much data you can have
per event.
Each piece of data about each event needs to be stored and processed, which is
challenging to scale.
Metric-based monitoring lets you have thousands of metrics, so you can track
the performance of every subsystem.
Prometheus regularly polls in-memory state of metrics.
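As a rough illustration of the difference, the sketch below keeps one in-memory counter per metric and reads the current values on demand, the way a poll would. It uses only the Java standard library rather than the Prometheus client, and the class and metric names are made up for this example.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metrics registry; not the Prometheus client API.
final class InMemoryMetrics {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Recording an event only bumps a number; nothing is stored per event.
    void inc(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    // A poll reads the current state. Its size depends on the number of
    // metrics, not on how many events have happened.
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        counters.forEach((name, value) -> out.put(name, value.sum()));
        return out;
    }

    public static void main(String[] args) {
        InMemoryMetrics metrics = new InMemoryMetrics();
        for (int i = 0; i < 1_000_000; i++) {
            metrics.inc("records_processed_total");
            if (i % 1000 == 0) {
                metrics.inc("records_failed_total");
            }
        }
        // A million events later there are still only two numbers to read.
        System.out.println(metrics.snapshot());
    }
}
```

An event log of the same run would hold a million entries; the metric view holds two values however long the process runs.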
What about Hadoop?
Batch jobs such as MapReduces are a very common way to use Hadoop.
How do you check today that your regular jobs are working?
● Checking dashboards?
● Emails about every run?
● Emails on failure?
What do you really care about?
The thing you want to know is:
Has my batch job been successful recently
enough?
So let’s monitor that!
Introducing the Pushgateway
The Pushgateway holds metric state for ephemeral jobs.
Java snippet
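import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.PushGateway;

CollectorRegistry registry = new CollectorRegistry();
JobClient.runJob(job); // Submit job to Hadoop and wait for completion.
// Only reached if the job succeeded: runJob throws on failure.
Gauge lastSuccess = Gauge.build()
    .name("my_batch_job_last_success")
    .help("Last time my batch job succeeded, in unixtime.")
    .register(registry);
lastSuccess.setToCurrentTime();
// Push to the Pushgateway so Prometheus can scrape the value after this process exits.
PushGateway pg = new PushGateway("127.0.0.1:9091");
pg.pushAdd(registry, "my_batch_job");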
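To show what such a push amounts to on the wire, here is a minimal sketch using only the Java standard library. It assumes the Pushgateway's documented HTTP interface (a POST of text-format metrics to /metrics/job/<job>); the class and method names are hypothetical, and the job name is assumed to need no URL escaping.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the client library normally does all of this for you.
final class LastSuccessPush {
    // Path under which the Pushgateway groups the metrics of one job.
    static String pushUrl(String gatewayAddress, String job) {
        return "http://" + gatewayAddress + "/metrics/job/" + job;
    }

    // One gauge in the Prometheus text exposition format.
    static String body(String metric, String help, long unixSeconds) {
        return "# HELP " + metric + " " + help + "\n"
             + "# TYPE " + metric + " gauge\n"
             + metric + " " + unixSeconds + "\n";
    }

    // POST replaces only metrics of the same name within the job's group.
    static int push(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/plain; version=0.0.4");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Requires a Pushgateway listening on this address.
        String url = pushUrl("127.0.0.1:9091", "my_batch_job");
        String payload = body("my_batch_job_last_success",
                "Last time my batch job succeeded, in unixtime.",
                System.currentTimeMillis() / 1000);
        System.out.println("HTTP " + push(url, payload));
    }
}
```

The Pushgateway keeps the pushed value after the batch process has exited, so Prometheus can still scrape it.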
Prometheus Alerts
Prometheus has a powerful expression language that can be used in graphs,
pre-calculation and alerts.
Let’s alert if our batch job hasn’t succeeded in a day:
ALERT MyBatchJobNotSuccessfulRecently
IF time() - my_batch_job_last_success{job="my_batch_job"}
> 86400
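The arithmetic behind the rule is small enough to spell out. The following is a plain Java sketch of the same comparison, with hypothetical names; Prometheus evaluates the expression itself, so this is only to make the threshold explicit.

```java
// Hypothetical helper mirroring the alert expression.
final class StalenessCheck {
    static final long ONE_DAY_SECONDS = 24 * 60 * 60; // 86400

    // True when the last success is more than maxAgeSeconds in the past.
    static boolean shouldAlert(long nowSeconds, long lastSuccessSeconds, long maxAgeSeconds) {
        return nowSeconds - lastSuccessSeconds > maxAgeSeconds;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        long sixHoursAgo = now - 6 * 60 * 60;
        long twoDaysAgo = now - 2 * ONE_DAY_SECONDS;
        System.out.println(shouldAlert(now, sixHoursAgo, ONE_DAY_SECONDS)); // false
        System.out.println(shouldAlert(now, twoDaysAgo, ONE_DAY_SECONDS));  // true
    }
}
```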
New World!
No longer have to manually check dashboards or emails every single day for
every single batch job.
Monitoring and alerting is now aligned with what we care about.
More reliable, and scales better too!
Aside: Idempotency and Frequency
You shouldn’t care about a single failure.
To make things even easier to manage, write your batch jobs so that if one run
fails the next run will automatically catch up.
Then run your batch jobs at least twice as often as needed.
Result: A single failure is automatically handled, and if there is a problem you run
it again. No more messing with command line flags and config files!
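One way to get this catch-up behaviour is to have the job work out for itself what is still outstanding. The sketch below assumes a daily job with a stored checkpoint; the class, the in-memory checkpoint and the processDay callback are all hypothetical stand-ins.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical daily job that derives its own work list from a checkpoint.
final class CatchUpJob {
    private LocalDate lastProcessed; // checkpoint; a real job would persist this

    CatchUpJob(LocalDate lastProcessed) {
        this.lastProcessed = lastProcessed;
    }

    // Every day after the checkpoint, up to and including today.
    List<LocalDate> pendingDays(LocalDate today) {
        List<LocalDate> days = new ArrayList<>();
        for (LocalDate d = lastProcessed.plusDays(1); !d.isAfter(today); d = d.plusDays(1)) {
            days.add(d);
        }
        return days;
    }

    // Process outstanding days in order. The checkpoint only advances after
    // a day succeeds, so a failed run leaves its work for the next run.
    int run(LocalDate today, Predicate<LocalDate> processDay) {
        int done = 0;
        for (LocalDate day : pendingDays(today)) {
            if (!processDay.test(day)) {
                break;
            }
            lastProcessed = day;
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        CatchUpJob job = new CatchUpJob(LocalDate.of(2015, 12, 1));
        // This run fails outright, so nothing is processed.
        System.out.println(job.run(LocalDate.of(2015, 12, 2), day -> false)); // 0
        // The next run takes no special flags and picks up both days.
        System.out.println(job.run(LocalDate.of(2015, 12, 3), day -> true));  // 2
    }
}
```

Because each run starts from the checkpoint rather than from a date passed on the command line, rerunning after a failure is the same invocation as a normal run.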
Beyond Batch
Prometheus has integrations with 50+ other systems, including JMX, EC2,
MySQL, PostgreSQL, Redis, MongoDB, CouchDB, RethinkDB, collectd, Graphite,
Nagios, InfluxDB, Django, Mtail, Heka, Memcached, RabbitMQ, Rsyslog, HAProxy,
Meteor.js, Java, Haskell, Python, Go, Ruby, .NET, Machine, CloudWatch,
Minecraft…
Easy to run, easy to use, easy to scale.
A single Prometheus can handle over 100k samples per second!
Powerful Data Model
All metrics have arbitrary multi-dimensional labels.
No need to force your model into dotted strings.
Can aggregate, cut, and slice along them.
Sample values can be any double, and labels support full Unicode.
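To make the label idea concrete, here is a small sketch that models samples as a name, a set of label pairs and a double value, then aggregates along one label. It is a standard-library illustration with made-up metric and label names, not Prometheus's own data structures.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical model of labelled samples.
final class LabelledSamples {
    static final class Sample {
        final String name;
        final Map<String, String> labels;
        final double value;

        Sample(String name, Map<String, String> labels, double value) {
            this.name = name;
            this.labels = labels;
            this.value = value;
        }
    }

    // Build a label set from alternating key, value arguments.
    static Map<String, String> labels(String... keyValues) {
        Map<String, String> out = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            out.put(keyValues[i], keyValues[i + 1]);
        }
        return out;
    }

    // Sum values grouped by one label, ignoring every other label.
    static Map<String, Double> sumBy(List<Sample> samples, String label) {
        return samples.stream().collect(Collectors.groupingBy(
                s -> s.labels.getOrDefault(label, ""),
                TreeMap::new,
                Collectors.summingDouble(s -> s.value)));
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
                new Sample("http_requests_total", labels("datacenter", "eu", "path", "/api"), 120),
                new Sample("http_requests_total", labels("datacenter", "eu", "path", "/login"), 30),
                new Sample("http_requests_total", labels("datacenter", "us", "path", "/api"), 200));
        System.out.println(sumBy(samples, "datacenter")); // totals per datacenter
        System.out.println(sumBy(samples, "path"));       // same data, cut by path
    }
}
```

With dotted strings the grouping dimension is fixed by the position in the name; with labels the same samples can be sliced by datacenter or by path after the fact.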
Powerful Query Language
Can multiply, add, aggregate, join, predict, take quantiles across many metrics in
the same query. Can evaluate right now, and graph back in time.
Answer questions like:
● What’s the 95th percentile latency in the European datacenter?
● How full will the disks be in 4 hours?
● Which services are the top 5 users of CPU?
Can alert based on any query.
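The disk question is a prediction: fit a straight line through recent samples and extend it four hours ahead. The sketch below shows that arithmetic as an ordinary least-squares fit in plain Java; it illustrates the idea under that assumption and is not a description of Prometheus's internals. The class name and sample data are invented.

```java
// Hypothetical helper: least-squares line through (time, value) samples.
final class LinearPrediction {
    // Fit value = intercept + slope * time, then extrapolate to
    // secondsAhead after the last sample.
    static double predict(double[] times, double[] values, double secondsAhead) {
        int n = times.length;
        if (n < 2 || values.length != n) {
            throw new IllegalArgumentException("need at least two matching samples");
        }
        double meanT = 0;
        double meanV = 0;
        for (int i = 0; i < n; i++) {
            meanT += times[i];
            meanV += values[i];
        }
        meanT /= n;
        meanV /= n;

        double covariance = 0;
        double variance = 0;
        for (int i = 0; i < n; i++) {
            covariance += (times[i] - meanT) * (values[i] - meanV);
            variance += (times[i] - meanT) * (times[i] - meanT);
        }
        double slope = covariance / variance;
        double intercept = meanV - slope * meanT;
        return intercept + slope * (times[n - 1] + secondsAhead);
    }

    public static void main(String[] args) {
        // Disk usage in percent, sampled hourly: growing 2 points per hour.
        double[] times = {0, 3600, 7200, 10800};
        double[] used = {50, 52, 54, 56};
        // Four hours after the last sample: roughly 64 percent.
        System.out.println(predict(times, used, 4 * 3600));
    }
}
```

An alert can then compare the predicted value with a threshold, so it fires before the disk fills rather than after.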
Dashboards
What does this all mean for Hadoop?
Due to its extensive integrations, Prometheus can monitor Hadoop and the rest of
your infrastructure and applications.
With its powerful data model and query language, you can graph and alert on what
matters - not what your monitoring system limits you to.
Better alerts with fewer false positives mean more sleep, higher reliability and
more confidence that your system is functioning correctly.
Resources
Official Project Website: prometheus.io
Official Mailing List: prometheus-developers@googlegroups.com
Demo: demo.robustperception.io
Robust Perception Website: www.robustperception.io
Queries: prometheus@robustperception.io

Monitoring Hadoop with Prometheus (Hadoop User Group Ireland, December 2015)
