Monitoring your API
WHAT IS THIS TALK ABOUT?
• Passive monitoring with graphite (collect statistics).
• What metrics to monitor.
• What tools.
• Graph examples.
ASSUMPTIONS
• You are using Nginx as a proxy for your API.
• You are using Ubuntu (the same setup works on other Linux distributions).
• You’ll be using Graphite to store metrics: collectl sends the system metrics and logster sends the metrics parsed from the Nginx logs.
WHAT TO MONITOR?
“The 15 Essential Nginx Metrics to Monitor” by Scalyr https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide
• Requests per second
• Response time
• Active connections
• Connection backlog queue
• Response codes
• Process open file handles
• Process state*
• Server status*
• Server load average
• Server network usage
• Server disk space
• Hosting provider status*
• DNS expiration*
• SSL certificate expiration*
• User activity*
* Not the kind of thing you would measure, so they are not covered in this talk.
WHAT TO MONITOR?
• “The USE Method” by Brendan Gregg http://www.brendangregg.com/usemethod.html
• Methodology for analyzing the performance of any system.
• Summarized as: “For every resource, check utilization, saturation, and errors.”
• Consider software a resource as well
• “USE Method: Rosetta Stone of Performance Checklists” by Brendan Gregg http://www.brendangregg.com/USEmethod/use-rosetta.html
WHAT TO MONITOR?
Resource              | Utilization               | Saturation                 | Errors
App Performance       | Response time, # Requests | -                          | 5xx code
Nginx Connections     | Active                    | Accepted - Handled         | -
Open file descriptors | # open files              | -                          | -
CPU                   | % Util                    | Run queue size             | -
Network               | Rx or Tx / Max            | Dropped                    | Errors
Memory                | Used                      | Swap                       | -
Disk                  | % Util                    | Wait time and queue length | -
WHAT TOOLS? COLLECTL
• Created by HP
• Low overhead
• Available in all major Linux distributions
• Measures a rich set of metrics
• Stores data locally and exports to Ganglia and Graphite; custom imports and exports can be added
• Problem: doesn’t export all metrics to Graphite
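For context on what "exports to Graphite" means on the wire: Graphite's carbon daemon accepts a plaintext protocol, one metric per line in the form "path value timestamp", by default on TCP port 2003. The sketch below shows that format in Python. It is not collectl's exporter; the function names and the metric path are hypothetical.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Return one line of Graphite's plaintext protocol, newline-terminated."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_metric(host, path, value, port=2003, timestamp=None):
    """Open a TCP connection to carbon and send a single metric line."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric path; nothing is sent until send_metric() is called.
example_line = format_metric("web01.collectl.cputotals.user", 12.5, 1700000000)
```

Calling send_metric("<ip address>", ...) with the address of your Graphite host would push one data point; an exporter does the same thing for every metric at each interval.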
WHAT TOOLS? COLLECTL
• Install:
$ sudo apt-get install collectl libwww-curl-perl
• Patch the graphite export (to add metrics that aren't included by default):
$ wget -O graphite.patch https://gist.githubusercontent.com/andphe/2a08eab7fb4148d33888/raw/5d416d8faa5a9ca535cd5e062622d712f74c6f11/graphite.patch
$ sudo patch -p0 /usr/share/collectl/graphite.ph graphite.patch
• Install the nginx import module:
$ git clone https://github.com/andphe/collectl-imports.git
$ cd collectl-imports
$ sudo cp nginx.ph /usr/share/collectl/
WHAT TOOLS? COLLECTL
• Configure (/etc/collectl.conf):
DaemonCommands = -i 10 -s+YZDN --netopts e --import nginx,s=http,h=localhost,p=80,u=nginx_status --export graphite,<ip address>,p=.collectl
• Enable Nginx status (/etc/nginx/sites-available/default):
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}
• Restart:
$ sudo /etc/init.d/nginx reload
$ sudo /etc/init.d/collectl restart
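The /nginx_status location returns a small plain-text report. The Python sketch below reads it, assuming the standard four-line stub_status layout. The sample numbers are illustrative and the function name is hypothetical; this is not the code of the nginx.ph import module.

```python
import re

# Illustrative stub_status response body; real numbers will differ.
SAMPLE = (
    "Active connections: 291 \n"
    "server accepts handled requests\n"
    " 16630948 16630948 31070465 \n"
    "Reading: 6 Writing: 179 Waiting: 106 \n"
)


def parse_stub_status(text):
    """Parse the plain-text body served by Nginx's stub_status module."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The three cumulative counters sit alone on one line.
    counters = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and counters and states):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = (int(n) for n in counters.groups())
    reading, writing, waiting = (int(n) for n in states.groups())
    return {
        "active": int(active.group(1)),      # utilization
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "dropped": accepts - handled,        # saturation: accepted - handled
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


status = parse_stub_status(SAMPLE)
```

The "dropped" value is the Accepted - Handled figure from the Nginx Connections row of the USE table: it stays at zero unless Nginx accepted connections it could not handle.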
WHAT TOOLS? LOGSTER
• Created by Etsy
• Exports to Ganglia, Graphite, statsd, CloudWatch, Nagios
• Few dependencies
• New parsers can be added
• 1 minute resolution
• Problem: only sends requests per second per response code
WHAT TOOLS? LOGSTER
• Nginx can log the request time via $request_time
• I created a parser for logster that takes advantage of $request_time
• Sends percentiles and max
• DOESN’T USE AVERAGES
• Sends the total number of requests per response code
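To make "percentiles and max" concrete, here is a minimal Python sketch using the nearest-rank percentile definition. It is not the author's parser: the choice of p50/p90/p99, the function names, and the sample request times are all assumptions made for illustration.

```python
def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Ceiling of pct * n / 100 using integer arithmetic only.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def summarize(request_times):
    """Summarize one interval of request times (seconds) without averaging."""
    return {
        "p50": percentile(request_times, 50),
        "p90": percentile(request_times, 90),
        "p99": percentile(request_times, 99),
        "max": max(request_times),
    }


# Made-up sample: nine fast requests and one very slow one.
times = [0.010, 0.012, 0.011, 0.013, 0.015, 0.011, 0.012, 0.014, 0.010, 2.500]
summary = summarize(times)
mean = sum(times) / len(times)
```

With this sample the mean is about 0.26 s, a value no request actually experienced: nine requests finished in 0.015 s or less and one took 2.5 s. The percentiles and the max show both facts, while the average hides them.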
WHAT TOOLS? LOGSTER
• Why a new parser that doesn't use averages:
“#LatencyTipOfTheDay: Average (def): a random number that falls somewhere between the maximum and 1/2 the median. Most often used to ignore reality.” by Gil Tene http://latencytipoftheday.blogspot.com.co/2014/06/latencytipoftheday-average-random.html
WHAT TOOLS? LOGSTER
• Why a new parser that doesn't use averages:
“#LatencyTipOfTheDay: If you are not measuring and/or plotting Max, what are you hiding (from)?” by Gil Tene http://latencytipoftheday.blogspot.com.co/2014/06/latencytipoftheday-if-you-are-not.html
WHAT TOOLS? LOGSTER
• Why a new parser that doesn't use averages:
More resources about response times in web apps:
http://www.infoq.com/presentations/latency-pitfalls by Gil Tene
https://vimeo.com/104129953 by Andre Arko
WHAT TOOLS? LOGSTER
• Install:
$ sudo apt-get install logtail
$ git clone https://github.com/etsy/logster.git
$ cd logster && sudo python setup.py install
• Configure (add a cron job):
* * * * * logster --output=graphite --graphite-host=<ip address>:2003 -p "<hostname>.logster.api" NginxLogster /var/log/nginx/access.log > /tmp/logster_out.txt 2>&1
WHAT TOOLS? LOGSTER
• Install NginxParser (copy it to the parsers folder):
$ git clone https://github.com/andphe/logster-parsers.git
$ cd logster-parsers
$ sudo cp NginxParser.py /usr/local/lib/python2.7/dist-packages/logster-0.0.1-py2.7.egg/logster/parsers/
• Configure Nginx to log the request time:
log_format request_time '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status "$request_time" $bytes_sent '
                        '"$http_referer" "$http_user_agent"';
access_log /var/log/nginx/access.log request_time;
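A line written with the request_time log format can be picked apart with one regular expression. The Python sketch below assumes exactly the field order of that log_format. The sample log line is made up, and this is not the code of NginxParser.py.

```python
import re

# One named group per field of the request_time log format, in order.
LINE_RE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) "(?P<request_time>[\d.]+)" '
    r'(?P<bytes_sent>\d+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"$'
)


def parse_line(line):
    """Return (status, request_time_seconds), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group("status")), float(match.group("request_time"))


# Made-up access log line in the request_time format.
sample = (
    '203.0.113.7 - - [10/Oct/2015:13:55:36 -0500] '
    '"GET /v1/users HTTP/1.1" 200 "0.042" 512 "-" "curl/7.35.0"'
)
parsed = parse_line(sample)
```

A parser built this way would count the status codes and collect the request times for each run, then report the per-code totals, the percentiles, and the max.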
GRAPH EXAMPLES: APP PERFORMANCE
/render?from=-15minutes&until=now&width=400&height=300&target=aliasByNode(<hostname>.logster.api.requests.*%2C%204)&lineMode=staircase&areaAlpha=0.8&title=App%20Performance%20(%23%20Requests%2C%20HTTP%20Codes)&areaMode=all
GRAPH EXAMPLES: APP PERFORMANCE
/render?from=-15minutes&until=now&width=400&height=300&target=aliasByNode(<hostname>.logster.api.latency.*%2C4)&areaAlpha=0.8&title=App%20Performance%20(Response%20Time)
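The render URLs in these examples are easier to maintain when the percent-escaping is generated rather than typed by hand. The Python sketch below builds the response-time graph URL. The Graphite base URL and the hostname "web01" (standing in for <hostname>) are hypothetical, and render_url is an illustrative helper, not part of Graphite.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def render_url(base_url, targets, params):
    """Build a Graphite /render URL; urlencode handles the escaping."""
    query = [("target", target) for target in targets]
    query.extend(params.items())
    return base_url.rstrip("/") + "/render?" + urlencode(query)


url = render_url(
    "http://graphite.example.com",                    # hypothetical Graphite host
    ["aliasByNode(web01.logster.api.latency.*,4)"],   # web01 stands in for <hostname>
    {
        "from": "-15minutes",
        "until": "now",
        "width": 400,
        "height": 300,
        "areaAlpha": 0.8,
        "title": "App Performance (Response Time)",
    },
)

# Decoding the query string gives back the readable values.
decoded = parse_qs(urlsplit(url).query)
```

urlencode writes spaces as "+" instead of "%20"; both decode to a space, so the resulting URL is equivalent to the hand-escaped form.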
GRAPH EXAMPLES: CPU
/render?from=-1hours&until=now&width=400&height=300&target=exclude(aliasByNode(<hostname>.collectl.cputotals.*%2C%203)%2C%20'idle')&title=CPU%20(Utilization%20%25)&areaMode=all&areaAlpha=0.8
GRAPH EXAMPLES: CPU
/render?from=-1hours&until=now&width=400&height=300&target=alias(<hostname>.collectl.ctxint.run%2C%20'Run%20queue')&title=CPU%20(Saturation%20Tasks)&areaAlpha=0.8&areaMode=all
GRAPH EXAMPLES: MEMORY
/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.meminfo.used%2C%203)&title=Memory%20(Utilization%20KB)&vtitle=%20&areaMode=stacked&areaAlpha=0.8
GRAPH EXAMPLES: MEMORY
/render?from=-1hours&until=now&width=400&height=300&target=alias(<hostname>.collectl.swapinfo.used%2C%20'swap%20used')&title=Memory%20(Saturation%20KB)&areaMode=all&areaAlpha=0.8
GRAPH EXAMPLES: NETWORK
/render?from=-1hours&until=now&width=400&height=300&target=alias(scale(highestMax(<hostname>.collectl.netinfo.kb*.eth0%2C%201)%2C%200.00008)%2C%20'eth0')&title=Network%20(Utilization%20%25%2C%20100Mb)&areaMode=all&areaAlpha=0.8
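The 0.00008 passed to scale() in this target can be derived from the link speed. The Python sketch below rests on two assumptions that the slides do not state: that the kb* series are kilobytes per second, and that 1 KB is counted as 1000 bytes. The function name is hypothetical.

```python
def utilization_scale(link_mbps):
    """Factor that turns KB/s into a 0..1 fraction of the link capacity.

    Assumes 1 Mb = 1,000,000 bits and 1 KB = 1000 bytes.
    """
    capacity_kb_per_s = link_mbps * 1_000_000 / 8 / 1000
    return 1.0 / capacity_kb_per_s


factor_100mb = utilization_scale(100)   # 100 Mb/s link, as in the graph title
factor_1gb = utilization_scale(1000)    # the same calculation for a 1 Gb/s link
```

Under those assumptions a 100 Mb/s link carries at most 12,500 KB/s, so the factor is 1 / 12,500 = 0.00008 and the plotted value is a fraction between 0 and 1. Multiplying the factor by 100 (0.008) would plot a percentage instead.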
GRAPH EXAMPLES: NETWORK
/render?from=-1hours&until=now&width=400&height=300&target=alias(scale(<hostname>.collectl.netinfo.drpout.eth0%2C-1)%2C'eth0%20out')&target=alias(<hostname>.collectl.netinfo.drpin.eth0%2C'eth0%20in')&title=Network%20(%20Saturation%2C%20Drops)&areaMode=all&areaAlpha=0.8
GRAPH EXAMPLES: NETWORK
/render?from=-1hours&until=now&width=400&height=300&target=alias(scale(<hostname>.collectl.netinfo.errout.eth0%2C-1)%2C'eth0%20out')&target=alias(<hostname>.collectl.netinfo.errin.eth0%2C'eth0%20in')&title=Network%20(%20Errors)&areaMode=all&areaAlpha=0.8
GRAPH EXAMPLES: DISK
/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.diskinfo.util.sda%2C%204)&title=Disk%20(Utilization%20%25)&areaMode=all&areaAlpha=0.8
GRAPH EXAMPLES: DISK
/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.diskinfo.quelen.sda%2C%204)&title=Disk%20(Saturation%2C%20Queue%20Len%20%2F%20sec)&areaMode=all&areaAlpha=0.8
GRAPH EXAMPLES: DISK
/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.diskinfo.wait.sda%2C%204)&title=Disk%20(Saturation%2C%20Time%20wait%20%2F%20sec)&areaMode=all&areaAlpha=0.8
GRAPH EXAMPLES: NGINX
/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.ngix.conn.active%2C%204)&title=Nginx%20(Utilization%2C%20Connections)&areaMode=all&areaAlpha=0.8
GRAPH EXAMPLES: NGINX
/render?from=-1hours&until=now&width=400&height=300&target=alias(diffSeries(<hostname>.collectl.ngix.conn.accepted%2C%20<hostname>.collectl.ngix.conn.handled)%2C%20'dropped')&title=Nginx%20(Saturation%2C%20Connections)&areaMode=all&areaAlpha=0.8
THANK YOU!
QUESTIONS & ANSWERS
@andphe
andphe@gmail.com
