collectd & PostgreSQL

       Mark Wong
 markwkm@postgresql.org
 mark.wong@myemma.com
         PDXPUG


    November 17, 2011
My Story



     • How did I get to collectd?
     • What is collectd
     • Hacking collectd
     • Using collectd with Postgres
     • Visualizing the data




    markwkm (PDXPUG)            collectd & PostgreSQL   November 17, 2011   2 / 43
Brief background




     • Working at a little company called Emma http://myemma.com
     • Collect performance data from production systems




What did we have?



     • A database with over 1 million database objects
           • >500,000 tables
           • >1,000,000 indexes

     • Tables alone generate 11,000,000 data points per sample




What did we try?


  Only free things:
      • Cacti http://www.cacti.net/
      • Ganglia http://ganglia.info/
      • Munin http://munin-monitoring.org/
      • Reconnoiter https://labs.omniti.com/labs/reconnoiter
      • Zenoss http://community.zenoss.org/




What doesn’t work

  Dependency on RRDtool; can’t handle more than hundreds of thousands of
  metrics (Application Buffer-Cache Management for Performance: Running the
  World’s Largest MRTG by David Plonka, Archit Gupta and Dale Carder, LISA
  2007):
      • Cacti
      • Ganglia
      • Munin
      • Reconnoiter
      • Zenoss



Reconnoiter almost worked for us

  Pros:
      • Write your own SQL queries to collect data from Postgres
      • Used Postgres instead of RRDtool for storing data
      • JavaScript-based on-the-fly charting
      • Support for integrating many other health and stats collection solutions
  Cons:
      • Data collection still couldn’t keep up; maybe needed more tuning
      • Faster hardware? (using VMs)
      • More hardware? (scale out MQ processes)


Couldn’t bring myself to try anything else



      • Hands were tied; no resources available to help move forward.
      • Can we build something lightweight?
      • Played with collectd (http://collectd.org/) while evaluating
          Reconnoiter




What is collectd?



          collectd is a daemon which collects system performance
          statistics periodically and provides mechanisms to store the
          values in a variety of ways, for example in RRD files.

  http://collectd.org/




Does this look familiar?




  Note: RRDtool is an option, not a requirement
What is special about collectd?

  From their web site:
      • it’s written in C for performance and portability
      • includes optimizations and features to handle hundreds
        of thousands of data sets
      • PostgreSQL plugin enables querying the database
      • Can collect most operating system statistics (I say “most” because I
         don’t know if anything is missing)
      • Over 90 total plugins
         http://collectd.org/wiki/index.php/Table_of_Plugins


collectd data description

      • time - when the data was collected
      • interval - frequency of data collection
      • host - server hostname
      • plugin - collectd plugin used
      • plugin_instance - additional plugin information
      • type - type of data collected for set of values
      • type_instance - unique identifier of the metric
      • dsnames - names for the values collected
      • dstypes - type of data for values collected (e.g. counter, gauge, etc.)
      • values - array of values collected
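
The fields above can be pictured as a single record. The following is a minimal Python sketch, not collectd's actual C structure: the class name `ValueList` and the `identifier()` helper are hypothetical, and the `host/plugin-instance/type-instance` form is the identifier layout collectd uses in its plain-text protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ValueList:
    """One collectd sample, mirroring the fields listed on the slide (illustrative only)."""
    time: float                      # when the data was collected (epoch seconds)
    interval: int                    # seconds between collections
    host: str                        # server hostname
    plugin: str                      # collectd plugin used
    plugin_instance: Optional[str]   # additional plugin information
    type: str                        # type of data collected
    type_instance: Optional[str]     # identifies the metric within the type
    dsnames: List[str]               # one name per value
    dstypes: List[str]               # e.g. "counter" or "gauge", one per value
    values: List[float]              # the collected values

    def identifier(self) -> str:
        """Build host/plugin[-plugin_instance]/type[-type_instance]."""
        plugin = self.plugin
        if self.plugin_instance:
            plugin += "-" + self.plugin_instance
        type_ = self.type
        if self.type_instance:
            type_ += "-" + self.type_instance
        return f"{self.host}/{plugin}/{type_}"
```

Note that `dsnames`, `dstypes`, and `values` are parallel arrays: the n-th name and type describe the n-th value.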

PostgreSQL plugin configuration
  Define custom queries in collectd.conf:

  LoadPlugin postgresql
  <Plugin postgresql>
     <Query magic>
         Statement "SELECT magic FROM wizard;"
         <Result>
             Type gauge
             InstancePrefix "magic"
             ValuesFrom magic
         </Result>
     </Query>
  ...

. . . per database.

...
   <Database bar>
       Interval 60
       Service "service_name"
       Query backends # predefined
       Query magic_tickets
   </Database>
</Plugin>


Full details at
http://collectd.org/wiki/index.php/Plugin:PostgreSQL

Hurdles



  More meta data:
      • Need a way to save schema, table, and index names; can’t differentiate
        stats between tables and indexes
      • Basic support of meta data in collectd but mostly unused
      • How to store data in something other than RRDtool




Wanted: additional meta data


  Hack the PostgreSQL plugin to create meta data for:
      • database - database name (maybe not needed, same as
         plugin_instance)
      • schemaname - schema name
      • tablename - table name
      • indexname - index name
      • metric - e.g. blks_hit, blks_read, seq_scan, etc.




Another database query for collecting a table statistic



  <Query table_stats>
      Statement "SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;"
  </Query>




Identify the data



  <Result>
      Type counter
      InstancePrefix "seq_scan"
      InstancesFrom "schemaname" "relname"
      ValuesFrom "seq_scan"
  </Result>
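
To make the effect of `InstancePrefix` and `InstancesFrom` concrete, here is a small Python sketch of how the type instance for one result row could be assembled. It assumes the prefix and the instance columns are joined with hyphens, which is what the example data in this deck suggests; the function name `build_type_instance` is hypothetical and this is not the plugin's actual code.

```python
def build_type_instance(prefix, row, instances_from):
    """Join the prefix and the named instance columns with hyphens.

    prefix         -- value of InstancePrefix (may be empty)
    row            -- one query result row as a dict of column name to value
    instances_from -- column names listed in InstancesFrom, in order
    """
    parts = [prefix] + [str(row[column]) for column in instances_from]
    # Skip empty parts so an empty prefix does not leave a leading hyphen.
    return "-".join(part for part in parts if part)


# One row as the table_stats query might return it (sample values only).
row = {"schemaname": "pg_catalog", "relname": "pg_class", "seq_scan": 249873}
type_instance = build_type_instance("seq_scan", row, ["schemaname", "relname"])
print(type_instance)
```

Because schema and table names end up inside one hyphen-joined string, they cannot be split apart reliably afterwards, which is the motivation for carrying them as separate meta data.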




Meta data specific parameters


  <Database postgres>
      Host "localhost"
      Query table_stats
      SchemanameColumn 0
      TablenameColumn 1
  </Database>



  Note: The database name is taken from the name given in the <Database> tag
  if the query does not return it.

Example data

     • time: 2011-10-20 18:04:17-05
     • interval: 300
     • host: pong.int
     • plugin: postgresql
     • plugin_instance: sandbox
     • type: counter
     • type_instance: seq_scan-pg_catalog-pg_class
     • dsnames: {value}
     • dstypes: {counter}
     • values: {249873}

Example meta data



     • database: sandbox
     • schemaname: pg_catalog
     • tablename: pg_class
     • indexname:
     • metric: seq_scan




Now what?



  Hands were tied (I think I mentioned that earlier); open sourced work to date:

      • collectd forked with patches
        https://github.com/mwongatemma/collectd
      • YAMS https://github.com/myemma/yams




Yet Another Monitoring System




Switching hats and boosting code




  Using extracurricular time working on equipment donated to Postgres from
  SUN, IBM, and HP to continue proofing collectd changes.




How am I going to move the data?

  Options from available write plugins; guess which I used:
      • Carbon - Graphite’s storage API to Whisper
        http://collectd.org/wiki/index.php/Plugin:Carbon
      • CSV http://collectd.org/wiki/index.php/Plugin:CSV
      • Network - Send/Receive to other collectd daemons
        http://collectd.org/wiki/index.php/Plugin:Network
      • RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD
      • RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool
      • SysLog http://collectd.org/wiki/index.php/Plugin:SysLog
      • UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock
      • Write HTTP - PUTVAL (plain text), JSON
        http://collectd.org/wiki/index.php/Plugin:Write_HTTP

Process of elimination

  If RRDtool (written in C) can’t handle massive volumes of data, a Python
  RRD-like database probably can’t either:
       • Carbon
       • CSV
       • Network
       • RRDCacheD
       • RRDtool
       • SysLog
       • UnixSock
       • Write HTTP - PUTVAL (plain text), JSON

Process of elimination


  Writing to other collectd daemons or just locally doesn’t seem useful at the
  moment:
      • CSV
      • Network
      • SysLog
      • UnixSock
      • Write HTTP - PUTVAL (plain text), JSON




Process of elimination



  Let’s try CouchDB’s RESTful JSON API!
       • CSV
       • SysLog
       • Write HTTP - PUTVAL (plain text), JSON




Random: What Write HTTP PUTVAL data looks like
  Note: Each PUTVAL is a single line but is broken up into two lines to fit onto
  the slide.

  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets
      interval=10 1251533299:197141504:175136768
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops
      interval=10 1251533299:10765:12858
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time
      interval=10 1251533299:5:140
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged
      interval=10 1251533299:4658:29899
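
A PUTVAL line like the ones above can be taken apart with a few string splits. This is a minimal Python sketch under stated assumptions: the function name `parse_putval` is hypothetical, the line has already been rejoined into a single line, and the time and values are plain numbers (the special tokens `N` for "now" and `U` for "undefined" that the protocol also allows are not handled).

```python
def parse_putval(line):
    """Parse one PUTVAL line into a dict (simplified, illustrative parser)."""
    command, identifier, *rest = line.split()
    if command != "PUTVAL":
        raise ValueError("not a PUTVAL line: %r" % line)

    # Identifier layout: host/plugin[-plugin_instance]/type[-type_instance]
    host, plugin_part, type_part = identifier.split("/")
    plugin, _, plugin_instance = plugin_part.partition("-")
    type_, _, type_instance = type_part.partition("-")

    options = {}
    value_field = None
    for token in rest:
        if "=" in token:
            key, value = token.split("=", 1)   # e.g. interval=10
            options[key] = value
        else:
            value_field = token                # e.g. 1251533299:5:140
    if value_field is None:
        raise ValueError("no value field in: %r" % line)

    time_str, *values = value_field.split(":")
    return {
        "host": host,
        "plugin": plugin,
        "plugin_instance": plugin_instance,
        "type": type_,
        "type_instance": type_instance,
        "interval": int(options["interval"]) if "interval" in options else None,
        "time": float(time_str),
        "values": [float(v) for v in values],
    }


sample = ("PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets "
          "interval=10 1251533299:197141504:175136768")
parsed = parse_putval(sample)
print(parsed["plugin"], parsed["plugin_instance"], parsed["values"])
```

The plain-text form carries the values but not their names or types, so a consumer has to look those up from the type definition.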


Random: What the Write HTTP JSON data looks like
  Note: Write HTTP packs as much data as it can into a 4KB buffer.
   [ {
       "values": [197141504, 175136768],
       "dstypes": ["counter", "counter"],
       "dsnames": ["read", "write"],
       "time": 1251533299,
       "interval": 10,
       "host": "leeloo.lan.home.verplant.org",
       "plugin": "disk",
       "plugin_instance": "sda",
       "type": "disk_octets",
       "type_instance": ""
     }, ... ]
I didn’t know anything about CouchDB at the time



     • Query interface not really suited for retrieving data to visualize
     • Insert performance not suited for millions of metrics of data over short
         intervals (can insert same data into Postgres several orders of
         magnitude faster)




Now where am I going to put the data?



  Hoping that using Write HTTP is still a good choice:
      • Write an ETL
                •   Table partitioning logic; creation of partition tables
                •   Transform JSON data into INSERT statements
      • Use Postgres
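
The "transform JSON data into INSERT statements" step might look roughly like the Python sketch below. It is an illustration, not the code from the talk: the function names are hypothetical, the target table is assumed to be the `collectd.value_list` table described in this deck, and the statement uses psycopg2-style named placeholders (`%(name)s`) so that a driver would do the quoting and map Python lists to Postgres arrays.

```python
import json
from datetime import datetime, timezone

# Column order of the assumed collectd.value_list table.
COLUMNS = ("time", "interval", "host", "plugin", "plugin_instance",
           "type", "type_instance", "dsnames", "dstypes", "values")


def to_insert(value_list, table="collectd.value_list"):
    """Return (sql, params) for one value list from the Write HTTP JSON.

    The table name is interpolated directly, so it must come from trusted
    code, never from the incoming data.
    """
    params = {column: value_list.get(column) for column in COLUMNS}
    # collectd sends epoch seconds; the table stores timestamptz.
    params["time"] = datetime.fromtimestamp(params["time"], tz=timezone.utc)
    # Empty instance strings become NULL in the nullable columns.
    for key in ("plugin_instance", "type_instance"):
        params[key] = params[key] or None

    columns = ", ".join('"%s"' % column for column in COLUMNS)
    placeholders = ", ".join("%%(%s)s" % column for column in COLUMNS)
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, placeholders)
    return sql, params


def transform(payload):
    """Turn one Write HTTP request body (a JSON array) into INSERTs."""
    return [to_insert(value_list) for value_list in json.loads(payload)]
```

Each pair could then be passed to a cursor, for example `cursor.execute(sql, params)` with psycopg2. Column names are double-quoted because several of them (`time`, `type`, `values`) are SQL keywords.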




Database design
                Table "collectd.value_list"
       Column      |           Type           | Modifiers
  -----------------+--------------------------+-----------
   time            | timestamp with time zone | not null
   interval        | integer                  | not null
   host            | character varying(64)    | not null
   plugin          | character varying(64)    | not null
   plugin_instance | character varying(64)    |
   type            | character varying(64)    | not null
   type_instance   | character varying(64)    |
   dsnames         | character varying(512)[] | not null
   dstypes         | character varying(8)[]   | not null
   values          | numeric[]                | not null
Take advantage of partitioning




  At least table inheritance in Postgres’ case; partition data by plugin




Child table
               Table "collectd.vl_postgresql"
       Column      |           Type            | Modifiers
  -----------------+--------------------------+-----------
   ...
   database        | character varying(64)     | not null
   schemaname      | character varying(64)     |
   tablename       | character varying(64)     |
   indexname       | character varying(64)     |
   metric          | character varying(64)     | not null
  Check constraints:
      "vl_postgresql_plugin_check" CHECK (plugin::text =
                                           'postgresql'::text)
  Inherits: value_list
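
The partition-creation logic for a per-plugin child table could generate DDL along these lines. This Python sketch is an assumption about how such a step might be written, not the author's ETL code: the function name `child_table_ddl` and the `vl_<plugin>` naming rule are inferred from the table shown above, and the plugin-specific meta data columns (database, schemaname, and so on) are left out.

```python
def child_table_ddl(plugin, schema="collectd", parent="value_list"):
    """Return DDL for a child table that holds one plugin's rows.

    The CHECK constraint lets Postgres skip this child when a query
    filters on a different plugin (constraint exclusion).
    """
    # The plugin name is spliced into SQL, so accept only simple names.
    if not plugin.replace("_", "").isalnum():
        raise ValueError("unexpected plugin name: %r" % plugin)
    child = "vl_" + plugin
    return (
        "CREATE TABLE IF NOT EXISTS %s.%s (\n"
        "    CHECK (plugin = '%s')\n"
        ") INHERITS (%s.%s);" % (schema, child, plugin, schema, parent)
    )


print(child_table_ddl("postgresql"))
```

Left unnamed, the constraint would get a generated name; Postgres derives it from the table and column, which matches the `vl_postgresql_plugin_check` name shown above.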
How much partitioning?


  Lots of straightforward options:
      • Date
      • Database
      • Schema
      • Table
      • Index
      • Metric




Back to the ETL


  Parameters set for fastest path to working prototype:
      • Keep using HTTP POST (Write HTTP plugin) for HTTP protocol
        and JSON
      • Use Python for built in HTTP Server and JSON parsing (Emma is
        primarily a Python shop)
      • Use SQLAlchemy/psycopg2




Back again to the ETL

  Python didn’t perform; combination of JSON parsing, data transformation,
  and INSERT performance still several orders of magnitude below acceptable
  levels:
       • redis to queue data to transform
       • lighttpd for the HTTP interface
       • fastcgi C program to push things to redis
       • multi-threaded C program using libpq for Postgres API
                •   pop data out of redis
                •   table partitioning creation logic
                •   transform JSON data into INSERT statements
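
The shape of this pipeline (an HTTP front end that only enqueues, and worker threads that pop, parse, and load) can be sketched in a few lines. The Python below is only an illustration of that queue-and-workers pattern: the real system described here used redis, lighttpd, and C with libpq, whereas this sketch substitutes an in-process `queue.Queue` for redis and a plain list for the database. All names are hypothetical.

```python
import json
import queue
import threading


def worker(jobs, sink, lock):
    """Pop raw JSON payloads, parse them, and hand rows to the sink."""
    while True:
        payload = jobs.get()
        if payload is None:          # sentinel: no more work for this worker
            break
        for value_list in json.loads(payload):
            row = (value_list["plugin"], value_list["values"])
            with lock:               # stand-in for the database INSERT
                sink.append(row)


def run_pipeline(payloads, n_workers=4):
    """Enqueue payloads (the front end's job) and drain them with workers."""
    jobs = queue.Queue()
    sink, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, sink, lock))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for payload in payloads:
        jobs.put(payload)            # receiving side returns immediately
    for _ in threads:
        jobs.put(None)               # one sentinel per worker
    for thread in threads:
        thread.join()
    return sink
```

The point of the queue is that accepting data and loading it proceed at independent rates, so a slow INSERT path does not block collectd's HTTP writes.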


Success?




     • Table statistics for 1 million tables collect in approximately 12 minutes.
     • Is that acceptable?
     • Can we go faster?




If you don’t have millions of data points


  Easier ways to visualize the data:
       • RRDtool
       • RRDtool compatible front-ends
         http://collectd.org/wiki/index.php/List_of_front-ends
       • Graphite with the Carbon and Whisper combo
         http://graphite.wikidot.com/
       • Reconnoiter




__      __
          / ~~~/  . o O ( Thank you! )
    ,----(       oo     )
  /        __      __/
 /|            ( |(
^     /___ / |
    |__|    |__|-"




Acknowledgements

  Hayley Jane Wakenshaw

              __      __
            / ~~~/ 
      ,----(       oo     )
    /        __      __/
   /|            ( |(
  ^     /___ / |
      |__|    |__|-"



License



  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. To view a copy of this license, (a) visit
  http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a
  letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco,
  California, 94105, USA.





collectd & PostgreSQL

  • 1. collectd & PostgreSQL Mark Wong [email protected] [email protected] PDXPUG November 17, 2011
  • 2. My Story • How did I get to collectd? • What is collectd • Hacking collectd • Using collectd with Postgres • Visualizing the data markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 2 / 43
  • 3. Brief background • Working at a little company called Emma http://myemma.com • Collect performance data from production systems markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 3 / 43
  • 4. What did we have? • A database with over 1 million database objects • >500,000 tables • >1,000,000 indexes • Tables alone generate 11,000,000 data point per sample markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 4 / 43
  • 5. What did we try? Only free things: • Cacti http://www.cacti.net/ • Ganglia http://ganglia.info/ • Munin http://munin-monitoring.org/ • Reconnoiter https://labs.omniti.com/labs/reconnoiter • Zenoss http://community.zenoss.org/ markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 5 / 43
  • 6. What doesn’t work Dependency on RRDtool; can’t handle more than hundreds of thousands of metrics (Application Buffer-Cache Management for Performance: Running the World’s Largest MRTG by David Plonka, Archit Gupta and Dale Carder, LISA 2007): • Cacti • Ganglia • Munin • Reconnoiter • Zenoss markwkm (PDXPUG) collectd & PostgreSQL November 17, 2011 6 / 43
  • 7. Reconnoiter almost worked for us Pros: • Write your own SQL queries to collect data from Postgres • Used Postgres instead of RRDtool for storing data • JavaScript based on-the-fly charting • Support for integrating many other health and stats collection solutions Cons: • Data collection still couldn’t keep up; maybe needed more tuning • Faster hardware? (using VMs) • More hardware? (scale out MQ processes)
  • 8. Couldn’t bring myself to try anything else • Hands were tied, no resources available to help move forward. • Can we build something lightweight? • Played with collectd (http://collectd.org/) while evaluating Reconnoiter
  • 9. What is collectd? collectd is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways, for example in RRD files. http://collectd.org/
  • 10. Does this look familiar? Note: RRDtool is an option, not a requirement
  • 11. What is special about collectd? From their web site: • it’s written in C for performance and portability • includes optimizations and features to handle hundreds of thousands of data sets • PostgreSQL plugin enables querying the database • Can collect most operating system statistics (I say “most” because I don’t know if anything is missing) • Over 90 total plugins http://collectd.org/wiki/index.php/Table_of_Plugins
  • 12. collectd data description • time - when the data was collected • interval - frequency of data collection • host - server hostname • plugin - collectd plugin used • plugin instance - additional plugin information • type - type of data collected for set of values • type instance - unique identifier of the metric • dsnames - names for the values collected • dstypes - type of data for values collected (e.g. counter, gauge, etc.) • values - array of values collected
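
The field list above maps naturally onto a small record type. The following is only a sketch in Python: the class name `ValueList` and the helper `named_values` are made up for illustration and are not part of collectd, and the sample values are the disk example that appears later in this deck.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ValueList:
    """One collectd sample, using the field names from the slide (hypothetical class)."""
    time: float                      # when the data was collected (epoch seconds)
    interval: int                    # seconds between collections
    host: str                        # server hostname
    plugin: str                      # collectd plugin used
    plugin_instance: Optional[str]   # additional plugin information
    type: str                        # type of data collected
    type_instance: Optional[str]     # identifier of the metric
    dsnames: List[str]               # names for the values collected
    dstypes: List[str]               # counter, gauge, etc.
    values: List[float]              # the values themselves

    def named_values(self) -> dict:
        """Pair each data-source name with its value."""
        if not (len(self.dsnames) == len(self.dstypes) == len(self.values)):
            raise ValueError("dsnames, dstypes and values must have equal length")
        return dict(zip(self.dsnames, self.values))


sample = ValueList(
    time=1251533299, interval=10, host="leeloo.lan.home.verplant.org",
    plugin="disk", plugin_instance="sda", type="disk_octets",
    type_instance="", dsnames=["read", "write"],
    dstypes=["counter", "counter"], values=[197141504, 175136768],
)
print(sample.named_values())
```

The three arrays are parallel: position i in dsnames, dstypes and values all describe the same data source.
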
  • 13. PostgreSQL plugin configuration Define custom queries in collectd.conf:
        LoadPlugin postgresql
        <Plugin postgresql>
          <Query magic>
            Statement "SELECT magic FROM wizard;"
            <Result>
              Type gauge
              InstancePrefix "magic"
              ValuesFrom magic
            </Result>
          </Query>
          ...
  • 14. ... per database.
          ...
          <Database bar>
            Interval 60
            Service "service_name"
            Query backend # predefined
            Query magic_tickets
          </Database>
        </Plugin>
        Full details at http://collectd.org/wiki/index.php/Plugin:PostgreSQL
  • 15. Hurdles More meta data: • Need a way to save schema, table, and index names; can’t differentiate stats between tables and indexes • Basic support of meta data in collectd but mostly unused • How to store data in something other than RRDtool
  • 16. Wanted: additional meta data Hack the PostgreSQL plugin to create meta data for: • database - database name (maybe not needed, same as plugin instance) • schemaname - schema name • tablename - table name • indexname - index name • metric - e.g. blks_hit, blks_read, seq_scan, etc.
  • 17. Another database query for collecting a table statistic
        <Query table_stats>
          Statement "SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;"
        </Query>
  • 18. Identify the data
        <Result>
          Type counter
          InstancePrefix "seq_scan"
          InstancesFrom "schemaname" "relname"
          ValuesFrom "seq_scan"
        </Result>
  • 19. Meta data specific parameters
        <Database postgres>
          Host "localhost"
          Query table_stats
          SchemanameColumn 0
          TablenameColumn 1
        </Database>
        Note: The database name is set by what is specified in the <Database> tag, if it is not retrieved by the query.
  • 20. Example data • time: 2011-10-20 18:04:17-05 • interval: 300 • host: pong.int • plugin: postgresql • plugin instance: sandbox • type: counter • type instance: seq_scan-pg_catalog-pg_class • dsnames: {value} • dstypes: {counter} • values: {249873}
  • 21. Example meta data • database: sandbox • schemaname: pg_catalog • tablename: pg_class • indexname: • metric: seq_scan
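
The two example slides above suggest how one query row turns into a type instance plus the extra meta data. Below is a rough Python sketch of that mapping, assuming the prefix and the instance columns are joined with hyphens as the example type instance shows. The function names `build_type_instance` and `build_meta` are invented for illustration; the real logic lives in the patched C plugin.

```python
def build_type_instance(prefix, instances):
    """Join the InstancePrefix and the InstancesFrom column values with hyphens."""
    return "-".join([prefix, *instances])


def build_meta(database, row, metric, schemaname_column=0, tablename_column=1):
    """Pick schema and table names out of a result row by column position,
    mirroring the SchemanameColumn / TablenameColumn settings (hypothetical helper)."""
    return {
        "database": database,
        "schemaname": row[schemaname_column],
        "tablename": row[tablename_column],
        "indexname": None,   # this query returns table statistics only
        "metric": metric,
    }


# One row as returned by: SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;
row = ("pg_catalog", "pg_class", 249873)

type_instance = build_type_instance("seq_scan", row[:2])
meta = build_meta("sandbox", row, "seq_scan")
print(type_instance)
print(meta)
```

Keeping the names as separate meta data fields avoids having to split the hyphenated type instance later, which would be ambiguous for names that themselves contain hyphens.
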
  • 22. Now what? Hands were tied (I think I mentioned that earlier); open sourced work to date: • collectd forked with patches https://github.com/mwongatemma/collectd • YAMS https://github.com/myemma/yams
  • 23. Yet Another Monitoring System
  • 24. Switching hats and boosting code Using extracurricular time working on equipment donated to Postgres from SUN, IBM, and HP to continue proofing collectd changes.
  • 25. How am I going to move the data? Options from available write plugins; guess which I used: • Carbon - Graphite’s storage API to Whisper http://collectd.org/wiki/index.php/Plugin:Carbon • CSV http://collectd.org/wiki/index.php/Plugin:CSV • Network - Send/Receive to other collectd daemons http://collectd.org/wiki/index.php/Plugin:Network • RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD • RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool • SysLog http://collectd.org/wiki/index.php/Plugin:SysLog • UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock • Write HTTP - PUTVAL (plain text), JSON http://collectd.org/wiki/index.php/Plugin:Write_HTTP
  • 26. Process of elimination If RRDtool (written in C) can’t handle massive volumes of data, a Python RRD-like database probably can’t either: • Carbon • CSV • Network • RRDCacheD • RRDtool • SysLog • UnixSock • Write HTTP - PUTVAL (plain text), JSON
  • 27. Process of elimination Writing to other collectd daemons or just locally doesn’t seem useful at the moment: • CSV • Network • SysLog • UnixSock • Write HTTP - PUTVAL (plain text), JSON
  • 28. Process of elimination Let’s try CouchDB’s RESTful JSON API! • CSV • SysLog • Write HTTP - PUTVAL (plain text), JSON
  • 29. Random: What Write HTTP PUTVAL data looks like Note: Each PUTVAL is a single line (on the slide each was broken into two lines to fit).
        PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets interval=10 1251533299:197141504:175136768
        PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops interval=10 1251533299:10765:12858
        PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time interval=10 1251533299:5:140
        PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged interval=10 1251533299:4658:29899
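
To make the layout of a PUTVAL line concrete, here is a small Python sketch that splits one of the lines above into its parts. It is an illustration only: `parse_putval` is a made-up name, and it assumes a single `time:value:value` payload with integer values, as in the examples on the slide.

```python
def parse_putval(line):
    """Split 'PUTVAL host/plugin[-instance]/type[-instance] opts time:v1:v2'."""
    command, identifier, *options, payload = line.split()
    if command != "PUTVAL":
        raise ValueError("not a PUTVAL line: %r" % line)

    host, plugin_part, type_part = identifier.split("/")
    # The instance, when present, follows the first hyphen.
    plugin, _, plugin_instance = plugin_part.partition("-")
    type_, _, type_instance = type_part.partition("-")

    opts = dict(opt.split("=", 1) for opt in options)   # e.g. interval=10
    time_str, *raw_values = payload.split(":")

    return {
        "host": host,
        "plugin": plugin,
        "plugin_instance": plugin_instance,
        "type": type_,
        "type_instance": type_instance,
        "interval": int(opts["interval"]),
        "time": int(time_str),
        "values": [int(v) for v in raw_values],   # assumes integer counters
    }


parsed = parse_putval(
    "PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets "
    "interval=10 1251533299:197141504:175136768"
)
print(parsed)
```
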
  • 30. Random: What the Write HTTP JSON data looks like Note: Write HTTP packs as much data as it can into a 4KB buffer.
        [
          {
            "values": [197141504, 175136768],
            "dstypes": ["counter", "counter"],
            "dsnames": ["read", "write"],
            "time": 1251533299,
            "interval": 10,
            "host": "leeloo.lan.home.verplant.org",
            "plugin": "disk",
            "plugin_instance": "sda",
            "type": "disk_octets",
            "type_instance": ""
          },
          ...
        ]
  • 31. I didn’t know anything about CouchDB at the time • Query interface not really suited for retrieving data to visualize • Insert performance not suited for millions of metrics of data over short intervals (can insert same data into Postgres several orders of magnitude faster)
  • 32. Now where am I going to put the data? Hoping that using Write HTTP is still a good choice: • Write an ETL • Table partitioning logic; creation of partition tables • Transform JSON data into INSERT statements • Use Postgres
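
The "transform JSON data into INSERT statements" step could look roughly like the Python sketch below. It is not the author's ETL: the helper `json_to_rows` is hypothetical, the target table `collectd.value_list` is taken from the database design slide, and mapping empty instance strings to NULL is an assumption. The statement uses `%s` placeholders so that a driver such as psycopg2 could run it with `cursor.executemany(INSERT_SQL, rows)`; no database connection is made here.

```python
import json
from datetime import datetime, timezone

# "time", "interval" and "values" are quoted because they are SQL keywords.
INSERT_SQL = (
    'INSERT INTO collectd.value_list ("time", "interval", host, plugin, '
    'plugin_instance, type, type_instance, dsnames, dstypes, "values") '
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
)


def json_to_rows(body):
    """Turn one Write HTTP JSON payload into a list of parameter tuples."""
    rows = []
    for vl in json.loads(body):
        rows.append((
            datetime.fromtimestamp(vl["time"], tz=timezone.utc),
            vl["interval"],
            vl["host"],
            vl["plugin"],
            vl.get("plugin_instance") or None,   # assumption: "" stored as NULL
            vl["type"],
            vl.get("type_instance") or None,
            vl["dsnames"],
            vl["dstypes"],
            vl["values"],
        ))
    return rows


body = json.dumps([{
    "values": [197141504, 175136768],
    "dstypes": ["counter", "counter"],
    "dsnames": ["read", "write"],
    "time": 1251533299,
    "interval": 10,
    "host": "leeloo.lan.home.verplant.org",
    "plugin": "disk",
    "plugin_instance": "sda",
    "type": "disk_octets",
    "type_instance": "",
}])
rows = json_to_rows(body)
print(INSERT_SQL)
print(rows[0])
```

Passing values as parameters instead of formatting them into the SQL text leaves quoting and array conversion to the driver.
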
  • 33. Database design
                  Table "collectd.value_list"
             Column      |           Type           | Modifiers
        -----------------+--------------------------+-----------
         time            | timestamp with time zone | not null
         interval        | integer                  | not null
         host            | character varying(64)    | not null
         plugin          | character varying(64)    | not null
         plugin_instance | character varying(64)    |
         type            | character varying(64)    | not null
         type_instance   | character varying(64)    |
         dsnames         | character varying(512)[] | not null
         dstypes         | character varying(8)[]   | not null
         values          | numeric[]                | not null
  • 34. Take advantage of partitioning At least table inheritance in Postgres’ case; partition data by plugin
  • 35. Child table
                  Table "collectd.vl_postgresql"
             Column      |           Type           | Modifiers
        -----------------+--------------------------+-----------
         ...
         database        | character varying(64)    | not null
         schemaname      | character varying(64)    |
         tablename       | character varying(64)    |
         indexname       | character varying(64)    |
         metric          | character varying(64)    | not null
        Check constraints:
            "vl_postgresql_plugin_check" CHECK (plugin::text = 'postgresql'::text)
        Inherits: value_list
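
The "creation of partition tables" logic can be sketched as a function that emits the DDL for one child table per plugin. This is a guess at the shape of that logic, not the author's code: `child_table_ddl` is a hypothetical name, the `vl_<plugin>` naming follows the `vl_postgresql` example above, and the extra meta data columns of the PostgreSQL child table are left out. It only builds SQL text and does not connect to a database.

```python
import re

# Only allow simple lower-case identifiers, since the plugin name is
# interpolated into the SQL text.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def child_table_ddl(plugin, schema="collectd", parent="collectd.value_list"):
    """Return CREATE TABLE DDL for a per-plugin child table using inheritance."""
    if not _IDENTIFIER.match(plugin):
        raise ValueError("unexpected plugin name: %r" % plugin)
    child = "%s.vl_%s" % (schema, plugin)
    return (
        "CREATE TABLE IF NOT EXISTS %s (\n"
        "    CHECK (plugin = '%s')\n"
        ") INHERITS (%s);" % (child, plugin, parent)
    )


ddl = child_table_ddl("postgresql")
print(ddl)
```

The CHECK constraint on `plugin` is what lets the planner skip child tables that cannot match a query when constraint exclusion is enabled.
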
  • 36. How much partitioning? Lots of straightforward options: • Date • Database • Schema • Table • Index • Metric
  • 37. Back to the ETL Parameters set for fastest path to working prototype: • Keep using HTTP POST (Write HTTP plugin) for HTTP protocol and JSON • Use Python for built-in HTTP server and JSON parsing (Emma is primarily a Python shop) • Use SQLAlchemy/psycopg2
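
A prototype along these lines can be put together with nothing but the Python standard library. The sketch below is an assumption about what such a receiver might look like, not the code that was actually written: `CollectdHandler`, `parse_post_body`, `make_server` and the in-memory `RECEIVED` list are all invented names, and the database step is left out.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []   # stand-in for handing value lists on to the database layer


def parse_post_body(raw):
    """Decode one Write HTTP JSON payload into a list of value-list dicts."""
    value_lists = json.loads(raw)
    if not isinstance(value_lists, list):
        raise ValueError("expected a JSON array of value lists")
    return value_lists


class CollectdHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            value_lists = parse_post_body(self.rfile.read(length))
        except ValueError:
            self.send_error(400, "invalid JSON payload")
            return
        RECEIVED.extend(value_lists)
        self.send_response(204)   # accepted, nothing to return
        self.end_headers()

    def log_message(self, fmt, *args):
        pass   # keep the sketch quiet


def make_server(host="127.0.0.1", port=8080):
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), CollectdHandler)

# To run it: make_server().serve_forever()
```

collectd's Write HTTP plugin would then be pointed at the URL this server listens on.
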
  • 38. Back again to the ETL Python didn’t perform; combination of JSON parsing, data transformation, and INSERT performance still several orders of magnitude below acceptable levels: • redis to queue data to transform • lighttpd for the HTTP interface • fastcgi C program to push things to redis • multi-threaded C program using libpq for Postgres API • pop data out of redis • table partitioning creation logic • transform JSON data into INSERT statements
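
The shape of this pipeline (a front end pushes raw payloads onto a queue, several workers pop and transform them) can be shown in a few lines. The sketch below is Python with the standard `queue` and `threading` modules standing in for redis and the multi-threaded C program, so it illustrates the structure only and says nothing about the performance of the real implementation. All names are hypothetical.

```python
import json
import queue
import threading


def worker(payloads, out, lock):
    """Pop raw JSON payloads, transform them, and collect the resulting rows."""
    while True:
        body = payloads.get()
        if body is None:          # sentinel: no more work
            break
        rows = [(vl["host"], vl["plugin"], vl["type"], vl["values"])
                for vl in json.loads(body)]
        with lock:
            out.extend(rows)      # the real program would INSERT here


def run_pipeline(bodies, n_workers=4):
    payloads = queue.Queue()      # stand-in for the redis list
    out, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(payloads, out, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for body in bodies:           # stand-in for the fastcgi program pushing data
        payloads.put(body)
    for _ in threads:
        payloads.put(None)        # one sentinel per worker
    for t in threads:
        t.join()
    return out


bodies = [json.dumps([{"host": "h%d" % i, "plugin": "postgresql",
                       "type": "counter", "values": [i]}]) for i in range(10)]
result = run_pipeline(bodies)
print(len(result))
```

Decoupling the HTTP front end from the transform and insert step through a queue lets the two sides run at different speeds.
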
  • 39. Success? • Table statistics for 1 million tables collect in approximately 12 minutes. • Is that acceptable? • Can we go faster?
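
As a rough back-of-the-envelope check on those numbers, the short Python calculation below converts them into a rate. The comparison against a 300-second interval is an assumption for illustration only; 300 is the interval shown in the example data earlier in the deck.

```python
tables = 1_000_000
elapsed_seconds = 12 * 60                 # approximately 12 minutes

tables_per_second = tables / elapsed_seconds
print(round(tables_per_second))           # 1389

# Assumed target: one full pass per 300-second collection interval.
assumed_interval = 300
fits_interval = elapsed_seconds <= assumed_interval
print(fits_interval)                      # False
```

Under that assumption a full pass takes more than twice the collection interval, which is one way to read the "Can we go faster?" question.
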
  • 40. If you don’t have millions of data points Easier ways to visualize the data: • RRDtool • RRDtool compatible front-ends http://collectd.org/wiki/index.php/List_of_front-ends • Graphite with the Carbon and Whisper combo http://graphite.wikidot.com/ • Reconnoiter
  • 41. Thank you! (ASCII art)
  • 42. Acknowledgements Hayley Jane Wakenshaw (ASCII art)
  • 43. License This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.