Update on OpenTSDB and AsyncHBase
HBaseCon Update
Distributed, Scalable Time Series Database
Chris Larsen clarsen@yahoo-inc.com
Who Am I? (no really, who am I?)
Chris Larsen
Maintainer for OpenTSDB
Software Engineer @ Yahoo!
Monitoring Team
What Is OpenTSDB?
Open Source Time Series Database
Store trillions of data points
Sucks up all data and keeps going
Never lose precision
Scales using HBase, Cassandra
Or Bigtable
What good is it?
Systems Monitoring & Measurement
Servers
Networks
Sensor Data
The Internet of Things
SCADA
Financial Data
Scientific Experiment Results
Use Cases
Backing store for Argus:
Open source monitoring
and alerting system
15 HBase Servers
6 month retention
10M writes per minute
95th percentile latency for queries spanning < 30 days: 200 ms
Moving to a 200-node cluster writing 100M per minute
Use Cases
●Monitoring system, network and application
performance and statistics
110 region servers, 10M writes/s ~ 2PB
Multi-tenant and Kerberos secure HBase
~200k writes per second per TSD
Central monitoring for all Yahoo properties
Over 2 billion time series served
Some Other Users
What Are Time Series?
Time Series: data points for an identity
over time
Typical Identity:
Dotted string: web01.sys.cpu.user.0
OpenTSDB Identity:
Metric: sys.cpu.user
Tags (name/value pairs):
host=web01 cpu=0
What Are Time Series?
Data Point:
Metric + Tags
+ Value: 42
+ Timestamp: 1234567890
sys.cpu.user 1234567890 42 host=web01 cpu=0
^ a data point ^
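To make the anatomy of that line concrete, here is a minimal Python sketch that splits it back into its parts. The function name `parse_data_point` and the dictionary layout are assumptions made for illustration; they are not part of OpenTSDB.

```python
def parse_data_point(line):
    """Split 'metric timestamp value tagk=tagv ...' into its parts."""
    metric, timestamp, value, *tag_pairs = line.split()
    # Each tag is a name/value pair joined by '='.
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    return {
        "metric": metric,
        "timestamp": int(timestamp),
        # Keep integers as integers so no precision is lost.
        "value": float(value) if "." in value else int(value),
        "tags": tags,
    }


point = parse_data_point("sys.cpu.user 1234567890 42 host=web01 cpu=0")
print(point)
```

The metric plus the full tag set identifies the series; the timestamp and value are the only parts that change from point to point.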
How it Works
Writing Data
1) Open a Telnet-style socket and write:
put sys.cpu.user 1234567890 42 host=web01 cpu=0
2) ...or POST JSON to:
http://<host>:<port>/api/put
3) ...or import big files with the CLI
No schema definition
No RRD file creation
Just write!
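The first two write paths can be sketched in Python as follows. This only builds the payloads and does not open a connection. The helper names are hypothetical, and the JSON field names (`metric`, `timestamp`, `value`, `tags`) are assumed to match what `/api/put` expects; check the HTTP API documentation for your TSD version.

```python
import json


def telnet_put_line(metric, timestamp, value, tags):
    """Format one data point as a Telnet-style 'put' command."""
    tag_text = " ".join(f"{k}={v}" for k, v in tags.items())
    return f"put {metric} {timestamp} {value} {tag_text}\n"


def http_put_body(metric, timestamp, value, tags):
    """Serialize one data point as a JSON body for POST /api/put."""
    return json.dumps(
        {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}
    )


tags = {"host": "web01", "cpu": "0"}
line = telnet_put_line("sys.cpu.user", 1234567890, 42, tags)
body = http_put_body("sys.cpu.user", 1234567890, 42, tags)
print(line, end="")
print(body)
```

Either payload carries the same four pieces of information; no table, schema, or file has to exist before the first write.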
Querying Data
Graph with the GUI
CLI tools
HTTP API
Aggregate multiple series
Simple query language
To average all CPUs on host:
start=1h-ago
avg sys.cpu.user
host=web01
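Over the HTTP API, the same query might be expressed as a URL. The sketch below assumes the `/api/query` endpoint with `start` and `m=<aggregator>:<metric>{<tags>}` parameters, and a TSD listening on `localhost:4242`; the host, port, and helper name are placeholders.

```python
from urllib.parse import urlencode


def build_query_url(base_url, start, aggregator, metric, tags):
    """Build an /api/query URL that aggregates one metric filtered by tags."""
    tag_filter = ",".join(f"{k}={v}" for k, v in tags.items())
    # Sub-query form: aggregator:metric{tagk=tagv,...}
    m = f"{aggregator}:{metric}{{{tag_filter}}}"
    return f"{base_url}/api/query?" + urlencode({"start": start, "m": m})


url = build_query_url(
    "http://localhost:4242", "1h-ago", "avg", "sys.cpu.user", {"host": "web01"}
)
print(url)
```

Because no `cpu` tag is given, every `cpu=*` series on `web01` is selected and the `avg` aggregator collapses them into one result.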
HBase Data Tables
tsdb - Data point table. Massive
tsdb-uid - Name to UID and UID to
name mappings
tsdb-meta - Time series index and
meta-data
tsdb-tree - Config and index for
hierarchical naming schema
Data Table Schema
Row key is a concatenation of UIDs and time:
metric + timestamp + tagk1 + tagv1… + tagkN + tagvN
sys.cpu.user 1234567890 42 host=web01 cpu=0
\x00\x00\x01\x49\x95\xFB\x70\x00\x00\x01\x00\x00\x01\x00\x00\x02\x00\x00\x02
Timestamp normalized on 1 hour boundaries
All data points for an hour are stored in one row
Enables fast scans of all time series for a metric
…or pass a row key filter for specific time series with
particular tags
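The example key can be reproduced with a short Python sketch. It assumes 3-byte UIDs (as in the bytes shown on the slide), no salt prefix, and that tag pairs are ordered by tag-key UID; the UID values themselves (1 and 2) are read off the example, and the function names are made up for illustration.

```python
import struct

UID_WIDTH = 3  # bytes per UID, matching the example key on the slide


def uid_bytes(uid):
    """Encode a numeric UID as a fixed-width big-endian byte string."""
    return uid.to_bytes(UID_WIDTH, "big")


def row_key(metric_uid, timestamp, tag_uids):
    """Concatenate metric UID, hour-aligned timestamp, and tagk/tagv UIDs."""
    base_time = timestamp - (timestamp % 3600)  # normalize to the hour
    key = uid_bytes(metric_uid) + struct.pack(">I", base_time)
    for tagk_uid, tagv_uid in sorted(tag_uids):
        key += uid_bytes(tagk_uid) + uid_bytes(tagv_uid)
    return key


# sys.cpu.user -> 1, host -> 1, web01 -> 1, cpu -> 2, "0" -> 2
key = row_key(1, 1234567890, [(1, 1), (2, 2)])
print(key.hex())
```

Timestamp 1234567890 normalizes to 1234566000 (0x4995FB70), which is why every point written during that hour lands in the same row.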
New for OpenTSDB 2.2
● Append writes (no more need for TSD
Compactions)
● Row salting and random metric IDs
● Downsampling Fill Policies
● Query filters (wildcard, regex, group by or not)
● Storage Exception plugin for retrying writes
● Released February 2016
New for OpenTSDB 2.3
● Graphite style expressions
● Cross-metric expressions
● Calendar based downsampling
● New data stores
● UID assignment plugin interface
● Datapoint write filter plugin interface
● RC1 released May 2016
● New Committer, Jonathan Creasy
Fuzzy Row Filter
How do you find a single time
series out of 1 million?
For a day?
For a month?
Fuzzy Row Filter
Instead of running a regex
string comparator over each
byte array formatted key…
(?s)^.{9}(?:.{8})*\Q\x00\x00\x00\x02\E(?:\Q\x00\x0F‡\x42\x2B\E)(?:.{8})*$
TSDB query takes 1.6 seconds
for 89,726 rows
KEY
Match -> m t1 tagk1 tagv1
No Match -> m t1 tagk1 tagv2
No Match -> m t1 tagk1 tagv1 tagk2 tagv3
No Match -> m t1 tagk1 tagv2 tagk2 tagv4
No Match -> m t1 tagk3 tagv5
No Match -> m t1 tagk3 tagv6
Match -> m t2 tagk tagv1
No Match -> m t2 tagk tagv2
Fuzzy Row Filter
Use a byte mask!
● Use the bloom filter to skip-scan
to the next candidate row.
● Combine with regex (after fuzzy
filter) to filter further.
FuzzyFilter{[FuzzyFilterPair{row_key=[18, 68,
-3, -82, 120, 87, 56, -15, 96, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0],
mask=[0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0,
1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]}]}
Now it takes 0.239 seconds
KEY
Match -> m t1 tagk1 tagv1
No Match -> m t1 tagk1 tagv2
Skip -> m t1 tagk1 tagv1 tagk2 tagv3
m t1 tagk1 tagv2 tagk2 tagv4
m t1 tagk3 tagv5
m t1 tagk3 tagv6
Match -> m t2 tagk tagv1
No Match -> m t2 tagk tagv2
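The matching rule behind the byte mask can be sketched in Python. This models only the per-key comparison (mask byte 0 means the position must equal the pattern, 1 means any byte is accepted), not the server-side seek to the next candidate row. The pattern and mask are the ones printed above; the helper names and the sample keys are made up for illustration.

```python
def to_unsigned(java_bytes):
    """Convert signed Java byte values (-128..127) to a Python bytes object."""
    return bytes(b & 0xFF for b in java_bytes)


def fuzzy_match(key, pattern, mask):
    """Return True if key has the pattern's length and agrees at every fixed byte."""
    if len(key) != len(pattern):
        return False
    return all(m == 1 or k == p for k, p, m in zip(key, pattern, mask))


pattern = to_unsigned([18, 68, -3, -82, 120, 87, 56, -15, 96, 0, 0, 0, 1, 0,
                       0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0])
mask = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0,
        1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

# Same salt, metric, and tag keys; different hour and tag values.
same_series_shape = bytearray(pattern)
same_series_shape[5:9] = b"\x57\x39\x00\x00"
same_series_shape[13:17] = b"\x00\x00\x00\x07"

# A different second tag key (fixed position) must not match.
other_tag_key = bytearray(pattern)
other_tag_key[20] = 3

# An extra tag pair changes the key length, so it cannot match.
extra_tag = bytes(pattern) + b"\x00\x00\x00\x03\x00\x00\x00\x09"

print(fuzzy_match(bytes(same_series_shape), pattern, mask))
print(fuzzy_match(bytes(other_tag_key), pattern, mask))
print(fuzzy_match(extra_tag, pattern, mask))
```

In the printed mask the salt, metric, and tag-key bytes are fixed while the timestamp and tag-value bytes are wildcards, so the comparison is a fixed-length byte check rather than a regex evaluation on every key.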
Fuzzy Row Filter
Pros:
● Can improve scan latency by orders of magnitude
● Combines nicely with other filters
Cons:
● All row keys for the match have to be the same, fixed
length
● Doesn’t help much when matching the majority of a set
OR if a set has uniform key lengths
● Doesn’t support bitmasks, only byte masks
AsyncHBase
AsyncHBase is a fully asynchronous, multi-threaded HBase client
Supports HBase 0.90 to 1.x
Faster and less resource intensive than the
native HBase client
Support for scanner filters, META prefetch,
“fail-fast” RPCs
AsyncHBase in YCSB
● New Yahoo! Cloud Serving Benchmark (YCSB)
module for testing AsyncHBase
● Test Params:
○ 1 YCSB worker thread with workload A for run and load
○ Ran consecutive Async -> HBase -> Async -> HBase… (new YCSB JVM
each run) 50 times
○ HBase 1.0.0 stock Apache with default configs
○ Local host, MacBook Pro
○ 10K rows written/read
○ Async writes for both
AsyncHBase in YCSB
HBase Client
Threads:
238
AsyncHBase
Client
Threads:
22
AsyncHBase in YCSB
AsyncHBase in YCSB
Upcoming in 1.8
●Reverse Scanning
●Multi-Get requests
●Netty 4
●Lots of bug fixes
○Stuck NSRE bugs
○Region client resource leaks
OpenTSDB on Bigtable
● Bigtable
○Hosted Google Service
○Client uses HTTP/2 and gRPC for communication
● OpenTSDB heads home
○Based on a time series store on Bigtable at Google
○Identical schema as HBase
○Same filter support (fuzzy filters are coming)
OpenTSDB on Bigtable
● AsyncBigtable
○Implementation of AsyncHBase’s API for drop-in use
○https://github.com/OpenTSDB/asyncbigtable
○Uses HTable API
○Moving to native Bigtable API
● Thanks to Christos of Pythian, Solomon, Carter, Misha,
and the rest of the Google Bigtable team
● https://www.pythian.com/blog/run-opentsdb-google-bigtable/#
OpenTSDB on Cassandra
● AsyncCassandra - Implementation of AsyncHBase’s
API for drop-in use
● Wraps Netflix’s Astyanax for asynchronous calls
● Requires the ByteOrderedPartitioner and legacy
API
● Same schema as HBase/Bigtable
● Scan filtering performed client side
● May not work with future Cassandra versions
if they drop the API
Community
Salesforce Argus
●Time series monitoring
and alerting
●Multi-series annotations
●Dashboards
Thanks to Tom Valine and the Salesforce engineers
https://medium.com/salesforce-open-source/argus-time-series-monitoring-and-alerting-d2941f67864#.ez7mbo3ek
https://github.com/SalesforceEng/Argus
Community
Turn Splicer
●API to shard TSDB queries
●Locality advantage hosting
TSDs on region servers
●Query caching
Thanks to Jonathan Creasy and the Turn engineers
https://github.com/turn/splicer
The Future of OpenTSDB
The Future
Reworked query pipeline for selective ordering
of operations
Histogram support
Flexible query caching framework
Distributed queries
Greater data store abstraction
More Information
Thank you to everyone who has helped test, debug and add to OpenTSDB
2.2 and 2.3 including, but not limited to:
Kyle, Ivan, Davide, Liu, Utkarsh, Andy, Anna, Camden, Can, Carlos, Hugo, Isaih, Kevin, Ping, Jonathan
Contribute at github.com/OpenTSDB/opentsdb
Website: opentsdb.net
Documentation: opentsdb.net/docs/build/html
Mailing List: groups.google.com/group/opentsdb
Images
http://photos.jdhancock.com/photo/2013-06-04-212438-the-lonely-vacuum-of-space.html
http://en.wikipedia.org/wiki/File:Semi-automated-external-monitor-defibrillator.jpg
http://upload.wikimedia.org/wikipedia/commons/1/17/Dining_table_for_two.jpg
http://upload.wikimedia.org/wikipedia/commons/9/92/Easy_button.JPG
https://www.flickr.com/photos/verbeeldingskr8/15563333617
http://www.flickr.com/photos/ladydragonflyherworld/4845314274/
http://lego.cuusoo.com/ideas/view/96
