Kapacitor
- Real Time Data Processing Engine
Agenda
● Kapacitor introduction
● Integration and Installation of Kapacitor in TICK Stack
● TICK Script
● CQ/DownSampling
● Join Node
● User defined function in TICK Script
● Enriching Data with Kapacitor
● Anomaly Detection using Kapacitor
TICK Stack
How does Kapacitor fit in?
Kapacitor
● Kapacitor is a native data processing engine.
● It can process both stream and batch data from InfluxDB.
● It lets you plug in your own custom logic or user-defined
functions to process alerts with dynamic thresholds.
● Key Kapacitor Capabilities
○ Alerting
○ ETL (Extraction, Transformation and Loading)
○ Action Oriented
○ Streaming Analytics
○ Anomaly Detection
Installing TICK Stack
#Installing InfluxDB
wget https://dl.influxdata.com/influxdb/releases/influxdb_1.5.2_amd64.deb
sudo dpkg -i influxdb_1.5.2_amd64.deb
#Installing Telegraf
wget https://dl.influxdata.com/telegraf/releases/telegraf_1.6.1-1_amd64.deb
sudo dpkg -i telegraf_1.6.1-1_amd64.deb
#Installing Chronograf
wget https://dl.influxdata.com/chronograf/releases/chronograf_1.4.4.1_amd64.deb
sudo dpkg -i chronograf_1.4.4.1_amd64.deb
#Installing Kapacitor
wget https://dl.influxdata.com/kapacitor/releases/kapacitor_1.4.1_amd64.deb
sudo dpkg -i kapacitor_1.4.1_amd64.deb
Source: https://portal.influxdata.com/downloads
Kapacitor Components
● Server Daemon (Kapacitord)
● CLI (Kapacitor)
○ Call HTTP API
○ Non Interactive
● Tasks (Unit of work)
○ Defined by TICK Script
○ Stream or Batch
○ DAG Pipeline
● Recordings
○ Useful for isolated testing
● Replay
○ Useful for isolated testing
TICK Script
● Kapacitor uses a DSL (Domain Specific Language) called TICKscript to define tasks.
● Each TICKscript defines a pipeline that tells Kapacitor which data to process and how.
● Pipeline is a Directed Acyclic Graph (DAG)
● Components:
○ Statements
○ Variables
○ Comments
○ Literals
■ Booleans - TRUE and FALSE
■ Numbers - int or float
■ Strings
■ Durations - e.g. 1u, 10ms
Source: https://docs.influxdata.com/kapacitor/v1.4/tick/syntax/
dbrp "kss"."autogen"
stream
// Select just the cpu measurement from our example database.
|from()
.measurement('cpu')
.groupBy('cpu', 'host')
|alert()
.id('{{ index .Tags "host" }}/{{ index .Tags "cpu" }}')
// Email subject
.message('{{ .ID }} is {{ .Level }} Usage Idle Value: {{ index .Fields "usage_idle" }}')
//Email body as HTML
.details('''
<h1>{{ .ID }}</h1>
<b>{{ .Message }}</b><br>
Usage Idle Value: {{ index .Fields "usage_idle" }}
''')
.crit(lambda: int("usage_idle") < 100)
// Whenever we get an alert write it to a file.
.log('/tmp/alerts.log')
// Whenever we get an alert send a mail.
.email('prashant.vats@tothenew.com')
TICKscript nodes overview
● Nodes represent process invocation units that either take data as a batch or a point-by-point stream,
and then alter the data, store the data, or trigger some other activity based on changes in the data (e.g.,
an alert).
● The property methods for these two nodes define the type of task that you are running, either stream or
batch.
● Available Nodes
○ https://docs.influxdata.com/kapacitor/v1.4/nodes/
● TICK Script Specification
○ https://docs.influxdata.com/kapacitor/v1.4/reference/spec/
|from()
|alert()
.id('{{ index .Tags "host" }}')
.exec('script/handler.py')
|window()
|eval()
Lambda Expression
TICKscript uses lambda expressions to define transformations on data points as well as boolean conditions that act as filters.
.WHERE Expression
.where(lambda: "host" == 'server1')
Built-in Expression
.where(lambda: sqrt("value") < 5)
EVAL Node
|eval(lambda: if("field" > threshold AND "field" != 0, 'true', 'false'))
.as('value')
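The eval node's if() expression above can be read as a per-point transformation. A minimal Python sketch of that semantics (the point dict and field names are hypothetical, this is not Kapacitor code):

```python
# Mirrors: |eval(lambda: if("field" > threshold AND "field" != 0, 'true', 'false')).as('value')
# If the field exceeds the threshold and is non-zero, a new field 'value'
# is set to the string 'true', otherwise 'false'.
def eval_point(point: dict, threshold: float) -> dict:
    cond = point["field"] > threshold and point["field"] != 0
    point["value"] = "true" if cond else "false"
    return point

print(eval_point({"field": 7.0}, threshold=5.0)["value"])  # → true
```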
Example
Write a TICK script which streams the measurement "cpu" from the kss database of InfluxDB and generates an alert.
Kapacitor CLI Hands-On
Single Versus Double Quotes
var data = stream
|from()
.database('telegraf')
.retentionPolicy('autogen')
.measurement('cpu')
// NOTE: Double quotes on server1
.where(lambda: "host" == "server1")
var data = stream
|from()
.database('telegraf')
.retentionPolicy('autogen')
.measurement('cpu')
// NOTE: Single quotes on server1
.where(lambda: "host" == 'server1')
The result of this search will always be empty because double quotes were used around "server1": double quotes reference a field or tag, while single quotes create a string literal.
Template Task
1. Write generic TICK Script.
2. Define Template.
3. Write JSON variable file.
4. Define Task using template and variable JSON File.
5. YAML Definition
DownSampling of Data
● Continuous Query (CQ)
○ Continuous queries are created on a database.
○ Only database admins are allowed to create continuous queries.
○ Downsampling is one of the primary use cases of continuous queries.
○ Continuous queries are not applied to historical data.
● Using a Kapacitor task instead of a CQ
CREATE CONTINUOUS
QUERY cpu_idle_mean ON
telegraf BEGIN
SELECT mean("usage_idle")
as usage_idle
INTO mean_cpu_idle
FROM cpu
GROUP BY time(5m),*
END
stream
|from()
.database('telegraf')
.measurement('cpu')
.groupBy(*)
|window()
.period(5m)
.every(5m)
|mean('usage_idle')
.as('usage_idle')
|influxDBOut()
.database('telegraf')
.retentionPolicy('one_year')
.measurement('mean_cpu_idle')
.precision('s')
Stream
Task
batch
|query('SELECT mean(usage_idle) as usage_idle FROM "telegraf"."default".cpu')
.period(5m)
.every(5m)
.groupBy(*)
|influxDBOut()
.database('telegraf')
.retentionPolicy('one_year')
.measurement('mean_cpu_idle')
.precision('s')
Batch
Task
CQ Vs Batch Vs Stream
● When should we use Kapacitor instead of CQs?
○ You want to perform more actions than just downsampling.
○ You want to isolate the workload from InfluxDB.
○ But if you have only a handful of CQs and are just downsampling for retention policies, there is no need to add Kapacitor to your infrastructure.
● When should we use stream tasks vs batch tasks in Kapacitor?
○ RAM and the time period are the two major factors that decide it.
○ A stream task will have to keep all data in RAM for the specified period.
○ If this period is too long for the available RAM then you will first need to store the data in
InfluxDB and then query using a batch task.
○ If you have a timestamp constraint and need real-time processing, use a stream task.
○ A stream task has one slight advantage: since it watches the stream of data, it understands time from the timestamps on the data, while a batch task does this by query.
○ As such, there are no race conditions for whether a given point will make it into a window or not. With a batch task it is still possible for a point to arrive late and be missed in a window.
○ Data in a stream is sequential by time; in a batch it is not necessarily so.
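The point about RAM can be made concrete: a stream task buffers only the current window, then flushes it. A minimal Python sketch of the windowed-mean downsampling above (hypothetical data, period == every as in the example task):

```python
from collections import deque
from datetime import datetime, timedelta

def windowed_means(points, period=timedelta(minutes=5)):
    """Emit (window_end, mean) every `period`, holding only one window in RAM.
    `points` is an iterable of (timestamp, value) sorted by time,
    like the stream a Kapacitor stream task watches."""
    buf = deque()
    out = []
    window_start = None
    for ts, val in points:
        if window_start is None:
            window_start = ts
        # Flush every completed window before buffering the new point.
        while ts >= window_start + period:
            if buf:
                out.append((window_start + period, sum(buf) / len(buf)))
            buf.clear()
            window_start += period
        buf.append(val)
    if buf:
        out.append((window_start + period, sum(buf) / len(buf)))
    return out
```

A longer period means a larger `buf`; when that exceeds available RAM, write the raw data to InfluxDB and downsample with a batch task instead.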
Join Node
● The join node joins the data from two or more nodes; data points are joined on timestamp.
● We can define the type of join and a tolerance.
var errors = batch
|query('''
SELECT value
FROM
"join"."autogen".errors''')
.groupBy(*)
.period(5s)
.every(1s)
var requests = batch
|query('''
SELECT value
FROM
"join"."autogen".requests''')
.groupBy(*)
.period(5s)
.every(1s)
errors
|join(requests)
.as('errors', 'requests')
// points within 1 second are considered the same time.
.tolerance(1s)
// fill missing values with 0, implies outer join.
.fill(0.0)
|eval(lambda: "errors.value" / "requests.value")
.as('error_rate')
|influxDBOut()
.database('join')
.retentionPolicy('autogen')
.measurement('join_wala')
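The tolerance/fill behaviour can be sketched in Python: bucket timestamps by the tolerance, fill the missing side (outer join), then compute error_rate as the eval node does. This is an illustration only; Kapacitor's actual join matching is more involved. Timestamps here are hypothetical epoch seconds:

```python
def join_on_time(errors, requests, tolerance=1.0, fill=0.0):
    """Join two lists of (timestamp, value); points within `tolerance`
    seconds fall into the same bucket and are treated as simultaneous.
    Missing sides are filled with `fill` (an outer join).
    Returns sorted (timestamp, error_rate) pairs."""
    bucket = lambda ts: round(ts / tolerance)
    e = {bucket(ts): v for ts, v in errors}
    r = {bucket(ts): v for ts, v in requests}
    out = []
    for b in sorted(e.keys() | r.keys()):
        ev, rv = e.get(b, fill), r.get(b, fill)
        rate = ev / rv if rv != 0 else 0.0  # guard the division the eval node performs
        out.append((b * tolerance, rate))
    return out
```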
User Defined Function (UDF)
● Write your own algorithm/function and plug it into Kapacitor.
● The custom function runs in its own process, and Kapacitor communicates with it via a defined protocol.
● As of now the supported languages are Go and Python.
● A UDF handler has a set of methods which must be implemented at the functionality level.
● We will see an example using Python.
Writing a UDF
● Implement a UDF handler interface
● Write a TICK script which uses the UDF
● Configure the UDF inside of Kapacitor
UDF Handler Interface
● info
○ When Kapacitor starts, it calls the info method.
○ The info method is used to parse the options associated with the UDF.
○ It specifies which type of data the UDF wants (e.g. int, float) and provides (stream or batch).
● init
○ init runs when a task containing the UDF starts executing.
○ It receives the list of options specified in the TICK script.
● begin_batch
○ Should be used if the UDF wants (receives) data in batch form.
● end_batch
○ Should be used if the UDF provides (sends out) data in batch form.
● point
○ Should be used if the UDF wants and/or provides data in stream form.
● snapshot/restore
○ Used to save and restore the state of the UDF process.
○ Not strictly needed.
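The lifecycle above can be sketched as a plain Python class. This is a simplified stand-in, not the real Kapacitor agent interface (which speaks a protocol over stdin/stdout via the agent library); method names mirror the handler methods, and the moving-average logic previews the pyavg example that follows:

```python
class MovingAvgHandler:
    """Simplified sketch of a stream UDF handler computing a moving average."""

    def info(self):
        # Declares the edge types the UDF wants/provides and its options.
        return {"wants": "stream", "provides": "stream",
                "options": {"field": str, "size": int, "as": str}}

    def init(self, options):
        # Called when the task starts; options come from the TICK script.
        self.field = options["field"]
        self.size = options["size"]
        self.as_name = options["as"]
        self.window = []

    def point(self, point):
        # Called once per data point on a stream edge; returns the augmented point.
        self.window.append(point[self.field])
        if len(self.window) > self.size:
            self.window.pop(0)
        point[self.as_name] = sum(self.window) / len(self.window)
        return point
```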
...Continued
● UDF Handler:
https://github.com/influxdata/kapacitor/tree/master/udf/agent/examples/moving_avg
● TICK Script
stream
|from()
.measurement('cpu')
.where(lambda: "cpu" == 'cpu-total')
@pyavg()
.field('usage_idle')
.size(10)
.as('cpu_avg')
|influxDBOut()
.database('udf')
[udf]
[udf.functions]
[udf.functions.pyavg]
prog = "/usr/bin/python2"
args = ["-u", "/etc/kapacitor/script/kapacitor/udf/agent/examples/moving_avg/moving_avg.py"]
timeout = "10s"
[udf.functions.pyavg.env]
PYTHONPATH = "/etc/kapacitor/script/kapacitor/udf/agent/py"
Enriching Your Data with Kapacitor
Problem: How do I summarize my data for the entire month of August just for business hours, defined as Monday through Friday between 8:00 AM and 5:00 PM?
● With InfluxQL we are limited to: SELECT * FROM "mymeasurement" WHERE time >= '2017-08-01 08:00:00.000000' AND time <= '2017-08-31 17:00:00.000000';
● Configure Telegraf to write to Kapacitor instead of directly to InfluxDB
[[outputs.influxdb]]
urls = ["http://localhost:9092"]
database = "kap_telegraf"
retention_policy = "autogen"
...Continued
stream
|from()
.database('kap_telegraf')
|eval(lambda: if(((weekday("time") >= 1 AND weekday("time") <= 5) AND (hour("time") >= 8 AND (hour("time")*100+minute("time")) <= 1700)),
'true', 'false'))
.as('business_hours')
.tags('business_hours')
.keep()
|delete()
.field('business_hours')
|influxDBOut()
.database('telegraf')
.retentionPolicy('autogen')
.tag('kapacitor_augmented','true')
Once data begins to flow through Kapacitor to InfluxDB, you can add the condition AND business_hours='true' to the first query we specified:
SELECT * FROM "mymeasurement" WHERE time >= '2017-08-01 08:00:00.000000' AND time <= '2017-08-31 17:00:00.000000' AND business_hours='true';
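The business-hours predicate inside the eval lambda can be checked in isolation. A Python sketch of the same logic (note: TICKscript's weekday() counts Sunday as 0, while Python's .weekday() counts Monday as 0, so the sketch converts):

```python
from datetime import datetime

def is_business_hours(ts: datetime) -> str:
    """Mirrors the eval lambda: Monday-Friday, 08:00 through 17:00."""
    kap_weekday = (ts.weekday() + 1) % 7     # Sunday=0 ... Saturday=6, as in TICKscript
    hhmm = ts.hour * 100 + ts.minute         # same hour*100+minute trick as the lambda
    in_hours = 1 <= kap_weekday <= 5 and ts.hour >= 8 and hhmm <= 1700
    return 'true' if in_hours else 'false'
```

Because the result is stored as a tag rather than recomputed per query, the enriched series can be filtered cheaply forever after.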
What can be further explored?
● Anomaly Detection Algorithms
○ (https://docs.influxdata.com/kapacitor/v1.4/guides/anomaly_detection/)
● Kapacitor Nodes
○ EC2 AutoScale
○ K8s AutoScale
○ Docker Swarm AutoScaling
○ Service Discovery in K8s
● Machine Learning using Kapacitor (Smart Alerting)