OSACon 2022
Switching Jaeger Distributed Tracing Storage
to ClickHouse for Advanced Performance
Monitoring (APM)
Satbir Chahal, OpsVerse
Open Source Analytics Conference 2022
Purpose
Why and how OpsVerse migrated Jaeger storage to ClickHouse
OSACon 2022
“Switching Jaeger Storage to ClickHouse to Enable APM Insights” Satbir Chahal, OpsVerse
What is Jaeger?
● A distributed tracing system created at Uber
in 2015 and later open-sourced, now part of CNCF
● Instrumented apps gain visibility into a
request’s path through the system by
propagating trace context between services
● Each operation along the way (e.g., a svc-to-svc
request) is known as a span; a collection of
spans composes a trace
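To make the span/trace relationship concrete, here is a minimal sketch in plain Python. It is not Jaeger's or OpenTelemetry's API: the `Span` class, the `start_span` helper, and the service and operation names are all hypothetical, chosen only to show how a propagated trace ID ties spans together.

```python
from dataclasses import dataclass
from typing import Optional
import uuid


@dataclass
class Span:
    trace_id: str             # shared by every span in the same request
    span_id: str              # unique per operation
    parent_id: Optional[str]  # None for the root span
    service: str
    operation: str


def start_span(service: str, operation: str, parent: Optional[Span] = None) -> Span:
    """Create a span; a child inherits the parent's trace ID (context propagation)."""
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    parent_id = parent.span_id if parent else None
    return Span(trace_id, uuid.uuid4().hex[:16], parent_id, service, operation)


# Hypothetical request: "frontend" calls "checkout", which calls "payments".
root = start_span("frontend", "GET /cart")
child = start_span("checkout", "POST /order", parent=root)
grandchild = start_span("payments", "charge", parent=child)

# The trace is every span that carries the same trace ID.
trace = [s for s in (root, child, grandchild) if s.trace_id == root.trace_id]
print(len(trace), "spans in trace", root.trace_id)
```

In a real deployment the trace ID and parent span ID travel between services in request headers, and the tracing SDK does this bookkeeping for you.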
Image courtesy of jaegertracing.io
Jaeger Architecture
● Jaeger has native support for
Elasticsearch and Cassandra as
storage
● Later, gRPC storage plugin support
was added (more on this soon)
● We leaned toward Cassandra because of the
operational overhead of
maintaining ES clusters (resources
and manual TTLs)
● Cassandra was okay… until we
wanted to do more
* Image courtesy of jaegertracing.io
** In our case, jaeger-clients are replaced by OpenTelemetry; an OpenTelemetry Collector is between the
Application and the Jaeger Collector.
Cassandra
Pros
● Scalable with near-zero maintenance
● Fast writes - Kafka consumer group
lags were also near-zero
● General advantages (not pertaining to
traces per se):
○ Fault tolerance
○ Replication
Cons
● No JOINs
○ Services index lives in one table
○ Operations index lives in another table
● Bulk querying load can be expensive
● Minimal aggregate functions
So advanced analytics on the data will require a
custom process
Why ClickHouse (Data)
● Great data compression and batch ingestion
● Some existing familiarity:
○ Our team was exposed to ClickHouse when we ran another open
source tool using it as a storage backend (PostHog)
○ Near SQL compatibility (for querying)
● Rich set of functions:
○ Aggregate (topK, stddevPop, etc.)
○ JSON, String, Array, Map, etc.
● Its weaknesses are tolerable for distributed tracing:
○ Data consistency isn’t guaranteed
○ Deletes/updates are slow (they run as batch operations)
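As an illustration of what those aggregate functions make possible, the sketch below builds a per-service latency summary query. The table name `jaeger_index_local` and the columns `service`, `operation`, `durationUs`, and `timestamp` are assumptions based on the jaeger-clickhouse plugin's default schema; check them against your own deployment. The helper `latency_summary_query` is hypothetical.

```python
def latency_summary_query(table: str = "jaeger_index_local",
                          hours: int = 1, top_n: int = 5) -> str:
    """Build a ClickHouse query that summarizes span latency per service."""
    if hours <= 0 or top_n <= 0:
        raise ValueError("hours and top_n must be positive")
    # Table and column names are assumed, not taken from the talk.
    return f"""
SELECT
    service,
    count() AS spans,
    quantile(0.95)(durationUs) AS p95_us,
    stddevPop(durationUs) AS stddev_us,
    topK({top_n})(operation) AS busiest_operations
FROM {table}
WHERE timestamp >= now() - INTERVAL {hours} HOUR
GROUP BY service
ORDER BY p95_us DESC
""".strip()


query = latency_summary_query(hours=6)
print(query)
```

With Cassandra, an equivalent summary would have to be computed in a separate process after pulling the rows out; here it is a single query that any ClickHouse client or dashboard can run.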
Why ClickHouse (Kubernetes)
Some key prerequisites met:
● Jaeger adds support for gRPC storage
plugins
○ Paves way for community to
support (virtually) any database
○ Plugin implements the storage
gRPC protobuf interfaces
● Community releases
https://github.com/jaegertracing/jaeger-clickhouse/ (mid 2021)
So… we need to run a ClickHouse instance (for the
plugin to connect to):
● The Kubernetes Operator for ClickHouse (by
Altinity) grows in popularity
https://github.com/Altinity/clickhouse-operator
● Plenty of community content (blogs, videos,
webinars, meetups) allows the team to run a PoC
quickly
● Becomes the preferred method of running
ClickHouse on Kubernetes
The Steps
● Swap out Bitnami Cassandra helm chart for the K8s
ClickHouse Operator (CHOP)
● Add a ClickHouseInstallation CR template
(for CHOP to reconcile) to stack deployment charts
● Make sure the Jaeger ClickHouse gRPC plugin binary
makes it onto the container images that connect to the
database:
○ Jaeger Ingester
○ Jaeger Query
● Update Jaeger helm chart to specify gRPC storage rather
than Cassandra
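The last two steps can be sketched as the container settings each Jaeger component needs. This assumes the Jaeger 1.x releases current at the time of the talk, where `SPAN_STORAGE_TYPE=grpc-plugin` and the `--grpc-storage-plugin.*` flags select a plugin binary; newer Jaeger versions may handle storage plugins differently. The file paths and the helper name are hypothetical, and how these values map onto your helm chart depends on the chart version.

```python
import json

# Hypothetical paths: wherever your image build places the plugin and its config.
PLUGIN_BINARY = "/plugin/jaeger-clickhouse"
PLUGIN_CONFIG = "/plugin-config/config.yaml"


def grpc_plugin_container_patch(component: str) -> dict:
    """Return env/args that point one Jaeger component at the gRPC storage plugin."""
    if component not in ("ingester", "query"):
        raise ValueError(f"unexpected component: {component}")
    return {
        "name": f"jaeger-{component}",
        # Storage type is selected through an environment variable.
        "env": [{"name": "SPAN_STORAGE_TYPE", "value": "grpc-plugin"}],
        "args": [
            f"--grpc-storage-plugin.binary={PLUGIN_BINARY}",
            f"--grpc-storage-plugin.configuration-file={PLUGIN_CONFIG}",
        ],
    }


patches = [grpc_plugin_container_patch(c) for c in ("ingester", "query")]
print(json.dumps(patches, indent=2))
```

Only the components that talk to the database need the binary, which is why the sketch rejects anything other than the ingester and query services.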
Some Gotchas (and Best Practices)
● Get familiar with the ClickHouseInstallation Custom Resource spec:
○ It creates a LoadBalancer Service (possibly public) by default (update CR spec
templates.serviceTemplates)
○ There is no “readonly” user by default (update CR spec configuration.users)
○ Likewise, review the default user (limit its network CIDRs), retention periods, disk space, and
resource requests/limits
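A minimal sketch of a ClickHouseInstallation that addresses the first three points, built as a Python dict and printed as JSON (which kubectl accepts alongside YAML). The resource name, cluster name, template name, and CIDR are hypothetical; the `readonly` profile is assumed to exist because it ships in ClickHouse's stock users.xml. Passwords, retention, storage, and resource limits are left out on purpose.

```python
import json


def clickhouse_installation(name: str, allowed_cidrs: list) -> dict:
    """Build a ClickHouseInstallation with a private Service and restricted users."""
    return {
        "apiVersion": "clickhouse.altinity.com/v1",
        "kind": "ClickHouseInstallation",
        "metadata": {"name": name},
        "spec": {
            "defaults": {"templates": {"serviceTemplate": "internal-only"}},
            "configuration": {
                "users": {
                    # Restrict where the default user may connect from.
                    "default/networks/ip": allowed_cidrs,
                    # Extra user bound to the read-only profile (password omitted here).
                    "readonly/profile": "readonly",
                    "readonly/networks/ip": allowed_cidrs,
                },
                "clusters": [
                    {"name": "traces", "layout": {"shardsCount": 1, "replicasCount": 1}}
                ],
            },
            "templates": {
                "serviceTemplates": [
                    {
                        "name": "internal-only",
                        # ClusterIP keeps the Service reachable only inside the cluster.
                        "spec": {
                            "type": "ClusterIP",
                            "ports": [
                                {"name": "http", "port": 8123},
                                {"name": "tcp", "port": 9000},
                            ],
                        },
                    }
                ]
            },
        },
    }


manifest = clickhouse_installation("jaeger", ["10.0.0.0/8"])
print(json.dumps(manifest, indent=2))
```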
Some Gotchas (and Best Practices) … cont’d
● Resource management conflicts with GitOps tools (e.g., ArgoCD)
○ ArgoCD may think it is managing a CHOP-managed resource (like a PVC)
○ This can leave the resource in an unintended Terminating state
○ Update ClickHouseInstallation template metadata annotations with:
■ argocd.argoproj.io/compare-options: IgnoreExtraneous
■ argocd.argoproj.io/sync-options: Prune=false
○ You can also configure CHOP to exclude from propagation the labels your specific tool depends on
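The annotation fix can be sketched as a small helper that merges the two ArgoCD annotations into a manifest before it is rendered. The helper name and the sample manifest are hypothetical; the annotation keys and values are the ones listed above.

```python
import copy

ARGOCD_ANNOTATIONS = {
    "argocd.argoproj.io/compare-options": "IgnoreExtraneous",
    "argocd.argoproj.io/sync-options": "Prune=false",
}


def with_argocd_annotations(manifest: dict) -> dict:
    """Return a copy of the manifest with the ArgoCD annotations merged in."""
    patched = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    annotations = patched.setdefault("metadata", {}).setdefault("annotations", {})
    annotations.update(ARGOCD_ANNOTATIONS)
    return patched


# Hypothetical minimal ClickHouseInstallation used only for the demonstration.
chi = {
    "apiVersion": "clickhouse.altinity.com/v1",
    "kind": "ClickHouseInstallation",
    "metadata": {"name": "jaeger"},
}
patched = with_argocd_annotations(chi)
print(patched["metadata"]["annotations"])
```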
Sleep Easy at Night
● Enable Backups
○ https://github.com/AlexAkulov/clickhouse-backup as a K8s CronJob
○ [In Progress] ClickHouseBackup as a CR:
https://github.com/Altinity/clickhouse-operator/issues/862
● Enable metrics and alerts:
○ /metrics endpoint is served by default (have Prometheus scrape it)
Begin Querying
● Add the ClickHouse instance as a
data source to your favorite data
visualization tool (Grafana, Apache
Superset, etc)
● Rich functions and parsers allow
for some query magic
● Use Materialized Columns (or
Views) to create additional columns
from existing data - for even faster
querying
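As a sketch of the materialized-column idea, the helper below builds DDL that pulls one span tag out of the tag arrays into its own column. The table `jaeger_index_local` and the nested `tags.key` / `tags.value` columns are assumptions based on the jaeger-clickhouse plugin's default schema, and the helper and column names are hypothetical; verify all of them against your deployment before running anything.

```python
def materialized_tag_column(table: str, column: str, tag_key: str) -> str:
    """Build DDL that extracts one span tag into its own materialized column."""
    if "'" in tag_key:
        raise ValueError("tag key must not contain a single quote")
    # indexOf returns the 1-based position of the key in tags.key (0 if absent),
    # which is then used to pick the matching entry from tags.value.
    return (
        f"ALTER TABLE {table} "
        f"ADD COLUMN IF NOT EXISTS {column} String "
        f"MATERIALIZED tags.value[indexOf(tags.key, '{tag_key}')]"
    )


ddl = materialized_tag_column("jaeger_index_local", "http_status", "http.status_code")
print(ddl)
```

Dashboards can then filter and group on the new column directly instead of repeating the array lookup in every query.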
OpsVerse APM Using Jaeger+ClickHouse
Q&A
Thank you!
@Satbir Chahal | @OpsVerse
Community Slack tiny.cc/slck
https://opsverse.io
