Data Warehouse on Kubernetes
A gentle introduction to the ClickHouse Kubernetes Operator
Robert Hodges
Brief Intros
www.altinity.com
Leading software and services provider for ClickHouse
Major committer and community sponsor in US and Western Europe
Robert Hodges - Altinity CEO
30+ years on DBMS plus virtualization and security.
ClickHouse is DBMS #20
Why run a data warehouse on Kubernetes?
1. Same environment as other cloud native services
2. Portability
3. Fast deployment cycles
4. Flexible mapping to resources
...
5. It offers revolutionary capabilities for building analytic systems
ClickHouse Data Warehouse
Introduction to ClickHouse
Understands SQL
Runs on bare metal to cloud
Shared nothing architecture
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is open source (Apache 2.0)
And it’s really fast!
ClickHouse structure is optimized for speed
[Diagram: a table is divided into parts; each part has an index and columns, and its data is indexed, sorted, and compressed.]
ClickHouse has built-in sharding & replication
[Diagram: six ClickHouse servers, each holding a local event_loc table; one of them also exposes the event table. A query of the form SELECT ... FROM event GROUP BY ... runs against event and returns a single result set. Three Zookeeper servers hold ZNodes.]
What makes ClickHouse “cloud friendly”?
● Single process
● Relatively few configuration knobs
● Simple networking and storage
● Replication/high availability built in
● Already containerized!
Demo Time
ClickHouse performance demo
Mapping databases to Kubernetes
Obligatory slide on Kubernetes
What does Kubernetes do for us?
● manage container-based systems
● build distributed applications declaratively
● allocate machine resources efficiently
● automate application deployment
A simple distributed data service
[Diagram: traffic enters a load balancer, which routes it to Service #1, Service #2, and Service #3; each service has its own storage.]
Defined using Kubernetes resources
[Diagram: a Service “svc” fronts Pods such as “svc-1” and “svc-2” that are managed by a Stateful Set. Each Pod binds a Persistent Volume through a Persistent Volume Claim and reads settings from Config Maps and Secrets.]
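The resources named on this slide can be sketched with stock Kubernetes objects. This is a minimal illustration, not a manifest from the talk: the image, port, mount path, ConfigMap name, and Secret name are hypothetical, and the Service is made headless here (an assumption) so that each pod gets a stable DNS name.

```yaml
# Headless Service: gives each StatefulSet pod a stable DNS name.
apiVersion: v1
kind: Service
metadata:
  name: svc
spec:
  clusterIP: None
  selector:
    app: svc
  ports:
    - name: http
      port: 8080                       # hypothetical port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: svc
spec:
  serviceName: svc
  replicas: 3
  selector:
    matchLabels:
      app: svc
  template:
    metadata:
      labels:
        app: svc
    spec:
      containers:
        - name: svc
          image: example.com/svc:1.0   # hypothetical image
          ports:
            - name: http
              containerPort: 8080
          envFrom:
            - configMapRef:
                name: svc-config       # hypothetical ConfigMap
            - secretRef:
                name: svc-secrets      # hypothetical Secret
          volumeMounts:
            - name: data
              mountPath: /var/lib/svc  # hypothetical data directory
  # One PersistentVolumeClaim is created per pod from this template.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
```

Kubernetes numbers StatefulSet pods from zero, so this sketch would produce svc-0, svc-1, and svc-2 rather than the names drawn on the slide.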
Mapped to proxies, containers, and storage
[Diagram: three Kubernetes Nodes. NGINX serves as the proxy for “svc”; containers “svc-1”, “svc-2”, and “svc-3” each use an NVMe SSD.]
ClickHouse Operator
ClickHouse on Kubernetes is complex!
[Diagram: Zookeeper Services front Zookeeper-0, Zookeeper-1, and Zookeeper-2. A Load Balancer Service fronts four ClickHouse replicas (Shard 1 Replica 1, Shard 1 Replica 2, Shard 2 Replica 1, Shard 2 Replica 2), each with its own Replica Service. Each replica is built from a Stateful Set, a Pod, a Persistent Volume Claim, a Persistent Volume, and a per-replica Config Map, alongside a User Config Map and a Common Config Map.]
Operators encapsulate complex deployments
[Diagram: the ClickHouse Operator runs in the kube-system namespace. It reads a ClickHouse resource definition (a single specification) and creates a best practice deployment in your-favorite namespace.]
Apache 2.0 source, distributed as Docker image
Installing and removing ClickHouse operator
Get operator custom resource definition:
wget https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install.yaml
Install the operator:
kubectl apply -f clickhouse-operator-install.yaml
Remove the operator:
kubectl delete -f clickhouse-operator-install.yaml
You will also need Zookeeper
Simplest way is to use helm:
kubectl create namespace zk
helm install --namespace zk --name zookeeper incubator/zookeeper
(There’s also an operator for Zookeeper now)
Deploying data warehouses
ClickHouse custom resource definition
defaults: Global defaults
configuration: Cluster topology, users, zookeeper locations, etc.
serviceTemplates: Network resources
podTemplates: Container definitions
volumeClaimTemplates: Storage definitions
[Diagram: the operator turns these sections into Stateful Sets, Services, Pods, and Persistent Volumes.]
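As a rough skeleton, the sections listed above nest under spec as follows. The nesting reflects the operator's resource format as I understand it; the resource name is hypothetical and the empty values are placeholders, so this is a map of the structure rather than something to apply as is.

```yaml
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "example"             # hypothetical name
spec:
  defaults:                   # global defaults
    templates: {}
  configuration:              # cluster topology, users, zookeeper locations
    zookeeper: {}
    users: {}
    clusters: []
  templates:
    serviceTemplates: []      # network resources
    podTemplates: []          # container definitions
    volumeClaimTemplates: []  # storage definitions
```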
Basic data warehouse topology
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch01"              # Name used to identify all resources
spec:
  configuration:
    clusters:               # Definition of cluster
      - name: replicated
        layout:
          shardsCount: 2
          replicasCount: 2
    zookeeper:              # Location of service we depend on
      nodes:
        - host: zookeeper.zk
You can add users and change configuration
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch01"
spec:
  configuration:
    users:
      demo/default: secret
      demo/password: demo
      demo/profile: default
      demo/quota: default
      demo/networks/ip: "::/0"
    clusters:
      - name: replicated
Changes take a few minutes to propagate
Simplicity requires defaults
defaults:
  templates:
    volumeClaimTemplate: persistent
    podTemplate: clickhouse:19.6
    serviceTemplate: minikube
templates:
  volumeClaimTemplates:
    - name: persistent      # Name of template
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
Storage misconfigurations lead to insidious errors
Speaking of storage, we have options
● Cloud storage (complex, network access):
  ○ AWS
  ○ GKE
  ○ Other cloud providers
● Local storage (simple, fast):
  ○ emptyDir
  ○ hostPath
  ○ local
Use storageClassName to bind storage
Use kubectl to find available storage classes:
kubectl describe StorageClass
Bind to default storage:
spec:
  storageClassName: default
Bind to gp2 type:
spec:
  storageClassName: gp2
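For context, here is where that spec fragment would sit inside a volume claim template. The sketch combines the gp2 binding with the "persistent" template used elsewhere in this deck; it assumes a storage class named gp2 exists in the cluster, which kubectl describe StorageClass would confirm.

```yaml
templates:
  volumeClaimTemplates:
    - name: persistent
      spec:
        storageClassName: gp2   # assumed to exist in the cluster
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
```

If storageClassName is left out, Kubernetes normally falls back to whichever class the cluster marks as its default, if one is configured.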
Templates can be simple
defaults:
  templates:
    volumeClaimTemplate: persistent
    podTemplate: clickhouse:19.6
    serviceTemplate: minikube
templates:
  podTemplates:
    - name: clickhouse:19.6   # Name of template
      spec:
        containers:
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.6.2.11
Most values take defaults
Or specify complex configuration
templates:
  podTemplates:
    - name: clickhouse-in-zone-us-east-1b
      spec:
        affinity:             # Set availability zone affinity
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: "failure-domain.beta.kubernetes.io/zone"
                      operator: In
                      values:
                        - "us-east-1b"
        containers:           # More container properties
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.3.7
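As one example of further container properties, a pod template could pin CPU and memory using the standard Kubernetes container fields. This is a sketch only: the template name and the resource figures are hypothetical, and it assumes the pod template accepts ordinary pod spec fields, as the affinity example suggests.

```yaml
templates:
  podTemplates:
    - name: clickhouse-with-resources   # hypothetical template name
      spec:
        containers:
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.3.7
            resources:
              requests:                 # illustrative values only
                cpu: "2"
                memory: 8Gi
              limits:
                cpu: "2"
                memory: 8Gi
```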
Versatile mapping to different deployments
[Diagram: one ClickHouse resource definition maps to either a Minikube deployment or a Multi-AZ deployment, each made of Pods behind load balancers. Differences are mostly in templates.]
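The deck refers to a service template by name but does not show one. A hedged sketch of what a cloud-oriented variant might look like follows; the template name is hypothetical, and it assumes the template's spec takes a standard Kubernetes Service spec. Ports 8123 and 9000 are ClickHouse's default HTTP and native protocol ports.

```yaml
defaults:
  templates:
    serviceTemplate: load-balancer   # hypothetical template name
templates:
  serviceTemplates:
    - name: load-balancer
      spec:
        type: LoadBalancer
        ports:
          - name: http
            port: 8123               # ClickHouse HTTP interface
          - name: client
            port: 9000               # ClickHouse native protocol
```

Swapping the template named under defaults is then the main change needed to move the same cluster definition between environments.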
Changes are recognized automatically
defaults:
  templates:
    volumeClaimTemplate: persistent
    podTemplate: clickhouse:19.11   # Make new version the default
    serviceTemplate: minikube
templates:
  podTemplates:
    - name: clickhouse:19.11        # Define template for new version
      spec:
        containers:
          - name: clickhouse-pod
            image: yandex/clickhouse-server:19.11.3.11
Upgrade runs while service is online
[Diagram: updating the ClickHouse resource definition and applying it triggers the ClickHouse Operator, which plans by comparing the resource to actual state and then upgrades the pods (chi-0-0, chi-0-1, chi-1-0, chi-1-1) sequentially.]
What’s going on inside Kubernetes?
[Diagram: kubectl apply submits the ClickHouse resource definition to the Kubernetes API, which keeps system state in Etcd. The ClickHouse Operator, a custom resource controller, receives events and issues actions through the API, alongside the native controllers.]
ClickHouse monitoring with Prometheus
[Diagram: Prometheus collects metrics for ClickHouse Installations through the ClickHouse Operator (ServiceMonitor); Grafana sits on top of Prometheus.]
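A ServiceMonitor is a custom resource from the Prometheus Operator that tells Prometheus which Services to scrape. The sketch below shows the general shape only; the resource name, the label selector, the namespace, and the port name are assumptions about how the operator's metrics Service is labeled, and should be checked against the installed Service.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: clickhouse-operator-metrics     # hypothetical name
  namespace: kube-system
spec:
  namespaceSelector:
    matchNames:
      - kube-system                     # where the operator is assumed to run
  selector:
    matchLabels:
      app: clickhouse-operator          # assumed label on the metrics Service
  endpoints:
    - port: clickhouse-operator-metrics # assumed port name on that Service
      interval: 30s
```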
Demo Time
Fast data warehouse deployment on Kubernetes
Some pros and cons of data warehouses in Kubernetes
Con: DNS resolution is complex/error prone
[Diagram: pods chi-0-0, chi-0-1, chi-1-0, and chi-1-1 resolve one another through the Core DNS server.]
Pod restart invalidates cluster DNS mappings
Name resolution deadlock at startup:
● Must resolve host name to start up
● Won’t resolve host until pod starts
Pro: Kubernetes overhead is minimal
[Charts: “Cluster deploy and load” and “Query Comparison”; Redshift dc2.large vs. Kubernetes EC2 r5.xlarge with EBS (st1)]
Con: error handling is complicated
[Diagram: ClickHouse resource definition, ClickHouse Operator, Kubernetes, and storage provider, annotated with three sources of difficulty: complex specification, asynchronous execution, and local semantics.]
Future of data warehouses on Kubernetes
Architectural challenge
Data warehouses are not cattle
Losing/compromising data can be really bad
Safety is paramount
Security, migration, availability require logic above level of the operator
Biggest opportunity
Kubernetes democratizes data warehouse access
Set up complex configurations in minutes
Run on any platform that Kubernetes runs on
Integrate easily with other services
Most intriguing future benefit of Kubernetes
[Diagram: Kafka, ClickHouse, and Grafana serving apps, content delivery, and other applications, with dashboards and predictive analytics.]
Tailored analytic solution for every service that needs it
Thank you!
We’re hiring!
Presenter:
rhodges@altinity.com
ClickHouse Operator:
https://github.com/Altinity/clickhouse-operator
ClickHouse:
https://github.com/yandex/ClickHouse
Altinity:
https://www.altinity.com
