Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Introduction to Presenter
www.altinity.com
Leading software and services
provider for ClickHouse
Major committer and community
sponsor in US and Western Europe
Robert Hodges - Altinity CEO
30+ years on DBMS plus
virtualization and security.
ClickHouse is DBMS #20
Goals of the talk
● Introduce scaling axes of ClickHouse clusters
● Dig into distributed clusters
○ Using shards to scale writes
○ Using replicas to scale reads
● Describe handy tricks as well as common performance bottlenecks
Non-Goals:
● Boost performance of single nodes (though that’s important, too)
● Teach advanced ClickHouse performance management
Introduction to ClickHouse
Understands SQL
Runs on bare metal to cloud
Stores data in columns
Parallel and vectorized execution
Scales to many petabytes
Is Open source (Apache 2.0)
Is WAY fast!
(Diagram: table data stored as separate columns a, b, c, d)
ClickHouse Cluster Model
ClickHouse nodes can scale vertically
Storage
CPU
RAM
Host
Clusters introduce horizontal scaling
(Diagram: hosts laid out along two axes, shards and replicas. Replicas help with concurrency; shards add IOPs.)
Different sharding and replication patterns
● All Sharded: data sharded 4 ways (Shard 1-4) without replication
● All Replicated: data replicated 4 times (Replica 1-4) without sharding
● Sharded and Replicated: data sharded 2 ways and replicated 2 times (Shard 1/Replica 1, Shard 1/Replica 2, Shard 2/Replica 1, Shard 2/Replica 2)
Clusters define sharding and replication layouts
/etc/clickhouse-server/config.d/remote_servers.xml:
<yandex>
<remote_servers>
<ShardedAndReplicated>
<shard>
<replica><host>10.0.0.71</host><port>9000</port></replica>
<replica><host>10.0.0.72</host><port>9000</port></replica>
<internal_replication>true</internal_replication>
</shard>
<shard>
. . .
</shard>
</ShardedAndReplicated>
</remote_servers>
</yandex>
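Once loaded, the layout can be verified from SQL via the system.clusters table (a sketch; the cluster name matches the config above):

```sql
-- Shards and replicas ClickHouse resolved from remote_servers.xml
SELECT cluster, shard_num, replica_num, host_name, port, is_local
FROM system.clusters
WHERE cluster = 'ShardedAndReplicated'
```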
How ClickHouse uses Zookeeper
(Diagram: an INSERT arrives at ClickHouse Node 1, which holds table ontime as parts in a ReplicatedMergeTree engine, on ports 9000/9009. The data replicates to ClickHouse Node 2, which holds the same table. Both nodes coordinate replication state through a three-node ZooKeeper ensemble, zookeeper-1/2/3, storing ZNodes on port 2181.)
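Replication health for the tables coordinated through ZooKeeper can be checked on each node (a sketch using the standard system.replicas table):

```sql
-- A large queue_size or few active_replicas points at replication lag
SELECT database, table, is_leader, total_replicas, active_replicas, queue_size
FROM system.replicas
WHERE table = 'ontime'
```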
Cluster Performance in Practice
Setting up airline dataset on ClickHouse
(Diagram: four nodes, clickhouse-0 through clickhouse-3. Each node holds three tables: ontime, a Distributed table with no data of its own; ontime_shard, a sharded, replicated table holding partial data; and airports, a fully replicated table holding all data.)
Define sharded, replicated fact table
CREATE TABLE IF NOT EXISTS `ontime_shard` ON CLUSTER '{cluster}' (
`Year` UInt16,
`Quarter` UInt8,
...
)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/{shard}/airline_shards/ontime_shard',
'{replica}')
PARTITION BY toYYYYMM(FlightDate)
ORDER BY (FlightDate, `Year`, `Month`, DepDel15)
Trick: Use macros to enable consistent DDL
/etc/clickhouse-server/config.d/macros.xml:
<yandex>
<macros>
<all>0</all>
<cluster>demo</cluster>
<shard>0</shard>
<replica>clickhouse-0</replica>
</macros>
</yandex>
Define a distributed table to query shards
CREATE TABLE IF NOT EXISTS ontime ON CLUSTER `{cluster}`
AS airline_shards.ontime_shard
ENGINE = Distributed(
'{cluster}', airline_shards, ontime_shard, rand())
Define a fully replicated dimension table
CREATE TABLE IF NOT EXISTS airports ON CLUSTER 'all-replicated' (
AirportID String,
Name String,
...
)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/{all}/airline_shards/airports',
'{replica}')
PARTITION BY tuple()
PRIMARY KEY AirportID
ORDER BY AirportID
Overview of insertion options
● Local vs distributed data insertion
○ Local – no need to sync, larger blocks, faster
○ Distributed – sharding by ClickHouse
○ CHProxy -- distributes transactions across nodes
● Asynchronous (default) vs synchronous insertions
○ insert_distributed_sync
○ insert_quorum, select_sequential_consistency – linearization at replica level
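As a sketch of the synchronous options named above (the column list and values are elided; insert_quorum = 2 requires acknowledgement from two replicas):

```sql
-- Block until the distributed insert is flushed to every shard,
-- and until two replicas have acknowledged each block
INSERT INTO ontime
SETTINGS insert_distributed_sync = 1, insert_quorum = 2
VALUES (...)
```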
Distributed vs local inserts in action
(Diagram: with inserts via the distributed table, a data pipeline writes to one ontime table, which caches and forwards rows to every shard's ontime_shard table; this is slower and more error prone. With direct inserts, a shard-aware pipeline writes straight into each shard's ontime_shard table.)
Testing cluster loading trade-offs
● With adequate I/O, RAM, and CPU, all load options perform about equally
● Direct loading is fastest for high-volume feeds
● Loading via a distributed table is most complex
○ Resource-inefficient
○ Can fail or lose data due to async insertion
○ May generate more parts
○ Requires careful monitoring
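When loading via a distributed table, the per-shard forwarding queue is worth watching (a sketch; the system.distribution_queue table exists in recent ClickHouse releases):

```sql
-- Pending and broken blocks waiting to be forwarded to shards
SELECT database, table, is_blocked, error_count, data_files, broken_data_files
FROM system.distribution_queue
```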
Selecting the sharding key
Random key, e.g., cityHash64(x) % 3: nodes are balanced and it is easier to add nodes, but every query must hit all shards.
Specific key, e.g., TenantId % 3: queries can skip shards, but nodes may be unbalanced and it is hard to add nodes.
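The sharding key is the last argument of the Distributed engine, so the two patterns differ by a single expression (a sketch based on the ontime example; TenantId is a hypothetical column):

```sql
-- Random key: balanced shards, but every query fans out to all of them
ENGINE = Distributed('{cluster}', airline_shards, ontime_shard, rand())

-- Specific key: queries that filter on TenantId can skip shards
ENGINE = Distributed('{cluster}', airline_shards, ontime_shard, TenantId)
```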
Bi-level sharding combines both approaches
(Diagram: a natural key, e.g., TenantId % 3, maps each tenant to one of three tenant groups, Tenant-Group-1 through Tenant-Group-3; within each group, rows spread across that group's shards by a hashed key such as cityHash64(x, TenantId % 3).)
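One way to sketch bi-level sharding is a Distributed table per tenant group (hypothetical cluster name; the inner key matches the expression above):

```sql
-- Hypothetical: cluster 'tenant_group_1' lists only that group's shards;
-- within the group, rows spread by the hashed key
ENGINE = Distributed('tenant_group_1', airline_shards, ontime_shard,
                     cityHash64(x, TenantId % 3))
```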
Implement any sharding scheme via macros
/etc/clickhouse-server/config.d/macros.xml:
<yandex>
<macros>
<all>0</all>
<cluster>demo</cluster>
<group>2</group>
<shard>0</shard>
<replica>clickhouse-0</replica>
</macros>
</yandex>
CREATE TABLE IF NOT EXISTS `ontime_shard` ON CLUSTER '{cluster}' (
. . .)
Engine=ReplicatedMergeTree(
'/clickhouse/{cluster}/tables/{group}/{shard}/airline_shards/ontime_shard',
'{replica}')
Adding nodes and rebalancing data
● To add servers:
○ Configure and bring up ClickHouse
○ Add schema
○ Add server to cluster definitions in remote_servers.xml, propagate to other servers
● Random sharding schemes allow easier addition of shards
○ Common pattern for time series--allow data to rebalance naturally over time
○ Use replication to propagate dimension tables
● Keyed partitioning schemes do not rebalance automatically
○ You can move parts manually using ALTER TABLE FREEZE PARTITION / rsync / ALTER TABLE ATTACH PARTITION
○ Other methods work too
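The manual part-move sequence can be sketched as follows (hypothetical partition ID; the copy between hosts is an out-of-band rsync of the frozen parts into the target's detached directory):

```sql
-- On the source shard: snapshot the partition into the shadow/ directory
ALTER TABLE airline_shards.ontime_shard FREEZE PARTITION 201907;
-- ...rsync the frozen parts to the target server's detached/ directory...
-- On the target shard: attach the copied parts
ALTER TABLE airline_shards.ontime_shard ATTACH PARTITION 201907;
```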
How do distributed queries work?
(Diagram: the application queries the distributed ontime table on an initiator node. The innermost subselect is distributed to every shard's ontime_shard table; WHERE filters and heavy grouping work should move to that innermost level. Each shard computes AggregateStates locally, and the aggregates are merged on the initiator node.)
Read performance using distributed tables
● Best case performance is linear
with number of nodes
● For fast queries network latency
may dominate parallelization
ClickHouse pushes down JOINs by default
SELECT o.Dest d, a.Name n, count(*) c, avg(o.ArrDelayMinutes) ad
FROM airline_shards_4.ontime o
JOIN airline_shards_4.airports a ON (a.IATA = o.Dest)
GROUP BY d, n HAVING c > 100000 ORDER BY d DESC
LIMIT 10
SELECT Dest AS d, Name AS n, count() AS c, avg(ArrDelayMinutes) AS ad
FROM airline_shards_4.ontime_shard AS o
ALL INNER JOIN airline_shards_4.airports AS a ON a.IATA = o.Dest
GROUP BY d, n HAVING c > 100000 ORDER BY d DESC LIMIT 10
...Unless the left side is a subquery
SELECT d, Name n, c AS flights, ad
FROM
(
SELECT Dest d, count(*) c, avg(ArrDelayMinutes) ad
FROM airline_shards_4.ontime
GROUP BY d HAVING c > 100000
ORDER BY ad DESC
)
LEFT JOIN airports ON airports.IATA = d
LIMIT 10
(The inner subquery is still distributed to the remote servers; the LEFT JOIN then runs on the initiator node.)
Distributed product modes affect joins
● ‘Normal’ IN/JOIN – run subquery locally on every server
○ Many nodes – many queries, expensive for distributed
● GLOBAL IN/JOIN - run subquery on initiator, and pass results to every server
● Distributed_product_mode alters “normal” IN/JOIN behavior :
○ deny (default)
○ allow – run queries in ‘normal’ mode, distributed subquery runs on every server, if GLOBAL
keyword is not used
○ local – use local tables for subqueries
○ global – automatically rewrite queries to ‘GLOBAL’ mode
Examples of IN operator processing
select foo from T1 where a in (select a from T2)
1) Subquery runs on a local table
select foo from T1_local
where a in (select a from T2_local)
2) Subquery runs on every node
select foo from T1_local
where a in (select a from T2)
3) Subquery runs on initiator node
create temporary table tmp Engine = Set AS select a from T2
select foo from T1_local where a in tmp;
Distributed query limitations and advice
● If the joined table is missing on a remote server, pushdown will fail
● Releases prior to 20.1 do not push down row-level security predicates
● Fully qualify table names to avoid syntax errors
● Distributed join behavior still somewhat limited
Settings to control distributed query
● distributed_product_mode -- How to handle joins of 2 distributed tables
● enable_optimize_predicate_expression -- Push down predicates
● max_replica_delay_for_distributed_queries -- Maximum permitted delay on replicas
● load_balancing -- Load balancing algorithm to choose replicas
● prefer_localhost_replica -- Whether to try the local replica first for queries
● optimize_skip_unused_shards -- One of several settings to avoid shards if possible
(Plus others…)
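These are ordinary query-level settings, e.g. (a sketch; TenantId is a hypothetical sharding key column):

```sql
SELECT count()
FROM ontime
WHERE TenantId = 42
SETTINGS optimize_skip_unused_shards = 1, prefer_localhost_replica = 1
```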
Advanced Topics
Capacity planning for clusters
(Based on CloudFlare approach, see Resources slide below)
1. Test capacity on single nodes first
a. Understand contention between INSERTs, background merges, and SELECTs
b. Understand single node scaling issues (e.g., mark cache sizes)
2. If you can support your design ceiling with a single shard, stop here
a. Ensure you have HA covered, though
3. Build the cluster
4. Test full capacity on the cluster
a. Add shards to handle INSERTs
b. Add replicas to handle SELECTs
Debugging slow node problems
Distributed queries are only as fast as the slowest node
Use the remote() table function to test performance across the cluster. Example:
SELECT sum(Cancelled) AS cancelled_flights
FROM remote('clickhouse-0', airline_shards_4.ontime_shard)
SELECT sum(Cancelled) AS cancelled_flights
FROM remote('clickhouse-1', airline_shards_4.ontime_shard)
. . .
A non-exhaustive list of things that go wrong
Zookeeper becomes a bottleneck (avoid excessive numbers of parts)
Choosing a bad partition key
Degraded systems
Insufficient monitoring
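Excessive part counts, the usual way ZooKeeper becomes a bottleneck, can be spotted with a query like (a sketch):

```sql
-- Tables with the most active parts; thousands per table is a warning sign
SELECT database, table, count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY parts DESC
LIMIT 10
```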
Wrap-up and Further Resources
Key takeaways
● Shards add read/write capacity over a dataset (IOPs)
● Replicas enable more concurrent reads
● Choose sharding keys and clustering patterns with care
● Insert directly to shards for best performance
● Distributed query behavior is more complex than plain MergeTree queries
● It’s a big distributed system. Plan for things to go wrong
Well-managed clusters are extremely fast! Check your setup if you are not getting
good performance.
Resources
● Altinity Blog
● Secrets of ClickHouse Query Performance -- Altinity Webinar
● ClickHouse Capacity Planning for OLAP Workloads by Mik Kocikowski of
CloudFlare
● ClickHouse Telegram Channel
● ClickHouse Slack Channel
Thank you!
Special Offer:
Contact us for a
1-hour consultation!
Contacts:
info@altinity.com
Visit us at:
https://www.altinity.com
Free Consultation:
https://blog.altinity.com/offer