ClickHouse
Enabling interactive data exploration
Alexander Kuzmenkov, ClickHouse developer at Yandex
What is ClickHouse?
● RDBMS for analytics
○ SQL
○ FOSS
● Distributed
○ Cross-datacenter replication
○ Tolerant to a single-datacenter failure
● Linear scaling
○ 100s of servers
○ 100G-1000G records daily
● Blazing fast
○ interactive data exploration
These slides have a lot of links!
Download at https://clck.ru/MMWHA
History
● Yandex.Metrica (think Google Analytics)
○ 30B rows daily, 3PB total, 700 machines
○ Processing speed up to 2TB/s
● Used a MyISAM solution with pre-aggregation
○ Didn’t scale for the growing load
○ Could only build pre-defined reports
● Started developing its own columnar DB in 2009
● Enables “double real time” reporting
○ Add new events in real time
○ Build custom reports in real time
● Becomes the main engine in 2012
● Open-sourced in 2016
Adoption
● Yandex-wide
○ business analytics
● Thousands of companies worldwide
○ analyzing 1M DNS queries per second (Cloudflare)
○ geospatial processing (Carto)
○ real-time ad analytics platform (LifeStreet)
○ storage performance analysis and monitoring (Infinidat)
○ AdTech, FinTech, sensors, logs, event streams, time series, etc.
Unusual applications:
● Blockchain transaction history
○ blockchair.com
○ bloxy.info
● CERN LHCb experiment
● Bioinformatics
When to use
● stream of well-structured, immutable events
○ (mostly) fixed schema
○ append-only
○ heavyweight ALTER UPDATE/DELETE as a “GDPR escape hatch”
● flexible real-time reporting
○ queries finish in seconds, not hours
○ no preprocessing needed
○ enables ad-hoc experiments
When NOT to use
● OLTP
○ no transactions
○ a single INSERT is atomic, but there is no cross-query atomicity
○ UPDATEs are very heavyweight
● key/value storage
○ sparse indexes
○ not suitable for point reads
● document/blob storage
○ optimized for records < 100 kB
● highly normalized data
○ optimized for star schema
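Because the primary index is sparse, it holds one mark per granule of rows rather than one entry per row, so even a single-key lookup has to read a whole granule. A minimal Python sketch of that idea follows. The granule size of 8192 matches ClickHouse's default index_granularity setting; the list-based layout and the function names are hypothetical and far simpler than the real MergeTree format.

```python
from bisect import bisect_right
from typing import List, Tuple

# Rows per index mark; mirrors ClickHouse's default index_granularity.
GRANULE = 8192


def build_sparse_index(sorted_keys: List[int], granule: int = GRANULE) -> List[int]:
    """Keep only the first key of each granule: one mark per block of rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]


def point_lookup(sorted_keys: List[int], marks: List[int], key: int,
                 granule: int = GRANULE) -> Tuple[bool, int]:
    """Locate the one granule that may hold `key`, then scan it row by row.

    Returns (found, rows_scanned) to make the cost of a point read visible.
    """
    # The last mark <= key identifies the candidate granule.
    g = max(bisect_right(marks, key) - 1, 0)
    start = g * granule
    block = sorted_keys[start:start + granule]
    return key in block, len(block)


keys = list(range(0, 100_000, 2))  # 50,000 sorted, even keys
marks = build_sparse_index(keys)
print(len(marks))                       # 7
print(point_lookup(keys, marks, 4242))  # (True, 8192)
```

Seven marks cover 50,000 rows, so the index stays tiny, but every lookup pays for scanning up to 8192 rows. That trade-off favors range scans over point reads.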
How fast?
Query 1   Query 2   Query 3   Query 4   Setup
0.005     0.01      0.10      0.188     BrytlytDB 2.1 & 5-node IBM Minsky cluster
0.051     0.15      0.05      0.794     kdb+/q & 4 Intel Xeon Phi 7210 CPUs
0.241     0.83      1.21      1.781     ClickHouse, 3 x c5d.9xlarge cluster
0.762     2.47      4.13      6.041     BrytlytDB 1.0 & 2-node p2.16xlarge cluster
1.034     3.06      5.35      12.748    ClickHouse, Intel Core i5 4670K
1.56      1.25      2.25      2.97      Redshift, 6-node ds2.8xlarge cluster
2         2.00      1.00      3         BigQuery
2.362     3.56      4.02      20.412    Spark 2.4 & 21 x m3.xlarge HDFS cluster
14.389    32.15     33.45     67.312    Vertica, Intel Core i5 4670K
1.1 Billion Taxi Rides benchmark: 1.1G records, 51 columns, 500 GB of uncompressed CSV. Times are in seconds; lower is better.
More on performance at our site
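The two single-machine rows (ClickHouse and Vertica, both on an Intel Core i5 4670K) allow a like-for-like comparison. The snippet below simply divides the timings from the benchmark table; it assumes the figures are seconds and adds no data beyond what the table shows.

```python
# Timings for the two single-machine setups in the benchmark table
# (Intel Core i5 4670K), assumed to be seconds.
clickhouse_i5 = [1.034, 3.06, 5.35, 12.748]
vertica_i5 = [14.389, 32.15, 33.45, 67.312]

# Per-query speedup of ClickHouse over Vertica on the same hardware.
speedups = [v / c for v, c in zip(vertica_i5, clickhouse_i5)]
for n, s in enumerate(speedups, start=1):
    print(f"Query {n}: {s:.1f}x")
```

On these four queries the ratio ranges from roughly 5x (Query 4) to roughly 14x (Query 1).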
Why so fast?
Read less data
● Data locality
○ columnar storage
○ sorted by PK
● Compression
Process data faster
● Parallelism
○ multithreading
○ distributed queries
● Efficient computation
○ >40 different GROUP BY algorithms
○ vectorized query execution with SIMD
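Columnar storage and sorting by the primary key work together: once a column is stored contiguously and in order, equal values sit next to each other and compress very well. The sketch below demonstrates this on a synthetic, made-up column, using Python's zlib as a stand-in compressor. ClickHouse itself uses LZ4 by default, with ZSTD and specialized codecs available, so the exact sizes here say nothing about ClickHouse's own ratios.

```python
import random
import zlib

random.seed(42)  # deterministic synthetic data
# One column of 100,000 rows with 100 distinct values, one byte per value.
column = [random.randrange(100) for _ in range(100_000)]

unsorted_bytes = bytes(column)
sorted_bytes = bytes(sorted(column))  # same values, as if the table were sorted by this key

unsorted_size = len(zlib.compress(unsorted_bytes, 6))
sorted_size = len(zlib.compress(sorted_bytes, 6))
print(f"raw {len(column)} B, unsorted {unsorted_size} B, sorted {sorted_size} B")
```

The unsorted column barely shrinks because its values look random to the compressor, while the sorted one collapses into about a hundred long runs.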
Deployment
● Repos for major Linux distros
● Docker containers
● One self-contained binary
○ works everywhere
● ZooKeeper if you need replication
● Test it on your laptop
○ runs with minimal resources
● Stable release every two weeks
○ Thousands of scenarios tested for each change
● No data migration on update
○ Just run a new server
Data ingestion
● INSERTs
● Many data formats
○ CSV, JSON, Parquet, CapnProto, ORC, ...
● Batching for optimal performance
○ Buffer table engine
○ Kafka table engine
○ Third-party solutions: chproxy, kittenhouse, etc.
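ClickHouse favors fewer, larger INSERTs over many tiny ones. The Buffer and Kafka table engines and proxies such as chproxy or kittenhouse handle this on the server or proxy side; the sketch below shows the same idea on the client side. RowBatcher and its send callback are hypothetical names: in practice send would issue one multi-row INSERT through a driver or the HTTP interface.

```python
from typing import Callable, List, Sequence


class RowBatcher:
    """Accumulate rows and hand them to `send` in batches of `max_rows`.

    Hypothetical helper: `send` stands in for a single INSERT of many rows.
    """

    def __init__(self, send: Callable[[List[Sequence]], None], max_rows: int = 100_000):
        self.send = send
        self.max_rows = max_rows
        self.rows: List[Sequence] = []

    def add(self, row: Sequence) -> None:
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered as one batch, then start a fresh buffer.
        if self.rows:
            self.send(self.rows)
            self.rows = []


batches = []
b = RowBatcher(batches.append, max_rows=1000)
for i in range(2500):
    b.add((i, f"event-{i}"))
b.flush()  # push out the final, partial batch
print([len(x) for x in batches])  # [1000, 1000, 500]
```

A production batcher would also flush on a timer and handle failed sends; both are omitted to keep the sketch short.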
Analyzing data
● A rich SQL dialect
○ strong typing
○ higher-order functions like arrayMap
○ a variety of aggregate functions: quantiles, cardinality estimators, etc.
○ sampling
○ Nested type for key/value records
○ LowCardinality type for dictionary encoding
● BI tools support
○ Tableau
○ Apache Superset
○ Holistics
○ others via ODBC/SQLAlchemy/...
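LowCardinality applies dictionary encoding: a column with few distinct values is stored as a dictionary of those values plus small integer positions into it. The sketch below shows only the encoding idea in plain Python. It is not ClickHouse's on-disk or in-memory layout, and dict_encode/dict_decode are hypothetical names.

```python
from typing import Dict, List, Tuple


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, positions) with dictionary[positions[i]] == values[i]."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    positions: List[int] = []
    for v in values:
        if v not in index:
            # First time we see this value: give it the next dictionary slot.
            index[v] = len(dictionary)
            dictionary.append(v)
        positions.append(index[v])
    return dictionary, positions


def dict_decode(dictionary: List[str], positions: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and positions."""
    return [dictionary[p] for p in positions]


browsers = ["Chrome", "Firefox", "Chrome", "Safari", "Chrome", "Firefox"]
dictionary, positions = dict_encode(browsers)
print(dictionary)  # ['Chrome', 'Firefox', 'Safari']
print(positions)   # [0, 1, 0, 2, 0, 1]
```

With positions stored as small integers instead of repeated strings, a filter such as browser = 'Chrome' can in principle be answered by one dictionary lookup followed by integer comparisons.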
clickhouse-local
$ clickhouse local \
    --file ~/hits_v1.tsv \
    --structure 'WatchID UInt64, JavaEnable UInt8, ...' \
    --query 'SELECT UserID, SearchPhrase, count() FROM table GROUP BY UserID, SearchPhrase'
Read 8873898 rows, 7.88 GiB in 5.208 sec., 1704038 rows/sec., 1.51 GiB/sec.
UserID SearchPhrase count()
8410854169855355129 пальные кость играть терхи 3
The full power of the ClickHouse engine over a data file.
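The query in the clickhouse-local example groups the file by (UserID, SearchPhrase) and counts the rows in each group. For comparison, a row-at-a-time Python equivalent is sketched below. It assumes a headerless, tab-separated file in which UserID and SearchPhrase sit at hypothetical column positions user_col and phrase_col; this per-row loop is exactly the kind of work ClickHouse's vectorized engine is built to avoid.

```python
from collections import Counter
from typing import Iterable


def count_by_user_and_phrase(lines: Iterable[str], user_col: int, phrase_col: int) -> Counter:
    """Count rows per (UserID, SearchPhrase), like GROUP BY UserID, SearchPhrase.

    Assumes plain tab-separated lines without a header; ClickHouse's TSV escape
    sequences (such as a backslash-escaped tab inside a string) are not decoded here.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        counts[(int(fields[user_col]), fields[phrase_col])] += 1
    return counts


sample = ["1\tfoo\tx\n", "1\tfoo\ty\n", "2\tbar\tz\n"]
counts = count_by_user_and_phrase(sample, user_col=0, phrase_col=1)
print(counts.most_common())
```

A file object opened in text mode can be passed directly as lines, since iterating over it yields one line at a time.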
Interfaces
Connect to ClickHouse
● Native binary protocol
○ Drivers for Python, Go, C++, ...
● RESTful HTTP
● ODBC
● JDBC
● MySQL wire protocol
● PostgreSQL
○ clickhouse_fdw
○ pg2ch (logical replication)
Connect from ClickHouse
● File
● HDFS
● URL
● MySQL
● ODBC
● External dictionaries
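The HTTP interface is the easiest one to try from any language: by default the server listens on port 8123 and accepts the SQL text in the query URL parameter. The sketch below builds such a URL with Python's standard library and, optionally, sends it. The host name is an assumption, and run_query needs a running server.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def clickhouse_http_url(sql: str, host: str = "localhost", port: int = 8123) -> str:
    """Build a GET URL for ClickHouse's HTTP interface (default port 8123)."""
    # quote (not quote_plus) so spaces become %20 rather than '+'.
    return f"http://{host}:{port}/?{urlencode({'query': sql}, quote_via=quote)}"


def run_query(sql: str, host: str = "localhost", port: int = 8123) -> str:
    """Send the query and return the raw response body (TabSeparated by default)."""
    with urlopen(clickhouse_http_url(sql, host, port)) as resp:
        return resp.read().decode("utf-8")
```

For example, run_query("SELECT version()") would return the server version as text. ClickHouse treats GET requests as read-only, so statements that write data, such as INSERT, are sent with POST instead.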
Resources
● Check the docs at our site
● View the talks at our YouTube channel
● Create issues on GitHub
● Ask on Stack Overflow
● Email us at clickhouse-feedback@yandex-team.ru
● Join English and Russian chats in Telegram
● Get commercial support from Altinity and others
● … and more
Thank you!
Questions?
● https://clickhouse.tech
● https://github.com/ClickHouse/ClickHouse
● clickhouse-feedback@yandex-team.ru
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration with ClickHouse