T R E A S U R E D A T A
USER DEFINED PARTITIONING
A New Partitioning Strategy accelerating CDP Workload
Kai Sasaki
Software Engineer in Treasure Data
ABOUT ME
- Kai Sasaki (@Lewuathe)
- Software Engineer in Treasure Data since 2015
Working in the Query Engine Team (managing Hive and Presto in Treasure Data)
- Contributor of Hadoop, Spark, Presto
TOPICS
PlazmaDB
PlazmaDB is the metadata storage for all log data in
Treasure Data. It supports import, export, INSERT
INTO, CREATE TABLE, DELETE, etc. on top of the
PostgreSQL transaction mechanism.
Time Index Partitioning
Partitions log data by the time the log was generated,
which is stored in the “time” column in Treasure Data.
This lets the query engine skip reading unnecessary
partitions.
User Defined Partitioning
(New!)
In addition to the “time” column, any column can be
used as a partitioning key. This provides a more flexible
partitioning strategy that fits the CDP workload.
OVERVIEW OF QUERY ENGINE IN TD
PRESTO IN TREASURE DATA
• Multiple clusters, each with 50-60 workers
• Presto 0.188
Stats
• 4.3+ million queries / month
• 400 trillion records / month
• 6+ PB / month
At the end of 2017
HIVE AND PRESTO ON PLAZMADB
Bulk Import
Fluentd
Mobile SDK
PlazmaDB
Presto
Hive
SQL, CDP
Amazon S3
PLAZMADB
PlazmaDB
Amazon S3
id data_set_id first_index_key last_index_key record_count path
P1 3065124 187250 1412323028 1412385139 109 abcdefg-1234567-abcdefg-1234567
P2 3065125 187250 1412323030 1412324030 209 abcdefg-1234567-abcdefg-9182841
P3 3065126 187250 1412327028 1412328028 31 abcdefg-1234567-abcdefg-5818231
P4 3065127 187250 1412325011 1412326001 102 abcdefg-1234567-abcdefg-7271828
P5 3065128 281254 1412324214 1412325210 987 abcdefg-1234567-abcdefg-6717284
P6 3065129 281254 1412325123 1412329800 541 abcdefg-1234567-abcdefg-5717274
Multi Column Indexes
s3://plazma-partitions/…
1-hour partitioning
PLAZMADB
PlazmaDB
Amazon S3
Realtime Storage
Amazon S3
Archive Storage
MapReduce
Keeps 1-hour partitioning periodically.
Time-Indexed Partitioning
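The 1-hour time-index pruning above can be sketched as follows. This is a minimal illustration, not PlazmaDB's actual code; the partition metadata shape mirrors the table on the previous slide (first_index_key / last_index_key hold the min/max "time" values a partition contains).

```python
# Sketch of time-index partition pruning (illustrative, not PlazmaDB code).
# A query with a time predicate only needs to read partitions whose
# [first_index_key, last_index_key] range overlaps the predicate range.

partitions = [
    {"id": "P1", "first_index_key": 1412323028, "last_index_key": 1412385139},
    {"id": "P2", "first_index_key": 1412323030, "last_index_key": 1412324030},
    {"id": "P3", "first_index_key": 1412327028, "last_index_key": 1412328028},
]

def prune_by_time(partitions, t_from, t_to):
    """Keep only partitions whose time range overlaps [t_from, t_to]."""
    return [p for p in partitions
            if p["first_index_key"] <= t_to and p["last_index_key"] >= t_from]

selected = prune_by_time(partitions, 1412327000, 1412328000)
# Only P1 (wide range) and P3 (overlapping hour) are read; P2 is skipped.
```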
PROBLEM
• Time index partitioning is efficient only when a “time” value is specified.
Filtering on other columns causes a full scan, which can make performance worse.
• The number of records in a partition depends heavily on the table type and usage.
SELECT
COUNT(1)
FROM table
WHERE
user_id = 1;
id data_set_id first_index_key last_index_key record_count path
P1 3065124 100 1412323028 1412385139 1 abcdefg-1234567-abcdefg-1234567
P2 3065125 100 1412323030 1412324030 1 abcdefg-1234567-abcdefg-9182841
P3 3065126 100 1412327028 1412328028 1 abcdefg-1234567-abcdefg-5818231
P4 3065127 200 1412325011 1412326001 101021 abcdefg-1234567-abcdefg-7271828
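Why the query above cannot prune: the partition index only covers “time”, so a predicate on user_id matches every partition. A sketch, reusing the metadata shape from the table above:

```python
# Illustrative sketch: only a "time" predicate can narrow the scan.
partitions = [
    {"id": "P1", "first_index_key": 1412323028, "last_index_key": 1412385139},
    {"id": "P2", "first_index_key": 1412323030, "last_index_key": 1412324030},
    {"id": "P3", "first_index_key": 1412327028, "last_index_key": 1412328028},
    {"id": "P4", "first_index_key": 1412325011, "last_index_key": 1412326001},
]

def prune(partitions, time_range=None):
    if time_range is None:   # e.g. WHERE user_id = 1 -- no time predicate
        return partitions    # full scan: every partition must be read
    t_from, t_to = time_range
    return [p for p in partitions
            if p["first_index_key"] <= t_to and p["last_index_key"] >= t_from]

assert len(prune(partitions)) == 4                             # full scan
assert len(prune(partitions, (1412327000, 1412328000))) == 2   # time prunes
```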
USER DEFINED PARTITIONING
USER DEFINED PARTITIONING
• Users can specify a partitioning strategy that fits their usage, using a partitioning key column and a max time range.

[Diagram: each 1-hour time partition is further split by the values v1, v2, v3 of partitioning column c1]
USER DEFINED PARTITIONING
• Users can specify a partitioning strategy that fits their usage, using a partitioning key column and a max time range.

[Diagram: a query … WHERE c1 = ‘v1’ AND time = … scans only the v1 bucket within the matching 1-hour time partitions]
USER DEFINED PARTITIONING
CREATE TABLE via Presto or Hive
Insert data partitioned by the configured partitioning key

Set user defined configuration
The number of buckets, the hash function, and the partitioning key

Read the data from the UDP table
The UDP table is now visible via Presto and Hive
USER DEFINED CONFIGURATION
• We need to configure the columns used as the partitioning key and the number of partitions.
This is a custom configuration set by each user.

user_table_id columns bucket_count partition_function
T1 141849 [["o_orderkey","long"]] 32 hash
T2 141850 [["user_id","long"]] 32 hash
T3 141910 [["item_id","long"]] 16 hash
T4 151242 [["region_id","long"],["device_id","long"]] 256 hash
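The bucket assignment implied by this configuration can be sketched as below. The hash function here is hypothetical (MD5-based, as a stand-in for the configured "hash" partition_function; the actual function used in PlazmaDB is not specified in these slides).

```python
import hashlib

def bucket_for(value, bucket_count):
    """Map a partitioning-key value to a bucket number via a stable hash.
    Hypothetical stand-in for the configured 'hash' partition_function."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count

# With bucket_count = 32 (as for the user_id table above), every record
# with the same user_id deterministically lands in the same bucket.
b1 = bucket_for(12345, 32)
b2 = bucket_for(12345, 32)
assert b1 == b2
assert 0 <= b1 < 32
```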
CREATE UDP TABLE VIA PRESTO
• Presto and Hive support CREATE TABLE / INSERT INTO on UDP tables
CREATE TABLE udp_customer
WITH (
  bucketed_on = array['customer_id'],
  bucket_count = 128
)
AS SELECT * FROM normal_customer;
CREATE UDP TABLE VIA PRESTO
• Override ConnectorPageSink to write MPC1 files based on the user-defined partitioning key.
[Diagram: PlazmaPageSink wraps a PartitionedMPCWriter, which routes each incoming Page to a TimeRangeMPCWriter per bucket (b1, b2, b3, …), each backed by a BufferedMPCWriter per 1-hour range]
CREATE UDP TABLE VIA PRESTO
id data_set_id first_index_key last_index_key record_count path bucket_number
P1 3065124 187250 1412323028 1412385139 109 abcdefg-1234567-abcdefg-1234567 1
P2 3065125 187250 1412323030 1412324030 209 abcdefg-1234567-abcdefg-9182841 2
P3 3065126 187250 1412327028 1412328028 31 abcdefg-1234567-abcdefg-5818231 3
P4 3065127 187250 1412325011 1412326001 102 abcdefg-1234567-abcdefg-7271828 2
P5 3065128 281254 1412324214 1412325210 987 abcdefg-1234567-abcdefg-6717284 16
P6 3065129 281254 1412325123 1412329800 541 abcdefg-1234567-abcdefg-5717274 14
• A new bucket_number column is added to the partition record in PlazmaDB.
READ DATA FROM UDP TABLE
Override the Presto Connector for the data source
Presto provides a plugin mechanism to connect to any data source flexibly. The connector provides information about the metadata and location of the real data source, and UDFs.

Receive the constraint as a TupleDomain
The TupleDomain is created from the query plan and passed through the TableLayout, which is available in ConnectorSplitManager.

Decide the target bucket from the constraint
The constraint specifies the range that should be read from the table. ConnectorSplitManager asks PlazmaDB for the partitions in the target bucket.

ConnectorSplitManager#getSplits
Returns the data source splits to be read by the Presto cluster.
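The read path above can be sketched as: given an equality constraint on the partitioning key, derive the target bucket and keep only matching partitions. This is an illustrative sketch, not the actual connector code; `bucket_for` is a hypothetical stand-in for the configured hash function.

```python
import hashlib

def bucket_for(value, bucket_count):
    # Hypothetical stand-in for the configured hash partition_function.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count

def get_splits(partitions, constraint_value, bucket_count):
    """Mimics ConnectorSplitManager#getSplits: prune to the partitions in
    the bucket derived from the constraint on the partitioning key."""
    target = bucket_for(constraint_value, bucket_count)
    return [p for p in partitions if p["bucket_number"] == target]

# 64 partitions spread evenly over 16 buckets: pruning keeps only 1/16.
partitions = [{"id": i, "bucket_number": i % 16} for i in range(64)]
splits = get_splits(partitions, "customer_42", 16)
assert len(splits) == 4
```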
READ DATA FROM UDP TABLE
SplitManager
PlazmaDB
TableLayout
SQL
constraint
Map<ColumnHandle, Domain>
Distribute PageSource
… WHERE bucket_number IN (…) …
PERFORMANCE
PERFORMANCE COMPARISON
SQLs on TPC-H (scale factor = 1000), elapsed time (sec):

query          NORMAL   UDP
count1_filter  266.71   87.279
groupby        69.374   36.569
hashjoin       19.478   1.04
COLOCATED JOIN
[Diagram: in a Distributed Join, rows of left and right are shuffled across workers by the join key; in a Colocated Join, matching bucket pairs (l1-r1, l2-r2, l3-r3) already live together, so each pair joins locally without a shuffle]
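The benefit of the colocated join can be sketched as follows: when both tables are bucketed identically on the join key, only rows in the same bucket can match, so each bucket pair joins independently with no redistribution. An illustrative sketch under that assumption (not Presto's implementation):

```python
def bucket_for(key, bucket_count):
    # Stand-in for the shared hash function both tables were bucketed with.
    return hash(key) % bucket_count

def colocated_join(left, right, bucket_count):
    """Join two tables bucketed identically on the join key: route rows to
    their buckets, then join each bucket pair locally (no shuffle)."""
    buckets = [([], []) for _ in range(bucket_count)]
    for row in left:
        buckets[bucket_for(row["key"], bucket_count)][0].append(row)
    for row in right:
        buckets[bucket_for(row["key"], bucket_count)][1].append(row)
    out = []
    for lrows, rrows in buckets:  # each pair could run on its own worker
        out.extend((l["key"], l["lv"], r["rv"])
                   for l in lrows for r in rrows if l["key"] == r["key"])
    return out

left = [{"key": k, "lv": k * 10} for k in range(6)]
right = [{"key": k, "rv": k * 100} for k in range(6)]
result = sorted(colocated_join(left, right, 4))
assert result == [(k, k * 10, k * 100) for k in range(6)]
```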
PERFORMANCE COMPARISON
SQLs on TPC-H (scale factor = 1000), elapsed time (sec):

[Bar chart: between, mod_predicate, count_distinct — NORMAL vs UDP, 0-80 sec]
USER DEFINED PARTITIONING
[Diagram: a query with only a time predicate (… WHERE time = …) still prunes by the 1-hour time partitions across all buckets v1, v2, v3 of c1]
FUTURE WORKS
• Maintaining an efficient partitioning structure
• Developing a Stella job to rearrange the partitioning schema flexibly using Presto resources
• Various kinds of pipelines (streaming import, etc.) should support UDP tables
• Documentation
T R E A S U R E D A T A
