Presto in Treasure Data (Updated)
Mitsunori Komatsu
About me
• Mitsunori Komatsu, software engineer @ Treasure Data
• Presto, Hive, PlazmaDB, td-android-sdk, td-ios-sdk, Mobile SDK backend, embedded-sdk
• github: komamitsu, msgpack-java committer, Presto contributor, etc…
Today’s talk
• What's Presto?
• Pros & Cons
• Architecture
• Recent updates
• Who uses Presto?
• How do we use Presto?
What’s Presto?
Fast
• Distributed SQL query engine (MPP)
• Low latency and good performance
• No intermediate disk IO
• Pipelined execution (not MapReduce)
• Compiles query plans down to bytecode
• Off-heap memory
• Suitable for ad-hoc queries
Pluggable
• Pluggable backends (“connectors”)
• Cassandra / Hive / JMX / Kafka / MySQL / PostgreSQL / System / TPCH
• We can add a new connector by extending the SPI
• Treasure Data has developed a connector to access our own storage
What kind of SQL?
• Supports ANSI SQL (not HiveQL)
• Easier to use than HiveQL
• Structural types: Map, Array, JSON, Row
• Window functions
• Approximate queries (see the sketch below)
• http://blinkdb.org/
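As a minimal sketch of the last three features, assuming a hypothetical access_logs table with method, user_id and latency columns (latency is not in the original deck):

select
  method,
  approx_distinct(user_id) as approx_users,         -- approximate count(distinct user_id)
  approx_percentile(latency, 0.95) as p95_latency,  -- approximate 95th percentile
  rank() over (order by count(1) desc) as rnk       -- window function over the aggregation
from access_logs
group by method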
Limitations
• Fails with huge JOIN or DISTINCT
• In memory only (broadcast / distributed JOIN)
• No grace / hybrid hash join
• No fault tolerance
• Coordinator is SPOF
• No “cost based” optimization
• No authentication / authorization
• No native ODBC => Prestogres
https://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
Architectural overview
https://prestodb.io/overview.html
With Hive connector
Query plan
Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
  - _col1 := count
Exchange[GATHER] => nationkey:bigint, count:bigint
Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
  - count := "count"("count_15")
Exchange[REPARTITION] => nationkey:bigint, count_15:bigint
Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint]
  - count_15 := "count"("expr")
Project => [nationkey:bigint, expr:bigint]
  - expr := 1
InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint]
Project => [custkey:bigint]
Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar]
TableScan[tpch:tpch:orders:sf0.01, original constraint=('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
  - custkey := tpch:custkey:1
  - orderpriority := tpch:orderpriority:5
Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint
TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
  - custkey_0 := tpch:custkey:0
  - nationkey := tpch:nationkey:3
select
  c.nationkey,
  count(1)
from orders o
join customer c
  on o.custkey = c.custkey
where
  o.orderpriority = '1-URGENT'
group by c.nationkey
Stage 0
Stage 1
Stage 2
Stage 3
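A plan like the one above can be reproduced from the CLI: EXPLAIN prints the logical plan, and EXPLAIN (TYPE DISTRIBUTED) shows how it is split into stages at the Exchange boundaries:

explain (type distributed)
select
  c.nationkey,
  count(1)
from orders o
join customer c
  on o.custkey = c.custkey
where
  o.orderpriority = '1-URGENT'
group by c.nationkey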
Query, stage, task and split
Query
Task 0.0
Split
Task 1.0
Split
Task 1.1 Task 1.2
Split Split Split
Task 2.0
Split
Task 2.1 Task 2.2
Split Split Split Split Split Split Split
Split
For example…
TableScan
(FROM)
Aggregation
(GROUP BY)
Output
@worker#2 @worker#3 @worker#0
What does each component do?
Presto CLI
Coordinator
- Parse Query
- Analyze Query
- Create Query Plan
- Execute Query
- Contains Stages
- Execute Stages
- Contains Tasks
- Issue Tasks
Discovery Service
Worker
Worker
- Execute Tasks
- Convert Query to Java
Bytecode (Operator)
- Execute Operator
Connector
- MetaData
- Table, Column…
- SplitManager
- Split, …
Connector
- RecordSetProvider
- RecordSet
- RecordCursor
- Read Storage
Connector
Storage
Worker
Connector
External Metadata?
Multi connectors
Presto
MySQL
- test.users
PostgreSQL
- public.accesses
- public.codes
Raptor
- default.user_codes
create table raptor.default.user_codes
as
select c.text, u.name, count(1) as count
from postgres.public.accesses a
join mysql.test.users u
on cast(a.user as bigint) = u.id
join postgres.public.codes c
on a.code = c.code
where a.time < 1200000000
connector.name=mysql
connection-url=jdbc:mysql://127.0.0.1:3306
connection-user=root
- etc/catalog/mysql.properties
connector.name=postgresql
connection-url=jdbc:postgresql://127.0.0.1:5432/postgres
connection-user=komamitsu
- etc/catalog/postgres.properties
connector.name=raptor
metadata.db.type=h2
metadata.db.filename=var/data/db/MetaStore
- etc/catalog/raptor.properties
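With these three catalogs registered, a quick sanity check from the CLI (schema names as assumed above):

show catalogs;                    -- mysql, postgres, raptor, system, …
show schemas from mysql;          -- test, information_schema, …
show tables from postgres.public; -- accesses, codes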
Multi connectors
:
2015-08-30T22:29:28.882+0900 DEBUG 20150830_132930_00032_btyir.4.0-0-50 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT `id`, `name` FROM `test`.`users`
:
2015-08-30T22:29:28.856+0900 DEBUG 20150830_132930_00032_btyir.5.0-0-56 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "text", "code" FROM "public"."codes"
:
2015-08-30T22:30:09.294+0900 DEBUG 20150830_132930_00032_btyir.3.0-2-70 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "user", "code", "time" FROM "public"."accesses" WHERE (("time" < 1200000000))
:
- log message
Condition pushdown
Join pushdown isn’t supported
Recent updates (0.108~0.117)
New functions:
normalize(), from_iso8601_timestamp(),
from_iso8601_date(), to_iso8601()
slice(), md5(), array_min(), array_max(), histogram()
element_at(), url_encode(), url_decode()
sha1(), sha256(), sha512()
multimap_agg(), checksum()
Teradata compatibility functions:
index(), char2hexint(), to_char(),
to_date(), to_timestamp()
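A few of the new functions in action (a minimal sketch; expected results are noted in the comments):

select
  element_at(array[10, 20, 30], 2) as second_element,  -- 20
  array_max(array[1, 5, 3]) as max_value,               -- 5
  url_encode('key=a b') as encoded,                     -- percent-encodes '=' and the space
  to_iso8601(from_iso8601_date('2015-08-30')) as d      -- '2015-08-30'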
Recent updates (0.108~0.117)
Cluster Resource Management.
- query.max-memory
- query.max-memory-per-node
- resources.reserved-system-memory
“Big Query” option was removed
Semi-joins are hash-partitioned if distributed_join is turned on.
Add support for partial cast from JSON.
For example, json can be cast to array<json>, map<varchar, json>, etc.
Use JSON_PARSE() and JSON_FORMAT() instead of CAST
Add query_max_run_time session property and query.max-run-time
config.
Queries fail after the specified duration.
optimizer.optimize-hash-generation and distributed-joins-enabled are
both enabled by default now.
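Two of these items as one-liners: a partial cast that types the outer structure while keeping the elements as JSON, and the new session property (the 10m value is just an example):

select cast(json_parse('[[1, 23], 456]') as array<json>);

set session query_max_run_time = '10m';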
Who uses Presto?
• Facebook
http://www.slideshare.net/dain1/presto-meetup-2015
• Dropbox
• Airbnb
As a service…
• Qubole (SaaS)
• Treasure Data (SaaS)
• Teradata (commercial support)
Today’s talk
• What's Presto?
• How do we use Presto?
• What’s Treasure Data?
• System architecture
• How we manage Presto
What’s Treasure Data?
and more…
• Collect, store and analyze data
• With multi-tenancy
• With a schema-less structure
• On the cloud, but mitigates outages
Founded in 2011 in the U.S.
Over 85 employees
Time to Value
Acquire → Store → Analyze → Result Push (send query result)
Acquire (Collect! Connectivity):
Web Log, App Log, Sensor, CRM, ERP, RDBMS, POS
Treasure Agent (Server), SDK (JS, Android, iOS, Unity), Streaming Collector, Bulk Uploader (Embulk, TD Toolbelt)
Store (Store! Economy & Flexibility):
Plazma DB: flexible, scalable, columnar storage, @AWS or @IDCF
Analyze (Analyze! Simple & Supported):
SQL-based query (SQL, Pig), Batch / Reliability, Ad-hoc / Low latency, REST API, ODBC / JDBC
Result Push:
KPI Dashboard (Metric Insights), BI Tools (Tableau, Motion Board etc.), Other Products (RDBMS, Google Docs, AWS S3, FTP Server, etc.)
Integrations?
Treasure Data in figures
Some of our customers
System architecture
Components in Treasure Data
api server: authentication / authorization
worker queue (MySQL)
td worker process: retries failed queries if needed
plazmadb (PostgreSQL + S3/RiakCS): columnar file format, schema-less
Presto coordinator + Presto workers, with the td-presto connector
result bucket (S3/RiakCS)
select user_id, count(1) from …
Schema on read
time code method user_id
2015-06-01 10:07:11 200 GET
2015-06-01 10:10:12 “200” GET
2015-06-01 10:10:20 200 GET
2015-06-01 10:11:30 200 POST
2015-06-01 10:20:45 200 GET
2015-06-01 10:33:50 400 GET 206
2015-06-01 10:40:11 200 GET 852
2015-06-01 10:51:32 200 PUT 1223
2015-06-01 10:58:02 200 GET 5118
2015-06-01 11:02:11 404 GET 12
2015-06-01 11:14:27 200 GET 3447
access_logs table
The user added a new column “user_id” to the imported data.
The user can select this column just by adding it to the schema (without rebuilding the table).
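A minimal sketch of what this means at query time, using the access_logs table above: rows imported before user_id was added stay readable and simply yield NULL for the new column.

select time, code, user_id
from access_logs
where user_id is not null  -- older rows have NULL user_id; no table rebuild needed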
Columnar file format
(same access_logs table as above)
Each column (time, code, method, user_id) is stored as its own columnar file.
This query accesses only the code column:

select code,
count(1) from tbl
group by code
td-presto connector
• MessagePack v0.7
• off heap
• avoiding “TypeProfile”
• Async IO with Jetty-client
• Scheduling & Resource
management
How we manage Presto
• Blue-Green Deployment
• Stress test tool
• Monitoring with DataDog
Blue-Green Deployment
worker queue (MySQL)
api server
td worker process
plazmadb (PostgreSQL + S3/RiakCS)
result bucket (S3)
select user_id, count(1) from …
Two Presto clusters, each with its own coordinator and workers: production and rc
Blue-Green Deployment
(same components as above)
rc: Test, Test, Test!
Blue-Green Deployment
(same components as above)
Release the stable rc cluster as production!
No downtime
Stress test tool
• Collects queries that have ever caused issues.
• A new query can be added just by adding an entry like this.
• Issues the query, gets the result, and computes a digest of it automatically.
• We can send all the queries, including very heavy ones (around 6000 stages), to Presto.
- job_id: 28889999
- result: 227d16d801a9a43148c2b7149ce4657c
- job_id: 28889999
Monitoring with DataDog
Presto coordinator and workers
td-agent runs alongside each Presto process:
in_presto_metrics polls /v1/jmx/mbean, /v1/query and /v1/node
out_metricsense forwards the metrics to DataDog
Monitoring with DataDog
Query stalled time
- The most important metric for us.
- It triggers alert calls to us…
- It is mainly increased by td-presto connector problems, most of them race condition issues.
How many queries processed?
More than 20,000 queries / day
Most queries finish within 1 min
Analytics Infrastructure, Simplified in the Cloud
We’re hiring! https://jobs.lever.co/treasure-data

More Related Content

What's hot (20)

PDF
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
PDF
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
Databricks
 
PDF
Secondary Index Search in InnoDB
MIJIN AN
 
PDF
Stream processing with Apache Flink (Timo Walther - Ververica)
KafkaZone
 
PDF
Monkey-patching in Python: a magic trick or a powerful tool?
Elizaveta Shashkova
 
PDF
Hive tuning
Michael Zhang
 
PPTX
Presto overview
Shixiong Zhu
 
PDF
Introduction to Apache Calcite
Jordan Halterman
 
PDF
Etsy Activity Feeds Architecture
Dan McKinley
 
PDF
Container Performance Analysis
Brendan Gregg
 
PDF
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
Databricks
 
PDF
Hadoop Strata Talk - Uber, your hadoop has arrived
Vinoth Chandar
 
PDF
Data Monitoring with whylogs
Alexey Grigorev
 
PPTX
Split Miner: Discovering Accurate and Simple Business Process Models from Eve...
Marlon Dumas
 
PDF
Sparklens: Understanding the Scalability Limits of Spark Applications with R...
Databricks
 
PPTX
Data oriented design and c++
Mike Acton
 
PDF
The Graph Database Universe: Neo4j Overview
Neo4j
 
PDF
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
Databricks
 
PPTX
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Monica Beckwith
 
PDF
RocksDB Performance and Reliability Practices
Yoshinori Matsunobu
 
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
Databricks
 
Secondary Index Search in InnoDB
MIJIN AN
 
Stream processing with Apache Flink (Timo Walther - Ververica)
KafkaZone
 
Monkey-patching in Python: a magic trick or a powerful tool?
Elizaveta Shashkova
 
Hive tuning
Michael Zhang
 
Presto overview
Shixiong Zhu
 
Introduction to Apache Calcite
Jordan Halterman
 
Etsy Activity Feeds Architecture
Dan McKinley
 
Container Performance Analysis
Brendan Gregg
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
Databricks
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Vinoth Chandar
 
Data Monitoring with whylogs
Alexey Grigorev
 
Split Miner: Discovering Accurate and Simple Business Process Models from Eve...
Marlon Dumas
 
Sparklens: Understanding the Scalability Limits of Spark Applications with R...
Databricks
 
Data oriented design and c++
Mike Acton
 
The Graph Database Universe: Neo4j Overview
Neo4j
 
ACID ORC, Iceberg, and Delta Lake—An Overview of Table Formats for Large Scal...
Databricks
 
Garbage First Garbage Collector (G1 GC) - Migration to, Expectations and Adva...
Monica Beckwith
 
RocksDB Performance and Reliability Practices
Yoshinori Matsunobu
 

Viewers also liked (20)

PPSX
Bq sushi(BigQuery lessons learned)
(shibao)芝尾 (kouichiro)幸一郎
 
PPTX
Tuning Tips
Jun Shimizu
 
PPT
SQLチューニング勉強会資料
Shinnosuke Akita
 
PPT
障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~
Shinnosuke Akita
 
PDF
[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...
Insight Technology, Inc.
 
PDF
[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya Morita
Insight Technology, Inc.
 
PDF
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
Michitoshi Yoshida
 
PDF
Presto at Hadoop Summit 2016
kbajda
 
PDF
SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦
CO-Sol for Community
 
PDF
監査ログをもっと身近に!〜統合監査のすすめ〜
Michitoshi Yoshida
 
PPT
35歳でDBAになった私がデータベースを壊して学んだこと
Shinnosuke Akita
 
PDF
おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦
CO-Sol for Community
 
PDF
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2
Ryota Watabe
 
PDF
Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能
Ryota Watabe
 
PPTX
進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜
Michitoshi Yoshida
 
PDF
Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男
CO-Sol for Community
 
PDF
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い
Ryota Watabe
 
PDF
Study: The Future of VR, AR and Self-Driving Cars
LinkedIn
 
PDF
UX, ethnography and possibilities: for Libraries, Museums and Archives
Ned Potter
 
PDF
Designing Teams for Emerging Challenges
Aaron Irizarry
 
Bq sushi(BigQuery lessons learned)
(shibao)芝尾 (kouichiro)幸一郎
 
Tuning Tips
Jun Shimizu
 
SQLチューニング勉強会資料
Shinnosuke Akita
 
障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~
Shinnosuke Akita
 
[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...
Insight Technology, Inc.
 
[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya Morita
Insight Technology, Inc.
 
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
Michitoshi Yoshida
 
Presto at Hadoop Summit 2016
kbajda
 
SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦
CO-Sol for Community
 
監査ログをもっと身近に!〜統合監査のすすめ〜
Michitoshi Yoshida
 
35歳でDBAになった私がデータベースを壊して学んだこと
Shinnosuke Akita
 
おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦
CO-Sol for Community
 
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2
Ryota Watabe
 
Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能
Ryota Watabe
 
進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜
Michitoshi Yoshida
 
Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男
CO-Sol for Community
 
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い
Ryota Watabe
 
Study: The Future of VR, AR and Self-Driving Cars
LinkedIn
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
Ned Potter
 
Designing Teams for Emerging Challenges
Aaron Irizarry
 
Ad

Similar to Presto in Treasure Data (presented at db tech showcase Sapporo 2015) (20)

PDF
Presto in Treasure Data
Mitsunori Komatsu
 
PDF
Introduction to Presto at Treasure Data
Taro L. Saito
 
PDF
Write Faster SQL with Trino.pdf
Eric Xiao
 
PDF
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Data Con LA
 
PDF
Facebook Presto presentation
Cyanny LIANG
 
PDF
Database & Technology 1 _ Tom Kyte _ SQL Techniques.pdf
InSync2011
 
PPTX
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Gruter
 
PPTX
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Sudhir Mallem
 
PDF
Dynamodb
Jean-Tiare LE BIGOT
 
PPTX
Hibernate collection & association mapping
Sharafat Ibn Mollah Mosharraf
 
PDF
Apache Big Data EU 2015 - Phoenix
Nick Dimiduk
 
PPTX
The Evolution of a Relational Database Layer over HBase
DataWorks Summit
 
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
PDF
SQL on Hadoop in Taiwan
Treasure Data, Inc.
 
PDF
SQL for Everything at CWT2014
N Masahiro
 
PPTX
High Performance Applications with MongoDB
MongoDB
 
PDF
Presto Fast SQL on Anything
Alluxio, Inc.
 
PDF
Cassandra Day NYC - Cassandra anti patterns
Christopher Batey
 
PPTX
Querying NoSQL with SQL - MIGANG - July 2017
Matthew Groves
 
KEY
Writeable ct es_pgcon_may_2011
David Fetter
 
Presto in Treasure Data
Mitsunori Komatsu
 
Introduction to Presto at Treasure Data
Taro L. Saito
 
Write Faster SQL with Trino.pdf
Eric Xiao
 
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
Data Con LA
 
Facebook Presto presentation
Cyanny LIANG
 
Database & Technology 1 _ Tom Kyte _ SQL Techniques.pdf
InSync2011
 
Big Data Camp LA 2014 - Apache Tajo: A Big Data Warehouse System on Hadoop
Gruter
 
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Sudhir Mallem
 
Hibernate collection & association mapping
Sharafat Ibn Mollah Mosharraf
 
Apache Big Data EU 2015 - Phoenix
Nick Dimiduk
 
The Evolution of a Relational Database Layer over HBase
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
SQL on Hadoop in Taiwan
Treasure Data, Inc.
 
SQL for Everything at CWT2014
N Masahiro
 
High Performance Applications with MongoDB
MongoDB
 
Presto Fast SQL on Anything
Alluxio, Inc.
 
Cassandra Day NYC - Cassandra anti patterns
Christopher Batey
 
Querying NoSQL with SQL - MIGANG - July 2017
Matthew Groves
 
Writeable ct es_pgcon_may_2011
David Fetter
 
Ad

Recently uploaded (20)

PDF
SAP Firmaya İade ABAB Kodları - ABAB ile yazılmıl hazır kod örneği
Salih Küçük
 
PDF
4K Video Downloader Plus Pro Crack for MacOS New Download 2025
bashirkhan333g
 
PPTX
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
PPTX
Build a Custom Agent for Agentic Testing.pptx
klpathrudu
 
PDF
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
PDF
IDM Crack with Internet Download Manager 6.42 Build 43 with Patch Latest 2025
bashirkhan333g
 
PPTX
Help for Correlations in IBM SPSS Statistics.pptx
Version 1 Analytics
 
PDF
Add Background Images to Charts in IBM SPSS Statistics Version 31.pdf
Version 1 Analytics
 
PPTX
AEM User Group: India Chapter Kickoff Meeting
jennaf3
 
PDF
Technical-Careers-Roadmap-in-Software-Market.pdf
Hussein Ali
 
PDF
Open Chain Q2 Steering Committee Meeting - 2025-06-25
Shane Coughlan
 
PPTX
Home Care Tools: Benefits, features and more
Third Rock Techkno
 
PDF
SciPy 2025 - Packaging a Scientific Python Project
Henry Schreiner
 
PDF
Salesforce Experience Cloud Consultant.pdf
VALiNTRY360
 
PDF
UITP Summit Meep Pitch may 2025 MaaS Rebooted
campoamor1
 
PPTX
Finding Your License Details in IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
PDF
Download Canva Pro 2025 PC Crack Full Latest Version
bashirkhan333g
 
PPTX
ChiSquare Procedure in IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
PPTX
iaas vs paas vs saas :choosing your cloud strategy
CloudlayaTechnology
 
PPTX
OpenChain @ OSS NA - In From the Cold: Open Source as Part of Mainstream Soft...
Shane Coughlan
 
SAP Firmaya İade ABAB Kodları - ABAB ile yazılmıl hazır kod örneği
Salih Küçük
 
4K Video Downloader Plus Pro Crack for MacOS New Download 2025
bashirkhan333g
 
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
Build a Custom Agent for Agentic Testing.pptx
klpathrudu
 
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
IDM Crack with Internet Download Manager 6.42 Build 43 with Patch Latest 2025
bashirkhan333g
 
Help for Correlations in IBM SPSS Statistics.pptx
Version 1 Analytics
 
Add Background Images to Charts in IBM SPSS Statistics Version 31.pdf
Version 1 Analytics
 
AEM User Group: India Chapter Kickoff Meeting
jennaf3
 
Technical-Careers-Roadmap-in-Software-Market.pdf
Hussein Ali
 
Open Chain Q2 Steering Committee Meeting - 2025-06-25
Shane Coughlan
 
Home Care Tools: Benefits, features and more
Third Rock Techkno
 
SciPy 2025 - Packaging a Scientific Python Project
Henry Schreiner
 
Salesforce Experience Cloud Consultant.pdf
VALiNTRY360
 
UITP Summit Meep Pitch may 2025 MaaS Rebooted
campoamor1
 
Finding Your License Details in IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
Download Canva Pro 2025 PC Crack Full Latest Version
bashirkhan333g
 
ChiSquare Procedure in IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
iaas vs paas vs saas :choosing your cloud strategy
CloudlayaTechnology
 
OpenChain @ OSS NA - In From the Cold: Open Source as Part of Mainstream Soft...
Shane Coughlan
 

Presto in Treasure Data (presented at db tech showcase Sapporo 2015)

  • 1. Presto in Treasure Data (Updated) Mitsunori Komatsu
  • 2. • Mitsunori Komatsu, 
 Software engineer @ Treasure Data. • Presto, Hive, PlazmaDB, td-android-sdk, 
 td-ios-sdk, Mobile SDK backend,
 embedded-sdk • github:komamitsu,
 msgpack-java committer, Presto contributor, etc… About me
  • 3. Today’s talk • What's Presto? • Pros & Cons • Architecture • Recent updates • Who uses Presto? • How do we use Presto?
  • 5. Fast • Distributed SQL query engine (MPP) • Low latency and good performance • No disk IO • Pipelined execution (not Map Reduce) • Compile a query plan down to byte code • Off heap memory • Suitable for ad-hoc query
  • 6. Pluggable • Pluggable backends (“connectors”) • Cassandra / Hive / JMX / Kafka / MySQL / PostgreSQL / System / TPCH • We can add a new connector by 
 extending SPI • Treasure Data has been developed a connector to access our storage
  • 7. What kind of SQL • Supports ANSI SQL (Not HiveQL) • Easy to use Presto compared to HiveQL • Structural type: Map, Array, JSON, Row • Window functions • Approximate queries • http://blinkdb.org/
  • 8. Limitations • Fails with huge JOIN or DISTINCT • In memory only (broadcast / distributed JOIN) • No grace / hybrid hash join • No fault tolerance • Coordinator is SPOF • No “cost based” optimization • No authentication / authorization • No native ODBC => Prestogres
  • 9. Limitations • Fails with huge JOIN or DISTINCT • In memory only (broadcast / distributed JOIN) • No grace / hybrid hash join • No fault tolerance • Coordinator is SPOF • No “cost based” optimization • No authentication / authorization • No native ODBC => Prestogres https://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
  • 11. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey
  • 12. Stage 1 Stage 2 Stage 0 Query, stage, task and split Query Task 0.0 Split Task 1.0 Split Task 1.1 Task 1.2 Split Split Split Task 2.0 Split Task 2.1 Task 2.2 Split Split Split Split Split Split Split Split For example… TableScan (FROM) Aggregation (GROUP BY) Output @worker#2 @worker#3 @worker#0
  • 13. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey
  • 14. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3
  • 15. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2
  • 16. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2 Stage 1
  • 17. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2 Stage 1 Stage 0
  • 18. What each component does? Presto Cli Coordinator - Parse Query - Analyze Query - Create Query Plan - Execute Query - Contains Stages - Execute Stages - Contains Tasks - Issue Tasks Discovery Service Worker Worker - Execute Tasks - Convert Query to Java Bytecode (Operator) - Execute Operator Connector - MetaData - Table, Column… - SplitManager - Split, … Connector - RecordSetProvider - RecordSet - RecordCursor - Read Storage Connector Storage Worker Connector External
 Metadata?
  • 19. Presto Cli Coordinator - Parse Query - Analyze Query - Create Query Plan - Execute Query - Contains Stages - Execute Stages - Contains Tasks - Issue Tasks Discovery Service Worker Worker - Execute Tasks - Convert Query to Java Bytecode (Operator) - Execute Operator Connector - MetaData - Table, Column… - SplitManager - Split, … Connector - RecordSetProvider - RecordSet - RecordCursor - Read Storage Connector Storage Worker Connector External
 Metadata? What each component does?
  • 20. Presto Cli Coordinator - Parse Query - Analyze Query - Create Query Plan - Execute Query - Contains Stages - Execute Stages - Contains Tasks - Issue Tasks Discovery Service Worker Worker - Execute Tasks - Convert Query to Java Bytecode (Operator) - Execute Operator Connector - MetaData - Table, Column… - SplitManager - Split, … Connector - RecordSetProvider - RecordSet - RecordCursor - Read Storage Connector Storage Worker Connector External
 Metadata? What each component does?
  • 21. Presto Cli Coordinator - Parse Query - Analyze Query - Create Query Plan - Execute Query - Contains Stages - Execute Stages - Contains Tasks - Issue Tasks Discovery Service Worker Worker - Execute Tasks - Convert Query to Java Bytecode (Operator) - Execute Operator Connector - MetaData - Table, Column… - SplitManager - Split, … Connector - RecordSetProvider - RecordSet - RecordCursor - Read Storage Connector Storage Worker Connector External
 Metadata? What each component does?
  • 22. Presto Cli Coordinator - Parse Query - Analyze Query - Create Query Plan - Execute Query - Contains Stages - Execute Stages - Contains Tasks - Issue Tasks Discovery Service Worker Worker - Execute Tasks - Convert Query to Java Bytecode (Operator) - Execute Operator Connector - MetaData - Table, Column… - SplitManager - Split, … Connector - RecordSetProvider - RecordSet - RecordCursor - Read Storage Connector Storage Worker Connector External
 Metadata? What each component does?
  • 23. Multi connectors Presto MySQL - test.users PostgreSQL - public.accesses - public.codes Raptor - default.
 user_codes create table raptor.default.user_codes as select c.text, u.name, count(1) as count from postgres.public.accesses a join mysql.test.users u on cast(a.user as bigint) = u.id join postgres.public.codes c on a.code = c.code where a.time < 1200000000
  • 25. : 2015-08-30T22:29:28.882+0900 DEBUG 20150830_132930_00032_btyir. 4.0-0-50 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT `id`, `name` FROM `test`.`users` : 2015-08-30T22:29:28.856+0900 DEBUG 20150830_132930_00032_btyir. 5.0-0-56 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "text", "code" FROM “public"."codes" : 2015-08-30T22:30:09.294+0900 DEBUG 20150830_132930_00032_btyir. 3.0-2-70 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "user", "code", "time" FROM "public"."accesses" WHERE (("time" < 1200000000)) : - log message Condition pushdown Join pushdown isn’t supported Multi connectors
  • 26. Recent updates (0.108~0.117) New functions: normalize(), from_iso8601_timestamp(), from_iso8601_date(), to_iso8601() slice(), md5(), array_min(), array_max(), histogram() element_at(), url_encode(), url_decode() sha1(), sha256(), sha512() multimap_agg(), checksum() Teradata compatibility functions: index(), char2hexint(), to_char(), to_date(), to_timestamp()
  • 27. Recent updates (0.108~0.117) Cluster Resource Management. - query.max-memory - query.max-memory-per-node - resources.reserved-system-memory “Big Query” option was removed Semi-joins are hash-partitioned if distributed_join is turned on. Add support for partial cast from JSON. For example, json can be cast to array<json>, map<varchar, json>, etc. Use JSON_PARSE() and JSON_FORMAT() instead of CAST Add query_max_run_time session property and query.max-run-time config. Queries are failed after the specified duration. optimizer.optimize-hash-generation and distributed-joins-enabled are both enabled by default now.
  • 28. Recent updates (0.108~0.117) Cluster Resource Management. - query.max-memory - query.max-memory-per-node - resources.reserved-system-memory “Big Query” option was removed Semi-joins are hash-partitioned if distributed_join is turned on. Add support for partial cast from JSON. For example, json can be cast to array<json>, map<varchar, json>, etc. Use JSON_PARSE() and JSON_FORMAT() instead of CAST Add query_max_run_time session property and query.max-run-time config. Queries are failed after the specified duration. optimizer.optimize-hash-generation and distributed-joins-enabled are both enabled by default now. https://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
  • 29. Who uses Presto? • Facebook http://www.slideshare.net/dain1/presto-meetup-2015
  • 32. • Qubole • SaaS • Treasure Data • SaaS • Teradata • commercial support Who uses Presto? As a service…
  • 33. Today’s talk • What's Presto? • How do we use Presto? • What’s Treasure Data? • System architecture • How we manage Presto
  • 35. What’s Treasure Data? and more… • Collect, store and analyze data • With multi-tenancy • With a schema-less structure • On cloud, but mitigates the outage Founded in 2011 in the U.S. Over 85 employees
  • 36. What’s Treasure Data? and more… Founded in 2011 in the U.S. Over 85 employees • Collect, store and analyze data • With multi-tenancy • With a schema-less structure • On cloud, but mitigates the outage
  • 37. What’s Treasure Data? and more… Founded in 2011 in the U.S. Over 85 employees • Collect, store and analyze data • With multi-tenancy • With a schema-less structure • On cloud, but mitigates the outage
  • 38. What’s Treasure Data? and more… Founded in 2011 in the U.S. Over 85 employees • Collect, store and analyze data • With multi-tenancy • With a schema-less structure • On cloud, but mitigates the outage
  • 39. What’s Treasure Data? and more… Founded in 2011 in the U.S. Over 85 employees • Collect, store and analyze data • With multi-tenancy • With a schema-less structure • On cloud, but mitigates the outage
  • 40. Time to Value Send query result Result Push Acquire Analyze Store Plazma DB Flexible, Scalable, Columnar Storage Web Log App Log Censor CRM ERP RDBMS Treasure Agent(Server) SDK(JS, Android, iOS, Unity) Streaming Collector Batch / Reliability Ad-hoc /
 Low latency KPI$ KPI Dashboard BI Tools Other Products RDBMS, Google Docs, AWS S3, FTP Server, etc. Metric Insights Tableau, Motion Board etc. POS REST API ODBC / JDBC SQL, Pig Bulk Uploader Embulk,
 TD Toolbelt SQL-based query @AWS or @IDCF Connectivity Economy & Flexibility Simple & Supported Collect! Store! Analyze!
  • 47. Treasure Data in figures
  • 48. Some of our customers
  • 50. Components in Treasure Data (an example query flow is sketched below)
 • api server: authentication / authorization; receives queries like "select user_id, count(1) from …"
 • worker queue (MySQL) + td worker process: retries a failed query if needed
 • Presto coordinator and Presto workers: read storage through the td-presto connector
 • plazmadb (PostgreSQL + S3/RiakCS): columnar file format, schema-less
 • result bucket (S3/RiakCS): holds query results
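 As an illustration of the flow, here is a hypothetical ad-hoc query of the kind shown in the diagram (the table and column names are made up): the api server authenticates the request, the td worker process dequeues it, the Presto cluster executes it against plazmadb, and the result lands in the result bucket.
 -- Hypothetical ad-hoc query submitted through the api server;
 -- executed by the Presto cluster against plazmadb, with the
 -- result written to the result bucket (S3/RiakCS).
 select user_id, count(1) as cnt
 from access_logs
 group by user_id
 order by cnt desc
 limit 10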
  • 58. Schema on read
 access_logs table:
 time                 code   method  user_id
 2015-06-01 10:07:11  200    GET
 2015-06-01 10:10:12  “200”  GET
 2015-06-01 10:10:20  200    GET
 2015-06-01 10:11:30  200    POST
 2015-06-01 10:20:45  200    GET
 2015-06-01 10:33:50  400    GET     206
 2015-06-01 10:40:11  200    GET     852
 2015-06-01 10:51:32  200    PUT     1223
 2015-06-01 10:58:02  200    GET     5118
 2015-06-01 11:02:11  404    GET     12
 2015-06-01 11:14:27  200    GET     3447
 The user added a new column "user_id" to the imported data. The column can be selected just by adding it to the schema, without rebuilding the table (see the sketch below).
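 A minimal sketch of what schema on read looks like in practice (table and column names from the slide); rows imported before the column was added simply read as NULL:
 -- After "user_id" is added to the schema it is immediately queryable;
 -- older rows, imported before the column existed, return NULL.
 select user_id, count(1) as requests
 from access_logs
 where user_id is not null
 group by user_id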
  • 59. Columnar file format
 The same access_logs table as above is laid out column by column on disk: time, code, method and user_id are stored as separate column blocks.
 This query accesses only the code column:
 select code, count(1) from tbl group by code
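 A sketch of why this matters (same table as above): scan cost grows with the number of columns a query references, so selecting only what you need keeps reads small.
 -- Touches only the "code" column blocks:
 select code, count(1) from access_logs group by code

 -- Touches code, method and user_id, so roughly three columns are read:
 select code, method, count(1)
 from access_logs
 where user_id is not null
 group by code, method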
  • 60. td-presto connector • MessagePack v0.7 • Off-heap memory • Avoiding “TypeProfile” • Async IO with jetty-client • Scheduling & resource management
  • 61. How we manage Presto • Blue-Green Deployment • Stress test tool • Monitoring with DataDog
  • 62. Blue-Green Deployment: two full Presto clusters, production and rc, each with its own coordinator and workers, sit behind the same api server, worker queue (MySQL), td worker process, plazmadb (PostgreSQL + S3/RiakCS) and result bucket (S3).
  • 63. Blue-Green Deployment: Test! Test queries are sent to the rc cluster while the production cluster keeps serving users.
  • 64. Blue-Green Deployment: Release! Once the rc cluster is verified, it is promoted to the stable production cluster. No downtime.
  • 65. Stress test tool
 • Collects queries that have ever caused issues.
 • A new query is added just by appending an entry with its job_id; the tool issues the query, fetches the result and records a calculated digest automatically.
 • We can send all the queries, including very heavy ones (around 6000 stages), to Presto.
 - job_id: 28889999
   result: 227d16d801a9a43148c2b7149ce4657c
 - job_id: 28889999
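 One way to express such a result digest in SQL itself; this is a sketch only, not necessarily what the tool does, using checksum(), Presto's order-insensitive aggregate function:
 -- Order-insensitive digest of a result set, comparable across
 -- a production cluster and a release candidate:
 select checksum(row(user_id, cnt))
 from (
   select user_id, count(1) as cnt
   from access_logs
   group by user_id
 )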
  • 69. Monitoring with DataDog: a td-agent process sits next to each Presto process (coordinator and workers); its in_presto_metrics input plugin polls /v1/jmx/mbean, /v1/query and /v1/node, and out_metricsense forwards the metrics to DataDog.
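 For ad-hoc checks, much of what /v1/query exposes can also be pulled with plain SQL through Presto's built-in system connector (a sketch; system.runtime.queries is a standard Presto system table):
 -- Queries currently known to the coordinator and their states:
 select query_id, state, created
 from system.runtime.queries
 where state in ('RUNNING', 'QUEUED')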
  • 70. Monitoring with DataDog: Query stalled time - The most important metric for us: it triggers alert calls to us… - It is mainly increased by td-presto connector problems, most of which are race condition issues.
  • 71. How many queries processed: more than 20,000 queries / day
  • 72. How many queries processed: most queries finish within 1 min
  • 73. Analytics Infrastructure, Simplified in the Cloud. We're hiring! https://jobs.lever.co/treasure-data