Presto: SQL-on-Anything
DataWorks Summit, San Jose 2017
Martin Traverso, Facebook
Matt Fuller, Teradata
Let’s Make Some Noise!
@DataWorksSummit
#DWS17
#prestodb
#facebook
#teradata
What is Presto?
• Open source distributed SQL query engine
• Originally developed by Facebook
• ANSI SQL compliant
• Like Hive, it’s not a database
• Key Differentiators
• Performance & Scale
• Cross platform query capability, not only SQL on Hadoop
• Supports federated queries
• Used in production at many well known web-scale companies
• Distributed under the Apache License, hosted on GitHub
Presto: SQL-on-anything
Multiple clusters (1000s of nodes total)
300PB in HDFS, MySQL, and Raptor
1000s of users, 10s-100s of concurrent queries
Presto in Action
250+ nodes on AWS
40+ PB stored in S3 (Parquet)
Over 650 users with 6K+ queries daily
Presto in Action
300+ nodes (2 dedicated clusters)
100K+ & 20K+ queries daily
Presto in Action
200+ nodes on-premises
Parquet nested data
Presto in Action
120+ nodes in AWS
2PB in S3 and 200+ users supported by Teradata
Presto in Action
[Diagram: a Client submits queries to the Coordinator (Parser/analyzer, Planner, Scheduler), which uses the Metadata API and the Data Location API; the Workers read data through the Data Stream API]
Pluggable Presto Architecture
Presto is not a Database!
• Presto is a distributed query execution engine
• Storage Independent
• Pluggable extensions
• Connectors
• Functions
• Types
• System access controllers
• Resource group configuration managers
• Event listeners
• ...
• Built-in core functionalities
• Parser, execution, types, SQL functions, monitoring
[Diagram: the Coordinator (Parser/analyzer, Planner, Scheduler) uses the Metadata API and the Data Location API, and the Worker uses the Data Stream API; each API is backed by the connectors: Hive, Cassandra, Kafka, MySQL, …]
Presto Extensibility - Connector
Amazon S3
Presto Connectors
Plugins
Classloader Isolation
Plugin Interface
Connector Configuration
• Catalog namespace owned by connector
Catalog Name
Connector Configuration
[Diagram: connector calls by query phase]
• Query Analysis: getTableHandle(...) → Table Handle, getTableMetadata(...)
• Query Planning: getSplits(...) → Iterator<Split>
Query Execution
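The connector calls named on this slide can be illustrated with a toy model. The sketch below is plain Python, not the Presto connector SPI (which is Java). The class ToyConnector, its in-memory table layout, and the chunk_size parameter are hypothetical; only the three method names mirror the calls on the slide, and the grouping of calls into analysis and planning is an assumption read off the diagram.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class TableHandle:
    """Opaque reference to a table, handed out by the connector."""
    schema: str
    table: str


@dataclass(frozen=True)
class Split:
    """Handle to a logical chunk of a table (here: a half-open row range)."""
    table: TableHandle
    start: int
    end: int


class ToyConnector:
    """Hypothetical in-memory connector; not the real Presto SPI."""

    def __init__(self, tables: Dict[Tuple[str, str], dict]):
        # Maps (schema, table) -> {"columns": [...], "rows": [...]}
        self._tables = tables

    def get_table_handle(self, schema: str, table: str) -> TableHandle:
        if (schema, table) not in self._tables:
            raise KeyError(f"table {schema}.{table} does not exist")
        return TableHandle(schema, table)

    def get_table_metadata(self, handle: TableHandle) -> List[str]:
        return list(self._tables[(handle.schema, handle.table)]["columns"])

    def get_splits(self, handle: TableHandle, chunk_size: int = 2) -> Iterator[Split]:
        row_count = len(self._tables[(handle.schema, handle.table)]["rows"])
        for start in range(0, row_count, chunk_size):
            yield Split(handle, start, min(start + chunk_size, row_count))


connector = ToyConnector({
    ("sales", "websales"): {
        "columns": ["order_id", "amount"],
        "rows": [(1, 10.0), (2, 5.5), (3, 7.25), (4, 1.0), (5, 3.0)],
    }
})

# Analysis: resolve the table name and fetch its columns.
handle = connector.get_table_handle("sales", "websales")
columns = connector.get_table_metadata(handle)

# Planning: ask the connector how the table divides into splits.
splits = list(connector.get_splits(handle))
print(columns, [(s.start, s.end) for s in splits])
```

The engine never looks inside the handle or the splits in this model; it only passes them back to the connector, which is what keeps the engine storage independent.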
Split
• Handle to logical chunk of a table
• Attributes
• Remote access?
• Location
[Diagram: the Coordinator sends splits to each of three Workers]
Query Execution
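The slide lists two split attributes, remote accessibility and location, and shows the coordinator handing splits to workers. A minimal sketch of how those attributes could influence assignment follows. It is an illustration only: the Split fields, the assign_splits function, and the round-robin policy are assumptions made for this example, and the actual Presto scheduler is more elaborate.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Split:
    split_id: int
    remotely_accessible: bool   # can any worker read this chunk?
    location: Optional[str]     # host holding the data, if it matters


def assign_splits(splits: List[Split], workers: List[str]) -> Dict[str, List[Split]]:
    """Round-robin over workers, except for splits pinned to one host."""
    assignment: Dict[str, List[Split]] = {w: [] for w in workers}
    next_worker = cycle(workers)
    for split in splits:
        if not split.remotely_accessible:
            # The data can only be read where it lives.
            if split.location not in assignment:
                raise ValueError(
                    f"split {split.split_id} is pinned to unknown host {split.location!r}"
                )
            assignment[split.location].append(split)
        else:
            assignment[next(next_worker)].append(split)
    return assignment


workers = ["w1", "w2", "w3"]
splits = [
    Split(0, True, None),
    Split(1, False, "w3"),
    Split(2, True, None),
    Split(3, True, None),
]
plan = assign_splits(splits, workers)
print({w: [s.split_id for s in assigned] for w, assigned in plan.items()})
```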
[Diagram: the Table Scan Operator calls getNextPage() on a PageSource and receives a Page, which flows on to the Filter Operator and the Aggregation Operator]
Query Execution
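The page-at-a-time flow on this slide can be sketched as a chain of generators. This is a simplified Python model, not Presto code: ListPageSource, the row layout, and the three operator functions are hypothetical, and only the getNextPage() call and the operator names come from the slide.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Tuple[str, int]   # (country, amount)
Page = List[Row]        # a batch of rows


class ListPageSource:
    """Hypothetical page source that serves pre-built pages one at a time."""

    def __init__(self, pages: List[Page]):
        self._pages = iter(pages)

    def get_next_page(self) -> Optional[Page]:
        # Returns None once the source is exhausted.
        return next(self._pages, None)


def table_scan(source: ListPageSource) -> Iterator[Page]:
    """Pull pages from the connector's page source until it runs dry."""
    while True:
        page = source.get_next_page()
        if page is None:
            return
        yield page


def filter_op(pages: Iterator[Page], predicate: Callable[[Row], bool]) -> Iterator[Page]:
    """Keep only matching rows; skip pages that end up empty."""
    for page in pages:
        kept = [row for row in page if predicate(row)]
        if kept:
            yield kept


def sum_by_key(pages: Iterator[Page]) -> Dict[str, int]:
    """Aggregate amounts per key across all incoming pages."""
    totals: Dict[str, int] = {}
    for page in pages:
        for key, value in page:
            totals[key] = totals.get(key, 0) + value
    return totals


source = ListPageSource([
    [("us", 10), ("de", 3)],
    [("us", 1), ("fr", 8)],
    [("de", 20)],
])
totals = sum_by_key(filter_op(table_scan(source), lambda row: row[1] >= 3))
print(totals)
```

Because each stage pulls one page at a time, only the aggregation state grows with the input in this model; the scan and filter stages hold a single page each.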
Presto Connectors @Facebook
• Hive connector
• Warehouse (ad-hoc / batch)
• Raptor connector
• Dashboards
• Reporting backend for A/B testing framework
• Sharded MySQL connector
• Reporting backend for user-facing products
• Other custom connectors for specialized data stores
Presto Connectors @Teradata Customers
• Teradata QueryGrid + Presto
• Teradata, Hadoop, S3, Cassandra, RESTful
• Customer Use Cases
• Recent sales data in Teradata needs to be joined with archived sales data that resides in Hadoop
• A Hadoop user using Presto needs to access pre-computed financial records in Teradata
• Existing supplier data in Teradata is joined with archived product data that resides in Amazon S3
[Diagram: a Teradata side with a PE and four AMPs, a Presto side with a Coordinator and four Worker Threads, and a QG Exchange layer on each side. Labeled links: "Init & metadata exchange" and "Bi-directional fully parallel data exchange"]
• Key features:
• Low latency
• High performance
• Concurrency
• Pushdown
• Data conversion
• Compression
• Efficient CPU usage
Teradata QueryGrid (powered by Presto)
Teradata QueryGrid SQL Examples
Teradata query combining local data with Hadoop data via Presto:
SELECT * FROM websales_current
UNION ALL
SELECT * FROM websales_archive@presto;
Presto query combining Hive data with data in Teradata:
SELECT * FROM td.sales.websales_current
UNION ALL
SELECT * FROM hive.sales.websales_archive;
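The shape of these queries, one statement addressing tables that live in separately named stores, can be tried locally with a stand-in. The sketch below uses Python's sqlite3 module and SQLite's ATTACH DATABASE as a loose analogy for catalogs. It is not Presto or QueryGrid, both databases live in one process, and the column names and sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A second, independent in-memory database stands in for another catalog.
conn.execute("ATTACH DATABASE ':memory:' AS archive")

conn.execute("CREATE TABLE main.websales_current (order_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE archive.websales_archive (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO main.websales_current VALUES (?, ?)", [(3, 7.25), (4, 1.0)]
)
conn.executemany(
    "INSERT INTO archive.websales_archive VALUES (?, ?)", [(1, 10.0), (2, 5.5)]
)

# One statement reads from both stores, addressed by qualified names.
rows = conn.execute(
    """
    SELECT order_id, amount FROM main.websales_current
    UNION ALL
    SELECT order_id, amount FROM archive.websales_archive
    ORDER BY order_id
    """
).fetchall()
print(rows)
conn.close()
```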
Conclusions
• Presto Connector API is expressive
• 3rd Party data source is 1st class citizen
• Single ANSI SQL to rule them all
• Use BI tools on data which is not BI friendly
• Rapid data integration
Write your own connector!
• Issue SQL to GitHub!
• https://developer.github.com/v3/
• SELECT count(*) FROM prestodb.presto.stargazers;
• Connector Example
• https://github.com/prestodb/presto/tree/master/presto-example-http
• Documentation
• https://prestodb.io/docs/current/develop.html
Additional Resources
• Website
• www.prestodb.io
• Presto Users Groups
• www.groups.google.com/group/presto-users
• GitHub:
• www.github.com/prestodb/presto
• www.github.com/Teradata/presto (Teradata’s development “fork”)