What's New in SQL-on-Hadoop and Beyond
Martin Traverso, Facebook
Kamil Bajda-Pawlikowski, Teradata
Agenda
● Introduction
● Presto at Facebook
● Presto users and use cases
● New features
● Roadmap
Introduction
What is Presto
● Open source distributed SQL engine
● ANSI SQL syntax
● Custom built for interactive analytic queries
● Queries data across multiple data stores
● Flexible deployment (on premise or cloud)
● Extensible
Presto at Facebook
Presto @ Facebook
● Ad-hoc/interactive queries for Hadoop warehouse
● Batch processing for Hadoop warehouse
● Analytics for user-facing products
● Analytics over various specialized stores
Hadoop Warehouse - Stats
● 1000s of internal daily active users
● Millions of queries each month
● Scan PBs of data every day
● Process trillions of rows every day
● 10s of concurrent queries
Hadoop Warehouse - Batch
Presto for User-facing Products
● Requirements
○ Hundreds of ms to seconds latency, low variability
○ Availability
○ Update semantics
○ 10 - 15 way joins
● Stats
○ > 99.99% query success rate
○ 100% system availability
○ 25 - 200 concurrent queries
○ 1 - 20 queries per second
○ <100ms - 5s latency
Presto with Raptor
● Large data sets (petabytes)
● Milliseconds to seconds latency
● Predictable performance
● 5-15 minute load latency
● Reliable data loads (no duplicates, no missing data)
● High availability
● 10s of concurrent queries
Presto users and use cases
Presto users
See more at https://github.com/prestodb/presto/wiki/Presto-Users
Netflix stats
Interactive, reporting, and app-driven queries
Data warehouse: 40PB in S3
~250 nodes across multiple clusters
~650 users with ~6K+ queries/day
Twitter stats
Ad-hoc and low-latency queries
~200 nodes dedicated to Presto
Parquet with nested data structures
Uber stats
2 clusters
100+ machines
2000+ queries per day
HDFS on premise
FINRA stats
120+ EC2 nodes (r3.4xlarge)
2+ PBs of data on S3 (bzip2 & orc)
200+ users
Distro supported by Teradata
New features
SQL features
● DDL syntax
○ CREATE / ALTER / DROP TABLE
● DML syntax
○ INSERT / DELETE
● SQL features:
○ Data types: DECIMAL, VARCHAR(n), INT, SMALLINT, TINYINT
○ CUBE, ROLLUP, GROUPING SETS
○ INTERSECT
○ Non-equi joins
○ Uncorrelated subqueries
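The SQL constructs listed on this slide can be illustrated with a small sketch. It runs on Python's built-in sqlite3 module as a stand-in engine, not on Presto, and every table name and row is hypothetical. SQLite has no ROLLUP, so the last query spells out the UNION ALL form that ROLLUP(region, product) is equivalent to: one grouping per prefix of the column list, plus a grand total.

```python
import sqlite3

# Stand-in engine: in-memory SQLite, NOT Presto.
# All tables and rows below are hypothetical illustration data.
SETUP = """
CREATE TABLE orders (id INTEGER, customer TEXT, amount INTEGER);
CREATE TABLE tiers (name TEXT, min_amount INTEGER, max_amount INTEGER);
CREATE TABLE vip (customer TEXT);
CREATE TABLE sales (region TEXT, product TEXT, qty INTEGER);
INSERT INTO orders VALUES (1, 'ann', 50), (2, 'bob', 150), (3, 'cat', 900);
INSERT INTO tiers VALUES ('small', 0, 99), ('medium', 100, 499),
                         ('large', 500, 9999);
INSERT INTO vip VALUES ('bob'), ('cat'), ('dan');
INSERT INTO sales VALUES ('east', 'a', 1), ('east', 'b', 2), ('west', 'a', 4);
"""


def run_sql_feature_sketch():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)

    def query(sql):
        return conn.execute(sql).fetchall()

    results = {
        # INTERSECT: customers that appear in both tables.
        "intersect": query(
            "SELECT customer FROM orders "
            "INTERSECT SELECT customer FROM vip ORDER BY customer"
        ),
        # Non-equi join: the join condition is a range test, not equality.
        "non_equi_join": query(
            "SELECT o.id, t.name FROM orders o JOIN tiers t "
            "ON o.amount BETWEEN t.min_amount AND t.max_amount ORDER BY o.id"
        ),
        # Uncorrelated subquery: the inner query does not reference
        # the outer row, so it can be evaluated once.
        "uncorrelated_subquery": query(
            "SELECT id FROM orders "
            "WHERE amount > (SELECT AVG(amount) FROM orders) ORDER BY id"
        ),
        # ROLLUP(region, product) written out as GROUPING SETS
        # ((region, product), (region), ()) using UNION ALL.
        "rollup": set(query(
            "SELECT region, product, SUM(qty) FROM sales "
            "GROUP BY region, product "
            "UNION ALL SELECT region, NULL, SUM(qty) FROM sales GROUP BY region "
            "UNION ALL SELECT NULL, NULL, SUM(qty) FROM sales"
        )),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in run_sql_feature_sketch().items():
        print(name, rows)
```

In an engine that supports it, the last query would collapse to a single GROUP BY ROLLUP (region, product); the NULLs mark the columns that were aggregated away.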
Other features
● Performance
○ Join and aggregation optimizations
● Connectors
○ Redis
○ MongoDB
● Kerberos
● Presto-Admin
● Ambari and YARN (via Apache Slider)
Drivers and BI tools
● Enterprise-grade ODBC & JDBC drivers
● BI tools certifications
○ Information Builders, Looker, MicroStrategy, MS Power BI, Qlik, Tableau, ZoomData
Roadmap
Short term
● LDAP
● SQL features
○ Data types: FLOAT, CHAR(n), VAR/BINARY(n)
○ EXISTS, EXCEPT
○ Correlated subqueries
○ Lambda expressions
○ Prepared statements
● Connectors
○ Accumulo (by Bloomberg)
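As a rough illustration of what some of the planned SQL features express, the sketch below again uses Python's built-in sqlite3 module as a stand-in, not Presto. The tables and rows are hypothetical. The parameterized query at the end only approximates the idea of a prepared statement (parse once, bind values later) and says nothing about the syntax Presto would adopt.

```python
import sqlite3

# Stand-in engine: in-memory SQLite, NOT Presto.
# All tables and rows below are hypothetical illustration data.
SETUP = """
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
INSERT INTO orders VALUES (1, 50), (1, 70), (3, 900);
"""


def run_roadmap_feature_sketch():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)

    results = {
        # EXISTS with a correlated subquery: the inner query refers to
        # the outer row (c.id), so it is evaluated per customer.
        "exists": conn.execute(
            "SELECT name FROM customers c WHERE EXISTS "
            "(SELECT 1 FROM orders o WHERE o.customer_id = c.id) "
            "ORDER BY name"
        ).fetchall(),
        # EXCEPT: customer ids that have no matching order.
        "except": conn.execute(
            "SELECT id FROM customers "
            "EXCEPT SELECT customer_id FROM orders ORDER BY id"
        ).fetchall(),
        # Correlated scalar subquery in the SELECT list; customers
        # without orders get NULL (None in Python).
        "correlated_total": conn.execute(
            "SELECT name, (SELECT SUM(amount) FROM orders o "
            "WHERE o.customer_id = c.id) FROM customers c ORDER BY name"
        ).fetchall(),
        # Parameter binding as an approximation of a prepared statement.
        "parameterized": conn.execute(
            "SELECT name FROM customers WHERE id = ?", (3,)
        ).fetchall(),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in run_roadmap_feature_sketch().items():
        print(name, rows)
```

The contrast with an uncorrelated subquery is the reference to c.id inside the inner query: the engine must either re-evaluate it per outer row or rewrite it into a join.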
Long term
● Materialized Query Tables
● Workload management
● Spill to disk
● Cost-based Optimizer
See more at https://github.com/prestodb/presto/wiki/Roadmap
More about Presto
GitHub: https://github.com/prestodb & https://github.com/Teradata/presto
Website: http://prestodb.io
Group: https://groups.google.com/group/presto-users
Distro: http://www.teradata.com/presto
