Real-time Analytics with Trino and Apache Pinot
Xiang Fu, Elon Azoulay
Oct 22, 2021
Speaker Intro
Xiang Fu
● Co-founder - StarTree
● Previously: Streaming Platform @ Uber
● PMC & Committer: Apache Pinot
Elon Azoulay
● Software Engineer - Stealth Mode Startup
● Previously: Data @ Facebook
Agenda
● Today’s Compromises: Latency vs. Flexibility
● Avoiding the trade-off with Apache Pinot
● Trino Pinot Connector
● Benchmark
Today’s Compromises: Latency vs Flexibility
Flexibility takes time: Join on the Fly
Tables: customers, orders
FILTER: customers.state = 'California' AND customers.gender = 'Female'
JOIN: JOIN customers ON (customers.customer_id = orders.customer_id)
GROUP BY: customers.city, Month(orders.date)
AGGREGATION: sum(orders.amount)
- Flexible to do any computation
- High query cost: disk & network I/O, data partitioning, data serde
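The four stages on this slide can be sketched as a single query that does all the work at request time. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in engine; the sample rows and the helper names `build_demo_db` and `join_on_the_fly` are assumptions made for the example, not part of the talk.

```python
import sqlite3


def build_demo_db():
    """Create the two raw tables from the slide with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER, state TEXT, gender TEXT, city TEXT);
        CREATE TABLE orders (customer_id INTEGER, date TEXT, amount REAL);
        INSERT INTO customers VALUES
            (1, 'California', 'Female', 'San Jose'),
            (2, 'California', 'Male',   'San Jose'),
            (3, 'Texas',      'Female', 'Austin'),
            (4, 'California', 'Female', 'Fresno');
        INSERT INTO orders VALUES
            (1, '2021-10-01', 10.0),
            (1, '2021-10-15', 5.0),
            (1, '2021-11-02', 7.0),
            (2, '2021-10-03', 100.0),
            (3, '2021-10-04', 50.0),
            (4, '2021-10-09', 2.5);
    """)
    return conn


def join_on_the_fly(conn):
    """FILTER, JOIN, GROUP BY and AGGREGATION all happen at query time."""
    return conn.execute("""
        SELECT c.city, strftime('%m', o.date) AS month, SUM(o.amount) AS total
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        WHERE c.state = 'California' AND c.gender = 'Female'
        GROUP BY c.city, strftime('%m', o.date)
        ORDER BY c.city, month
    """).fetchall()
```

Every request repeats the join and the scan of both tables, which is where the query cost listed on the slide comes from.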
ETL Trade-offs: Pre-joined Table
Tables: customers, orders
JOIN (ETL): JOIN customers ON (customers.customer_id = orders.customer_id)
Pre-Joined Table: user_orders_joined
FILTER: state = 'California' AND gender = 'Female'
GROUP BY: city, Month(orders.date)
AGGREGATION: sum(amount)
- Flexible to explore user dimensions
- Query time is still proportional to the amount of data scanned, so it is not predictable
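A minimal sketch of the pre-join approach, again with sqlite3 standing in for the real engine. The table name `user_orders_joined` comes from the slide; the sample rows and the helper names `build_pre_joined` and `query_pre_joined` are assumptions for illustration.

```python
import sqlite3


def build_pre_joined(conn):
    """Load raw tables, then materialize the join once as an ETL step."""
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER, state TEXT, gender TEXT, city TEXT);
        CREATE TABLE orders (customer_id INTEGER, date TEXT, amount REAL);
        INSERT INTO customers VALUES
            (1, 'California', 'Female', 'San Jose'),
            (2, 'California', 'Male',   'San Jose'),
            (3, 'Texas',      'Female', 'Austin');
        INSERT INTO orders VALUES
            (1, '2021-10-01', 10.0),
            (1, '2021-11-02', 7.0),
            (2, '2021-10-03', 100.0),
            (3, '2021-10-04', 50.0);
        -- ETL step: pay for the join once, ahead of query time.
        CREATE TABLE user_orders_joined AS
            SELECT c.customer_id, c.state, c.gender, c.city, o.date, o.amount
            FROM orders AS o
            JOIN customers AS c ON c.customer_id = o.customer_id;
    """)


def query_pre_joined(conn, state, gender):
    """No join at query time, but the filter still scans the joined rows."""
    return conn.execute("""
        SELECT city, strftime('%m', date) AS month, SUM(amount)
        FROM user_orders_joined
        WHERE state = ? AND gender = ?
        GROUP BY city, strftime('%m', date)
        ORDER BY city, month
    """, (state, gender)).fetchall()
```

Any customer dimension can still be used in the filter, which is the flexibility the slide mentions, while the cost of each query continues to grow with the number of joined rows.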
ETL Trade-offs: Pre-aggregated Table
Pre-Joined Table: user_orders_joined
Aggregation + GroupBy (ETL): SELECT sum(amount) as sum_amount, date, city GROUP BY date, city
Pre-Aggregated Table: user_orders_aggregated
FILTER: state = 'California' AND gender = 'Female'
GROUP BY: city, Month(orders.date)
AGGREGATION: sum(sum_amount)
- Reduced query runtime workload
- Query time is still proportional to the product of the cardinalities of the non-groupBy columns
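The pre-aggregation step can be sketched as follows, using the SELECT from the slide to build `user_orders_aggregated`. sqlite3 is a stand-in engine, and the sample rows and helper names (`build_tables`, `monthly_totals`, `row_counts`) are assumptions for illustration.

```python
import sqlite3


def build_tables():
    """Create a small pre-joined table and derive the pre-aggregated one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user_orders_joined (
            customer_id INTEGER, state TEXT, gender TEXT, city TEXT,
            date TEXT, amount REAL
        );
        INSERT INTO user_orders_joined VALUES
            (1, 'California', 'Female', 'San Jose', '2021-10-01', 10.0),
            (1, 'California', 'Female', 'San Jose', '2021-10-01', 5.0),
            (2, 'California', 'Male',   'San Jose', '2021-10-01', 100.0),
            (4, 'California', 'Female', 'Fresno',   '2021-10-09', 2.5),
            (4, 'California', 'Female', 'Fresno',   '2021-11-02', 7.0);
        -- ETL step from the slide: aggregate once, ahead of query time.
        CREATE TABLE user_orders_aggregated AS
            SELECT SUM(amount) AS sum_amount, date, city
            FROM user_orders_joined
            GROUP BY date, city;
    """)
    return conn


def monthly_totals(conn):
    """Re-aggregate the partial sums to the coarser (city, month) grain."""
    return conn.execute("""
        SELECT city, strftime('%m', date) AS month, SUM(sum_amount)
        FROM user_orders_aggregated
        GROUP BY city, strftime('%m', date)
        ORDER BY city, month
    """).fetchall()


def row_counts(conn):
    """Return (raw rows, pre-aggregated rows) to show the reduction."""
    raw = conn.execute("SELECT COUNT(*) FROM user_orders_joined").fetchone()[0]
    agg = conn.execute("SELECT COUNT(*) FROM user_orders_aggregated").fetchone()[0]
    return raw, agg
```

In this sketch the aggregated table keeps only `date` and `city`, so it has fewer rows to scan, but columns that were aggregated away (such as `gender`) can no longer be filtered on at query time.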
ETL Trade-offs: Pre-cubed Table
Pre-Joined Table: user_orders_joined
Cubing (ETL): SELECT sum(amount) as sum_amount, date, Month(date) as month, city GROUP BY CUBE (date, city, Month(date))
Pre-Aggregated Table: user_orders_cubed
FILTER: state = 'California' AND gender = 'Female'
PROJECTION: sum_amount, month, city
- Predictable query runtime
- Storage overhead: one raw record translates to multiple records
- Dimension explosion
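The storage overhead of cubing can be made concrete with a small sketch. GROUP BY CUBE over n dimensions produces one grouping set per subset of the dimensions, so a single raw record contributes to 2^n cube records. The function name `cube`, the `ALL` marker, and the sample rows below are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marker meaning "rolled up over this dimension"


def cube(rows, dimensions, measure):
    """Sum `measure` for every subset of `dimensions`, like GROUP BY CUBE."""
    totals = defaultdict(float)
    for row in rows:
        # Each row contributes to one record per subset of the dimensions.
        for size in range(len(dimensions) + 1):
            for kept in combinations(dimensions, size):
                key = tuple(row[d] if d in kept else ALL for d in dimensions)
                totals[key] += row[measure]
    return dict(totals)
```

With the three dimensions from the slide (date, city, month), one raw record already expands to eight cube records, and each extra dimension doubles that number.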
Avoiding the Trade-off with Apache Pinot
[Figure: Fact Table + Dimension Table, Pre-Join, Pre-Aggregation and Pre-Cube compared on three axes: Latency (low to high), Flexibility (low to high) and Throughput (high to low).]
Apache Pinot Overview
Use cases: User Facing Applications, Business Facing Metrics, Anomaly Detection
- Ingestion: millions of events/sec
- Workload: thousands of queries/sec
- Performance: millisecond latency
- Operation: clusters of thousands of nodes
Diagram labels: ADLS, GCS; Real-Time, Offline
[Figure: Pinot architecture. Labels: BI Visualization, Data Products, Anomaly Detection; ADLS, GCS; Real-Time, Offline; OLTP; Server1, Server2, Server3; Zookeeper; Broker 1, Broker 2; Controller.]
Secrets Behind Apache Pinot
[Figure: techniques grouped by layer (Scan, Aggregation, Filter, Storage), with a legend distinguishing Common Techniques from Pinot.]
Techniques shown: Bloom Filter, Inverted Index, Columnar Store, Byte Encoding, Sorted Index, Compression, Star-Tree Pre-aggregation, Star-Tree Index, Bit/RLE Encoding, Per-segment flexible query planning, Range Index, Text Index
Apache Pinot - StarTree Index
• Configurable trade-off between latency and storage space through partial pre-aggregation
• Can achieve a hard upper bound on query latency
[Figure: latency vs. storage. Points shown: no pre-computation; partial pre-computation (star-tree index) at T = 10000 and T = 100; full pre-cube (KV store).]
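The idea of bounding the scan with a threshold can be sketched with a toy tree. This is a simplified illustration of partial pre-aggregation, not Pinot's actual star-tree implementation: it assumes that T on the slide is the maximum number of records a query may scan, that dimensions are split in a fixed order, and that the only aggregate is a sum. All names (`build_star_tree`, `query`, `STAR`) are hypothetical.

```python
from collections import defaultdict

STAR = "*"  # assumed marker for "aggregated over this dimension"


def _merge(records):
    """Sum the amounts of records that share the same key tuple."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key] += amount
    return list(totals.items())


def _build(records, depth, num_dims, threshold):
    # Stop splitting once a node is small enough to scan directly.
    if len(records) <= threshold or depth == num_dims:
        return {"leaf": records}
    children = defaultdict(list)
    for key, amount in records:
        children[key[depth]].append((key, amount))
    # Star child: pre-aggregate over the dimension at this depth.
    children[STAR] = _merge(
        [(key[:depth] + (STAR,) + key[depth + 1:], amount) for key, amount in records]
    )
    return {
        "depth": depth,
        "children": {
            value: _build(recs, depth + 1, num_dims, threshold)
            for value, recs in children.items()
        },
    }


def build_star_tree(rows, threshold):
    """rows: list of (dimension_tuple, amount); threshold: max records per leaf."""
    records = _merge(rows)
    num_dims = len(records[0][0]) if records else 0
    return _build(records, 0, num_dims, threshold)


def query(tree, predicate):
    """predicate: {dimension_index: value}. Returns (sum, records_scanned)."""
    node = tree
    while "leaf" not in node:
        # Follow the filter value if there is one, else the pre-aggregated star child.
        wanted = predicate.get(node["depth"], STAR)
        node = node["children"].get(wanted)
        if node is None:
            return 0.0, 0
    rows = node["leaf"]
    total = sum(a for key, a in rows if all(key[i] == v for i, v in predicate.items()))
    return total, len(rows)
```

A smaller threshold creates more pre-aggregated nodes (more storage) and fewer records to scan per query; a large threshold degenerates to scanning the raw records, which matches the two ends of the latency vs. storage curve on the slide.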
Trino Overview
Source: Trino Architecture
BI Visualization Data Products Anomaly Detection Ad hoc Analysis
ADLS
GCS
Trino Pinot Connector
Trino Pinot Connector
Trino Pinot Connector: Aggregation Pushdown
Chasing the light: Aggregation pushdown
- Issues a single Pinot broker request
- Best-effort pushdown for aggregations such as
count/sum/min/max/distinct/approximate_distinct, etc.
- 10x to 100x latency improvement
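Why pushdown helps can be sketched by counting the rows that cross from the data source to the query engine. Here sqlite3 plays the role of the source, and a plain Python loop plays the role of the engine. The table and the `team` and `conference` columns follow the queries in this deck; the `runs` column, the sample rows and the function names are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict


def make_source():
    """A stand-in for the remote store, holding a tiny baseball_stats table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baseball_stats (team TEXT, conference TEXT, runs INTEGER)")
    conn.executemany(
        "INSERT INTO baseball_stats VALUES (?, ?, ?)",
        [
            ("Giants", "America East", 3),
            ("Giants", "America East", 4),
            ("Bears", "America East", 1),
            ("Lions", "Big West", 9),
        ],
    )
    return conn


def without_pushdown(source):
    """The engine pulls every matching row and aggregates them itself."""
    rows = source.execute(
        "SELECT team, runs FROM baseball_stats WHERE conference = 'America East'"
    ).fetchall()
    totals = defaultdict(int)
    for team, runs in rows:
        totals[team] += runs
    return dict(totals), len(rows)


def with_pushdown(source):
    """The source aggregates; only one row per group is transferred."""
    rows = source.execute(
        "SELECT team, SUM(runs) FROM baseball_stats "
        "WHERE conference = 'America East' GROUP BY team"
    ).fetchall()
    return dict(rows), len(rows)
```

Both functions return the same totals; the second element of each result is the number of rows transferred, which shrinks from one per matching record to one per group.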
Passthrough Broker Queries
SELECT CASE WHEN team = 'Giants' THEN 'BIG' ELSE 'SMALL' END AS size, team, count(*)
FROM pinot.default."SELECT team FROM baseball_stats WHERE conference = 'America East'"
GROUP BY CASE WHEN team = 'Giants' THEN 'BIG' ELSE 'SMALL' END, 2
Group by Expression Support
Passthrough Broker Queries
SELECT team, count(*)
FROM pinot.default."SELECT team, player FROM baseball_stats WHERE conference = 'America East'"
ORDER BY CASE WHEN team = 'Giants' THEN 'BIG' ELSE 'SMALL' END, 2
Order by Expression Support
Trino Pinot Connector: Server Query + Pinot Streaming API
Pinot Streaming (gRPC) Connector
- Distributes the workload in parallel across Trino workers
- Configurable memory footprint for pulling data from Pinot
- Opens the door to queries that require a full table scan or a join
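The two properties above, parallel work across workers and a bounded memory footprint, can be sketched with plain Python. This is not the connector's code: threads stand in for Trino workers, lists stand in for Pinot segments, and `batch_size` stands in for whatever setting limits how much data is buffered at once. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def stream_segment(rows, batch_size):
    """Yield rows in fixed-size batches, standing in for a streaming response."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def scan_split(rows, batch_size):
    """Consume one segment batch by batch, keeping only a running total."""
    total = 0
    max_buffered = 0
    for batch in stream_segment(rows, batch_size):
        max_buffered = max(max_buffered, len(batch))
        total += sum(batch)
    return total, max_buffered


def parallel_sum(segments, batch_size, workers):
    """Scan all segments in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda seg: scan_split(seg, batch_size), segments))
    return sum(t for t, _ in results), max(m for _, m in results)
```

Because each split holds at most one batch in memory, even a full table scan keeps a predictable footprint per worker, while adding workers spreads the scan across more threads.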
Ongoing and Future Work on the Connector
● Data Insertion
○ Push segments to the controller
○ Adds or replaces segments.
Ongoing and Future Work on the Connector
● Pinot Segments Deletion
● Table & Column Creation/Alter/Drop
CREATE TABLE DIMTABLE
(LONG_COL bigint, STRING_COL varchar)
WITH (
PRIMARY_KEY_COLUMNS = ARRAY['long_col'],
OFFLINE_CONFIG = '{
"tableName": "dimtable",
"tableType": "OFFLINE",
"isDimTable": true,
"segmentsConfig": {
"segmentPushType": "REFRESH",
"replication": "1"
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"tableIndexConfig": {
"loadMode": "MMAP"
}
}'
);
Perf Benchmark
Benchmark Config:
- 1 Pinot Controller (4 cores/8GB)
- 1 Pinot Broker (4 cores/8GB)
- 3 Pinot Servers (4 cores/8GB)
- 1 Trino Coordinator (4 cores/8GB)
- 1 Trino Worker (4 cores/8GB)
Data Set:
- 40 million rows
Query:
- Aggregation GroupBy + Predicate
Trino:
- Aggregation pushdown enabled vs. disabled
Perf Benchmark
Query Type: Aggregation Group By + Predicate pushdown
Thank you
- Getting Started: https://tinyurl.com/trinoPinotTutorial
- Run Trino in Kubernetes: https://github.com/trinodb/charts
- StarTree: https://www.startree.ai/
- Apache Pinot: https://pinot.apache.org/
- Pinot on github: https://github.com/apache/pinot
- Pinot slack: https://tinyurl.com/pinotSlackChannel
- Apache Pinot Twitter: https://twitter.com/ApachePinot
- Apache Pinot Meetup: https://www.meetup.com/apache-pinot
- Starburst: https://www.starburst.io/
- Trino: https://trino.io/
- Trino on github: https://github.com/trinodb/trino
- Trino slack: https://trino.io/slack.html

Editor's Notes

  • #11: Realtime OLAP database; columnar, indexed storage; low-latency analytics; distributed (highly available, reliable, scalable); Lambda architecture (offline data pushes, real-time stream ingestion); open source
  • #12: Explain the controller: a single coordinator controlling all actions. Cluster state maintains the partition assignment (the partition-to-server mapping) and the start and end offsets.
  • #14: For certain query patterns (slice and dice on a given list of dimensions), we allow users to configure an upper bound on the number of documents to scan. Pinot will intelligently partially pre-aggregate the records to meet that requirement without exploding the storage.
  • #15: Components: coordinator (query endpoint, metadata) and workers (process query results). The work is divided into splits, which are processed in parallel. Results are returned from the coordinator in the final phase of processing.
  • #17: Join 2 Pinot tables
  • #19: Build the broker query; push down filter, aggregation, limit; produce a single broker split; submit the broker request; produce results to Trino; process joins, other filters, aggregations and the final limit; return results to the client
  • #20: Build the broker query; push down filter, aggregation, limit; produce a single broker split; submit the broker request; produce results to Trino; process joins, other filters, aggregations and the final limit; return results to the client
  • #21: The first step is to get the metadata from the Pinot controller. Talk about how this is configurable with the cache TTL config.
  • #24: Pinot: fast single-table OLAP. Trino: powerful connector ecosystem. Complete system that covers the entire landscape. Get the best of Trino and Pinot. Proven stack at Uber and many more.
  • #25: Pinot: fast single-table OLAP. Trino: powerful connector ecosystem. Complete system that covers the entire landscape. Get the best of Trino and Pinot. Proven stack at Uber and many more.