Real-time Analytics Using Apache Pinot
How LinkedIn, Uber Eats and Stripe create real-time dashboards for millions of users.
Agenda
Who is Barkha? (why would you want to listen to me?)
The evolution of Analytics
How LinkedIn Solved their Problem
Try some Pinot with me
Overheard @ Big Data Fest 2023
• Thiago de Faria, on 5-year trends in big data:
  • Streaming APIs
  • Will the data warehouse survive?
  • Integration with LLM/AI/ML
• Joe Reis, on 5-year trends in big data:
  • Democratization of data warehousing
  • Commoditized data warehousing
  • Most companies are barely doing BI, let alone AI.
About Barkha
• Founder, South Florida Women in Technology
• Developer Advocate @StarTree
• Linkedin.com/in/BarkhaHerman
• Twitter @BarkhaH
Analytics?
Real Time?
Scale?
OH WHY?
Why do we need Real-time Analytics? Or Analytics? Or at Scale?
Historic Analytics
• Batch
• Shared data
• No scale concerns
Modern Analytics
• Data freshness: daily reports vs. "How late is my food delivery?"
• Query performance: reports in < 2 minutes vs. dashboards that load in < 10 milliseconds
• Scale: all division managers worldwide access a report (> 1000) vs. millions of users access a dashboard
How LinkedIn solved Analytics @ Scale
By inventing Pinot
LinkedIn: Who Viewed Your Profile?
• Capture and deduplicate profile view information
• Compute view sources (e.g., search, profile page, etc.)
• View relevance (e.g., a senior leader viewed your profile)
• View obfuscations based on the viewing member’s privacy settings
Before Pinot
• Elasticsearch-based solution
• 1000 Nodes
• 1500 queries / sec
• 20+ million users
After Pinot
• 75 Nodes
• 5000 queries / sec
• 70+ million users
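The before/after figures are easier to compare as throughput per node. The sketch below only does arithmetic on the numbers quoted on the slide; the variable and function names are made up for illustration.

```python
# Figures as quoted on the slide (approximate, as presented by the speaker).
before = {"nodes": 1000, "qps": 1500}   # Elasticsearch-based solution
after = {"nodes": 75, "qps": 5000}      # Pinot-based solution


def qps_per_node(cluster):
    """Queries per second served by each node, on average."""
    return cluster["qps"] / cluster["nodes"]


before_density = qps_per_node(before)          # 1.5 queries/sec per node
after_density = qps_per_node(after)            # about 66.7 queries/sec per node
improvement = after_density / before_density   # about 44x more work per node
node_reduction = 1 - after["nodes"] / before["nodes"]  # 0.925, i.e. 92.5% fewer nodes

print(f"Before: {before_density:.1f} qps/node")
print(f"After:  {after_density:.1f} qps/node")
print(f"Per-node throughput improved about {improvement:.0f}x")
print(f"Node count dropped by {node_reduction:.1%}")
```

Read this way, the slide says that each node handled roughly 44 times more queries per second after the move, while the cluster shrank by more than 90%.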
Pinot Building Blocks
• Segments are the physical store.
• Tables are conceptual and accept both real-time and batch data.
• Tenants provide functional segregation.
• Clusters allow for scale based on use.
Pinot Building Blocks
Indexes
Pinot supports the following indexing techniques:
• Inverted index - Used for exact lookups.
• Range index - Used for range queries.
• Text index - Used for phrase, term, Boolean, prefix, or regex queries.
• Geospatial index - Based on H3, a hexagon-based hierarchical gridding system. Used for finding points that exist within a certain distance from another point.
• JSON index - Used for querying columns in JSON documents.
• Star-Tree index - Pre-aggregates results across multiple columns.
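To make the first two entries concrete, here is a toy sketch of the ideas behind an inverted index (exact lookups) and a range index (range queries). It is plain Python over an in-memory list and is not how Pinot implements either index internally; the rows, column names, and function names are all invented for illustration.

```python
import bisect
from collections import defaultdict

# Hypothetical rows; a row's position in the list plays the role of a document id.
rows = [
    {"order_id": 1, "city": "Miami", "price": 12},
    {"order_id": 2, "city": "Tampa", "price": 30},
    {"order_id": 3, "city": "Miami", "price": 18},
    {"order_id": 4, "city": "Orlando", "price": 25},
]


def build_inverted_index(rows, column):
    """Map each distinct value of `column` to the doc ids that contain it."""
    index = defaultdict(list)
    for doc_id, row in enumerate(rows):
        index[row[column]].append(doc_id)
    return dict(index)


def exact_lookup(index, value):
    """Exact match: one dictionary lookup instead of scanning every row."""
    return index.get(value, [])


def build_range_index(rows, column):
    """Keep values sorted, with the matching doc ids alongside them."""
    pairs = sorted((row[column], doc_id) for doc_id, row in enumerate(rows))
    values = [value for value, _ in pairs]
    doc_ids = [doc_id for _, doc_id in pairs]
    return values, doc_ids


def range_lookup(range_index, low, high):
    """Doc ids whose value lies in [low, high], found by binary search."""
    values, doc_ids = range_index
    start = bisect.bisect_left(values, low)
    end = bisect.bisect_right(values, high)
    return sorted(doc_ids[start:end])


city_index = build_inverted_index(rows, "city")
price_index = build_range_index(rows, "price")

miami_docs = exact_lookup(city_index, "Miami")      # [0, 2]
mid_price_docs = range_lookup(price_index, 15, 26)  # [2, 3]
print(miami_docs, mid_price_docs)
```

Both structures trade extra storage at ingestion time for less work at query time, which is the general bargain behind every index on the list.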
Star-Tree Index
Don’t pre-cube everything…
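One way to picture "don't pre-cube everything" is a tree that keeps pre-aggregated sums, including a "star" branch that covers all values of a dimension, but stops splitting once a node holds few enough rows to scan cheaply. The sketch below is a heavily simplified illustration of that idea, not Pinot's actual data structure; the rows, the `max_leaf_records` threshold name, and the function names are assumptions made for the example.

```python
# Hypothetical rows with two dimensions (city, item) and one metric (price).
rows = [
    {"city": "Miami", "item": "pizza", "price": 12},
    {"city": "Miami", "item": "pasta", "price": 18},
    {"city": "Tampa", "item": "pizza", "price": 30},
    {"city": "Orlando", "item": "pizza", "price": 25},
    {"city": "Miami", "item": "pizza", "price": 10},
]


def build_node(records, dims, max_leaf_records):
    """Pre-aggregate SUM(price); split further only while the node is large."""
    node = {"sum": sum(r["price"] for r in records)}
    if not dims or len(records) <= max_leaf_records:
        # Leaf: small enough to scan at query time, so stop pre-computing here.
        node["records"] = records
        node["dims_left"] = dims
        return node
    dim, rest = dims[0], dims[1:]
    groups = {}
    for r in records:
        groups.setdefault(r[dim], []).append(r)
    node["dim"] = dim
    node["children"] = {
        value: build_node(group, rest, max_leaf_records)
        for value, group in groups.items()
    }
    # Star branch: the same rows with this dimension ignored ("any value").
    node["star"] = build_node(records, rest, max_leaf_records)
    return node


def query_sum(node, filters):
    """SUM(price) for equality filters on the tree's dimensions."""
    if "children" not in node:
        remaining = {d: v for d, v in filters.items() if d in node["dims_left"]}
        if not remaining:
            return node["sum"]  # answered from the pre-aggregate, no scan
        return sum(
            r["price"]
            for r in node["records"]
            if all(r[d] == v for d, v in remaining.items())
        )
    dim = node["dim"]
    if dim in filters:
        child = node["children"].get(filters[dim])
        return query_sum(child, filters) if child else 0
    return query_sum(node["star"], filters)


tree = build_node(rows, ["city", "item"], max_leaf_records=2)

total = query_sum(tree, {})                                        # 95
miami = query_sum(tree, {"city": "Miami"})                         # 40
pizza = query_sum(tree, {"item": "pizza"})                         # 77
tampa_pizza = query_sum(tree, {"city": "Tampa", "item": "pizza"})  # 30
print(total, miami, pizza, tampa_pizza)
```

The threshold is the knob: a small value pre-computes more and scans less, a large value stores less and scans more, so the full cube is never required.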
Apache Pinot Architecture
Demo
Pizza Shop Demo
https://github.com/startreedata/pizza-shop-demo
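Once a Pinot cluster is running, queries can be sent to a broker over HTTP as SQL. The following is a minimal sketch, assuming a broker reachable at localhost:8099 and a table called `orders` with `status` and `price` columns; those names and the address are assumptions for illustration and should be replaced with whatever the demo actually creates. The canned response lets the parsing be tried without a running cluster.

```python
import json
import urllib.request


def build_query_request(sql, broker_url="http://localhost:8099"):
    """Build a POST request for the broker's SQL endpoint (broker URL is assumed)."""
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker_url}/query/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response):
    """Turn the broker's resultTable (column names + rows) into a list of dicts."""
    table = response["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table["rows"]]


def run_query(sql, broker_url="http://localhost:8099"):
    """Send the query and parse the result; needs a running broker."""
    with urllib.request.urlopen(build_query_request(sql, broker_url)) as resp:
        return rows_as_dicts(json.load(resp))


# Hypothetical query against a hypothetical `orders` table.
SQL = "SELECT status, COUNT(*) AS orders, SUM(price) AS revenue FROM orders GROUP BY status"

# Canned response in the result-table shape, so the parsing can be tried offline.
sample_response = {
    "resultTable": {
        "dataSchema": {
            "columnNames": ["status", "orders", "revenue"],
            "columnDataTypes": ["STRING", "LONG", "DOUBLE"],
        },
        "rows": [["DELIVERED", 2, 42.0], ["IN_TRANSIT", 1, 18.5]],
    }
}

sample_rows = rows_as_dicts(sample_response)
print(sample_rows)
```

With a cluster up, `run_query(SQL)` would replace the canned response; everything else stays the same.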
Overheard @ Big Data Fest 2023
• Thiago de Faria, on 5-year trends in big data:
  • Streaming APIs → Apache Pinot is built to solve streaming-first problems.
  • Will the data warehouse survive? → Apache Pinot is used to build customer-facing analytics, which are on the rise.
  • Integration with LLM/AI/ML → Apps built on top of Pinot, such as ThirdEye, use statistics and allow for AI/ML add-ons.
• Joe Reis, on 5-year trends in big data:
  • Democratization of data warehousing → Apache Pinot is used to build customer-facing analytics, which are on the rise.
  • Commoditized data warehousing → Apache Pinot is used to build customer-facing analytics, which are on the rise.
  • Most companies are barely doing BI, let alone AI. → Easy analytics, plus apps built on top of Pinot such as ThirdEye.
Using Real-time Analytics @ Scale
What can you do with it?
Who Uses Apache Pinot?
What’s Next?
Please Connect!!!!! I need brownie points.
Thank you for listening!

Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot