Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Keynote
Data Democracy:
Journey to
User-Facing Analytics
Xiang Fu
Co-Founder • StarTree
About Me:
Co-Founder at StarTree, a cloud-native platform for building the next generation of data analytics applications for millions of users.
Founder and PMC member of Apache Pinot, a real-time, distributed OLAP datastore.
Previously an architect on Uber's data platform team, solving streaming data serving, processing, and analytics problems at large scale.
Data Democracy - Are We There?
Internal-Facing Analytics (past to present): SQL editors and dashboards for operators and analysts
Current technology has done a great job delivering insights for INTERNAL USERS.
User-Facing Analytics (present to future): latency-sensitive analytical data apps for users and customers
To truly democratize data, we need to deliver high-quality insights to EXTERNAL USERS.
The Gap
[Figure: Time Value of Data. Value vs. time from event to insight, showing a vanishing window of opportunity for events]
Streaming Changed The Game…
- Data warehouses & lakes: hours to days
- Streams: milliseconds to seconds
And Started A Cycle
Streaming technologies like PULSAR increased speed and reduced the cost of storing events, kicking off a cycle:
- Collect more events
- Improve user experience
- Increase user engagement, which generates more events
Streaming sources (messaging, pub-sub, log aggregation) → stream processing → real-time analytics
Streaming Spawned New Use Cases
In simple terms, we need to:
- Ingest data as soon as events happen
- Query that data as soon as it’s ingested
- Do the above at scale
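As a toy illustration of the first two requirements (not of how Pinot works internally), the sketch below keeps events in a single in-process list, so each event is queryable the moment `ingest` returns. The `Event` fields, the `RealtimeTable` class, and the `sum_by_key` query are hypothetical names chosen for this example. It deliberately ignores the third requirement: a real system has to shard, index, and replicate this data to stay fast at scale.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    ts_ms: int        # event time in milliseconds (hypothetical schema)
    restaurant: str   # grouping key (hypothetical)
    amount: float


@dataclass
class RealtimeTable:
    """Toy single-process table: every ingested event is queryable immediately."""
    rows: list = field(default_factory=list)

    def ingest(self, event: Event) -> None:
        # No batch load step: the row is visible as soon as append() returns.
        self.rows.append(event)

    def sum_by_key(self, since_ms: int) -> dict:
        # Aggregate only events at or after since_ms, like a
        # "last N minutes" filter behind a user-facing dashboard.
        totals = defaultdict(float)
        for e in self.rows:
            if e.ts_ms >= since_ms:
                totals[e.restaurant] += e.amount
        return dict(totals)


table = RealtimeTable()
table.ingest(Event(ts_ms=1_000, restaurant="a", amount=12.5))
table.ingest(Event(ts_ms=2_000, restaurant="b", amount=8.0))
table.ingest(Event(ts_ms=3_000, restaurant="a", amount=4.5))
print(table.sum_by_key(since_ms=1_500))  # {'b': 8.0, 'a': 4.5}
```

The linear scan is what makes this a toy: query cost grows with every event ingested, which is exactly the problem indexing and distribution are meant to solve.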
How Do We Do Real-time Analytics?
Simple is HARD!
Enter Apache Pinot
Apache Pinot At A Glance
- Ingestion sources (real-time, batch, SQL)
- Efficient compute and indexing powerhouse
- Compressed and scalable storage (PB scale)
- Advanced query support
- Multi-tenant and distributed architecture
Apache Pinot Impact
- 5,000 queries/sec, ~5ms average latency, <100ms 95th percentile latency
- Before Pinot (2014): 1,500 queries/sec, 200M+ members, 1,000 nodes
- After Pinot (2016): 5,000 queries/sec, 700M+ members, 75 nodes
- 45X improvement in efficiency
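The slide reports a 45X efficiency gain without defining efficiency. Assuming efficiency means queries/sec served per node, and that the 1,000-node figure belongs to the pre-Pinot system (2014) while 75 nodes is the Pinot deployment (2016), a quick check shows the figures are consistent with the claim:

```python
# Slide figures; which node count belongs to which system is an assumption.
before = {"year": 2014, "qps": 1_500, "nodes": 1_000}
after = {"year": 2016, "qps": 5_000, "nodes": 75}


def qps_per_node(system: dict) -> float:
    """Queries/sec served per node (assumed definition of efficiency)."""
    return system["qps"] / system["nodes"]


ratio = qps_per_node(after) / qps_per_node(before)
print(f"{qps_per_node(before):.1f} -> {qps_per_node(after):.1f} QPS/node ({ratio:.1f}x)")
# prints "1.5 -> 66.7 QPS/node (44.4x)"
```

About 44x, which rounds to the slide's 45X. This simple ratio does not account for the member base growing from 200M+ to 700M+ over the same period.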
Apache Pinot Timeline
- 2013: Started @ LinkedIn
- 2015: Open Source
- 2019: StarTree Founded
- 2021: Apache Graduation
Apache Pinot Community Growth
- 2020: 40+ companies, 800 Slack users, 55k downloads
- 2022: 100+ companies, 2,500+ Slack users, 1M+ downloads
Apache Pinot Adoption
Apache Pinot Architecture
Strong Integration with Pulsar
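The slide only names the Pulsar integration. As a hedged sketch of what it looks like in practice, Pinot consumes a Pulsar topic through a REALTIME table whose `streamConfigs` map names the topic, the Pulsar service URL, and the consumer and decoder classes. The key names below reflect our reading of the Apache Pinot Pulsar ingestion documentation and may differ across Pinot versions; the topic, service URL, and flush thresholds are hypothetical example values, and the helper `pulsar_stream_configs` is ours, not part of any Pinot API.

```python
import json

# Hypothetical values: replace with your own topic and Pulsar service URL.
PULSAR_TOPIC = "persistent://public/default/orders"
PULSAR_SERVICE_URL = "pulsar://localhost:6650"


def pulsar_stream_configs(topic: str, service_url: str) -> dict:
    """Return a streamConfigs map for a Pinot REALTIME table reading from Pulsar.

    Key names follow our reading of the Pinot Pulsar plugin docs; verify them
    against the Pinot version you run.
    """
    return {
        "streamType": "pulsar",
        "stream.pulsar.topic.name": topic,
        "stream.pulsar.bootstrap.servers": service_url,
        "stream.pulsar.consumer.type": "lowlevel",
        "stream.pulsar.consumer.prop.auto.offset.reset": "smallest",
        "stream.pulsar.consumer.factory.class.name":
            "org.apache.pinot.plugin.stream.pulsar.PulsarConsumerFactory",
        "stream.pulsar.decoder.class.name":
            "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
        # Values are written as strings, as in the Pinot docs examples.
        "realtime.segment.flush.threshold.rows": "1000000",
        "realtime.segment.flush.threshold.time": "6h",
    }


configs = pulsar_stream_configs(PULSAR_TOPIC, PULSAR_SERVICE_URL)
print(json.dumps({"streamConfigs": configs}, indent=2))
```

Generating the map from a function rather than hand-editing JSON makes it easy to stamp out one table per topic; the printed JSON is what would be pasted into the table config.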
Real Time Use Cases
Internal-facing analytics: business analysts, platform operators
User-facing analytics: application users, business partners
- Food delivery: Long Orders Insights, Nearby Orders in App, Restaurants Manager Dashboard
- FinTech: Merchants Dashboard, Ledger Observability
Apache Pinot At Scale
- 1M+ events/sec, 200K+ queries/sec, millisecond query latency, 1PB+ data size
- 1T rows, <1s query latency, 200TB+ data size
- 30K+ queries/sec, <100ms query latency
Confidential - Do not duplicate or distribute without consent of StarTree Inc.
Democratizing data through User-Facing Analytics
- LinkedIn: Who Viewed My Profile, Publishing Analytics Platform
- Uber: Restaurant Manager, Orders Near You
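User-facing features such as Who Viewed My Profile can be pictured as many small queries, each filtered to a single user. The sketch below builds (but does not send) one such query for a Pinot broker's SQL endpoint (`POST /query/sql` with a JSON body holding a `sql` string; 8099 is the broker's default port). The `profile_views` table, its columns, the broker host, and the helper `profile_viewers_request` are all hypothetical.

```python
import json
import time
import urllib.request

# Hypothetical broker address; 8099 is Pinot's default broker port.
BROKER_URL = "http://localhost:8099/query/sql"


def profile_viewers_request(member_id: int, days: int = 7, limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a broker query for one member's recent profile viewers.

    Table `profile_views` and its columns are hypothetical.
    """
    cutoff_ms = int(time.time() * 1000) - days * 24 * 60 * 60 * 1000
    # int() coercion keeps caller-supplied values from being spliced in as raw SQL.
    sql = (
        "SELECT viewer_id, COUNT(*) AS view_count "
        "FROM profile_views "
        f"WHERE viewee_id = {int(member_id)} AND view_ts_ms >= {cutoff_ms} "
        "GROUP BY viewer_id "
        "ORDER BY COUNT(*) DESC "
        f"LIMIT {int(limit)}"
    )
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        BROKER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = profile_viewers_request(member_id=42)
print(req.get_method(), req.full_url)
print(json.loads(req.data)["sql"])
# To execute against a running broker: urllib.request.urlopen(req).read()
```

Each query touches one member's slice of the data, which is why user-facing analytics leans so heavily on filtering and indexing: thousands of such queries per second must each return in milliseconds.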
Contact Me:
Thank you!
xiangfu@startree.ai
@xiangfu0
