Photo: Courtesy of O'Reilly Conference on Flickr 
How LinkedIn Democratizes Big Data Visualization
Jonathan Wu 
Praveen Neppalli Naga 
Chi-Yi Kuan
313,000,000 
Members 
End of Q2 2014
25,000,000,000 
Page Views 
Q2 2014
3,000,000+ 
Endorsements
3,500,000+ 
Companies
What can we do with LinkedIn data?
Sales 
Talent flow between companies
Product & engineering
Is it simple? 
Member attributes 
Page View events data
Photo Credit: https://www.flickr.com/photos/johnjoh/1060267344
Data is the new vineyard
Data infra: collect & prepare data 
Collect & Prepare Data 
MySQL, Oracle, Kafka + Hadoop
Serve Data 
Pinot 
Taste Data 
Easy-to-use visualization
Data Computation 
ETL 
HDFS 
YARN
Map-Reduce 
Spark 
Tez 
Pig 
Hive 
Cubert 
Kafka 
Data Stores 
Hadoop
Data infra: Serve data 
Collect & Prepare Data 
Kafka + Hadoop 
Serve Data 
Pinot 
Taste Data 
Easy-to-use visualization
Categories of interactive analytics products
Products for members/customers with real-time interactive analytics
•Who’s Viewed Your Profile
•Ads Reporting
•Jobs Analytics
Interactive business analytics for internal use
•How feature X is performing
Real-time business monitoring
•Page view changes across mobile devices in different regions
Requirements for real-time interactive analytics 
Slice and dice billions of records, hundreds of dimensions 
End-to-end freshness of minutes, not hours
Sub-second query response times
e.g. Which are the top regions that contribute to my profile views? Which industries in those regions?
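The example question above can be read as a two-level drill-down: count profile views per region, then, for each top region, count views per industry. A minimal Python sketch over a handful of made-up in-memory records follows. The field names `region` and `industry` and the helper `top_k` are illustrative assumptions, not Pinot's schema or query API.

```python
from collections import Counter

# Hypothetical profile-view events; a real deployment would hold billions of rows.
VIEWS = [
    {"region": "SF Bay Area", "industry": "Internet"},
    {"region": "SF Bay Area", "industry": "Internet"},
    {"region": "SF Bay Area", "industry": "Finance"},
    {"region": "New York", "industry": "Finance"},
    {"region": "New York", "industry": "Finance"},
    {"region": "London", "industry": "Internet"},
]


def top_k(rows, dimension, k, **filters):
    """Count rows per value of `dimension`, keeping only rows that match `filters`."""
    counts = Counter(
        row[dimension]
        for row in rows
        if all(row[name] == value for name, value in filters.items())
    )
    return counts.most_common(k)


# Slice: which regions contribute the most views?
top_regions = top_k(VIEWS, "region", k=2)

# Dice: within each of those regions, which industries?
industries_by_region = {
    region: top_k(VIEWS, "industry", k=2, region=region)
    for region, _ in top_regions
}
```

Scanning every row per question, as this sketch does, is what stops working at billions of records and hundreds of dimensions; that gap is what the sub-second requirement is about.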
What is Pinot?
Distributed analytics infrastructure that serves interactive analytics products at LinkedIn.
•Data Indexes: Compressed columnar indexes (supports mmap and in-memory)
•Distributed System: Apache Helix for cluster management
•Ingestion: Apache Kafka (for near real-time) and Hadoop
Data Indexes
Single Value Index
•Fixed bit length encoding
•Sorted Index
•Secondary Sorted Index
Multi Value Index
•Multi-value Fixed bit length encoding
•BitMap Multi-value Index
Inverted Index
•P4Delta
•Modified P4Delta
•BitMap
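Two of the index types listed above, fixed bit length encoding and the bitmap inverted index, can be illustrated in a few lines. This is a sketch of the general techniques only, under the assumption that column values are first mapped to small integer ids through a dictionary; it is not Pinot's actual storage format, and every function name here is hypothetical. P4Delta is not shown.

```python
def build_dictionary(values):
    """Map each distinct value to a small integer id (sorted for determinism)."""
    distinct = sorted(set(values))
    return {value: i for i, value in enumerate(distinct)}


def bits_needed(cardinality):
    """Bits required to represent ids 0..cardinality-1 (at least 1)."""
    return max(1, (cardinality - 1).bit_length())


def pack(ids, width):
    """Pack ids into one Python int, `width` bits per id, first id in the lowest bits."""
    packed = 0
    for position, value_id in enumerate(ids):
        packed |= value_id << (position * width)
    return packed


def unpack(packed, width, position):
    """Read the id stored at `position`."""
    return (packed >> (position * width)) & ((1 << width) - 1)


def build_bitmap_index(ids, cardinality):
    """One bitmap (stored as an int) per id; bit i is set when row i holds that id."""
    bitmaps = [0] * cardinality
    for row, value_id in enumerate(ids):
        bitmaps[value_id] |= 1 << row
    return bitmaps


def rows_of(bitmap):
    """List the row numbers whose bit is set."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


column = ["US", "IN", "US", "UK", "IN", "US"]
dictionary = build_dictionary(column)          # three distinct values
ids = [dictionary[v] for v in column]          # forward index as ids
width = bits_needed(len(dictionary))           # bits per stored id
packed = pack(ids, width)                      # fixed bit length forward index
bitmaps = build_bitmap_index(ids, len(dictionary))  # inverted index
```

The forward index answers "what value does row i hold?" with a shift and a mask, while the inverted index answers "which rows hold this value?" without scanning the column. Filters on several dimensions can then be combined with a bitwise AND of their bitmaps.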
Cluster Management 
•Create Resources 
•Update resource metadata
•Expand/Contract partitions dynamically 
•Query Router
Data Ingestion 
Kafka for Realtime 
Hadoop for Historical
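One way to picture the two ingestion paths working together: batch data from Hadoop covers completed days, the Kafka stream covers the most recent period, and a query combines both without double counting any overlap. The sketch below assumes a simple "time boundary" rule (use historical data up to the latest day it covers, realtime data after that). The slides do not describe the actual reconciliation mechanism, so the rule, the data, and the function name are all illustrative assumptions.

```python
def count_views(historical, realtime, boundary_day):
    """Sum counts, taking days <= boundary from historical and later days from realtime."""
    total = sum(count for day, count in historical.items() if day <= boundary_day)
    total += sum(count for day, count in realtime.items() if day > boundary_day)
    return total


# Hypothetical per-day page-view counts keyed by day number.
historical = {1: 100, 2: 120, 3: 90}  # loaded from a Hadoop batch job
realtime = {3: 88, 4: 40}             # consumed from Kafka; day 3 overlaps

boundary = max(historical)            # latest day covered by the batch data
total = count_views(historical, realtime, boundary)
```

Here the overlapping day 3 is served from the historical side only, so the partial realtime count for that day is ignored.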
High Level Architecture 
PINOT 
Hadoop 
Kafka 
Historical 
Realtime 
CLUSTER MANAGER 
Controller 
Helix 
Zookeeper 
Broker 1 
Broker 2 
Server 1 
Server 2 
Server 3
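The broker and server boxes in the architecture above suggest a scatter-gather pattern: a broker forwards a query to the servers holding the data and merges their partial answers. A minimal single-process sketch follows, assuming each server holds a disjoint subset of rows. The `Server` and `Broker` classes and their methods are hypothetical names for illustration; cluster management (Controller, Helix, Zookeeper) is not modeled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Server:
    """Holds a subset of rows and answers a group-by count over them."""

    def __init__(self, rows):
        self.rows = rows

    def count_by(self, dimension):
        return Counter(row[dimension] for row in self.rows)


class Broker:
    """Fans a query out to every server and merges the partial counts."""

    def __init__(self, servers):
        self.servers = servers

    def count_by(self, dimension):
        # Scatter: ask all servers concurrently.
        with ThreadPoolExecutor(max_workers=len(self.servers)) as pool:
            partials = list(pool.map(lambda s: s.count_by(dimension), self.servers))
        # Gather: add up the per-server counts.
        merged = Counter()
        for partial in partials:
            merged.update(partial)
        return merged


servers = [
    Server([{"device": "ios"}, {"device": "android"}]),
    Server([{"device": "ios"}]),
    Server([{"device": "android"}, {"device": "ios"}]),
]
broker = Broker(servers)
result = dict(broker.count_by("device"))
```

Because counts are additive, the broker can merge partial results in any order. Aggregations that are not additive, such as distinct counts, would need a different merge step.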
Core Features 
Low latency and high QPS OLAP Queries with real-time ingestion 
Support complex dimensions 
Operational simplicity 
Data bootstrapping & reconciliation
Usage @ LinkedIn
About 18 member-facing products on LinkedIn.com
Internal reporting
Open source: coming soon
Reporting UI: serve & taste data 
Collect & Prepare Data 
Kafka + Hadoop 
Serve Data 
Pinot 
Taste Data 
Easy-to-use visualization
Business need
I want to access big data without running SQL
Start a new dashboard with one click
Select what metrics/dimensions you want
Charts are rendered in just a few seconds
Zoom into a single chart
Filter on various dimensions
Access everywhere
Enterprise analytics portal
Portal that connects dashboards, internal reports, and internal wiki pages
Summary
Scale of the data
Pinot for interactive analysis
Self-service visualization for insights
How LinkedIn Democratizes Big Data Visualization
We are hiring 
Jonathan Wu jowu@linkedin.com www.linkedin.com/in/jiyewu
Praveen Neppalli Naga pneppalli@linkedin.com www.linkedin.com/in/pneppalli
Chi-Yi Kuan ckuan@linkedin.com www.linkedin.com/in/chiyikuan
650-605-2184 
650-962-3299 
650-426-6301
