Krist Wongsuphasawat / @kristw
Data Visualization
A quick tour for data science enthusiasts
Data Visualization
What is it about?
What is it good for?
How is it related to data science?
Example projects
…
1. What is it about?
“A picture is worth more than a thousand words.”
- Someone once said
Data Visualization: A Quick Tour for Data Science Enthusiasts
Data => Picture
Data => Visual display
Help audience consume a lot of information rapidly
2. What is it good for?
Example / History
Data:
• location (lat, lon => x, y), quantity of troops (width), direction (color)
• time (x), temperature (y)
Example / Cholera epidemic
Data: list of deceased patients
• Mr. Smith, who lived at 11 Sunny St.
• Miss White, who lived at 23 Cloudy Rd.
• Mr. Jones, who lived at 30 Rainy St.
• Mrs. Robinson, who lived at 34 Windy Rd.
• …
John Snow
What is it good for?
• Storytelling: communicate known information
• Exploratory data analysis: explore data to reveal insights
More powerful
Visualization = Visual display + Interaction
3. How is it related to data science?
Turn data into
• valuable insights
• data product
• interesting stories
raw data => data wrangling => exploratory data analysis => in-depth analysis => report results => output (insights, products, stories)
communication, storytelling
4. Example projects
4.1 Ballon d’Or
FIFA released voting data
Rules
• 3 voters / country
  • National team captain
  • National team coach
  • Journalist (media)
• Each voter selects 3 players for 1st, 2nd and 3rd place
Data Wrangling
• Given data are tables in PDF.
• Extract to CSV.
• Format data to desired format.
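A minimal sketch of the "format data to desired format" step, assuming the tables have already been extracted to CSV. The column names (country, role, first, second, third) and the sample rows are hypothetical; the actual layout of the FIFA data is not shown in the slides. It reshapes one row per voter into one row per vote, which is usually easier to aggregate and visualize.

```python
import csv
import io


def votes_to_long(csv_text):
    """Turn one-row-per-voter data into one-row-per-vote records."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Each voter names three players; emit one record per ranked pick.
        for rank, column in enumerate(("first", "second", "third"), start=1):
            rows.append({
                "country": record["country"],
                "role": record["role"],
                "rank": rank,
                "player": record[column],
            })
    return rows


# Made-up sample in the assumed (hypothetical) column layout.
sample = """country,role,first,second,third
Atlantis,captain,Player A,Player B,Player C
Atlantis,coach,Player B,Player A,Player C
"""
long_rows = votes_to_long(sample)
```

With the long format, questions such as "who received the most 1st place votes from coaches?" become simple filters and counts.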
Demo / Ballon d’Or
https://medium.com/@kristw/who-voted-for-who-diving-into-ballon-dor-voting-data-e09138ba9712
4.2 Public-facing vis & New Year 2013
interactive.twitter.com
Geo heatmap (low density to high density)
San Francisco
flickr.com/photos/twitteroffice/8798020541
Rebuild the world based on tweet volumes
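One common way to get from geotagged Tweets to a density heatmap is to count points per grid cell. The sketch below is an illustration only, not the method used for the maps above: the function name, the 1-degree cell size, and the sample coordinates are all assumptions.

```python
from collections import Counter


def bin_points(points, cell_size_deg=1.0):
    """Count points per grid cell; keys are (lat_index, lon_index)."""
    counts = Counter()
    for lat, lon in points:
        # Floor division maps each coordinate to the index of its cell.
        cell = (int(lat // cell_size_deg), int(lon // cell_size_deg))
        counts[cell] += 1
    return counts


# Made-up points: two near San Francisco, one near Bangkok.
tweets = [(37.77, -122.42), (37.78, -122.41), (13.75, 100.50)]
density = bin_points(tweets)
```

Each cell count can then be mapped to a color from low density to high density.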
twitter.github.io/interactive/andes/
How are these phrases used in Tweets?
Is there any pattern?
Big data wrangling
Having all Tweets
How people think I feel. How I really feel.
Challenges
• Too much data, want only relevant Tweets
  • contain “สวัสดีปีใหม่” (“Happy New Year” in Thai)
  • variations: หวัดดีปีใหม่, หวัดดีปีหม่ายยย
  • typos: หวัดตีปีใหม่
• Need to aggregate & reduce size
• Long processing time (hours)
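A minimal sketch of the "want only relevant Tweets" filter, assuming a hand-written regular expression built from the spellings listed above. The pattern and the function name are illustrative; a real filter would need a longer, curated list of variants and typos.

```python
import re

# Hand-written pattern (an assumption) covering the greeting, the listed
# informal variations, and the listed typo. "ย+" allows a stretched ending.
GREETING = re.compile(r"(?:สวัสดี|หวัดดี|หวัดตี)ปี(?:ใหม่|หม่าย+)")


def is_new_year_tweet(text):
    """Return True if the text contains a known form of the greeting."""
    return GREETING.search(text) is not None


examples = ["สวัสดีปีใหม่", "หวัดดีปีใหม่", "หวัดดีปีหม่ายยย", "หวัดตีปีใหม่"]
matches = [is_new_year_tweet(text) for text in examples]
```

Running a cheap filter like this first keeps the later aggregation step small.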
Workflow
1) Data Storage: Hadoop Cluster
2) Tool: Pig / Hive / Scalding (slow)
3) Smaller dataset, on your laptop
4) Tool: node.js / python / etc. (fast)
5) Final dataset
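The reduction from all Tweets to a smaller dataset can be pictured as a group-and-count. The sketch below runs on a laptop-sized list for illustration; on the cluster the same idea would be expressed in Pig, Hive or Scalding. The record layout (ISO-8601 timestamp, text) and the per-minute granularity are assumptions.

```python
from collections import Counter
from datetime import datetime


def count_per_minute(records):
    """records: iterable of (ISO-8601 timestamp string, tweet text)."""
    counts = Counter()
    for timestamp, _text in records:
        # Truncate each timestamp to the start of its minute.
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        counts[minute.isoformat()] += 1
    return counts


# Made-up records; the text is irrelevant once the Tweets are filtered.
records = [
    ("2013-01-01T00:00:05", "..."),
    ("2013-01-01T00:00:40", "..."),
    ("2013-01-01T00:01:10", "..."),
]
per_minute = count_per_minute(records)
```

The output has one row per minute instead of one row per Tweet, which is small enough to iterate on quickly with local tools.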
Exploratory Data Analysis
Improve design for releasing to public
Demo / New Year 2013
twitter.github.io/interactive/newyear2014/
Another fun fact:
Developed using 2012 data
Then updated the data on Jan 2, 2013
4.3 Data Analysis Tool
Logging user activities
• Users use Twitter.
• Engineers write Twitter and instrument it to log data in Hadoop.
• Product Managers are curious.
What is being logged?
Activities:
• tweet
  • tweet from home timeline on twitter.com
  • tweet from search page on iPhone
• sign up
• log in
• retweet
• etc.
Organize?
log event a.k.a. “client event” [Lee et al. 2012]
Event name:
client : page : section : component : element : action
web : home : timeline : tweet_box : button : tweet
Each log event contains:
1) User ID
2) Timestamp
3) Event name
4) Event detail
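The six-level event name can be split into named fields. This is a small sketch; the class and function names are hypothetical, and only the field order comes from the slide.

```python
from typing import NamedTuple


class EventName(NamedTuple):
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str


def parse_event_name(name):
    """Split 'client : page : section : component : element : action'."""
    parts = [part.strip() for part in name.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 components, got {len(parts)}: {name!r}")
    return EventName(*parts)


event = parse_event_name("web : home : timeline : tweet_box : button : tweet")
```

Here `event.page` is "home" and `event.action` is "tweet", so events can be grouped by any level of the hierarchy.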
Twitter for Banana
Count page visits
banana : home : - : - : - : impression
home page
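Counting page visits then amounts to counting events that match the page-impression pattern. A minimal sketch, assuming event names arrive as strings in the format shown above; the function name and the sample log are made up.

```python
from collections import Counter


def count_impressions(event_names, client="banana"):
    """Count page-level impression events per page for one client."""
    counts = Counter()
    for name in event_names:
        parts = [part.strip() for part in name.split(":")]
        # A page impression has "-" for section, component and element.
        if (len(parts) == 6 and parts[0] == client
                and parts[2:5] == ["-", "-", "-"] and parts[5] == "impression"):
            counts[parts[1]] += 1
    return counts


log = [
    "banana : home : - : - : - : impression",
    "banana : home : timeline : tweet_box : button : tweet",
    "banana : profile : - : - : - : impression",
    "banana : home : - : - : - : impression",
]
visits = count_impressions(log)
```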
User sessions
Session#1: start => A => B => end
Session#2: start => A => B => end
Session#3: start => A => C => end
Session#4: start => A => end
A, B and C are client events.
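A sketch of how client events might be grouped into sessions. The slides do not say how a session is delimited; the 30-minute inactivity gap below is an assumption, as are the tuple layout and the function name.

```python
from itertools import groupby

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold, not from the slides


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """events: iterable of (user_id, timestamp_seconds, event_name)."""
    sessions = []
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    for _user, user_events in groupby(ordered, key=lambda e: e[0]):
        current, last_time = [], None
        for _uid, timestamp, name in user_events:
            # A long pause closes the current session and starts a new one.
            if last_time is not None and timestamp - last_time > gap:
                sessions.append(current)
                current = []
            current.append(name)
            last_time = timestamp
        sessions.append(current)
    return sessions


events = [
    ("u1", 0, "A"), ("u1", 60, "B"),
    ("u1", 7200, "A"),
    ("u2", 10, "A"), ("u2", 20, "C"),
]
sessions = sessionize(events)
```

Each session is an ordered list of event names, the same shape as the sessions drawn above.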
Funnel
home page => profile page
Funnel analysis
• home page => profile page
  banana : home : - : - : - : impression => banana : profile : - : - : - : impression
  1 job, 1 hour
• home page => search page
  banana : home : - : - : - : impression => banana : search : - : - : - : impression
  2 jobs, 2 hours
• Specify all funnels manually!
  n jobs, n hours
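For one manually specified funnel, the per-step counts can be sketched as below. Short page names stand in for the full event names, and the function name and sample sessions are hypothetical. Each funnel needs its own pass over the sessions, which is why the cost grows with the number of funnels.

```python
def funnel_counts(sessions, steps):
    """Return how many sessions reach each step, visiting steps in order."""
    reached = [0] * len(steps)
    for session in sessions:
        position = 0
        for event in session:
            # Advance only when the next expected step is seen.
            if position < len(steps) and event == steps[position]:
                reached[position] += 1
                position += 1
    return reached


sessions = [
    ["home", "profile"],
    ["home", "search"],
    ["home"],
    ["profile", "home"],
]
home_to_profile = funnel_counts(sessions, ["home", "profile"])
home_to_search = funnel_counts(sessions, ["home", "search"])
```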
Goal
banana : home : - : - : - : impression
… ……
1 job => all funnels, visualized
home page
Aggregate
4 sessions merged into one tree:
start => A, then
• A => B => end
• A => C => end
• A => end
Same view for 4,000,000 sessions
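The aggregation can be sketched as a prefix tree with a counter on every node: sessions that share the same beginning share the same path. This is an illustration under assumed names and a plain-dict data structure, not the implementation behind the demo.

```python
def new_node():
    return {"count": 0, "children": {}}


def aggregate(sessions):
    """Merge sessions into a tree; the root stands for 'start'."""
    root = new_node()
    for session in sessions:
        root["count"] += 1
        node = root
        # Walk the session, creating nodes as needed, and close with "end".
        for event in list(session) + ["end"]:
            node = node["children"].setdefault(event, new_node())
            node["count"] += 1
    return root


sessions = [["A", "B"], ["A", "B"], ["A", "C"], ["A"]]
tree = aggregate(sessions)
node_a = tree["children"]["A"]
```

One pass over the sessions yields the counts for every path at once, so any funnel can be read off the tree instead of being specified and computed separately.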
Demo / Flying Sessions
Krist Wongsuphasawat and Jimmy Lin. Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter. In Proc. IEEE Conference on Visual Analytics Science and Technology (VAST), Paris, France, 13 November 2014.
Data Visualization
What is it about?
Data => Visual display + Interaction
What is it good for?
Exploratory data analysis & storytelling
How is it related to data science?
It is one of the skills often utilized in the process.
Example projects
interactive.twitter.com @kristw / kristw.yellowpigz.com
Thank you
Questions?
