FROM DATA TO VISUALIZATION
Krist Wongsuphasawat (@kristw)
Senior Data Visualization Scientist, Twitter
Twitter Analytics / Visual Insights
Internal
Dashboarding system
Exploratory data visualization tools
External
Public-facing visualizations
#interactive
http://twitter.github.io/interactive
Examples
What are visualizations?
pretty graphics
POWER OF THE EYES
pretty
MEANINGFUL
Anscombe’s Quartet
     #1             #2             #3             #4
   X      Y       X      Y       X      Y       X      Y
10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
 8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
 9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
 6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
 4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
 7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
 5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89
Anscombe’s Quartet
Property Value
Mean of X 9.0
Variance of X 10.0
Mean of Y 7.5
Variance of Y 3.75
Correlation between X and Y 0.816
Linear regression y = 3.0 + 0.5x
#1 #2 #3 #4
Identical statistics!
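The summary statistics can be checked directly from the four tables. The sketch below is illustrative and not part of the original slides; the function and variable names are made up, and it assumes population variance (dividing by n), which is what reproduces the variance values listed above.

```python
# The four Anscombe datasets, copied from the tables above.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "#1": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                      7.24, 4.26, 10.84, 4.82, 5.68]),
    "#2": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "#3": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                      6.08, 5.39, 8.15, 6.42, 5.73]),
    "#4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Mean, population variance, correlation, and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / n,  # population variance (assumption, see lead-in)
        "mean_y": mean_y,
        "var_y": syy / n,
        "corr": sxy / (sxx * syy) ** 0.5,
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four rows print the same numbers, which is the point of the slide: the statistics alone cannot tell the datasets apart.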
Anscombe’s Quartet
[Figure: scatterplots of datasets #1, #2, #3, #4]
but very different
Napoleon’s March
geography
time
course (attack/retreat)
quantity of troops
temperature
direction
London Cholera Outbreak
Visualization
• Power
• Understand data quickly
• Discover hidden facts
• Usage
• Storytelling / Reporting
• Exploratory data analysis
“Visualization”
• Information Visualization (academia)
• InfoVis
• Data Visualization (commonly used)
• DataVis
• infographics (...)
How to start?
• What tool should I use?
1. What type of data do I have?
DATA
1) What type of data?
vis7
vis5
vis3
vis2
vis1
vis6
vis4
Many options...
Which visualization technique should I use?
1) What type of data?
• Visualizations are categorized by data types:
• 2- and 3-dimensional
• Multi-dimensional
• Temporal
• Tree
• Network
• etc.
Let’s take a tour.
2D, 3D data
(real world objects)
a.k.a. Scientific Visualization (SciVis)
2D: Maps
3D: Brain
Multi-dimensional data
abstract dimensions
(+ real world dimensions)
Flowers
species sepalLength sepalWidth petalLength petalWidth
setosa 5.1 3.5 1.4 0.2
setosa 4.9 3.0 1.4 0.2
setosa 4.7 3.2 1.3 0.2
virginica 4.6 3.1 1.5 0.2
virginica 5.0 3.6 1.4 0.2
virginica 5.4 3.9 1.7 0.4
DATA
Scatterplot
http://bl.ocks.org/mbostock/3887118
Sepal Length
Sepal Width
Scatterplot Matrix
http://bl.ocks.org/mbostock/4063663
Sepal Length, Sepal Width, Petal Length, Petal Width
Cars
Name          economy (mpg)  cylinders  power (hp)  weight (lb)  0-60 mph (s)
Ford Mustang  18             6          88          3139         14.5
Honda Accord  31.5           4          68          2045         18.5
Honda Civic   24             4          97          2489         15
Mazda RX-7    23.7           3          100         2420         12.5
DATA
Parallel Coordinates
http://bl.ocks.org/jasondavies/1341281
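Parallel coordinates give every dimension its own axis and draw each row as a line across those axes. A minimal sketch of the scaling step, using the cars table above and assuming plain min-max scaling per axis (the linked example may scale differently; all names here are illustrative):

```python
DIMENSIONS = ["economy (mpg)", "cylinders", "power (hp)",
              "weight (lb)", "0-60 mph (s)"]
CARS = {
    "Ford Mustang": [18, 6, 88, 3139, 14.5],
    "Honda Accord": [31.5, 4, 68, 2045, 18.5],
    "Honda Civic": [24, 4, 97, 2489, 15],
    "Mazda RX-7": [23.7, 3, 100, 2420, 12.5],
}


def axis_positions(rows):
    """Scale each dimension to [0, 1] independently, one axis per column."""
    columns = list(zip(*rows.values()))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return {
        name: [
            # A constant column has no spread; place it mid-axis.
            (value - low) / (high - low) if high > low else 0.5
            for value, low, high in zip(values, lows, highs)
        ]
        for name, values in rows.items()
    }


POSITIONS = axis_positions(CARS)
for name, points in POSITIONS.items():
    # Each car becomes one polyline: (axis index, height on that axis).
    print(name, [(axis, round(height, 2)) for axis, height in enumerate(points)])
```

Scaling each axis separately is what lets miles per gallon and pounds share one chart even though their units and ranges differ.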
The Geography of Tweets
@miguelrios
tweet counts latitude longitude
20,000 27.174526 78.042153
9,000 49.124093 52.201304
1,000 12.2995 31.59592
... ... ...
DATA
abstract dimension: tweet counts
real world dimensions: latitude, longitude
Temporal Data
value changes over time
events
Line charts
http://bl.ocks.org/mbostock/3884955
Calendar chart
Events on timeline
http://evolutionofweb.appspot.com/#/evolution/day
Trees
hierarchy
Tree
http://bl.ocks.org/mbostock/4339083
Stock Market
All stocks
Financial Healthcare Technology ...
Apple Google Canon ...
DATA
TreeMaps
http://www.marketwatch.com/tools/stockresearch/marketmap
Icicle
http://bl.ocks.org/mbostock/1005873
Sunburst
http://bl.ocks.org/mbostock/4348373
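Treemaps, icicle plots, and sunbursts typically size each node by the total of the leaves beneath it. A minimal sketch of that roll-up on the stock hierarchy, assuming a nested-dictionary layout; the numeric values are made-up placeholders, not real market data, and the grouping of companies under Technology is an assumption.

```python
# Placeholder weights: every number below is invented for illustration.
STOCKS = {
    "name": "All stocks",
    "children": [
        {"name": "Financial", "value": 30},
        {"name": "Healthcare", "value": 20},
        {
            "name": "Technology",
            "children": [
                {"name": "Apple", "value": 25},
                {"name": "Google", "value": 15},
                {"name": "Canon", "value": 10},
            ],
        },
    ],
}


def total(node):
    """Sum of the leaf values under a node; a leaf carries its own value."""
    children = node.get("children")
    if not children:
        return node.get("value", 0)
    return sum(total(child) for child in children)


def shares(node):
    """Fraction of the parent's area that each direct child would get."""
    whole = total(node)
    return {child["name"]: total(child) / whole for child in node["children"]}


print(total(STOCKS))
print(shares(STOCKS))
```

The three techniques differ only in how they turn these shares into shapes: nested rectangles, stacked bars, or ring segments.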
Networks
nodes and edges
Character Co-occurrences
{
  nodes: [
    'valjean',
    'fantine',
    'cosette',
    ...
  ],
  edges: [
    {character1: 'valjean', character2: 'fantine', count: 10},
    {character1: 'valjean', character2: 'cosette', count: 5},
    ...
  ]
}
DATA
Node-link diagram
http://bl.ocks.org/mbostock/4062045
Matrix
http://bost.ocks.org/mike/miserables/
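A node-link diagram and a matrix view draw the same nodes and edges; the matrix view needs them as a grid first. A minimal sketch of that conversion, assuming the co-occurrence weight is stored under a key named `count` (a hypothetical name, since the slide leaves the weight unlabeled) and that co-occurrence is symmetric:

```python
GRAPH = {
    "nodes": ["valjean", "fantine", "cosette"],
    "edges": [
        {"character1": "valjean", "character2": "fantine", "count": 10},
        {"character1": "valjean", "character2": "cosette", "count": 5},
    ],
}


def adjacency_matrix(nodes, edges):
    """Build a symmetric matrix; cell [i][j] holds the co-occurrence count."""
    index = {name: position for position, name in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for edge in edges:
        i = index[edge["character1"]]
        j = index[edge["character2"]]
        matrix[i][j] += edge["count"]
        if i != j:
            # Co-occurrence has no direction, so mirror the cell.
            matrix[j][i] += edge["count"]
    return matrix


MATRIX = adjacency_matrix(GRAPH["nodes"], GRAPH["edges"])
for name, row in zip(GRAPH["nodes"], MATRIX):
    print(f"{name:>8}", row)
```

Each cell of the grid maps to one colored square in the matrix view, so the row and column order chosen for `nodes` decides which clusters become visible.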
Combination
Multi-D + Temporal
Multi-D + Tree
Multi-D + Network
Temporal + Tree
Temporal + Network
...
Life Expectancy
(Multi-D + Temporal)
http://www.gapminder.org/videos/the-river-of-myths/
VISUALIZATION
visual encodings + interactions
• visual encodings (data type): bar chart, line chart, matrix, node-link, treemaps, etc., or multiple views
• interactions: tooltips, animation, highlight, filter, etc.
DATA
1) What type of data?
vis7
vis3
vis4
Fewer options...
Still, which one should I use?
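Knowing the data type narrows the candidates to the techniques covered in the tour. A minimal sketch of that lookup, listing only techniques named in this talk; the dictionary keys and the function name are illustrative, and the lists are not exhaustive.

```python
TECHNIQUES_BY_DATA_TYPE = {
    "2D/3D": ["map"],
    "multi-dimensional": ["scatterplot", "scatterplot matrix",
                          "parallel coordinates"],
    "temporal": ["line chart", "calendar chart", "timeline"],
    "tree": ["node-link tree", "treemap", "icicle", "sunburst"],
    "network": ["node-link diagram", "matrix"],
}


def candidates(data_type):
    """Return the techniques from the tour that fit one data type."""
    try:
        return list(TECHNIQUES_BY_DATA_TYPE[data_type])
    except KeyError:
        known = ", ".join(TECHNIQUES_BY_DATA_TYPE)
        raise ValueError(f"unknown data type {data_type!r}; known: {known}")


print(candidates("network"))
```

The lookup only shortens the list. Choosing among the remaining candidates depends on the audience, goal, and tasks.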
How to start?
• What tool should I use?
1. What type of data do I have?
2. What do I want from the data?
DATA
2) What do I want from the data?
• Many ways to visualize one type of data.
• Things to consider:
• audience (data scientist, execs, etc.)
• goal (storytelling, exploratory analysis)
• tasks
Storytelling
Exploratory
Four more years
https://www.youtube.com/watch?v=01un0ORjQps
Photogrid (Treemap + photo)
http://twitter.github.io/interactive/sochi
Soccer Tournament
https://uclfinal.twitter.com/
State of the Union
http://twitter.github.io/interactive/sotu2014/#p1
Ok, now tools.
1. What type of data do I have?
2. What do I want from the data?
Tools
Option 1: Programming library. You have to write code.
Option 2: Packaged software. (Mostly) no coding involved.
Programming libraries
• d3.js, Processing, R, etc.
• Copy and modify from examples.
• Can do custom stuff (if you can figure out how)
• More overhead for common tasks
Packaged software
• Tableau (multi-dimensional)
• Gephi (graph)
• NodeXL (graph)
• Research projects (contact authors)
• Just use the software. No hassle of coding or debugging.
• Functionality is limited to what the tool can do
• Custom designs are more difficult
Ideal workflow
1. What type of data do I have?
2. What do I want from the data?
3. Pick appropriate techniques/tools
4. Done!
Not that easy!
Real-life workflow
1. What type of data do I have?
2. Pre-process data (data are dirty: transform)
3. What do I want from the data?
4. Pick appropriate techniques/tools
5. See results
6. Unsatisfied: change goal, change perspective
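The pre-processing step usually means turning raw text into typed values and dropping rows that cannot be used. The talk does not say which cleaning was actually done, so the sketch below is only an illustration on rows shaped like the Geography of Tweets table; the function name and validity checks are assumptions.

```python
RAW_ROWS = [
    "20,000 27.174526 78.042153",
    "9,000 49.124093 52.201304",
    "1,000 12.2995 31.59592",
    "... ... ...",  # elided rows appear as placeholders in the table
]


def parse_row(line):
    """Return (count, latitude, longitude), or None if the row is unusable."""
    parts = line.split()
    if len(parts) != 3:
        return None
    try:
        count = int(parts[0].replace(",", ""))  # "20,000" -> 20000
        latitude = float(parts[1])
        longitude = float(parts[2])
    except ValueError:
        return None
    # Reject coordinates that cannot exist on a map.
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        return None
    return count, latitude, longitude


CLEAN = [row for row in map(parse_row, RAW_ROWS) if row is not None]
for row in CLEAN:
    print(row)
```

Looking at the cleaned rows often changes the answer to "What do I want from the data?", which is why the real-life workflow loops instead of running once.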
New year 2014
http://twitter.github.io/interactive/newyear2014/
Behind the scenes
FROM DATA TO VISUALIZATION
@kristw
DATA first, not tools.
choose: visual encodings (by data types) + interactions
twitter.github.io/interactive
Thank you

From Data to Visualization, what happens in between?