information visualisation
Alan Dix
example
Map your moves
where New Yorkers move (10 years data)
distorted map
circle = moves for
one zip code
red – out
blue – in
overlaid
http://moritz.stefaner.eu/projects/map%20your%20moves/
example
Map your moves
interactive:
selecting a zip code
shows where
movements to/from
also hiding:
what you don’t show
also important
http://moritz.stefaner.eu/projects/map%20your%20moves/
what is visualisation?
making data easier to understand
using direct sensory experience
especially visual!
but can have aural, tactile ‘visualisation’
direct sensory experience
N.B. sensory rather than linguistic
sort of right/left brain stuff!
but ... may include text, numbers, etc.
visualising in text
alignment - numbers
think purpose!
which is biggest?
532.56
179.3
256.317
15
73.948
1035
3.142
497.6256
visualising in text
alignment - numbers
visually:
long number = big number
align decimal points
or right align integers
627.865
1.005763
382.583
2502.56
432.935
2.0175
652.87
56.34
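A minimal sketch of decimal-point alignment, assuming the numbers arrive as strings (so no digits are lost to float formatting). The function name align_decimals is illustrative, not from the slides.

```python
def align_decimals(values):
    """Pad number strings so that every decimal point sits in the same column."""
    parts = []
    for v in values:
        whole, _, frac = str(v).partition(".")
        parts.append((whole, frac))
    left = max(len(whole) for whole, _ in parts)   # widest integer part
    right = max(len(frac) for _, frac in parts)    # widest fractional part
    lines = []
    for whole, frac in parts:
        sep = "." if frac else " "                 # integers get a blank in the point column
        lines.append(whole.rjust(left) + sep + frac.ljust(right))
    return lines


numbers = ["627.865", "1.005763", "382.583", "2502.56",
           "432.935", "2.0175", "652.87", "56.34"]
for line in align_decimals(numbers):
    print(line)
```

With the points aligned, the longest integer part (2502.56) stands out as the biggest value, whereas 1.005763 no longer looks large just because it has many digits.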
visualising in text
TableLens
like a ‘spreadsheet’ ...
... but some rows squashed to one pixel high
numbers become small histogram bars
visualising in text
TableLens
N.B. also an example of focus+context
context
whole dataset
can also be seen
in overview
focus
some rows
in full detail
especially visual
by some estimates up to 50% of the cortex is involved in visual processing!
... but disability, context, etc.,
may mean non-visual forms needed
why visualisation?
for the data analyst
scientist, statistician, possibly you!
for the data consumer
audience, client, reader, end-user
why visualisation?
consumer
understanding
rhetoric
focus on well
understood, simple
representations
why visualisation?
consumer
understanding
rhetoric
to help others see
what the analyst
has already seen
infographics
data journalism
http://www.guardian.co.uk/news/datablog/2010/oct/18/deficit-debt-government-borrowing-data
why visualisation?
consumer
understanding
rhetoric
to persuade readers
of particular point
(and not others!)
lies, damn lies,
and graphs
the business plan
hockey stick!
why visualisation?
analyst
understanding
exploration
powerful, often novel
visualisations,
training possible
why visualisation?
analyst
understanding
exploration
to make more clear
particular aspects
of data
confirming hypotheses
e.g. box plots in stats
noticing exceptions
graph from: Measurement of the neutrino velocity with the OPERA detector in the CNGS beam
why visualisation?
analyst
understanding
exploration
to find new things
that have not been
previously considered
seeking the unknown
avoiding the obvious
wary of happenstance
a brief history of visualisation
from 2500 BC to 2012
a brief history ...
static visualisation
– the first 2500 years
interactive visualisation
– the glorious ’90s
and now?
– web and mass data
– visual analytics
static visualisation
from clay tablets to Tufte
Mesopotamian tablets
10th Century time line
1855 Paris-Lyon train timetable
Excel etc.
static visualisation
read Tufte’s books ...
– The Visual Display of Quantitative Information
– Envisioning Information
– Visual Explanations
interactive visualisation
early 1990s growing graphics power
– 3D graphics
– complex visualisations
– real-time interaction possible
... and now
loads of data
web visualisation
data journalism
http://www.guardian.co.uk/news/datablog/2010/oct/18/deficit-debt-government-borrowing-data
http://www-958.ibm.com/software/data/cognos/manyeyes/
and visual analytics!
visualisation in context
data visualisation
plain visualisation
direct
interaction
data visualisation
visual analytics
processing
world
organisational
social & political
context
direct
interaction
data visualisation
decision
action
processing
the big picture
designing visualisation
choosing representations
visualisation factors
– visual ‘affordances’
• what we can see
– objectives, goals and tasks
• what we need to see
– aesthetics
• what we like to see
what we can see
what we need to see
what we like to see
trade-off
visualisation factors
– visual affordances
– objectives, goals and tasks
– aesthetics
static representation → trade-off
interaction reduces trade-off
– stacking histogram, overview vs. detail, etc. etc.
relaxing constraints
normal stacked histogram
good for:
–overall trend
–relative proportions
–trend in bottom
category
bad for others
–what is happening
to bananas?
?
make your own (iii)
relaxing constraints
interactive stacking histograms ...
or ... dancing histograms
normal histogram
except ...
hover over cell
to reveal detail
click on legend
to change
baseline
demonstration
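A rough sketch of the baseline change behind a dancing histogram, assuming the chart is held as a dictionary of categories with one value per bar. The function name stack_with_baseline and the fruit figures are made up for illustration. It only computes where each segment sits, and drawing is left to whatever toolkit is in use.

```python
def stack_with_baseline(series, baseline):
    """Return {category: [(bottom, top), ...]} with `baseline` stacked first.

    series maps each category to a list of values, one per bar.
    """
    # The chosen category goes to the bottom; the rest keep their order.
    order = [baseline] + [c for c in series if c != baseline]
    running = [0.0] * len(series[baseline])
    layout = {}
    for category in order:
        spans = []
        for i, value in enumerate(series[category]):
            spans.append((running[i], running[i] + value))
            running[i] += value
        layout[category] = spans
    return layout


# Hypothetical data: three bars, three categories.
sales = {"apples": [3, 4, 5], "bananas": [2, 1, 3], "cherries": [1, 2, 2]}
layout = stack_with_baseline(sales, "bananas")
print(layout["bananas"])
```

Once bananas sit on the baseline their segments all start at zero, so their trend can be read as easily as the bottom category in a normal stacked histogram.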
kinds of interaction
highlighting and focus
drill down and hyperlinks
overview and context
changing parameters
changing representations
temporal fusion
Shneiderman’s
visualisation mantra
overview first,
zoom and filter,
then details on demand
http://www.sapdesignguild.org/community/book_people/visualization/controls/FilmFinder.htm
overview
zoom and filter
using sliders
details
on demand
classic visualisations
displaying groups/clusters
numeric attributes
– use average
or region
categorical attributes
– show values of attributes common to cluster
text, images, sound
– no sensible ‘average’ to display
– use typical documents/images
– central to cluster ...
or spread within cluster
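A small sketch of the two cases: an average for numeric attributes, and a 'typical' (most central) member where no average makes sense. The helper names and the word-overlap distance are assumptions chosen for illustration, and any other distance function could be passed in.

```python
def centroid(points):
    """Average position of a cluster of numeric points (tuples of equal length)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def medoid(items, distance):
    """The member with the smallest total distance to all others: a 'typical' item."""
    return min(items, key=lambda a: sum(distance(a, b) for b in items))


def jaccard_distance(a, b):
    """Word-overlap distance between two non-empty texts (0 = same word set)."""
    words_a, words_b = set(a.split()), set(b.split())
    return 1 - len(words_a & words_b) / len(words_a | words_b)


print(centroid([(0, 0), (2, 4)]))
print(medoid(["tree map layout", "tree map", "cone tree"], jaccard_distance))
```

The medoid is always an actual member of the cluster, so it can be shown to the user as a real document or image.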
using clusters
the scatter/gather browser
take a collection of documents
scatter:
– group into fixed number of clusters
– displays clusters to user
gather:
– user selects one or
more clusters
– system collects
these together
scatter:
– system clusters this
new collection
...
displaying clusters
scatter-gather browser
keywords (created by clustering algorithm)
‘typical’ documents
(with many cluster keywords)
hierarchical data
hierarchies are everywhere!
– file systems
– organisation charts
– taxonomies
– classification trees
– ontologies
– xml
problems with trees ...
width grows rapidly
hard to fit text labels
overlapping low level nodes
use 3D?
cone tree
– use stacked circles of subtrees
good use of 3D
still have occlusion ...
but ‘normal’ in 3D
shadows help to
disambiguate
but text labels
difficult
cone trees → cam trees
horizontal layout makes labels readable
small things matter!
dissect 2D space - treemaps
takes tree of items with some ‘size’
– e.g. file hierarchy, financial accounts
alternately divides space horizontally/vertically for
each level, proportionate to total size
example tree:
x [6]
  x/a – 4
  x/b – 2
y [3]
  y/c – 1
  y/d – 1
  y/e – 1
http://www.cs.umd.edu/hcil/treemap-history/
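A minimal sketch of the basic treemap layout described above, assuming a tree held as nested (name, children-or-size) pairs. The function names and the 9 by 3 canvas are illustrative choices. The split direction alternates at each level and each child gets space in proportion to its total size.

```python
def size(node):
    """Total size of a node: its own size if a leaf, else the sum of its children."""
    _, payload = node
    if isinstance(payload, (int, float)):
        return payload
    return sum(size(child) for child in payload)


def slice_and_dice(node, x, y, w, h, horizontal=True, out=None):
    """Return {leaf name: (x, y, width, height)} for the rectangle (x, y, w, h)."""
    if out is None:
        out = {}
    name, payload = node
    if isinstance(payload, (int, float)):
        out[name] = (x, y, w, h)
        return out
    total = size(node)
    used = 0  # size already placed at this level
    for child in payload:
        s = size(child)
        if horizontal:   # children side by side, then switch direction
            slice_and_dice(child, x + w * used / total, y, w * s / total, h, False, out)
        else:            # children stacked top to bottom, then switch direction
            slice_and_dice(child, x, y + h * used / total, w, h * s / total, True, out)
        used += s
    return out


tree = ("root", [("x", [("x/a", 4), ("x/b", 2)]),
                 ("y", [("y/c", 1), ("y/d", 1), ("y/e", 1)])])
for name, rect in slice_and_dice(tree, 0, 0, 9, 3).items():
    print(name, rect)
```

Here x takes two thirds of the width and y one third. Inside each, the leaves are stacked vertically, so every leaf's area is proportional to its size.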
treemaps (2)
later variants improved the shape and appearance of
maps
treemaps (3)
plus algorithms for vast data sets, for thumbnail
images, etc. etc.
distort space ...
tree branching factor b:
– number of nodes at depth d = b^d
Euclidean 2D space:
– amount of space at radius r = 2πr
– not enough space!
non-Euclidean hyperbolic space:
– exponential space at radius r
hyperbolic browser
– lays out tree in hyperbolic space
– then uses 2D representation of hyperbolic space
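A quick numerical sketch of the space argument. It assumes the standard hyperbolic plane of curvature -1, where a circle of radius r has circumference 2π sinh(r). The function names and the choice b = 2 are illustrative only.

```python
import math


def nodes_at_depth(b, d):
    """Number of nodes at depth d in a tree with branching factor b."""
    return b ** d


def euclidean_circumference(r):
    """Room on a circle of radius r in the flat plane: grows linearly."""
    return 2 * math.pi * r


def hyperbolic_circumference(r):
    """Room on a circle of radius r in the hyperbolic plane (curvature -1):
    grows roughly like e**r."""
    return 2 * math.pi * math.sinh(r)


print("depth  nodes  euclidean  hyperbolic")
for d in range(1, 9):
    print(d, nodes_at_depth(2, d),
          round(euclidean_circumference(d), 1),
          round(hyperbolic_circumference(d), 1))
```

The node count and the hyperbolic circumference both grow exponentially while the Euclidean circumference grows only linearly, which is why the tree fits in hyperbolic space before being projected to 2D.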
multiple attributes
often data items have several attributes
e.g. document:
– type (journal, conference, book)
– date of publication
– author(s)
– multiple keywords (perhaps in taxonomy)
– citation count
– popularity
traditional approach ...
boolean queries
>new query
?type=‘journal’ and keyword=‘visualisation’
=query processing complete - 2175 results
list all (Y/N)
>N
>refine query
refine: type=‘journal’ and keyword=‘visualisation’
+author=‘smith’
=query processing complete - 0 results
faceted browsing
e.g. HiBrowse (one of the earliest)
multiple selection boxes
– ‘or’ within box - ‘and’ between boxes
keywords: digital libraries, HCI 173, formal models, interaction 157, task analysis, visualisation 39, web
authors: all 173, catarci 53, dix 9, jones 17, shneiderman 153, smith 0, wilson 22
types: all 173, book, conference, journal 173, other
HiBrowse (ii)
shows how many items with particular value
– e.g. 39 documents with keyword=‘visualisation’ and type=‘journal’
HiBrowse (iii)
can predict the effect of refining selection
– e.g. selecting ‘smith’ would give empty result
keywords: digital libraries, HCI 39, formal models, interaction, task analysis, visualisation 39, web
authors: all 39, catarci 18, dix 1, jones 3, shneiderman 21, smith 0, wilson 7
types: all 39, book, conference, journal 39, other
HiBrowse (iv)
refining selection updates counts in real time
keywords: digital libraries, HCI 45, formal models, interaction, task analysis, visualisation 45, web
authors: all 45, catarci 19, dix 1, jones 5, shneiderman 24, smith 0, wilson 8
types: all 45, book 6, conference, journal 39, other
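A hedged sketch of how faceted counts of this kind could be computed. It is not HiBrowse's actual implementation. It assumes each document stores a set of values per facet, treats an empty selection as 'all', and counts for each value how many documents would match if that value were chosen alongside the selections in the other boxes. The function names and the four sample documents are made up.

```python
def matches(doc, selection, skip=None):
    """'or' within a box, 'and' between boxes; the `skip` facet is ignored."""
    for facet, chosen in selection.items():
        if facet == skip or not chosen:
            continue
        if not (doc[facet] & chosen):   # none of the chosen values present
            return False
    return True


def facet_counts(docs, selection, facet):
    """For each value of `facet`, count documents that satisfy the other boxes."""
    counts = {}
    for doc in docs:
        if matches(doc, selection, skip=facet):
            for value in doc[facet]:
                counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical collection, far smaller than the one on the slides.
docs = [
    {"keywords": {"HCI", "visualisation"}, "authors": {"dix"}, "types": {"journal"}},
    {"keywords": {"HCI", "interaction"}, "authors": {"shneiderman"}, "types": {"journal"}},
    {"keywords": {"visualisation"}, "authors": {"shneiderman", "catarci"}, "types": {"book"}},
    {"keywords": {"HCI"}, "authors": {"smith"}, "types": {"conference"}},
]
selection = {"keywords": {"visualisation"}, "types": {"journal"}}
authors = facet_counts(docs, selection, "authors")
print(authors.get("dix", 0), authors.get("smith", 0))
```

A count of zero next to a value (as for smith here) tells the user in advance that selecting it would give an empty result.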
starfield (i)
scatter plot for two attributes
colour/shape codes for more
adjust rest with sliders
dots appear/disappear as slider values change
dynamic filtering
starfield (ii)
when few enough points remain, more details appear
Influence Explorer (i)
developed for engineering models
like Starfield ...
but sliders show histogram
how many in category (like HiBrowse)
... and how many ‘just miss’
red = full match
black = all but one attribute
greys = fewer matching attr’s
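A small sketch of the 'just miss' colour coding, assuming each item is a dictionary of numeric attributes and each slider is a (low, high) range. The attribute names and values are invented, and a single grey stands in for the several shades of grey mentioned above.

```python
def match_colour(item, ranges):
    """Colour an item by how many slider ranges it fails."""
    failed = sum(1 for attr, (low, high) in ranges.items()
                 if not low <= item[attr] <= high)
    if failed == 0:
        return "red"      # full match
    if failed == 1:
        return "black"    # all but one attribute
    return "grey"         # fewer matching attributes (one shade only in this sketch)


# Hypothetical engineering designs and slider settings.
ranges = {"weight": (0, 10), "cost": (0, 100), "lifetime": (5, 20)}
designs = [
    {"weight": 8, "cost": 90, "lifetime": 12},
    {"weight": 8, "cost": 120, "lifetime": 12},
    {"weight": 15, "cost": 120, "lifetime": 12},
]
print([match_colour(d, ranges) for d in designs])
```

Counting the items of each colour per histogram bin then shows how many designs would be gained by loosening just one slider.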
Influence Explorer (ii)
some versions highlight individual items
in each histogram
similar technique has
been used to match
multiple taxonomic
classifications
Information Scent
Starfield
shows what is selected
• explore using trial and error
HiBrowse and Influence Explorer
show what would happen
Pirolli et al. call this Information Scent
– things in the interface that help you know what
actions to take to find the information you want
very large datasets
too many points/lines to see
solutions ...
space-filling single-pixel per item
Keim’s VisDB
random selection
(see Geoff Ellis’ thesis)
clustering
visualise groups not individuals