Data Visualization - An introduction
Prof Jan Aerts
Biodata Visualization and Analysis
ESAT/SCD
University of Leuven
Belgium

twitter: @jandot
Google+: +Jan Aerts
jan.aerts@esat.kuleuven.be
http://biovizanlab.wordpress.com
http://saaientist.blogspot.com
1. What is data visualization?
“A good sketch is better than a long speech” (Napoleon)
Minard's map of Napoleon's Russian campaign shows: size of the army,
geographical coordinates, direction in which the army was traveling, location
of the army on certain dates, temperature along the path of the retreat
John Snow - cholera map
Shape of Songs: “Like a Prayer” (Madonna)
                        Martin Wattenberg
http://multimedia.mcb.harvard.edu/anim_innerlife.html
What I use as a definition:


“computer-based visualization systems providing visual representations of
datasets intended to help people carry out some task more effectively.” (T
Munzner)
cognition <=> perception
cognitive task => perceptive task

      “eyes beat memory”
Why do we visualize data?
• record information

   • blueprints, photographs,
     seismographs, ...

• analyze data to support reasoning

   • develop & assess hypotheses

   • discover errors in data

   • expand memory

   • find patterns (see Snow’s cholera map)

• communicate information

   • share & persuade

   • collaborate & revise
exploration     explanation



pictorial superiority effect

      “information” after 72hr: “informa” (65%) vs “i” (1%)
2. Exploration <-> explanation
exploration   explanation

• exploration: visual analytics, hypothesis generation

   • “visual analytics” => identify unexpected patterns

• explanation: infographics

              J van Wijk
Anscombe’s quartet



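Four datasets with (nearly) identical summary statistics:

• mean of X = 9.0
• mean of Y = 7.5
• standard deviation of X = 3.317
• standard deviation of Y = 2.03
• linear regression: Y = 3 + 0.5X
• R² = 0.67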
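A minimal sketch of how the summary statistics above can be checked, using only the Python standard library. The data values are the ones published by Anscombe (1973); the function and variable names (`summarize`, `QUARTET`, `SUMMARIES`) are made up for this illustration.

```python
from statistics import mean, stdev

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample standard deviations, least-squares line and R^2."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "sd_x": stdev(xs),
        "sd_y": stdev(ys),
        "intercept": my - slope * mx,
        "slope": slope,
        "r2": sxy ** 2 / (sxx * syy),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, s in SUMMARIES.items():
        print(name, {key: round(val, 2) for key, val in s.items()})
```

The four summaries agree to about two decimals, yet the scatter plots look completely different, which is the argument for plotting the data before trusting the numbers.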
A concrete example: hive plots
same network




     Martin Krzywinski
different networks!

                      Martin Krzywinski
3D, anyone?




         occlusion
   interaction complexity
   perspective distortion
        text legibility
Functions in the Linux operating system:
                            “function A calls function B”




Gene interaction data:
“gene A regulates gene B”
regulator




workhorse
                        manager
3. Why specifically learn about dataviz?
Isn’t it all just about using common sense?
• huge space of design alternatives => many tradeoffs

• many possibilities known to be ineffective

   • avoid random walk through parameter space

   • avoid some of our past mistakes

   • extensive experimentation has already been done

• guidelines continue to evolve

   • we reflect on lessons learned in design studies

   • iterative refinement usually wise
4. Stages of data visualization
How do we get from data to visualization? We need to understand:

• properties of the data

• properties of the image

• the rules mapping data to image
4.1. Properties of the data
S Stevens “On the Theory of Scales of Measurement” (1946)
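As a rough sketch of what the Stevens paper contributes here: it distinguishes four levels of measurement (nominal, ordinal, interval, ratio), and each level makes one more kind of comparison meaningful. The names `Scale`, `NEW_COMPARISON` and `meaningful_comparisons` are invented for this illustration, and the wording of the comparisons is a paraphrase, not a quote.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens' four levels of measurement, from weakest to strongest."""
    NOMINAL = 1   # categories, e.g. gene names
    ORDINAL = 2   # ordered categories, e.g. low / medium / high
    INTERVAL = 3  # differences are meaningful, e.g. temperature in Celsius
    RATIO = 4     # true zero, so ratios are meaningful, e.g. length


# The comparison that first becomes meaningful at each level.
NEW_COMPARISON = {
    Scale.NOMINAL: "equal / not equal",
    Scale.ORDINAL: "greater / less than",
    Scale.INTERVAL: "difference",
    Scale.RATIO: "ratio",
}


def meaningful_comparisons(scale):
    """Each level inherits the comparisons of all weaker levels."""
    return [NEW_COMPARISON[s] for s in Scale if s <= scale]


if __name__ == "__main__":
    for scale in Scale:
        print(scale.name, meaningful_comparisons(scale))
```

Knowing the scale of an attribute narrows down which visual encodings make sense for it.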
4.2. Properties of the image - perception
Semiology of graphics

• Jacques Bertin, Gauthier-Villars 1967, EHESS 1998

• semiology = study of signs and sign processes, likeness, analogy, metaphor,
  symbolism, signification, and communication (Wikipedia)

• visual encoding:

   • what - points, lines, areas (also patterns, trees/networks, grids)

   • where - positional: XY (1D, 2D, 3D)

   • how - retinal: Z (size, lightness, texture, colour, orientation, shape)

   • when - temporal: animation
“marks” - geometric primitives

“channels” - control appearance of marks
Gestalt laws - interplay between parts and the whole (Kurt Koffka)

• series of principles

• example: election results Florida

   • black = Bush

   • white = Gore
Gestalt - Principle of Simplicity

 We perceive every pattern in the way that makes its structure as simple as
 possible.
Gestalt - Principle of Proximity

 Things that are close to each other are seen as belonging together (=>
 clusters)
Gestalt - Principle of Similarity

 Things that are similar in some way are perceived as belonging together.
Gestalt - Principle of Closure

 You will try to complete a pattern.
Gestalt - Principle of Connectedness

 Things that are connected are perceived as belonging together. This encoding
 is stronger than similarity, shape, colour, and size.
Gestalt - Principle of Good Continuation

 Objects that are arranged in a straight or smooth line tend to be seen as a
 unit.
Gestalt - Principle of Common Fate

 Objects that move in the same direction tend to be seen as a unit.
Gestalt - Principle of Familiarity
Gestalt - Principle of Symmetry

 Symmetrical areas tend to be seen as figures against asymmetrical
 backgrounds.
Context affects perceptual tasks
Pre-attentive vision

= ability of low-level human visual system to rapidly identify certain basic visual
properties

• some features “pop out”

• used for:

   • target detection

   • boundary detection

   • counting/estimation

   • ...

• visual system takes over => all cognitive power available for interpreting the
  figure, rather than needing part of it for processing the figure
Really fast; see http://www.csc.ncsu.edu/faculty/healey/PP/
Limitations of preattentive vision

1. Combining pre-attentive features does not always work => you would need to
resort to “serial search” (most channel pairs; all channel triplets)
e.g. is there a red square in this picture?

2. Speed depends on which channel is used (pick one that works well for
categorical data; see “accuracy” further on)
4.3. Mapping data to image: visual encoding
Language of graphics

• graphics = sign system:


  • each mark (point, line, area) represents a data element


  • choose visual variables to encode relationships between data elements


     • difference, similarity, order, proportion


     • only position supports all relationships (see later)


  • huge range of alternatives for data with many attributes


     • find images that express & effectively convey the information
Which encoding should I use?

• From huge list of possibilities, you have to choose the best one.


• Principle of Consistency


   • properties of the representation should match properties of the data (e.g.
     pie chart: area vs radius)


• Principle of Importance Ordering


   • encode the most important piece of information in the most “effective”
     way (i.e. spatial position)
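A small sketch of the consistency principle, reading the "area vs radius" remark as the usual pitfall with circular marks: if the radius is made proportional to the value, the visible area grows with the square of the value. The function names are invented for this illustration.

```python
import math


def radius_for_area(value, scale=1.0):
    """Radius of a circle whose AREA is proportional to `value`."""
    return math.sqrt(scale * value / math.pi)


def area_ratio_when_radius_encodes(a, b):
    """Ratio of the drawn areas if the RADIUS is made proportional to the value."""
    return (b / a) ** 2


if __name__ == "__main__":
    # Doubling the value doubles the area only when the radius grows by sqrt(2).
    print(radius_for_area(2) / radius_for_area(1))
    # Encoding the value in the radius makes a 2x difference look like 4x.
    print(area_ratio_when_radius_encodes(1, 2))
```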
Stevens’ psychophysical law

 = proposed relationship between the magnitude of a physical stimulus and its
 perceived intensity or strength
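The law is usually written as a power function: perceived magnitude = k * intensity^a, where the exponent a depends on the kind of stimulus. A minimal sketch follows. The exponents are approximate values commonly quoted for Stevens' experiments and should be treated as illustrative, not as values taken from these slides.

```python
def perceived_magnitude(intensity, exponent, k=1.0):
    """Stevens' power law: perceived = k * intensity ** exponent."""
    return k * intensity ** exponent


# Approximate, commonly quoted exponents (assumption, for illustration only).
EXPONENTS = {"length": 1.0, "area": 0.7}

if __name__ == "__main__":
    for channel, a in EXPONENTS.items():
        # How much bigger does a stimulus look when it is physically doubled?
        print(channel, round(perceived_magnitude(2.0, a), 2))
```

With an exponent below 1 a doubled stimulus looks less than twice as large, which is one reason length tends to be read more accurately than area.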
Accuracy of quantitative perceptual tasks
           how much (quantitative)   what/where (qualitative)

• “power of the plane”

• grouping: see Gestalt laws

                                                      Mackinlay
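A hedged sketch of how such an accuracy ranking combines with the principle of importance ordering: sort the attributes by importance and hand out channels from most to least accurate. The ranking below loosely follows the ordering usually attributed to Mackinlay for quantitative data and is an assumption here; the attribute names are hypothetical.

```python
# Channels for quantitative data, most accurate first (assumed ranking).
QUANTITATIVE_RANKING = [
    "position", "length", "angle", "slope", "area",
    "volume", "density", "colour saturation", "colour hue",
]


def assign_channels(attributes_by_importance, ranking=QUANTITATIVE_RANKING):
    """Map the most important attribute to the most accurate channel, and so on."""
    if len(attributes_by_importance) > len(ranking):
        raise ValueError("more attributes than available channels")
    return dict(zip(attributes_by_importance, ranking))


if __name__ == "__main__":
    # Hypothetical attributes, ordered from most to least important.
    print(assign_channels(["expression level", "gene length", "p-value"]))
```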
COLOUR
COLOUR ... is tricky, and often used wrong
Colour space

• = mathematical model to talk about colour


• RGB (red-green-blue)


  • most common, but less useful


• HSV (hue-saturation-value)


  • more useful
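A minimal sketch of why HSV is often easier to work with, using the standard-library `colorsys` module: a set of distinct category colours is obtained by stepping through hue while saturation and value stay fixed. The helper names and the default saturation and value are arbitrary choices for this illustration.

```python
import colorsys


def categorical_palette(n, saturation=0.6, value=0.9):
    """Return n colours evenly spaced in hue, as (r, g, b) floats in [0, 1]."""
    return [colorsys.hsv_to_rgb(i / n, saturation, value) for i in range(n)]


def to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a '#rrggbb' string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))


if __name__ == "__main__":
    # Pure red in RGB corresponds to hue 0 at full saturation and value.
    print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
    print([to_hex(c) for c in categorical_palette(5)])
```

Doing the same directly in RGB would mean juggling three numbers per colour without a clear notion of "same brightness, different hue".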
colorbrewer2.org




in R: please use RColorBrewer!
Context affects colour perception
Dangers of Depth (3D)

• We do NOT see in 3D; we see in 2.05D.


• occlusion


• interaction complexity


• perspective distortion
3D example
Lie factor

     “lie factor” = (size of effect shown in graphic) / (size of effect in data)
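A small worked sketch of the lie factor. It assumes the usual convention that "size of effect" is the relative change between two values; the numbers in the example are invented.

```python
def effect_size(first, second):
    """Relative change from `first` to `second` (assumed definition of effect size)."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


if __name__ == "__main__":
    # Invented example: the data doubles (10 -> 20), but the bar grows from 1 cm to 4 cm.
    print(lie_factor(1.0, 4.0, 10.0, 20.0))
```

A value close to 1 means the graphic represents the data faithfully; values far above or below 1 mean the effect is exaggerated or understated.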
3D scatter plots are better as series of 2D projections
Dynamic data

• animation is good sometimes, but often not:


  • we can only follow 3-4 visual cues simultaneously


  • change in “mental map”


• change blindness (e.g. http://nivea.psycho.univ-paris5.fr/CBMovies/BarnTrackFlickerMovie.gif)
http://vimeo.com/2035117
5. Interaction
Overview first, zoom and filter, then details on demand
(Shneiderman’s Visual Information-Seeking Mantra)
Operations on the data

• sorting


• filtering


• browsing/exploring


• comparison


• characterizing trends & distributions


• finding anomalies & outliers


• ...
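A few of the operations listed above, sketched on a plain list of numbers with the standard library only. The data are invented, and the outlier rule (1.5 times the interquartile range) is just one common convention, not something prescribed by these slides.

```python
from statistics import quantiles

# Hypothetical measurements, invented for illustration.
values = [4, 8, 1, 100, 6, 3, 7, 2, 5]


def sort_values(vals):
    """Sorting: return the values in ascending order."""
    return sorted(vals)


def filter_range(vals, low, high):
    """Filtering: keep only values inside [low, high], in their original order."""
    return [v for v in vals if low <= v <= high]


def find_outliers(vals, k=1.5):
    """Finding outliers: values further than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(vals, n=4)
    iqr = q3 - q1
    return [v for v in vals if v < q1 - k * iqr or v > q3 + k * iqr]


if __name__ == "__main__":
    print(sort_values(values))
    print(filter_range(values, 2, 5))
    print(find_outliers(values))
```

In an interactive visualization these operations would be driven by the user (clicking a column header, dragging a range slider) instead of being called directly.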
Techniques to support these operations

• re-orderable matrices


• brushing


• linked views


• overview & detail


• focus & context


• ...
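Brushing and linked views can be sketched as one shared selection that several views listen to: brushing in any view updates the selection, and every view highlights the same items. The class names are invented for this illustration, and a real toolkit would add rendering and event handling on top.

```python
class SharedSelection:
    """Holds the ids of the currently brushed items and notifies listeners."""

    def __init__(self):
        self.selected = set()
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def brush(self, ids):
        """Replace the selection and tell every linked view about it."""
        self.selected = set(ids)
        for callback in self._listeners:
            callback(self.selected)


class View:
    """A stand-in for a plot; it only remembers which items to highlight."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = set()
        selection.subscribe(self._on_selection)

    def _on_selection(self, ids):
        self.highlighted = set(ids)


if __name__ == "__main__":
    selection = SharedSelection()
    scatter = View("scatter plot", selection)
    histogram = View("histogram", selection)
    selection.brush({"gene_3", "gene_7"})  # hypothetical item ids
    print(scatter.highlighted == histogram.highlighted)
```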
6. Validation
Evaluate the right thing




                           Munzner, 2009
Slide/picture acknowledgments

• Jeffrey Heer


• Tamara Munzner


• Jessie Kennedy


• Nils Gehlenborg


• Miriah Meyer
“I think this presentation went quite well...”
