The document discusses the visualization of multi-dimensional data, emphasizing the importance of transforming symbolic information into geometric representations. It covers various methods for visualizing small, large, and big data, outlining processes to acquire, encode, and render data effectively. Additionally, it highlights the necessity of interactivity and different visualization techniques, such as scatter plots, treemaps, and parallel coordinates, in interpreting complex datasets.
Introduction by Amit Kapoor on visualizing multi-dimensional data, referencing Edwin Abbott's 'Flatland' and discussing dimensions with examples like squares and spheres.
Highlights the significance of visual processing (70% of sensory receptors are in the eyes) and frames visualization as a transformation from symbolic to geometric representation.
Categorizes data into Small, Large, Big, and Wide Data with examples of area sales data across different regions.
Describes parsing variables and encoding data for visualization, including axes and scales with area sales data.
Details various visualization systems like points, lines, and bars, as well as coordinate systems—Cartesian and Polar—for plotting data.
Defines the data visualization process for Small and Large Data, emphasizing data acquisition, encoding, rendering, and refining data.
Explores Big Data visualization, the need for histograms, and the processes involved in handling large datasets including acquisition, encoding, and filtering.
Lists methods for multi-dimensional visualization, including pixel-based, glyph, and geometric transforms to enhance data interaction and interpretation.
Analyzes a diamonds dataset with over 50K observations, discussing related variables like price, carat, cut, color, clarity, and their dimensions.
Discusses various chart options for 1D and 2D data visualizations, along with interactivity techniques like zooming and annotation.
Introduces advanced techniques for visualizing 4D to 6D data, including bubble plots, trellis, parallel coordinates, and interactive features.
Details the procedure for visualizing wide data, emphasizing thoughtful encoding, interactivity, and dimensionality reduction.
Provides resources like GitHub code for examples and concludes with a quote highlighting the value of visual insights.
Acquire Data
Area     Sales
North    5
East     25
West     15
South    20
Central  10
Parse Variables
x (C, categorical) = Area
y (Q, quantitative) = Sales
x  y
1  5
2  25
3  15
4  20
5  10
Encode Shape & Select Scales
x - position, y - bar
scale - 200 x 200
x (scaled): 20, 60, 100, 140, 180
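The scaled x positions on this slide (20, 60, 100, 140, 180) are what you get by splitting a 200-pixel axis into five equal bands and taking each band's centre. A minimal sketch of that encoding step follows. The helper names `band_centers` and `linear_scale` are hypothetical, and the y domain of 0 to the maximum sales value is an assumption, since the slide does not show the scaled y values.

```python
def band_centers(n_bands, range_px):
    """Centre of each of n_bands equal-width bands spanning [0, range_px]."""
    band = range_px / n_bands
    return [band * i + band / 2 for i in range(n_bands)]


def linear_scale(domain, range_px):
    """Return a function mapping a value in `domain` linearly onto [0, range_px]."""
    lo, hi = domain
    return lambda v: (v - lo) * range_px / (hi - lo)


areas = ["North", "East", "West", "South", "Central"]
sales = [5, 25, 15, 20, 10]

# x is categorical: each area gets the centre of its own band.
x_px = band_centers(len(areas), 200)

# y is quantitative: assumed domain runs from 0 to the largest sales value.
to_height = linear_scale((0, max(sales)), 200)
bar_heights = [to_height(s) for s in sales]
```

Starting the y domain at 0 keeps bar lengths proportional to sales, which is usually what a bar encoding needs.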
Render with Coordinates
cartesian
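Rendering in a Cartesian system means turning the encoded positions and bar lengths into rectangles on the canvas. The sketch below is one possible way to do this, not the deck's own code: `bar_rect` and `to_svg` are hypothetical helpers, the bar width of 30 pixels is arbitrary, and the bar heights assume sales were scaled from 0..25 onto 0..200.

```python
def bar_rect(x_center, height, bar_width, canvas_height):
    """Rectangle (left, top, width, height) in screen pixels for one bar.

    Data-space y grows upwards while screen y grows downwards, so the top
    edge is measured from the top of the canvas.
    """
    return (x_center - bar_width / 2, canvas_height - height, bar_width, height)


def to_svg(rects, size):
    """Serialise rectangles as a minimal SVG document string."""
    body = "".join(
        f'<rect x="{x:g}" y="{y:g}" width="{w:g}" height="{h:g}"/>'
        for x, y, w, h in rects
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size}" height="{size}">{body}</svg>'
    )


x_px = [20, 60, 100, 140, 180]     # scaled x positions from the slide
heights = [40, 200, 120, 160, 80]  # assumed: sales scaled from 0..25 onto 0..200
rects = [bar_rect(x, h, 30, 200) for x, h in zip(x_px, heights)]
svg = to_svg(rects, 200)
```

Swapping the coordinate system (for example to polar) would change only this rendering step. The encoding stays the same.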
Create Visualisations
Shapes: Points, Line, Bar, Bar - Stacked, Bar - Stagger
Coordinates System
Visualise Big Data
x, y => 1,000,000
Comparable to the number of pixels on my MacBook Air (1440 x 900)
Data -> Sample
Sampling can be effective (with overweighting of unusual values)
Requires multiple plots or careful tuning of parameters
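One way to read "overweighting unusual values" is as weighted random sampling in which outliers get a larger selection weight. The sketch below illustrates that idea under stated assumptions: the function name `weighted_sample`, the z-score rule for "unusual", and the `boost` and `z_cut` parameters are all hypothetical choices, not taken from the slides.

```python
import random
import statistics


def weighted_sample(values, k, boost=5.0, z_cut=2.0, seed=0):
    """Sample k values with replacement, overweighting unusual ones.

    A value counts as unusual when it lies more than z_cut standard
    deviations from the mean; such values get `boost` times the weight.
    Returns the sample together with the weights that were used.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    weights = [
        boost if sd > 0 and abs(v - mean) / sd > z_cut else 1.0
        for v in values
    ]
    rng = random.Random(seed)
    return rng.choices(values, weights=weights, k=k), weights


# Mostly similar values plus one extreme observation.
data = [10.0] * 95 + [11.0] * 4 + [100.0]
sample, weights = weighted_sample(data, k=20)
```

The threshold and boost are exactly the kind of tuning parameters the slide warns about: different settings can give visibly different plots.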
Data -> Sample -> Model
Models are great as they scale nicely. But visualisation is required, as “I don’t know what I don’t know.”
Data -> Sample -> Model -> Binning
Binning can solve a lot of these challenges
“Bin - Summarize - Smooth: A framework for visualising big data” - Hadley Wickham (2013)
“imMens: Real-time Visual Querying of Big Data” - Liu, Jiang, Heer (2013)
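The core of binning is to replace individual points with counts per grid cell, so the number of marks depends on the grid size and not on the number of observations. The following is a minimal sketch of the "bin" and "summarise by count" steps only, with no smoothing. The function name `bin_counts_2d`, the 40 x 40 grid, and the synthetic Gaussian data are assumptions for illustration.

```python
import random


def bin_counts_2d(points, x_range, y_range, nx, ny):
    """Count points per cell of an nx-by-ny grid covering the given ranges."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore points outside the plotted window
        # Clamp so that points exactly on the upper edge fall in the last bin.
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        grid[j][i] += 1
    return grid


# Synthetic stand-in for a large x, y dataset.
rng = random.Random(42)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]
grid = bin_counts_2d(points, (-4, 4), (-4, 4), nx=40, ny=40)
```

Each cell count can then be encoded as colour or opacity, so 100,000 points are drawn with at most 1,600 marks.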
Multi Dimensional Viz
Standard 2d/3d: Scatterplot, SPLOM, Trellis / Facets, Multiple View
Glyph Approach: Star plots, Stick Figure, Chernoff Faces, Color Icons
Geometric Transforms: Parallel Coord, Table lens, Star Coords, Tours
Pixel Based Approach: Space Filling, Pixel Bar Chart, Spiral Technique
Stacking Approach: Treemaps, Dimensional Stacking, Hierarchical Axis
Multi Dimensional Viz approaches, compared by:
Need for Interaction
Ease of Interpretation
Diamonds dataset
50K+ observations of 10 dimensions
Price of diamonds is related to the 4C’s
price    in US$
carat    weight (⅕ of a gram)
cut      5 levels [Fair to Ideal]
colour   7 levels [J to D]
clarity  8 levels [I1 to IF]
x      length mm
y      width mm
z      height mm
depth  z depth %
table  table width %
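The slide gives depth only as "z depth %". Assuming this is the diamonds dataset that ships with ggplot2, its documentation defines total depth percentage as z divided by the mean of x and y. A small sketch of that relationship, with `depth_percent` as a hypothetical helper name:

```python
def depth_percent(x, y, z):
    """Total depth percentage: z relative to the mean of x and y, times 100."""
    return 100 * z / ((x + y) / 2)


# Values from one of the deck's sample rows:
# x = 3.89, y = 3.84, z = 2.31, listed depth = 59.8
print(round(depth_percent(3.89, 3.84, 2.31), 1))  # 59.8
```

Not every row reproduces its listed depth exactly, so it is safer to treat depth as its own recorded dimension than to assume it is strictly derived from x, y and z.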
Diamonds dataset: sample rows
price  carat  cut        color  clarity  x     y     z     depth  table
326    0.23   Ideal      E      SI2      3.95  3.98  2.43  61.5   55
326    0.21   Premium    E      SI1      3.89  3.84  2.31  59.8   61
327    0.23   Good       E      VS1      4.05  4.07  2.31  56.9   65
334    0.29   Premium    I      VS2      4.2   4.23  2.63  62.4   58
335    0.31   Good       J      SI2      4.34  4.35  2.75  63.3   58
336    0.24   Very Good  J      VVS2     3.94  3.96  2.48  62.8   57
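Parallel coordinates, listed earlier under geometric transforms, draw each observation as a polyline across one vertical axis per dimension. The sketch below shows only the data transform behind such a plot, assuming each numeric dimension is min-max normalised to [0, 1] on its own axis. The function name `parallel_coords` is hypothetical, and the rows are the numeric columns of the six sample rows above.

```python
def parallel_coords(rows):
    """Turn rows into polylines of (axis_index, normalised_value) vertices.

    Each column is rescaled independently to [0, 1] so that dimensions with
    different units can share one vertical extent.
    """
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        [
            (axis, (v - lo) / (hi - lo) if hi > lo else 0.5)
            for axis, (v, lo, hi) in enumerate(zip(row, mins, maxs))
        ]
        for row in rows
    ]


# Columns: price, carat, x, y, z, depth, table
rows = [
    (326, 0.23, 3.95, 3.98, 2.43, 61.5, 55),
    (326, 0.21, 3.89, 3.84, 2.31, 59.8, 61),
    (327, 0.23, 4.05, 4.07, 2.31, 56.9, 65),
    (334, 0.29, 4.20, 4.23, 2.63, 62.4, 58),
    (335, 0.31, 4.34, 4.35, 2.75, 63.3, 58),
    (336, 0.24, 3.94, 3.96, 2.48, 62.8, 57),
]
polylines = parallel_coords(rows)
```

Categorical dimensions such as cut, colour and clarity would first need an ordinal position on their axis, which is left out here.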
Pixel Based Approach
Spiral Pixel Curve, Pixel Bar Chart
Pixel Bar Chart - Keim
VisDB - Keim
Data Viz Process (Wide Data)
Acquire Data
Encode Shape
Select Scales
Render Algorithm
Parse Variables
Filter Data
Aggregate Data
Make Views
Add Interactivity
1. Encode wisely
2. Use space and multiples
3. Add interactivity
4. Reduce dimensions
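"Use space and multiples" corresponds to the trellis / facet idea: split the data by one categorical dimension and draw the same small chart once per group. A minimal sketch of the splitting step, using a hypothetical `facet` helper and three columns from the diamonds sample rows:

```python
from collections import defaultdict


def facet(rows, key):
    """Group rows into panels, one panel per distinct value of `key`."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[key]].append(row)
    return dict(panels)


rows = [
    {"price": 326, "carat": 0.23, "cut": "Ideal"},
    {"price": 326, "carat": 0.21, "cut": "Premium"},
    {"price": 327, "carat": 0.23, "cut": "Good"},
    {"price": 334, "carat": 0.29, "cut": "Premium"},
    {"price": 335, "carat": 0.31, "cut": "Good"},
    {"price": 336, "carat": 0.24, "cut": "Very Good"},
]

# One panel per cut; each panel would get its own carat-vs-price plot.
panels = facet(rows, "cut")
```

Keeping the same scales across panels makes the small multiples directly comparable.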