Information Visualization Fundamentals

What is Information Visualization? Why Use It?

“The use of computer-supported, interactive, visual representations of abstract data to amplify cognition.” (Card, Mackinlay, and Shneiderman, 1999)

Key Concepts

💻 Computer-Based
👁️ Visual Representation
📊 Abstract Data
🔄 Interactive
🧠 Amplify Cognition

Abstract Data

Data with no obvious/natural visual representation

Interactive

Users can change what is visualized and how it is visualized.

TaxiVis - Ferreira et al., IEEE TVCG 2013

Amplify Cognition

Solve problems with data with less effort, in a shorter time, and more accurately.
… or even be able to do things it would be impossible to do without a computer and a graphical representation.

Cognitive artifacts: tools that help us think!

Try to multiply 34 x 72 using exclusively your mind …
… now do it again using pen and paper.

Why is it easier?

🧠 Memory in Head

Limited capacity (~7 items)
Decays quickly
Requires effort to maintain
Error-prone

📝 Memory in World

Unlimited storage
Persistent information
External processing
Reliable reference

Key Insight: Visualization offloads cognitive work from your mind to the visual system
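
The pen-and-paper advantage can be made concrete with a small sketch. It assumes the usual schoolbook method, and the function name `long_multiply` is ours, for illustration only. Each partial product is something paper stores, so the head does not have to.

```python
def long_multiply(a, b):
    """Schoolbook multiplication: return the partial products and their sum."""
    partials = []
    place = 1
    while b > 0:
        digit = b % 10
        # Each partial product is a value that paper would remember for us.
        partials.append(a * digit * place)
        b //= 10
        place *= 10
    return partials, sum(partials)


partials, total = long_multiply(34, 72)
print(partials, total)
```

Done mentally, both partial products and the running sum compete for the same limited working memory.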

Let’s play the “game of 15” …

The “pieces” for the game are the nine digits: 1, 2, 3, 4, 5, 6, 7, 8, 9. Each player takes a digit in turn. Once a digit is taken, it cannot be used by the other player. The first player to get three digits that sum to 15 wins.
Here is a sample game: Player A takes 8. Player B takes 2. Then A takes 4, and B takes 3. A takes 5.
Question 1: Suppose you are now to step in and play for B. What move would you make?

Let’s play a different game: tic-tac-toe

Players alternately place an O or an X in one of nine spaces arranged in a 3 x 3 grid. Once a space has been taken, it cannot be changed by either player. The first player to get three symbols in a straight line (row, column, or diagonal) wins. Suppose player A is X and B is O, and the game has reached the state on the right.

Question 2: Suppose you are now to step in and play an O for B. What move would you make?

Problem Isomorphs

Herbert Simon’s Insight

The two problems are isomorphic - structurally identical!
Same problem, different representation
The tic-tac-toe representation makes the solution obvious
Key insight: The right representation can dramatically simplify problem solving
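
A minimal sketch of the isomorphism, assuming the digits are laid out on the standard 3 x 3 magic square (the layout and all function names below are illustrative, not taken from the slides). It checks that the tic-tac-toe lines of the square are exactly the digit triples that sum to 15, then looks for B's forced move in the sample game.

```python
from itertools import combinations

# Standard 3x3 magic square: every row, column, and diagonal sums to 15.
MAGIC = [[2, 7, 6],
         [9, 5, 1],
         [4, 3, 8]]


def tic_tac_toe_lines(square):
    """All eight winning lines of a 3x3 board, as sets of cell values."""
    rows = [set(r) for r in square]
    cols = [set(c) for c in zip(*square)]
    diags = [{square[i][i] for i in range(3)},
             {square[i][2 - i] for i in range(3)}]
    return rows + cols + diags


def winning_triples():
    """All sets of three distinct digits 1..9 that sum to 15."""
    return [set(t) for t in combinations(range(1, 10), 3) if sum(t) == 15]


def immediate_wins(own, taken):
    """Free digits that would complete a 15-sum triple for the holder of `own`."""
    free = set(range(1, 10)) - taken
    return sorted(d for d in free
                  if any(a + b + d == 15 for a, b in combinations(own, 2)))


# Sample game from the slides: A holds 8, 4, 5 and B holds 2, 3.
a_digits, b_digits = {8, 4, 5}, {2, 3}
taken = a_digits | b_digits
b_can_win_with = immediate_wins(b_digits, taken)
b_must_block = immediate_wins(a_digits, taken)
print(b_can_win_with, b_must_block)
```

On the magic square the threat is visible as two X marks in a line with one empty cell; in the digit version the same threat has to be found by arithmetic.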

Learn more about Herbert Simon

Herbert A. Simon (1916-2001) - Nobel Prize in Economics (1978), Turing Award (1975)

Pioneer in cognitive psychology, artificial intelligence, and decision-making

Why use visualization?

📢 Explanatory

Show insights to others

Present findings
Tell data stories
Communicate clearly

🔍 Exploratory

Find patterns yourself

Discover unknowns
Generate hypotheses
Understand data

✅ Confirmatory

Validate hypotheses

Test assumptions
Verify patterns
Support decisions

The Power of Visualization: Discovery

John Snow’s Cholera Map (1854)

Mapped cholera deaths in London
Revealed cluster around Broad Street water pump
Evidence of the cluster led local authorities to remove the pump handle

Takeaway: Visualization is a powerful tool for discovery and finding patterns invisible in raw data.

John Snow’s Cholera Map

The Power of Visualization: Storytelling

Charles Minard’s Map of Napoleon’s March (1869)

Widely considered one of the best statistical graphics ever created
Shows six variables simultaneously:
- Army size
- Location & direction
- Temperature
- Distance & time

Takeaway: Visualization is a powerful medium for dense, high-impact storytelling.

Charles Minard’s Map of Napoleon’s March

The Power of Visualization: Exploration

NYT: “How Y’all, Youse and You Guys Talk” (2013)

Modern, interactive visualization
Built with web technologies (like D3.js!)
Allows personal exploration of dialect data
Engages users through personalized results

Try it yourself: NYT Dialect Quiz

Takeaway: Visualization can be a dynamic interface for personal data exploration.

New York Times Dialect Map

Great Explanatory Visualizations

NYT: https://flowingdata.com/tag/new-york-times/
Washington Post: http://postgraphics.tumblr.com/
Gregor Aisch: https://driven-by-data.net/
Nicky Case/Explorable Explanations: http://explorabl.es/
Polygraph: http://polygraph.cool/ & https://pudding.cool/
ProPublica: https://www.propublica.org/

Why use a graphical representation?

Large parts of our brain are devoted to spatial processing

Why use a computer to visualize data?

📊 Scale

Handle millions of data points efficiently

🔄 Interactivity

Explore data dynamically with zoom, filter, details

⚡ Real-time

Live updates as data changes continuously

🧮 Computation

Complex calculations and transformations on-the-fly

Why use interaction?

Each visualization can only answer a subset of questions.
With interaction, the user can change what is visualized and how it is visualized, and so answer many more questions.
Also, one cannot visualize everything at once.

How do you assess the quality of a visualization?

Isn’t it subjective? Some people like A, whereas some others like B.

Some visual representations are better than others at solving particular problems …

Digression: Graphical Perception

Graphical Perception Experiment

Graphical Perception Results

Designing effective visualizations requires

🎨 Knowing the design space
What visual encodings and techniques are available?
⚖️ Being able to compare solutions
Which design best serves the intended purpose?
👁️ Understanding human perception
How do people actually see and interpret visual information?

Data Types

The first ingredient in effective visualization is the input data. Data values can represent different forms of measurement.
What kinds of comparisons do those measurements support?
What kinds of visual encodings then support those comparisons?

Nominal (N) or Categorical (C)

Nominal data — also called categorical data — consist of category names.
With nominal data we can compare the equality of values: is value A the same as or different from value B? (A = B), supporting statements like “A is equal to B” or “A is not equal to B”.
When visualizing nominal data we should readily perceive if values are the same or different: position, color hue (blue, red, green, etc.), and shape are all reasonable options.

Ordinal (O)

Ordinal data consist of values that have a specific ordering.
With ordinal data we can compare the rank-ordering of values: does value A come before or after value B? (A < B), supporting statements like “A is less than B” or “A is greater than B”.
When visualizing ordinal data, we should perceive a sense of rank-order. Position, size, or color value (brightness) might be appropriate, whereas color hue (which is not perceptually ordered) would be less appropriate.

Quantitative (Q)

With quantitative data we can measure numerical differences among values.
There are multiple sub-types of quantitative data:
- For interval data we can measure the distance between points: (A - B).
- For ratio data we can also measure proportions or scale factors: (A / B).
Quantitative values can be visualized using position, size, or color value, among other channels. An axis with a zero baseline is essential for proportional comparisons of ratio values, but can be safely omitted for interval comparisons.
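
A short sketch of the interval/ratio distinction. Temperature is our own example, not one from the slides: Celsius is an interval scale (its zero is arbitrary), while Kelvin is a ratio scale (its zero is absolute). Differences agree across the two scales, but ratios do not.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15


a_c, b_c = 20.0, 10.0

# Interval comparison (A - B): the same on both scales.
diff_c = a_c - b_c
diff_k = celsius_to_kelvin(a_c) - celsius_to_kelvin(b_c)

# Ratio comparison (A / B): only meaningful on the scale with a true zero.
ratio_c = a_c / b_c
ratio_k = celsius_to_kelvin(a_c) / celsius_to_kelvin(b_c)

print(diff_c, round(diff_k, 6), ratio_c, round(ratio_k, 4))
```

Saying that 20 °C is "twice as hot" as 10 °C is therefore not supported by the data type, even though the division can be computed.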

Temporal (T)

Temporal values measure time points or intervals. This type is a special case of quantitative values (timestamps) with rich semantics and conventions (e.g., the Gregorian calendar).
Example temporal values include date strings such as “2019-01-04” and “Jan 04 2019”, as well as standardized date-times such as the ISO date-time format: “2019-01-04T17:50:35.643Z”. There are no temporal values in our global development dataset above, as the year field is encoded as an integer.
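
As a rough illustration, the example strings above can be parsed into temporal values with Python's standard `datetime` module. The format strings are our choice, and the `%b` month abbreviation assumes an English locale.

```python
from datetime import datetime

# Two spellings of the same calendar date.
d1 = datetime.strptime("2019-01-04", "%Y-%m-%d")
d2 = datetime.strptime("Jan 04 2019", "%b %d %Y")  # assumes an English locale

# ISO date-time with milliseconds; %z accepts the trailing "Z" as UTC.
d3 = datetime.strptime("2019-01-04T17:50:35.643Z", "%Y-%m-%dT%H:%M:%S.%f%z")

# A year stored as an integer is just a number until we decide to treat it as a date.
year_as_int = 2019
year_as_date = datetime(year_as_int, 1, 1)

print(d1 == d2, d1.weekday(), d3.tzinfo, year_as_date.date())
```

Once parsed, the values carry calendar semantics (weekday, month, time zone) that a plain number or string does not.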

Spatial (S)

Data that can be shown on a map.
Spatial data, also known as geospatial data, is information that identifies the geographic location and characteristics of natural or constructed features and boundaries on the Earth. https://atlan.com/spatial-data/

Data Types Summary

These data types are not mutually exclusive, but rather form a hierarchy: ordinal data support nominal (equality) comparisons, while quantitative data support ordinal (rank-order) comparisons.
Moreover, these data types do not provide a fixed categorization. For example, just because a data field is represented using a number doesn’t mean we have to treat it as a quantitative type! We might interpret a set of ages (10 years old, 20 years old, etc.) as nominal (underage or overage), ordinal (grouped by year), or quantitative (calculate average age).
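
The age example can be sketched as follows. The sample ages and the threshold of 18 are made up for illustration; the point is only that the same numbers support three different treatments.

```python
from collections import Counter
from statistics import mean

ages = [10, 20, 35, 17, 64, 20]  # hypothetical sample
ADULT_AGE = 18                   # assumed threshold, varies by jurisdiction

# Nominal: two category labels; only equality comparisons are meaningful.
nominal = Counter("adult" if a >= ADULT_AGE else "minor" for a in ages)

# Ordinal: group by year and keep the groups in rank order.
ordinal = sorted(Counter(ages).items())

# Quantitative: arithmetic on the values themselves.
quantitative = mean(ages)

print(dict(nominal), ordinal, round(quantitative, 2))
```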

Fundamental Charts

Widely adopted, effective, useful.
Solve a very large percentage of visualization problems.
Training ground for more sophisticated graphs.

Bar Chart

📊 Definition

Visualizes how a quantity distributes across categories

When to Use

Compare values across groups
Show rankings or order
Display part-to-whole relationships

Key Features

Length encodes value
Categories on one axis
Best for 5-20 categories
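
A toy sketch of "length encodes value", drawing bars as text. The data and the function name `text_bar_chart` are hypothetical. Every bar starts at zero, so bar lengths can be compared as proportions.

```python
def text_bar_chart(values, width=30):
    """Return one text line per category, longest bar first."""
    top = max(values.values())
    lines = []
    for label, v in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        # Bar length is proportional to the value, measured from a zero baseline.
        bar = "#" * round(v * width / top)
        lines.append(f"{label:<8}{bar} {v}")
    return lines


sales = {"apples": 30, "pears": 15, "plums": 45}  # made-up quantities
chart = text_bar_chart(sales)
print("\n".join(chart))
```

Sorting the bars by value also supports the ranking use case listed above.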

Line Chart

📈 Definition

Shows how quantities change in relation to another variable (typically time)

When to Use

Track trends over time
Compare multiple time series
Identify patterns and cycles

Key Features

Position encodes value
Lines connect data points
Emphasizes continuity

Scatter Plot

🔵 Definition

Shows how one quantity relates to another quantity

When to Use

Show correlations
Identify clusters or outliers
Compare distributions

Key Features

X and Y positions encode values
Each point = one observation
Reveals patterns in data

Matrix

🔲 Definition

Shows how quantities distribute across two categories

When to Use

Show relationships between categories
Display correlation matrices
Visualize adjacency/similarity

Key Features

Color/size encode values
Row/column structure
Compact representation
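
A minimal sketch of the data behind a matrix view: counts cross-tabulated over two categorical fields. The observations are invented for illustration; in a chart, each cell's count would be encoded as color or size.

```python
from collections import Counter

# Hypothetical observations: (region, transport mode).
observations = [
    ("north", "car"),
    ("north", "bike"),
    ("south", "car"),
    ("north", "car"),
]

counts = Counter(observations)
rows = sorted({region for region, _ in observations})
cols = sorted({mode for _, mode in observations})

# One cell per (row category, column category) pair; missing pairs count as 0.
matrix = [[counts[(r, c)] for c in cols] for r in rows]

print(rows, cols, matrix)
```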

Symbol Map

📍 Definition

Shows how quantities distribute across spatial coordinates

When to Use

Display geographic data
Show spatial distributions
Compare locations

Key Features

Position = location
Size/color = magnitude
Geographic context

Fundamental Graphs Summary

Scatter Plots + Faceting (without)

Scatter Plots + Faceting (with)

Tidy Data

Goal: organizing data to make visualization easier

Paper: Hadley Wickham, “Tidy Data”, Journal of Statistical Software, 2014

Tidy Data

Tidy Data: Definition

In tidy data:

Each variable forms a column
Each observation forms a row
Each type of observational unit forms a table
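
The three rules can be illustrated with a small reshaping sketch. The table below is hypothetical: its year columns are really values of a "year" variable, so the table is not tidy. The helper `melt` is our own name (loosely modeled on the reshaping operation of the same name in data libraries) and uses only the standard library.

```python
# Untidy: one column per year, so the variable "year" is hidden in the header.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def melt(rows, id_key, var_name, value_name, cast=str):
    """Turn every non-id column into its own (variable, value) row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue
            tidy.append({id_key: row[id_key], var_name: cast(key), value_name: value})
    return tidy


# Tidy: each variable is a column, each observation is a row.
tidy = melt(wide, "country", "year", "cases", cast=int)
for record in tidy:
    print(record)
```

In the tidy form, a chart can map "year" to one channel and "cases" to another without any further restructuring.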

Tidy Data: Example #1

Tidy Data

Tidy Data: Example #2

Tidy Data

Graphical Encoding

Every visualization can be described in terms of:

its basic graphical components
mapping strategy between data and graphics
more precisely, a set of mappings between:
- data items — visual marks
- data attributes — visual channels
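
One way to picture these mappings is a scatter plot built by hand: each data item becomes a point mark, and each attribute drives one channel. The data, pixel ranges, and names below are assumptions made for the sketch, not part of any particular library.

```python
def linear_scale(domain, range_):
    """Return a function mapping a data interval linearly onto a visual interval."""
    (d0, d1), (r0, r1) = domain, range_

    def scale(v):
        return r0 + (v - d0) / (d1 - d0) * (r1 - r0)

    return scale


# Hypothetical data items with three attributes each.
items = [
    {"income": 5, "life": 70, "region": "north"},
    {"income": 10, "life": 90, "region": "south"},
]

x = linear_scale((0, 10), (0, 500))    # income -> horizontal position
y = linear_scale((50, 90), (300, 0))   # life -> vertical position (pixel y grows downward)
hue = {"north": "blue", "south": "red"}  # region -> color hue

# Data items -> marks; data attributes -> channels.
marks = [
    {"mark": "point", "x": x(d["income"]), "y": y(d["life"]), "color": hue[d["region"]]}
    for d in items
]
print(marks)
```

Decoding a chart runs the same mapping in reverse: the reader sees marks and channels and infers items and attributes.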

Graphical Marks

Visual Encoding Channels

Visualization Decoding

Marks — Data Items
Channels — Data Attributes

Examples

NYT link

Expressiveness Principle

The visual representation should express the type of information that exists in the data.

Ordered data should not appear as unordered.
Unordered data should not appear as ordered.

Effectiveness Principle

Relevance of information should match the effectiveness of the channels used.

Represent important information with more effective channels
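
A hedged sketch of how the two principles might be applied together. The channel rankings are an assumption that loosely follows the guidance in the Data Types section (position first, hue only for nominal data); published rankings differ in detail. Attributes are listed most important first, and each takes the best channel still available for its type.

```python
# Assumed rankings per data type, most effective channel first.
# Expressiveness: color hue is never offered to ordered data.
CHANNEL_RANKING = {
    "quantitative": ["x position", "y position", "size", "color value"],
    "ordinal": ["x position", "y position", "size", "color value"],
    "nominal": ["x position", "y position", "color hue", "shape"],
}


def assign_channels(attributes):
    """attributes: list of (name, data_type) pairs, most important first."""
    assignment = {}
    used = set()
    for name, data_type in attributes:
        # Effectiveness: take the highest-ranked channel that is still free.
        channel = next((c for c in CHANNEL_RANKING[data_type] if c not in used), None)
        if channel is None:
            raise ValueError(f"no expressive channel left for {name!r}")
        assignment[name] = channel
        used.add(channel)
    return assignment


# Hypothetical attributes, ordered by importance to the reader.
assignment = assign_channels([
    ("income", "quantitative"),
    ("life expectancy", "quantitative"),
    ("region", "nominal"),
    ("population", "quantitative"),
])
print(assignment)
```

A greedy assignment like this is only a starting point; real designs also weigh interactions between channels.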

Effectiveness Effect

Summary

Visual Encoding/Decoding
Graphical Marks and Channels
Expressiveness and Effectiveness
Channels Appropriateness and Ranking
Evaluation and Design
Contextual Components
- Labels, legends and annotations
- Axes, grids and trend lines

Neo: Interactive Confusion Matrices

Great example of research in VisML!

video link