Information Visualization Fundamentals

CS-GY 9223 - Visualization for Machine Learning - Fall 2025

Claudio Silva

NYU Tandon School of Engineering

2025-09-08

What is Information Visualization? Why Use It?

“The use of computer-supported, interactive, visual representations of abstract data to amplify cognition.”

Key Concepts

  • 💻 Computer-Based
  • 👁️ Visual Representation
  • 📊 Abstract Data
  • 🔄 Interactive
  • 🧠 Amplify Cognition

Abstract Data

Data with no obvious/natural visual representation

Interactive

Users can change what is visualized and how it is visualized.

TaxiVis - Ferreira et al., IEEE TVCG 2013

Amplify Cognition

  • Solve problems with data with less effort, in a shorter time, and more accurately.

  • … or even be able to do things it would be impossible to do without a computer and a graphical representation.

Cognitive artifacts: tools that help us think!

  • Try to multiply 34 × 72 using only your head …
  • … now do it again using pen and paper.

Why is it easier?

🧠 Memory in Head

  • Limited capacity (~7 items)
  • Decays quickly
  • Requires effort to maintain
  • Error-prone

📝 Memory in World

  • Unlimited storage
  • Persistent information
  • External processing
  • Reliable reference

Key Insight: Visualization offloads cognitive work from your mind to the visual system
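
A minimal sketch of why pen and paper helps: long multiplication writes every partial product down, so nothing has to be held in working memory. The function name `long_multiply` and the list standing in for the sheet of paper are illustrative assumptions, not part of the lecture.

```python
def long_multiply(a: int, b: int) -> tuple[list[int], int]:
    """Multiply a by a non-negative integer b the pen-and-paper way.

    Returns the partial products (what you would write down)
    and their sum (the final answer).
    """
    paper = []  # "memory in world": every intermediate result is kept here
    place = 1   # 1 for the ones digit, 10 for the tens digit, ...
    for digit_char in reversed(str(b)):
        paper.append(a * int(digit_char) * place)
        place *= 10
    return paper, sum(paper)


partials, product = long_multiply(34, 72)
print(partials)  # [68, 2380]
print(product)   # 2448
```

Doing the same in your head means keeping 68 and 2380 alive while adding them, which is exactly the load the paper removes.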

Let’s play the “game of 15” …

  • The “pieces” for the game are the nine digits: 1, 2, 3, 4, 5, 6, 7, 8, 9. Each player takes a digit in turn. Once a digit is taken, it cannot be used by the other player. The first player to get three digits that sum to 15 wins.
  • Here is a sample game: Player A takes 8. Player B takes 2. Then A takes 4, and B takes 3. A takes 5.
  • Question 1: Suppose you are now to step in and play for B. What move would you make?

Let’s play a different game: tic-tac-toe

Players alternately place an O or an X in one of nine spaces arranged in a 3 × 3 grid. Once a space has been taken, it cannot be changed by either player. The first player to get three symbols in a straight line wins. Suppose player A is X and B is O, and the game has reached the state on the right.

Question 2: Suppose you are now to step in and play an O for B. What move would you make?

Problem Isomorphs

Herbert Simon’s Insight

  • The two problems are isomorphic - structurally identical!
  • Same problem, different representation
  • The tic-tac-toe representation makes the solution obvious
  • Key insight: The right representation can dramatically simplify problem solving
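
The isomorphism can be checked mechanically. The sketch below assumes the standard 3 × 3 magic square as the mapping between digits and board cells (the slides do not show the mapping, so treat this layout as one possible choice). It verifies that the triples summing to 15 are exactly the tic-tac-toe lines, then replays the sample game. The helper name `winning_moves` is hypothetical.

```python
from itertools import combinations

# A 3x3 magic square: every row, column, and diagonal sums to 15.
MAGIC = [
    [2, 7, 6],
    [9, 5, 1],
    [4, 3, 8],
]

# All winning triples in the game of 15.
triples_15 = {frozenset(t) for t in combinations(range(1, 10), 3) if sum(t) == 15}

# All winning lines in tic-tac-toe, read off the magic square.
rows = [frozenset(r) for r in MAGIC]
cols = [frozenset(c) for c in zip(*MAGIC)]
diags = [
    frozenset(MAGIC[i][i] for i in range(3)),
    frozenset(MAGIC[i][2 - i] for i in range(3)),
]
lines = set(rows + cols + diags)

print(triples_15 == lines)  # True: both games have the same winning sets


def winning_moves(mine: set[int], taken: set[int]) -> set[int]:
    """Free digits that would complete a triple summing to 15 for `mine`."""
    free = set(range(1, 10)) - taken
    return {d for d in free if any(line <= mine | {d} for line in triples_15)}


a, b = {8, 4, 5}, {2, 3}         # the sample game from the slides
print(winning_moves(a, a | b))   # {6}: A threatens 4 + 5 + 6
print(winning_moves(b, a | b))   # set(): B cannot win now, so B should take 6
```

On the board, 4, 5, and 6 form a diagonal, which is why the threat is easy to spot in tic-tac-toe and hard to spot as arithmetic.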

Learn more about Herbert Simon

Herbert A. Simon (1916-2001) - Nobel Prize in Economics (1978), Turing Award (1975)

Pioneer in cognitive psychology, artificial intelligence, and decision-making

Why use visualization?

📢 Explanatory

Show insights to others

  • Present findings
  • Tell data stories
  • Communicate clearly

🔍 Exploratory

Find patterns yourself

  • Discover unknowns
  • Generate hypotheses
  • Understand data

✅ Confirmatory

Validate hypotheses

  • Test assumptions
  • Verify patterns
  • Support decisions

The Power of Visualization: Discovery

John Snow’s Cholera Map (1854)

  • Mapped cholera deaths in London
  • Revealed cluster around Broad Street water pump
  • The evidence helped persuade local authorities to disable the pump by removing its handle

Tip

Takeaway: Visualization is a powerful tool for discovery and finding patterns invisible in raw data.

John Snow’s Cholera Map

The Power of Visualization: Storytelling

Charles Minard’s Map of Napoleon’s March (1869)

  • Widely considered one of the best statistical graphics ever created
  • Shows six variables simultaneously:
    • Army size
    • Location (two spatial dimensions)
    • Direction of movement
    • Temperature
    • Dates during the retreat

Tip

Takeaway: Visualization is a powerful medium for dense, high-impact storytelling.

Charles Minard’s Map of Napoleon’s March

The Power of Visualization: Exploration

NYT: “How Y’all, Youse and You Guys Talk” (2013)

  • Modern, interactive visualization
  • Built with web technologies (like D3.js!)
  • Allows personal exploration of dialect data
  • Engages users through personalized results

Try it yourself: NYT Dialect Quiz

Tip

Takeaway: Visualization can be a dynamic interface for personal data exploration.

NYT Dialect Map

Great Explanatory Visualizations

  • NYT: https://flowingdata.com/tag/new-york-times/

  • Washington Post: http://postgraphics.tumblr.com/

  • Gregor Aisch: https://driven-by-data.net/

  • Nicky Case/Explorable Explanations: http://explorabl.es/

  • Polygraph: http://polygraph.cool/ & https://pudding.cool/

  • ProPublica: https://www.propublica.org/

Why use a graphical representation?

  • Large parts of our brain are devoted to spatial processing

Why use a computer to visualize data?

📊 Scale

Handle millions of data points efficiently

🔄 Interactivity

Explore data dynamically with zoom, filter, details

⚡ Real-time

Live updates as data changes continuously

🧮 Computation

Complex calculations and transformations on-the-fly

Why use interaction?

  • Each visualization can only answer a subset of questions.

  • With interaction, the user can change what is visualized, and how, in order to answer a multitude of questions.

  • Also, one cannot visualize everything at once.

How do you assess the quality of a visualization?

  • Isn’t it subjective? Some people like A, whereas some others like B.
  • Some visual representations are better than others at solving particular problems …

Digression: Graphical Perception

Graphical Perception Experiment

Graphical Perception Results

Designing effective visualizations requires

  • 🎨 Knowing the design space
    What visual encodings and techniques are available?

  • ⚖️ Being able to compare solutions
    Which design best serves the intended purpose?

  • 👁️ Understanding human perception
    How do people actually see and interpret visual information?

Data Types

  • The first ingredient in effective visualization is the input data. Data values can represent different forms of measurement.

  • What kinds of comparisons do those measurements support?

  • What kinds of visual encodings then support those comparisons?

Nominal (N) or Categorical (C)

  • Nominal data — also called categorical data — consist of category names.

  • With nominal data we can compare the equality of values: is value A the same as or different from value B? (A = B), supporting statements like “A is equal to B” or “A is not equal to B”.

  • When visualizing nominal data we should readily perceive if values are the same or different: position, color hue (blue, red, green, etc.), and shape are all reasonable options.

Ordinal (O)

  • Ordinal data consist of values that have a specific ordering.

  • With ordinal data we can compare the rank-ordering of values: does value A come before or after value B? (A < B), supporting statements like “A is less than B” or “A is greater than B”.

  • When visualizing ordinal data, we should perceive a sense of rank-order. Position, size, or color value (brightness) might be appropriate, whereas color hue (which is not perceptually ordered) would be less appropriate.

Quantitative (Q)

  • With quantitative data we can measure numerical differences among values.

  • There are multiple sub-types of quantitative data:

    • For interval data we can measure the distance between points: (A - B).
    • For ratio data we can also measure proportions or scale factors: (A / B).
  • Quantitative values can be visualized using position, size, or color value, among other channels. An axis with a zero baseline is essential for proportional comparisons of ratio values, but can be safely omitted for interval comparisons.
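
A small sketch of the interval versus ratio distinction, using temperature as an assumed example (it does not appear in the slides). Celsius is an interval scale because its zero is arbitrary, while Kelvin is a ratio scale with a true zero.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15


a_c, b_c = 20.0, 10.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences agree on both scales, so (A - B) is meaningful for interval data.
diff_c = a_c - b_c
diff_k = a_k - b_k

# Ratios disagree: "twice as hot" is an artifact of where Celsius puts zero.
ratio_c = a_c / b_c
ratio_k = a_k / b_k

print(diff_c, round(diff_k, 6))    # 10.0 10.0
print(ratio_c, round(ratio_k, 3))  # 2.0 1.035
```

This is the reason a zero baseline matters for ratio comparisons: bar lengths invite the reader to compute A / B.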

Temporal (T)

  • Temporal values measure time points or intervals. This type is a special case of quantitative values (timestamps) with rich semantics and conventions (e.g., the Gregorian calendar).

  • Example temporal values include date strings such as “2019-01-04” and “Jan 04 2019”, as well as standardized date-times such as the ISO date-time format: “2019-01-04T17:50:35.643Z”. A year field stored as a plain integer is not a temporal value unless it is explicitly treated as one.
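
As a rough illustration, the three example strings can be parsed with Python's standard `datetime` module. This is a sketch and not a recommendation of a particular toolchain; note the assumptions flagged in the comments.

```python
from datetime import date, datetime

# Plain ISO date string.
d1 = date.fromisoformat("2019-01-04")

# Month-name format; "%b" assumes an English locale for "Jan".
d2 = datetime.strptime("Jan 04 2019", "%b %d %Y").date()

# ISO date-time with a trailing "Z" (UTC). Replacing "Z" with "+00:00"
# keeps the call portable to Python versions before 3.11.
ts = datetime.fromisoformat("2019-01-04T17:50:35.643Z".replace("Z", "+00:00"))

print(d1 == d2)        # True: two spellings of the same day
print(d1.weekday())    # 4, i.e. Friday (Monday is 0)
print(ts.microsecond)  # 643000
print(ts.utcoffset())  # 0:00:00
```

Once parsed, the values carry calendar semantics (weekday, month, time zone) that a bare integer or string does not.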

Spatial (S)

  • Data that can be shown in a map

  • Spatial data, also known as geospatial data, identifies the geographic location and characteristics of natural or constructed features and boundaries on the Earth. https://atlan.com/spatial-data/

Data Types Summary

  • These data types are not mutually exclusive, but rather form a hierarchy: ordinal data support nominal (equality) comparisons, while quantitative data support ordinal (rank-order) comparisons.

  • Moreover, these data types do not provide a fixed categorization. For example, just because a data field is represented using a number doesn’t mean we have to treat it as a quantitative type! We might interpret a set of ages (10 years old, 20 years old, etc.) as nominal (underage or overage), ordinal (grouped by year), or quantitative (calculate average age).
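
The age example can be sketched in a few lines. The sample values, the threshold of 18, and the bracket boundaries are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

ages = [10, 20, 17, 35, 20, 64]  # hypothetical sample

# Nominal: only equality matters (same group or not).
ADULT_AGE = 18  # assumed threshold, for illustration only
nominal = ["adult" if a >= ADULT_AGE else "minor" for a in ages]

# Ordinal: ordered brackets; we can rank them but not subtract them.
BRACKETS = ["0-17", "18-39", "40+"]  # listed in rank order (assumed cut points)


def bracket(age: int) -> str:
    if age < 18:
        return BRACKETS[0]
    return BRACKETS[1] if age < 40 else BRACKETS[2]


ordinal = sorted((bracket(a) for a in ages), key=BRACKETS.index)

# Quantitative: arithmetic on the values is meaningful.
quantitative = mean(ages)

print(Counter(nominal))         # Counter({'adult': 4, 'minor': 2})
print(ordinal)                  # ['0-17', '0-17', '18-39', '18-39', '18-39', '40+']
print(round(quantitative, 2))   # 27.67
```

The same column supports three different readings; the choice depends on the question being asked, not on how the values are stored.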

Fundamental Charts

  • Widely adopted, effective, useful.

  • Solve a very large percentage of visualization problems.

  • Training ground for more sophisticated graphs.

Bar Chart

📊 Definition

Visualizes how a quantity distributes across categories

When to Use

  • Compare values across groups
  • Show rankings or order
  • Display part-to-whole relationships

Key Features

  • Length encodes value
  • Categories on one axis
  • Best for 5-20 categories

Example: Sales by Product Category
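
A text-only sketch of the idea that length encodes value. The sales figures and the helper name `bar_chart` are hypothetical; a real chart would use a plotting library.

```python
def bar_chart(values: dict[str, float], width: int = 40) -> list[str]:
    """Render a horizontal bar chart where bar length encodes value."""
    largest = max(values.values())
    label_width = max(len(name) for name in values)
    rows = []
    # Sort descending so the chart also shows the ranking.
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        # Bars start at zero, so length stays proportional to value.
        length = round(width * value / largest)
        rows.append(f"{name:<{label_width}} | {'#' * length} {value:g}")
    return rows


# Hypothetical sales figures, for illustration only.
sales = {"Books": 120, "Games": 300, "Music": 75, "Toys": 210}
print("\n".join(bar_chart(sales)))
```

Because every bar starts at zero, a bar twice as long really does mean twice the value.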

Line Chart

📈 Definition

Shows how quantities change in relation to another variable (typically time)

When to Use

  • Track trends over time
  • Compare multiple time series
  • Identify patterns and cycles

Key Features

  • Position encodes value
  • Lines connect data points
  • Emphasizes continuity

Example: Stock Price Over Time

Scatter Plot

🔵 Definition

Shows how one quantity relates to another quantity

When to Use

  • Show correlations
  • Identify clusters or outliers
  • Compare distributions

Key Features

  • X/Y positions encode values
  • Each point = one observation
  • Reveals patterns in data

Example: Height vs Weight Correlation

Matrix

🔲 Definition

Shows how quantities distribute across two categories

When to Use

  • Show relationships between categories
  • Display correlation matrices
  • Visualize adjacency/similarity

Key Features

  • Color/size encode values
  • Row/column structure
  • Compact representation

Example: Correlation Matrix
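
A minimal sketch of what sits behind a correlation matrix view: pairwise Pearson correlations arranged in rows and columns, with a character "shade" standing in for color. The data, the `cell` helper, and the shading scheme are assumptions for illustration.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical variables, chosen so the correlations are easy to check by hand.
data = {
    "x": [1, 2, 3, 4, 5],
    "w": [2, 1, 4, 3, 5],
    "z": [5, 4, 3, 2, 1],
}
names = list(data)
matrix = {r: {c: pearson(data[r], data[c]) for c in names} for r in names}

SHADES = " .:*#"  # darker character = stronger correlation (either sign)


def cell(r: float) -> str:
    """Encode one correlation as a sign plus a shade character."""
    sign = "-" if r < 0 else "+"
    return sign + SHADES[round(abs(r) * (len(SHADES) - 1))]


print("  " + "  ".join(names))
for r in names:
    print(r, " ".join(cell(matrix[r][c]) for c in names))
```

The row/column layout makes the matrix compact: n variables need n × n cells, and the diagonal is always a perfect correlation.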

Symbol Map

📍 Definition

Shows how quantities distribute across spatial coordinates

When to Use

  • Display geographic data
  • Show spatial distributions
  • Compare locations

Key Features

  • Position = location
  • Size/color = magnitude
  • Geographic context

Example: City Population Distribution

Fundamental Graphs Summary

Scatter Plots + Faceting (without)

Scatter Plots + Faceting (with)

Tidy Data

Goal: organizing data to make visualization easier

Link to paper

Tidy Data: Definition

In tidy data:

  • Each variable forms a column

  • Each observation forms a row

  • Each type of observational unit forms a table
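
A small sketch of tidying a "wide" table, where column headers are values (years) rather than variable names. The table contents and the helper name `melt` are hypothetical; libraries such as pandas offer an equivalent operation, but plain Python keeps the example self-contained.

```python
# A "wide" table: one row per country, one column per year.
# The values are made up and only illustrate the reshaping.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def melt(rows: list[dict], id_field: str, var_name: str, value_name: str) -> list[dict]:
    """Turn every non-id column into its own (variable, value) row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue
            tidy.append({id_field: row[id_field], var_name: key, value_name: value})
    return tidy


tidy = melt(wide, id_field="country", var_name="year", value_name="cases")
for record in tidy:
    print(record)
# {'country': 'A', 'year': '2019', 'cases': 10}
# {'country': 'A', 'year': '2020', 'cases': 12}
# {'country': 'B', 'year': '2019', 'cases': 7}
# {'country': 'B', 'year': '2020', 'cases': 9}
```

In the tidy version each variable (country, year, cases) is a column and each observation is a row, so any of the three can be mapped directly to a visual channel.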

Tidy Data: Example #1

Tidy Data: Example #2

Graphical Encoding

Every visualization can be described in terms of:

  • its basic graphical components

  • mapping strategy between data and graphics

  • more precisely, a set of mappings between:

    • data items — visual marks
    • data attributes — visual channels
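
The two mappings can be sketched directly: each data item becomes one mark, and each data attribute drives one channel of that mark. The class `PointMark`, the helper `linear_scale`, the sample records, and the pixel ranges are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PointMark:
    """One visual mark; its fields are the channels."""
    x: float    # horizontal position channel
    y: float    # vertical position channel
    color: str  # color hue channel


def linear_scale(domain: tuple[float, float], range_: tuple[float, float]):
    """Return a function mapping data values in `domain` to `range_`."""
    (d0, d1), (r0, r1) = domain, range_

    def scale(value: float) -> float:
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


# Hypothetical data items with two quantitative and one nominal attribute.
items = [
    {"height": 150, "weight": 50, "group": "a"},
    {"height": 190, "weight": 90, "group": "b"},
    {"height": 170, "weight": 70, "group": "a"},
]

x_scale = linear_scale((150, 190), (0, 400))  # height -> x position
y_scale = linear_scale((50, 90), (300, 0))    # weight -> y position (screen y grows downward)
palette = {"a": "steelblue", "b": "orange"}   # group  -> color hue

# One mark per data item; one channel per data attribute.
marks = [
    PointMark(x=x_scale(d["height"]), y=y_scale(d["weight"]), color=palette[d["group"]])
    for d in items
]
print(marks[2])  # PointMark(x=200.0, y=150.0, color='steelblue')
```

Decoding runs the same mappings in reverse: the reader sees a mark and infers the item, sees a channel value and infers the attribute.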

Graphical Marks

Visual Encoding Channels

Visualization Decoding

  • Marks — Data Items

  • Channels — Data Attributes

Examples

NYT link

Expressiveness Principle

The visual representation should express the type of information that exists in the data.

  • Ordered data should not appear as unordered.
  • Unordered data should not appear as ordered.
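
One way to make the principle checkable is a lookup from data type to the channels suggested for it in the data types slides. The table below only transcribes those suggestions; it is not exhaustive, and the function name `is_expressive` is hypothetical.

```python
# Channels suggested for each data type earlier in these slides (not exhaustive).
SUITABLE_CHANNELS = {
    "nominal": {"position", "color hue", "shape"},
    "ordinal": {"position", "size", "color value"},
    "quantitative": {"position", "size", "color value"},
}


def is_expressive(data_type: str, channel: str) -> bool:
    """True if `channel` is among the suggested channels for `data_type`."""
    return channel in SUITABLE_CHANNELS[data_type]


print(is_expressive("nominal", "color hue"))  # True
print(is_expressive("ordinal", "color hue"))  # False: hue is not perceptually ordered
print(is_expressive("nominal", "size"))       # False: size would imply an order
```

The two False cases correspond to the two bullets above: ordered data shown as unordered, and unordered data shown as ordered.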

Effectiveness Principle

Relevance of information should match the effectiveness of the channels used.

  • Represent important information with more effective channels
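
A minimal sketch of applying the principle: rank the attributes by importance, then hand out channels from most to least effective. The channel ordering below is an assumption for quantitative attributes, loosely in the spirit of graphical perception results; the ranking used in the lecture may differ.

```python
# Assumed effectiveness ranking for quantitative attributes, most effective first.
CHANNEL_RANKING = ["position", "length", "angle", "area", "color value"]


def assign_channels(attributes_by_importance: list[str]) -> dict[str, str]:
    """Give the most important attribute the most effective channel, and so on."""
    if len(attributes_by_importance) > len(CHANNEL_RANKING):
        raise ValueError("more attributes than available channels")
    return dict(zip(attributes_by_importance, CHANNEL_RANKING))


# Hypothetical attributes, listed from most to least important.
print(assign_channels(["price", "volume", "volatility"]))
# {'price': 'position', 'volume': 'length', 'volatility': 'angle'}
```

A real design would also check expressiveness for each pairing and avoid reusing channels that interfere with each other.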

Effectiveness Effect

Summary

  • Visual Encoding/Decoding
  • Graphical Marks and Channels
  • Expressiveness and Effectiveness
  • Channel Appropriateness and Ranking
  • Evaluation and Design
  • Contextual Components
    • Labels, legends and annotations
    • Axes, grids and trend lines

Neo: Interactive Confusion Matrices

Great example of research in VisML!

video link