CS-GY 9223 - Visualization for Machine Learning - Fall 2025
NYU Tandon School of Engineering
2025-09-08
“The use of computer-supported, interactive, visual representations of abstract data to amplify cognition.”
Slides based on material from Prof. Enrico Bertini
Data with no obvious/natural visual representation
Users can change what is visualized and how it is visualized.
TaxiVis - Ferreira et al., IEEE TVCG 2013
Solve problems with data with less effort, in a shorter time, and more accurately.
… or even be able to do things it would be impossible to do without a computer and a graphical representation.
Key Insight: Visualization offloads cognitive work from your mind to the visual system
Players alternately place an O or an X in one of nine spaces arranged in a rectangular array. Once a space has been taken, it cannot be changed by either player. The first player to get three symbols in a straight line wins. Suppose player A is X and B is O, and the game has reached the state on the right.
Question 2: Suppose you are now to step in and play an O for B. What move would you make?
Pioneer in cognitive psychology, artificial intelligence, and decision-making
Show insights to others
Find patterns yourself
Validate hypotheses
Takeaway: Visualization is a powerful tool for discovery and finding patterns invisible in raw data.
Takeaway: Visualization is a powerful medium for dense, high-impact storytelling.
Try it yourself: NYT Dialect Quiz
Takeaway: Visualization can be a dynamic interface for personal data exploration.
NYT: https://flowingdata.com/tag/new-york-times/
Washington Post: http://postgraphics.tumblr.com/
Gregor Aisch: https://driven-by-data.net/
Nicky Case/Explorable Explanations: http://explorabl.es/
Polygraph: http://polygraph.cool/ & https://pudding.cool/
ProPublica: https://www.propublica.org/
Image via Wikipedia: OpenStax College, Anatomy & Physiology, Connexions Web site, Jun 19, 2013, CC BY 3.0
Handle millions of data points efficiently
Explore data dynamically with zoom, filter, details
Live updates as data changes continuously
Complex calculations and transformations on-the-fly
Each visualization can only answer a subset of questions.
With interaction, the user can change what is visualized and how it is visualized, and so answer a multitude of questions.
Also, one cannot visualize everything at once.
🎨 Knowing the design space
What visual encodings and techniques are available?
⚖️ Being able to compare solutions
Which design best serves the intended purpose?
👁️ Understanding human perception
How do people actually see and interpret visual information?
The first ingredient in effective visualization is the input data. Data values can represent different forms of measurement.
What kinds of comparisons do those measurements support?
What kinds of visual encodings then support those comparisons?
Slides based on material from Prof. Jeffrey Heer
Nominal data — also called categorical data — consist of category names.
With nominal data we can compare the equality of values: is value A the same as or different from value B? (A = B), supporting statements like “A is equal to B” or “A is not equal to B”.
When visualizing nominal data we should readily perceive if values are the same or different: position, color hue (blue, red, green, etc.), and shape are all reasonable options.
Ordinal data consist of values that have a specific ordering.
With ordinal data we can compare the rank-ordering of values: does value A come before or after value B? (A < B), supporting statements like “A is less than B” or “A is greater than B”.
When visualizing ordinal data, we should perceive a sense of rank-order. Position, size, or color value (brightness) might be appropriate, whereas color hue (which is not perceptually ordered) would be less appropriate.
With quantitative data we can measure numerical differences among values.
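There are multiple sub-types of quantitative data:
Interval data: we can measure the distance (interval) between values: how far is value A from value B? (A - B), supporting statements like “A is 12 units away from B”.
Ratio data: the zero point is meaningful, so we can also measure proportions or scale factors: value A is what proportion of value B? (A / B), supporting statements like “A is 10% of B” or “B is 7 times larger than A”.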
Quantitative values can be visualized using position, size, or color value, among other channels. An axis with a zero baseline is essential for proportional comparisons of ratio values, but can be safely omitted for interval comparisons.
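As a rough illustration of why the zero baseline matters for ratio comparisons but not for interval comparisons, the sketch below computes how long two bars would be drawn when the axis starts at a given baseline. The function name and the values 50 and 100 are hypothetical, chosen only for illustration.

```python
def drawn_length(value, baseline=0.0):
    """Length of a bar for `value` when the axis starts at `baseline`."""
    return value - baseline


# Hypothetical values: B is exactly twice A.
a, b = 50.0, 100.0

# Ratio comparison: how many times longer does B's bar look than A's?
ratio_zero_based = drawn_length(b, 0.0) / drawn_length(a, 0.0)
ratio_truncated = drawn_length(b, 40.0) / drawn_length(a, 40.0)

# Interval comparison: how much longer is B's bar than A's?
diff_zero_based = drawn_length(b, 0.0) - drawn_length(a, 0.0)
diff_truncated = drawn_length(b, 40.0) - drawn_length(a, 40.0)

print("ratio with zero baseline:", ratio_zero_based)    # 2.0, matches b / a
print("ratio with baseline at 40:", ratio_truncated)    # 6.0, exaggerates b / a
print("difference with zero baseline:", diff_zero_based)
print("difference with baseline at 40:", diff_truncated)
```

With the axis truncated at 40, the drawn ratio no longer matches the true ratio of the values, while the drawn difference is the same in both cases.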
Temporal values measure time points or intervals. This type is a special case of quantitative values (timestamps) with rich semantics and conventions (e.g., the Gregorian calendar).
Example temporal values include date strings such as “2019-01-04” and “Jan 04 2019”, as well as standardized date-times such as the ISO date-time format: “2019-01-04T17:50:35.643Z”. There are no temporal values in our global development dataset above, as the year field is encoded as an integer.
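To make the difference between a temporal value and a plain integer concrete, here is a minimal sketch that parses the three example strings above with Python's standard `datetime` module. The format strings are one reasonable choice, not the only one, and `%b` assumes English month names (the default C locale).

```python
from datetime import datetime, timezone

# The three example strings from the text, parsed into temporal values.
date_a = datetime.strptime("2019-01-04", "%Y-%m-%d")
date_b = datetime.strptime("Jan 04 2019", "%b %d %Y")
stamp = datetime.strptime(
    "2019-01-04T17:50:35.643Z", "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)  # the trailing "Z" denotes UTC

# All three refer to the same calendar day.
same_day = date_a.date() == date_b.date() == stamp.date()

# A parsed value carries calendar semantics that a bare integer does not.
year_as_int = 2019
weekday_name = stamp.strftime("%A")

print(same_day, weekday_name, stamp.isoformat())
```

An integer year such as `year_as_int` still supports arithmetic, but questions like "which weekday was this?" only make sense once the value is treated as temporal.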
Data that can be shown on a map
Spatial data, also known as geospatial data, is information that identifies the geographic location and characteristics of natural or constructed features and boundaries on the Earth. Source: https://atlan.com/spatial-data/
These data types are not mutually exclusive, but rather form a hierarchy: ordinal data support nominal (equality) comparisons, while quantitative data support ordinal (rank-order) comparisons.
Moreover, these data types do not provide a fixed categorization. For example, just because a data field is represented using a number doesn’t mean we have to treat it as a quantitative type! We might interpret a set of ages (10 years old, 20 years old, etc.) as nominal (underage or overage), ordinal (grouped by year), or quantitative (calculate average age).
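The age example can be sketched in a few lines of Python. The ages and the cutoff of 18 are hypothetical values picked for illustration; the point is only that the same numeric field supports different comparisons depending on how we choose to treat it.

```python
from statistics import mean

ages = [10, 20, 35, 16, 42]  # hypothetical ages in years
ADULT_AGE = 18               # assumed cutoff, for illustration only

# Nominal: only equality of the category labels is meaningful.
as_nominal = ["underage" if age < ADULT_AGE else "overage" for age in ages]

# Ordinal: rank order is meaningful, distances between ranks are not.
rank_of = {age: rank for rank, age in enumerate(sorted(set(ages)), start=1)}
as_ordinal = [rank_of[age] for age in ages]

# Quantitative: numerical differences and averages are meaningful.
average_age = mean(ages)

print(as_nominal)
print(as_ordinal)
print(average_age)
```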
Widely adopted, effective, useful.
Solve a very large percentage of visualization problems.
Training ground for more sophisticated graphs.
Slides based on material from Prof. Enrico Bertini
Visualizes how a quantity distributes across categories
Shows how quantities change in relation to another variable (typically time)
Shows how one quantity relates to another quantity
Shows how quantities distribute across two categories
Shows how quantities distribute across spatial coordinates
Goal: organizing data to make visualization easier
Slides based on material from Hadley Wickham
In tidy data:
Each variable forms a column
Each observation forms a row
Each type of observational unit forms a table
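The three rules above can be illustrated with a small reshaping sketch in plain Python. The table contents, the column names, and the helper `to_tidy` are all hypothetical; in practice a data-frame library would usually do this step.

```python
# Hypothetical "wide" table: one row per country, one column per year.
# The year is a variable, but here it is hidden in the column headers.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Turn every non-id column into its own (variable, value) row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue
            tidy_rows.append(
                {id_key: row[id_key], var_name: column, value_name: value}
            )
    return tidy_rows


# After reshaping, each variable (country, year, cases) is a column
# and each observation (one country in one year) is a row.
tidy = to_tidy(wide, id_key="country", var_name="year", value_name="cases")
for record in tidy:
    print(record)
```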
Every visualization can be described in terms of:
its basic graphical components
mapping strategy between data and graphics
more precisely, a set of mappings between:
Slides based on material from Prof. Enrico Bertini
Marks — Data Items
Channels — Data Attributes
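One way to read the pairing above: each data item becomes one mark, and each data attribute drives one channel of that mark. The sketch below is a minimal, library-free illustration; the data, the attribute names, the pixel ranges, and the `linear_scale` helper are all assumptions made for the example.

```python
# Hypothetical data items: each dict is one item, each key one attribute.
items = [
    {"name": "a", "income": 10.0, "health": 50.0, "region": "north"},
    {"name": "b", "income": 30.0, "health": 70.0, "region": "south"},
    {"name": "c", "income": 20.0, "health": 60.0, "region": "north"},
]


def linear_scale(domain_min, domain_max, range_min, range_max):
    """Return a function mapping a data value linearly onto a visual range."""
    span = domain_max - domain_min

    def scale(value):
        return range_min + (value - domain_min) / span * (range_max - range_min)

    return scale


# Channels: two position channels and one color hue channel.
x = linear_scale(10.0, 30.0, 0.0, 200.0)   # income -> horizontal position
y = linear_scale(50.0, 70.0, 100.0, 0.0)   # health -> vertical position (screen y grows downward)
hue = {"north": "blue", "south": "red"}    # region -> color hue

# Marks: one point mark per data item.
marks = [
    {"mark": "point", "x": x(d["income"]), "y": y(d["health"]), "color": hue[d["region"]]}
    for d in items
]

for m in marks:
    print(m)
```

Changing which attribute feeds which channel, without touching the data, produces a different visualization of the same items.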
The visual representation should express the type of information that exists in the data.
Relevance of information should match the effectiveness of the channels used.
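The first principle can be sketched as a simple lookup. The table below only restates the channel suggestions given earlier for nominal, ordinal, and quantitative data; it is not an exhaustive or authoritative list, and the field names in the example encoding are hypothetical.

```python
# Channels named earlier in these slides as reasonable for each data type.
# This is an illustrative subset, not a complete ranking of channels.
SUITABLE_CHANNELS = {
    "nominal": {"position", "color hue", "shape"},
    "ordinal": {"position", "size", "color value"},
    "quantitative": {"position", "size", "color value"},
}


def mismatched_fields(encoding):
    """Return the fields whose channel does not suit their data type."""
    return [
        field
        for field, (data_type, channel) in encoding.items()
        if channel not in SUITABLE_CHANNELS[data_type]
    ]


# Hypothetical encoding: field -> (data type, channel).
encoding = {
    "region": ("nominal", "color hue"),
    "rank": ("ordinal", "color hue"),   # hue is not perceptually ordered
    "income": ("quantitative", "position"),
}

problems = mismatched_fields(encoding)
print(problems)
```

A check like this covers only the first principle; matching the most relevant attributes to the most effective channels additionally requires a ranking of channels, which this sketch does not assume.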
Great example of research in VisML!