CS-GY 6313 Information Visualization
New York University
2025-09-12
This lecture incorporates materials from:
Items (also called records, cases, objects):
- Definition: Objects/entities you want to visualize
- Examples:

Attributes (also called variables, dimensions, fields):
- Definition: Properties of these objects/entities
- Examples:
From “Visualization Analysis and Design”, Chapter 2
Fill in examples for each type:
From “On the Theory of Scales of Measurement” in Science
Complete the table with examples:
Type | Operations | Your Examples |
---|---|---|
Nominal | =, ≠ | |
Ordinal | =, ≠, <, > | |
Quantitative | =, ≠, <, >, +, - | |
One domain question can lead to many data questions:
“Low-Level Components of Analytic Activity in Information Visualization”
Value Tasks
Pattern Tasks
“The eyes have it: A task by data type taxonomy for information visualization”
Level | Description | Operations | Examples |
---|---|---|---|
Nominal | Categories, no order | =, ≠ | Colors, names, IDs |
Ordinal | Ordered categories | =, ≠, <, > | Rankings, grades |
Interval | Numeric, no true zero | =, ≠, <, >, - | Temperature (°C), dates |
Ratio | Numeric, true zero | =, ≠, <, >, -, % | Height, weight, count |
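
The Operations column can be checked with a small worked example. The sketch below is illustrative only: the grade ordering and all readings are made-up values, not data from the lecture. It suggests why differences are safe on an interval scale while ratios need a true zero.

```python
# Ordinal: order is meaningful, but arithmetic is not.
GRADE_ORDER = ["C", "B", "A"]  # hypothetical ranking, lowest to highest


def grade_less_than(a: str, b: str) -> bool:
    """Compare two ordinal values by their position in the ordering."""
    return GRADE_ORDER.index(a) < GRADE_ORDER.index(b)


# Interval: differences are meaningful, ratios are not.
t1_c, t2_c = 10.0, 20.0                      # made-up readings in °C
t1_k, t2_k = t1_c + 273.15, t2_c + 273.15    # the same readings in kelvin

diff_c = t2_c - t1_c    # difference in °C
diff_k = t2_k - t1_k    # same difference in kelvin (up to float rounding)
ratio_c = t2_c / t1_c   # 2.0, an artifact of where °C places its zero
ratio_k = t2_k / t1_k   # roughly 1.035 for the same two temperatures

# Ratio: with a true zero, the ratio does not depend on the unit.
h1_m, h2_m = 1.0, 2.0                        # made-up heights in metres
ratio_m = h2_m / h1_m
ratio_cm = (h2_m * 100) / (h1_m * 100)
```

The difference is the same in both temperature units, but "twice as hot" only holds in °C, which is why the division operation is reserved for ratio data.
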
Data Model
- 32.5, 54.0, -17.3, …
- Floating point numbers

Conceptual Model
- Temperature (°C)

Data Type
- Burned vs. Not-Burned (N)
- Hot, Warm, Cold (O)
- Temperature Value (Q-interval)
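
As a rough sketch of how the same stored floats can be read as N, O, or Q data, the snippet below derives each data type from the three example values. The cutoffs (10, 40, and 50 °C) and the function names are assumptions chosen purely for illustration; the lecture does not specify them.

```python
temps_c = [32.5, 54.0, -17.3]  # the raw floats from the data model (Q-interval)


def as_ordinal(t: float) -> str:
    """Bin a temperature into Cold < Warm < Hot (cutoffs are assumed)."""
    if t < 10.0:
        return "Cold"
    if t < 40.0:
        return "Warm"
    return "Hot"


def as_nominal(t: float) -> str:
    """Label a temperature Burned / Not-Burned (cutoff is assumed)."""
    return "Burned" if t >= 50.0 else "Not-Burned"


ordinal = [as_ordinal(t) for t in temps_c]   # ordered categories (O)
nominal = [as_nominal(t) for t in temps_c]   # unordered categories (N)
```
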
Not a strict distinction. The same variable may be treated either way depending on the task.
- SELECT: select a set of columns
- WHERE: remove unwanted rows
- ORDER BY: order records
- GROUP BY, SUM, MIN, MAX: partition rows into groups + summarize
- JOIN, UNION: integrate data from multiple tables

Input:
day | stock | price |
---|---|---|
10/3 | AMZN | 957.10 |
10/3 | MSFT | 74.26 |
10/4 | AMZN | 965.45 |
10/4 | MSFT | 74.69 |
→
Output:
day | stock |
---|---|
10/3 | AMZN |
10/3 | MSFT |
10/4 | AMZN |
10/4 | MSFT |
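
The transformation above keeps only the day and stock columns. A minimal sketch using Python's built-in `sqlite3` module follows; the table name `prices` and the in-memory database are assumptions made for the example.

```python
import sqlite3

# Build the input table in an in-memory database (table name is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day TEXT, stock TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("10/3", "AMZN", 957.10),
        ("10/3", "MSFT", 74.26),
        ("10/4", "AMZN", 965.45),
        ("10/4", "MSFT", 74.69),
    ],
)

# SELECT with an explicit column list drops the price column.
projected = conn.execute(
    "SELECT day, stock FROM prices ORDER BY rowid"
).fetchall()
conn.close()
```
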
Input:
day | stock | price |
---|---|---|
10/3 | AMZN | 957.10 |
10/3 | MSFT | 74.26 |
10/4 | AMZN | 965.45 |
10/4 | MSFT | 74.69 |
→
Output:
day | stock | price |
---|---|---|
10/3 | AMZN | 957.10 |
10/4 | AMZN | 965.45 |
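
Here the rows for AMZN are kept and the others removed. A minimal sketch with `sqlite3`, again assuming a hypothetical table named `prices`:

```python
import sqlite3

# Build the input table in an in-memory database (table name is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day TEXT, stock TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("10/3", "AMZN", 957.10),
        ("10/3", "MSFT", 74.26),
        ("10/4", "AMZN", 965.45),
        ("10/4", "MSFT", 74.69),
    ],
)

# WHERE keeps only the rows that satisfy the predicate.
amzn_rows = conn.execute(
    "SELECT day, stock, price FROM prices WHERE stock = ? ORDER BY rowid",
    ("AMZN",),
).fetchall()
conn.close()
```
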
Input:
day | stock | price |
---|---|---|
10/3 | AMZN | 957.10 |
10/3 | MSFT | 74.26 |
10/4 | AMZN | 965.45 |
10/4 | MSFT | 74.69 |
→
Output:
stock | min(price) |
---|---|
AMZN | 957.10 |
MSFT | 74.26 |
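
This last transformation partitions the rows by stock and reports the minimum price in each group. A minimal sketch with `sqlite3`, assuming the same hypothetical `prices` table:

```python
import sqlite3

# Build the input table in an in-memory database (table name is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day TEXT, stock TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("10/3", "AMZN", 957.10),
        ("10/3", "MSFT", 74.26),
        ("10/4", "AMZN", 965.45),
        ("10/4", "MSFT", 74.69),
    ],
)

# GROUP BY forms one group per stock; MIN summarizes each group.
min_prices = conn.execute(
    "SELECT stock, MIN(price) FROM prices GROUP BY stock ORDER BY stock"
).fetchall()
conn.close()
```
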
Goal: Structuring data to make visualization and analysis easier
In tidy data:
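
In the usual formulation, from Hadley Wickham's "Tidy Data", each variable forms a column and each observation forms a row. The sketch below reshapes the stock prices from the earlier tables out of a hypothetical wide layout into that form. The wide layout and the helper `to_tidy` are assumptions for illustration, not part of the lecture.

```python
# Wide layout (hypothetical): one row per day, one column per stock.
wide = [
    {"day": "10/3", "AMZN": 957.10, "MSFT": 74.26},
    {"day": "10/4", "AMZN": 965.45, "MSFT": 74.69},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_key:
                tidy_rows.append(
                    {id_key: row[id_key], var_name: key, value_name: value}
                )
    return tidy_rows


# Tidy layout: one row per (day, stock) observation.
tidy = to_tidy(wide, id_key="day", var_name="stock", value_name="price")
```

After reshaping, stock is an ordinary column, so it can be filtered, grouped, or mapped to a visual channel like any other attribute.
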
“I spend more than half of my time integrating, cleansing and transforming data without doing any actual analysis. Most of the time I’m lucky if I get to do any ‘analysis’ at all.”
— Anonymous Data Scientist (2012 interview study)
“The first sign that a visualization is good is that it shows you a problem in your data. Every successful visualization that I’ve been involved with has had this stage where you realize, ‘Oh my God, this data is not what I thought it would be!’ So already, you’ve discovered something.”
— Martin Wattenberg (ACM Queue ’09)
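import pandas as pd

# Examples of derived attributes (df is an existing DataFrame)
df['profit_margin'] = df['profit'] / df['revenue']   # ratio of two quantities
df['year'] = pd.to_datetime(df['date']).dt.year      # component of a date
df['is_profitable'] = df['profit'] > 0               # boolean flag
df['age_group'] = pd.cut(df['age'],                  # bin into ordered categories
                         bins=[0, 18, 65, 100],
                         labels=['child', 'adult', 'senior'],
                         include_lowest=True)        # keep age 0 in the first bin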
Raw taxi trip data with these fields:
Work in small groups to design a transformation pipeline:
// JSON: JavaScript Object Notation
[
{"year":1850,"age":0,"marital_status":0,"sex":1,"people":1483789},
{"year":1850,"age":5,"marital_status":0,"sex":1,"people":1411067}
]
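
As a small sketch of reading such records in Python, the snippet below parses the two objects with the standard `json` module. The leading `//` line is left out because `json.loads` does not accept comments. The variable names are illustrative.

```python
import json

# The two records from the example, as a JSON array of objects.
text = """[
  {"year":1850,"age":0,"marital_status":0,"sex":1,"people":1483789},
  {"year":1850,"age":5,"marital_status":0,"sex":1,"people":1411067}
]"""

records = json.loads(text)            # list of dicts: one item per object
attributes = sorted(records[0])       # attribute names shared by the items
total_people = sum(r["people"] for r in records)
```

Each JSON object corresponds to one item, and each key to one attribute.
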