CS-GY 6313 Information Visualization
New York University
2025-09-12
This lecture incorporates materials from:
Items (also called records, cases, objects):
- Definition: Objects/entities you want to visualize
- Examples:

Attributes (also called variables, features, fields):
- Definition: Properties of these objects/entities
- Examples:
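Here is a minimal Python/pandas sketch of the two terms for a tabular dataset. Each row of a DataFrame is one item, and each column is one attribute. The data is the stock-quote table used later in this lecture. The variable name `quotes` is hypothetical.

```python
import pandas as pd

# Each row is an item (one stock quote on one day);
# each column is an attribute of that item.
quotes = pd.DataFrame({
    "day":   ["10/3", "10/3", "10/4", "10/4"],
    "stock": ["AMZN", "MSFT", "AMZN", "MSFT"],
    "price": [957.10, 74.26, 965.45, 74.69],
})

n_items = len(quotes)                   # number of items (rows)
attributes = list(quotes.columns)       # attribute names (columns)
first_item = quotes.iloc[0].to_dict()   # one item as attribute -> value pairs
```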
From “Visualization Analysis and Design”, Chapter 2
Fill in examples for each type:
Complete the table with examples:
Level | Description | Operations | Examples |
---|---|---|---|
Nominal | Categories, no order | =, ≠ | |
Ordinal | Ordered categories | =, ≠, <, > | |
Interval (Quantitative) | Numeric, no true zero | =, ≠, <, >, - | |
Ratio (Quantitative) | Numeric, true zero | =, ≠, <, >, -, ÷ | |
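pandas separates the first two levels by distinguishing unordered from ordered categoricals. The minimal sketch below uses made-up values, and the `fruit` and `size` series are hypothetical. Equality works on both kinds. `<` and `>` are only permitted once the categories are declared ordered.

```python
import pandas as pd

# Nominal: unordered categories, so only = and != are meaningful
fruit = pd.Series(pd.Categorical(["apple", "pear", "apple"]))
is_apple = fruit == "apple"             # equality is allowed

try:
    fruit < "pear"                      # ordering an unordered categorical
    nominal_order_ok = True
except TypeError:                       # pandas rejects <, > on unordered categoricals
    nominal_order_ok = False

# Ordinal: ordered categories additionally support <, >, min, max
size = pd.Series(pd.Categorical(["S", "L", "M"],
                                categories=["S", "M", "L"],
                                ordered=True))
smaller_than_large = size < "L"         # compares by category order S < M < L
largest = size.max()
```

Declaring `ordered=True` is what makes `<`, `>`, `min`, and `max` meaningful for the ordinal series.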
Types provide guidance in selecting appropriate graphical encoding strategies…
“The eyes have it: A task by data type taxonomy for information visualization”
Not a strict distinction: the same variable may be treated either way depending on the task. For example, weather can be described as Hot/Cold (N) or as temperature values (Q).
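The move from the quantitative view to the nominal one is a simple derivation. The minimal sketch below assumes temperatures in °C and a hypothetical 20 °C cutoff between Hot and Cold. Both the values and the threshold are illustrative and do not come from the lecture.

```python
import pandas as pd

# Quantitative view: illustrative temperature readings in degrees Celsius
temps = pd.Series([31.0, 12.5, 24.0, 5.0])

# Nominal view: the same variable reduced to two labels.
# HOT_THRESHOLD_C is a hypothetical, arbitrary cutoff.
HOT_THRESHOLD_C = 20.0
labels = (temps >= HOT_THRESHOLD_C).map({True: "Hot", False: "Cold"})
```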
Data models ask “How is it stored?” Conceptual models ask “What does it mean?”
Example: Temperature Data
Key Point: Same data, different analytical possibilities depending on measurement scale
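The key point can be checked with arithmetic. Celsius is an interval scale: 0 °C does not mean an absence of temperature. Differences are therefore meaningful, but ratios are not. Kelvin has a true zero, so ratios are meaningful there. A minimal sketch with two illustrative readings:

```python
# Two illustrative readings in degrees Celsius (interval scale)
celsius = [10.0, 20.0]

# Differences are meaningful on an interval scale
diff = celsius[1] - celsius[0]              # 10 degrees warmer

# Ratios are not meaningful in Celsius: 20 °C is not "twice as hot" as 10 °C
naive_ratio = celsius[1] / celsius[0]       # 2.0, but physically misleading

# Kelvin has a true zero (ratio scale), so its ratios are meaningful
kelvin = [c + 273.15 for c in celsius]
true_ratio = kelvin[1] / kelvin[0]          # about 1.035
```

The naive Celsius ratio suggests that 20 °C is twice as hot as 10 °C. The Kelvin ratio shows the physical difference is only about 3.5%.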
One domain question can lead to many data questions:
“Low-Level Components of Analytic Activity in Information Visualization”
Value Tasks
Pattern Tasks
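Amar et al. identify ten low-level tasks:

- retrieve value
- filter
- compute derived value
- find extremum
- sort
- determine range
- characterize distribution
- find anomalies
- cluster
- correlate

As a minimal sketch, six of them map onto short pandas expressions over the stock-quote table used later in this lecture. The `quotes` name and the $100 threshold are hypothetical.

```python
import pandas as pd

quotes = pd.DataFrame({
    "day":   ["10/3", "10/3", "10/4", "10/4"],
    "stock": ["AMZN", "MSFT", "AMZN", "MSFT"],
    "price": [957.10, 74.26, 965.45, 74.69],
})

# Retrieve value: what was AMZN's price on 10/4?
amzn_104 = quotes.loc[(quotes["stock"] == "AMZN") & (quotes["day"] == "10/4"),
                      "price"].item()

# Filter: which quotes are above a (hypothetical) $100 threshold?
above_100 = quotes[quotes["price"] > 100]

# Compute derived value: average price per stock
avg_price = quotes.groupby("stock")["price"].mean()

# Find extremum: which quote has the highest price?
top = quotes.loc[quotes["price"].idxmax()]

# Sort: order quotes from cheapest to most expensive
by_price = quotes.sort_values("price")

# Determine range: smallest and largest observed price
price_range = (quotes["price"].min(), quotes["price"].max())
```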
“I spend more than half of my time integrating, cleansing and transforming data without doing any actual analysis. Most of the time I’m lucky if I get to do any ‘analysis’ at all.”
— Anonymous Data Scientist (2012 interview study)
“The first sign that a visualization is good is that it shows you a problem in your data. Every successful visualization that I’ve been involved with has had this stage where you realize, ‘Oh my God, this data is not what I thought it would be!’ So already, you’ve discovered something.”
— Martin Wattenberg (ACM Queue ’09)
- SELECT: select a set of columns
- WHERE: remove unwanted rows
- ORDER BY: order records
- GROUP BY, SUM, MIN, MAX: partition rows into groups + summarize
- JOIN, UNION: integrate data from multiple tables

Input:
day | stock | price |
---|---|---|
10/3 | AMZN | 957.10 |
10/3 | MSFT | 74.26 |
10/4 | AMZN | 965.45 |
10/4 | MSFT | 74.69 |
→
Output:
day | stock |
---|---|
10/3 | AMZN |
10/3 | MSFT |
10/4 | AMZN |
10/4 | MSFT |
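In pandas, this projection corresponds to indexing with a list of column names, as in `SELECT day, stock FROM quotes`. The minimal sketch below assumes the table is loaded into a hypothetical DataFrame named `quotes`.

```python
import pandas as pd

quotes = pd.DataFrame({
    "day":   ["10/3", "10/3", "10/4", "10/4"],
    "stock": ["AMZN", "MSFT", "AMZN", "MSFT"],
    "price": [957.10, 74.26, 965.45, 74.69],
})

# SELECT day, stock FROM quotes: keep only the listed columns
projected = quotes[["day", "stock"]]
```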
Input:
day | stock | price |
---|---|---|
10/3 | AMZN | 957.10 |
10/3 | MSFT | 74.26 |
10/4 | AMZN | 965.45 |
10/4 | MSFT | 74.69 |
→
Output:
day | stock | price |
---|---|---|
10/3 | AMZN | 957.10 |
10/4 | AMZN | 965.45 |
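This filter keeps only the AMZN rows, as in `SELECT * FROM quotes WHERE stock = 'AMZN'`. Here is a minimal pandas sketch with a boolean mask, again assuming a hypothetical DataFrame named `quotes`:

```python
import pandas as pd

quotes = pd.DataFrame({
    "day":   ["10/3", "10/3", "10/4", "10/4"],
    "stock": ["AMZN", "MSFT", "AMZN", "MSFT"],
    "price": [957.10, 74.26, 965.45, 74.69],
})

# WHERE stock = 'AMZN': the boolean mask is True for rows to keep
amzn = quotes[quotes["stock"] == "AMZN"]
```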
Input:
day | stock | price |
---|---|---|
10/3 | AMZN | 957.10 |
10/3 | MSFT | 74.26 |
10/4 | AMZN | 965.45 |
10/4 | MSFT | 74.69 |
→
Output:
stock | min(price) |
---|---|
AMZN | 957.10 |
MSFT | 74.26 |
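This grouped summary corresponds to `SELECT stock, MIN(price) FROM quotes GROUP BY stock`. In pandas, it can be sketched as `groupby` followed by an aggregation. The column is renamed to `min(price)` only to match the output table, and `quotes` is a hypothetical name.

```python
import pandas as pd

quotes = pd.DataFrame({
    "day":   ["10/3", "10/3", "10/4", "10/4"],
    "stock": ["AMZN", "MSFT", "AMZN", "MSFT"],
    "price": [957.10, 74.26, 965.45, 74.69],
})

# Partition rows by stock, then summarize each group with its minimum price
result = (quotes.groupby("stock", as_index=False)["price"]
                .min()
                .rename(columns={"price": "min(price)"}))
```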
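```python
# Examples of derived attributes
# Assumes `df` is a pandas DataFrame with profit, revenue, date, and age columns
import pandas as pd

df['profit_margin'] = df['profit'] / df['revenue']    # ratio of two quantities
df['year'] = pd.to_datetime(df['date']).dt.year       # component of a date
df['is_profitable'] = df['profit'] > 0                # quantitative -> boolean
df['age_group'] = pd.cut(df['age'],                   # quantitative -> ordinal bins
                         bins=[0, 18, 65, 100],
                         labels=['child', 'adult', 'senior'],
                         include_lowest=True)         # keep age 0 in 'child'
```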
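The age binning has one subtlety. By default, `pd.cut` builds right-closed intervals (`right=True`), so a boundary value falls into the lower bin. A value equal to the first edge gets no label unless `include_lowest=True` is passed. Values above the last edge (here, ages over 100) also become missing. A minimal sketch with illustrative ages:

```python
import pandas as pd

ages = pd.Series([0, 18, 19, 65, 66, 100])
bins = [0, 18, 65, 100]
names = ["child", "adult", "senior"]

# Right-closed bins: (0, 18], (18, 65], (65, 100]; age 0 falls outside
no_lowest = pd.cut(ages, bins=bins, labels=names)

# include_lowest=True widens the first bin so that age 0 is labeled 'child'
groups = pd.cut(ages, bins=bins, labels=names, include_lowest=True)
```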
Goal: Structuring data to make visualization and analysis easier
In tidy data:
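A common tidying step is reshaping a "wide" table into a "long" one. In the wide layout, values such as stock names are spread across column headers. In the long layout, each observation gets its own row. The minimal sketch below uses `DataFrame.melt` and the stock prices from the earlier examples. The `wide` layout is a hypothetical starting point.

```python
import pandas as pd

# Wide layout: stock names appear as column headers rather than as values
wide = pd.DataFrame({
    "day":  ["10/3", "10/4"],
    "AMZN": [957.10, 965.45],
    "MSFT": [74.26, 74.69],
})

# Long layout: one row per (day, stock) observation, one column per variable
tidy = (wide.melt(id_vars="day", var_name="stock", value_name="price")
            .sort_values(["day", "stock"])
            .reset_index(drop=True))
```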
Raw taxi trip data with these fields:
Work in small groups to design a transformation pipeline:
JSON (JavaScript Object Notation):

```json
[
  {"year":1850,"age":0,"marital_status":0,"sex":1,"people":1483789},
  {"year":1850,"age":5,"marital_status":0,"sex":1,"people":1411067}
]
```
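Records like these map directly onto a table. Each JSON object becomes an item (row), and each key becomes an attribute (column). The minimal sketch below uses Python's `json` module and pandas.

Note that `sex` and `marital_status` are integer codes whose meanings are not given here. They should be treated as nominal rather than summed or averaged.

```python
import json

import pandas as pd

raw = """[
  {"year":1850,"age":0,"marital_status":0,"sex":1,"people":1483789},
  {"year":1850,"age":5,"marital_status":0,"sex":1,"people":1411067}
]"""

records = json.loads(raw)          # list of dicts, one per item
census = pd.DataFrame(records)     # keys become columns (attributes)

# Aggregate a quantitative attribute; the coded columns are left untouched
total_by_year = census.groupby("year")["people"].sum()
```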