Fundamental Graphs and Data Transformation

CS-GY 6313 - Information Visualization - Fall 2025

Claudio Silva

NYU Tandon School of Engineering

2025-09-12

Today’s Agenda

Learning Objectives:

  • Understand the two-step visualization process
  • Master the five fundamental graph types
  • Apply expressiveness and effectiveness principles
  • Learn essential data transformation techniques
  • Recognize the impact of scales and axes choices

The Visualization Process

The fundamental question: How do I visualize this data?
  • Step 1: What to visualize? (Data selection & transformation)
  • Step 2: How to visualize? (Visual encoding & design)

Step 1: What to Visualize?

Data Selection & Transformation:

  • Which attributes matter for your question?
  • What level of detail is appropriate?
  • How should data be aggregated or filtered?
  • What derived attributes might be useful?

Example: Domain questions to data transformation

Step 2: How to Visualize?

Visual Encoding & Design:

  • Which visual channels best represent your data?
  • How do you map data attributes to visual properties?
  • What design choices enhance clarity?
  • How do you avoid misleading representations?

Example: Same data, different visual encodings

The Five Fundamental Graphs

  • Bar Chart
  • Line Chart
  • Scatter Plot
  • Matrix (Heatmap)
  • Symbol Map

Bar Chart: Categorical Comparisons

Distribution across categories

Purpose: Compare quantities across categories

Data Types:

  • Categorical/Ordinal + Quantitative
  • Example: Sales by product category

Best For:

  • Rankings and comparisons
  • Part-to-whole relationships

Line Chart: Trends Over Time

Change over continuous dimension

Purpose: Show trends and changes over time

Data Types:

  • Temporal + Quantitative
  • Example: Stock prices over months

Best For:

  • Trends and patterns
  • Multiple series comparison

Scatter Plot: Relationships

Correlation between two quantitative variables

Purpose: Explore relationships between variables

Data Types:

  • Quantitative + Quantitative
  • Example: Height vs. weight

Best For:

  • Correlation analysis
  • Outlier detection
  • Pattern recognition

Matrix (Heatmap): Two-Way Comparisons

Quantity across two categorical dimensions

Purpose: Compare across two categorical dimensions

Data Types:

  • Categorical + Categorical + Quantitative
  • Example: Sales by month and region

Best For:

  • Cross-tabulations
  • Correlation matrices
  • Dense data display

Symbol Map: Spatial Distribution

Geographic distribution of quantities

Purpose: Show spatial distribution of data

Data Types:

  • Spatial coordinates + Quantitative
  • Example: Population by city

Best For:

  • Geographic patterns
  • Location-based analysis
  • Spatial clustering
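The "Data Types" lines for the five graphs can be read as a lookup table from attribute types to a chart. The sketch below is only an illustration of that reading: the function name `suggest_chart`, the lowercase type labels, and the fallback message are assumptions made for this example, not part of the course material.

```python
# Lookup from a sorted tuple of attribute types to one of the five fundamental graphs.
# The pairings follow the "Data Types" lines above; the string labels are assumptions.
CHART_FOR_TYPES = {
    ("categorical", "quantitative"): "bar chart",
    ("ordinal", "quantitative"): "bar chart",
    ("quantitative", "temporal"): "line chart",
    ("quantitative", "quantitative"): "scatter plot",
    ("categorical", "categorical", "quantitative"): "matrix (heatmap)",
    ("quantitative", "spatial"): "symbol map",
}


def suggest_chart(*attribute_types: str) -> str:
    """Suggest a fundamental graph for the given attribute types (order does not matter)."""
    key = tuple(sorted(t.lower() for t in attribute_types))
    return CHART_FOR_TYPES.get(key, "no direct match; consider transforming the data first")


print(suggest_chart("quantitative", "categorical"))   # sales by product category
print(suggest_chart("temporal", "quantitative"))      # stock prices over months
print(suggest_chart("categorical", "categorical", "quantitative"))  # sales by month and region
```

A table like this is a starting point rather than a rule: the same attributes can often be transformed (for example, binned or aggregated) so that a different graph becomes the better match.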

Principles: Expressiveness & Effectiveness

Expressiveness: Show all and only the facts in the data

Effectiveness: Information should be readily perceived

Marks and channels determine effectiveness

Expressiveness Violations

Line chart with categorical data implies false ordering

Problem: Ordered visual channel (line) with unordered data

Bar chart ordering suggests ranking where none exists

Problem: Arbitrary ordering implies non-existent relationship

Effectiveness: A Quick Experiment

Compare these different encodings

Which value is larger?

Results show position is more effective

Which value is larger?

Result: Length comparison is faster and more accurate than color comparison

Channel Effectiveness Rankings

Ranking of visual channels by data type

Applying Channel Rankings

More Effective:

Bar chart using position enables accurate comparison

Position encoding enables accurate comparison

Less Effective:

Pie chart using area/angle makes comparison harder

Area and angle are harder to compare accurately
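The bar-versus-pie comparison can be expressed as a lookup in a channel ranking. The sketch below uses an abridged ordering for quantitative data that follows the ranking in Munzner's textbook; it is an assumption that this matches the ranking figure shown in class, and the names `MAGNITUDE_CHANNEL_RANKING` and `more_effective` are hypothetical.

```python
# Abridged channel ordering for quantitative (magnitude) data, most effective first.
# Assumption: based on the ranking in Munzner's textbook, not copied from the slide figure.
MAGNITUDE_CHANNEL_RANKING = [
    "position",
    "length",
    "angle",
    "area",
    "color luminance",
    "color saturation",
]


def effectiveness_rank(channel: str) -> int:
    """Return the 0-based rank of a channel; lower means more effective."""
    return MAGNITUDE_CHANNEL_RANKING.index(channel)


def more_effective(channel_a: str, channel_b: str) -> str:
    """Return whichever of the two channels ranks higher for quantitative data."""
    return min(channel_a, channel_b, key=effectiveness_rank)


# Bar charts rely on position; pie charts rely on angle and area.
print(more_effective("position", "angle"))
print(more_effective("area", "position"))
```

Both calls return "position", which is the reasoning behind preferring the bar chart when accurate comparison is the goal.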

Data Transformation: The First Step

Data transformation pipeline
  • Raw data rarely answers your questions directly
  • Transformation creates the information you need
  • Different transformations reveal different insights
  • This step is often more important than visual design choices

Aggregation: Summarizing for Insight

Raw Data:

Sales transactions: 50,000 records
Date, Product, Amount, Customer...

Questions:

  • Which products sell best?
  • How do sales vary by month?
  • What’s the geographic distribution?

Aggregated Data:

By Product: Sum(Amount)
By Month: Avg(Amount) 
By Region: Count(Transactions)
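The three aggregations above can be sketched with the Python standard library. The records below are hypothetical placeholders, and the `region` field is an assumption (the slide's column list is abbreviated); a real analysis would more likely use a dataframe library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction records; the "region" field is an assumed column.
transactions = [
    {"date": "2025-01-15", "product": "Laptop", "amount": 1200.0, "region": "East"},
    {"date": "2025-01-20", "product": "Mouse", "amount": 25.0, "region": "West"},
    {"date": "2025-02-03", "product": "Laptop", "amount": 1100.0, "region": "West"},
    {"date": "2025-02-10", "product": "Mouse", "amount": 30.0, "region": "East"},
]


def group_amounts(records, key_fn):
    """Collect the amounts of all records that share the same key."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record["amount"])
    return groups


# By Product: Sum(Amount)
sum_by_product = {k: sum(v) for k, v in group_amounts(transactions, lambda r: r["product"]).items()}

# By Month: Avg(Amount); the month key is the "YYYY-MM" prefix of the ISO date string.
avg_by_month = {k: mean(v) for k, v in group_amounts(transactions, lambda r: r["date"][:7]).items()}

# By Region: Count(Transactions)
count_by_region = {k: len(v) for k, v in group_amounts(transactions, lambda r: r["region"]).items()}

print(sum_by_product)
print(avg_by_month)
print(count_by_region)
```

Each result has one row per category, which is the shape a bar chart (by product or region) or a line chart (by month) expects.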

Example: Flight data transformation

Common Transformation Types

Temporal Aggregation:

  • Year → Month → Week → Day → Hour

Spatial Aggregation:

  • Country → State → City → Neighborhood

Binning:

  • Age: 18-25, 26-35, 36-45, etc.
  • Income: Low, Medium, High

Derived Attributes:

  • Growth rate from sequential values
  • Ratios and percentages

Interactive Exercise: Transformation Choices

Dataset: E-commerce transactions (Product, Date, Amount, Customer)

Questions to explore:

  1. How do you identify seasonal trends?
  2. Which customer segments are most valuable?
  3. How has product popularity changed over time?

Your turn: What transformations would you apply for each question?

  • Temporal aggregation?
  • Customer segmentation?
  • Product ranking over time?

Scales and Axes: The Foundation

Scale: A function mapping data domain to visual range

Data Domain → Scale Function → Visual Range

  • Linear scales: Equal data differences = equal visual differences
  • Logarithmic scales: Equal ratios = equal visual differences
  • Ordinal scales: Preserve order, not magnitude
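A scale can be written directly as a function from the data domain to the visual range. The sketch below shows one way to do that for the three scale types; the function names and the pixel ranges are assumptions for illustration and do not correspond to any particular library's API.

```python
import math


def linear_scale(domain, visual_range):
    """Map [d0, d1] linearly onto [r0, r1]."""
    (d0, d1), (r0, r1) = domain, visual_range
    return lambda x: r0 + (x - d0) / (d1 - d0) * (r1 - r0)


def log_scale(domain, visual_range):
    """Map [d0, d1] onto [r0, r1] so that equal ratios give equal distances."""
    (d0, d1), (r0, r1) = domain, visual_range
    if d0 <= 0 or d1 <= 0:
        raise ValueError("a log scale needs a strictly positive domain")
    span = math.log10(d1) - math.log10(d0)
    return lambda x: r0 + (math.log10(x) - math.log10(d0)) / span * (r1 - r0)


def ordinal_scale(ordered_categories, visual_range):
    """Place ordered categories at evenly spaced positions: order is kept, magnitude is not."""
    r0, r1 = visual_range
    step = (r1 - r0) / max(len(ordered_categories) - 1, 1)
    positions = {c: r0 + i * step for i, c in enumerate(ordered_categories)}
    return lambda category: positions[category]


lin = linear_scale((0, 100), (0, 500))
log = log_scale((1, 1000), (0, 300))
ordinal = ordinal_scale(["Low", "Medium", "High"], (0, 200))

# Equal data differences (10 units) give equal pixel differences on the linear scale.
print(lin(20) - lin(10), lin(90) - lin(80))
# Equal ratios (10x) give equal pixel differences on the log scale.
print(log(10) - log(1), log(1000) - log(100))
print(ordinal("Medium"))
```

The `ValueError` in `log_scale` reflects a practical limit: zero and negative values have no logarithm, so they cannot be placed on a log axis.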

Linear vs. Logarithmic Scales

Linear Scale Example

Linear Scale:

  • Absolute differences
  • Additive changes
  • Most common choice

Log Scale Example

Logarithmic Scale:

  • Relative differences
  • Multiplicative changes
  • Wide data ranges

The Zero Baseline Rule

Truncated Axis Example

Truncated Axis: Exaggerates small differences

Zero Baseline Example

Zero Baseline: Accurate magnitude representation

When to Break the Rules

Log scales instead of zero baseline:

  • Data spans multiple orders of magnitude
  • Ratios matter more than absolute values

Truncated axes for line charts:

  • Small changes in large values
  • When trend matters more than magnitude

Appropriate Truncation Example

Context determines when breaking rules is acceptable

Design Exercise: Scale Choices

Scenario: Visualizing country populations

Data range: roughly 1,000 (Vatican City) to roughly 1.4 billion (China)

Questions:

  1. What scale would you choose?
  2. How would you handle the extreme range?
  3. What alternatives might you consider?

Consider: Log scale vs. filtering vs. grouping
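One way to get a feel for this range is to compute where the two extremes would land on an axis. The sketch below uses the two endpoint values from the scenario; the 1000-pixel axis and the log domain of 10^2 to 10^10 are arbitrary assumptions chosen for illustration, and this is only one of several reasonable answers to the exercise.

```python
import math

smallest = 1_000            # smallest population in the scenario
largest = 1_400_000_000     # largest population in the scenario
axis_px = 1000              # hypothetical axis length in pixels

# Linear axis starting at zero: the smallest value is far below one pixel.
linear_px_smallest = smallest / largest * axis_px

# The data spans a bit more than six orders of magnitude.
orders_of_magnitude = math.log10(largest / smallest)

# Log axis over the assumed domain [10^2, 10^10].
log_lo, log_hi = 2, 10


def log_px(value: float) -> float:
    """Pixel position of a value on the assumed log axis."""
    return (math.log10(value) - log_lo) / (log_hi - log_lo) * axis_px


print(f"linear position of smallest: {linear_px_smallest:.5f} px")
print(f"orders of magnitude spanned: {orders_of_magnitude:.2f}")
print(f"log positions: {log_px(smallest):.1f} px and {log_px(largest):.1f} px")
```

On the linear axis the smallest country is invisible, while on the log axis both extremes are readable; the cost is that differences in length no longer correspond to absolute differences in population.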

Putting It All Together

The Visualization Design Process:

  1. Understand your question
  2. Transform data appropriately
  3. Choose effective visual encodings
  4. Select appropriate scales
  5. Test and iterate

Workflow: Question → Transform → Encode → Scale → Iterate

Design Process Workflow

Common Pitfalls to Avoid

  • Skipping data exploration: Visualizing without first understanding the data
  • Chart junk: Adding visual elements that don’t encode information
  • Color overuse: Using color when position would be more effective
  • Ignoring scale effects: Not considering how scale choices affect perception
  • 3D when 2D suffices: Adding dimensions that don’t encode information

Best Practices Summary

Expressiveness: Match visual properties to data properties

Effectiveness: Use the most effective encoding for your most important data

Transformation: Prepare data to answer your specific questions

Scales: Choose scales that honestly represent relationships

Iteration: Test your designs with real users when possible

Interactive Quiz

Question 1: For comparing sales across product categories, which encoding is most effective?

  1. Color saturation
  2. Bar length
  3. Symbol size
  4. Line style

Answer: 2. Bar length (position along a common scale)

Why? Position is the most effective visual channel for quantitative comparison.

Interactive Quiz

Question 2: You have website traffic data spanning 5 years. For showing long-term growth trends, you should:

  1. Use a zero baseline always
  2. Use a log scale if growth is exponential
  3. Show only the most recent year
  4. Use a pie chart for each year

Answer: 2. Use a log scale if growth is exponential

Why? Log scales reveal multiplicative relationships and growth rates.

Next Steps

For next class:

  • Read Munzner Chapter 7 (Arrange Tables)
  • Practice with the fundamental charts using your own data
  • Complete Lab Exercise: Build all five chart types with sample dataset

Looking ahead:

  • Interactive visualization techniques
  • Multi-dimensional data representation
  • Advanced transformation methods

Key Takeaways

  • Two-step process: What to show, then how to show it
  • Five fundamental charts solve most visualization problems
  • Expressiveness and effectiveness guide design decisions
  • Data transformation is often more important than visual design
  • Scale choices dramatically affect perception and interpretation

Questions & Discussion

Think about:

  • What visualization challenges do you face in your work/research?
  • How might these principles apply to your domain?
  • What questions do you have about applying these techniques?

Next class: Interactive visualization techniques and advanced encodings

Thank you