Urban Visualization I

Flows, Time & Interactivity (2D + Time) - CS-GY 6313 - Fall 2025

Claudio Silva

2025-10-31

From Static Maps to Dynamic Urban Systems

The Yellow Blob Problem

What happens when we apply static techniques to massive urban datasets?

Here are 140 million NYC taxi trips rendered as a single static heat map.

What can you learn from this?

Nothing. It’s a “yellow blob.”

Static heat map of 140 million NYC taxi trips

Today’s Goal

Learn to visualize urban dynamics—movement and time—using interactive, linked views

  • Challenge: Urban data is massive, dense, and dynamic
  • Solution: Turn visualization into an exploration tool
  • Case Study: TaxiVis—a system for exploring NYC taxi flows

We’ll learn how to move from this “yellow blob” to meaningful insights about urban mobility patterns.

Part 1: The Urban Context

Why This Matters

The Urban Challenge:

  • Over 50% of the world’s population lives in cities
  • 2.5 billion more urban residents expected by 2050
  • Cities = centers of innovation
  • But also: sprawl, pollution, inequality

How do we make better decisions about urban development?

The Data Opportunity:

  • We can now collect, store, and open massive urban datasets
  • Benefits:
    • Government: Operations & planning
    • Science: Discovery
    • Residents: Participation
    • Industry: Innovation

What Makes Urban Data Unique?

Three Interacting Components:

  1. Residents - People and their behavior
  2. Infrastructure - Physical systems and policies
  3. Environment - Natural and built surroundings

To understand a city, we must explore how these interact over space and time.

Four Key Characteristics:

  1. Scale
    • Human-level to metro-level
    • Millions of individual events
  2. Density
    • Massive overplotting
    • Traditional techniques fail
  3. Complexity
    • Interconnected systems
    • Transit, traffic, social networks
  4. Dynamism
    • Cities defined by time & movement
    • Static snapshots miss the essence

Hägerstrand’s Space-Time Cube (1970)

The Classic Framework for Movement Analysis

  • X, Y dimensions: Geographic space
  • Z dimension: Time
  • Each person’s life: a line through the cube

The Urban Challenge:

Understanding millions of these paths, all interacting simultaneously

Hägerstrand’s Space-Time Cube showing individual trajectories
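To make the "line through the cube" idea concrete, here is a minimal sketch that stores one person's path as time-ordered (t, x, y) samples and interpolates their position at any moment. The sample layout, the `position_at` helper, and the commute data are illustrative assumptions, not part of Hägerstrand's formulation.

```python
from bisect import bisect_right
from typing import List, Tuple

# One trajectory = a time-ordered list of (t, x, y) samples (assumed layout).
Sample = Tuple[float, float, float]

def position_at(trajectory: List[Sample], t: float) -> Tuple[float, float]:
    """Linearly interpolate the (x, y) position at time t.

    Assumes timestamps are strictly increasing.
    """
    times = [s[0] for s in trajectory]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the trajectory's time span")
    i = bisect_right(times, t)
    if i == len(times):                  # t equals the final timestamp
        return trajectory[-1][1], trajectory[-1][2]
    t0, x0, y0 = trajectory[i - 1]
    t1, x1, y1 = trajectory[i]
    w = (t - t0) / (t1 - t0)             # fraction of the way along this segment
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)

# Hypothetical commute: home (0, 0) at t=0, office (4, 2) at t=10, stays until t=20.
commute = [(0.0, 0.0, 0.0), (10.0, 4.0, 2.0), (20.0, 4.0, 2.0)]
print(position_at(commute, 5.0))
```

A flat segment in (x, y), such as t=10 to t=20 here, is a vertical line in the cube: the person stays in place while time advances.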

Part 2: Why Traditional Analysis Fails

The Old Workflow (Confirmatory Data Analysis)

Traditional analysis pipeline

The Process:

  1. Domain experts formulate hypotheses
  2. Data scientists select data
  3. Run analyses (SQL, R, Python)
  4. Domain experts inspect results
  5. Repeat…

The Problems with Traditional Workflows

Cognitive overload - Generates overwhelming numbers of plots/tables

Batch-oriented - No exploration, just predetermined queries

Distances experts from data - Requires intermediary (data scientist)

Cannot scale - Modern data volumes overwhelm traditional tools

Result: The “yellow blob” problem

Even simple questions require generating dozens of individual plots, each manually programmed

The Problem with Current Tools

Current practice for domain experts:

  • Load small slices into R, MATLAB, Stata, ArcGIS, Excel
  • Write SQL queries or code to analyze subsets
  • Generate individual plots manually
  • Repeat for each new question

This is tedious, slow, and limiting

Why this fails:

❌ Tools can’t handle 170M trips

❌ Requires programming/database expertise

❌ Time-consuming and frustrating

❌ Distances experts from the data

❌ Hard to explore, compare, or follow up on patterns

Part 3: The Need for Interactivity

The Paradigm Shift

Static Visualization

Role: Presentation

  • Answers questions the designer anticipated
  • Fixed perspective
  • One-way communication

Limitation: Cannot handle 140 million data points

Interactive Visualization

Role: Exploration

  • Enables users to ask their own questions
  • Dynamic perspectives
  • “Dialogue with data”

Power: Query and filter to reveal patterns

The Visual Information Seeking Mantra

“Overview first, Zoom and Filter, then Details-on-Demand”

— Ben Shneiderman (1996)

1. Overview

Overview of all data

Start with the big picture (even if it’s a “yellow blob”)

2. Zoom & Filter

Filtered subset of data

Focus on items of interest

3. Details-on-Demand

Detailed information display

Get specifics when needed

Why Interactivity Matters

  1. Performance is critical
    • Even 500ms latency significantly reduces exploration, observations, and hypothesis generation
    • Sub-second response enables iterative analysis
  2. Perception over cognition
    • Well-designed systems let you see patterns rather than calculate them
    • Visual queries are faster to formulate and refine than SQL
  3. Empowers domain experts
    • No programming required
    • Direct manipulation of visualizations
    • Experts can explore data themselves

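In a controlled study of exploratory visual analysis (Liu & Heer, 2014), adding a 500 ms delay to each interaction measurably reduced how much of the data users explored and how many observations and hypotheses they produced.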

Part 4: Enter TaxiVis

The NYC Taxi Dataset

The Data:

  • 13,000 taxis
  • 500,000 trips/day
  • 170 million trips/year

Each trip includes:

  • Pickup/dropoff locations & times
  • Distance traveled
  • Fare amount
  • Tip amount

What can taxi data tell us?

  • Economic activity and human behavior
  • Mobility patterns across the city
  • Response to major events (hurricanes, holidays)
  • Social inequalities in service

Challenge: How do we make sense of this data?

Taxi Patterns and Anomalies

Taxi activity patterns showing regularity and anomalies

  • Regular patterns: Thanksgiving, Christmas drops in activity
  • Anomalies: Hurricane Irene, Hurricane Sandy disruptions
  • Events: Five Boro Bike Tour (taxis disappeared along 6th Avenue)

Part 5: Design Requirements

What We Need (From Domain Experts)

Query Needs:

  1. Understand dynamics
    • “How do patterns vary over space and time?”
  2. Explore events
    • “What happened during Hurricane Sandy?”
  3. Compare regions
    • “Midtown vs. Harlem taxi frequency?”
  4. Study movement
    • “Where do people go from JFK?”

System Requirements:

  • Interactive - Sub-second response times
  • Expressive - Support complex spatio-temporal queries
  • Usable - No SQL or programming required
  • Scalable - Handle all 170M trips, not samples
  • Comparative - Easy to compare regions and time periods

Part 6: The Visual Query Model

Core Idea: Direct Manipulation

Let users query data through direct manipulation of visualizations

Spatial

Where?

  • Pickup regions
  • Dropoff regions
  • Draw polygons on map

Temporal

When?

  • Time ranges
  • Recurrence patterns
  • Day of week

Attribute

What?

  • Fare amount
  • Distance
  • Trip duration

Instead of writing SELECT * FROM trips WHERE..., you draw on a map
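One way to picture what happens behind the drawing: each widget contributes one constraint, and the query is their conjunction. The sketch below is an assumption about how such a model could be expressed, not TaxiVis code. The `Trip` layout and the `where`, `when`, `what`, and `run_query` helpers are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]

@dataclass(frozen=True)
class Trip:                      # hypothetical, simplified record
    pickup: Point                # (x, y) pickup location
    pickup_time: datetime
    fare: float

def in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting point-in-polygon test (even-odd rule)."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                     # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

Predicate = Callable[[Trip], bool]

def where(polygon: Sequence[Point]) -> Predicate:    # spatial constraint
    return lambda t: in_polygon(t.pickup, polygon)

def when(start: datetime, end: datetime) -> Predicate:   # temporal constraint
    return lambda t: start <= t.pickup_time < end

def what(min_fare: float, max_fare: float) -> Predicate:  # attribute constraint
    return lambda t: min_fare <= t.fare <= max_fare

def run_query(trips: List[Trip], *constraints: Predicate) -> List[Trip]:
    """Keep the trips that satisfy every constraint."""
    return [t for t in trips if all(c(t) for c in constraints)]

# Toy data: a square "drawn" region and four made-up trips.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
trips = [
    Trip((1, 1), datetime(2011, 5, 2, 8, 30), 12.0),
    Trip((3, 1), datetime(2011, 5, 2, 8, 30), 12.0),   # outside the region
    Trip((1, 1), datetime(2011, 5, 2, 12, 0), 12.0),   # outside the time range
    Trip((1, 1), datetime(2011, 5, 2, 9, 0), 50.0),    # outside the fare range
]
result = run_query(
    trips,
    where(square),
    when(datetime(2011, 5, 2, 8), datetime(2011, 5, 2, 10)),
    what(5.0, 20.0),
)
print(len(result))
```

Because each constraint is an independent predicate, adding or removing a widget in the interface corresponds to adding or removing one argument here.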

Visual Representation of Queries

TaxiVis interface showing visual query components

  • Blue polygons on map = pickup regions
  • Orange polygons = dropoff regions
  • Arrows = origin-destination queries
  • Time widgets = temporal constraints
  • Histograms = attribute constraints

Part 7: TaxiVis Interface

The Complete System

Full TaxiVis interface showing linked views

Interface Components

TaxiVis interface with labeled components

1. Map View (Left)

  • Geographic visualization
  • Interactive region selection
  • Origin-destination flows

2. Control Panel (Top)

  • Time range selection
  • Query type controls
  • Data aggregation settings

3. Temporal Views (Right)

  • Time series plots
  • Histograms
  • Daily/weekly patterns

Part 8: Visual Queries in Action

Example: Airport Comparison

The Question:

“How do trips to JFK vs. LGA differ on Sundays vs. Mondays?”

The Visual Query:

  1. Draw region around Lower Manhattan (pickup)
  2. Draw regions around JFK and LGA (dropoffs)
  3. Connect with arrows (directional constraints)
  4. Select Sunday vs. Monday (temporal constraints)

Airport comparison query results

The Results & Discovery

Side-by-side comparison of Sunday vs Monday airport trips

  • Side-by-side map comparison
  • Scatter plots: hour of day vs. trip duration
  • Discovery: Monday trips 3-5PM take much longer (rush hour!)
  • Implication: Creates economic disincentive for drivers to accept airport trips

Part 9: Query Expressiveness

Peuquet’s Triad Framework

All three fundamental spatio-temporal query types:

when + where → what

“What taxis were in Midtown at rush hour?”

when + what → where

“Where were high-fare trips on New Year’s Eve?”

where + what → when

“When do trips to airports peak?”

Plus: Query Composition

  • Queries can be refined, combined, compared
  • Results can be visualized multiple ways
  • Supports both atomic queries and complex queries (unions)

Part 10: Making It Interactive

The Performance Challenge

The Problem:

  • 170M trips = traditional databases too slow
  • PostgreSQL: 24 seconds for 100k-trip query
  • SQLite: 85 seconds for same query
  • Goal: Sub-second response

The TaxiVis Solution:

  • Custom k-d tree index
    • 30GB vs. 200GB for PostgreSQL
    • Build time: 28 min vs. 13 hours
  • Query time:
    • 2 seconds for 100k trips
    • 0.2s for 1k trips
  • Adaptive level-of-detail rendering
  • Smart heat maps and aggregation
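For readers who have not met k-d trees, the following is a minimal textbook sketch of a k-d tree with an axis-aligned range query. It is not the TaxiVis index, whose internals are not described here. The (x, y, hour) point layout is an assumption chosen to show that space and time can share one index.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]

class Node:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]"):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Split on one coordinate per level, cycling through the dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def range_query(node: Optional[Node], lo: Point, hi: Point,
                out: List[Point]) -> List[Point]:
    """Collect points p with lo[i] <= p[i] <= hi[i] in every dimension."""
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        out.append(p)
    if lo[a] <= p[a]:            # the box may extend into the left subtree
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:            # the box may extend into the right subtree
        range_query(node.right, lo, hi, out)
    return out

# Made-up points: (x, y, pickup hour).
points = [(2, 3, 8), (5, 4, 9), (9, 6, 17), (4, 7, 8), (8, 1, 23), (7, 2, 9)]
tree = build(points)
hits = range_query(tree, lo=(0, 0, 8), hi=(6, 8, 9), out=[])
print(sorted(hits))
```

The query skips whole subtrees that cannot overlap the box, which is why this kind of index can answer small spatio-temporal selections without scanning every trip.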

Part 11: Visualization Techniques

The “Yellow Blob” Rendering Problem

The Challenge:

  • 500,000 trips/day as point cloud = complete clutter
  • Can’t see patterns, just noise
  • Traditional scatter plots fail at this scale

Completely cluttered map with all points

We need multiple visualization strategies

Solution 1: Adaptive Level of Detail (LOD)

Strategy: Render only what you can see

How it works:

  • Z-order curve hierarchical sampling
  • Sort points spatially, build binary tree
  • First n elements = hierarchical subsample of size n
  • n scales with zoom level

Result: Clear visualization at every zoom level

Level of detail rendering in action

As you zoom in, you see more detail. As you zoom out, you see a representative sample.
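The bullet points above can be sketched as follows, under stated assumptions: points have non-negative integer grid coordinates, the "binary tree" is the implicit balanced tree over the Z-order-sorted array, and the per-zoom budget `base * 4 ** zoom` is a made-up rule. The function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

GridPoint = Tuple[int, int]

def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code: interleave the bits of x (even) and y (odd)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def lod_order(points: List[GridPoint]) -> List[GridPoint]:
    """Reorder points so that every prefix is a spatially spread subsample."""
    by_z = sorted(points, key=lambda p: interleave_bits(*p))
    ordered: List[GridPoint] = []
    queue = deque([(0, len(by_z))])      # half-open index ranges, breadth-first
    while queue:
        lo, hi = queue.popleft()
        if lo >= hi:
            continue
        mid = (lo + hi) // 2
        ordered.append(by_z[mid])        # the median acts as this subtree's root
        queue.append((lo, mid))
        queue.append((mid + 1, hi))
    return ordered

def sample_for_zoom(ordered: List[GridPoint], zoom: int, base: int = 4) -> List[GridPoint]:
    """Take the first n points, where n grows with zoom (assumed budget rule)."""
    n = min(len(ordered), base * 4 ** zoom)
    return ordered[:n]

points = [(x, y) for x in range(4) for y in range(4)]
ordered = lod_order(points)
print(sample_for_zoom(ordered, zoom=0))
```

Because the ordering is computed once, changing the zoom level only changes how long a prefix is drawn; no re-sampling is needed at interaction time.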

Solution 2: Heat Maps

Continuous Heat Maps

Pixel-based density heat map

  • Pixel-based density
  • Darker = more activity
  • Shows overall distribution patterns

Grid Maps

Grid-based aggregation map

  • Aggregate by meaningful regions
  • Neighborhoods, zip codes, boroughs
  • Hover for exact counts

When to use: Heat maps for overview and patterns, LOD for specific trip details, Grid maps for comparing defined regions
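Both heat maps and grid maps rest on the same step: replace individual points with counts per cell. A minimal sketch, assuming square cells of a chosen size and using hypothetical helper names:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]

def grid_counts(points: Iterable[Tuple[float, float]], cell_size: float) -> Counter:
    """Count how many points fall into each square cell."""
    counts: Counter = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

def intensities(counts: Counter) -> Dict[Cell, float]:
    """Scale counts to 0..1 so the busiest cell maps to the darkest color."""
    peak = max(counts.values(), default=0)
    return {cell: c / peak for cell, c in counts.items()} if peak else {}

# Made-up pickup coordinates.
pickups = [(0.1, 0.2), (0.9, 0.4), (1.5, 0.5), (2.2, 2.9)]
counts = grid_counts(pickups, cell_size=1.0)
print(counts)
print(intensities(counts))
```

A pixel-based heat map is the limit of this idea with one cell per screen pixel; a grid map swaps the square cells for named regions such as zip codes.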

Solution 3: Multiple Coordinated Views

The Comparison Problem:

  • “How do Sundays differ from Mondays?”
  • “JFK vs. LGA patterns?”
  • “This year vs. last year?”

Solution:

  • Side-by-side views
  • Each view = one query (color-coded)
  • Synchronized spatial extent
  • Linked plots and summaries
  • Interactive refinement

Sunday vs Monday airport comparison

Part 12: Linked Views & Brushing

Core Concept: Brushing and Linking

Actions in one view are reflected in all other views

This creates a “dialogue” where you can ask questions by interacting with any visualization component, and all views update to answer your question.

Example: Spatial Selection → Temporal Pattern

“What is the temporal pattern for trips from JFK Airport?”

Linked Views in Action: Step 1

Default View: All Data

TaxiVis showing all taxi trips

The map shows the “yellow blob”—all trips. The time series shows aggregate patterns for the entire city.

Linked Views in Action: Step 2

User Brushes a Region (JFK Airport)

User selecting JFK region on the map

By clicking and dragging, the user selects a geographic region. In this case, the area around JFK Airport.

Linked Views in Action: Step 3

All Views Update Automatically

Time series and charts update to show only JFK data

The time series and histograms now show the temporal pattern for only trips from the JFK area.

The Power of Bidirectional Linking

The linking works in both directions

  • Spatial → Temporal: Select a region → see temporal patterns
  • Temporal → Spatial: Select a time range → see spatial patterns

This bidirectional dialogue enables exploratory analysis that would be impossible with static visualizations.
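A minimal sketch of the mechanism, assuming a shared selection object that every view subscribes to. The class names and the toy trips are hypothetical; a real system would redraw charts instead of storing counters.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass(frozen=True)
class Trip:                      # hypothetical, simplified record
    zone: str                    # pickup zone label
    hour: int                    # pickup hour, 0-23

class CountView:
    """A view that shows trip counts grouped by one field."""
    def __init__(self, field: str):
        self.field = field
        self.counts: Counter = Counter()

    def update(self, trips: List[Trip]) -> None:
        self.counts = Counter(getattr(t, self.field) for t in trips)

class SharedSelection:
    """Holds the data and pushes the currently brushed subset to every view."""
    def __init__(self, trips: List[Trip]):
        self.trips = trips
        self.views: List[CountView] = []

    def link(self, view: CountView) -> None:
        self.views.append(view)
        view.update(self.trips)

    def brush(self, zones: Optional[Set[str]] = None,
              hours: Optional[Set[int]] = None) -> None:
        selected = [t for t in self.trips
                    if (zones is None or t.zone in zones)
                    and (hours is None or t.hour in hours)]
        for view in self.views:
            view.update(selected)

trips = [Trip("JFK", 8), Trip("JFK", 17), Trip("Midtown", 8), Trip("Midtown", 9)]
selection = SharedSelection(trips)
map_view, time_view = CountView("zone"), CountView("hour")
selection.link(map_view)
selection.link(time_view)

selection.brush(zones={"JFK"})   # spatial brush: the time view now covers JFK only
print(dict(time_view.counts))
selection.brush(hours={8})       # temporal brush: the map view now covers 8am only
print(dict(map_view.counts))
```

Neither view knows about the other; both only listen to the shared selection, which is what lets the link run in either direction.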

Part 13: Temporal Queries

Temporal Slicing: Time → Space

The Question:

“Where do trips go during morning rush hour?”

The Interaction:

  1. Select time range on the time series (8am-10am)
  2. Map updates to show only trips from that time period

The Insight:

Reveals spatial patterns specific to that time slice

Time series with highlighted time range

Map showing spatial pattern for selected time

Advanced Temporal Queries: Recurrent Selection

The Challenge:

What if I want to see a pattern, not just a single time slice?

The Solution: Recurrent Selection

Select recurring time periods:

  • All Mondays, 8am-10am
  • Every Saturday night
  • Weekday rush hours only

This reveals periodic behavior—the heartbeat of the city
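In code, a recurrent selection is a filter on weekday and hour of day rather than on one absolute interval. A minimal sketch with a hypothetical `recurrent_select` helper; note that this version does not handle ranges that cross midnight, such as 10pm-2am.

```python
from datetime import datetime
from typing import Iterable, List, Set

def recurrent_select(timestamps: Iterable[datetime], weekdays: Set[int],
                     start_hour: int, end_hour: int) -> List[datetime]:
    """Keep timestamps on the given weekdays (Mon=0 .. Sun=6)
    whose hour satisfies start_hour <= hour < end_hour."""
    return [ts for ts in timestamps
            if ts.weekday() in weekdays and start_hour <= ts.hour < end_hour]

# Made-up pickup times in May 2011.
pickups = [
    datetime(2011, 5, 2, 8, 30),   # Monday, inside the window
    datetime(2011, 5, 3, 8, 30),   # Tuesday
    datetime(2011, 5, 9, 9, 59),   # Monday, inside the window
    datetime(2011, 5, 9, 10, 0),   # Monday, just after the window closes
]
monday_mornings = recurrent_select(pickups, weekdays={0}, start_hour=8, end_hour=10)
print(monday_mornings)
```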

Recurrent Selection interface with day-of-week checkboxes

Recurrent Selection Example

Question: “How do weekend nights differ from weekday mornings?”

Weekday Mornings (Mon-Fri, 7-9am)

Map showing weekday morning trip patterns

Inbound commuter patterns

Weekend Nights (Sat-Sun, 10pm-2am)

Map showing weekend night trip patterns

Entertainment district activity

Recurrent selection reveals systematic differences in urban activity patterns.

Part 14: Spatial Queries & Grouping

Solving MAUP: Interactive Region Merging

Remember the Modifiable Areal Unit Problem from Week 8?

The default boundaries might be wrong for your analysis.

TaxiVis Solution:

Merging Regions - Users can interactively select and merge multiple regions into a custom area

Example: Create your own “Midtown” by merging adjacent census tracts

Default census tract boundaries

Custom merged region for “Midtown”

Merging Regions: Step-by-Step

Step 1: Select Multiple Regions

Double-click to select multiple adjacent regions

Double-click or Ctrl+click to select multiple adjacent regions on the map

Merging Regions: Step 2 & 3

Step 2: Press Merge

Merge button in control panel

Click the “Merge” button in the control panel

Step 3: All Views Update

Map and charts showing merged region as single entity

The merged region is now treated as a single entity in all visualizations

The tool adapts to your analysis. You define the boundaries that make sense for your question.
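At the data level, merging can be as simple as summing the per-region aggregates of the selected units into one new entry. The sketch below assumes per-tract pickup counts are already available; the tract ids, the counts, and the `merge_regions` helper are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

def merge_regions(counts: Mapping[str, int], members: Iterable[str],
                  merged_name: str) -> Counter:
    """Return new counts where the member regions are replaced by one merged region."""
    members = set(members)
    merged = Counter({r: c for r, c in counts.items() if r not in members})
    merged[merged_name] = sum(counts.get(r, 0) for r in members)
    return merged

# Made-up pickup counts per census tract.
pickups_by_tract = {"tract_A": 120, "tract_B": 95, "tract_C": 40, "tract_D": 7}
custom = merge_regions(pickups_by_tract, ["tract_A", "tract_B"], "My Midtown")
print(custom)
```

The original counts are left untouched, so different groupings of the same tracts can be compared side by side.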

Part 15: Origin-Destination Queries

The Most Powerful Query: Flows

Asking about movement between specific locations

Traditional Approach:

SELECT * FROM trips
WHERE origin = 'JFK'
AND destination = 'LGA'
AND time BETWEEN '8:00' AND '10:00'

Complex, requires knowing SQL and field names

TaxiVis Approach:

Draw an arrow

Simple, visual, intuitive

This is what we mean by “visual query”—you draw your question, the system answers.

The Arrow Tool: Step 1

Select the Arrow Tool

Arrow tool selected in toolbar

The arrow tool lets you create origin-destination (OD) queries by drawing directly on the map.

The Arrow Tool: Step 2

Draw Arrow from Origin to Destination

User drawing arrow from JFK to LGA

Example: Draw an arrow from JFK Airport to LaGuardia Airport to ask:

“Show me all trips that went from JFK to LGA”

The Arrow Tool: Step 3

All Views Update to Show Only That Flow

Dashboard showing only JFK→LGA trips

  • Map highlights the origin-destination pair
  • Time series shows when these trips occur
  • Histograms reveal patterns in this specific flow

Visual Queries: Why They Matter

  1. Lower cognitive load
    • No need to remember field names or syntax
    • Direct manipulation of the data representation
  2. Immediate feedback
    • See results instantly as you interact
    • Iterate quickly through hypotheses
  3. Support exploration
    • Encourages “what if” questions
    • Makes serendipitous discovery possible
  4. Democratize analysis
    • Analysts without SQL/programming skills can explore
    • Domain experts can directly investigate questions

The visualization is the interface.

Part 16: Case Study 1 - Social Inequality

Question: “Are some neighborhoods underserved by taxis?”

The Analysis:

  • Compare taxi activity across neighborhoods
  • Midtown, Upper East Side, Greenwich Village, Harlem
  • Look at pickups and dropoffs over one week

Taxi activity comparison across neighborhoods

The Discovery: Over 10x Difference

Harlem vs other neighborhoods taxi activity

  • Harlem has very few pickups despite many dropoffs
  • People can take taxis TO Harlem but can’t get one FROM there
  • Over one order of magnitude difference from Midtown

Follow-Up Investigation

The exploration followed a natural path:

  1. Initial pattern: Harlem has fewer pickups
  2. Hypothesis: Is this an economic issue?
  3. Investigation 1: “Are tips different in Harlem?”
    • Discovery: Yes! Higher tips
  4. Investigation 2: “Is fare/mile different?”
    • Discovery: Yes! Lower fare/mile
  5. Insight: Less economic incentive for drivers to go to Harlem, despite higher tips

Tips and fare analysis for Harlem

Part 17: Case Study 2 - Transportation Hubs

Question: “How do people move through NYC’s transportation infrastructure?”

The Setup:

  • Compare JFK, LGA, Penn Station, Grand Central
  • Use grouping to combine regions
  • Examine pickup patterns over one week

Transportation hubs comparison

Key Findings

  1. More pickups at LGA than JFK (most days)

  2. Train stations >> airports for pickups

  3. Weekday pattern: Train station pickups constant Mon-Thu, drop Fri-Sat

    • Reflects commuter behavior
  4. Rush hour problem: Airport trips take much longer 3-5PM

    • Creates economic disincentive for drivers
    • May explain why some drivers illegally refuse airport trips

Part 18: Case Study 3 - Temporal Exploration

Time-Space Exploration

Feature:

  • Select multiple time slices automatically
  • Compare same time across different days/weeks/months
  • Each slice gets its own map and plot line (color-coded)

Example: Memorial Day Analysis

  • All Mondays in May 2011 and May 2012
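Generating those slices is simple date arithmetic. A minimal sketch using a hypothetical `weekdays_in_month` helper; Memorial Day is the last Monday in May, so it is the final entry of each list.

```python
from datetime import date, timedelta
from typing import List

def weekdays_in_month(year: int, month: int, weekday: int) -> List[date]:
    """All dates in the given month that fall on `weekday` (Mon=0 .. Sun=6)."""
    d = date(year, month, 1)
    d += timedelta(days=(weekday - d.weekday()) % 7)   # first matching day
    days = []
    while d.month == month:
        days.append(d)
        d += timedelta(days=7)
    return days

mondays_2011 = weekdays_in_month(2011, 5, 0)
mondays_2012 = weekdays_in_month(2012, 5, 0)
print([d.day for d in mondays_2011])   # [2, 9, 16, 23, 30]
print([d.day for d in mondays_2012])   # [7, 14, 21, 28]
```

Each date then becomes one time slice with its own map and plot line, so the holiday can be compared directly against the ordinary Mondays around it.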

Grid of maps showing each Monday in May

Discovery: Memorial Day Pattern

Memorial Day vs regular Mondays

  • Discovery: Memorial Day has significantly fewer trips than regular Mondays
  • Implication: Could reduce fleet size on holidays to save costs

Part 19: Case Study 4 - Hurricane Sandy

Question: “How did Hurricane Sandy affect NYC?”

The Analysis:

  • One week of taxi activity
  • Sunday before through Saturday after
  • Heat maps for each day
  • Compare spatial patterns

The Timeline:

  • Sunday (before): Normal activity
  • Monday (hurricane hits): Virtually no taxis citywide
  • Tuesday-Friday: Activity returns everywhere EXCEPT Lower Manhattan
  • Saturday: Finally returns to normal

The Story the Data Tells

Daily heat maps showing Hurricane Sandy impact

Why? Lower Manhattan had a 5-day power outage

You can literally see the power outage as a dark region on the map.

Comparison to Hurricane Irene

Hurricane Irene impact on taxi trips

  • Shorter disruption but more complete
  • Only 1,076 trips on hurricane day (vs. average 500,000)
  • Faster recovery

Part 20: What We Learned

Design Insights from Building TaxiVis

  1. Visual queries work
    • Domain experts could use it without training
    • No SQL, no programming required
    • Direct manipulation is intuitive
  2. Performance is non-negotiable
    • Sub-second response enables exploration
    • Custom indexing beat general-purpose databases by roughly 10x or more (2 s vs. 24 s in PostgreSQL for a 100k-trip query)
    • Adaptive rendering essential for large results
  3. Multiple views are essential
    • Comparison is core to analysis
    • Side-by-side queries, synchronized views
    • Linked plots and maps
  4. Different visualizations for different questions
    • LOD for details, heat maps for patterns, grid maps for regions
  5. Query composition is powerful
    • Build complex queries from simple ones
    • Grouping, refinement, generalization
    • Each result is a new dataset to explore

Part 21: Real-World Impact

Who’s Using TaxiVis?

Users:

  • NYC Department of Transportation
  • NYC Taxi & Limousine Commission
  • Traffic engineers and urban planners
  • Economists studying urban mobility

What They’ve Learned:

  • Social inequalities in taxi service (Harlem)
  • Economic incentives affecting driver behavior
  • Impact of major events on city mobility
  • Transportation hub usage patterns

Beyond Taxis:

  • Model applies to other origin-destination data
  • Generalizes to other spatio-temporal datasets
  • Principles useful for any urban data exploration

Part 22: Key Takeaways

The Big Ideas

  1. Urban data is fundamentally spatio-temporal
    • Space, time, and attributes all matter
    • Need to explore interactions, not just individual dimensions
  2. Static visualization is not enough
    • Interactivity transforms presentation into exploration
    • “Dialogue with data” through visual queries
  3. Design for domain experts, not data scientists
    • Visual operations instead of code
    • Direct manipulation over programming
    • But don’t sacrifice expressiveness
  4. Performance enables exploration
    • Sub-second response changes how people think
    • Specialized systems beat general solutions
    • Trade generality for interactivity
  5. Multiple coordinated views for comparison
    • Urban analysis is inherently comparative
    • Same model, different slices
    • Build complex understanding from simple queries

Part 23: From TaxiVis to Urban Analytics

The Broader Vision

TaxiVis is One Example

Other urban data:

  • Bikeshare systems
  • 311 service calls
  • Building permits
  • Transit ridership
  • Crime reports
  • Traffic sensors

Same challenges: Scale, complexity, spatio-temporal nature

The Visual Analytics Framework

Visual analytics pipeline

  1. Visualization: Multiple representations and query models
  2. Data Analysis: Topology, ML, pattern detection
  3. Data Management: Specialized indices, GPU acceleration

Moving to 3D

Cities are vertical, not just horizontal

  • Shadow analysis
  • Views and sight lines
  • Sky exposure
  • Building massing

Next Lecture:

We’ll extend these ideas to 3D urban visualization and the Urbane framework

Same principles apply: interactive, visual, scalable

Summary

The Visual Analytics Pipeline for Urban Data

What We’ve Covered Today:

The Problem:

  • Urban data is big, complex, spatio-temporal
  • Traditional tools don’t scale
  • Confirmation vs. exploration gap

The Solution:

  • Visual query models for direct manipulation
  • Interactive performance through specialized systems
  • Multiple visualizations for different questions
  • Comparison through coordinated views

The Result:

  • Domain experts can explore without programming
  • Discoveries about social inequality, economics, events
  • Real impact on city operations and policy

Try It Yourself - Exploration Exercise

If you had access to TaxiVis, what would you explore?

Think about:

  • A neighborhood you’re curious about
  • A time pattern (weekday vs. weekend, holidays, events)
  • A comparison (this year vs. last year, two locations)
  • A hypothesis about urban behavior

Discussion:

  • What question would you ask?
  • What spatial regions would you select?
  • What time slices would you compare?
  • What would you expect to find?

Questions?

Next week: 3D & Immersive Urban Visualization

Preview:

  • Urbane framework for 3D urban planning
  • Interactive impact analysis (shadows, views, sky exposure)
  • Performance-driven architectural design
  • When to use 3D (and when not to)