Urban Visualization I

Flows, Time & Interactivity (2D + Time) - CS-GY 6313 - Fall 2025

Claudio Silva

2025-10-31

From Static Maps to Dynamic Urban Systems

The Yellow Blob Problem

What happens when we apply static techniques to massive urban datasets?

This is 140 million NYC taxi trips visualized as a static heatmap.

What can you learn from this?

Nothing. It’s a “yellow blob.”

Static heatmap of 140 million NYC taxi trips

Today’s Goal

Learn to visualize urban dynamics—movement and time—using interactive, linked views

Challenge: Urban data is massive, dense, and dynamic
Solution: Turn visualization into an exploration tool
Case Study: TaxiVis—a system for exploring NYC taxi flows

We’ll learn how to move from this “yellow blob” to meaningful insights about urban mobility patterns.

Part 1: The Urban Context

Why This Matters

The Urban Challenge:

Over 50% of the world’s population lives in cities
2.5 billion more urban residents expected by 2050
Cities = centers of innovation
But also: sprawl, pollution, inequality

How do we make better decisions about urban development?

The Data Opportunity:

We can now collect, store, and open massive urban datasets
Benefits:
- Government: Operations & planning
- Science: Discovery
- Residents: Participation
- Industry: Innovation

What Makes Urban Data Unique?

Three Interacting Components:

Residents - People and their behavior
Infrastructure - Physical systems and policies
Environment - Natural and built surroundings

To understand a city, we must explore how these interact over space and time.

Four Key Characteristics:

Scale
- Human-level to metro-level
- Millions of individual events
Density
- Massive overplotting
- Traditional techniques fail
Complexity
- Interconnected systems
- Transit, traffic, social networks
Dynamism
- Cities defined by time & movement
- Static snapshots miss the essence

Hägerstrand’s Space-Time Cube (1970)

The Classic Framework for Movement Analysis

X, Y dimensions: Geographic space
Z dimension: Time
Each person’s life: a line through the cube

The Urban Challenge:

Understanding millions of these paths, all interacting simultaneously

Hägerstrand’s Space-Time Cube showing individual trajectories
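
As a rough illustration of the cube, one person's path can be stored as time-stamped positions and queried for "where were they at time t?". This is a minimal sketch, assuming strictly increasing timestamps and straight-line motion between samples; the `Path` type and `position_at` helper are hypothetical names, not part of any system discussed in this lecture.

```python
from bisect import bisect_right
from typing import List, Tuple

# A space-time path: samples of (t, x, y), sorted by strictly increasing t.
Path = List[Tuple[float, float, float]]

def position_at(path: Path, t: float) -> Tuple[float, float]:
    """Linearly interpolate the (x, y) position at time t along the path."""
    times = [p[0] for p in path]
    if t <= times[0]:                      # before the first sample: clamp
        return path[0][1], path[0][2]
    if t >= times[-1]:                     # after the last sample: clamp
        return path[-1][1], path[-1][2]
    i = bisect_right(times, t)             # index of first sample after t
    t0, x0, y0 = path[i - 1]
    t1, x1, y1 = path[i]
    w = (t - t0) / (t1 - t0)               # fraction of the way to the next sample
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)

# Toy path: east for 10 time units, then north for 10 more.
path = [(0.0, 0.0, 0.0), (10.0, 5.0, 0.0), (20.0, 5.0, 5.0)]
midway = position_at(path, 5.0)
```

Taxi data only records the endpoints of each trip, so in that setting a "path" reduces to two samples: pickup and dropoff.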

Part 2: Why Traditional Analysis Fails

The Old Workflow (Confirmatory Data Analysis)

The Process:

Domain experts formulate hypotheses
Data scientists select data
Run analyses (SQL, R, Python)
Domain experts inspect results
Repeat…

The Problems with Traditional Workflows

❌ Cognitive overload - Generates overwhelming numbers of plots/tables

❌ Batch-oriented - No exploration, just predetermined queries

❌ Distances experts from data - Requires intermediary (data scientist)

❌ Cannot scale - Modern data volumes overwhelm traditional tools

❌ Result: The “yellow blob” problem

Even simple questions require generating dozens of individual plots, each manually programmed

The Problem with Current Tools

Current practice for domain experts:

Load small slices into R, MATLAB, Stata, ArcGIS, Excel
Write SQL queries or code to analyze subsets
Generate individual plots manually
Repeat for each new question

This is tedious, slow, and limiting

Why this fails:

❌ Tools can’t handle 170M trips

❌ Requires programming/database expertise

❌ Time-consuming and frustrating

❌ Distances experts from the data

❌ Hard to explore, compare, or follow up on patterns

Part 3: The Need for Interactivity

The Paradigm Shift

Static Visualization

Role: Presentation

Answers questions the designer anticipated
Fixed perspective
One-way communication

Limitation: Cannot handle 140 million data points

Interactive Visualization

Role: Exploration

Enables users to ask their own questions
Dynamic perspectives
“Dialogue with data”

Power: Query and filter to reveal patterns

The Visual Information Seeking Mantra

“Overview first, Zoom and Filter, then Details-on-Demand”

— Ben Shneiderman (1996)

1. Overview

Start with the big picture (even if it’s a “yellow blob”)

2. Zoom & Filter

Focus on items of interest

3. Details-on-Demand

Get specifics when needed

Why Interactivity Matters

Performance is critical
- Even 500ms latency significantly reduces exploration, observations, and hypothesis generation
- Sub-second response enables iterative analysis
Perception over cognition
- Well-designed systems let you see patterns rather than calculate them
- Visual queries are often faster to pose than hand-written SQL
Empowers domain experts
- No programming required
- Direct manipulation of visualizations
- Experts can explore data themselves

Research shows that even a half-second delay dramatically impacts analysis quality.

Part 4: Enter TaxiVis

The NYC Taxi Dataset

The Data:

13,000 taxis
500,000 trips/day
170 million trips/year

Each trip includes:

Pickup/dropoff locations & times
Distance traveled
Fare amount
Tip amount
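
A single trip record could be modeled roughly as follows. This is a sketch: the field names and units are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Trip:
    """One taxi trip; all field names here are hypothetical."""
    pickup_time: datetime
    dropoff_time: datetime
    pickup_lon: float
    pickup_lat: float
    dropoff_lon: float
    dropoff_lat: float
    distance_miles: float
    fare: float
    tip: float

    @property
    def duration_minutes(self) -> float:
        """Trip duration derived from the two timestamps."""
        return (self.dropoff_time - self.pickup_time).total_seconds() / 60.0

    @property
    def fare_per_mile(self) -> float:
        """Fare divided by distance; NaN for zero-length trips."""
        if self.distance_miles <= 0:
            return float("nan")
        return self.fare / self.distance_miles

# A made-up example record (not real data).
example = Trip(
    pickup_time=datetime(2012, 5, 7, 8, 0),
    dropoff_time=datetime(2012, 5, 7, 8, 30),
    pickup_lon=-73.99, pickup_lat=40.75,
    dropoff_lon=-73.78, dropoff_lat=40.64,
    distance_miles=6.0, fare=24.0, tip=4.0,
)
```

Derived attributes such as duration and fare per mile are the kind of quantities the later case studies compare across regions.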

What can taxi data tell us?

Economic activity and human behavior
Mobility patterns across the city
Response to major events (hurricanes, holidays)
Social inequalities in service

Challenge: How do we make sense of this data?

Taxi Patterns and Anomalies

Taxi activity patterns showing regularity and anomalies

Regular patterns: Thanksgiving, Christmas drops in activity
Anomalies: Hurricane Irene, Hurricane Sandy disruptions
Events: Five Boro Bike Tour (taxis disappeared along 6th Avenue)

Part 5: Design Requirements

What We Need (From Domain Experts)

Query Needs:

Understand dynamics
- “How do patterns vary over space and time?”
Explore events
- “What happened during Hurricane Sandy?”
Compare regions
- “Midtown vs. Harlem taxi frequency?”
Study movement
- “Where do people go from JFK?”

System Requirements:

Interactive - Sub-second response times
Expressive - Support complex spatio-temporal queries
Usable - No SQL or programming required
Scalable - Handle all 170M trips, not samples
Comparative - Easy to compare regions and time periods

Part 6: The Visual Query Model

Core Idea: Direct Manipulation

Let users query data through direct manipulation of visualizations

Spatial

Where?

Pickup regions
Dropoff regions
Draw polygons on map

Temporal

When?

Time ranges
Recurrence patterns
Day of week

Attribute

What?

Fare amount
Distance
Trip duration

Instead of writing SELECT * FROM trips WHERE..., you draw on a map
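
Behind the scenes, a drawn polygon plus time and attribute widgets can be thought of as a conjunction of three filters. The sketch below assumes trips are plain dictionaries with hypothetical keys (`pickup_x`, `pickup_y`, `hour`, `fare`) and uses a standard ray-casting point-in-polygon test; it illustrates the idea and is not how TaxiVis itself evaluates queries.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: True if (x, y) is inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                # crossing lies to the right of the point
                inside = not inside
    return inside

def run_query(trips, pickup_polygon, hour_range, fare_range):
    """Apply the spatial (where), temporal (when), and attribute (what) constraints."""
    lo_h, hi_h = hour_range
    lo_f, hi_f = fare_range
    return [t for t in trips
            if point_in_polygon(t["pickup_x"], t["pickup_y"], pickup_polygon)
            and lo_h <= t["hour"] < hi_h
            and lo_f <= t["fare"] <= hi_f]

# Toy data on an abstract plane (not real coordinates).
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
trips = [
    {"pickup_x": 1, "pickup_y": 1, "hour": 8, "fare": 12.0},    # matches
    {"pickup_x": 5, "pickup_y": 1, "hour": 8, "fare": 12.0},    # outside polygon
    {"pickup_x": 1, "pickup_y": 1, "hour": 15, "fare": 12.0},   # outside time range
    {"pickup_x": 1, "pickup_y": 1, "hour": 8, "fare": 100.0},   # outside fare range
]
result = run_query(trips, square, hour_range=(7, 10), fare_range=(5.0, 50.0))
```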

Visual Representation of Queries

TaxiVis interface showing visual query components

Blue polygons on map = pickup regions
Orange polygons = dropoff regions
Arrows = origin-destination queries
Time widgets = temporal constraints
Histograms = attribute constraints

Part 7: TaxiVis Interface

The Complete System

Full TaxiVis interface showing linked views

Interface Components

TaxiVis interface with labeled components

1. Map View

Geographic visualization
Interactive region selection
Origin-destination flows

2. Control Panel

Time range selection
Query type controls
Data aggregation settings

3. Temporal Views

Time series plots
Histograms
Daily/weekly patterns

Part 8: Visual Queries in Action

Example: Airport Comparison

The Question:

“How do trips to JFK vs. LGA differ on Sundays vs. Mondays?”

The Visual Query:

Draw region around Lower Manhattan (pickup)
Draw regions around JFK and LGA (dropoffs)
Connect with arrows (directional constraints)
Select Sunday vs. Monday (temporal constraints)

The Results & Discovery

Side-by-side comparison of Sunday vs Monday airport trips

Side-by-side map comparison
Scatter plots: hour of day vs. trip duration
Discovery: Monday trips 3-5PM take much longer (rush hour!)
Implication: Creates economic disincentive for drivers to accept airport trips

Part 9: Query Expressiveness

Peuquet’s Triad Framework

All three fundamental spatio-temporal query types:

when + where → what

“What taxis were in Midtown at rush hour?”

when + what → where

“Where were high-fare trips on New Year’s Eve?”

where + what → when

“When do trips to airports peak?”
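
The three query types differ only in which two components are fixed and which one is returned. A minimal sketch over a toy table follows; the records, region labels, and function names are invented for illustration.

```python
from collections import Counter

# Toy records: "hour" is the when, "region" the where, "fare" the what.
trips = [
    {"hour": 8, "region": "Midtown", "fare": 12.0},
    {"hour": 8, "region": "Midtown", "fare": 55.0},
    {"hour": 8, "region": "Harlem", "fare": 60.0},
    {"hour": 17, "region": "JFK", "fare": 52.0},
    {"hour": 17, "region": "JFK", "fare": 58.0},
    {"hour": 23, "region": "JFK", "fare": 50.0},
]

def what(trips, hour, region):
    """when + where -> what: fares of trips at a given hour and region."""
    return [t["fare"] for t in trips
            if t["hour"] == hour and t["region"] == region]

def where(trips, hour, min_fare):
    """when + what -> where: regions with high-fare trips at a given hour."""
    return sorted({t["region"] for t in trips
                   if t["hour"] == hour and t["fare"] >= min_fare})

def when(trips, region, min_fare):
    """where + what -> when: the hour with the most matching trips, or None."""
    counts = Counter(t["hour"] for t in trips
                     if t["region"] == region and t["fare"] >= min_fare)
    return counts.most_common(1)[0][0] if counts else None
```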

Plus: Query Composition

Queries can be refined, combined, compared
Results can be visualized multiple ways
Supports both atomic queries and complex queries (unions)

Part 10: Making It Interactive

The Performance Challenge

The Problem:

170M trips = traditional databases too slow
PostgreSQL: 24 seconds for 100k-trip query
SQLite: 85 seconds for same query
Goal: Sub-second response

The TaxiVis Solution:

Custom k-d tree index
- 30GB vs. 200GB for PostgreSQL
- Build time: 28 min vs. 13 hours
Query time:
- 2 seconds for 100k trips
- 0.2s for 1k trips
Adaptive level-of-detail rendering
Smart heat maps and aggregation

Part 11: Visualization Techniques

The “Yellow Blob” Rendering Problem

The Challenge:

500,000 trips/day as point cloud = complete clutter
Can’t see patterns, just noise
Traditional scatter plots fail at this scale

Completely cluttered map with all points

We need multiple visualization strategies

Solution 1: Adaptive Level of Detail (LOD)

Strategy: Render only what you can see

How it works:

Z-order curve hierarchical sampling
Sort points spatially, build binary tree
First n elements = hierarchical subsample of size n
n scales with zoom level

Result: Clear visualization at every zoom level

As you zoom in, you see more detail. As you zoom out, you see a representative sample.
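
One way to get the "first n elements form a subsample" property is sketched below: sort points along a Z-order (Morton) curve, then emit them breadth-first from the implicit balanced binary tree over that sorted array. This is an interpretation of the bullets above, not the published TaxiVis algorithm. The 4x-per-zoom-level growth in `sample_for_zoom` and the unit-square coordinates are assumptions.

```python
def interleave_bits(x, y, bits=16):
    """Morton code: interleave the bits of non-negative integers x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def lod_order(points, bits=16):
    """Reorder (x, y) points in [0, 1) so any prefix is spread along the Z-curve."""
    scale = (1 << bits) - 1
    ordered = sorted(
        points,
        key=lambda p: interleave_bits(int(p[0] * scale), int(p[1] * scale), bits),
    )
    result = []
    level = [(0, len(ordered))]            # half-open index ranges at this tree level
    while level:
        next_level = []
        for lo, hi in level:
            if lo >= hi:
                continue
            mid = (lo + hi) // 2
            result.append(ordered[mid])    # middle element first, then the two halves
            next_level.append((lo, mid))
            next_level.append((mid + 1, hi))
        level = next_level
    return result

def sample_for_zoom(lod_points, zoom, base=4):
    """Take the first n points; n grows 4x per zoom level (an assumed policy)."""
    n = min(len(lod_points), base * 4 ** zoom)
    return lod_points[:n]

points = [(i / 8 + 0.01, j / 8 + 0.01) for i in range(8) for j in range(8)]
lod = lod_order(points)
```

Because the ordering is computed once, changing the zoom level only changes how long a prefix is drawn, which keeps rendering cost predictable.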

Solution 2: Heat Maps

Continuous Heat Maps

Pixel-based density
Darker = more activity
Shows overall distribution patterns

Grid Maps

Aggregate by meaningful regions
Neighborhoods, zip codes, boroughs
Hover for exact counts

When to use: heat maps for overview and patterns; LOD for specific trip details; grid maps for comparing defined regions
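
Both map types rest on the same aggregation step: count events per cell, then map counts to color. The sketch below bins points into square cells; treating a cell as a stand-in for a pixel or a region is a simplification, since real grid maps would aggregate by neighborhood or zip-code polygons.

```python
from collections import Counter

def grid_counts(points, cell_size):
    """Count (x, y) points per square cell; keys are (col, row) indices."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

# Toy points on an abstract plane.
points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.5), (1.9, 1.1)]
counts = grid_counts(points, cell_size=1.0)
busiest_cell, busiest_count = counts.most_common(1)[0]
```

Aggregation is what breaks up the "yellow blob": however many points fall in a cell, the cell is drawn once, with an intensity that reflects the count.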

Solution 3: Multiple Coordinated Views

The Comparison Problem:

“How do Sundays differ from Mondays?”
“JFK vs. LGA patterns?”
“This year vs. last year?”

Solution:

Side-by-side views
Each view = one query (color-coded)
Synchronized spatial extent
Linked plots and summaries
Interactive refinement

Part 12: Linked Views & Brushing

Core Concept: Brushing and Linking

Actions in one view are reflected in all other views

This creates a “dialogue” where you can ask questions by interacting with any visualization component, and all views update to answer your question.
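
A common way to implement this is a shared selection object that every view subscribes to. The sketch below uses hypothetical class names (`SelectionModel`, `HourHistogramView`) and represents the selection as a predicate over trips; it shows the pattern, not TaxiVis's actual architecture.

```python
class SelectionModel:
    """Shared selection state; views subscribe and are notified on every brush."""

    def __init__(self):
        self._listeners = []
        self.predicate = lambda trip: True   # initially everything is selected

    def subscribe(self, callback):
        self._listeners.append(callback)

    def brush(self, predicate):
        """Replace the selection and push it to all subscribed views."""
        self.predicate = predicate
        for callback in self._listeners:
            callback(predicate)

class HourHistogramView:
    """A linked view: trip counts per hour for the current selection."""

    def __init__(self, trips, model):
        self.trips = trips
        self.counts = {}
        model.subscribe(self.update)
        self.update(model.predicate)

    def update(self, predicate):
        self.counts = {}
        for t in self.trips:
            if predicate(t):
                self.counts[t["hour"]] = self.counts.get(t["hour"], 0) + 1

# Toy data; region labels are illustrative only.
trips = [
    {"region": "JFK", "hour": 8},
    {"region": "JFK", "hour": 8},
    {"region": "Midtown", "hour": 8},
    {"region": "JFK", "hour": 17},
]
model = SelectionModel()
histogram = HourHistogramView(trips, model)
before = dict(histogram.counts)
model.brush(lambda t: t["region"] == "JFK")   # the user brushes the JFK region
after = dict(histogram.counts)
```

Any view can call `brush`, so a time-series view could set a time-range predicate in the same way, which gives linking in both directions.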

Example: Spatial Selection → Temporal Pattern

“What is the temporal pattern for trips from JFK Airport?”

Linked Views in Action: Step 1

Default View: All Data

TaxiVis showing all taxi trips

The map shows the “yellow blob”—all trips. The time series shows aggregate patterns for the entire city.

Linked Views in Action: Step 2

User Brushes a Region (JFK Airport)

User selecting JFK region on the map

By clicking and dragging, the user selects a geographic region. In this case, the area around JFK Airport.

Linked Views in Action: Step 3

All Views Update Automatically

Time series and charts update to show only JFK data

The time series and histograms now show the temporal pattern for only trips from the JFK area.

The Power of Bidirectional Linking

The linking works in both directions

Spatial → Temporal: Select a region → see temporal patterns
Temporal → Spatial: Select a time range → see spatial patterns

This bidirectional dialogue enables exploratory analysis that would be impossible with static visualizations.

Part 13: Temporal Queries

Temporal Slicing: Time → Space

The Question:

“Where do trips go during morning rush hour?”

The Interaction:

Select time range on the time series (8am-10am)
Map updates to show only trips from that time period

The Insight:

Reveals spatial patterns specific to that time slice

Map showing spatial pattern for selected time

Advanced Temporal Queries: Recurrent Selection

The Challenge:

What if I want to see a pattern, not just a single time slice?

The Solution: Recurrent Selection

Select recurring time periods:

All Mondays, 8am-10am
Every Saturday night
Weekday rush hours only

This reveals periodic behavior—the heartbeat of the city

Recurrent Selection interface with day-of-week checkboxes
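
A recurrent selection amounts to filtering on day of week and hour of day rather than on one contiguous interval. This is a minimal sketch using the standard `datetime` module; the function name is hypothetical, and windows that wrap past midnight (such as 10pm-2am) are deliberately not handled.

```python
from datetime import datetime

def recurrent_select(pickups, weekdays, start_hour, end_hour):
    """Keep timestamps on the given weekdays (0=Monday) within [start_hour, end_hour)."""
    return [t for t in pickups
            if t.weekday() in weekdays and start_hour <= t.hour < end_hour]

# Toy pickup times in May 2012.
pickups = [
    datetime(2012, 5, 7, 8, 30),    # Monday morning
    datetime(2012, 5, 7, 12, 0),    # Monday noon
    datetime(2012, 5, 8, 9, 0),     # Tuesday morning
    datetime(2012, 5, 14, 9, 59),   # the following Monday morning
]
monday_mornings = recurrent_select(pickups, weekdays={0}, start_hour=8, end_hour=10)
weekday_mornings = recurrent_select(pickups, weekdays={0, 1, 2, 3, 4}, start_hour=7, end_hour=10)
```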

Recurrent Selection Example

Question: “How do weekend nights differ from weekday mornings?”

Weekday Mornings (Mon-Fri, 7-9am)

Map showing weekday morning trip patterns

Inbound commuter patterns

Weekend Nights (Sat-Sun, 10pm-2am)

Entertainment district activity

Recurrent selection reveals systematic differences in urban activity patterns.

Part 15: Origin-Destination Queries

The Most Powerful Query: Flows

Asking about movement between specific locations

Traditional Approach:

SELECT * FROM trips
WHERE origin = 'JFK'
AND destination = 'LGA'
AND time BETWEEN '08:00' AND '10:00';

Complex, requires knowing SQL and field names

TaxiVis Approach:

Draw an arrow

Simple, visual, intuitive

This is what we mean by “visual query”—you draw your question, the system answers.
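
Conceptually, an arrow pairs a pickup region with a dropoff region, and direction matters. The sketch below uses axis-aligned boxes in place of drawn polygons and toy coordinates on an abstract plane; the keys `px`, `py`, `dx`, `dy`, and `hour` are assumptions.

```python
def in_box(x, y, box):
    """True if (x, y) lies inside box = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y

def od_query(trips, origin_box, dest_box, start_hour, end_hour):
    """Trips picked up in origin_box and dropped off in dest_box within [start_hour, end_hour)."""
    return [t for t in trips
            if in_box(t["px"], t["py"], origin_box)
            and in_box(t["dx"], t["dy"], dest_box)
            and start_hour <= t["hour"] < end_hour]

# Toy boxes; these are not real airport coordinates.
JFK_BOX = (8, 0, 10, 2)
LGA_BOX = (6, 6, 8, 8)
trips = [
    {"px": 9, "py": 1, "dx": 7, "dy": 7, "hour": 9},    # JFK -> LGA, in window
    {"px": 7, "py": 7, "dx": 9, "dy": 1, "hour": 9},    # LGA -> JFK (reverse direction)
    {"px": 9, "py": 1, "dx": 7, "dy": 7, "hour": 14},   # right flow, wrong time
]
matches = od_query(trips, JFK_BOX, LGA_BOX, start_hour=8, end_hour=10)
```

Swapping the two boxes gives the reverse flow, which is the same as drawing the arrow in the other direction.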

The Arrow Tool: Step 1

Select the Arrow Tool

Arrow tool selected in toolbar

The arrow tool lets you create origin-destination (OD) queries by drawing directly on the map.

The Arrow Tool: Step 2

Draw Arrow from Origin to Destination

User drawing arrow from JFK to LGA

Example: Draw an arrow from JFK Airport to LaGuardia Airport to ask:

“Show me all trips that went from JFK to LGA”

The Arrow Tool: Step 3

All Views Update to Show Only That Flow

Dashboard showing only JFK→LGA trips

Map highlights the origin-destination pair
Time series shows when these trips occur
Histograms reveal patterns in this specific flow

Visual Queries: Why They Matter

Lower cognitive load
- No need to remember field names or syntax
- Direct manipulation of the data representation
Immediate feedback
- See results instantly as you interact
- Iterate quickly through hypotheses
Support exploration
- Encourages “what if” questions
- Makes serendipitous discovery possible
Democratize analysis
- Analysts without SQL/programming skills can explore
- Domain experts can directly investigate questions

The visualization is the interface.

Part 16: Case Study 1 - Neighborhood Comparison

Question: “Are some neighborhoods underserved by taxis?”

The Analysis:

Compare taxi activity across neighborhoods
Midtown, Upper East Side, Greenwich Village, Harlem
Look at pickups and dropoffs over one week

Taxi activity comparison across neighborhoods

The Discovery: Over 10x Difference

Harlem vs other neighborhoods taxi activity

Harlem has very few pickups despite many dropoffs
People can take taxis TO Harlem but can’t get one FROM there
Over one order of magnitude difference from Midtown

Follow-Up Investigation

The exploration followed a natural path:

Initial pattern: Harlem has fewer pickups
Hypothesis: Is this an economic issue?
Investigation 1: “Are tips different in Harlem?”
- Discovery: Yes! Higher tips
Investigation 2: “Is fare/mile different?”
- Discovery: Yes! Lower fare/mile
Insight: Less economic incentive for drivers to go to Harlem, despite higher tips

Tips and fare analysis for Harlem
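
The two follow-up measures are simple per-trip ratios averaged by region. The sketch below shows the arithmetic on invented numbers for two placeholder regions, "A" and "B"; the values are not taken from the taxi data and do not reproduce the findings above.

```python
from collections import defaultdict

def summarize(trips):
    """Per-region mean fare per mile and mean tip as a fraction of the fare."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])   # [sum fare/mile, sum tip/fare, count]
    for t in trips:
        s = sums[t["region"]]
        s[0] += t["fare"] / t["miles"]
        s[1] += t["tip"] / t["fare"]
        s[2] += 1
    return {region: {"fare_per_mile": s[0] / s[2], "tip_rate": s[1] / s[2]}
            for region, s in sums.items()}

# Invented trips for illustration only.
trips = [
    {"region": "A", "fare": 10.0, "miles": 2.0, "tip": 2.0},
    {"region": "A", "fare": 20.0, "miles": 4.0, "tip": 2.0},
    {"region": "B", "fare": 12.0, "miles": 4.0, "tip": 3.0},
]
summary = summarize(trips)
```

Comparing `fare_per_mile` and `tip_rate` side by side across regions is the numeric counterpart of the linked plots used in the investigation.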

Part 17: Case Study 2 - Transportation Hubs

Question: “How do people move through NYC’s transportation infrastructure?”

The Setup:

Compare JFK, LGA, Penn Station, Grand Central
Use grouping to combine regions
Examine pickup patterns over one week

Key Findings

More pickups at LGA than JFK (most days)
Train stations >> airports for pickups
Weekday pattern: Train station pickups constant Mon-Thu, drop Fri-Sat
- Reflects commuter behavior
Rush hour problem: Airport trips take much longer 3-5PM
- Creates economic disincentive for drivers
- May help explain why some drivers illegally refuse airport trips

Part 18: Case Study 3 - Temporal Exploration

Time-Space Exploration

Feature:

Select multiple time slices automatically
Compare same time across different days/weeks/months
Each slice gets its own map and plot line (color-coded)

Example: Memorial Day Analysis

All Mondays in May 2011 and May 2012
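
Generating the time slices automatically is a small calendar computation. The sketch below lists every Monday in May 2011 and May 2012 using the standard `datetime` module; the helper name is hypothetical.

```python
from datetime import date, timedelta

def weekdays_in_month(year, month, weekday):
    """All dates in the month that fall on `weekday` (0=Monday ... 6=Sunday)."""
    d = date(year, month, 1)
    d += timedelta(days=(weekday - d.weekday()) % 7)   # first matching day
    out = []
    while d.month == month:
        out.append(d)
        d += timedelta(days=7)
    return out

# One slice per Monday; each would get its own map and plot line.
slices = weekdays_in_month(2011, 5, 0) + weekdays_in_month(2012, 5, 0)
```

Memorial Day is the last Monday of May, so it is the final entry for each year and can be compared directly against the other Mondays.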

Discovery: Memorial Day Pattern

Memorial Day vs regular Mondays

Discovery: Memorial Day has significantly fewer trips than regular Mondays
Implication: Could reduce fleet size on holidays to save costs

Part 19: Case Study 4 - Hurricane Sandy

Question: “How did Hurricane Sandy affect NYC?”

The Analysis:

One week of taxi activity
Sunday before through Saturday after
Heat maps for each day
Compare spatial patterns

The Timeline:

Sunday (before): Normal activity
Monday (hurricane hits): Virtually no taxis citywide
Tuesday-Friday: Activity returns everywhere EXCEPT Lower Manhattan
Saturday: Finally returns to normal

The Story the Data Tells

Daily heat maps showing Hurricane Sandy impact

Why? Lower Manhattan had a 5-day power outage

You can literally see the power outage on the map.

Comparison to Hurricane Irene

Hurricane Irene impact on taxi trips

Shorter disruption but more complete
Only 1,076 trips on the hurricane day (vs. an average of about 500,000 per day)
Faster recovery
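
A drop this large stands out under almost any outlier rule. As one illustration, the sketch below flags days using a robust z-score based on the median and the median absolute deviation (MAD). The 1,076 figure is the one quoted above; the other daily counts are synthetic values near the stated 500,000 average, and the 3.5 threshold is a conventional choice, not something from the lecture.

```python
from statistics import median

def flag_anomalies(daily_counts, threshold=3.5):
    """Indices of days whose count is far from the median, scaled by the MAD."""
    med = median(daily_counts)
    mad = median(abs(c - med) for c in daily_counts)
    if mad == 0:
        return []                          # no spread: nothing can be scored
    # 0.6745 makes the MAD comparable to a standard deviation for normal data.
    return [i for i, c in enumerate(daily_counts)
            if 0.6745 * abs(c - med) / mad > threshold]

# Synthetic week, except index 4, which uses the quoted hurricane-day count.
counts = [498_000, 512_000, 505_000, 490_000, 1_076, 501_000, 495_000]
anomalous_days = flag_anomalies(counts)
share_of_normal = 1_076 / 500_000          # about 0.2% of a typical day
```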

Part 20: What We Learned

Design Insights from Building TaxiVis

Visual queries work
- Domain experts could use it without training
- No SQL, no programming required
Performance is non-negotiable
- Sub-second response enables exploration
- Adaptive rendering essential for large results
Multiple views are essential
- Comparison is core to analysis
- Side-by-side queries, synchronized views
- Linked plots and maps
Query composition is powerful
- Build complex queries from simple ones
- Grouping, refinement, generalization

Part 21: Real-World Impact

Who’s Using TaxiVis?

Users:

NYC Department of Transportation
NYC Taxi & Limousine Commission
Traffic engineers and urban planners
Economists studying urban mobility

What They’ve Learned:

Social inequalities in taxi service (Harlem)
Economic incentives affecting driver behavior
Impact of major events on city mobility
Transportation hub usage patterns

Beyond Taxis:

Model applies to other origin-destination data
Generalizes to other spatio-temporal datasets
Principles useful for any urban data exploration

Part 22: Key Takeaways

The Big Ideas

Urban data is fundamentally spatio-temporal
- Space, time, and attributes all matter
- Need to explore interactions, not just individual dimensions
Static visualization is not enough
- Interactivity transforms presentation into exploration
- “Dialogue with data” through visual queries
Design for domain experts, not data scientists
- Visual operations instead of code
- Direct manipulation over programming
- But don’t sacrifice expressiveness
Performance enables exploration
- Sub-second response changes how people think
- Specialized systems beat general solutions
- Trade generality for interactivity

Part 23: From TaxiVis to Urban Analytics

The Broader Vision

TaxiVis is One Example

Other urban data:

Bikeshare systems
311 service calls
Building permits
Transit ridership
Crime reports
Traffic sensors

Same challenges: Scale, complexity, spatio-temporal nature

The Visual Analytics Framework

Visualization: Multiple representations and query models
Data Analysis: Topology, ML, pattern detection
Data Management: Specialized indices, GPU acceleration

Moving to 3D

Cities are vertical, not just horizontal

Shadow analysis
Views and sight lines
Sky exposure
Building massing

Next Lecture:

We’ll extend these ideas to 3D urban visualization and the Urbane framework

Same principles apply: interactive, visual, scalable

Summary

The Visual Analytics Pipeline for Urban Data

What We’ve Covered Today:

The Problem:

Urban data is big, complex, spatio-temporal
Traditional tools don’t scale
Confirmatory workflows leave little room for exploration

The Solution:

Visual query models for direct manipulation
Interactive performance through specialized systems
Multiple visualizations for different questions
Comparison through coordinated views

The Result:

Domain experts can explore without programming
Discoveries about social inequality, economics, events
Real impact on city operations and policy

Try It Yourself - Exploration Exercise

If you had access to TaxiVis, what would you explore?

Think about:

A neighborhood you’re curious about
A time pattern (weekday vs. weekend, holidays, events)
A comparison (this year vs. last year, two locations)
A hypothesis about urban behavior

Discussion:

What question would you ask?
What spatial regions would you select?
What time slices would you compare?
What would you expect to find?

Questions?

Next week: 3D & Immersive Urban Visualization

Preview:

Urbane framework for 3D urban planning
Interactive impact analysis (shadows, views, sky exposure)
Performance-driven architectural design
When to use 3D (and when not to)