Visualizing Network Data

CS-GY 6313 - Fall 2025

Claudio Silva

2025-11-26

Introduction

Based on materials by Enrico Bertini, NYU Tandon School of Engineering

Part 1: Networks

Objects, relationships, and visualization approaches

Network Data Fundamentals

Network Data = Objects + Relationships + Values

  • Objects: Nodes (people, cities, genes)
  • Relationships: Edges (friendships, routes, interactions)
  • Values: Node attributes (age, size) & edge attributes (weight, type)

Application Domains: Social networks, biological systems, transportation, web, infrastructure

Example: Friendship Network

  • Nodes: People with attributes (name, age)
  • Edges: Friendship connections
  • Attributes: Could encode strength, duration, message count
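
As a rough illustration of "objects + relationships + values", the friendship example could be held in a structure like the one below. This is a minimal sketch, not a prescribed format: the `Network` class, its method names, and the three people are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Network:
    """Objects (nodes) + relationships (edges) + values (attributes)."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: dict = field(default_factory=dict)  # frozenset({u, v}) -> attribute dict

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, u, v, **attrs):
        # Friendship is undirected, so the key is an unordered pair.
        self.edges[frozenset((u, v))] = attrs

    def degree(self, node_id):
        # Number of edges touching this node.
        return sum(1 for pair in self.edges if node_id in pair)

# Hypothetical data, invented for illustration only.
friends = Network()
friends.add_node("ana", name="Ana", age=31)
friends.add_node("ben", name="Ben", age=27)
friends.add_node("cho", name="Cho", age=45)
friends.add_edge("ana", "ben", messages=120, years=4)
friends.add_edge("ana", "cho", messages=8, years=1)

print(friends.degree("ana"))
```

Node attributes (age) and edge attributes (messages, years) are the "values" that later slides map to size, color, and thickness.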

Fundamental Approaches

1. Node-Link Diagrams

  • Nodes = dots/markers
  • Edges = connecting lines
  • Two types: Force-directed (algorithmic) & Fixed (meaningful positions)

2. Adjacency Matrices

  • Rows & columns = nodes
  • Cells = edges
  • No line crossings, but less intuitive

Force-Directed

Matrix

Force-Directed Layouts

Physics simulation for automatic positioning

Force-Directed: How It Works

Physical Analogy:

  • Edge Attraction: Connected nodes pull together (springs)
  • Node Repulsion: All nodes push apart (charged particles)
  • System evolves until forces balance

Result: Clusters emerge, bridges visible, hubs toward center

Force-Directed Algorithm

Steps:

  1. Initialize: Random positions
  2. Calculate: Net force on each node (attraction + repulsion)
  3. Move: Displace nodes by force × step_size
  4. Iterate: Repeat until convergence (typically 50-500 iterations)

Complexity: O(n²) per iteration from all-pairs repulsion (expensive for large networks)
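
The four steps can be sketched in plain Python as follows. This is one possible variant, not the specific algorithm behind the figures: the force formulas (repulsion k²/d, attraction d²/k) are borrowed from the Fruchterman-Reingold style, and the displacement cap `max_step` is an added assumption to keep the simulation stable. The function name and all parameters are hypothetical.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, step_size=0.05,
                 k=1.0, max_step=0.1, seed=0):
    """Return {node: [x, y]} after a fixed number of force iterations."""
    rng = random.Random(seed)
    # 1. Initialize: random positions.
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # 2a. Repulsion between every pair of nodes (the O(n^2) part).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = k * k / dist
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist
        # 2b. Attraction along each edge (spring pulling endpoints together).
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = dist * dist / k
            force[a][0] -= f * dx / dist
            force[a][1] -= f * dy / dist
            force[b][0] += f * dx / dist
            force[b][1] += f * dy / dist
        # 3. Move: displacement = force * step_size, capped at max_step.
        for n in nodes:
            mx = step_size * force[n][0]
            my = step_size * force[n][1]
            length = math.hypot(mx, my)
            if length > max_step:
                mx, my = mx * max_step / length, my * max_step / length
            pos[n][0] += mx
            pos[n][1] += my
    # 4. Iterate: here simply a fixed iteration count instead of a convergence test.
    return pos

layout = force_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

With these formulas, two connected nodes settle at distance k, so k acts as an ideal edge length. The nested loop over node pairs is what makes each iteration quadratic in the number of nodes.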

Step 1: Random

Step 4: Converged

Visual Encoding on Nodes & Edges

Node Encodings:

  • Size: Degree, importance, value
  • Color: Category, cluster, metric
  • Shape: Type (circles, squares, triangles)

Edge Encodings:

  • Thickness: Weight, strength, traffic
  • Pattern: Type (solid, dashed, dotted)
  • Color: Category, direction

Encoded network

Example: Size=age, Color=gender, Thickness=messages, Pattern=old/new friendship
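
One way to express such an encoding is as a set of small mapping functions from data attributes to visual channels. The sketch below follows the example above; the function names, the color palette, the output ranges, and the two-year cutoff for "old" friendships are all assumptions made for illustration.

```python
def linear_scale(value, domain, output_range):
    """Map value linearly from [d0, d1] to [r0, r1]."""
    (d0, d1), (r0, r1) = domain, output_range
    if d1 == d0:
        return (r0 + r1) / 2
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

# Hypothetical categorical palette.
GENDER_COLORS = {"f": "#1b9e77", "m": "#d95f02", "nb": "#7570b3"}

def encode_node(person, age_domain):
    # Size = age, Color = gender.
    return {
        "radius": linear_scale(person["age"], age_domain, (4, 12)),
        "color": GENDER_COLORS.get(person["gender"], "#999999"),
    }

def encode_edge(friendship, message_domain):
    # Thickness = messages, Pattern = old/new friendship (assumed cutoff: 2 years).
    return {
        "width": linear_scale(friendship["messages"], message_domain, (1, 6)),
        "dash": "solid" if friendship["years"] >= 2 else "dashed",
    }

node_style = encode_node({"age": 30, "gender": "f"}, age_domain=(20, 40))
edge_style = encode_edge({"messages": 0, "years": 1}, message_domain=(0, 100))
```

Note that scaling the radius linearly makes the area grow quadratically; if area should be proportional to the value, a square-root scale on the radius may be the better choice.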

Fixed Layouts

Meaningful node positions instead of algorithmic

Fixed Layouts: When & Why

When to use fixed instead of force-directed:

  1. Nodes have meaningful attributes for positioning (geography, time, hierarchy)
  2. Emphasizing edges/flows more than clustering
  3. Need stable, reproducible layouts

Common patterns: Circular, linear, grid, spatial (geographic)

Circular Layouts & Edge Bundling

Circular Layout:

  • Nodes evenly distributed around circle
  • Critical decision: Node ordering (alphabetical? by cluster? to minimize crossings?)
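
To see why ordering matters, the sketch below places nodes evenly on a circle and counts edge crossings for a given order. It relies on the fact that two chords of a circle cross exactly when one endpoint of the second chord lies strictly between the endpoints of the first. Function names and the tiny example are hypothetical.

```python
import math
from itertools import combinations

def circular_positions(order, radius=1.0):
    """Evenly distribute nodes around a circle, in the given order."""
    n = len(order)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(order)
    }

def count_crossings(order, edges):
    """Count pairs of straight-line edges that cross in a circular layout."""
    index = {node: i for i, node in enumerate(order)}
    crossings = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue  # edges sharing an endpoint cannot cross
        lo, hi = sorted((index[a], index[b]))
        c_inside = lo < index[c] < hi
        d_inside = lo < index[d] < hi
        if c_inside != d_inside:
            crossings += 1
    return crossings

edges = [("a", "c"), ("b", "d")]
print(count_crossings(["a", "b", "c", "d"], edges))  # alphabetical order
print(count_crossings(["a", "c", "b", "d"], edges))  # connected nodes adjacent
```

Same network, same layout, different order: the alphabetical order produces a crossing, while placing connected nodes next to each other removes it.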

Edge Bundling:

  • Route similar edges along common paths
  • Reduces clutter, reveals flow patterns

Hierarchical Edge Bundling

Spatial Networks & Bundling

Geographic Networks: Nodes at real locations (airports, cities, servers)

Problem: Edge clutter!

Solution: Edge bundling reveals corridors, hubs, regional patterns

Tools: QGIS, Gephi, D3.js

Before: Cluttered

After: Bundled

Adjacency Matrices

Table representation of networks

Matrices: Concept & Trade-offs

Encoding:

  • Rows & columns = nodes
  • Cell (i,j) = edge from node i to j
  • Color/symbol = edge weight/presence

Advantages: ✓ All nodes visible, ✓ No crossings, ✓ Scalable to denser networks

Disadvantages: ✗ Less intuitive, ✗ Needs reordering, ✗ n² space, ✗ Hard to trace paths
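
The encoding can be sketched as a function that turns an edge list into a matrix. This is an illustrative sketch; the function name, the `(source, target, weight)` tuple format, and the sample edges are assumptions.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Build an n x n matrix where cell (i, j) holds the weight of edge i -> j."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, weight in edges:
        matrix[index[source]][index[target]] = weight
        if not directed:
            # Undirected networks give a symmetric matrix.
            matrix[index[target]][index[source]] = weight
    return matrix

# Hypothetical weighted edges.
nodes = ["a", "b", "c"]
edges = [("a", "b", 1), ("b", "c", 3)]
matrix = adjacency_matrix(nodes, edges)

for name, row in zip(nodes, matrix):
    print(name, row)
```

The matrix always has n² cells no matter how many edges exist, which is the space cost listed above, and also the reason dense networks fit without any crossings.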

Network → Matrix

Les Misérables

The Hairball Problem & Matrix Ordering

“Hairball”: Dense networks as node-link diagrams = unreadable

Solution: Switch to matrix OR apply clutter reduction

Critical: Matrix ordering reveals patterns!

Hairball

Random vs Clustered ordering
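
A small sketch of why ordering matters: the same matrix is rendered twice, once in arbitrary index order and once with rows and columns grouped by cluster. The network (two triangles) and the cluster labels are hypothetical; in practice the labels would come from a clustering or seriation step that is not shown here.

```python
def reorder(matrix, order):
    """Apply the same permutation to rows and columns."""
    return [[matrix[i][j] for j in order] for i in order]

def render(matrix):
    """Text rendering: '#' for an edge, '.' for no edge."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )

# Hypothetical network: two triangles whose members are interleaved by index.
edges = [(0, 2), (0, 4), (2, 4), (1, 3), (1, 5), (3, 5)]
cluster = [0, 1, 0, 1, 0, 1]  # assumed known cluster label per node

matrix = [[0] * 6 for _ in range(6)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1

# Stable sort keeps the original order within each cluster.
clustered_order = sorted(range(6), key=lambda i: cluster[i])

print(render(matrix))
print()
print(render(reorder(matrix, clustered_order)))
```

In index order the cells look scattered; after reordering, the two groups appear as two blocks along the diagonal, even though the underlying data is unchanged.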

Directed Graphs & Alternatives

Directed Matrices:

  • Asymmetric: (i,j) ≠ (j,i)
  • Above diagonal = one direction
  • Below diagonal = opposite direction
  • Easy to see reciprocity
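
Reciprocity corresponds to comparing each cell (i, j) with its mirror (j, i). A minimal sketch, with hypothetical function names and a made-up "follows" matrix:

```python
def reciprocity(matrix):
    """Fraction of directed edges whose reverse edge also exists."""
    n = len(matrix)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = sum(1 for i, j in pairs if matrix[i][j])
    mutual = sum(1 for i, j in pairs if matrix[i][j] and matrix[j][i])
    return mutual / total if total else 0.0

def one_way_edges(matrix):
    """Edges i -> j with no matching j -> i."""
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and matrix[i][j] and not matrix[j][i]]

# Hypothetical directed network: row i, column j means "i follows j".
follows = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]

print(reciprocity(follows))
print(one_way_edges(follows))
```

Visually, a reciprocated edge shows up as a pair of filled cells mirrored across the diagonal, while a one-way edge has a filled cell on one side only.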

Alternative: Parallel axes (bipartite-like view)

Directed Matrix

Parallel Axes

Clutter Reduction Strategies

Five main techniques:

  1. Edge Bundling: Route similar edges together
  2. Clustering: Group nodes into super-nodes
  3. Filtering: Show subset (threshold, top-k, backbone)
  4. On-Demand: Show edges only on hover/click
  5. Motif Simplification: Replace patterns with glyphs (cliques, stars → symbols)

Filtering
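
Filtering and on-demand display are simple to express over an edge list. The sketch below shows threshold filtering, top-k filtering, and selecting the edges of a hovered node; backbone extraction is not covered. Function names and the sample routes with their weights are invented for illustration.

```python
def filter_by_threshold(edges, min_weight):
    """Keep edges whose weight is at least min_weight."""
    return [edge for edge in edges if edge[2] >= min_weight]

def filter_top_k(edges, k):
    """Keep the k heaviest edges."""
    return sorted(edges, key=lambda edge: edge[2], reverse=True)[:k]

def edges_of(edges, node):
    """On-demand: only the edges touching the hovered or clicked node."""
    return [edge for edge in edges if node in edge[:2]]

# Hypothetical routes as (source, target, weight); the numbers are made up.
routes = [
    ("JFK", "LAX", 120),
    ("JFK", "ORD", 80),
    ("ORD", "LAX", 45),
    ("JFK", "BOS", 15),
]

print(filter_by_threshold(routes, 50))
print(filter_top_k(routes, 1))
print(edges_of(routes, "ORD"))
```

Each function returns a smaller edge list that can be passed to any of the layouts above, so the choice of filter stays independent of the choice of visualization.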

Quiz: Fixed vs Force-Directed

Question: The main advantage of a fixed layout over force-directed is:

  A. Less cluttered visualizations
  B. Node positions encode meaningful data
  C. Faster computation

Answer: B - Position can carry information (geography, time, category)

Part 2: Trees (Hierarchies)

Specialized networks with no cycles

Trees: Definition & Applications

Tree: Connected network with hierarchical structure and no cycles

Properties:

  • One root node
  • Parent-child relationships
  • Leaves: Nodes with no children
  • Unique path between any two nodes
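
These properties fall out naturally when a tree is stored as a child-to-parent map. The sketch below finds the leaves and the unique path between two nodes by walking up to their lowest common ancestor. The representation, the function names, and the tiny file-system example are assumptions.

```python
def path_to_root(parent, node):
    """Follow parent links from node up to the root."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def path_between(parent, a, b):
    """The unique path from a to b, via their lowest common ancestor."""
    up_a = path_to_root(parent, a)
    up_b = path_to_root(parent, b)
    ancestors_a = set(up_a)
    # First node on b's way up that is also above (or equal to) a.
    lca = next(node for node in up_b if node in ancestors_a)
    return up_a[:up_a.index(lca) + 1] + up_b[:up_b.index(lca)][::-1]

def leaves(parent):
    """Nodes that never appear as anyone's parent."""
    has_children = set(parent.values())
    return sorted(node for node in parent if node not in has_children)

# Hypothetical file-system tree: child -> parent, root has parent None.
parent = {"/": None, "home": "/", "etc": "/", "ana": "home", "ben": "home"}

print(path_between(parent, "ana", "etc"))
print(leaves(parent))
```

Because every node has exactly one parent, there is only one way up from each node, which is why the path between any two nodes is unique.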

Real-world: File systems, org charts, evolutionary trees, taxonomies, syntax trees

Two Approaches for Trees

1. Node-Link

  • Explicit parent-child connections
  • Structure very visible
  • Familiar, intuitive
  • Limitation: Doesn’t scale (width grows exponentially with depth)

2. Space-Filling (Containment)

  • Nesting shows hierarchy
  • No explicit edges
  • Space-efficient, can show size
  • Limitation: Structure harder to see

Node-Link Trees: Examples

Top-Down

File systems, org charts

Radial

More space-efficient

Indented List

Most compact

Issues: Scalability (layout grows mostly along one dimension), labeling, limited encoding channels

Special Trees: Dendrograms

Dendrogram: Tree showing hierarchical clustering results

Algorithm (Agglomerative):

  1. Start: Each point = own cluster
  2. Find two closest clusters
  3. Merge them (height = distance)
  4. Repeat until one cluster

Properties:

  • Binary tree structure
  • Branch height = dissimilarity at merge
  • Cutting at height defines # of clusters

Used in: Gene expression, customer segmentation, document clustering
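
The agglomerative procedure can be sketched in a few lines for one-dimensional points. Single linkage (distance between the closest members of two clusters) is assumed here because the slides do not name a linkage rule; the function names and the sample points are hypothetical.

```python
def agglomerative(points):
    """Return the merge list [(cluster_a, cluster_b, height), ...]."""
    # Start: each point is its own cluster, ids 0..n-1.
    clusters = {i: [p] for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Find the two closest clusters (single linkage).
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                d = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge them; the merge distance becomes the branch height.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

def clusters_at(merges, n_points, height):
    """Number of clusters left when the dendrogram is cut at this height."""
    return n_points - sum(1 for _, _, d in merges if d <= height)

points = [1.0, 2.0, 6.0, 7.5, 20.0]
merges = agglomerative(points)
for merge in merges:
    print(merge)
print(clusters_at(merges, len(points), height=2.0))
```

Every merge joins exactly two clusters, which gives the binary tree structure, and the recorded heights are what the dendrogram draws on its vertical axis.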

With Heatmap

Special Trees: Decision Trees

Decision Tree: Each node = decision point

Two contexts:

  1. Human decision-making (flowcharts, election scenarios)
  2. Machine learning (learned classification models)

Why visualize: Interpretability, debugging, trust, bias detection

NYT Election Tree

ML Decision Tree

Treemaps

Space-filling approach for large hierarchies

Treemaps: Origin & Encoding

Origin (Ben Shneiderman, 1990): “My hard disk is full - what’s using space?”

Encoding:

  • Area: Quantitative value (size, revenue, count)
  • Color: Category OR secondary metric
  • Nesting: Hierarchical structure

Key innovation: Shows BOTH hierarchy AND size

Treemap Algorithms: Squarified vs Slice-and-Dice

Problem: Slice-and-Dice creates thin rectangles (bad aspect ratios)

Solution: Squarified algorithm optimizes for square-like shapes

Trade-off: Squarified is more readable but less stable (layout changes with data updates)
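
The aspect-ratio problem is easy to reproduce with a small slice-and-dice implementation. The sketch below covers only slice-and-dice (alternating the split direction at each level); the squarified algorithm is not implemented. The tree format `(name, value_or_children)`, the function names, and the sample data are assumptions.

```python
def total(node):
    """Sum of leaf values under a node."""
    _, payload = node
    if isinstance(payload, list):
        return sum(total(child) for child in payload)
    return payload

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return leaf rectangles as (name, x, y, width, height)."""
    if out is None:
        out = []
    name, payload = node
    if not isinstance(payload, list):
        out.append((name, x, y, w, h))
        return out
    size = total(node)
    offset = 0.0
    for child in payload:
        share = total(child) / size
        if depth % 2 == 0:
            # Even levels: children side by side along x.
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd levels: children stacked along y.
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out

def aspect_ratio(w, h):
    return max(w / h, h / w)

# Hypothetical disk usage tree.
disk = ("root", [("photos", 60), ("docs", [("a.pdf", 30), ("b.txt", 10)])])
rects = slice_and_dice(disk, 0, 0, 100, 100)

# Ten equal files in one folder: every rectangle becomes a thin strip.
flat = ("root", [("f%d" % i, 1) for i in range(10)])
strips = slice_and_dice(flat, 0, 0, 100, 100)
print(max(aspect_ratio(w, h) for _, _, _, w, h in strips))
```

Each rectangle's area stays proportional to its value, but a folder with many siblings is cut into strips in a single direction. Squarified layouts address this by choosing the split per group of siblings, at the cost of the stability noted above.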

Squarified (better readability)

Comparison

Treemap Examples

File Systems

Disk usage tools

Finance

Stock market heat maps

Code Analysis

Linux kernel by file type

Applications: Business dashboards, news (Newsmap), analytics, sports

Treemap Trade-offs

Advantages:

  • ✓ Scalability (thousands of nodes)
  • ✓ All nodes visible
  • ✓ Encodes size + category
  • ✓ Space-efficient

Disadvantages:

  • ✗ Size less accurate than position
  • ✗ Structure harder to see
  • ✗ Layout algorithm affects readability

Sunburst & Icicle Plots

Middle ground between node-link and treemaps:

  • Sunburst: Radial (concentric rings)
  • Icicle: Linear (horizontal bands)

Space efficiency: Treemap > Icicle > Sunburst
Hierarchy perception: Icicle ≈ Sunburst > Treemap
Familiarity: Treemap > Icicle > Sunburst

Sunburst

Icicle
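
Sunburst and icicle plots can share one underlying computation: give every node a depth and a [start, end) share of its parent's extent, then map that to bands or to rings. The sketch below illustrates this idea under assumed names and an assumed tree format `(name, value_or_children)`; it is not the layout code of any particular library.

```python
import math

def total(node):
    """Sum of leaf values under a node."""
    _, payload = node
    if isinstance(payload, list):
        return sum(total(child) for child in payload)
    return payload

def partition(node, start=0.0, end=1.0, depth=0, out=None):
    """Return (name, depth, start, end) for every node, extents in [0, 1]."""
    if out is None:
        out = []
    name, payload = node
    out.append((name, depth, start, end))
    if isinstance(payload, list):
        size = total(node)
        cursor = start
        for child in payload:
            span = (end - start) * total(child) / size
            partition(child, cursor, cursor + span, depth + 1, out)
            cursor += span
    return out

def to_icicle(cells, width, band_height):
    # One band per depth level: (name, x, y, w, h).
    return [(n, s * width, d * band_height, (e - s) * width, band_height)
            for n, d, s, e in cells]

def to_sunburst(cells, ring_width):
    # One ring per depth level: (name, inner_r, outer_r, angle0, angle1).
    return [(n, d * ring_width, (d + 1) * ring_width,
             2 * math.pi * s, 2 * math.pi * e)
            for n, d, s, e in cells]

# Hypothetical disk usage tree.
disk = ("root", [("photos", 60), ("docs", [("a.pdf", 30), ("b.txt", 10)])])
cells = partition(disk)
icicle = to_icicle(cells, width=200, band_height=20)
sunburst = to_sunburst(cells, ring_width=30)
```

Unlike a treemap, inner nodes keep their own band or ring, which is why the hierarchy is easier to read and why some space is spent on non-leaf nodes.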

Summary & Recommendations

Visualization Method Summary

Networks (General Graphs):

  • Force-Directed: Unknown structure, exploration, <100 nodes
  • Fixed Layout: Known groupings, spatial data, semantic positioning
  • Matrix: Dense networks, analytical tasks, expert users

Trees (Hierarchies):

  • Node-Link: Small trees (<100), structure focus, intuitive
  • Treemaps: Large trees (1000s), size focus, space-efficient
  • Sunburst/Icicle: Balance of structure + size, moderate depth

Key principle: No single “best” method - depends on data, task, audience

Design Guidelines & Trade-offs

Key dimensions to consider:

  1. Clutter: How crowded?
  2. Scalability: How many nodes/edges?
  3. Structure visibility: How clear is topology?
  4. Familiarity: How intuitive for audience?
  5. Reordering needs: Preprocessing required?

Decision process: Try multiple approaches → prototype → user test → iterate

Programming Exercises (Suggested)

Create variants of each technique:

  1. Force-directed layout: Implement algorithm OR use D3.js, experiment with encodings
  2. Circular/chord diagram: Fixed layout with edge bundling
  3. Node-link tree: Top-down OR radial, with collapse/expand
  4. Treemap: Squarified algorithm with zoom/drill-down

Explore:

  • Visual encodings (size, color, thickness)
  • Interaction (zoom, filter, details-on-demand)
  • Real datasets (social, citations, file systems, hierarchical clustering)

Tools: D3.js, Gephi, Cytoscape, NetworkX (Python), igraph (R)

Resources & References

Essential readings:

  • Shirley Wu: Understanding the Force
  • Holten (2006): Hierarchical Edge Bundles - IEEE TVCG
  • Shneiderman (1992): Tree Visualization with Tree-maps - ACM TOG
  • Bruls et al. (2000): Squarified Treemaps - Springer

Advanced topics: Motif simplification (Dunne & Shneiderman), semantic substrates, time-varying networks, multilayer networks

Interactive examples: Observable (D3.js), Gephi tutorials, Graph visualization survey papers

Thank You!

Next class: Week 14 - Final Project Presentations

Questions?