CS-GY 9223: Selected Topics in CS - Visualization for Machine Learning

Instructor: Claudio Silva (csilva@nyu.edu)
Teaching Assistant: Parikshit Solunke (pss442@nyu.edu)
Meeting Time: Mondays 5:00 PM - 7:30 PM
Classroom: Jacobs Hall, 6 Metrotech Room 473, Brooklyn Campus
Make-up Class: Tuesday, October 14 (for Fall Break)

Announcements

⚠️ Please Note: Course schedule, assignments, and materials are tentative and subject to updates during the semester. Students will be notified of any changes via Discord and course announcements.

Course materials will be posted as the semester progresses

Upcoming Classes

Week 1 (Sept 1) - Labor Day

No Class - Labor Day Holiday

Week 2 (Sept 8)

Topics: Course Introduction, Syllabus, Introduction to Visualization, Hands-on Vega-Lite
Materials:
Assignment: TBD
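Vega-Lite charts are plain JSON specifications made of data, a mark, and an encoding that maps data fields to visual channels. The sketch below builds one such spec as a Python dict and prints it. The field names and values are made-up toy data, not course material. The printed JSON can be pasted into the online Vega Editor to render it.

```python
import json

# Hypothetical toy data: one record per category.
data = [
    {"category": "A", "value": 28},
    {"category": "B", "value": 55},
    {"category": "C", "value": 43},
]

spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "description": "Bar chart of a value per category",
    "data": {"values": data},
    "mark": "bar",
    "encoding": {
        # Nominal field on x, sorted by descending y value.
        "x": {"field": "category", "type": "nominal", "sort": "-y"},
        # Quantitative field mapped to bar height.
        "y": {"field": "value", "type": "quantitative"},
    },
}

print(json.dumps(spec, indent=2))
```

Swapping `"mark": "bar"` for `"point"` or `"line"` changes the chart type without touching the encoding.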

Week 3 (Sept 15)

Topics: Perception for Design, Color Theory for Visualization
Materials:

Week 4 (Sept 22)

Topics: Model Assessment and Evaluation
Materials:
- Model Assessment and Evaluation
- Lab: Materials to be posted
Recommended Readings:
- Squares: Supporting Interactive Performance Analysis for Multiclass Classifiers (Ren et al., 2016)
- Neo: Generalizing Confusion Matrix Visualization to Hierarchical and Multi-Output Labels (Görtler et al., 2022)
Content:
- Confusion Matrices and ROC Curves
- Visual Analytics Systems for Model Performance
- Calibration Theory and Practice
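The quantities behind this week's views can be computed directly from labels and scores. The sketch below is a minimal, standard-library version: a multiclass confusion matrix, ROC points with a trapezoidal AUC for a binary classifier, and a binned calibration error. The binned error compares the fraction of positives with the mean predicted probability in equal-width bins, which is what a reliability diagram plots. All function names are our own, not from the readings. In practice, scikit-learn's `confusion_matrix`, `roc_curve`, `roc_auc_score`, and `calibration_curve` cover the same ground.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def roc_points(y_true, scores):
    """(FPR, TPR) lists obtained by lowering the threshold past each distinct score.
    Assumes binary labels (1 = positive) with at least one positive and one negative."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), reverse=True)
    tp = fp = 0
    fpr, tpr = [0.0], [0.0]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score so the curve ignores tie order.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_auc(xs, ys):
    """Area under a piecewise-linear curve given by its vertices."""
    return sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs)))


def calibration_error(y_true, probs, n_bins=10):
    """Bin-size-weighted gap between positive rate and mean predicted probability."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    error = 0.0
    for b in bins:
        if b:
            positive_rate = sum(y for y, _ in b) / len(b)
            mean_prob = sum(p for _, p in b) / len(b)
            error += len(b) / len(y_true) * abs(positive_rate - mean_prob)
    return error


y = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(y, scores)
print(trapezoid_auc(fpr, tpr))
```

In this toy example 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9.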

Week 5 (Sept 29)

Topics: Visualization for White-box Machine Learning Models
Materials:
- White-box Model Interpretation
- Lab: Interpretable ML Methods
Recommended Readings:
- A Partition-Based Framework for Building and Validating Regression Models (Mühlbacher & Piringer, 2013) - Best Paper Award, IEEE VAST 2013
- Gamut: A Design Probe to Understand How Data Scientists Understand Machine Learning Models (Hohman et al., 2019)
- BaobabView: Interactive Construction and Analysis of Decision Trees (van den Elzen & van Wijk, 2011)
Content:
- Linear Regression and Visual Analytics Systems
- Generalized Additive Models (GAMs) and Explainable Boosting Machines
- Tree-based Models and Visualization Techniques
- Decision Rules and Global Surrogate Models
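A global surrogate is an interpretable model trained on a black-box model's *predictions* rather than on ground truth. Its fidelity is the fraction of inputs where the two agree. The sketch below uses the simplest possible surrogate, a single decision rule (a depth-one tree). `black_box` is a hypothetical stand-in for an opaque model, and the grid of inputs is toy data.

```python
def majority(labels):
    """Majority 0/1 label; ties and empty lists go to 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(X, y):
    """Exhaustively search for the (feature, threshold) rule that best reproduces labels y."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            left_label, right_label = majority(left), majority(right)
            errors = sum(yi != left_label for yi in left) + sum(yi != right_label for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    return best[1:]


def predict_stump(rule, row):
    j, t, left_label, right_label = rule
    return left_label if row[j] <= t else right_label


def black_box(row):
    """Hypothetical opaque model standing in for, e.g., a trained ensemble."""
    x0, x1 = row
    return 1 if x0 + 0.1 * x1 > 5 else 0


X = [(x0, x1) for x0 in range(10) for x1 in range(10)]
y_bb = [black_box(row) for row in X]  # surrogate targets are model outputs, not true labels
rule = fit_stump(X, y_bb)
fidelity = sum(predict_stump(rule, row) == yb for row, yb in zip(X, y_bb)) / len(X)
print(f"if x{rule[0]} <= {rule[1]}: predict {rule[2]} else {rule[3]} (fidelity {fidelity:.2f})")
```

The recovered rule depends only on `x0`, which is a readable global summary of the black box. The one disagreement (at `x0 = 5, x1 = 0`) shows where the summary is incomplete.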

Week 6 (Oct 6)

Topics: Black-box Model Interpretation, Project Discussion
Materials:
Recommended Readings:
- “Why Should I Trust You?” Explaining the Predictions of Any Classifier (Ribeiro et al., 2016, KDD)
- SHAP Book (Molnar, 2024)
- A Unified Approach to Interpreting Model Predictions (Lundberg & Lee, 2017)
Content:
- Partial Dependence Plots (PDP)
- Local Interpretable Model-agnostic Explanations (LIME)
- SHAP (SHapley Additive exPlanations)
- Project Ideas and Guidelines
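A partial dependence plot fixes one feature at each value of a grid for every row, runs the model, and averages the predictions. The per-row curves before averaging are called individual conditional expectation (ICE) curves. The sketch below computes both. `model` is a hypothetical function with an interaction term, and `X` is toy data.

```python
def ice_curves(model, X, feature, grid):
    """One curve per row: the model's prediction as `feature` sweeps over `grid`."""
    curves = []
    for row in X:
        curve = []
        for v in grid:
            modified = list(row)
            modified[feature] = v  # overwrite only the feature of interest
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model, X, feature, grid):
    """PDP: pointwise average of the ICE curves."""
    curves = ice_curves(model, X, feature, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


def model(row):
    """Hypothetical model whose x0 effect depends on x1."""
    x0, x1 = row
    return 2 * x0 + x0 * x1


X = [(0, 1), (1, 3), (2, 2)]
grid = [0, 1, 2]
print(ice_curves(model, X, 0, grid))
print(partial_dependence(model, X, 0, grid))
```

Because of the interaction, the ICE curves have different slopes. The PDP averages them into a single line, which is one reason PDPs can hide heterogeneous effects.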

Week 7 (Oct 14 - Tuesday Make-up Class)

Topics: Clustering and Dimensionality Reduction, Default Project Details
Materials:
Recommended Readings:
- Wolfram Clustering Tutorial - Required
- A Tutorial on Principal Component Analysis (Shlens, 2014)
- Scikit-learn PCA User Guide
- Mapping the walk: A scalable computer vision approach for generating sidewalk network datasets from aerial imagery (Hosseini et al., 2023)
Content:
- Introduction to Unsupervised Learning
- K-means Clustering and DBSCAN
- The Manifold Hypothesis and Intrinsic Dimensionality
- Principal Component Analysis (PCA)
- Eigenvectors, Eigenvalues, and Covariance Matrices
- Singular Value Decomposition (SVD)
- Locally Linear Embedding (LLE)
- Default Project Overview and Ideas
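The covariance-matrix view of PCA says the first principal component is the top eigenvector of the data's covariance matrix. Its eigenvalue divided by the total variance is the fraction of variance it explains. The sketch below finds that eigenvector by power iteration, a swapped-in method chosen because it fits in plain Python. scikit-learn's `PCA` uses an SVD instead. The points are toy data lying close to the line y = 2x.

```python
import math


def covariance_matrix(X):
    """Sample covariance (divides by n - 1) and the column means."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        centered = [row[j] - means[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                C[a][b] += centered[a] * centered[b] / (n - 1)
    return means, C


def top_principal_component(X, iters=200):
    """Top eigenvector of the covariance matrix via power iteration."""
    means, C = covariance_matrix(X)
    d = len(C)
    v = [1.0] * d  # start vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
    total_variance = sum(C[a][a] for a in range(d))
    return v, eigenvalue / total_variance, means


X = [(-2, -3.9), (-1, -2.2), (0, 0.1), (1, 1.9), (2, 4.1)]
component, explained, means = top_principal_component(X)
print(component, explained)
```

The component comes out close to (1, 2)/sqrt(5), and nearly all of the variance is explained. Projecting each centered point onto `component` gives its one-dimensional PCA coordinate.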

Week 8 (Oct 20)

Topics: Dimensionality Reduction (continued)
Materials:
- Dimensionality Reduction (continued)
Recommended Readings:
- How to Use t-SNE Effectively (Wattenberg, Viégas, Johnson, 2016) - Required
- Visualizing Data using t-SNE (van der Maaten & Hinton, 2008)
- UMAP: Uniform Manifold Approximation and Projection (McInnes, Healy, Melville, 2018)
- Understanding UMAP (Coenen & Pearce)
Content:
- t-SNE: Theory and Pitfalls
- UMAP: Uniform Manifold Approximation and Projection
- Topomap: Topologically Constrained Dimensionality Reduction
- Interactive Dimensionality Reduction Techniques
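Perplexity is the main t-SNE parameter discussed in the Wattenberg et al. reading. For each point, t-SNE chooses a Gaussian bandwidth sigma so that the conditional distribution over the other points has the requested perplexity, roughly an "effective number of neighbors". The sketch below shows only that calibration step for a single point, given its squared distances to the others. It is not a full t-SNE. The search bounds and tolerance are our own assumptions.

```python
import math


def conditional_probs(dists_sq, sigma):
    """p_{j|i} proportional to exp(-d_ij^2 / (2 sigma^2)) over the other points j."""
    d_min = min(dists_sq)  # shifting by the minimum avoids underflow; it cancels on normalization
    weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in dists_sq]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy


def find_sigma(dists_sq, target, lo=1e-4, hi=1e4, tol=1e-5, max_iter=200):
    """Bisect on a log scale for the sigma whose perplexity matches `target`.
    Relies on perplexity increasing monotonically with sigma."""
    for _ in range(max_iter):
        sigma = math.sqrt(lo * hi)
        perp = perplexity(conditional_probs(dists_sq, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma
        else:
            lo = sigma
    return sigma


dists_sq = [1.0, 4.0, 9.0, 16.0]  # toy squared distances from one point to four others
sigma = find_sigma(dists_sq, target=2.0)
print(sigma, conditional_probs(dists_sq, sigma))
```

The achievable perplexity lies between 1 (all mass on the nearest neighbor) and the number of other points (uniform weights). This is one reason very large perplexity values behave poorly on small datasets.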

Week 9 (Oct 27)

Topics: Deep Learning Visualization Fundamentals
Materials:
- Deep Learning Visualization
Recommended Readings:
- Understanding Deep Learning (Prince, 2023) - Chapters 2-4
- TensorFlow Playground - Interactive neural network visualization
- CNN Explainer - Interactive CNN visualization
Content:
- Deep Learning Terminology and Foundations
- Linear Models and Loss Functions
- Shallow Neural Networks and Activation Functions
- Deep Neural Networks and Composition
- Interactive Visualization Tools
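A shallow network with one input, one hidden layer of ReLU units, and one output computes a piecewise-linear function. Each hidden unit adds a "joint" where its pre-activation crosses zero. The sketch below evaluates such a network. The parameter names loosely follow the phi/theta style of the Prince reading, and the specific parameter values are arbitrary choices for illustration.

```python
def relu(z):
    return max(0.0, z)


def shallow_net(x, theta, phi):
    """y = phi_0 + sum_k phi_k * relu(theta_k0 + theta_k1 * x)."""
    hidden = [relu(t0 + t1 * x) for t0, t1 in theta]
    return phi[0] + sum(p * h for p, h in zip(phi[1:], hidden))


theta = [(-1.0, 1.0), (1.0, -1.0), (0.0, 1.0)]  # (bias, slope) per hidden unit
phi = [0.5, 1.0, 2.0, -0.5]                     # output bias, then one weight per hidden unit

for x in [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0]:
    print(x, shallow_net(x, theta, phi))
```

Here the joints sit at x = 0 and x = 1 (two units share the joint at 1), so the output has three linear regions. Plotting `shallow_net` over a dense grid makes this visible, much like TensorFlow Playground does for small networks.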

Week 10 (Nov 3)

Topics: Visualization for NLP and Large Language Models
Materials:
- NLP and LLM Visualization
- Lab: Materials to be posted
Recommended Readings:
- Speech and Language Processing (Jurafsky & Martin)
- Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013)
- Attention is All You Need (Vaswani et al., 2017) - Foundational
- BertViz: A Tool for Visualizing Multi-Head Self-Attention (Vig, 2019)
- LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in RNNs (Strobelt et al., 2017)
- Language Models are Few-Shot Learners (Brown et al., 2020) - GPT-3 Paper
- Transformer Explainer - Interactive Transformer visualization
Content:
- NLP Basics
- General Text Visualization
- Model-agnostic Explanations
- Examples of RNN Visualization
- Examples of LLM Visualization
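The attention weights that tools such as BertViz display are the softmax matrix in scaled dot-product attention from Vaswani et al.: softmax(Q K^T / sqrt(d_k)) V. The sketch below computes a single head in plain Python, with no masking, no learned projections, and no multi-head concatenation. The Q, K, and V values are toy inputs.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Return (outputs, weights); weights[i][j] is how much query i attends to key j."""
    d_k = len(K[0])
    weights = [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K])
        for q in Q
    ]
    outputs = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return outputs, weights


Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[2.0, 0.0], [0.0, 4.0]]
outputs, weights = attention(Q, K, V)
print(weights)
```

Each row of `weights` sums to 1, which is why attention maps are usually drawn as heatmaps or as line thickness between tokens.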

Week 11 (Nov 10)

Topics: Topological Data Analysis
Materials:
- Topological Data Analysis
- Lab
Recommended Readings:
- An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists (Chazal & Michel, 2021) - Required
- Computational Topology: An Introduction (Edelsbrunner & Harer, 2022)
- Topological Data Analysis for Machine Learning (Rieck, 2020 Tutorial)
Content:
- Introduction to Topology and Betti Numbers
- Persistence Diagrams and Homology
- Simplicial Complexes and Vietoris-Rips Construction
- The Mapper Algorithm
- Applications in Biology, Chemistry, and Machine Learning
- TDA Software: GUDHI, scikit-tda, Ripser, KeplerMapper
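The simplest persistence computation is 0-dimensional (connected components) for a Vietoris-Rips filtration. Every point is born at scale 0, and a component dies at the edge length that first merges it into another. Processing edges by increasing length with a union-find structure (as in Kruskal's algorithm) yields these pairs. Betti-0 at a given scale counts the pairs still alive. The sketch below covers H0 only and assumes an edge enters the filtration at its length. Libraries such as GUDHI and Ripser compute higher-dimensional homology as well.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """(birth, death) pairs for connected components of the Vietoris-Rips filtration."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies here
            parent[ri] = rj
            deaths.append(length)
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]  # one component never dies


def betti0(pairs, scale):
    """Number of components alive at `scale`."""
    return sum(1 for birth, death in pairs if birth <= scale < death)


points = [(0, 0), (0, 1), (5, 0), (5, 1)]  # toy data: two well-separated clusters
pairs = h0_persistence(points)
print(pairs)
print([betti0(pairs, s) for s in (0.5, 2.0, 6.0)])
```

In a persistence diagram, the two short-lived pairs (death 1) are small-scale noise within each cluster. The long-lived pair (death 5) reflects the two-cluster structure.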

Assignments

Weekly Assignments (50% of grade)

Assignments will be posted as the semester progresses
Programming exercises will be given throughout the first half of the semester

Research Project (45% of grade)

Team Formation - Week 3
Project Proposal (4-page writeup) - Week 5 - 10%
Project Updates (1-page writeup) - Week 8 - 10%
Final Project (8-page writeup + presentation) - Weeks 14-15 - 25%

Class Participation (5% of grade)

Quick Links

Discord: Join Course Discord
Brightspace: Course materials and submissions
Office Hours: TBD

Course Description

This course explores the intersection of visualization and machine learning, focusing on how visualization techniques can help understand, debug, and improve machine learning models. Students will learn to create visual analytics systems for model assessment, feature analysis, and result interpretation. Topics include visualization for model performance, feature importance, clustering, dimensionality reduction, deep learning architectures, and interpretable AI.

Prerequisites

Solid programming skills (Python and JavaScript)
Basic knowledge of machine learning concepts
Familiarity with web technologies (HTML, CSS) is helpful but not required