Spring 2024
Topology studies the shape of mathematical objects
Unlike Geometry, it is not concerned with sizes, angles, nor coordinates
It is concerned with the connectivity (or lack of) between different “parts” of the object
Two objects are topologically equivalent if we can transform one into another with continuous transformations (without tearing the object)
Since topology analyzes the shape while discarding possibly unnecessary information, it is frequently used for analyzing high-dimensional objects
Topological data analysis (TDA) uses techniques from topology to analyze datasets
To do this, it is necessary to construct topological representations of the dataset’s points
Slides based on Raphaël Tinarrage’s From Algebraic Topology to Data Analysis
Slides based on Raphaël Tinarrage’s From Algebraic Topology to Data Analysis
Slides based on Raphaël Tinarrage’s From Algebraic Topology to Data Analysis
Slides based on Raphaël Tinarrage’s From Algebraic Topology to Data Analysis
The \(d^{th}\) Betti number counts the number of \(d\)-dimensional holes. It can be used to distinguish between spaces.
\(\beta_0(X)\) Connected components in X
\(\beta_1(X)\) Tunnels or holes in X
\(\beta_2(X)\) Voids in X
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
What is a peak?
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
What is a peak?
First idea: Using local maximum
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
What is a peak?
Second idea: flooding
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
What is a peak?
Second idea: flooding
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
What is a peak?
Second idea: flooding
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
What is a peak?
Second idea: flooding
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
What is a peak?
Second idea: flooding
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
What is a peak?
Second idea: flooding
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
What is a peak?
Second idea: flooding
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
The island \(I\) appears at sea level \(b\) (its birth time)…
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
The island \(I\) appears at sea level \(b\) (its birth time)…
and disapears at seas level \(d\) (its death time) at local maximum \(x\).
The point \(x\) is a peak if the persistence\(:= d-b\) of the island \(I\) is larger than \(91\)m (=\(300\)ft).
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
The persistence diagram (PD) of the elevation function is the collection of the points \((b,d)\), where \((b,d)\) corresponds to the birth/death of an island.
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
The persistence diagram (PD) of the elevation function is the collection of the points \((b,d)\), where \((b,d)\) corresponds to the birth/death of an island.
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
If we have more than one persistence diagram, how do we measure the distance between them?
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
We place both in the same diagram
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
We place both in the same diagram
Get the optimal pair matching between the points (including the diagonal)
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
We place both in the same diagram
Get the optimal pair matching between the points (including the diagonal)
The bottleneck distance between them will be the largest pair distance
Slides based on Vincent Divol’s CDS Lunch Seminar presentation
Definition
We call a non-empty family of sets \(K\) with a collection of non-empty subsets \(S\) an abstract simplicial complex if:
Terminology
The elements of a simplicial complex \(K\) are called simplices. A \(k\)-simplex consists of \(k+1\) verticies.
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Example
Simplicial complexes can be decomposed into their skeletons, which only contain simplices of a certain dimension.
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Example
Simplicial complexes can be decomposed into their skeletons, which only contain simplices of a certain dimension.
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Example
Simplicial complexes can be decomposed into their skeletons, which only contain simplices of a certain dimension.
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Example
Simplicial complexes can be decomposed into their skeletons, which only contain simplices of a certain dimension.
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Non-example
This is not a simplicial complex because some higher-dimensional simplices do not intersect in a lower dimensional one!
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Example
Notice that \(K\) does not contain the 2-simplex \(\{a,b,c\}\)
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Strategy 1: Triangulation
Constructing the data’s simplicial complex
Definition
Given a set of points \(\mathcal{X} = \{x_1,\dots,x_n\}\) and a metric \(dist\) (such as the Euclidean distance), pick a threshold \(\epsilon\) and build the Vietoris-Rips complex \(\mathcal{V}_\epsilon\) defined as:
\[\mathcal{V}_\epsilon := \{\sigma \subseteq \mathcal{X} | \forall u,v \in \sigma : dist(u,v) \leq \epsilon\}\]
Equivalently, \(\mathcal{V}_\epsilon\) contains all simplices whose diameter is less than or equal to \(\epsilon\).
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Example
Draw Euclidean balls (circles) of diameter \(\epsilon\) and create a \(k\)-simplex \(\sigma\) for each subset of \(k+1\) points that intersect pairwise.
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
This construction dates back to a 1927 article by Leopold Vietoris\(^1\)
A 2010 paper by Afra Zomorodian\(^2\) describes several construction algorithms
The basic idea is to build higher-dimensional simplices inductively from lower-dimensional ones
In the worst case, the Vietoris-Rips complex will contain all \(2^n\) subsets of its underluing point clous \(\mathcal{X}\)!
\(^1\) Vietoris, Leopold. “Über den höheren Zusammenhang kompakter Räume und eine Klasse von zusammenhangstreuen Abbildungen.” Mathematische Annalen 97.1 (1927): 454-472.
\(^2\) Zomorodian, Afra. “Fast construction of the Vietoris-Rips complex.” Computers & Graphics 34.3 (2010): 263-271.
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Example
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
How to pick \(\epsilon\)?
There might not be one ‘correct’ value for \(\epsilon\).
Computationally inefficient
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Intuition: Go through all scales and track the topological features
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Intuition: Go through all scales and track the topological features
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Intuition: Go through all scales and track the topological features
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Intuition: Go through all scales and track the topological features
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Intuition: Go through all scales and track the topological features
Slides based on Bastian Rieck’s Topological Data Analysis for Machine Learning
Strategy 2: Data skeleton
The Mapper Algortihm
We start in “Math World”, where
We will not need either of these assumptions once we are in the “Data World”.
Even more importantly, data in the real world is never like this.
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
We get “easy” understanding of the localizations of quantities of interest.
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
We get “easy” understanding of the localizations of quantities of interest.
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
We get “easy” understanding of the localizations of quantities of interest.
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
We get “easy” understanding of the localizations of quantities of interest.
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
We get “easy” understanding of the localizations of quantities of interest.
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
We get “easy” understanding of the localizations of quantities of interest.
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
We get “easy” understanding of the localizations of quantities of interest.
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
We get “easy” understanding of the localizations of quantities of interest.
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
We need to adjust the “Math World” view to bring the algorithm into the “Data World”
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
Slides based on Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
For more information about the persistence diagram, see Vincent Divol’s CDS Lunch Seminar presentation “Quantifying the topology of datasets using Topological Data Analysis”
For more information about the mapper algorithm, see Anthony Bak’s Stanford Seminar - Topological Data Analysis presentation
To learn more about TDA for Machine Learning, see Bastian Rieck’s Topological Data Analysis for Machine Learning tutorial (around 4 hours total)
For a course in TDA, see Raphaël Tinarrage’s Topological Data Analysis with Persistent Homology summer course
Some books of interest:
Edelsbrunner, Herbert, and John L. Harer. Computational topology: an introduction. American Mathematical Society, 2022.
Dey, Tamal Krishna, and Yusu Wang. Computational topology for data analysis. Cambridge University Press, 2022.
Carlsson, Gunnar, and Mikael Vejdemo-Johansson. Topological data analysis with applications. Cambridge University Press, 2021.
Introductory paper:
TDA in Python:
For an open-source library and software collection for topological data analysis and visualization: Topology ToolKit
For more examples of TDA aplications: