Week 9: Topological Data Analysis

- Homework 4 (on dimensionality reduction) will be released next week. It will NOT be mandatory - complete it by April 19th for extra credit
- Homework 3 is being graded
**Project proposal due 4/4**- please submit the names of your project team here by the end of today! Only one person from each team needs to complete the form.- I will be holding
**office hours at 2pm Tuesday and Thursday**next week - reach out if you want to discuss your proposal and neither time works for you

- Data has shape, and that shape matters when we analyze it
- A way of extracting meaning from high dimensional data
- Slides adapted from Shaw Talebi’s TDA video trilogy.

- TDA is about looking at the shape of data
- All polygons can be broken down into the simplest shape - a triangle
- If we want to analyze the shape of our data, we can break it down into triangles - this is the idea behind persistent homology

- But most datasets have more than two dimensions, so 2D triangles would not capture them
- Instead we use simplexes - triangles generalized to any number of dimensions

- We can use collections of simplexes (simplicial complexes) to describe our data

- How do we compare these complex shapes?

- How do we compare these complex shapes?
- These shapes can be characterized by their holes - we can compare them by counting their holes

- The concept of “hole” can also be generalized to any dimension:

- The concept of “hole” can also be generalized to any dimension (in 1 dimension we count connected components):

- To perform persistent homology, we first draw n-dimensional circles around each point:

- We can then create 1-simplexes by connecting the data points whose circles overlap:

- Now we have two shapes: our initial 0-simplex (point cloud) and a shape made of 0- and 1-simplexes

- We can compare their homologies by counting the number of connected components:

- We repeat this step, increasing the radius of the circles
- We see 2-simplexes start to appear and the number of connected components decreases:

- We can do this for every choice of radius between 0 and the radius which yields only one connected component:

- This allows us to analyze which topological features of our data are significant based on how long they “persist” during the circle-growing process (longer persistence = more significant)

- Points close to y=x are noise, points far from y=x are significant

The original Mapper paper

Course notes for the 2021 EMAp summer course Topological Data Analysis with Persistent Homology

Rami Kraft’s Illustrations of Data Analysis Using the Mapper Algorithm and Persistent Homology

Shaw Talebi’s TDA video trilogy (this is a good starting point if you’re confused about anything)