Lab 7: Clustering and Dimensionality Reduction

The slides I showed this week can be found here.

Miscellaneous Notes

  • Homework 3 has been posted and is due next Friday (3/15) at 11:59pm
  • Homework 2 is being graded; grades will likely be posted next week

Topics Covered

  • We discussed various dimensionality reduction techniques, which project high-dimensional data into a low-dimensional space while preserving structure (such as neighborhoods and clusters) from the high-dimensional space. These included:
    • Principal Component Analysis (PCA)
    • Multidimensional Scaling (MDS)
    • Sparse Random Projection
    • Locally Linear Embedding
    • t-Distributed Stochastic Neighbor Embedding (t-SNE)
    • Uniform Manifold Approximation and Projection (UMAP)
  • We also applied each of these methods to the MNIST dataset of handwritten digits, projecting the 784-dimensional MNIST vectors into both 2 and 3 dimensions and visualizing the results. The code we used to create these visualizations can be found here.
  • We discussed common pitfalls that can lead to misreadings of t-SNE plots.
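The projection step above can be sketched with scikit-learn. This is a minimal sketch, not the lab's actual notebook: it substitutes scikit-learn's small built-in 8x8 digits dataset (64-dimensional) for the full 784-dimensional MNIST vectors so it runs without a download, and it omits UMAP, which lives in the separate umap-learn package.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE, LocallyLinearEmbedding
from sklearn.random_projection import SparseRandomProjection

# 8x8 digits stand in for 784-dim MNIST; a 500-sample subset keeps
# the slower methods (MDS, t-SNE) fast.
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]

# One instance of each technique from the lab, all targeting 2 dimensions.
reducers = {
    "PCA": PCA(n_components=2),
    "MDS": MDS(n_components=2),
    "Sparse Random Projection": SparseRandomProjection(n_components=2, random_state=0),
    "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=10),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0),
}

embeddings = {}
for name, reducer in reducers.items():
    # Each fit_transform maps (500, 64) -> (500, 2).
    embeddings[name] = reducer.fit_transform(X)
    print(f"{name}: {X.shape} -> {embeddings[name].shape}")
```

Setting n_components=3 gives the 3-D versions; the 2-D or 3-D points can then be scattered with matplotlib, colored by the digit label y, to reproduce the kind of visualizations we looked at.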

Further Reading

  • This article contains an in-depth explanation of the interactive “MNIST Cube” visualization we discussed, as well as some animations of other clustering techniques
  • This article breaks down the examples of common t-SNE plot pitfalls we looked at in more detail.
  • This article compares t-SNE and UMAP with interactive visualizations.