Spring 2024
Etienne Bernard: “… the goal of clustering is to separate a set of examples into groups called clusters”
# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause
#
import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()
_, ax = plt.subplots()
# Color points by species so legend_elements() can recover the class labels.
scatter = ax.scatter(iris.data[:, 2], iris.data[:, 1], c=iris.target)
ax.set(xlabel=iris.feature_names[2], ylabel=iris.feature_names[1])
_ = ax.legend(
    scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
)
Required reading: https://www.wolfram.com/language/introduction-machine-learning/clustering/
https://en.wikipedia.org/wiki/Cluster_analysis
https://en.wikipedia.org/wiki/K-means_clustering
https://en.wikipedia.org/wiki/DBSCAN
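As a minimal sketch of k-means clustering in scikit-learn (my own illustration on the iris data above; choosing n_clusters=3 is an assumption motivated by the three iris species):

```python
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()

# Fit k-means with 3 clusters (assumption: one per iris species).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(iris.data)

print(labels.shape)                   # one cluster label per example
print(kmeans.cluster_centers_.shape)  # 3 centers in the 4-D feature space
```

Note that k-means only sees the features, not the species labels: the clusters it finds need not match the true classes.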
Input data may have thousands or millions of dimensions!
Dimensionality Reduction represents data with fewer dimensions
What operations did we perform? What’s the intrinsic dimensionality?
Here the underlying manifold is non-linear
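Before the non-linear case, the linear case can be sketched with PCA (my own illustration, not from the slides): project the 4-D iris features onto 2 principal components.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()

# Reduce the 4-dimensional iris features to 2 principal components.
pca = PCA(n_components=2)
X2 = pca.fit_transform(iris.data)

print(X2.shape)                       # (150, 2)
print(pca.explained_variance_ratio_)  # variance kept by each component
```

For iris, the first component alone captures most of the variance, suggesting the intrinsic dimensionality is much lower than 4. A curved manifold, however, defeats such a linear projection, which is what motivates the non-linear methods below.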
array([[ 0., 0., 5., 13., 9., 1., 0., 0.],
[ 0., 0., 13., 15., 10., 15., 5., 0.],
[ 0., 3., 15., 2., 0., 11., 8., 0.],
[ 0., 4., 12., 0., 0., 8., 8., 0.],
[ 0., 5., 8., 0., 0., 9., 8., 0.],
[ 0., 4., 11., 0., 1., 12., 7., 0.],
[ 0., 2., 14., 5., 10., 12., 0., 0.],
[ 0., 0., 6., 13., 10., 0., 0., 0.]])
array([[ 0., 0., 0., 12., 13., 5., 0., 0.],
[ 0., 0., 0., 11., 16., 9., 0., 0.],
[ 0., 0., 3., 15., 16., 6., 0., 0.],
[ 0., 7., 15., 16., 16., 2., 0., 0.],
[ 0., 0., 1., 16., 16., 3., 0., 0.],
[ 0., 0., 1., 16., 16., 6., 0., 0.],
[ 0., 0., 1., 16., 16., 6., 0., 0.],
[ 0., 0., 0., 11., 16., 10., 0., 0.]])
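Arrays like the two above come from scikit-learn's digits dataset: each example is an 8×8 grid of gray levels in 0–16, i.e. a point in a 64-dimensional space. A sketch of loading them:

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each image is an 8x8 array of integer gray levels in [0, 16];
# flattened, each example lives in a 64-dimensional space.
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)
print(digits.images[0])     # the first handwritten digit as an 8x8 array
```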
Computing Random projection embedding...
Computing Truncated SVD embedding...
Computing Linear Discriminant Analysis embedding...
Computing Isomap embedding...
Computing Standard LLE embedding...
Computing Modified LLE embedding...
Computing Hessian LLE embedding...
Computing LTSA LLE embedding...
Computing MDS embedding...
Computing Random Trees embedding...
Computing Spectral embedding...
Computing t-SNE embedding...
Computing NCA embedding...
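One of the embeddings listed above, t-SNE, can be sketched as follows (my own illustration on a subset of the digits for speed; init="pca" and the default perplexity are assumptions, not choices from the slides):

```python
from sklearn import datasets
from sklearn.manifold import TSNE

digits = datasets.load_digits()
X = digits.data[:500]  # subset for speed (assumption for illustration)

# Embed the 64-D digit vectors into 2-D with t-SNE.
tsne = TSNE(n_components=2, init="pca", random_state=0)
X_embedded = tsne.fit_transform(X)

print(X_embedded.shape)  # (500, 2)
```

The 2-D coordinates can then be scatter-plotted, colored by digit class, to inspect how well the embedding separates the digits.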
Given a \(d \times d\) matrix \(A\), a pair \((\lambda, u)\) with \(u \neq 0\) that satisfies
\(A u = \lambda u\)
is called an eigenvalue \(\lambda\) and corresponding eigenvector \(u\) of \(A\).
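The definition can be checked numerically with NumPy (a small sketch; the matrix is my own example):

```python
import numpy as np

# A small symmetric matrix, chosen for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigendecomposition: lam holds the eigenvalues (ascending),
# the columns of U the corresponding eigenvectors.
lam, U = np.linalg.eigh(A)

# Verify A u = lambda u for each eigenpair.
for i in range(2):
    assert np.allclose(A @ U[:, i], lam[i] * U[:, i])

print(lam)  # [1. 3.]
```

Spectral methods such as the spectral embedding above reduce to exactly this kind of eigendecomposition, applied to a matrix built from the data.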
https://www.science.org/doi/10.1126/science.290.5500.2323
Here is an excellent talk by the creator of t-SNE: video link
https://distill.pub/2016/misread-tsne/
https://pair-code.github.io/understanding-umap/