This document summarizes a presentation on estimating cluster stability and determining the optimal number of clusters in a dataset. The presentation proposes a method that draws random samples from the dataset, clusters each sample, and compares the resulting partitions to estimate cluster stability. It quantifies the consistency between partitions with the Friedman-Rafsky two-sample test statistic, which is computed on minimal spanning trees. Experiments on synthetic and real-world datasets show that the method can accurately determine the true number of clusters by selecting the number of clusters whose partitions are the most stable.
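To illustrate the minimal-spanning-tree ingredient, here is a minimal sketch of the generic Friedman-Rafsky two-sample statistic in plain Python. It is not the presentation's implementation. The function names `minimum_spanning_tree` and `friedman_rafsky` are hypothetical, Euclidean distance is an assumption, and the summary does not say exactly how the statistic is applied to partitions. The idea is that the two point sets are pooled, a minimal spanning tree is built over the pooled set, and the edges joining points from different sets are counted. The Friedman-Rafsky "runs" statistic is that count plus one. If both sets come from the same distribution, the expected number of cross edges is 2mn/(m+n). A much smaller count suggests the sets are separated.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def minimum_spanning_tree(points: Sequence[Point]) -> List[Tuple[int, int]]:
    """Return MST edges (i, j) under Euclidean distance via Prim's algorithm, O(n^2)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known distance from the tree to each vertex
    best_parent = [-1] * n      # tree vertex that achieves best_dist
    best_dist[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best_dist[v])
        in_tree[u] = True
        if best_parent[u] != -1:
            edges.append((best_parent[u], u))
        # Update distances to the remaining vertices through the new vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


def friedman_rafsky(sample_a: Sequence[Point], sample_b: Sequence[Point]) -> Tuple[int, float]:
    """Return (cross_edges, expected_cross_edges_under_null) for two point sets.

    The Friedman-Rafsky runs statistic R equals cross_edges + 1.
    """
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    edges = minimum_spanning_tree(pooled)
    # Count MST edges whose endpoints come from different sets.
    cross = sum(1 for i, j in edges if labels[i] != labels[j])
    m, n = len(sample_a), len(sample_b)
    expected = 2.0 * m * n / (m + n)
    return cross, expected


if __name__ == "__main__":
    # Two well-separated groups: only one MST edge bridges them.
    a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    cross, expected = friedman_rafsky(a, b)
    print(cross, expected)  # prints "1 4.0"
```

One possible way to use this for stability, which is an assumption rather than something the summary states, is to compare the points that two sampled partitions assign to matching clusters. A cross-edge count close to its null expectation would indicate that the two clusterings agree on that cluster.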