Interview question: How do you define the number of clusters in a clustering algorithm?
Clusters are typically defined as collections or groups of items whose characteristics are similar to each other and different from those of items in other groups. In machine learning, examples are often grouped as a first step towards understanding a dataset. Grouping unlabeled samples in this way is called clustering, a form of unsupervised learning.
One question that naturally arises is how to determine the correct number of clusters for a dataset; this is a frequent problem in data clustering.
For a certain class of clustering algorithms, such as k-means, k-medoids, and the expectation-maximisation algorithm, there is a parameter, commonly referred to as k, that specifies the number of clusters to detect. Other algorithms, such as DBSCAN and OPTICS, do not require a k parameter, and hierarchical clustering avoids the question altogether by producing a full tree of nested clusters.
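To make the contrast concrete, here is a minimal sketch using scikit-learn on a synthetic dataset (the dataset and parameter values such as eps and min_samples are illustrative choices, not recommendations): k-means must be told k up front, while DBSCAN infers the number of clusters from density.

```python
# Contrast between the two families: k-means requires k in advance,
# while DBSCAN discovers the number of clusters from the data.
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_blobs

# Synthetic data with 4 well-separated blobs, for illustration only.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# k-means: the number of clusters (k) must be specified up front.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
print("k-means clusters:", len(set(kmeans.labels_)))

# DBSCAN: no k parameter; clusters emerge from eps / min_samples,
# and points labeled -1 are treated as noise.
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)
n_found = len(set(dbscan.labels_)) - (1 if -1 in dbscan.labels_ else 0)
print("DBSCAN clusters found:", n_found)
```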
There are a number of ways to determine an appropriate number of clusters, k, and this blog post will discuss the following (a short sketch of the elbow method appears after the list):
- Elbow method
- X-means clustering
- Information criterion approach
- Information theoretic approach
- Silhouette method
- Cross-validation
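As a preview of the first approach, here is a minimal sketch of the elbow method, assuming scikit-learn and matplotlib (the synthetic data and the range of k values tried are illustrative assumptions): fit k-means for a range of k, plot the within-cluster sum of squares against k, and look for the "elbow" where adding more clusters stops yielding a meaningful improvement.

```python
# Elbow method sketch: plot within-cluster sum of squares (inertia)
# against k and look for the bend in the curve.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 blobs, for illustration only.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Fit k-means for each candidate k and record the inertia.
ks = range(1, 10)
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
    for k in ks
]

plt.plot(ks, inertias, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Within-cluster sum of squares (inertia)")
plt.title("Elbow method")
plt.show()
```

On data like this, the curve drops steeply up to k = 4 and flattens afterwards, which is the "elbow" that suggests a suitable k.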