How to use sklearn’s semi-supervised LabelPropagation function
Label propagation is a semi-supervised machine learning algorithm that assigns labels to previously unlabelled data points. It needs only a small subset of the examples to carry labels (classes). During fitting, these known labels are propagated to the unlabelled data points, and the fitted model can then predict labels for those points and for new ones.
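As a minimal sketch of this workflow with scikit-learn's `LabelPropagation` class, the example below hides all but six labels in a toy dataset and lets the algorithm fill in the rest. scikit-learn marks unlabelled samples with `-1` and stores the label inferred for every training point in `transduction_`. The dataset, the choice of three labelled points per class, and `gamma=0.5` for the RBF kernel are illustrative assumptions, not values taken from this article.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelPropagation

# Toy data (assumed for illustration): two well-separated 2-D clusters, 100 points each
X, y_true = make_blobs(
    n_samples=200, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=0
)

# scikit-learn treats -1 as "unlabelled", so start by hiding every label...
y_train = np.full_like(y_true, -1)
# ...then reveal only the first three examples of each class
labelled = np.concatenate([np.flatnonzero(y_true == c)[:3] for c in (0, 1)])
y_train[labelled] = y_true[labelled]

# Build an RBF similarity graph over all points and propagate the six known labels
model = LabelPropagation(kernel="rbf", gamma=0.5)
model.fit(X, y_train)

# transduction_ holds the label assigned to every training point
accuracy = (model.transduction_ == y_true).mean()
print(f"labelled points: {len(labelled)}, accuracy on all 200 points: {accuracy:.2f}")

# Unseen points can be classified with predict()
print(model.predict([[-4.5, -5.2], [5.3, 4.8]]))
```

The RBF kernel connects every pair of points with a weight that decays with distance, so labels flow easily within each cluster but barely across the gap between them. The alternative `kernel="knn"` (with `n_neighbors`) builds a sparser graph, which scales better to larger datasets.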
Label propagation is also a fast algorithm for finding communities in a graph. It detects these communities using the network structure alone as its guide, and doesn't require a predefined objective function or prior information about the communities. It works by spreading labels through the network: each node repeatedly adopts the label held by most of its neighbours, and the groups of nodes that settle on the same label form the communities.
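The community-detection process can be sketched in plain Python. Note that this is the graph-only, community-finding form of label propagation rather than the feature-based graph that sklearn's `LabelPropagation` builds. The function name `label_propagation`, the toy graph, and the rule "keep the current label if it is already among the most frequent" are assumptions made for this illustration, not a reference implementation.

```python
import random
from collections import Counter


def label_propagation(adjacency, seed=0, max_iter=100):
    """Hypothetical sketch of asynchronous label propagation for community detection."""
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts in its own community
    nodes = list(adjacency)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # visit the nodes in a fresh random order on each pass
        changed = False
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in neighbours)
            top = max(counts.values())
            best = [lab for lab, n in counts.items() if n == top]
            # Assumed rule: only switch if the current label is not already a most frequent one
            if labels[node] not in best:
                labels[node] = rng.choice(best)  # ties broken at random
                changed = True
        if not changed:
            break  # every node already holds a most frequent neighbour label
    return labels


# Toy graph: two dense groups (0-3 and 4-7) joined by a single sparse "bridge" edge 3-4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
adjacency = {n: set() for n in range(8)}
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

labels = label_propagation(adjacency)
communities = {}
for node, label in labels.items():
    communities.setdefault(label, []).append(node)
print(sorted(communities.values()))
```

Every label change strictly increases the number of edges whose endpoints agree, so the loop is guaranteed to stop. When it does, each densely connected group always shares a single label; whether one label also takes over the other group across the bridge depends on the random visiting order.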
Nodes that are close together will generally end up with the same label. The intuition behind the algorithm is that a single label can quickly become dominant in a densely connected group of nodes, but will have trouble crossing a sparsely connected region. Labels become trapped inside densely connected groups, and the nodes that end up with the same label when the algorithm finishes can be considered part of the same community. The algorithm is grounded in graph theory, as illustrated below: