Mini-batch K-means Clustering
Some of the docstrings for this module have been automatically extracted from the scikit-learn library and are covered by their respective licenses.
class node_clustering.MiniBatchKMeansClustering
- Variant of the KMeans algorithm which uses mini-batches to reduce the computation time.
- Configuration:
- n_clusters - The number of clusters to form as well as the number of centroids to generate.
- max_iter - Maximum number of iterations over the complete dataset before stopping independently of any early stopping criterion heuristics. 
- max_no_improvement - Control early stopping based on the number of consecutive mini-batches that do not yield an improvement on the smoothed inertia. To disable convergence detection based on inertia, set max_no_improvement to None.
- batch_size - Size of the mini batches. 
- init_size - Number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the algorithm is initialized by running a batch KMeans on a random subset of the data. This needs to be larger than n_clusters.
- n_init - Number of random initializations that are tried. In contrast to KMeans, the algorithm is only run once, using the best of the n_init initializations as measured by inertia.
- init - Method for initialization, defaults to ‘k-means++’:
  - ‘k-means++’: selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.
  - ‘random’: chooses k observations (rows) at random from the data for the initial centroids.
  - If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
- compute_labels - Compute label assignment and inertia for the complete dataset once the minibatch optimization has converged in fit. 
- reassignment_ratio - Control the fraction of the maximum number of counts for a center to be reassigned. A higher value means that low-count centers are more easily reassigned, so the model will take longer to converge but should converge to a better clustering.
- tol - Control early stopping based on the relative center changes, as measured by a smoothed, variance-normalized measure of the mean squared center position changes. This early stopping heuristic is closer to the one used for the batch variant of the algorithm but induces a slight computational and memory overhead over the inertia heuristic. To disable convergence detection based on normalized center change, set tol to 0.0 (default).
- random_state - If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
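The configuration options above can be illustrated with a minimal sketch. It assumes that the node's settings correspond to the parameters of the same names on scikit-learn's MiniBatchKMeans estimator (the docstrings here are extracted from that library). The synthetic data and the specific values chosen are illustrative assumptions, not recommended defaults.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic data (assumption for illustration): three 2-D blobs of 200 points each.
rng = np.random.RandomState(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 2)) for c in blob_centers])

model = MiniBatchKMeans(
    n_clusters=3,             # number of clusters / centroids
    init="k-means++",         # or "random", or an ndarray of shape (n_clusters, n_features)
    n_init=3,                 # initializations tried; the best one by inertia is used
    max_iter=100,             # upper bound on passes over the complete dataset
    batch_size=100,           # size of each mini-batch
    init_size=300,            # samples drawn for initialization; must exceed n_clusters
    max_no_improvement=10,    # inertia-based early stopping; None disables it
    tol=0.0,                  # 0.0 disables early stopping on center change
    reassignment_ratio=0.01,  # higher values reassign low-count centers more readily
    compute_labels=True,      # compute labels_ and inertia_ on the full dataset after fit
    random_state=0,           # int seed for reproducible runs
)
model.fit(X)
```

Fixing random_state to an int makes repeated runs comparable, which helps when tuning batch_size or reassignment_ratio.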
- Attributes:
- cluster_centers_ - Coordinates of cluster centers.
- labels_ - Labels of each point (if compute_labels is set to True). 
- inertia_ - The value of the inertia criterion associated with the chosen partition (if compute_labels is set to True). The inertia is defined as the sum of squared distances of samples to their closest cluster center.
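To make the attribute definitions concrete, the following sketch recomputes the label assignment and the inertia by hand from cluster_centers_ and compares them with labels_ and inertia_. It again assumes scikit-learn's MiniBatchKMeans as the underlying estimator and uses random data purely for illustration.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Random 2-D data (assumption for illustration).
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 2))

model = MiniBatchKMeans(
    n_clusters=4, n_init=3, batch_size=100, compute_labels=True, random_state=0
).fit(X)

# Squared Euclidean distance from every sample to every cluster center,
# giving an array of shape (n_samples, n_clusters).
diffs = X[:, None, :] - model.cluster_centers_[None, :, :]
sq_dists = (diffs ** 2).sum(axis=2)

# Each sample is labeled with the index of its closest center.
manual_labels = sq_dists.argmin(axis=1)

# Inertia is the sum of the squared distances to those closest centers.
manual_inertia = sq_dists.min(axis=1).sum()

print(np.isclose(manual_inertia, model.inertia_))
```

Both labels_ and inertia_ are only populated when compute_labels is True, so the comparison relies on that setting.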
- Inputs:
- Outputs:
- model : model - Model