Mini-batch K-means Clustering
 
A variant of the KMeans algorithm that uses mini-batches to reduce the computation time.
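The speed gain from mini-batching can be illustrated directly with scikit-learn, from which this node's docstrings are extracted. The following is a minimal sketch (assuming scikit-learn is installed, with synthetic blob data chosen purely for illustration) comparing full-batch KMeans against MiniBatchKMeans on the same data.

```python
# Minimal sketch: mini-batch K-means typically trades a little clustering
# quality (slightly higher inertia) for a much shorter fit time than
# full-batch K-means on the same data.
import time

from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=8, random_state=0)

for Estimator in (KMeans, MiniBatchKMeans):
    model = Estimator(n_clusters=8, random_state=0)
    start = time.perf_counter()
    model.fit(X)
    elapsed = time.perf_counter() - start
    print(f"{Estimator.__name__}: fit in {elapsed:.2f}s, inertia={model.inertia_:.1f}")
```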
Configuration:

- n_clusters: The number of clusters to form, as well as the number of centroids to generate.
- max_iter: Maximum number of iterations over the complete dataset before stopping, independently of any early stopping criterion heuristics.
- max_no_improvement: Control early stopping based on the number of consecutive mini batches that do not yield an improvement on the smoothed inertia. To disable convergence detection based on inertia, set max_no_improvement to None.
- batch_size: Size of the mini batches.
- init_size: Number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the algorithm is initialized by running a batch KMeans on a random subset of the data. This needs to be larger than n_clusters.
- n_init: Number of random initializations that are tried. In contrast to KMeans, the algorithm is only run once, using the best of the n_init initializations as measured by inertia.
- init: Method for initialization, defaults to ‘k-means++’:
  - ‘k-means++’: selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See the Notes section in k_init for more details.
  - ‘random’: choose k observations (rows) at random from the data for the initial centroids.
  - If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
- compute_labels: Compute label assignment and inertia for the complete dataset once the mini-batch optimization has converged in fit.
- reassignment_ratio: Control the fraction of the maximum number of counts for a center to be reassigned. A higher value means that low count centers are more easily reassigned, which means that the model will take longer to converge, but should converge to a better clustering.
- tol: Control early stopping based on the relative center changes, as measured by a smoothed, variance-normalized estimate of the mean squared center position changes. This early stopping heuristic is closer to the one used for the batch variant of the algorithm, but induces a slight computational and memory overhead over the inertia heuristic. To disable convergence detection based on normalized center change, set tol to 0.0 (default).
- random_state: Determines random number generation for centroid initialization and random reassignment. Use an int to make the randomness deterministic. See random_state.

Attributes:

- cluster_centers_: Coordinates of the cluster centers.
- labels_: Labels of each point (if compute_labels is set to True).
- inertia_: The value of the inertia criterion associated with the chosen partition (if compute_labels is set to True). The inertia is defined as the sum of squared distances of samples to their nearest cluster center.

Inputs:

Outputs:

- model: Model
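For orientation only: the parameters above mirror those of scikit-learn's MiniBatchKMeans, which this node presumably wraps (an assumption; defaults may also differ between library versions). The sketch below shows roughly how the configuration maps onto that estimator and how the fitted attributes listed above are read; the data and parameter values are made up for illustration.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 4))  # illustrative data: 10 000 samples, 4 features

model = MiniBatchKMeans(
    n_clusters=5,             # number of clusters / centroids
    init="k-means++",         # or "random", or an (n_clusters, n_features) array
    max_iter=100,             # full passes over the data, ignoring early stopping
    batch_size=256,           # size of each mini batch
    max_no_improvement=10,    # stop after this many batches without improvement
    tol=0.0,                  # 0.0 disables the normalized-center-change criterion
    reassignment_ratio=0.01,  # how eagerly low-count centers are reassigned
    compute_labels=True,      # compute labels_ and inertia_ on the full dataset
    random_state=0,           # deterministic initialization and reassignment
)
model.fit(X)

print(model.cluster_centers_.shape)  # (5, 4): one row of coordinates per center
print(model.labels_[:10])            # cluster index assigned to each sample
print(model.inertia_)                # sum of squared distances to assigned centers
```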
 
Some of the docstrings for this module have been automatically
extracted from the scikit-learn library
and are covered by their respective licenses.
class node_clustering.MiniBatchKMeansClustering[source]