scipy.cluster.vq.kmeans2

scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True)
Classify a set of observations into k clusters using the k-means algorithm.
The algorithm attempts to minimize the Euclidean distance between observations and centroids. Several initialization methods are included.
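A minimal usage sketch follows. The synthetic two-blob data set and the variable names are assumptions made for illustration; only the kmeans2 call and its arguments come from the signature above. Because the initial centroids are drawn at random, the exact centroid values vary between runs, so only the shapes are shown.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic data (an assumption for illustration): two blobs in 2-D.
rng = np.random.RandomState(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])  # shape (M, N) = (100, 2)

# Ask for k=2 clusters, seeding the centroids from observed points.
centroid, label = kmeans2(data, 2, iter=10, minit='points')

print(centroid.shape)  # (2, 2): one row per centroid
print(label.shape)     # (100,): one cluster index per observation
```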
Parameters:  data : ndarray
An ‘M’ by ‘N’ array of ‘M’ observations in ‘N’ dimensions, or a length ‘M’ array of ‘M’ one-dimensional observations.
 k : int or ndarray
The number of clusters to form as well as the number of centroids to generate. If minit is ‘matrix’, or if an ndarray is given instead, it is interpreted as the initial centroids to use.
 iter : int, optional
Number of iterations of the k-means algorithm to run. Note that this differs in meaning from the iter parameter to the kmeans function.
 thresh : float, optional
(not used yet)
 minit : str, optional
Method for initialization. Available methods are ‘random’, ‘points’, ‘++’ and ‘matrix’:
‘random’: generate k centroids from a Gaussian with mean and variance estimated from the data.
‘points’: choose k observations (rows) at random from data for the initial centroids.
‘++’: choose k observations according to the k-means++ method (careful seeding) [1].
‘matrix’: interpret the k parameter as a k by N (or length k array for one-dimensional data) array of initial centroids.
 missing : str, optional
Method to deal with empty clusters. Available methods are ‘warn’ and ‘raise’:
‘warn’: give a warning and continue.
‘raise’: raise a ClusterError and terminate the algorithm.
 check_finite : bool, optional
Whether to check that the input matrices contain only finite numbers. Disabling may give a performance gain, but may result in problems (crashes, non-termination) if the inputs do contain infinities or NaNs. Default: True
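The interaction between explicit initial centroids and the missing option can be sketched as below. The data and the deliberately poor starting centroids are assumptions chosen so that one cluster ends up empty; ClusterError is imported from scipy.cluster.vq, where it is defined.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, ClusterError

# Four observations on the unit square (illustrative data).
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# Initial centroids passed as a k by N array. The second one is placed
# far from every observation, so no observation is assigned to it.
bad_init = np.array([[0.5, 0.5], [100.0, 100.0]])

try:
    kmeans2(data, bad_init, minit='matrix', missing='raise')
    outcome = 'no empty cluster'
except ClusterError as exc:
    # With missing='raise' an empty cluster stops the algorithm.
    outcome = 'ClusterError: {}'.format(exc)

print(outcome)
```

With missing='warn' (the default) the same call would emit a warning and continue instead of raising.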
Returns:  centroid : ndarray
A ‘k’ by ‘N’ array of centroids found at the last iteration of k-means.
 label : ndarray
label[i] is the code or index of the centroid the i’th observation is closest to.
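One way to work with the two return values is sketched below. The toy data and the initial centroids are assumptions picked so that the result is deterministic; the helper names (assigned, sizes, dist) are illustrative only.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Two tight groups of three points each (illustrative data).
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Fixed starting centroids make the run reproducible.
init = np.array([[0.0, 0.0], [10.0, 10.0]])
centroid, label = kmeans2(data, init, minit='matrix')

# label indexes rows of centroid, so fancy indexing gives the centroid
# assigned to each observation.
assigned = centroid[label]                      # shape (M, N)

# Number of observations per cluster.
sizes = np.bincount(label, minlength=len(centroid))

# Euclidean distance from each observation to its centroid.
dist = np.linalg.norm(data - assigned, axis=1)

print(label)  # [0 0 0 1 1 1]
print(sizes)  # [3 3]
```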
References
[1] D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding”, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007.