scipy.cluster.vq.kmeans2

scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True)
Classify a set of observations into k clusters using the k-means algorithm.
The algorithm attempts to minimize the sum of squared Euclidean distances between observations and their nearest centroids. Several initialization methods are included.
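A minimal usage sketch, assuming SciPy and NumPy are installed; the six data points are made up for illustration. With minit='++' the seeding is random, so the numbering of the two clusters may differ between runs, but the shapes of the returned arrays do not.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Six 2-D observations (M=6, N=2) forming two tight, well-separated groups.
# These values are illustrative only.
data = np.array([
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.1],
])

# Ask for k=2 clusters, seeding the centroids with k-means++ ('++').
centroid, label = kmeans2(data, 2, minit='++')

print(centroid.shape)  # (2, 2): k centroids, each with N coordinates
print(label.shape)     # (6,): one cluster index per observation
```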
 Parameters
 data : ndarray
A ‘M’ by ‘N’ array of ‘M’ observations in ‘N’ dimensions or a length ‘M’ array of ‘M’ one-dimensional observations.
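A sketch of the length-M (one-dimensional) input form, with illustrative values. Explicit starting centroids are supplied with minit='matrix' so that the outcome is deterministic; with 1-D data the centroids are also returned as a 1-D array.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Six 1-D observations (a length-M array with M=6); values are illustrative.
data = np.array([1.0, 1.2, 0.8, 10.0, 10.5, 9.5])

# Explicit starting centroids for the two clusters, also 1-D.
init = np.array([0.0, 5.0])

centroid, label = kmeans2(data, init, minit='matrix')

print(centroid)  # two centroids, close to 1.0 and 10.0
print(label)     # [0 0 0 1 1 1]
```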
 k : int or ndarray
The number of clusters to form as well as the number of centroids to generate. If the minit initialization string is ‘matrix’, or if an ndarray is given instead, it is interpreted as the initial centroids to use.
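When k is an array, its rows serve as the starting centroids and the number of clusters is taken from its length. In this illustrative sketch no minit is passed, relying on the rule above that an ndarray alone is read as initial centroids.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Illustrative 2-D data: two groups centred near (1, 1) and (8, 8).
data = np.array([
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.1],
])

# k given as a (2, N) array of starting centroids instead of an integer.
init = np.array([[0.0, 0.0], [10.0, 10.0]])

centroid, label = kmeans2(data, init)

print(label)  # [0 0 0 1 1 1]
```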
 iter : int, optional
Number of iterations of the k-means algorithm to run. Note that this differs in meaning from the iter parameter of the kmeans function, which sets how many times k-means is run, keeping the codebook with the lowest distortion.
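Because thresh is not used, no convergence test is assumed to end the loop early, so each of the iter iterations is one assign-then-update (Lloyd) step. This sketch checks that for iter=1 by redoing a single step by hand with scipy.cluster.vq.vq; the random data and starting centroids are illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)
data = rng.normal(size=(20, 2))  # illustrative random observations
init = data[:3].copy()           # first three rows as starting centroids

# One iteration of kmeans2 from the given starting centroids.
centroid, label = kmeans2(data, init, iter=1, minit='matrix')

# The same step by hand: assign each observation to its nearest centroid ...
manual_label, _ = vq(data, init)
# ... then move each centroid to the mean of the observations assigned to it.
manual_centroid = np.array([data[manual_label == j].mean(axis=0) for j in range(3)])

print(np.array_equal(label, manual_label))     # True
print(np.allclose(centroid, manual_centroid))  # True
```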
 thresh : float, optional
(not used yet)
 minit : str, optional
Method for initialization. Available methods are ‘random’, ‘points’, ‘++’ and ‘matrix’:
‘random’: generate k centroids from a Gaussian with mean and variance estimated from the data.
‘points’: choose k observations (rows) at random from data for the initial centroids.
‘++’: choose k observations according to the k-means++ method (careful seeding) [1].
‘matrix’: interpret the k parameter as a k by N (or length k array for one-dimensional data) array of initial centroids.
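The ‘++’ option refers to k-means++ seeding: the first centroid is a uniformly random observation, and each further centroid is drawn from the observations with probability proportional to the squared distance to the nearest centroid chosen so far. The function kmeanspp_seed below is a hypothetical NumPy sketch of that textbook rule, not SciPy's internal code; its output is fed back through minit='matrix'.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def kmeanspp_seed(data, k, rng):
    """Hypothetical helper: pick k rows of 2-D `data` by k-means++ (D^2) seeding."""
    m = data.shape[0]
    # First centroid: one observation chosen uniformly at random.
    centers = [data[rng.integers(m)]]
    for _ in range(1, k):
        # Squared distance from every observation to its nearest chosen centroid.
        diffs = data[:, None, :] - np.array(centers)[None, :, :]
        d2 = np.min(np.sum(diffs ** 2, axis=2), axis=1)
        # Draw the next centroid with probability proportional to d2;
        # already-chosen rows have d2 == 0 and cannot be picked again.
        centers.append(data[rng.choice(m, p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2))  # illustrative observations
init = kmeanspp_seed(data, 3, rng)
centroid, label = kmeans2(data, init, minit='matrix')
```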
 missing : str, optional
Method to deal with empty clusters. Available methods are ‘warn’ and ‘raise’:
‘warn’: give a warning and continue.
‘raise’: raise a ClusterError and terminate the algorithm.
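An empty cluster can be provoked on purpose by giving a starting centroid that lies far from every observation. This illustrative sketch exercises both settings: ‘raise’ aborts with ClusterError (imported here from scipy.cluster.vq), while ‘warn’ emits a warning and still returns results.

```python
import warnings

import numpy as np
from scipy.cluster.vq import ClusterError, kmeans2

data = np.array([
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.1],
])
# The third starting centroid is far from every observation, so no
# observation is assigned to it and that cluster is empty.
init = np.array([[1.0, 1.0], [8.0, 8.0], [100.0, 100.0]])

# missing='raise': the empty cluster aborts the run with ClusterError.
try:
    kmeans2(data, init, minit='matrix', missing='raise')
    raised = False
except ClusterError:
    raised = True

# missing='warn' (the default): a warning is issued and the run continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    centroid, label = kmeans2(data, init, minit='matrix', missing='warn')

print(raised)           # True
print(len(caught) > 0)  # True
```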
 check_finite : bool, optional
Whether to check that the input matrices contain only finite numbers. Disabling may give a performance gain, but may result in problems (crashes, non-termination) if the inputs do contain infinities or NaNs. Default: True
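With the default check_finite=True, a NaN in the input is rejected before any clustering happens. This small sketch, with an illustrative array, shows the rejection surfacing as a ValueError.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

data = np.array([[1.0, 1.0], [np.nan, 2.0], [8.0, 8.0]])  # contains a NaN

try:
    kmeans2(data, 2, minit='points')  # check_finite=True is the default
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```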
 Returns
 centroid : ndarray
A ‘k’ by ‘N’ array of centroids found at the last iteration of k-means.
 label : ndarray
label[i] is the code or index of the centroid the i’th observation is closest to.
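Because label[i] indexes into centroid, centroid[label] gives each observation's assigned centroid. The sketch below uses that to compute the within-cluster sum of squared distances by hand (a common quality measure; it is not a return value of kmeans2). Data and starting centroids are illustrative.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

data = np.array([
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.1],
])
init = np.array([[0.0, 0.0], [10.0, 10.0]])
centroid, label = kmeans2(data, init, minit='matrix')

# Row i of `assigned` is the centroid that observation i was labelled with.
assigned = centroid[label]
# Within-cluster sum of squared Euclidean distances.
wcss = np.sum((data - assigned) ** 2)
print(round(wcss, 4))  # 0.08
```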
References
[1] D. Arthur and S. Vassilvitskii, “k-means++: the advantages of careful seeding”, Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007.