scipy.cluster.hierarchy.leaders

scipy.cluster.hierarchy.leaders(Z, T)

Return the root nodes in a hierarchical clustering.
Returns the root nodes in a hierarchical clustering corresponding to a cut defined by a flat cluster assignment vector T. See the fcluster function for more information on the format of T.

For each flat cluster \(j\) of the \(k\) flat clusters represented in the n-sized flat cluster assignment vector T, this function finds the lowest cluster node \(i\) in the linkage tree Z such that:

- leaf descendants belong only to flat cluster \(j\) (i.e. T[p] == j for all \(p\) in \(S(i)\), where \(S(i)\) is the set of leaf ids of the leaf nodes descended from cluster node \(i\))
- there does not exist a leaf that is not a descendant of \(i\) yet also belongs to cluster \(j\) (i.e. T[q] != j for all \(q\) not in \(S(i)\)). If this condition is violated, T is not a valid cluster assignment vector, and an exception will be thrown.
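As an illustration, the two conditions can be checked by brute force. The helper below is a minimal sketch, not part of SciPy; it recovers \(S(i)\) with to_tree and tests both conditions for a given node id i and flat cluster id j:

import numpy as np
from scipy.cluster.hierarchy import to_tree

def is_leader_of(Z, T, i, j):
    # Hypothetical helper (illustration only): check by brute force that
    # linkage node i is the leader of flat cluster j.
    T = np.asarray(T)
    root, nodelist = to_tree(Z, rd=True)   # nodelist[i] is the cluster node with id i
    S_i = set(nodelist[i].pre_order())     # leaf ids of the leaves descended from node i
    inside = all(T[p] == j for p in S_i)                              # condition 1
    outside = all(T[q] != j for q in range(len(T)) if q not in S_i)   # condition 2
    return inside and outside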
Parameters:

Z : ndarray
    The hierarchical clustering encoded as a matrix. See linkage for more information.
T : ndarray
    The flat cluster assignment vector.
Returns:

L : ndarray
    The leader linkage node ids stored as a k-element 1-D array, where k is the number of flat clusters found in T. L[j] = i is the linkage cluster node id that is the leader of the flat cluster with id M[j]. If i < n, i corresponds to an original observation; otherwise it corresponds to a non-singleton cluster.
M : ndarray
    The flat cluster ids stored as a k-element 1-D array, where k is the number of flat clusters found in T. This allows the set of flat cluster ids to be any arbitrary set of k integers.

For example: if L[3] = 2 and M[3] = 8, the flat cluster with id 8's leader is linkage node 2.
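Together the two parallel arrays define a mapping from flat cluster id to leader node id; a minimal sketch of how one might build it (assuming L and M were returned by leaders):

leader_of = dict(zip(M.tolist(), L.tolist()))   # {flat cluster id: leader linkage node id}
# e.g. if L[3] == 2 and M[3] == 8, then leader_of[8] == 2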
See also

fcluster
    for the creation of flat cluster assignments.
Examples

>>> from scipy.cluster.hierarchy import ward, fcluster, leaders
>>> from scipy.spatial.distance import pdist
Given a linkage matrix Z (obtained by applying a clustering method to a dataset X) and a flat cluster assignment array T:

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
>>> Z = ward(pdist(X))
>>> Z
array([[ 0.        ,  1.        ,  1.        ,  2.        ],
       [ 3.        ,  4.        ,  1.        ,  2.        ],
       [ 6.        ,  7.        ,  1.        ,  2.        ],
       [ 9.        , 10.        ,  1.        ,  2.        ],
       [ 2.        , 12.        ,  1.29099445,  3.        ],
       [ 5.        , 13.        ,  1.29099445,  3.        ],
       [ 8.        , 14.        ,  1.29099445,  3.        ],
       [11.        , 15.        ,  1.29099445,  3.        ],
       [16.        , 17.        ,  5.77350269,  6.        ],
       [18.        , 19.        ,  5.77350269,  6.        ],
       [20.        , 21.        ,  8.16496581, 12.        ]])
>>> T = fcluster(Z, 3, criterion='distance')
>>> T
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=int32)
scipy.cluster.hierarchy.leaders returns the indexes of the nodes in the dendrogram that are the leaders of each flat cluster:

>>> L, M = leaders(Z, T)
>>> L
array([16, 17, 18, 19], dtype=int32)

(remember that indexes 0-11 point to the 12 data points in X, whereas indexes 12-22 point to the 11 rows of Z)

scipy.cluster.hierarchy.leaders also returns the indexes of the flat clusters in T:

>>> M
array([1, 2, 3, 4], dtype=int32)
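Continuing the example, the leaf observations under each leader node can be recovered with to_tree and compared with the flat clusters. This is a sketch rather than part of the documented output, and the variable names are illustrative:

>>> from scipy.cluster.hierarchy import to_tree
>>> root, nodelist = to_tree(Z, rd=True)
>>> for leader, cluster_id in zip(L, M):
...     print(cluster_id, sorted(nodelist[leader].pre_order()))
1 [0, 1, 2]
2 [3, 4, 5]
3 [6, 7, 8]
4 [9, 10, 11]

Each leader node covers exactly the observations assigned to its flat cluster in T.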