scipy.cluster.hierarchy.leaders

scipy.cluster.hierarchy.leaders(Z, T)

Return the root nodes in a hierarchical clustering.

Returns the root nodes in a hierarchical clustering corresponding to a cut defined by a flat cluster assignment vector T. See the fcluster function for more information on the format of T.

For each flat cluster \(j\) of the \(k\) flat clusters represented in the n-sized flat cluster assignment vector T, this function finds the lowest cluster node \(i\) in the linkage tree Z, such that:

  • all leaf descendants of \(i\) belong only to flat cluster \(j\) (i.e., T[p]==j for all \(p\) in \(S(i)\), where \(S(i)\) is the set of leaf ids of the leaf nodes descending from cluster node \(i\))

  • there does not exist a leaf that is not a descendant of \(i\) and that also belongs to flat cluster \(j\) (i.e., T[q]!=j for all \(q\) not in \(S(i)\)). If this condition is violated, T is not a valid cluster assignment vector, and an exception is raised.
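The two conditions above can be checked directly on a small example. The following is a minimal sketch, not part of SciPy: `leaf_ids` is a hypothetical helper written for illustration, and it assumes the usual linkage convention that node `i >= n` is formed by row `i - n` of `Z`.

```python
from scipy.cluster.hierarchy import ward, fcluster, leaders
from scipy.spatial.distance import pdist


def leaf_ids(Z, i):
    """Hypothetical helper: return S(i), the set of leaf ids under node i."""
    n = Z.shape[0] + 1           # number of original observations
    if i < n:                    # node i is itself a leaf
        return {int(i)}
    left, right = int(Z[i - n, 0]), int(Z[i - n, 1])
    return leaf_ids(Z, left) | leaf_ids(Z, right)


X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))
T = fcluster(Z, 3, criterion='distance')
L, M = leaders(Z, T)
n = Z.shape[0] + 1

checks = []
for i, j in zip(L, M):
    S = leaf_ids(Z, int(i))
    # Condition 1: every leaf under the leader belongs to flat cluster j.
    inside = all(T[p] == j for p in S)
    # Condition 2: no leaf outside the leader's subtree belongs to j.
    outside = all(T[q] != j for q in range(n) if q not in S)
    checks.append(inside and outside)

print(all(checks))
```

The recursion is fine for small trees like this one; for very deep trees an iterative traversal would avoid Python's recursion limit.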

Parameters:
Z : ndarray

The hierarchical clustering encoded as a matrix. See linkage for more information.

T : ndarray

The flat cluster assignment vector.

Returns:
L : ndarray

The leader linkage node ids stored as a k-element 1-D array, where k is the number of flat clusters found in T.

L[j]=i is the linkage cluster node id that is the leader of the flat cluster with id M[j]. If i < n, where n is the number of original observations, i corresponds to an original observation; otherwise it corresponds to a non-singleton cluster.

M : ndarray

The flat cluster ids stored as a k-element 1-D array, where k is the number of flat clusters found in T. This allows the set of flat cluster ids to be any arbitrary set of k integers.

For example, if L[3]=2 and M[3]=8, the leader of the flat cluster with id 8 is linkage node 2.
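Because L and M are parallel arrays, pairing them up gives a lookup from flat cluster id to leader node. The sketch below uses made-up values for `L` and `M` chosen only to match the sentence above; they are not the output of a real call.

```python
import numpy as np

# Illustrative values only (assumed, not produced by leaders()):
# position 3 holds L[3] = 2 and M[3] = 8.
L = np.array([5, 9, 7, 2], dtype=np.int32)
M = np.array([3, 1, 4, 8], dtype=np.int32)

# Map each flat cluster id to the linkage node id of its leader.
leader_of = dict(zip(M.tolist(), L.tolist()))

print(leader_of[8])
```

A dictionary is convenient here because the flat cluster ids in M need not be consecutive or sorted.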

See also

fcluster

for the creation of flat cluster assignments.

Examples

>>> from scipy.cluster.hierarchy import ward, fcluster, leaders
>>> from scipy.spatial.distance import pdist

Given a linkage matrix Z - obtained after applying a clustering method to a dataset X - and a flat cluster assignment array T:

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
>>> Z = ward(pdist(X))
>>> Z
array([[ 0.        ,  1.        ,  1.        ,  2.        ],
       [ 3.        ,  4.        ,  1.        ,  2.        ],
       [ 6.        ,  7.        ,  1.        ,  2.        ],
       [ 9.        , 10.        ,  1.        ,  2.        ],
       [ 2.        , 12.        ,  1.29099445,  3.        ],
       [ 5.        , 13.        ,  1.29099445,  3.        ],
       [ 8.        , 14.        ,  1.29099445,  3.        ],
       [11.        , 15.        ,  1.29099445,  3.        ],
       [16.        , 17.        ,  5.77350269,  6.        ],
       [18.        , 19.        ,  5.77350269,  6.        ],
       [20.        , 21.        ,  8.16496581, 12.        ]])
>>> T = fcluster(Z, 3, criterion='distance')
>>> T
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=int32)

scipy.cluster.hierarchy.leaders returns the indices of the nodes in the dendrogram that are the leaders of each flat cluster:

>>> L, M = leaders(Z, T)
>>> L
array([16, 17, 18, 19], dtype=int32)

(Remember that indices 0-11 refer to the 12 data points in X, whereas indices 12-22 refer to the clusters formed by the 11 rows of Z.)
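To see both kinds of index at once, a lower distance threshold can be used so that some flat clusters consist of a single observation. This is a sketch under stated assumptions: the threshold 1.1 and the names `T_fine`, `L_fine` and `M_fine` are chosen here for illustration, and the lookup assumes that node `i >= n` corresponds to row `i - n` of Z.

```python
from scipy.cluster.hierarchy import ward, fcluster, leaders
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))
n = Z.shape[0] + 1  # number of original observations

# A threshold below the second round of merges leaves some points on their own.
T_fine = fcluster(Z, 1.1, criterion='distance')
L_fine, M_fine = leaders(Z, T_fine)

for i, j in zip(L_fine.tolist(), M_fine.tolist()):
    if i < n:
        # The leader is an original observation (a singleton flat cluster).
        print(f"flat cluster {j}: leader {i} is observation X[{i}] = {X[i]}")
    else:
        # The leader is an internal node, described by row i - n of Z.
        a, b, dist, size = Z[i - n]
        print(f"flat cluster {j}: leader {i} is row {i - n} of Z, "
              f"merging nodes {int(a)} and {int(b)} ({int(size)} points)")
```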

scipy.cluster.hierarchy.leaders also returns the ids of the flat clusters in T, in the same order as L:

>>> M
array([1, 2, 3, 4], dtype=int32)