scipy.cluster.hierarchy.maxdists¶
-
scipy.cluster.hierarchy.
maxdists
(Z)[source]¶ Return the maximum distance between any non-singleton cluster.
- Parameters
- Zndarray
The hierarchical clustering encoded as a matrix. See
linkage
for more information.
- Returns
- maxdistsndarray
A
(n-1)
sized numpy array of doubles;MD[i]
represents the maximum distance between any cluster (including singletons) below and including the node with index i. More specifically,MD[i] = Z[Q(i)-n, 2].max()
whereQ(i)
is the set of all node indices below and including node i.
See also
linkage
for a description of what a linkage matrix is.
is_monotonic
for testing for monotonicity of a linkage matrix.
Examples
>>> from scipy.cluster.hierarchy import median, maxdists >>> from scipy.spatial.distance import pdist
Given a linkage matrix
Z
,scipy.cluster.hierarchy.maxdists
computes for each new cluster generated (i.e. for each row of the linkage matrix) what is the maximum distance between any two child clusters.Due to the nature of hierarchical clustering, in many cases this is going to be just the distance between the two child clusters that were merged to form the current one - that is, Z[:,2].
However, for non-monotonic cluster assignments such as
scipy.cluster.hierarchy.median
clustering this is not always the case: There may be cluster formations were the distance between the two clusters merged is smaller than the distance between their children.We can see this in an example:
>>> X = [[0, 0], [0, 1], [1, 0], ... [0, 4], [0, 3], [1, 4], ... [4, 0], [3, 0], [4, 1], ... [4, 4], [3, 4], [4, 3]]
>>> Z = median(pdist(X)) >>> Z array([[ 0. , 1. , 1. , 2. ], [ 3. , 4. , 1. , 2. ], [ 9. , 10. , 1. , 2. ], [ 6. , 7. , 1. , 2. ], [ 2. , 12. , 1.11803399, 3. ], [ 5. , 13. , 1.11803399, 3. ], [ 8. , 15. , 1.11803399, 3. ], [11. , 14. , 1.11803399, 3. ], [18. , 19. , 3. , 6. ], [16. , 17. , 3.5 , 6. ], [20. , 21. , 3.25 , 12. ]]) >>> maxdists(Z) array([1. , 1. , 1. , 1. , 1.11803399, 1.11803399, 1.11803399, 1.11803399, 3. , 3.5 , 3.5 ])
Note that while the distance between the two clusters merged when creating the last cluster is 3.25, there are two children (clusters 16 and 17) whose distance is larger (3.5). Thus,
scipy.cluster.hierarchy.maxdists
returns 3.5 in this case.