scipy.cluster.hierarchy.is_valid_linkage

scipy.cluster.hierarchy.is_valid_linkage(Z, warning=False, throw=False, name=None)

Check the validity of a linkage matrix.

A linkage matrix is valid if it is a 2-D array of type double with \(n\) rows and 4 columns. The first two columns must contain indices between 0 and \(2n-1\). For a given row i (counting from 0), the following two expressions have to hold:

\[0 \leq \mathtt{Z[i,0]} \leq i+n, \qquad 0 \leq \mathtt{Z[i,1]} \leq i+n\]

That is, a row cannot merge a cluster that has not yet been generated by an earlier row.
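The shape and index conditions can be sketched in a few lines of NumPy. This is only an illustration, not the library's implementation: the helper name `indices_are_well_formed` is hypothetical, and it assumes 0-based rows and `n_obs = Z.shape[0] + 1` original observations, so that row `i` may only reference indices below `n_obs + i`. It does not cover any other checks `is_valid_linkage` may perform.

```python
import numpy as np


def indices_are_well_formed(Z):
    """Hypothetical helper: check only the shape and index bounds of Z."""
    Z = np.asarray(Z, dtype=np.double)
    # A linkage matrix must be 2-D with exactly 4 columns and at least one row.
    if Z.ndim != 2 or Z.shape[1] != 4 or Z.shape[0] == 0:
        return False
    n_rows = Z.shape[0]
    n_obs = n_rows + 1  # assumption: one merge per row, so n_rows + 1 observations
    # Row i creates cluster n_obs + i, so it may only reference smaller indices.
    limit = n_obs + np.arange(n_rows)
    pair = Z[:, :2]
    in_range = (pair >= 0) & (pair < limit[:, None])
    return bool(in_range.all())


# Three observations (0, 1, 2) and two merges; row 0 creates cluster 3.
good = [[0, 1, 1.0, 2],
        [2, 3, 2.0, 3]]
# Row 0 refers to cluster 3 before any row has created it.
bad = [[0, 3, 1.0, 2],
       [2, 1, 2.0, 3]]

good_ok = indices_are_well_formed(good)
bad_ok = indices_are_well_formed(bad)
```

Here `good_ok` should come out true and `bad_ok` false, because the first row of `bad` uses an index that no earlier row has produced.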

Parameters
Z : array_like

Linkage matrix.

warning : bool, optional

When True, issues a Python warning if the linkage matrix passed is invalid.

throw : bool, optional

When True, raises a Python exception if the linkage matrix passed is invalid.

name : str, optional

The variable name of the linkage matrix, used to identify it in the warning or exception message when it is invalid.

Returns
b : bool

True if the linkage matrix is valid, False otherwise.

See also

linkage

for a description of what a linkage matrix is.

Examples

>>> from scipy.cluster.hierarchy import ward, is_valid_linkage
>>> from scipy.spatial.distance import pdist

All linkage matrices generated by the clustering methods in this module will be valid (i.e., they will have the appropriate dimensions and the two required expressions will hold for all the rows).

We can check this using scipy.cluster.hierarchy.is_valid_linkage:

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
>>> Z = ward(pdist(X))
>>> Z
array([[ 0.        ,  1.        ,  1.        ,  2.        ],
       [ 3.        ,  4.        ,  1.        ,  2.        ],
       [ 6.        ,  7.        ,  1.        ,  2.        ],
       [ 9.        , 10.        ,  1.        ,  2.        ],
       [ 2.        , 12.        ,  1.29099445,  3.        ],
       [ 5.        , 13.        ,  1.29099445,  3.        ],
       [ 8.        , 14.        ,  1.29099445,  3.        ],
       [11.        , 15.        ,  1.29099445,  3.        ],
       [16.        , 17.        ,  5.77350269,  6.        ],
       [18.        , 19.        ,  5.77350269,  6.        ],
       [20.        , 21.        ,  8.16496581, 12.        ]])
>>> is_valid_linkage(Z)
True

However, if we construct a linkage matrix incorrectly, or modify a valid one so that any of the required expressions no longer holds, the check fails:

>>> Z[3][1] = 20    # the cluster number 20 is not defined at this point
>>> is_valid_linkage(Z)
False
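The `throw`, `warning` and `name` parameters can be exercised on the same corrupted matrix. The following is a minimal sketch: the exact exception type and message wording are not stated on this page, so the code catches a generic `Exception` and records any warning category rather than assuming a specific one.

```python
import warnings

from scipy.cluster.hierarchy import ward, is_valid_linkage
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))
Z[3, 1] = 20  # same corruption as above: cluster 20 does not exist yet

# throw=True: an invalid matrix raises instead of returning False.
raised = False
message = ""
try:
    is_valid_linkage(Z, throw=True, name="Z")
except Exception as exc:  # assumption: exact exception type not relied upon
    raised = True
    message = str(exc)

# warning=True: the function still returns False, and also emits a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = is_valid_linkage(Z, warning=True, name="Z")
```

After running this, `raised` should be true, `result` should be false, and `caught` should hold at least one recorded warning. Using `throw=True` is convenient when an invalid matrix ought to stop the program, while `warning=True` suits code that continues and only reports the problem.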