scipy.cluster.hierarchy.cophenet
scipy.cluster.hierarchy.cophenet(Z, Y=None)
Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.
Suppose p and q are original observations in disjoint clusters s and t, respectively, and s and t are joined by a direct parent cluster u. The cophenetic distance between observations p and q is simply the distance between clusters s and t.
- Parameters
- Z : ndarray
The hierarchical clustering encoded as an array (see the linkage function).
- Y : ndarray (optional)
Calculates the cophenetic correlation coefficient c of a hierarchical clustering defined by the linkage matrix Z of a set of \(n\) observations in \(m\) dimensions. Y is the condensed distance matrix from which Z was generated.
- Returns
- c : ndarray
The cophenetic correlation coefficient (if Y is passed).
- d : ndarray
The cophenetic distance matrix in condensed form. The \(ij\)th entry is the cophenetic distance between original observations \(i\) and \(j\).
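The two call forms described above can be sketched as follows. This is a minimal illustration, not part of the original reference: the random data, the choice of average linkage, and all variable names are assumptions made for the example. It also compares c against the Pearson correlation between Y and the cophenetic distances, which the two values should agree with.

```python
import numpy as np
from scipy.cluster.hierarchy import average, cophenet
from scipy.spatial.distance import pdist

# Made-up data: 10 observations in 3 dimensions (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((10, 3))

Y = pdist(X)     # condensed pairwise distance matrix
Z = average(Y)   # linkage matrix built from Y

# Without Y: only the condensed cophenetic distance matrix is returned.
d_only = cophenet(Z)

# With Y: the correlation coefficient and the cophenetic distances.
c, d = cophenet(Z, Y)

# Cross-check: Pearson correlation between original and cophenetic distances.
c_manual = np.corrcoef(Y, d)[0, 1]
print(c, c_manual)
```

A value of c close to 1 suggests that the dendrogram preserves the original pairwise distances well.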
See also
linkage : for a description of what a linkage matrix is.
scipy.spatial.distance.squareform : transforming condensed matrices into square ones.
Examples
>>> from scipy.cluster.hierarchy import single, cophenet
>>> from scipy.spatial.distance import pdist, squareform
Given a dataset X and a linkage matrix Z, the cophenetic distance between two points of X is the distance between the two largest distinct clusters that each contain one of the points:
>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
X corresponds to this dataset:

x x   x x
x       x

x       x
x x   x x
>>> Z = single(pdist(X))
>>> Z
array([[ 0.,  1.,  1.,  2.],
       [ 2., 12.,  1.,  3.],
       [ 3.,  4.,  1.,  2.],
       [ 5., 14.,  1.,  3.],
       [ 6.,  7.,  1.,  2.],
       [ 8., 16.,  1.,  3.],
       [ 9., 10.,  1.,  2.],
       [11., 18.,  1.,  3.],
       [13., 15.,  2.,  6.],
       [17., 20.,  2.,  9.],
       [19., 21.,  2., 12.]])
>>> cophenet(Z)
array([1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
       2., 2., 2., 2., 1., 1., 2., 2., 2., 2., 2., 2., 1.,
       2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1.,
       1., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 1., 1.,
       1.])
The output of the scipy.cluster.hierarchy.cophenet function is represented in condensed form. We can use scipy.spatial.distance.squareform to see the output as a regular matrix (where each element ij denotes the cophenetic distance between the pair of points i, j in X):
>>> squareform(cophenet(Z))
array([[0., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 0., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [1., 1., 0., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 0., 1., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 0., 1., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 1., 1., 0., 2., 2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 0., 1., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 0., 1., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 1., 1., 0., 2., 2., 2.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 1., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 0., 1.],
       [2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 0.]])
In this example, the cophenetic distance between points of X that are very close (i.e. in the same corner) is 1. For all other pairs of points it is 2, because the points lie in clusters at different corners, so the distance between these clusters is larger.
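To connect this result back to the definition (the cophenetic distance between two observations is the distance at which their clusters are first joined), the following sketch rebuilds the matrix directly from the rows of Z. It is an illustrative re-derivation under that definition, not how the library computes it internally; the helper names members and D are assumptions for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import single, cophenet
from scipy.spatial.distance import pdist, squareform

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = single(pdist(X))
n = len(X)

# members[k] lists the original observations contained in cluster k.
# Clusters 0..n-1 are the singletons; row k of Z creates cluster n + k.
members = {i: [i] for i in range(n)}
D = np.zeros((n, n))

for k, (s, t, dist, _count) in enumerate(Z):
    left, right = members[int(s)], members[int(t)]
    # Every pair split across the two merged clusters is joined here
    # for the first time, so the merge distance is their cophenetic distance.
    for p in left:
        for q in right:
            D[p, q] = D[q, p] = dist
    members[n + k] = left + right

# Compare the hand-built matrix with the library result (both condensed).
print(np.allclose(squareform(D), cophenet(Z)))
```

This nested-loop version trades speed for readability; it is meant only to make the definition concrete.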