# scipy.cluster.hierarchy.cophenet¶

scipy.cluster.hierarchy.cophenet(Z, Y=None)[source]

Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.

Suppose p and q are original observations in disjoint clusters s and t, respectively and s and t are joined by a direct parent cluster u. The cophenetic distance between observations i and j is simply the distance between clusters s and t.

Parameters: Z : ndarray The hierarchical clustering encoded as an array (see linkage function). Y : ndarray (optional) Calculates the cophenetic correlation coefficient c of a hierarchical clustering defined by the linkage matrix Z of a set of $$n$$ observations in $$m$$ dimensions. Y is the condensed distance matrix from which Z was generated. c : ndarray The cophentic correlation distance (if Y is passed). d : ndarray The cophenetic distance matrix in condensed form. The $$ij$$ th entry is the cophenetic distance between original observations $$i$$ and $$j$$.

linkage
for a description of what a linkage matrix is.
scipy.spatial.distance.squareform
transforming condensed matrices into square ones.

Examples

>>> from scipy.cluster.hierarchy import single, cophenet
>>> from scipy.spatial.distance import pdist, squareform


Given a dataset X and a linkage matrix Z, the cophenetic distance between two points of X is the distance between the largest two distinct clusters that each of the points:

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]


X corresponds to this dataset

x x    x x
x        x

x        x
x x    x x

>>> Z = single(pdist(X))
>>> Z
array([[ 0.,  1.,  1.,  2.],
[ 2., 12.,  1.,  3.],
[ 3.,  4.,  1.,  2.],
[ 5., 14.,  1.,  3.],
[ 6.,  7.,  1.,  2.],
[ 8., 16.,  1.,  3.],
[ 9., 10.,  1.,  2.],
[11., 18.,  1.,  3.],
[13., 15.,  2.,  6.],
[17., 20.,  2.,  9.],
[19., 21.,  2., 12.]])
>>> cophenet(Z)
array([1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 2., 2., 2., 2., 2.,
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 2., 2.,
2., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.,
1., 1., 2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 1., 1., 1.])


The output of the scipy.cluster.hierarchy.cophenet method is represented in condensed form. We can use scipy.spatial.distance.squareform to see the output as a regular matrix (where each element ij denotes the cophenetic distance between each i, j pair of points in X):

>>> squareform(cophenet(Z))
array([[0., 1., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[1., 0., 1., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[1., 1., 0., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[2., 2., 2., 0., 1., 1., 2., 2., 2., 2., 2., 2.],
[2., 2., 2., 1., 0., 1., 2., 2., 2., 2., 2., 2.],
[2., 2., 2., 1., 1., 0., 2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2., 0., 1., 1., 2., 2., 2.],
[2., 2., 2., 2., 2., 2., 1., 0., 1., 2., 2., 2.],
[2., 2., 2., 2., 2., 2., 1., 1., 0., 2., 2., 2.],
[2., 2., 2., 2., 2., 2., 2., 2., 2., 0., 1., 1.],
[2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 0., 1.],
[2., 2., 2., 2., 2., 2., 2., 2., 2., 1., 1., 0.]])


In this example, the cophenetic distance between points on X that are very close (i.e. in the same corner) is 1. For other pairs of points is 2, because the points will be located in clusters at different corners - thus the distance between these clusters will be larger.

#### Previous topic

scipy.cluster.hierarchy.ward