# scipy.cluster.hierarchy.average

scipy.cluster.hierarchy.average(y)

Perform average/UPGMA linkage on a condensed distance matrix.

Parameters
y : ndarray

The upper triangular part of the distance matrix, in condensed form. This is the form in which pdist returns its result.

Returns
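Z : ndarray

A linkage matrix containing the hierarchical clustering. See linkage for more information on its structure.

See also

linkage

for advanced creation of hierarchical clusterings.

scipy.spatial.distance.pdist

pairwise distance metrics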

Examples

>>> from scipy.cluster.hierarchy import average, fcluster
>>> from scipy.spatial.distance import pdist

First we need a toy dataset to play with:

x x    x x
x        x

x        x
x x    x x
>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]

Then we get a condensed distance matrix from this dataset:

>>> y = pdist(X)
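
As an illustrative aside, the sketch below shows what "condensed" means for this dataset. It assumes only `pdist` and `squareform` from scipy.spatial.distance; the names `square` and `upper` are introduced here for illustration. With n = 12 points, the condensed vector holds n(n-1)/2 = 66 distances, one per pair i < j, in the same order as the row-major upper triangle of the full matrix.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
n = len(X)

y = pdist(X)              # condensed form: one entry per pair i < j
square = squareform(y)    # full symmetric n-by-n distance matrix

# The upper triangle (diagonal excluded), read row by row,
# should reproduce the condensed vector.
upper = square[np.triu_indices(n, k=1)]
print(y.shape, square.shape, np.allclose(upper, y))
```

Either form describes the same distances; average expects the condensed one.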

Finally, we can perform the clustering:

>>> Z = average(y)
>>> Z
array([[ 0.        ,  1.        ,  1.        ,  2.        ],
       [ 3.        ,  4.        ,  1.        ,  2.        ],
       [ 6.        ,  7.        ,  1.        ,  2.        ],
       [ 9.        , 10.        ,  1.        ,  2.        ],
       [ 2.        , 12.        ,  1.20710678,  3.        ],
       [ 5.        , 13.        ,  1.20710678,  3.        ],
       [ 8.        , 14.        ,  1.20710678,  3.        ],
       [11.        , 15.        ,  1.20710678,  3.        ],
       [16.        , 17.        ,  3.39675184,  6.        ],
       [18.        , 19.        ,  3.39675184,  6.        ],
       [20.        , 21.        ,  4.09206523, 12.        ]])

The linkage matrix Z represents a dendrogram - see scipy.cluster.hierarchy.linkage for a detailed explanation of its contents.
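
To connect the numbers in Z to the UPGMA rule, the following sketch recomputes each merge height as the plain mean of all pairwise distances between the two clusters being joined. It is a check written for this example, not how scipy computes the linkage internally; the `members` dictionary and the loop variables are illustrative names. For instance, the row `[2, 12, 1.20710678, 3]` joins point 2 with cluster 12 (points 0 and 1), and (1 + sqrt(2)) / 2 is approximately 1.20710678.

```python
import numpy as np
from scipy.cluster.hierarchy import average
from scipy.spatial.distance import cdist, pdist

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)
n = len(X)
Z = average(pdist(X))

# Original points are clusters 0..n-1; row i of Z creates cluster n + i.
members = {i: [i] for i in range(n)}
for i, (a, b, height, size) in enumerate(Z):
    a, b = int(a), int(b)
    # UPGMA distance: mean over all pairs (one point from each cluster)
    mean_dist = cdist(X[members[a]], X[members[b]]).mean()
    assert np.isclose(mean_dist, height)
    members[n + i] = members[a] + members[b]
    assert len(members[n + i]) == int(size)

# The last cluster created is the root and contains every point.
root = sorted(members[2 * n - 2])
print(root == list(range(n)))
```

Because the check walks every row, it does not depend on the order in which tied merges appear in Z.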

We can use scipy.cluster.hierarchy.fcluster to see which cluster each initial point belongs to at a given distance threshold:

>>> fcluster(Z, 0.9, criterion='distance')
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12], dtype=int32)
>>> fcluster(Z, 1.5, criterion='distance')
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=int32)
>>> fcluster(Z, 4, criterion='distance')
array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2], dtype=int32)
>>> fcluster(Z, 6, criterion='distance')
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
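
The thresholds above can be summarised by counting the distinct labels that fcluster returns. The sketch below does this and also shows the `criterion='maxclust'` option of fcluster, which asks for at most a given number of flat clusters instead of a distance cut-off. The names `counts` and `labels_k` are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import average, fcluster
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = average(pdist(X))

# Number of flat clusters obtained at each distance threshold
counts = {}
for t in (0.9, 1.5, 4, 6):
    labels = fcluster(Z, t, criterion='distance')
    counts[t] = len(np.unique(labels))
print(counts)

# Alternative: request at most 4 flat clusters directly
labels_k = fcluster(Z, 4, criterion='maxclust')
print(len(np.unique(labels_k)))
```

Raising the threshold can only merge clusters, so the counts fall from 12 to 4, 2 and finally 1, matching the label arrays shown above.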

scipy.cluster.hierarchy.dendrogram can also be used to plot the dendrogram.
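
As a minimal sketch, dendrogram can be called with `no_plot=True` to compute the layout without drawing anything, which is convenient for inspecting the leaf order. The variable name `info` is illustrative. Dropping `no_plot=True` draws the figure instead, assuming matplotlib is installed.

```python
import numpy as np
from scipy.cluster.hierarchy import average, dendrogram
from scipy.spatial.distance import pdist

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = average(pdist(X))

# Compute the dendrogram layout only; nothing is drawn
info = dendrogram(Z, no_plot=True)

print(info['leaves'])        # original point indices, left to right
print(len(info['dcoord']))   # one U-shaped link per merge in Z

# The tallest link sits at the height of the final merge
top = max(max(link) for link in info['dcoord'])
print(np.isclose(top, Z[-1, 2]))
```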
