Performs hierarchical/agglomerative clustering on the condensed
distance matrix y.
y must be a vector of size n(n-1)/2 (that is, n choose 2), where n
is the number of original observations paired in the distance
matrix. The behavior of this function is very similar to the MATLAB
linkage function.
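As a rough illustration of the size requirement, the sketch below builds a condensed distance vector with scipy.spatial.distance.pdist and checks its length against n(n-1)/2. The observation array X is made-up data used only for this example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Five illustrative 2-D observations (made-up data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [6.5, 5.0]])
n = X.shape[0]

# pdist returns the condensed form: one entry per unordered pair.
y = pdist(X)
expected_len = n * (n - 1) // 2

# squareform expands the condensed vector into the full n-by-n matrix.
D = squareform(y)
```

Here y has one entry for each of the n(n-1)/2 unordered pairs of observations, and D is the equivalent symmetric square matrix with a zero diagonal.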
An (n-1) by 4 matrix Z is returned. At the i-th iteration, clusters
with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i.
A cluster with an index less than n corresponds to one of the n
original observations. The distance between clusters Z[i, 0] and
Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the
number of original observations in the newly formed cluster.
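The layout of Z can be seen on a small example. This is a minimal sketch, assuming the same made-up observations X as above and the 'single' method; it prints one line per merge.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Five illustrative 2-D observations (made-up data, chosen to avoid ties).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [6.5, 5.0]])
n = X.shape[0]

# Cluster the condensed distance vector; Z has one row per merge.
Z = linkage(pdist(X), method='single')

for i, (a, b, dist, size) in enumerate(Z):
    # Row i merges clusters a and b into the new cluster n + i.
    print(f"step {i}: merge {int(a)} and {int(b)} at distance {dist:.3f} "
          f"-> cluster {n + i} with {int(size)} observations")
```

Indices below n in the first two columns refer to original observations; indices n and above refer to clusters formed in earlier rows.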
The following linkage methods are used to compute the distance
d(s, t) between two clusters s and t. The algorithm begins with a
forest of clusters that have yet to be used in the hierarchy being
formed. When two clusters s and t from this forest are combined
into a single cluster u, s and t are removed from the forest, and u
is added to the forest. When only one cluster remains in the forest,
the algorithm stops, and this cluster becomes the root.
A distance matrix is maintained at each iteration. The d[i,j]
entry corresponds to the distance between clusters i and j
in the original forest.
At each iteration, the algorithm must update the distance matrix
to reflect the distance of the newly formed cluster u with the
remaining clusters in the forest.
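The forest procedure described above can be sketched in plain Python. This is a naive illustration using the single-linkage (minimum) update, not the implementation used by this function; the name naive_single_linkage and the dictionary-based bookkeeping are assumptions made for the example.

```python
import numpy as np

def naive_single_linkage(D):
    """Naive agglomeration on a full symmetric distance matrix D (hypothetical helper)."""
    n = D.shape[0]
    # forest maps a cluster index to the original observations it contains.
    forest = {i: [i] for i in range(n)}
    # dist maps an unordered pair of cluster indices to their current distance.
    dist = {frozenset((i, j)): D[i, j]
            for i in range(n) for j in range(i + 1, n)}
    Z = []
    for step in range(n - 1):
        # Pick the closest pair (s, t) still in the forest.
        pair = min(dist, key=dist.get)
        s, t = sorted(pair)
        d_st = dist[pair]
        u = n + step
        # Remove s and t from the forest.
        members = forest.pop(s) + forest.pop(t)
        # Distance from the new cluster u to every remaining cluster v.
        new = {frozenset((u, v)): min(dist[frozenset((s, v))],
                                      dist[frozenset((t, v))])
               for v in forest}
        # Drop stale entries involving s or t, then add the new ones.
        dist = {k: d for k, d in dist.items() if s not in k and t not in k}
        dist.update(new)
        # Add u to the forest and record the merge.
        forest[u] = members
        Z.append([s, t, d_st, len(members)])
    return np.array(Z, dtype=float)

# Made-up observations and their full pairwise Euclidean distance matrix.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [6.5, 5.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Z_naive = naive_single_linkage(D)
```

Each pass removes s and t, adds u, and recomputes only the distances that involve u, which mirrors the update step described in the text.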
Suppose there are |u| original observations u[0], ..., u[|u|-1]
in cluster u and |v| original objects v[0], ..., v[|v|-1] in
cluster v. Recall that s and t are combined to form cluster u.
Let v be any remaining cluster in the forest that is not u.
The following are methods for calculating the distance between the
newly formed cluster u and each v.
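As an illustration of what such an update looks like, the sketch below gives the standard recurrences for four common rules, expressed in terms of the distances d(s, v) and d(t, v) and the cluster sizes |s| and |t|. The function name updated_distance is hypothetical; the method names match strings accepted by the method parameter, but this is not a complete list of supported methods.

```python
def updated_distance(method, d_sv, d_tv, size_s, size_t):
    """Distance from the merged cluster u = s + t to another cluster v.

    Hypothetical helper for illustration only.
    """
    if method == 'single':
        # Nearest pair of points across u and v.
        return min(d_sv, d_tv)
    if method == 'complete':
        # Farthest pair of points across u and v.
        return max(d_sv, d_tv)
    if method == 'average':
        # Mean over all point pairs (UPGMA), weighted by cluster sizes.
        return (size_s * d_sv + size_t * d_tv) / (size_s + size_t)
    if method == 'weighted':
        # Plain mean of the two old distances (WPGMA).
        return (d_sv + d_tv) / 2.0
    raise ValueError(f"unsupported method in this sketch: {method!r}")

# Example: s has 1 observation, t has 3; d(s, v) = 2.0 and d(t, v) = 3.0.
examples = {m: updated_distance(m, 2.0, 3.0, 1, 3)
            for m in ('single', 'complete', 'average', 'weighted')}
```

The 'average' rule weights each old distance by the size of its cluster, whereas 'weighted' treats s and t equally regardless of size.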
Warning: When the minimum distance pair in the forest is chosen, there
may be two or more pairs with the same minimum distance. This
implementation may choose a different minimum than the MATLAB
version.
y : ndarray
method : str, optional
The linkage algorithm to use. See the Linkage Methods section below
for full descriptions.
metric : str, optional
The distance metric to use. See the distance.pdist function for a
list of valid distance metrics.
Z : ndarray
The hierarchical clustering encoded as a linkage matrix.
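A minimal usage sketch of the method and metric parameters follows, assuming made-up observations X. When a 2-D array of observations is passed instead of a condensed distance vector, the metric is used to compute the pairwise distances first, so the two calls below are expected to agree.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Five illustrative 2-D observations (made-up data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [6.5, 5.0]])

# Pass raw observations and let linkage compute city-block distances.
Z_from_obs = linkage(X, method='average', metric='cityblock')

# Equivalent: compute the condensed distances explicitly with pdist.
Z_from_dist = linkage(pdist(X, metric='cityblock'), method='average')
```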