SciPy

scipy.spatial.cKDTree

class scipy.spatial.cKDTree

kd-tree for quick nearest-neighbor lookup

This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.

The algorithm used is described in Maneewongvatana and Mount 1999. The general idea is that the kd-tree is a binary trie, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.

During construction, the axis and splitting point are chosen by the “sliding midpoint” rule, which ensures that the cells do not all become long and thin.

The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors.

For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.

Parameters:

data : array_like, shape (n,m)

The n data points of dimension m to be indexed. This array is not copied unless this is necessary to produce a contiguous array of doubles, and so modifying this data will result in bogus results. The data are also copied if the kd-tree is built with copy_data=True.

leafsize : positive int, optional

The number of points at which the algorithm switches over to brute-force. Default: 16.

compact_nodes : bool, optional

If True, the kd-tree is built to shrink the hyperrectangles to the actual data range. This usually gives a more compact tree and faster queries at the expense of longer build time. Default: True.

copy_data : bool, optional

If True the data is always copied to protect the kd-tree against data corruption. Default: False.

balanced_tree : bool, optional

If True, the median is used to split the hyperrectangles instead of the midpoint. This usually gives a more compact tree and faster queries at the expense of longer build time. Default: True.

Attributes

data
indices
leafsize
m
maxes
mins
n
tree

Methods

count_neighbors(self, other, r[, p]) Count how many nearby pairs can be formed.
query(self, x[, k, eps, p, ...]) Query the kd-tree for nearest neighbors
query_ball_point(self, x, r[, p, eps]) Find all points within distance r of point(s) x.
query_ball_tree(self, other, r[, p, eps]) Find all pairs of points whose distance is at most r
query_pairs(self, r[, p, eps]) Find all pairs of points whose distance is at most r.
sparse_distance_matrix(self, other, max_distance) Compute a sparse distance matrix