Warning
This documentation is work-in-progress and unorganized.
Nearest-neighbor queries:
KDTree – class for efficient nearest-neighbor queries distance – module containing many different distance measures
kd-tree for quick nearest-neighbor lookup
This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.
The algorithm used is described in Maneewongvatana and Mount 1999. The general idea is that the kd-tree is a binary tree, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.
During construction, the axis and splitting point are chosen by the “sliding midpoint” rule, which ensures that the cells do not all become long and thin.
The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors.
For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.
The tree also supports all-neighbors queries, both with arrays of points and with other kd-trees. These do use a reasonably efficient algorithm, but the kd-tree is not necessarily the best data structure for this sort of calculation.
Methods
count_neighbors | |
innernode | |
leafnode | |
node | |
query | |
query_ball_point | |
query_ball_tree | |
query_pairs | |
sparse_distance_matrix |
Count how many nearby pairs can be formed.
Count the number of pairs (x1,x2) can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1,x2,p)<=r. This is the “two-point correlation” described in Gray and Moore 2000, “N-body problems in statistical learning”, and the code here is based on their algorithm.
Parameters : | other : KDTree r : float or one-dimensional array of floats
p : float, 1<=p<=infinity
|
---|---|
Returns : | result : integer or one-dimensional array of integers
|
query the kd-tree for nearest neighbors
Parameters : | x : array-like, last dimension self.m
k : integer
eps : nonnegative float
p : float, 1<=p<=infinity
distance_upper_bound : nonnegative float
|
---|---|
Returns : | d : array of floats
i : array of integers
|
Examples
>>> from scipy.spatial import KDTree
>>> x, y = np.mgrid[0:5, 2:8]
>>> tree = KDTree(zip(x.ravel(), y.ravel()))
>>> tree.data
array([[0, 2],
[0, 3],
[0, 4],
[0, 5],
[0, 6],
[0, 7],
[1, 2],
[1, 3],
[1, 4],
[1, 5],
[1, 6],
[1, 7],
[2, 2],
[2, 3],
[2, 4],
[2, 5],
[2, 6],
[2, 7],
[3, 2],
[3, 3],
[3, 4],
[3, 5],
[3, 6],
[3, 7],
[4, 2],
[4, 3],
[4, 4],
[4, 5],
[4, 6],
[4, 7]])
>>> pts = np.array([[0, 0], [2.1, 2.9]])
>>> tree.query(pts)
(array([ 2. , 0.14142136]), array([ 0, 13]))
Find all points within r of x
Parameters : | x : array_like, shape tuple + (self.m,)
r : positive float
p : float 1<=p<=infinity
eps : nonnegative float
|
---|---|
Returns : | results : list or array of lists
Note: if you have many points whose neighbors you want to find, you may save : substantial amounts of time by putting them in a KDTree and using query_ball_tree. : |
Find all pairs of points whose distance is at most r
Parameters : | other : KDTree
r : positive float
p : float 1<=p<=infinity
eps : nonnegative float
|
---|---|
Returns : | results : list of lists
|
Find all pairs of points whose distance is at most r
Parameters : | r : positive float
p : float 1<=p<=infinity
eps : nonnegative float
|
---|---|
Returns : | results : set
|
Compute a sparse distance matrix
Computes a distance matrix between two KDTrees, leaving as zero any distance greater than max_distance.
Parameters : | other : KDTree max_distance : positive float |
---|---|
Returns : | result : dok_matrix
|
Hyperrectangle class.
Represents a Cartesian product of intervals.
Methods
max_distance_point | |
max_distance_rectangle | |
min_distance_point | |
min_distance_rectangle | |
split | |
volume |
Compute the maximum distance between x and a point in the hyperrectangle.
Compute the maximum distance between points in the two hyperrectangles.
Compute the minimum distance between x and a point in the hyperrectangle.
Compute the minimum distance between points in the two hyperrectangles.
Produce two hyperrectangles by splitting along axis d.
In general, if you need to compute maximum and minimum distances to the children, it can be done more efficiently by updating the maximum and minimum distances to the parent.
Total volume.
kd-tree for quick nearest-neighbor lookup
This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.
The algorithm used is described in Maneewongvatana and Mount 1999. The general idea is that the kd-tree is a binary trie, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.
During construction, the axis and splitting point are chosen by the “sliding midpoint” rule, which ensures that the cells do not all become long and thin.
The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors.
For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.
Methods
query |
query the kd-tree for nearest neighbors
Compute the distance matrix.
Computes the matrix of all pairwise distances.
Parameters : | x : array-like, m by k y : array-like, n by k p : float 1<=p<=infinity
threshold : positive integer
|
---|---|
Returns : | result : array-like, m by n |
Pop the smallest item off the heap, maintaining the heap invariant.
Push item onto heap, maintaining the heap invariant.
Compute the L**p distance between x and y
Compute the pth power of the L**p distance between x and y
For efficiency, this function computes the L**p distance but does not extract the pth root. If p is 1 or infinity, this is equal to the actual L**p distance.