Spatial algorithms and data structures (scipy.spatial)

Warning

This documentation is work-in-progress and unorganized.

Spatial data structures and algorithms

Nearest-neighbor queries:

KDTree – class for efficient nearest-neighbor queries distance – module containing many different distance measures
class scipy.spatial.KDTree(data, leafsize=10)

kd-tree for quick nearest-neighbor lookup

This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.

The algorithm used is described in Maneewongvatana and Mount 1999. The general idea is that the kd-tree is a binary trie, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.

During construction, the axis and splitting point are chosen by the “sliding midpoint” rule, which ensures that the cells do not all become long and thin.

The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors.

For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.

The tree also supports all-neighbors queries, both with arrays of points and with other kd-trees. These do use a reasonably efficient algorithm, but the kd-tree is not necessarily the best data structure for this sort of calculation.

count_neighbors(other, r, p=2.0)

Count how many nearby pairs can be formed.

Count the number of pairs (x1,x2) can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1,x2,p)<=r. This is the “two-point correlation” described in Gray and Moore 2000, “N-body problems in statistical learning”, and the code here is based on their algorithm.

Parameters:

other : KDTree

r : float or one-dimensional array of floats

The radius to produce a count for. Multiple radii are searched with a single tree traversal.

p : float, 1<=p<=infinity

Which Minkowski p-norm to use

Returns:

result : integer or one-dimensional array of integers

The number of pairs. Note that this is internally stored in a numpy int, and so may overflow if very large (two billion).

query(x, k=1, eps=0, p=2, distance_upper_bound=inf)
query the kd-tree for nearest neighbors
query_ball_point(x, r, p=2.0, eps=0)

Find all points within r of x

Parameters:

x : array_like, shape tuple + (self.m,)

The point or points to search for neighbors of

r : positive float

The radius of points to return

p : float 1<=p<=infinity

Which Minkowski p-norm to use

eps : nonnegative float

Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r*(1+eps).

Returns:

results : list or array of lists

If x is a single point, returns a list of the indices of the neighbors of x. If x is an array of points, returns an object array of shape tuple containing lists of neighbors.

Note: if you have many points whose neighbors you want to find, you may save :

substantial amounts of time by putting them in a KDTree and using query_ball_tree. :

query_ball_tree(other, r, p=2.0, eps=0)

Find all pairs of points whose distance is at most r

Parameters:

other : KDTree

The tree containing points to search against

r : positive float

The maximum distance

p : float 1<=p<=infinity

Which Minkowski norm to use

eps : nonnegative float

Approximate search. Branches of the tree are not explored if their nearest points are further than r/(1+eps), and branches are added in bulk if their furthest points are nearer than r*(1+eps).

Returns:

results : list of lists

For each element self.data[i] of this tree, results[i] is a list of the indices of its neighbors in other.data.

sparse_distance_matrix(other, max_distance, p=2.0)

Compute a sparse distance matrix

Computes a distance matrix between two KDTrees, leaving as zero any distance greater than max_distance.

Parameters:

other : KDTree

max_distance : positive float

Returns:

result : dok_matrix

Sparse matrix representing the results in “dictionary of keys” format.

class scipy.spatial.Rectangle(maxes, mins)

Hyperrectangle class.

Represents a Cartesian product of intervals.

max_distance_point(x, p=2.0)
Compute the maximum distance between x and a point in the hyperrectangle.
max_distance_rectangle(other, p=2.0)
Compute the maximum distance between points in the two hyperrectangles.
min_distance_point(x, p=2.0)
Compute the minimum distance between x and a point in the hyperrectangle.
min_distance_rectangle(other, p=2.0)
Compute the minimum distance between points in the two hyperrectangles.
split(d, split)

Produce two hyperrectangles by splitting along axis d.

In general, if you need to compute maximum and minimum distances to the children, it can be done more efficiently by updating the maximum and minimum distances to the parent.

volume()
Total volume.
class scipy.spatial.cKDTree

kd-tree for quick nearest-neighbor lookup

This class provides an index into a set of k-dimensional points which can be used to rapidly look up the nearest neighbors of any point.

The algorithm used is described in Maneewongvatana and Mount 1999. The general idea is that the kd-tree is a binary trie, each of whose nodes represents an axis-aligned hyperrectangle. Each node specifies an axis and splits the set of points based on whether their coordinate along that axis is greater than or less than a particular value.

During construction, the axis and splitting point are chosen by the “sliding midpoint” rule, which ensures that the cells do not all become long and thin.

The tree can be queried for the r closest neighbors of any given point (optionally returning only those within some maximum distance of the point). It can also be queried, with a substantial gain in efficiency, for the r approximate closest neighbors.

For large dimensions (20 is already large) do not expect this to run significantly faster than brute force. High-dimensional nearest-neighbor queries are a substantial open problem in computer science.

query()
query the kd-tree for nearest neighbors
scipy.spatial.distance_matrix(x, y, p=2, threshold=1000000)

Compute the distance matrix.

Computes the matrix of all pairwise distances.

Parameters:

x : array-like, m by k

y : array-like, n by k

p : float 1<=p<=infinity

Which Minkowski p-norm to use.

threshold : positive integer

If m*n*k>threshold use a python loop instead of creating a very large temporary.

Returns:

result : array-like, m by n

scipy.spatial.heappop()
Pop the smallest item off the heap, maintaining the heap invariant.
scipy.spatial.heappush()
Push item onto heap, maintaining the heap invariant.
scipy.spatial.minkowski_distance(x, y, p=2)
Compute the L**p distance between x and y
scipy.spatial.minkowski_distance_p(x, y, p=2)

Compute the pth power of the L**p distance between x and y

For efficiency, this function computes the L**p distance but does not extract the pth root. If p is 1 or infinity, this is equal to the actual L**p distance.