scipy.spatial.cKDTree.count_neighbors

cKDTree.count_neighbors(self, other, r, p=2.0, weights=None, cumulative=True)
Count how many nearby pairs can be formed.
Count the number of pairs (x1, x2) that can be formed, with x1 drawn from self and x2 drawn from other, and where distance(x1, x2, p) <= r.
Data points in self and other are optionally weighted by the weights argument (see below).
This is adapted from the “two-point correlation” algorithm described by Gray and Moore [1]. See the Notes for further discussion.
 Parameters
 other : cKDTree instance
    The other tree to draw points from; it can be the same tree as self.
 r : float or one-dimensional array of floats
    The radius to produce a count for. Multiple radii are searched with a single tree traversal. If the count is non-cumulative (cumulative=False), r defines the edges of the bins and must be non-decreasing.
 p : float, optional
    Which Minkowski p-norm to use, with 1 <= p <= infinity. Default: 2.0. A finite large p may cause a ValueError if overflow can occur.
 weights : tuple, array_like, or None, optional
    If None, the pair-counting is unweighted. If given as a tuple, weights[0] is the weights of points in self, and weights[1] is the weights of points in other; either can be None to indicate the points are unweighted. If given as an array_like, weights is the weights of points in self and other. For this to make sense, self and other must be the same tree. If self and other are two different trees, a ValueError is raised. Default: None
 cumulative : bool, optional
    Whether the returned counts are cumulative. When cumulative is set to False, the algorithm is optimized to work with a large number of bins (>10) specified by r. When cumulative is set to True, the algorithm is optimized to work with a small number of r. Default: True
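As a minimal sketch of the tuple form of weights, the snippet below compares a weighted count against a brute-force sum of weight products over all pairs within the radius. The point sets, the weight arrays, and the names pts_a, pts_b, w_a and w_b are illustrative assumptions, not part of the API.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist

# Illustrative random inputs (assumed for this sketch only).
rng = np.random.default_rng(2)
pts_a = rng.random((30, 2))
pts_b = rng.random((40, 2))
w_a = rng.random(30)   # one weight per point in the first tree
w_b = rng.random(40)   # one weight per point in the second tree

tree_a = cKDTree(pts_a)
tree_b = cKDTree(pts_b)
radius = 0.25

# Weighted count: each qualifying pair contributes w_a[i] * w_b[j].
weighted = tree_a.count_neighbors(tree_b, radius, weights=(w_a, w_b))

# Brute-force reference: sum the weight products over pairs within the radius.
within = cdist(pts_a, pts_b) <= radius
brute = (np.outer(w_a, w_b) * within).sum()

# Passing None for one side leaves that side unweighted.
half_weighted = tree_a.count_neighbors(tree_b, radius, weights=(w_a, None))
brute_half = (w_a[:, None] * within).sum()
```

The weighted result is a float, and it should agree with the brute-force sum up to floating-point rounding.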
 Returns
 result : scalar or 1-D array
    The number of pairs. For unweighted counts, the result is an integer. For weighted counts, the result is a float. If cumulative is False, result[i] contains the counts with (-inf if i == 0 else r[i-1]) < R <= r[i].
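The relation between the two modes can be checked with a short sketch: with cumulative=False each entry covers one bin, so the running sum of the binned counts should reproduce the cumulative counts at the same radii. The data and the variable names here are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist

# Illustrative random inputs (assumed for this sketch only).
rng = np.random.default_rng(1)
a = rng.random((50, 2))
b = rng.random((60, 2))
tree_a = cKDTree(a)
tree_b = cKDTree(b)

r = np.array([0.1, 0.2, 0.3])  # non-decreasing radii / bin edges

# Cumulative: number of pairs with distance <= r[i].
cumulative_counts = tree_a.count_neighbors(tree_b, r)

# Non-cumulative: number of pairs with r[i-1] < distance <= r[i]
# (the first bin has no lower edge).
binned_counts = tree_a.count_neighbors(tree_b, r, cumulative=False)

# Brute-force reference for the cumulative counts.
dist = cdist(a, b)
brute = np.array([(dist <= ri).sum() for ri in r])
```

Here np.cumsum(binned_counts) is expected to equal cumulative_counts, which in turn should match the brute-force counts.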
Notes
Pair-counting is the basic operation used to calculate two-point correlation functions from a data set composed of the positions of objects.
The two-point correlation function measures the clustering of objects and is widely used in cosmology to quantify the large-scale structure of our Universe, but it may be useful for data analysis in other fields where self-similar assembly of objects also occurs.
The Landy-Szalay estimator for the two-point correlation function of D measures the clustering signal in D. [2]
For example, given the positions of two sets of objects, objects D (data), which contain the clustering signal, and objects R (random), which contain no signal,
\[\xi(r) = \frac{\langle D, D \rangle - 2 f \langle D, R \rangle + f^2 \langle R, R \rangle}{f^2 \langle R, R \rangle},\]
where the brackets represent counting pairs between two data sets in a finite bin around r (distance), corresponding to setting cumulative=False, and f = float(len(D)) / float(len(R)) is the ratio between the number of objects in the data and random sets.
The algorithm implemented here is loosely based on the dual-tree algorithm described in [1]. We switch between two different pair-cumulation schemes depending on the setting of cumulative. The computing time of the method used for cumulative == False does not scale with the total number of bins. The algorithm for cumulative == True scales linearly with the number of bins, though it is slightly faster when only 1 or 2 bins are used. [5]
As an extension to naive pair-counting, weighted pair-counting counts the product of weights instead of the number of pairs. Weighted pair-counting is used to estimate marked correlation functions ([3], section 2.2), or to properly calculate the average of data per distance bin (e.g. [4], section 2.1 on redshift).
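The Landy-Szalay estimator described above can be sketched with three non-cumulative pair counts. This is only an illustration under stated assumptions: both catalogues are uniform random points standing in for D and R, the bin edges are arbitrary, and binned_pairs is a hypothetical helper, not a SciPy function.

```python
import numpy as np
from scipy.spatial import cKDTree

# Stand-in catalogues (assumed): uniform points, so little clustering is expected.
rng = np.random.default_rng(0)
data = rng.random((200, 2))     # plays the role of D
random = rng.random((400, 2))   # plays the role of R

tree_d = cKDTree(data)
tree_r = cKDTree(random)

# Bin edges must be non-decreasing when cumulative=False.
edges = np.linspace(0.05, 0.5, 10)


def binned_pairs(tree_a, tree_b, edges):
    """Hypothetical helper: pair counts in the bins (edges[i-1], edges[i]]."""
    counts = tree_a.count_neighbors(tree_b, edges, cumulative=False)
    # Drop the first entry, which collects every pair closer than edges[0].
    return counts[1:].astype(float)


dd = binned_pairs(tree_d, tree_d, edges)
dr = binned_pairs(tree_d, tree_r, edges)
rr = binned_pairs(tree_r, tree_r, edges)

# Ratio of the number of data objects to random objects.
f = float(len(data)) / float(len(random))

# Landy-Szalay estimate, one value per bin.
xi = (dd - 2 * f * dr + f**2 * rr) / (f**2 * rr)
```

Dropping the first entry keeps only bins with a finite lower edge, which also discards zero-distance pairs from the counts of a tree against itself. Bins in which rr is zero would need separate handling.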
[1] Gray and Moore, “N-body problems in statistical learning”, Mining the sky, 2000, arXiv:astro-ph/0012333
[2] Landy and Szalay, “Bias and variance of angular correlation functions”, The Astrophysical Journal, 1993, DOI:10.1086/172900
[3] Sheth, Connolly and Skibba, “Marked correlations in galaxy formation models”, 2005, arXiv:astro-ph/0511773
[4] Hawkins, et al., “The 2dF Galaxy Redshift Survey: correlation functions, peculiar velocities and the matter density of the Universe”, Monthly Notices of the Royal Astronomical Society, 2002, DOI:10.1046/j.1365-2966.2003.07063.x
[5] https://github.com/scipy/scipy/pull/5647#issuecomment-168474926
Examples
You can count the number of neighbors between two kd-trees within a given distance:
>>> import numpy as np
>>> from scipy.spatial import cKDTree
>>> np.random.seed(21701)
>>> points1 = np.random.random((5, 2))
>>> points2 = np.random.random((5, 2))
>>> kd_tree1 = cKDTree(points1)
>>> kd_tree2 = cKDTree(points2)
>>> kd_tree1.count_neighbors(kd_tree2, 0.2)
9
This number is the same as the total number of pairs found by query_ball_tree:
>>> indexes = kd_tree1.query_ball_tree(kd_tree2, r=0.2)
>>> sum([len(i) for i in indexes])
9
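When other is the same tree as self, the count appears to cover ordered pairs: each pair (i, j) is counted together with (j, i), and every point is paired with itself at distance zero. The sketch below checks that reading against query_pairs, which returns each unordered pair once. The data and variable names are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

# Illustrative random inputs (assumed for this sketch only).
rng = np.random.default_rng(3)
points = rng.random((40, 2))
tree = cKDTree(points)
radius = 0.15

# Count of a tree against itself.
ordered_pairs = tree.count_neighbors(tree, radius)

# query_pairs returns the set of index pairs (i, j) with i < j.
unique_pairs = len(tree.query_pairs(radius))

# Expected relation: both orderings of each pair plus one self-pair per point.
expected = 2 * unique_pairs + len(points)
print(ordered_pairs, expected)
```

If only distinct unordered pairs are wanted, subtracting the number of points and halving the result recovers that count under this reading.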