directional_stats#
- scipy.stats.directional_stats(samples, *, axis=0, normalize=True)[source]#
Computes sample statistics for directional data.
Computes the directional mean (also called the mean direction vector) and mean resultant length of a sample of vectors.
The directional mean is a measure of “preferred direction” of vector data. It is analogous to the sample mean, but it is for use when the length of the data is irrelevant (e.g. unit vectors).
The mean resultant length is a value between 0 and 1 used to quantify the dispersion of directional data: the smaller the mean resultant length, the greater the dispersion. Several definitions of directional variance involving the mean resultant length are given in [1] and [2].
- Parameters:
- samplesarray_like
Input array. Must be at least two-dimensional, and the last axis of the input must correspond with the dimensionality of the vector space. When the input is exactly two dimensional, this means that each row of the data is a vector observation.
- axisint, default: 0
Axis along which the directional mean is computed.
- normalize: boolean, default: True
If True, normalize the input to ensure that each observation is a unit vector. It the observations are already unit vectors, consider setting this to False to avoid unnecessary computation.
- Returns:
- resDirectionalStats
An object containing attributes:
- mean_directionndarray
Directional mean.
- mean_resultant_lengthndarray
The mean resultant length [1].
See also
Notes
This uses a definition of directional mean from [1]. Assuming the observations are unit vectors, the calculation is as follows.
mean = samples.mean(axis=0) mean_resultant_length = np.linalg.norm(mean) mean_direction = mean / mean_resultant_length
This definition is appropriate for directional data (i.e. vector data for which the magnitude of each observation is irrelevant) but not for axial data (i.e. vector data for which the magnitude and sign of each observation is irrelevant).
Several definitions of directional variance involving the mean resultant length
R
have been proposed, including1 - R
[1],1 - R**2
[2], and2 * (1 - R)
[2]. Rather than choosing one, this function returnsR
as attribute mean_resultant_length so the user can compute their preferred measure of dispersion.References
Examples
>>> import numpy as np >>> from scipy.stats import directional_stats >>> data = np.array([[3, 4], # first observation, 2D vector space ... [6, -8]]) # second observation >>> dirstats = directional_stats(data) >>> dirstats.mean_direction array([1., 0.])
In contrast, the regular sample mean of the vectors would be influenced by the magnitude of each observation. Furthermore, the result would not be a unit vector.
>>> data.mean(axis=0) array([4.5, -2.])
An exemplary use case for
directional_stats
is to find a meaningful center for a set of observations on a sphere, e.g. geographical locations.>>> data = np.array([[0.8660254, 0.5, 0.], ... [0.8660254, -0.5, 0.]]) >>> dirstats = directional_stats(data) >>> dirstats.mean_direction array([1., 0., 0.])
The regular sample mean on the other hand yields a result which does not lie on the surface of the sphere.
>>> data.mean(axis=0) array([0.8660254, 0., 0.])
The function also returns the mean resultant length, which can be used to calculate a directional variance. For example, using the definition
Var(z) = 1 - R
from [2] whereR
is the mean resultant length, we can calculate the directional variance of the vectors in the above example as:>>> 1 - dirstats.mean_resultant_length 0.13397459716167093