scipy.stats.siegelslopes#

scipy.stats.siegelslopes(y, x=None, method='hierarchical')[source]#

Computes the Siegel estimator for a set of points (x, y).

siegelslopes implements a method for robust linear regression using repeated medians (see [1]) to fit a line to the points (x, y). The method is robust to outliers with an asymptotic breakdown point of 50%.

Parameters:
yarray_like

Dependent variable.

xarray_like or None, optional

Independent variable. If None, use arange(len(y)) instead.

method{‘hierarchical’, ‘separate’}

If ‘hierarchical’, estimate the intercept using the estimated slope slope (default option). If ‘separate’, estimate the intercept independent of the estimated slope. See Notes for details.

Returns:
resultSiegelslopesResult instance

The return value is an object with the following attributes:

slopefloat

Estimate of the slope of the regression line.

interceptfloat

Estimate of the intercept of the regression line.

See also

theilslopes

a similar technique without repeated medians

Notes

With n = len(y), compute m_j as the median of the slopes from the point (x[j], y[j]) to all other n-1 points. slope is then the median of all slopes m_j. Two ways are given to estimate the intercept in [1] which can be chosen via the parameter method. The hierarchical approach uses the estimated slope slope and computes intercept as the median of y - slope*x. The other approach estimates the intercept separately as follows: for each point (x[j], y[j]), compute the intercepts of all the n-1 lines through the remaining points and take the median i_j. intercept is the median of the i_j.

The implementation computes n times the median of a vector of size n which can be slow for large vectors. There are more efficient algorithms (see [2]) which are not implemented here.

For compatibility with older versions of SciPy, the return value acts like a namedtuple of length 2, with fields slope and intercept, so one can continue to write:

slope, intercept = siegelslopes(y, x)

References

[1] (1,2)

A. Siegel, “Robust Regression Using Repeated Medians”, Biometrika, Vol. 69, pp. 242-244, 1982.

[2]

A. Stein and M. Werman, “Finding the repeated median regression line”, Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 409-413, 1992.

Examples

>>> import numpy as np
>>> from scipy import stats
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-5, 5, num=150)
>>> y = x + np.random.normal(size=x.size)
>>> y[11:15] += 10  # add outliers
>>> y[-5:] -= 7

Compute the slope and intercept. For comparison, also compute the least-squares fit with linregress:

>>> res = stats.siegelslopes(y, x)
>>> lsq_res = stats.linregress(x, y)

Plot the results. The Siegel regression line is shown in red. The green line shows the least-squares fit for comparison.

>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.plot(x, y, 'b.')
>>> ax.plot(x, res[1] + res[0] * x, 'r-')
>>> ax.plot(x, lsq_res[1] + lsq_res[0] * x, 'g-')
>>> plt.show()
../../_images/scipy-stats-siegelslopes-1.png