This is documentation for an old release of SciPy (version 0.16.0). The latest stable release is version 1.15.1.

scipy.stats.mstats.pointbiserialr

scipy.stats.mstats.pointbiserialr(x, y)

Calculates a point biserial correlation coefficient and the associated p-value.

The point biserial correlation measures the relationship between a binary variable, x, and a continuous variable, y. Like other correlation coefficients, it varies between -1 and +1, with 0 implying no correlation. Correlations of -1 or +1 imply a deterministic relationship.

This function uses a shortcut formula but produces the same result as pearsonr.
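As an illustration of what such a shortcut can look like, the sketch below uses the textbook point biserial formula, r = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where M1 and M0 are the means of y in the two groups and s_n is the population standard deviation of y. The helper name `point_biserial_sketch` is hypothetical, and this is not necessarily the exact expression SciPy evaluates internally; it only shows that the group-means form agrees with pearsonr on the example data.

```python
import numpy as np
from scipy import stats


def point_biserial_sketch(x, y):
    """Hypothetical helper: textbook point biserial formula (no masking, no p-value)."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=float)
    n = y.size
    n1 = int(np.count_nonzero(x))   # number of observations with x == True
    n0 = n - n1                     # number of observations with x == False
    m1 = y[x].mean()                # mean of y where x is True
    m0 = y[~x].mean()               # mean of y where x is False
    s_n = y.std()                   # population standard deviation (ddof=0)
    return (m1 - m0) / s_n * np.sqrt(n1 * n0 / n**2)


a = np.array([0, 0, 0, 1, 1, 1, 1])
b = np.arange(7)

r_sketch = point_biserial_sketch(a, b)
r_pearson, _ = stats.pearsonr(a, b)
print(r_sketch, r_pearson)
```

The two values should agree up to floating-point rounding. The sketch assumes both groups are non-empty and that y is not constant.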

Parameters:


x : array_like of bools

Input array of binary (boolean) values.

y : array_like

Input array of continuous values, the same length as x.

Returns:

correlation : float

Point biserial correlation coefficient (R value).

pvalue : float

Two-tailed p-value.

Notes

Missing values are considered pair-wise: if a value is missing in x, the corresponding value in y is masked.
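A minimal sketch of the pair-wise handling, assuming a recent SciPy where `scipy.stats.mstats.pointbiserialr` accepts NumPy masked arrays. One entry of x is masked, and the result is compared with `pearsonr` applied to the pairs that remain. The variable names are illustrative only.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

# The second observation of x is marked as missing.
x = np.ma.array([0, 0, 0, 1, 1, 1, 1], mask=[0, 1, 0, 0, 0, 0, 0])
y = np.ma.array(np.arange(7.0))

# The masked pair (x[1], y[1]) is excluded from the calculation.
r_masked, p_masked = mstats.pointbiserialr(x, y)

# Reference: drop the same pair by hand and use the unmasked function.
keep = ~np.ma.getmaskarray(x)
r_ref, p_ref = stats.pearsonr(x.data[keep], y.data[keep])

print(float(r_masked), float(r_ref))
```

Both correlation values should match up to rounding, since only the six complete pairs contribute.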

References

http://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient

Examples

>>> import numpy as np
>>> from scipy import stats
>>> a = np.array([0, 0, 0, 1, 1, 1, 1])
>>> b = np.arange(7)
>>> stats.pointbiserialr(a, b)
(0.8660254037844386, 0.011724811003954652)
>>> stats.pearsonr(a, b)
(0.86602540378443871, 0.011724811003954626)
>>> np.corrcoef(a, b)
array([[ 1.       ,  0.8660254],
       [ 0.8660254,  1.       ]])
