scipy.stats.pearsonr

scipy.stats.pearsonr(x, y)
Calculate a Pearson correlation coefficient and the p-value for testing non-correlation.
The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each dataset be normally distributed, though not necessarily zero-mean. Like other correlation coefficients, this one varies between -1 and +1, with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
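As a small illustration of the two extremes described above, the following sketch feeds pearsonr data that lie exactly on an increasing line and exactly on a decreasing line. The slopes and intercepts are arbitrary values chosen for the example, and the results should equal +1 and -1 only up to floating-point rounding.

```python
import numpy as np
from scipy import stats

x = np.arange(10, dtype=float)

# y is an exact increasing linear function of x (arbitrary slope/intercept).
r_pos, _ = stats.pearsonr(x, 2.0 * x + 1.0)

# y is an exact decreasing linear function of x (arbitrary slope/intercept).
r_neg, _ = stats.pearsonr(x, -3.0 * x + 4.0)

print(r_pos, r_neg)
```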
The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. The p-values are not entirely reliable but are probably reasonable for datasets larger than about 500 samples.
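One way to build intuition for this interpretation is a permutation check: shuffling y breaks any association with x, so the shuffled datasets play the role of an "uncorrelated system". The sketch below is not how pearsonr computes its p-value; it is a separate Monte Carlo estimate, and the seed, the number of shuffles (n_perm), and the small tolerance for floating-point ties are all assumptions made for the example. With so few samples the two numbers need not agree closely.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, used only for reproducibility

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([5, 6, 7, 8, 7], dtype=float)

# Observed correlation and the p-value reported by pearsonr.
r_obs, p_reported = stats.pearsonr(x, y)

# Correlations obtained after randomly shuffling y (assumed shuffle count).
n_perm = 5000
r_perm = np.empty(n_perm)
for i in range(n_perm):
    r_perm[i] = np.corrcoef(x, rng.permutation(y))[0, 1]

# Fraction of shuffles whose |r| is at least as extreme as the observed |r|.
# The 1e-12 slack keeps exact ties from being lost to rounding.
p_perm = np.mean(np.abs(r_perm) >= abs(r_obs) - 1e-12)

print(p_reported, p_perm)
```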
Parameters:  x : (N,) array_like
Input
 y : (N,) array_like
Input
Returns:  r : float
Pearson’s correlation coefficient
 p-value : float
2-tailed p-value
Notes
The correlation coefficient is calculated as follows:
\[r = \frac{\sum (x - m_x) (y - m_y)}{\sqrt{\sum (x - m_x)^2 \sum (y - m_y)^2}}\] where \(m_x\) is the mean of the vector \(x\) and \(m_y\) is the mean of the vector \(y\).
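The formula can be checked directly with NumPy. The sketch below takes the denominator to be the square root of the product of the two sums of squared deviations, and compares the result with the value returned by pearsonr. The helper name pearson_r_manual is hypothetical and is not part of SciPy.

```python
import numpy as np
from scipy import stats


def pearson_r_manual(x, y):
    """Hypothetical helper: evaluate the correlation formula term by term."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # x - m_x
    dy = y - y.mean()  # y - m_y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


x = [1, 2, 3, 4, 5]
y = [5, 6, 7, 8, 7]

r_manual = pearson_r_manual(x, y)
r_scipy, _ = stats.pearsonr(x, y)

print(r_manual, r_scipy)
```

For this data the sum of products of deviations is 6.0 and the two sums of squares are 10.0 and 5.2, so r = 6 / sqrt(52), which is about 0.832.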
References
http://www.statsoft.com/textbook/glosp.html#Pearson%20Correlation
Examples
>>> import numpy as np
>>> from scipy import stats
>>> a = np.array([0, 0, 0, 1, 1, 1, 1])
>>> b = np.arange(7)
>>> stats.pearsonr(a, b)
(0.8660254037844386, 0.011724811003954654)

>>> stats.pearsonr([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
(0.83205029433784372, 0.080509573298498519)