scipy.stats.ttest_ind¶
- scipy.stats.ttest_ind(a, b, axis=0, equal_var=True)[source]¶
Calculates the T-test for the means of TWO INDEPENDENT samples of scores.
This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances.
Parameters: a, b : array_like
The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
axis : int, optional
Axis can equal None (ravel array first), or an integer (the axis over which to operate on a and b).
equal_var : bool, optional
Returns: t : float or array
The calculated t-statistic.
prob : float or array
The two-tailed p-value.
Notes
We can use this test, if we observe two independent samples from the same or different population, e.g. exam scores of boys and girls or of two ethnic groups. The test measures whether the average (expected) value differs significantly across samples. If we observe a large p-value, for example larger than 0.05 or 0.1, then we cannot reject the null hypothesis of identical average scores. If the p-value is smaller than the threshold, e.g. 1%, 5% or 10%, then we reject the null hypothesis of equal averages.
References
[R315] (1, 2) http://en.wikipedia.org/wiki/T-test#Independent_two-sample_t-test [R316] (1, 2) http://en.wikipedia.org/wiki/Welch%27s_t_test Examples
>>> from scipy import stats >>> np.random.seed(12345678)
Test with sample with identical means:
>>> rvs1 = stats.norm.rvs(loc=5,scale=10,size=500) >>> rvs2 = stats.norm.rvs(loc=5,scale=10,size=500) >>> stats.ttest_ind(rvs1,rvs2) (0.26833823296239279, 0.78849443369564776) >>> stats.ttest_ind(rvs1,rvs2, equal_var = False) (0.26833823296239279, 0.78849452749500748)
ttest_ind underestimates p for unequal variances:
>>> rvs3 = stats.norm.rvs(loc=5, scale=20, size=500) >>> stats.ttest_ind(rvs1, rvs3) (-0.46580283298287162, 0.64145827413436174) >>> stats.ttest_ind(rvs1, rvs3, equal_var = False) (-0.46580283298287162, 0.64149646246569292)
When n1 != n2, the equal variance t-statistic is no longer equal to the unequal variance t-statistic:
>>> rvs4 = stats.norm.rvs(loc=5, scale=20, size=100) >>> stats.ttest_ind(rvs1, rvs4) (-0.99882539442782481, 0.3182832709103896) >>> stats.ttest_ind(rvs1, rvs4, equal_var = False) (-0.69712570584654099, 0.48716927725402048)
T-test with different means, variance, and n:
>>> rvs5 = stats.norm.rvs(loc=8, scale=20, size=100) >>> stats.ttest_ind(rvs1, rvs5) (-1.4679669854490653, 0.14263895620529152) >>> stats.ttest_ind(rvs1, rvs5, equal_var = False) (-0.94365973617132992, 0.34744170334794122)