scipy.stats.mstats.chisquare
- scipy.stats.mstats.chisquare(f_obs, f_exp=None, ddof=0, axis=0)
- Calculate a one-way chi-square test.
- The chi-square test tests the null hypothesis that the categorical data has the given frequencies.
- Parameters
- f_obs : array_like
- Observed frequencies in each category. 
- f_exp : array_like, optional
- Expected frequencies in each category. By default the categories are assumed to be equally likely. 
- ddof : int, optional
- “Delta degrees of freedom”: adjustment to the degrees of freedom for the p-value. The p-value is computed using a chi-squared distribution with k - 1 - ddof degrees of freedom, where k is the number of observed frequencies. The default value of ddof is 0.
- axis : int or None, optional
- The axis of the broadcast result of f_obs and f_exp along which to apply the test. If axis is None, all values in f_obs are treated as a single data set. Default is 0. 
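The parameter descriptions above can be checked by hand. The following is a minimal sketch, not SciPy's internal code: it assumes the Pearson statistic sum((f_obs - f_exp)**2 / f_exp), the default uniform expectation (the mean of the observed counts), and a p-value taken as the upper tail of a chi-squared distribution with k - 1 - ddof degrees of freedom. The variable names are illustrative only.

```python
import numpy as np
from scipy.stats import chi2, chisquare

f_obs = np.array([16, 18, 16, 14, 12, 12])
# Default expectation: every category equally likely, i.e. the mean observed count.
f_exp = np.full(f_obs.shape, f_obs.mean())
ddof = 1

# Pearson statistic: sum over categories of (observed - expected)^2 / expected.
stat_manual = np.sum((f_obs - f_exp) ** 2 / f_exp)

# p-value: upper-tail probability with k - 1 - ddof degrees of freedom,
# where k is the number of observed frequencies.
k = f_obs.size
p_manual = chi2.sf(stat_manual, k - 1 - ddof)

# Compare against the library result (unpacks as statistic, p-value).
stat, p = chisquare(f_obs, ddof=ddof)
print(stat_manual, p_manual)
print(stat, p)
```

Both pairs of numbers should agree up to floating-point rounding.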
 
- Returns
- chisq : float or ndarray
- The chi-squared test statistic. The value is a float if axis is None or f_obs and f_exp are 1-D. 
- p : float or ndarray
- The p-value of the test. The value is a float if ddof and the return value chisq are scalars. 
 
- See also
- scipy.stats.power_divergence
- scipy.stats.fisher_exact
- Fisher exact test on a 2x2 contingency table. 
- scipy.stats.barnard_exact
- An unconditional exact test. An alternative to chi-squared test for small sample sizes. 
- Notes
- This test is invalid when the observed or expected frequencies in each category are too small. A typical rule is that all of the observed and expected frequencies should be at least 5. According to [3], the total number of samples is recommended to be greater than 13, otherwise exact tests (such as Barnard’s Exact test) should be used because they do not overreject.
- Also, the sum of the observed and expected frequencies must be the same for the test to be valid; chisquare raises an error if the sums do not agree within a relative tolerance of 1e-8.
- The default degrees of freedom, k-1, are for the case when no parameters of the distribution are estimated. If p parameters are estimated by efficient maximum likelihood then the correct degrees of freedom are k-1-p. If the parameters are estimated in a different way, then the dof can be between k-1-p and k-1. However, it is also possible that the asymptotic distribution is not chi-square, in which case this test is not appropriate.
- References
- [1] Lowry, Richard. “Concepts and Applications of Inferential Statistics”. Chapter 8. https://web.archive.org/web/20171022032306/http://vassarstats.net:80/textbook/ch8pt1.html
- [2] “Chi-squared test”, https://en.wikipedia.org/wiki/Chi-squared_test
- [3] Pearson, Karl. “On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling”, Philosophical Magazine. Series 5. 50 (1900), pp. 157-175.
- Examples
- When just f_obs is given, it is assumed that the expected frequencies are uniform and given by the mean of the observed frequencies.
>>> import numpy as np
>>> from scipy.stats import chisquare
>>> chisquare([16, 18, 16, 14, 12, 12])
(2.0, 0.84914503608460956)
- With f_exp the expected frequencies can be given.
>>> chisquare([16, 18, 16, 14, 12, 12], f_exp=[16, 16, 16, 16, 16, 8])
(3.5, 0.62338762774958223)
- When f_obs is 2-D, by default the test is applied to each column.
>>> obs = np.array([[16, 18, 16, 14, 12, 12], [32, 24, 16, 28, 20, 24]]).T
>>> obs.shape
(6, 2)
>>> chisquare(obs)
(array([ 2. , 6.66666667]), array([ 0.84914504, 0.24663415]))
- By setting axis=None, the test is applied to all data in the array, which is equivalent to applying the test to the flattened array.
>>> chisquare(obs, axis=None)
(23.31034482758621, 0.015975692534127565)
>>> chisquare(obs.ravel())
(23.31034482758621, 0.015975692534127565)
- ddof is the change to make to the default degrees of freedom.
>>> chisquare([16, 18, 16, 14, 12, 12], ddof=1)
(2.0, 0.73575888234288467)
- The calculation of the p-values is done by broadcasting the chi-squared statistic with ddof.
>>> chisquare([16, 18, 16, 14, 12, 12], ddof=[0,1,2])
(2.0, array([ 0.84914504, 0.73575888, 0.5724067 ]))
- f_obs and f_exp are also broadcast. In the following, f_obs has shape (6,) and f_exp has shape (2, 6), so the result of broadcasting f_obs and f_exp has shape (2, 6). To compute the desired chi-squared statistics, we use axis=1:
>>> chisquare([16, 18, 16, 14, 12, 12],
...           f_exp=[[16, 16, 16, 16, 16, 8], [8, 20, 20, 16, 12, 12]],
...           axis=1)
(array([ 3.5 , 9.25]), array([ 0.62338763, 0.09949846]))
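The Notes require that the observed and expected frequencies have the same sum, and suggest that every frequency be at least 5. One way to meet the first requirement when the null hypothesis is stated as proportions is to scale those proportions by the observed total. The sketch below assumes hypothetical proportions chosen purely for illustration; they are not part of the original examples.

```python
import numpy as np
from scipy.stats import chisquare

f_obs = np.array([16, 18, 16, 14, 12, 12])

# Hypothetical null-hypothesis proportions (an assumption for illustration);
# they sum to 1.
proportions = np.array([0.2, 0.2, 0.2, 0.2, 0.1, 0.1])

# Scale by the observed total so that sum(f_exp) matches sum(f_obs).
f_exp = proportions * f_obs.sum()

# Rule-of-thumb check from the Notes: flag any category with a
# frequency below 5.
small = (f_obs < 5) | (f_exp < 5)
if small.any():
    print("Some frequencies are below 5; the test may be unreliable.")

stat, p = chisquare(f_obs, f_exp=f_exp)
print(stat, p)
```

Passing raw proportions as f_exp without rescaling would leave the two sums unequal, which the Notes describe as invalid for this test.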