Discrete Statistical Distributions#

Discrete random variables take on only a countable number of values. The commonly used distributions are included in SciPy and described in this document. Each discrete distribution can take one extra integer parameter: \(L.\) The relationship between the general distribution \(p\) and the standard distribution \(p_{0}\) is

\[p\left(x\right) = p_{0}\left(x-L\right)\]

which allows for shifting of the input. When a distribution generator is initialized, the discrete distribution can either specify the beginning and ending (integer) values \(a\) and \(b\) which must be such that

\[p_{0}\left(x\right) = 0\quad x < a \textrm{ or } x > b\]

in which case it is assumed that the probability mass function is specified on the integers \(a+mk\leq b\), where \(k\) is a non-negative integer (\(0,1,2,\ldots\)) and \(m\) is a positive integer multiplier. Alternatively, the two lists \(x_{k}\) and \(p\left(x_{k}\right)\) can be provided directly, in which case a dictionary is set up internally to evaluate probabilities and generate random variates.
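As a minimal sketch of both ideas, assuming that the shift \(L\) corresponds to the `loc` keyword of `scipy.stats` and that explicit lists are passed through the `values` argument of `scipy.stats.rv_discrete`, the following checks the shift relation and builds a small custom distribution. The support points and probabilities are made up for illustration.

```python
import numpy as np
from scipy import stats

# The shift parameter L is passed as `loc` in scipy.stats.
L = 3
x = np.arange(0, 12)
shifted = stats.poisson.pmf(x, mu=2.0, loc=L)   # p(x)
standard = stats.poisson.pmf(x - L, mu=2.0)     # p0(x - L)
shift_matches = np.allclose(shifted, standard)

# Hypothetical support points x_k and probabilities p(x_k), for illustration only.
xk = np.array([1, 2, 5])
pk = np.array([0.2, 0.5, 0.3])
custom = stats.rv_discrete(name="custom", values=(xk, pk))

custom_pmf = custom.pmf(xk)                      # recovers pk
samples = custom.rvs(size=1000, random_state=0)  # variates drawn from {1, 2, 5}
```

Points that are not in `xk` simply get probability zero from `custom.pmf`.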

Probability Mass Function (PMF)#

The probability mass function of a random variable \(X\) is defined as the probability that the random variable takes on a particular value.

\[p\left(x_{k}\right)=P\left[X=x_{k}\right]\]

This is also sometimes called the probability density function, although technically

\[f\left(x\right)=\sum_{k}p\left(x_{k}\right)\delta\left(x-x_{k}\right)\]

is the probability density function for a discrete distribution [1].

[1] Note that we will be using \(p\) to represent both the probability mass function and a parameter (a probability). The usage should be obvious from context.

Cumulative Distribution Function (CDF)#

The cumulative distribution function is

\[F\left(x\right)=P\left[X\leq x\right]=\sum_{x_{k}\leq x}p\left(x_{k}\right)\]

and is also useful to be able to compute. Note that

\[F\left(x_{k}\right)-F\left(x_{k-1}\right)=p\left(x_{k}\right)\]
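The link between the CDF and the PMF can be checked numerically. This is a small sketch using `scipy.stats.binom` with arbitrarily chosen parameters: the CDF is the running sum of the PMF, and successive CDF differences give the PMF back.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3                    # illustrative parameters
k = np.arange(0, n + 1)           # full support of the binomial
pmf = stats.binom.pmf(k, n, p)
cdf = stats.binom.cdf(k, n, p)

# F(x) is the sum of p(x_k) over all x_k <= x.
cdf_from_pmf = np.cumsum(pmf)

# F(x_k) - F(x_{k-1}) = p(x_k); below the support the CDF is 0.
pmf_from_cdf = np.diff(cdf, prepend=0.0)
```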


Survival Function#

The survival function is

\[S\left(k\right)=1-F\left(k\right)=P\left[X>k\right],\]

the probability that the random variable is strictly larger than \(k\).
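A short sketch of the survival function, using `scipy.stats.poisson` with an arbitrary rate: `sf` should agree with one minus the CDF, and with a direct (truncated) sum of the PMF over values strictly greater than \(k\).

```python
import numpy as np
from scipy import stats

mu = 4.0                          # illustrative rate
k = np.arange(0, 15)

sf = stats.poisson.sf(k, mu)
one_minus_cdf = 1.0 - stats.poisson.cdf(k, mu)

# P[X > k] as a sum of the PMF over k+1, k+2, ...; the sum is truncated
# at 200, where the Poisson(4) tail is negligible.
tail = np.array([stats.poisson.pmf(np.arange(ki + 1, 200), mu).sum() for ki in k])
```

For probabilities deep in the tail, calling `sf` directly is usually more accurate than computing `1 - cdf`.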

Percent Point Function (Inverse CDF)#

The percent point function is the inverse of the cumulative distribution function and is

\[G\left(q\right)=F^{-1}\left(q\right).\]

For discrete distributions, this must be modified for cases where there is no \(x_{k}\) such that \(F\left(x_{k}\right)=q.\) In these cases we choose \(G\left(q\right)\) to be the smallest value \(x_{k}=G\left(q\right)\) for which \(F\left(x_{k}\right)\geq q\). If \(q=0\) then we define \(G\left(0\right)=a-1\). This definition allows random variates to be generated in the same way as for continuous random variables, by applying the inverse CDF to draws from a uniform distribution.
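The "smallest \(x_{k}\) with \(F\left(x_{k}\right)\geq q\)" rule can be sketched with a sorted search over the cumulative probabilities. The helper `ppf_by_search` and the three-point distribution below are hypothetical, written only to illustrate the definition; the result is compared with `ppf` of an `rv_discrete` instance, and uniform draws are pushed through `ppf` to generate variates.

```python
import numpy as np
from scipy import stats

# Hypothetical three-point distribution, for illustration only.
xk = np.array([1, 2, 5])
pk = np.array([0.2, 0.5, 0.3])
custom = stats.rv_discrete(name="custom", values=(xk, pk))

def ppf_by_search(q, xk, pk):
    """Smallest x_k with F(x_k) >= q, assuming 0 < q < 1 and sorted xk."""
    cdf_vals = np.cumsum(pk)
    idx = np.searchsorted(cdf_vals, q, side="left")
    return xk[idx]

q = np.array([0.1, 0.5, 0.9])
by_search = ppf_by_search(q, xk, pk)
by_scipy = custom.ppf(q)

# Inverse-CDF sampling: apply the percent point function to uniform draws.
rng = np.random.default_rng(0)
u = rng.uniform(size=2000)
variates = custom.ppf(u)

# G(0) = a - 1; for the binomial the lower support bound is a = 0.
lower_edge = stats.binom.ppf(0, 10, 0.3)
```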

Inverse survival function#

The inverse survival function is the inverse of the survival function

\[Z\left(\alpha\right)=S^{-1}\left(\alpha\right)=G\left(1-\alpha\right)\]

and is thus the smallest non-negative integer \(k\) for which \(F\left(k\right)\geq1-\alpha\) or the smallest non-negative integer \(k\) for which \(S\left(k\right)\leq\alpha.\)
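As a quick check of this characterization, here is a sketch with `scipy.stats.poisson` and arbitrarily chosen values of \(\alpha\). It compares `isf` with `ppf(1 - alpha)` and verifies that the returned integer is the first one at which the survival function drops to \(\alpha\) or below.

```python
import numpy as np
from scipy import stats

mu = 3.0                                   # illustrative rate
alpha = np.array([0.05, 0.25, 0.5, 0.9])   # illustrative tail probabilities

isf = stats.poisson.isf(alpha, mu)
ppf = stats.poisson.ppf(1 - alpha, mu)

# The returned k satisfies S(k) <= alpha, while S(k - 1) > alpha.
sf_at_k = stats.poisson.sf(isf, mu)
sf_before_k = stats.poisson.sf(isf - 1, mu)
```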

Hazard functions#

If desired, the hazard function and the cumulative hazard function could be defined as

\[h\left(x_{k}\right)=\frac{F\left(x_{k}\right)-F\left(x_{k-1}\right)}{1-F\left(x_{k}\right)}\]

and

\[H\left(x\right)=\sum_{x_{k}\leq x}h\left(x_{k}\right)=\sum_{x_{k}\leq x}\frac{F\left(x_{k}\right)-F\left(x_{k-1}\right)}{1-F\left(x_{k}\right)}.\]
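A sketch of these two quantities for the geometric distribution, computed directly from the CDF with the definition given here (denominator \(1-F\left(x_{k}\right)\)). With that definition the geometric hazard is the constant \(p/(1-p)\), which makes the result easy to verify; the value of \(p\) is arbitrary.

```python
import numpy as np
from scipy import stats

p = 0.25                          # illustrative success probability
k = np.arange(1, 11)              # geom is supported on 1, 2, 3, ...

cdf = stats.geom.cdf(k, p)

# h(x_k) = (F(x_k) - F(x_{k-1})) / (1 - F(x_k)); F is 0 below the support.
hazard = np.diff(cdf, prepend=0.0) / (1.0 - cdf)

# H(x) accumulates the hazard over all support points up to x.
cum_hazard = np.cumsum(hazard)

# Same hazard written with the PMF and survival function.
hazard_alt = stats.geom.pmf(k, p) / stats.geom.sf(k, p)
```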


Moments#

Non-central moments are defined using the PMF

\[\mu_{m}^{\prime}=E\left[X^{m}\right]=\sum_{k}x_{k}^{m}p\left(x_{k}\right).\]

Central moments are computed similarly, with \(\mu=\mu_{1}^{\prime}\):

\begin{eqnarray*} \mu_{m}=E\left[\left(X-\mu\right)^{m}\right] & = & \sum_{k}\left(x_{k}-\mu\right)^{m}p\left(x_{k}\right)\\ & = & \sum_{k=0}^{m}\left(-1\right)^{m-k}\left(\begin{array}{c} m\\ k\end{array}\right)\mu^{m-k}\mu_{k}^{\prime}\end{eqnarray*}
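The binomial-expansion identity for central moments can be verified numerically. In this sketch the helper names (`noncentral`, `central_direct`, `central_binomial`) are hypothetical and the distribution parameters are arbitrary; the non-central moments are summed over the full support of a binomial distribution.

```python
import numpy as np
from math import comb
from scipy import stats

n, p = 10, 0.3                         # illustrative parameters
xk = np.arange(0, n + 1).astype(float)
pk = stats.binom.pmf(xk, n, p)

def noncentral(m):
    """m-th non-central moment: sum of x_k**m * p(x_k)."""
    return float(np.sum(xk ** m * pk))

mu = noncentral(1)

def central_direct(m):
    """m-th central moment from its definition."""
    return float(np.sum((xk - mu) ** m * pk))

def central_binomial(m):
    """m-th central moment from the binomial expansion in non-central moments."""
    return sum((-1) ** (m - k) * comb(m, k) * mu ** (m - k) * noncentral(k)
               for k in range(m + 1))

direct = [central_direct(m) for m in range(5)]
expanded = [central_binomial(m) for m in range(5)]

# scipy's `moment` method returns non-central moments.
second_noncentral = stats.binom.moment(2, n, p)
```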

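The mean is the first moment

\[\mu=\mu_{1}^{\prime}=E\left[X\right]=\sum_{k}x_{k}p\left(x_{k}\right),\]

the variance is the second central moment

\[\mu_{2}=E\left[\left(X-\mu\right)^{2}\right]=\sum_{k}x_{k}^{2}p\left(x_{k}\right)-\mu^{2}.\]

Skewness is defined as

\[\gamma_{1}=\frac{\mu_{3}}{\mu_{2}^{3/2}}\]

while (Fisher) kurtosis is

\[\gamma_{2}=\frac{\mu_{4}}{\mu_{2}^{2}}-3,\]

so that a normal distribution has a kurtosis of zero.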
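These four summary statistics can be computed from the PMF and compared with the `stats` method of a `scipy.stats` distribution. A minimal sketch with an arbitrarily chosen binomial distribution:

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3                         # illustrative parameters
xk = np.arange(0, n + 1).astype(float)
pk = stats.binom.pmf(xk, n, p)

mean = np.sum(xk * pk)
var = np.sum(xk ** 2 * pk) - mean ** 2          # second central moment
mu3 = np.sum((xk - mean) ** 3 * pk)
mu4 = np.sum((xk - mean) ** 4 * pk)

skew = mu3 / var ** 1.5
kurt = mu4 / var ** 2 - 3.0                     # Fisher (excess) kurtosis

# Mean, variance, skewness and kurtosis as reported by scipy.
m, v, s, k = stats.binom.stats(n, p, moments="mvsk")
```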

Moment generating function#

The moment generating function is defined as

\[M_{X}\left(t\right)=E\left[e^{Xt}\right]=\sum_{k}e^{x_{k}t}p\left(x_{k}\right).\]

Moments are found as the derivatives of the moment generating function evaluated at \(0.\)
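As a rough numerical illustration, the moment generating function can be evaluated as a finite sum and differentiated at zero with central finite differences. The helper `mgf` and the step size `h` are choices made for this sketch, not part of SciPy; the binomial closed form \(\left(1-p+pe^{t}\right)^{n}\) is used as a cross-check.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3                         # illustrative parameters
xk = np.arange(0, n + 1).astype(float)
pk = stats.binom.pmf(xk, n, p)

def mgf(t):
    """M_X(t) as a finite sum over the support."""
    return float(np.sum(np.exp(xk * t) * pk))

h = 1e-3                               # finite-difference step (an assumption)
first = (mgf(h) - mgf(-h)) / (2 * h)                # approximates E[X]
second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2   # approximates E[X^2]

closed_form = (1 - p + p * np.exp(0.5)) ** n        # binomial MGF at t = 0.5
```

The finite differences are only approximate; a symbolic derivative would give the moments exactly.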

Fitting data#

To fit data to a distribution, maximizing the likelihood function is common. Alternatively, some distributions have well-known minimum variance unbiased estimators. These will be chosen by default, but the likelihood function will always be available for minimizing.

If \(f_{i}\left(k;\boldsymbol{\theta}\right)\) is the PDF of a random variable where \(\boldsymbol{\theta}\) is a vector of parameters (e.g. \(L\) and \(S\)), then for a collection of \(N\) independent samples from this distribution, the joint distribution of the random vector \(\mathbf{k}\) is

\[f\left(\mathbf{k};\boldsymbol{\theta}\right)=\prod_{i=1}^{N}f_{i}\left(k_{i};\boldsymbol{\theta}\right).\]

The maximum likelihood estimate of the parameters \(\boldsymbol{\theta}\) is the set of parameter values that maximizes this function with \(\mathbf{k}\) fixed and given by the data:

\begin{eqnarray*} \hat{\boldsymbol{\theta}} & = & \arg\max_{\boldsymbol{\theta}}f\left(\mathbf{k};\boldsymbol{\theta}\right)\\ & = & \arg\min_{\boldsymbol{\theta}}l_{\mathbf{k}}\left(\boldsymbol{\theta}\right).\end{eqnarray*}

where

\begin{eqnarray*} l_{\mathbf{k}}\left(\boldsymbol{\theta}\right) & = & -\sum_{i=1}^{N}\log f\left(k_{i};\boldsymbol{\theta}\right)\\ & = & -N\overline{\log f\left(k_{i};\boldsymbol{\theta}\right)}\end{eqnarray*}
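A minimal sketch of fitting by minimizing the negative log-likelihood, assuming Poisson data with a single rate parameter and using `scipy.optimize.minimize_scalar` as the optimizer. The simulated data, the search bounds, and the helper `nll` are all choices made for this example. For the Poisson distribution the maximum likelihood estimate is known to be the sample mean, which gives an independent check.

```python
import numpy as np
from scipy import optimize, stats

# Simulated data with a made-up true rate, for illustration only.
rng = np.random.default_rng(0)
data = stats.poisson.rvs(mu=3.5, size=500, random_state=rng)

def nll(mu):
    """Negative log-likelihood l_k(mu) = -sum_i log f(k_i; mu)."""
    return -np.sum(stats.poisson.logpmf(data, mu))

# Bounded scalar minimization; the bounds are an assumption of this sketch.
res = optimize.minimize_scalar(nll, bounds=(1e-6, 20.0), method="bounded")
mu_hat = res.x

# The same objective written as -N times the mean log-probability.
nll_via_mean = -data.size * np.mean(stats.poisson.logpmf(data, mu_hat))
```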

Standard notation for mean#

We will use

\[\overline{y\left(\mathbf{x}\right)}=\frac{1}{N}\sum_{i=1}^{N}y\left(x_{i}\right)\]

where \(N\) should be clear from context.


Combinations#

Note that

\[k!=k\cdot\left(k-1\right)\cdot\left(k-2\right)\cdots\cdot1=\Gamma\left(k+1\right)\]

and has special cases of

\begin{eqnarray*} 0! & \equiv & 1\\ k! & \equiv & 0\quad k<0\end{eqnarray*}

and

\[\begin{split}\left(\begin{array}{c} n\\ k\end{array}\right)=\frac{n!}{\left(n-k\right)!k!}.\end{split}\]

If \(n<0\) or \(k<0\) or \(k>n\) we define \(\left(\begin{array}{c} n\\ k\end{array}\right)=0\).
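These conventions can be checked against `scipy.special`. The sketch below assumes the default floating-point mode (`exact=False`) of `scipy.special.factorial` and `scipy.special.comb`, which return zero for the out-of-range arguments described above.

```python
import math
from scipy import special

# k! = Gamma(k + 1)
fact5 = math.factorial(5)
gamma6 = special.gamma(6)

# Special cases: 0! = 1, and k! is taken to be 0 for k < 0.
zero_fact = special.factorial(0)
neg_fact = special.factorial(-1)

# Binomial coefficient from factorials and from scipy.
from_factorials = math.factorial(5) / (math.factorial(3) * math.factorial(2))
c_5_2 = special.comb(5, 2)

# Out-of-range arguments (k > n, k < 0, n < 0) give 0.
out_of_range = [special.comb(5, 7), special.comb(5, -1), special.comb(-3, 2)]
```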

Discrete Distributions in scipy.stats#