SciPy

Continuous Statistical Distributions

Overview

All distributions have location (\(L\)) and scale (\(S\)) parameters, along with any shape parameters needed; the names of the shape parameters vary from distribution to distribution. The standard form of each distribution is given with \(L=0.0\) and \(S=1.0\). The nonstandard forms of the various functions can be obtained from the standard forms using the following transformations (\(U\) denotes a standard uniform random variate).

| Function Name | Standard Function | Transformation |
|---|---|---|
| Cumulative Distribution Function (CDF) | \(F\left(x\right)\) | \(F\left(x;L,S\right)=F\left(\frac{\left(x-L\right)}{S}\right)\) |
| Probability Density Function (PDF) | \(f\left(x\right)=F^{\prime}\left(x\right)\) | \(f\left(x;L,S\right)=\frac{1}{S}f\left(\frac{\left(x-L\right)}{S}\right)\) |
| Percent Point Function (PPF) | \(G\left(q\right)=F^{-1}\left(q\right)\) | \(G\left(q;L,S\right)=L+SG\left(q\right)\) |
| Probability Sparsity Function (PSF) | \(g\left(q\right)=G^{\prime}\left(q\right)\) | \(g\left(q;L,S\right)=Sg\left(q\right)\) |
| Hazard Function (HF) | \(h_{a}\left(x\right)=\frac{f\left(x\right)}{1-F\left(x\right)}\) | \(h_{a}\left(x;L,S\right)=\frac{1}{S}h_{a}\left(\frac{\left(x-L\right)}{S}\right)\) |
| Cumulative Hazard Function (CHF) | \(H_{a}\left(x\right)=\log\frac{1}{1-F\left(x\right)}\) | \(H_{a}\left(x;L,S\right)=H_{a}\left(\frac{\left(x-L\right)}{S}\right)\) |
| Survival Function (SF) | \(S\left(x\right)=1-F\left(x\right)\) | \(S\left(x;L,S\right)=S\left(\frac{\left(x-L\right)}{S}\right)\) |
| Inverse Survival Function (ISF) | \(Z\left(\alpha\right)=S^{-1}\left(\alpha\right)=G\left(1-\alpha\right)\) | \(Z\left(\alpha;L,S\right)=L+SZ\left(\alpha\right)\) |
| Moment Generating Function (MGF) | \(M_{Y}\left(t\right)=E\left[e^{Yt}\right]\) | \(M_{X}\left(t\right)=e^{Lt}M_{Y}\left(St\right)\) |
| Random Variates | \(Y=G\left(U\right)\) | \(X=L+SY\) |
| (Differential) Entropy | \(h\left[Y\right]=-\int f\left(y\right)\log f\left(y\right)dy\) | \(h\left[X\right]=h\left[Y\right]+\log S\) |
| (Non-central) Moments | \(\mu_{n}^{\prime}=E\left[Y^{n}\right]\) | \(E\left[X^{n}\right]=\sum_{k=0}^{n}\left(\begin{array}{c} n\\ k\end{array}\right)L^{n-k}S^{k}\mu_{k}^{\prime}\) |
| Central Moments | \(\mu_{n}=E\left[\left(Y-\mu\right)^{n}\right]\) | \(E\left[\left(X-\mu_{X}\right)^{n}\right]=S^{n}\mu_{n}\) |
| mean (mode, median), var | \(\mu,\,\mu_{2}\) | \(L+S\mu,\, S^{2}\mu_{2}\) |
| skewness, kurtosis | \(\gamma_{1}=\frac{\mu_{3}}{\left(\mu_{2}\right)^{3/2}},\,\gamma_{2}=\frac{\mu_{4}}{\left(\mu_{2}\right)^{2}}-3\) | \(\gamma_{1},\,\gamma_{2}\) |
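Several rows of the table can be checked numerically. The sketch below assumes NumPy and SciPy are available and uses a gamma distribution with shape \(a=2\), \(L=3\) and \(S=1.5\) purely as an illustrative choice; any continuous `scipy.stats` distribution should behave the same way. It compares each nonstandard quantity computed with `loc`/`scale` against the transformation applied to the standard form, and draws random variates by the inverse transform \(X=L+SG\left(U\right)\).

```python
import numpy as np
from scipy import stats

# Illustrative choice (not from the text): gamma with shape a = 2,
# location L = 3 and scale S = 1.5.
a, L, S = 2.0, 3.0, 1.5
dist = stats.gamma

x = np.linspace(3.5, 12.0, 5)   # points inside the support x > L
q = np.array([0.1, 0.5, 0.9])   # probabilities for the PPF and ISF
y = (x - L) / S                 # standardized argument (x - L) / S

# CDF: F(x; L, S) = F((x - L) / S)
cdf_ok = np.allclose(dist.cdf(x, a, loc=L, scale=S), dist.cdf(y, a))
# PDF: f(x; L, S) = f((x - L) / S) / S
pdf_ok = np.allclose(dist.pdf(x, a, loc=L, scale=S), dist.pdf(y, a) / S)
# PPF: G(q; L, S) = L + S G(q)
ppf_ok = np.allclose(dist.ppf(q, a, loc=L, scale=S), L + S * dist.ppf(q, a))
# ISF of the standard form: Z(alpha) = G(1 - alpha)
isf_ok = np.allclose(dist.isf(q, a), dist.ppf(1 - q, a))
# Differential entropy: h[X] = h[Y] + log S
ent_ok = np.isclose(dist.entropy(a, loc=L, scale=S), dist.entropy(a) + np.log(S))
# Mean and variance: L + S mu and S**2 mu_2
mean_ok = np.isclose(dist.mean(a, loc=L, scale=S), L + S * dist.mean(a))
var_ok = np.isclose(dist.var(a, loc=L, scale=S), S**2 * dist.var(a))

# Random variates by inverse transform: Y = G(U), X = L + S Y
rng = np.random.default_rng(0)
u = rng.uniform(size=5)
samples = L + S * dist.ppf(u, a)

print(cdf_ok, pdf_ok, ppf_ok, isf_ok, ent_ok, mean_ok, var_ok)
```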

Moments

Non-central moments are defined using the PDF:

\[\mu_{n}^{\prime}=\int_{-\infty}^{\infty}x^{n}f\left(x\right)dx.\]

Note that these can always be computed using the PPF. Substituting \(x=G\left(q\right)\) into the above equation gives

\[\mu_{n}^{\prime}=\int_{0}^{1}G^{n}\left(q\right)dq\]

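which may be easier to compute numerically. The substitution works because \(q=F\left(x\right)\), so that \(dq=f\left(x\right)dx\), and the limits \(x=-\infty\) and \(x=\infty\) map to \(q=0\) and \(q=1\). Central moments are computed similarly, with \(\mu=\mu_{1}^{\prime}\):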

\[ \begin{eqnarray*} \mu_{n} & = & \int_{-\infty}^{\infty}\left(x-\mu\right)^{n}f\left(x\right)dx\\ & = & \int_{0}^{1}\left(G\left(q\right)-\mu\right)^{n}dq\\ & = & \sum_{k=0}^{n}\left(\begin{array}{c} n\\ k\end{array}\right)\left(-\mu\right)^{k}\mu_{n-k}^{\prime}\end{eqnarray*}\]

In particular,

\[ \begin{eqnarray*} \mu_{3} & = & \mu_{3}^{\prime}-3\mu\mu_{2}^{\prime}+2\mu^{3}\\ & = & \mu_{3}^{\prime}-3\mu\mu_{2}-\mu^{3}\\ \mu_{4} & = & \mu_{4}^{\prime}-4\mu\mu_{3}^{\prime}+6\mu^{2}\mu_{2}^{\prime}-3\mu^{4}\\ & = & \mu_{4}^{\prime}-4\mu\mu_{3}-6\mu^{2}\mu_{2}-\mu^{4}\end{eqnarray*}\]
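The PPF route to the moments and the binomial expansion for central moments can be sketched as follows. This assumes NumPy and SciPy; the beta(2, 3) distribution is an illustrative choice, picked because its PPF is bounded on \(\left[0,1\right]\), which keeps the quadrature over \(q\) well behaved. The results are compared against `scipy.stats` for reference.

```python
import numpy as np
from scipy import integrate, special, stats

# Illustrative choice: a beta(2, 3) distribution in standard form.
a_shape, b_shape = 2.0, 3.0

def G(q):
    """Percent point function G(q) of the standard form."""
    return stats.beta.ppf(q, a_shape, b_shape)

def raw_moment_via_ppf(n):
    """mu'_n = integral from 0 to 1 of G(q)**n dq."""
    value, _abserr = integrate.quad(lambda q: G(q) ** n, 0.0, 1.0)
    return value

def central_moment(n, raw):
    """mu_n = sum_k C(n, k) (-mu)**k mu'_{n-k}, using raw[0] = mu'_0 = 1."""
    mu = raw[1]
    return sum(special.comb(n, k) * (-mu) ** k * raw[n - k]
               for k in range(n + 1))

raw = [1.0] + [raw_moment_via_ppf(n) for n in range(1, 5)]
central = [central_moment(n, raw) for n in range(5)]

for n in range(1, 5):
    print(n, raw[n], stats.beta.moment(n, a_shape, b_shape))
print("variance:", central[2], stats.beta.var(a_shape, b_shape))
```

The integrand only needs the PPF, so the same function works unchanged for any distribution whose PPF is available, although unbounded PPFs can make the quadrature harder near \(q=0\) or \(q=1\).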

Skewness is defined as

\[\gamma_{1}=\sqrt{\beta_{1}}=\frac{\mu_{3}}{\mu_{2}^{3/2}}\]

while (Fisher) kurtosis is

\[\gamma_{2}=\frac{\mu_{4}}{\mu_{2}^{2}}-3,\]

so that a normal distribution has a kurtosis of zero.
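As a minimal sketch (assuming NumPy and SciPy, with an illustrative gamma shape \(a=2\)), skewness and Fisher kurtosis can be assembled from the non-central moments using the expressions for \(\mu_{3}\) and \(\mu_{4}\) above. For the gamma distribution they should reduce to the known values \(2/\sqrt{a}\) and \(6/a\), and the normal distribution should report a kurtosis of zero.

```python
import numpy as np
from scipy import stats

a = 2.0  # illustrative gamma shape

# Non-central moments mu'_1 .. mu'_4 of the standard gamma(a) form
m1, m2, m3, m4 = (stats.gamma.moment(n, a) for n in range(1, 5))

mu = m1
mu_2 = m2 - mu**2
mu_3 = m3 - 3 * mu * mu_2 - mu**3
mu_4 = m4 - 4 * mu * mu_3 - 6 * mu**2 * mu_2 - mu**4

gamma_1 = mu_3 / mu_2**1.5      # skewness
gamma_2 = mu_4 / mu_2**2 - 3    # Fisher (excess) kurtosis

skew, kurt = stats.gamma.stats(a, moments="sk")
print(gamma_1, skew)   # both should equal 2 / sqrt(a) for the gamma distribution
print(gamma_2, kurt)   # both should equal 6 / a
print(stats.norm.stats(moments="k"))  # normal distribution: Fisher kurtosis 0
```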

Median and mode

The median, \(m_{n}\), is defined as the point with half of the probability on either side. In other words, \(F\left(m_{n}\right)=\frac{1}{2}\), so that

\[m_{n}=G\left(\frac{1}{2}\right).\]

In addition, the mode, \(m_{d}\), is defined as the value at which the probability density function reaches its peak:

\[m_{d}=\arg\max_{x}f\left(x\right).\]
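Both definitions translate directly into code. The sketch below assumes NumPy and SciPy and uses a gamma distribution with shape \(a=3\) as an illustrative example (its mode is known to be \(a-1=2\)). The median comes from the PPF at \(1/2\); the mode is found by minimizing \(-f\left(x\right)\) with `scipy.optimize.minimize_scalar` over a search bracket that is an assumption of this sketch and must contain the peak.

```python
import numpy as np
from scipy import optimize, stats

a = 3.0  # illustrative gamma shape; for a >= 1 the mode is a - 1

# Median: m_n = G(1/2)
median = stats.gamma.ppf(0.5, a)
print(median, stats.gamma.cdf(median, a))  # the CDF there should be 1/2

# Mode: m_d = argmax_x f(x), found by minimizing -f(x) on a bracket
# assumed to contain the peak
res = optimize.minimize_scalar(lambda x: -stats.gamma.pdf(x, a),
                               bounds=(0.0, 20.0), method="bounded")
mode = res.x
print(mode)  # should be close to a - 1 = 2
```

A bounded scalar search is adequate for unimodal densities; for multimodal densities it may only locate a local peak.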

Fitting data

A common way to fit data to a distribution is to maximize the likelihood function. Alternatively, some distributions have well-known minimum-variance unbiased estimators. These will be chosen by default, but the negative log-likelihood function will always be available for minimization.

If \(f\left(x;\boldsymbol{\theta}\right)\) is the PDF of a random variable, where \(\boldsymbol{\theta}\) is a vector of parameters (e.g., \(L\) and \(S\)), then for a collection of \(N\) independent samples from this distribution, the joint density of the random vector \(\mathbf{x}\) is

\[f\left(\mathbf{x};\boldsymbol{\theta}\right)=\prod_{i=1}^{N}f\left(x_{i};\boldsymbol{\theta}\right).\]

The maximum likelihood estimate of the parameters \(\boldsymbol{\theta}\) is the parameter vector that maximizes this function with \(\mathbf{x}\) fixed and given by the data:

\[ \begin{eqnarray*} \boldsymbol{\theta}_{es} & = & \arg\max_{\boldsymbol{\theta}}f\left(\mathbf{x};\boldsymbol{\theta}\right)\\ & = & \arg\min_{\boldsymbol{\theta}}l_{\mathbf{x}}\left(\boldsymbol{\theta}\right).\end{eqnarray*}\]

where

\[ \begin{eqnarray*} l_{\mathbf{x}}\left(\boldsymbol{\theta}\right) & = & -\sum_{i=1}^{N}\log f\left(x_{i};\boldsymbol{\theta}\right)\\ & = & -N\overline{\log f\left(x_{i};\boldsymbol{\theta}\right)}\end{eqnarray*}\]
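A minimal sketch of maximum likelihood estimation by minimizing \(l_{\mathbf{x}}\left(\boldsymbol{\theta}\right)\), assuming NumPy and SciPy. The data are synthetic normal draws chosen for illustration, and optimizing over \(\log S\) rather than \(S\) is an assumption of this sketch that keeps the scale positive. For the normal distribution the maximum likelihood estimates have a closed form (the sample mean and the \(1/N\) sample standard deviation), which `scipy.stats.norm.fit` returns, so the numerical result can be checked against it.

```python
import numpy as np
from scipy import optimize, stats

# Illustrative data: 500 draws from a normal with loc 5 and scale 2
rng = np.random.default_rng(12345)
x = stats.norm.rvs(loc=5.0, scale=2.0, size=500, random_state=rng)

def neg_log_likelihood(theta, x):
    """l_x(theta) = -sum_i log f(x_i; theta), with theta = (L, log S).

    Optimizing over log S (an assumption of this sketch) keeps S > 0.
    """
    loc, log_scale = theta
    return -np.sum(stats.norm.logpdf(x, loc=loc, scale=np.exp(log_scale)))

res = optimize.minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,),
                        method="Nelder-Mead",
                        options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000})
L_hat, S_hat = res.x[0], np.exp(res.x[1])

print(L_hat, S_hat)
print(stats.norm.fit(x))  # closed-form MLE for the normal distribution
# scipy's nnlf evaluates the same negative log-likelihood at (loc, scale)
print(neg_log_likelihood(res.x, x), stats.norm.nnlf((L_hat, S_hat), x))
```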

Note that if \(\boldsymbol{\theta}\) includes only shape parameters, the location and scale parameters can be fit by replacing \(x_{i}\) with \(\left(x_{i}-L\right)/S\) in the negative log-likelihood function, adding \(N\log S\), and minimizing:

\[ \begin{eqnarray*} l_{\mathbf{x}}\left(L,S;\boldsymbol{\theta}\right) & = & N\log S-\sum_{i=1}^{N}\log f\left(\frac{x_{i}-L}{S};\boldsymbol{\theta}\right)\\ & = & N\log S+l_{\frac{\mathbf{x}-L}{S}}\left(\boldsymbol{\theta}\right)\end{eqnarray*}\]
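The identity above can be checked numerically: the shape-only negative log-likelihood of the standardized data, plus \(N\log S\), should equal the negative log-likelihood evaluated directly with `loc` and `scale`. This sketch assumes NumPy and SciPy; the gamma data, the shape \(a=2.5\), and the trial values of \(L\) and \(S\) are illustrative, and the helper names `l_standard` and `l_loc_scale` are hypothetical.

```python
import numpy as np
from scipy import stats

# Illustrative setup: gamma data with shape a = 2.5, loc 1.0, scale 0.7
a = 2.5
rng = np.random.default_rng(7)
x = stats.gamma.rvs(a, loc=1.0, scale=0.7, size=200, random_state=rng)

def l_standard(z, a):
    """Hypothetical helper: shape-only negative log-likelihood l_z(a)."""
    return -np.sum(stats.gamma.logpdf(z, a))

def l_loc_scale(L, S, a, x):
    """Hypothetical helper: l_x(L, S; a) = N log S + l_{(x - L)/S}(a)."""
    return len(x) * np.log(S) + l_standard((x - L) / S, a)

# Evaluate at a trial (L, S) that keeps every x_i inside the support
L_trial, S_trial = 0.5, 1.0
via_formula = l_loc_scale(L_trial, S_trial, a, x)
direct = -np.sum(stats.gamma.logpdf(x, a, loc=L_trial, scale=S_trial))
print(via_formula, direct)
```

Minimizing `l_loc_scale` over \(L\) and \(S\) (and \(a\), if unknown) would then give the fit; that step is omitted here.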

If desired, sample estimates for \(L\) and \(S\) (not necessarily maximum likelihood estimates) can be obtained from sample estimates of the mean and variance using

\[ \begin{eqnarray*} \hat{S} & = & \sqrt{\frac{\hat{\mu}_{2}}{\mu_{2}}}\\ \hat{L} & = & \hat{\mu}-\hat{S}\mu\end{eqnarray*}\]

where \(\mu\) and \(\mu_{2}\) are the mean and variance of the untransformed distribution (when \(L=0\) and \(S=1\)), assumed known, and

\[ \begin{eqnarray*} \hat{\mu} & = & \frac{1}{N}\sum_{i=1}^{N}x_{i}=\bar{\mathbf{x}}\\ \hat{\mu}_{2} & = & \frac{1}{N-1}\sum_{i=1}^{N}\left(x_{i}-\hat{\mu}\right)^{2}=\frac{N}{N-1}\overline{\left(\mathbf{x}-\bar{\mathbf{x}}\right)^{2}}\end{eqnarray*}\]
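A short sketch of these moment-based estimates, assuming NumPy and SciPy. The gamma shape \(a=2\) is treated as known, and the true values \(L=3\), \(S=1.5\) and the sample size are illustrative choices used only to generate synthetic data. The sample variance is computed with the \(N/\left(N-1\right)\) factor exactly as written above, which matches NumPy's `var` with `ddof=1`.

```python
import numpy as np
from scipy import stats

# Illustrative setup: gamma shape a = 2 treated as known; true L = 3, S = 1.5
a, L_true, S_true = 2.0, 3.0, 1.5
rng = np.random.default_rng(0)
x = stats.gamma.rvs(a, loc=L_true, scale=S_true, size=100_000, random_state=rng)

# mu and mu_2: mean and variance of the untransformed form (L = 0, S = 1)
mu, mu_2 = stats.gamma.stats(a, moments="mv")

N = len(x)
mu_hat = x.mean()                                   # sample mean
mu2_hat = N / (N - 1) * np.mean((x - mu_hat) ** 2)  # same as x.var(ddof=1)

S_hat = np.sqrt(mu2_hat / mu_2)
L_hat = mu_hat - S_hat * mu
print(L_hat, S_hat)  # should land near 3.0 and 1.5
```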

Standard notation for mean

We will use

\[\overline{y\left(\mathbf{x}\right)}=\frac{1}{N}\sum_{i=1}^{N}y\left(x_{i}\right)\]

where \(N\), the number of samples \(x_{i}\), should be clear from context.
