Continuous Statistical Distributions
Overview
Every distribution has a location parameter (\(L\)) and a scale parameter (\(S\)), along with any shape parameters it needs; the names of the shape parameters vary from distribution to distribution. Each distribution is given in standard form, where \(L=0.0\) and \(S=1.0.\) The nonstandard forms of the various functions can be obtained from the standard forms using the transformations in the following table (note that \(U\) is a standard uniform random variate).
Function Name | Standard Function | Transformation |
---|---|---|
Cumulative Distribution Function (CDF) | \(F\left(x\right)\) | \(F\left(x;L,S\right)=F\left(\frac{\left(x-L\right)}{S}\right)\) |
Probability Density Function (PDF) | \(f\left(x\right)=F^{\prime}\left(x\right)\) | \(f\left(x;L,S\right)=\frac{1}{S}f\left(\frac{\left(x-L\right)}{S}\right)\) |
Percent Point Function (PPF) | \(G\left(q\right)=F^{-1}\left(q\right)\) | \(G\left(q;L,S\right)=L+SG\left(q\right)\) |
Probability Sparsity Function (PSF) | \(g\left(q\right)=G^{\prime}\left(q\right)\) | \(g\left(q;L,S\right)=Sg\left(q\right)\) |
Hazard Function (HF) | \(h_{a}\left(x\right)=\frac{f\left(x\right)}{1-F\left(x\right)}\) | \(h_{a}\left(x;L,S\right)=\frac{1}{S}h_{a}\left(\frac{\left(x-L\right)}{S}\right)\) |
Cumulative Hazard Function (CHF) | \(H_{a}\left(x\right)=\log\frac{1}{1-F\left(x\right)}\) | \(H_{a}\left(x;L,S\right)=H_{a}\left(\frac{\left(x-L\right)}{S}\right)\) |
Survival Function (SF) | \(S\left(x\right)=1-F\left(x\right)\) | \(S\left(x;L,S\right)=S\left(\frac{\left(x-L\right)}{S}\right)\) |
Inverse Survival Function (ISF) | \(Z\left(\alpha\right)=S^{-1}\left(\alpha\right)=G\left(1-\alpha\right)\) | \(Z\left(\alpha;L,S\right)=L+SZ\left(\alpha\right)\) |
Moment Generating Function (MGF) | \(M_{Y}\left(t\right)=E\left[e^{Yt}\right]\) | \(M_{X}\left(t\right)=e^{Lt}M_{Y}\left(St\right)\) |
Random Variates | \(Y=G\left(U\right)\) | \(X=L+SY\) |
(Differential) Entropy | \(h\left[Y\right]=-\int f\left(y\right)\log f\left(y\right)dy\) | \(h\left[X\right]=h\left[Y\right]+\log S\) |
(Non-central) Moments | \(\mu_{n}^{\prime}=E\left[Y^{n}\right]\) | \(E\left[X^{n}\right]=L^{n}\sum_{k=0}^{n}\left(\begin{array}{c} n\\ k\end{array}\right)\left(\frac{S}{L}\right)^{k}\mu_{k}^{\prime}\) |
Central Moments | \(\mu_{n}=E\left[\left(Y-\mu\right)^{n}\right]\) | \(E\left[\left(X-\mu_{X}\right)^{n}\right]=S^{n}\mu_{n}\) |
mean (mode, median), var | \(\mu,\,\mu_{2}\) | \(L+S\mu,\, S^{2}\mu_{2}\) |
skewness | \(\gamma_{1}=\frac{\mu_{3}}{\left(\mu_{2}\right)^{3/2}}\) | \(\gamma_{1}\) |
kurtosis | \(\gamma_{2}=\frac{\mu_{4}}{\left(\mu_{2}\right)^{2}}-3\) | \(\gamma_{2}\) |
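The transformation rules in the table can be checked numerically. The following is a minimal sketch, assuming the `loc` and `scale` keywords of `scipy.stats` play the roles of \(L\) and \(S\); the gamma distribution and the values of `L`, `S`, and the shape parameter `a` are illustrative choices, not part of the text above.

```python
import numpy as np
from scipy import stats

L, S = 2.0, 3.0   # illustrative location and scale (assumed values)
a = 2.5           # illustrative shape parameter (assumed value)

std = stats.gamma(a)                  # standard form: L = 0, S = 1
gen = stats.gamma(a, loc=L, scale=S)  # shifted and scaled form

x = np.linspace(2.5, 20.0, 8)   # points inside the support of the shifted form
q = np.linspace(0.05, 0.95, 7)  # probabilities for the PPF and ISF
z = (x - L) / S                 # standardized points

# Each entry compares the nonstandard function with the table's transformation.
checks = {
    "cdf": np.allclose(gen.cdf(x), std.cdf(z)),
    "pdf": np.allclose(gen.pdf(x), std.pdf(z) / S),
    "ppf": np.allclose(gen.ppf(q), L + S * std.ppf(q)),
    "sf": np.allclose(gen.sf(x), std.sf(z)),
    "isf": np.allclose(gen.isf(q), L + S * std.isf(q)),
    "chf": np.allclose(-gen.logsf(x), -std.logsf(z)),  # CHF = -log(SF)
    "entropy": np.isclose(gen.entropy(), std.entropy() + np.log(S)),
    "mean": np.isclose(gen.mean(), L + S * std.mean()),
    "var": np.isclose(gen.var(), S**2 * std.var()),
}

# Random variates: Y = G(U) with U standard uniform, then X = L + S*Y.
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
samples = L + S * std.ppf(u)

print(checks)
print(samples.mean(), gen.mean())
```

The sample mean of the inverse-transform variates should be close to, but not exactly equal to, the mean of the shifted distribution.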
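Moments
Non-central moments are defined using the PDF:
\[\mu_{n}^{\prime}=\int_{-\infty}^{\infty}x^{n}f\left(x\right)dx.\]
Note that these can always be computed using the PPF. Substitute \(x=G\left(q\right)\) in the above equation to get
\[\mu_{n}^{\prime}=\int_{0}^{1}G^{n}\left(q\right)dq,\]
which may be easier to compute numerically. Note that \(q=F\left(x\right)\) so that \(dq=f\left(x\right)dx.\) Central moments are computed similarly, with \(\mu=\mu_{1}^{\prime}\):
\[\mu_{n}=\int_{-\infty}^{\infty}\left(x-\mu\right)^{n}f\left(x\right)dx=\int_{0}^{1}\left(G\left(q\right)-\mu\right)^{n}dq=\sum_{k=0}^{n}\left(\begin{array}{c} n\\ k\end{array}\right)\left(-\mu\right)^{k}\mu_{n-k}^{\prime}.\]
In particular,
\[\mu_{3}=\mu_{3}^{\prime}-3\mu\mu_{2}^{\prime}+2\mu^{3},\qquad\mu_{4}=\mu_{4}^{\prime}-4\mu\mu_{3}^{\prime}+6\mu^{2}\mu_{2}^{\prime}-3\mu^{4}.\]
Skewness is defined as
\[\gamma_{1}=\frac{\mu_{3}}{\left(\mu_{2}\right)^{3/2}},\]
while (Fisher) kurtosis is
\[\gamma_{2}=\frac{\mu_{4}}{\left(\mu_{2}\right)^{2}}-3,\]
so that a normal distribution has a kurtosis of zero.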
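As an illustration of computing moments through the PPF, the sketch below integrates \(G^{n}\left(q\right)\) over \(\left[0,1\right]\) with `scipy.integrate.quad`, then builds central moments, skewness, and kurtosis from the non-central moments. The beta distribution and its shape parameters are assumptions chosen so that the PPF stays bounded; the helper names are hypothetical.

```python
from math import comb

from scipy import stats
from scipy.integrate import quad

dist = stats.beta(2.0, 5.0)  # illustrative standard-form distribution (assumed)

def noncentral_moment_ppf(n):
    """n-th non-central moment as the integral of G(q)**n over [0, 1]."""
    value, _ = quad(lambda q: dist.ppf(q) ** n, 0.0, 1.0)
    return value

# raw[n] holds the n-th non-central moment; raw[0] is 1 by definition.
raw = [1.0] + [noncentral_moment_ppf(n) for n in range(1, 5)]
mu = raw[1]

def central_moment(n):
    """Central moment from the binomial expansion of (Y - mu)**n."""
    return sum(comb(n, k) * (-mu) ** k * raw[n - k] for k in range(n + 1))

mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
skewness = mu3 / mu2 ** 1.5
kurtosis = mu4 / mu2 ** 2 - 3.0  # Fisher kurtosis

# Reference values from scipy for comparison.
ref_skew, ref_kurt = (float(v) for v in dist.stats(moments="sk"))
print(skewness, ref_skew)
print(kurtosis, ref_kurt)
```

For distributions with unbounded support the PPF diverges at \(q=0\) or \(q=1\), so the numerical integration may need more care than this sketch takes.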
Median and mode
The median, \(m_{n}\), is defined as the point at which half of the probability mass lies on one side and half on the other. In other words, \(F\left(m_{n}\right)=\frac{1}{2}\), so that
\[m_{n}=G\left(\frac{1}{2}\right).\]
In addition, the mode, \(m_{d}\), is defined as the value at which the probability density function reaches its peak:
\[m_{d}=\arg\max_{x}f\left(x\right).\]
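Both quantities can be obtained numerically. The sketch below takes the median from the PPF and locates the mode by minimizing the negative PDF with `scipy.optimize.minimize_scalar`; the gamma distribution, its parameters, and the search interval are illustrative assumptions, and the approach presumes a unimodal density.

```python
from scipy import stats
from scipy.optimize import minimize_scalar

# Illustrative distribution (assumed): gamma with shape 3, L = 1, S = 2.
dist = stats.gamma(3.0, loc=1.0, scale=2.0)

# Median: the point where the CDF equals one half, i.e. G(1/2).
median = dist.ppf(0.5)

# Mode: maximize the PDF by minimizing its negative over a wide interval
# that covers almost all of the probability mass.
lower, upper = dist.ppf(0.001), dist.ppf(0.999)
result = minimize_scalar(lambda x: -dist.pdf(x), bounds=(lower, upper), method="bounded")
mode = result.x

print(median, dist.cdf(median))
print(mode)
```

For this gamma example the mode is known in closed form as \(L+S\left(a-1\right)\) with shape \(a=3\), which gives a convenient check on the numerical result.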
Fitting data
To fit data to a distribution, it is common to maximize the likelihood function. Alternatively, some distributions have well-known minimum variance unbiased estimators. These will be chosen by default, but the (negative log-)likelihood function will always be available for minimizing.
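If \(f\left(x;\boldsymbol{\theta}\right)\) is the PDF of a random variable, where \(\boldsymbol{\theta}\) is a vector of parameters (e.g., \(L\) and \(S\)), then for a collection of \(N\) independent samples from this distribution, the joint distribution of the random vector \(\mathbf{x}\) is
\[f\left(\mathbf{x};\boldsymbol{\theta}\right)=\prod_{i=1}^{N}f\left(x_{i};\boldsymbol{\theta}\right).\]
The maximum likelihood estimate of the parameters \(\boldsymbol{\theta}\) is the set of parameter values that maximizes this function with \(\mathbf{x}\) fixed and given by the data:
\[\hat{\boldsymbol{\theta}}=\arg\max_{\boldsymbol{\theta}}f\left(\mathbf{x};\boldsymbol{\theta}\right)=\arg\min_{\boldsymbol{\theta}}l_{\mathbf{x}}\left(\boldsymbol{\theta}\right),\]
where
\[l_{\mathbf{x}}\left(\boldsymbol{\theta}\right)=-\sum_{i=1}^{N}\log f\left(x_{i};\boldsymbol{\theta}\right).\]
Note that if \(\boldsymbol{\theta}\) includes only shape parameters, the location and scale parameters can be fit by replacing \(x_{i}\) with \(\left(x_{i}-L\right)/S\) in the log-likelihood function, adding \(N\log S\), and minimizing; thus
\[l_{\mathbf{x}}\left(L,S;\boldsymbol{\theta}\right)=N\log S-\sum_{i=1}^{N}\log f\left(\frac{x_{i}-L}{S};\boldsymbol{\theta}\right).\]
If desired, sample estimates for \(L\) and \(S\) (not necessarily maximum likelihood estimates) can be obtained from sample estimates of the mean and variance using
\[\hat{S}=\sqrt{\frac{\hat{\mu}_{2}}{\mu_{2}}},\qquad\hat{L}=\bar{x}-\hat{S}\mu,\]
where \(\mu\) and \(\mu_{2}\) are assumed known as the mean and variance of the untransformed distribution (when \(L=0\) and \(S=1\)), and
\[\bar{x}=\frac{1}{N}\sum_{i=1}^{N}x_{i},\qquad\hat{\mu}_{2}=\frac{1}{N-1}\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2}.\]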
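The fitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's internal implementation: the Gumbel distribution, the true parameter values, the sample size, and the choice of the Nelder-Mead optimizer are all illustrative, and `neg_log_likelihood` is a hypothetical helper. The scale is optimized on a log axis only to keep it positive.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

true_L, true_S = 2.0, 3.0  # illustrative parameters (assumed values)
rng = np.random.default_rng(12345)
x = stats.gumbel_r.rvs(loc=true_L, scale=true_S, size=2000, random_state=rng)

# Moment-based estimates: mu and mu2 are the mean and variance of the
# standard form (L = 0, S = 1); x_bar and mu2_hat are the sample estimates.
mu, mu2 = (float(m) for m in stats.gumbel_r.stats(moments="mv"))
x_bar = np.mean(x)
mu2_hat = np.var(x, ddof=1)
S_mom = np.sqrt(mu2_hat / mu2)
L_mom = x_bar - S_mom * mu

def neg_log_likelihood(params, data):
    """N*log(S) minus the sum of standard-form log-PDFs at (x - L)/S."""
    loc, log_scale = params
    scale = np.exp(log_scale)  # optimize log(S) so that S stays positive
    z = (data - loc) / scale
    return data.size * np.log(scale) - np.sum(stats.gumbel_r.logpdf(z))

# Start the minimization from the moment-based estimates.
result = minimize(neg_log_likelihood, x0=[L_mom, np.log(S_mom)],
                  args=(x,), method="Nelder-Mead")
L_mle, S_mle = result.x[0], np.exp(result.x[1])

# scipy's own fit, for comparison.
L_ref, S_ref = stats.gumbel_r.fit(x)
print(L_mom, S_mom)
print(L_mle, S_mle)
print(L_ref, S_ref)
```

The moment-based and maximum likelihood estimates generally differ slightly, which is consistent with the remark that the former are not necessarily maximum likelihood estimates.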
Standard notation for mean
We will use
\[\overline{y\left(\mathbf{x}\right)}=\frac{1}{N}\sum_{i=1}^{N}y\left(x_{i}\right),\]
where \(N\) should be clear from context as the number of samples \(x_{i}\).
References
Documentation for ranlib, rv2, cdflib
Eric Weisstein’s world of mathematics http://mathworld.wolfram.com/, http://mathworld.wolfram.com/topics/StatisticalDistributions.html
Documentation to Regress+ by Michael McLaughlin
Engineering and Statistics Handbook (NIST), https://www.itl.nist.gov/div898/handbook/
Documentation for DATAPLOT from NIST, https://www.itl.nist.gov/div898/software/dataplot/distribu.htm
Norman Johnson, Samuel Kotz, and N. Balakrishnan Continuous Univariate Distributions, second edition, Volumes I and II, Wiley & Sons, 1994.
In the tutorials several special functions appear repeatedly and are listed here.
Symbol | Description | Definition |
---|---|---|
\(\gamma\left(s, x\right)\) | lower incomplete Gamma function | \(\int_0^x t^{s-1} e^{-t} dt\) |
\(\Gamma\left(s, x\right)\) | upper incomplete Gamma function | \(\int_x^\infty t^{s-1} e^{-t} dt\) |
\(B\left(x;a,b\right)\) | incomplete Beta function | \(\int_{0}^{x} t^{a-1}\left(1-t\right)^{b-1} dt\) |
\(I\left(x;a,b\right)\) | regularized incomplete Beta function | \(\frac{\Gamma\left(a+b\right)}{\Gamma\left(a\right)\Gamma\left(b\right)} \int_{0}^{x} t^{a-1}\left(1-t\right)^{b-1} dt\) |
\(\phi\left(x\right)\) | PDF for normal distribution | \(\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}\) |
\(\Phi\left(x\right)\) | CDF for normal distribution | \(\int_{-\infty}^{x}\phi\left(t\right) dt = \frac{1}{2}+\frac{1}{2}\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\) |
\(\psi\left(z\right)\) | digamma function | \(\frac{d}{dz} \log\left(\Gamma\left(z\right)\right)\) |
\(\psi_{n}\left(z\right)\) | polygamma function | \(\frac{d^{n+1}}{dz^{n+1}}\log\left(\Gamma\left(z\right)\right)\) |
\(I_{\nu}\left(y\right)\) | modified Bessel function of the first kind | |
\(\mathrm{Ei}\left(x\right)\) | exponential integral | \(-\int_{-x}^\infty \frac{e^{-t}}{t} dt\) |
\(\zeta\left(n\right)\) | Riemann zeta function | \(\sum_{k=1}^{\infty} \frac{1}{k^{n}}\) |
\(\zeta\left(n,z\right)\) | Hurwitz zeta function | \(\sum_{k=0}^{\infty} \frac{1}{\left(k+z\right)^{n}}\) |
\(\,{}_{p}F_{q}(a_{1},\ldots,a_{p};b_{1},\ldots,b_{q};z)\) | Hypergeometric function | \(\sum_{n=0}^{\infty} {\frac{(a_{1})_{n}\cdots(a_{p})_{n}}{(b_{1})_{n}\cdots(b_{q})_{n}}} \,{\frac{z^{n}}{n!}}\) |
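Most of these functions have counterparts in `scipy.special`. The sketch below evaluates several of them at illustrative arguments (all numeric values are assumptions chosen for the example). Note that `gammainc`, `gammaincc`, and `betainc` return the regularized functions, so they are multiplied by `gamma` or `beta` to recover the unregularized definitions in the table, and that `hyp2f1` covers only the \(p=2,\,q=1\) hypergeometric case.

```python
import numpy as np
from scipy import special
from scipy.integrate import quad

s, x = 2.5, 1.2          # illustrative arguments (assumed values)
a, b, t = 2.0, 3.0, 0.4  # illustrative arguments (assumed values)

# Incomplete Gamma functions: scipy returns the regularized versions.
lower_gamma = special.gammainc(s, x) * special.gamma(s)
upper_gamma = special.gammaincc(s, x) * special.gamma(s)

# Incomplete Beta functions: betainc is the regularized I(x; a, b).
regularized_beta = special.betainc(a, b, t)
incomplete_beta = regularized_beta * special.beta(a, b)

normal_cdf = special.ndtr(0.7)         # Phi(x)
digamma = special.digamma(1.0)         # psi(z)
trigamma = special.polygamma(1, 1.0)   # psi_1(z)
bessel_i = special.iv(0.5, 2.0)        # I_nu(y)
exp_integral = special.expi(1.0)       # Ei(x)
riemann_zeta = special.zeta(2.0)       # zeta(n)
hurwitz_zeta = special.zeta(2.0, 3.0)  # zeta(n, z)
hyp = special.hyp2f1(1.0, 1.0, 2.0, 0.5)  # 2F1(1, 1; 2; z)

# Direct numerical integration of two of the table's definitions.
lower_gamma_quad, _ = quad(lambda u: u ** (s - 1) * np.exp(-u), 0.0, x)
incomplete_beta_quad, _ = quad(lambda u: u ** (a - 1) * (1 - u) ** (b - 1), 0.0, t)

print(lower_gamma, lower_gamma_quad)
print(incomplete_beta, incomplete_beta_quad)
```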
Continuous Distributions in scipy.stats
- Alpha Distribution
- Anglit Distribution
- Arcsine Distribution
- Beta Distribution
- Beta Prime Distribution
- Bradford Distribution
- Burr Distribution
- Cauchy Distribution
- Chi Distribution
- Chi-squared Distribution
- Cosine Distribution
- Double Gamma Distribution
- Double Weibull Distribution
- Erlang Distribution
- Exponential Distribution
- Exponentiated Weibull Distribution
- Exponential Power Distribution
- Fatigue Life (Birnbaum-Saunders) Distribution
- Fisk (Log Logistic) Distribution
- Folded Cauchy Distribution
- Folded Normal Distribution
- Fratio (or F) Distribution
- Gamma Distribution
- Generalized Logistic Distribution
- Generalized Pareto Distribution
- Generalized Exponential Distribution
- Generalized Extreme Value Distribution
- Generalized Gamma Distribution
- Generalized Half-Logistic Distribution
- Generalized Normal Distribution
- Gibrat Distribution
- Gompertz (Truncated Gumbel) Distribution
- Gumbel (LogWeibull, Fisher-Tippett, Type I Extreme Value) Distribution
- Gumbel Left-skewed (for minimum order statistic) Distribution
- HalfCauchy Distribution
- HalfNormal Distribution
- Half-Logistic Distribution
- Hyperbolic Secant Distribution
- Gauss Hypergeometric Distribution
- Inverted Gamma Distribution
- Inverse Normal (Inverse Gaussian) Distribution
- Inverted Weibull Distribution
- Johnson SB Distribution
- Johnson SU Distribution
- KSone Distribution
- KStwo Distribution
- Laplace (Double Exponential, Bilateral Exponential) Distribution
- Left-skewed Lévy Distribution
- Lévy Distribution
- Logistic (Sech-squared) Distribution
- Log Double Exponential (Log-Laplace) Distribution
- Log Gamma Distribution
- Log Normal (Cobb-Douglas) Distribution
- Maxwell Distribution
- Mielke’s Beta-Kappa Distribution
- Nakagami Distribution
- Noncentral chi-squared Distribution
- Noncentral F Distribution
- Noncentral t Distribution
- Normal Distribution
- Normal Inverse Gaussian Distribution
- Pareto Distribution
- Pareto Second Kind (Lomax) Distribution
- Power Log Normal Distribution
- Power Normal Distribution
- Power-function Distribution
- R-distribution
- Rayleigh Distribution
- Rice Distribution
- Reciprocal Distribution
- Reciprocal Inverse Gaussian Distribution
- Semicircular Distribution
- Student t Distribution
- Trapezoidal Distribution
- Triangular Distribution
- Truncated Exponential Distribution
- Truncated Normal Distribution
- Tukey-Lambda Distribution
- Uniform Distribution
- Von Mises Distribution
- Wald Distribution
- Weibull Maximum Extreme Value Distribution
- Weibull Minimum Extreme Value Distribution
- Wrapped Cauchy Distribution
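The distributions listed above correspond to objects in the `scipy.stats` namespace. As a rough sketch, the continuous ones available in an installed version can be enumerated by checking each public object against the `rv_continuous` base class; the variable name `continuous_names` is a hypothetical choice, and the resulting list depends on the SciPy version and need not match the list above exactly.

```python
from scipy import stats

# Collect the names of all module-level objects that are instances of
# rv_continuous, i.e. the continuous distribution objects.
continuous_names = sorted(
    name
    for name, obj in vars(stats).items()
    if isinstance(obj, stats.rv_continuous)
)

print(len(continuous_names))
print(continuous_names[:5])
```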