Discrete random variables take on only a countable number of values. The commonly used distributions are included in SciPy and described in this document. Each discrete distribution can take one extra integer parameter, L. The relationship between the general distribution p and the standard distribution p_0 is

p(x) = p_0(x - L)

which allows for shifting of the input. When a distribution generator is initialized, the discrete distribution can either specify the beginning and ending (integer) values a and b, which must be such that

p_0(x) = 0 \quad \text{for } x < a \text{ or } x > b,

in which case it is assumed that the pdf function is specified on the integers a + mk \le b, where k is a non-negative integer (k = 0, 1, 2, \ldots) and m is a positive integer multiplier. Alternatively, the two lists x_k and p(x_k) can be provided directly, in which case a dictionary is set up internally to evaluate probabilities and generate random variates.
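The list-based initialization can be sketched in pure Python. This is only an illustrative stand-in for what SciPy does internally when given the two lists (the class name and its methods below are hypothetical, not SciPy's API):

```python
import random
from bisect import bisect_left

class DictDiscrete:
    """Toy discrete distribution built from support points xk and probabilities pk."""

    def __init__(self, xk, pk):
        assert abs(sum(pk) - 1.0) < 1e-9, "probabilities must sum to 1"
        self.xk = list(xk)
        self.pk = list(pk)
        # Dictionary used to evaluate probabilities.
        self._lookup = dict(zip(self.xk, self.pk))
        # Running totals give the cdf at each support point.
        self._cum = []
        total = 0.0
        for p in self.pk:
            total += p
            self._cum.append(total)

    def pmf(self, x):
        # Probability is zero off the support.
        return self._lookup.get(x, 0.0)

    def rvs(self):
        # Inverse-cdf sampling on a uniform variate.
        u = random.random()
        i = bisect_left(self._cum, u)
        return self.xk[min(i, len(self.xk) - 1)]

d = DictDiscrete([1, 3, 5], [0.2, 0.5, 0.3])
```

Sampling this way is exactly the inverse-cdf construction used for discrete random variates later in this document.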
The probability mass function of a random variable X is defined as the probability that the random variable takes on a particular value:

p(x_k) = P[X = x_k]

This is also sometimes called the probability density function, although technically

f(x) = \sum_k p(x_k) \delta(x - x_k)

is the probability density function for a discrete distribution [1].

[1] Note that we will be using p to represent the probability mass function and a parameter (a probability). The usage should be obvious from context.
The cumulative distribution function is

F(x) = P[X \le x] = \sum_{x_k \le x} p(x_k)

and is also useful to be able to compute. Note that

F(x_k) - F(x_{k-1}) = p(x_k)
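The cdf of a finite discrete distribution is just a running sum of the pmf, and differencing it recovers the pmf; a minimal sketch (the support and probabilities below are arbitrary):

```python
# pmf on the support {0, 1, 2, 3}
xk = [0, 1, 2, 3]
pk = [0.1, 0.2, 0.3, 0.4]

# F(x_k) = sum of p(x_j) for x_j <= x_k
F = []
running = 0.0
for p in pk:
    running += p
    F.append(running)

# Successive differences of the cdf recover the pmf.
diffs = [F[0]] + [F[k] - F[k - 1] for k in range(1, len(F))]
```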
The survival function is just

S(x) = 1 - F(x) = P[X > x],

the probability that the random variable is strictly larger than x.
The percent point function is the inverse of the cumulative distribution function:

G(q) = F^{-1}(q)

For discrete distributions, this must be modified for cases where there is no x_k such that F(x_k) = q. In these cases we choose G(q) to be the smallest value x_k = G(q) for which F(x_k) \ge q. If q = 0 then we define G(0) = a - 1. This definition allows random variates to be defined in the same way as with continuous rv's, using the inverse cdf on a uniform distribution to generate random variates.
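This modified inverse (the smallest x_k with F(x_k) >= q) is straightforward to implement for a finite support, and feeding it uniform variates produces draws with the right frequencies. The support, probabilities, and helper name below are illustrative:

```python
import random

xk = [0, 1, 2]
pk = [0.25, 0.5, 0.25]
F = [0.25, 0.75, 1.0]   # cumulative distribution at each support point

def ppf(q, a=0):
    """Smallest x_k for which F(x_k) >= q; G(0) = a - 1 by convention."""
    if q == 0:
        return a - 1
    for x, Fx in zip(xk, F):
        if Fx >= q:
            return x
    return xk[-1]

# Inverse-cdf sampling: uniform draws are mapped onto the support.
draws = [ppf(random.random()) for _ in range(10000)]
```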
The inverse survival function is the inverse of the survival function,

Z(\alpha) = S^{-1}(\alpha) = G(1 - \alpha),

and is thus the smallest non-negative integer k for which F(k) \ge 1 - \alpha, or the smallest non-negative integer k for which S(k) \le \alpha.
If desired, the hazard function and the cumulative hazard function could be defined as

h(x_k) = \frac{p(x_k)}{1 - F(x_k)}

and

H(x) = \sum_{x_k \le x} h(x_k) = \sum_{x_k \le x} \frac{p(x_k)}{1 - F(x_k)}
Non-central moments are defined using the PDF:

\mu_m' = E[X^m] = \sum_k x_k^m p(x_k)

Central moments are computed similarly (with \mu = \mu_1'):

\mu_m = E[(X - \mu)^m] = \sum_k (x_k - \mu)^m p(x_k) = \sum_{k=0}^{m} \binom{m}{k} (-\mu)^k \mu_{m-k}'

The mean is the first moment,

\mu = \mu_1' = E[X] = \sum_k x_k p(x_k),

the variance is the second central moment,

\mu_2 = E[(X - \mu)^2] = \sum_k x_k^2 p(x_k) - \mu^2.

Skewness is defined as

\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}},

while (Fisher) kurtosis is

\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3,

so that a normal distribution has a kurtosis of zero.
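These definitions can be verified by direct summation over a finite support; the sketch below computes the moments of a fair six-sided die and matches the known closed-form values:

```python
xk = range(1, 7)       # faces of a fair die
pk = [1 / 6] * 6

mu = sum(x * p for x, p in zip(xk, pk))              # first moment (mean)
mu2 = sum((x - mu)**2 * p for x, p in zip(xk, pk))   # second central moment
mu3 = sum((x - mu)**3 * p for x, p in zip(xk, pk))
mu4 = sum((x - mu)**4 * p for x, p in zip(xk, pk))

g1 = mu3 / mu2**1.5    # skewness (zero by symmetry)
g2 = mu4 / mu2**2 - 3  # Fisher kurtosis
```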
The moment generating function is defined as

M_X(t) = E[e^{tX}] = \sum_{x_k} e^{t x_k} p(x_k)

Moments are found as the derivatives of the moment generating function evaluated at t = 0.
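For a finite support the mgf is a finite sum, and the derivative property M'(0) = E[X] can be checked with a central-difference approximation (the pmf below is an arbitrary example):

```python
from math import exp

xk = [0, 1, 2]
pk = [0.25, 0.5, 0.25]

def M(t):
    """Moment generating function: E[exp(tX)] as a finite sum over the support."""
    return sum(exp(t * x) * p for x, p in zip(xk, pk))

mean = sum(x * p for x, p in zip(xk, pk))

# M(0) = 1 and M'(0) = E[X], the latter approximated by a central difference.
h = 1e-6
deriv = (M(h) - M(-h)) / (2 * h)
```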
To fit data to a distribution, maximizing the likelihood function is common. Alternatively, some distributions have well-known minimum variance unbiased estimators. These will be chosen by default, but the likelihood function will always be available for minimizing.
If f(x; \theta) is the PDF of a random variable where \theta is a vector of parameters (e.g. L and S), then for a collection of N independent samples from this distribution, the joint distribution of the random vector x is

f(x; \theta) = \prod_{i=1}^{N} f(x_i; \theta).

The maximum likelihood estimates of the parameters are the parameters which maximize this function with x fixed and given by the data:

\hat{\theta} = \arg\max_{\theta} f(x; \theta) = \arg\min_{\theta} l_x(\theta),
where

l_x(\theta) = -\sum_{i=1}^{N} \log f(x_i; \theta).

Note that

\binom{n}{k} = \frac{n!}{k!(n-k)!}

and has special cases of

\binom{n}{0} = 1

and

\binom{n}{n} = 1.

If k < 0 or k > n, we define

\binom{n}{k} = 0.
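As a concrete instance of likelihood fitting, for i.i.d. Poisson samples the negative log-likelihood is minimized at the sample mean; the grid search below confirms this numerically (the data and grid are hypothetical, and this is a sketch rather than SciPy's fitting machinery):

```python
from math import log, lgamma

data = [2, 3, 1, 4, 2, 5, 3, 2]   # hypothetical observed counts
n = len(data)
xbar = sum(data) / n              # sample mean: the known Poisson MLE

def nll(lam):
    """Negative log-likelihood l_x(lambda) for i.i.d. Poisson samples.

    log f(k; lam) = -lam + k*log(lam) - log(k!)
    """
    return -sum(-lam + k * log(lam) - lgamma(k + 1) for k in data)

# Coarse grid search over candidate rates; the minimum lands on xbar.
grid = [0.05 * i for i in range(1, 201)]
best = min(grid, key=nll)
```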
A Bernoulli random variable of parameter p takes one of only two values, X = 0 or X = 1. The probability of success (X = 1) is p, and the probability of failure (X = 0) is 1 - p. It can be thought of as a binomial random variable with n = 1. The PMF is p(k) = 0 for k \ne 0, 1 and

p(k; p) = \begin{cases} 1 - p & k = 0 \\ p & k = 1 \end{cases}

with mean \mu = p and variance \mu_2 = p(1 - p).
A binomial random variable with parameters (n, p) can be described as the sum of n independent Bernoulli random variables of parameter p:

X = \sum_{i=1}^{n} B_i.

Therefore, this random variable counts the number of successes in n independent trials of a random experiment where the probability of success is p:

p(k; n, p) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k \in \{0, 1, \ldots, n\},

F(x; n, p) = \sum_{k \le x} \binom{n}{k} p^k (1 - p)^{n-k} = I_{1-p}(n - \lfloor x \rfloor, \lfloor x \rfloor + 1), \quad x \ge 0,

where the incomplete beta integral is

I_x(a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \int_0^x t^{a-1} (1 - t)^{b-1} \, dt.

Now

\mu = np, \quad \mu_2 = np(1 - p), \quad M(t) = \left(1 - p(1 - e^t)\right)^n.
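The binomial pmf C(n, k) p^k (1-p)^{n-k} and its moments np and np(1-p) can be checked directly with `math.comb` (n = 10 and p = 1/2 are hypothetical parameter choices):

```python
from math import comb

n, p = 10, 0.5

def pmf(k):
    """Binomial pmf: C(n, k) * p**k * (1-p)**(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

total = sum(pmf(k) for k in range(n + 1))                # normalization
mean = sum(k * pmf(k) for k in range(n + 1))             # n * p
var = sum((k - mean)**2 * pmf(k) for k in range(n + 1))  # n * p * (1 - p)
```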
The Planck distribution is a discrete exponential distribution with pmf

p(k; \lambda) = (1 - e^{-\lambda}) e^{-\lambda k}, \quad k = 0, 1, 2, \ldots, \quad \lambda > 0,

named Planck because of its relationship to the black-body problem he solved.
The Poisson random variable counts the number of successes in n independent Bernoulli trials in the limit as n \to \infty and p \to 0, where the probability of success in each trial is p and np = \lambda \ge 0 is a constant. It can be used to approximate the binomial random variable or in its own right to count the number of events that occur in the interval [0, t] for a process satisfying certain "sparsity" constraints. The functions are

p(k; \lambda) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k \ge 0,

F(x; \lambda) = \sum_{n=0}^{\lfloor x \rfloor} e^{-\lambda} \frac{\lambda^n}{n!},

\mu = \lambda, \quad \mu_2 = \lambda, \quad \gamma_1 = \frac{1}{\sqrt{\lambda}}, \quad \gamma_2 = \frac{1}{\lambda}, \quad M(t) = \exp\left(\lambda (e^t - 1)\right).
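The limiting relationship to the binomial can be observed numerically: for large n and small p with np = lambda fixed, the two pmfs nearly agree (n = 10000 and lambda = 3 are illustrative choices):

```python
from math import comb, exp, factorial

lam, n = 3.0, 10000
p = lam / n

def poisson_pmf(k):
    return exp(-lam) * lam**k / factorial(k)

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Compare the two pmfs at small counts; the gap shrinks as n grows.
gaps = [abs(poisson_pmf(k) - binom_pmf(k)) for k in range(10)]
```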
The geometric random variable with parameter p \in (0, 1) can be defined as the number of trials required to obtain a success, where the probability of success on each trial is p. Thus,

p(k; p) = (1 - p)^{k-1} p, \quad k \ge 1,

F(x; p) = 1 - (1 - p)^{\lfloor x \rfloor}, \quad x \ge 1,

\mu = \frac{1}{p}, \quad \mu_2 = \frac{1 - p}{p^2}.
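For the geometric distribution the cdf has the closed form 1 - (1 - p)^floor(x), and the ceiling-form inverse ceil(log(1 - q) / log(1 - p)) is the standard way to invert it for sampling; a quick check with p = 0.3 (an arbitrary choice):

```python
from math import log, ceil

p = 0.3

def pmf(k):
    return (1 - p)**(k - 1) * p     # k = 1, 2, ...

def cdf(k):
    return 1 - (1 - p)**k           # closed form

def ppf(q):
    """Smallest k with cdf(k) >= q, via the ceiling formula."""
    return ceil(log(1 - q) / log(1 - p))

# Partial sums of the pmf agree with the closed-form cdf.
partial = sum(pmf(k) for k in range(1, 11))
```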
The negative binomial random variable with parameters n and p \in (0, 1) can be defined as the number of extra independent trials (beyond n) required to accumulate a total of n successes where the probability of a success on each trial is p. Equivalently, this random variable is the number of failures encountered while accumulating n successes during independent trials of an experiment that succeeds with probability p. Thus,

p(k; n, p) = \binom{k + n - 1}{n - 1} p^n (1 - p)^k, \quad k \ge 0,

F(x; n, p) = \sum_{i=0}^{\lfloor x \rfloor} \binom{i + n - 1}{i} p^n (1 - p)^i = I_p(n, \lfloor x \rfloor + 1), \quad x \ge 0.

Recall that I_x(a, b) is the incomplete beta integral.
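Summing the pmf C(k + n - 1, n - 1) p^n (1 - p)^k confirms the normalization and the mean n(1 - p)/p (n = 5 and p = 0.4 are hypothetical values; the sum is truncated where the tail is numerically negligible):

```python
from math import comb

n, p = 5, 0.4

def pmf(k):
    """Probability of k failures before the n-th success."""
    return comb(k + n - 1, n - 1) * p**n * (1 - p)**k

ks = range(200)   # far enough into the tail for these parameters
total = sum(pmf(k) for k in ks)
mean = sum(k * pmf(k) for k in ks)   # should approach n*(1-p)/p = 7.5
```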
The hypergeometric random variable with parameters (M, n, N) counts the number of "good" objects in a sample of size N chosen without replacement from a population of M objects, where n is the number of "good" objects in the total population:

p(k; N, n, M) = \frac{\binom{n}{k} \binom{M - n}{N - k}}{\binom{M}{N}}, \quad \max(0, N - (M - n)) \le k \le \min(n, N),

where (defining m = M - n)

\mu = \frac{nN}{M}, \quad \mu_2 = \frac{nNm(M - N)}{M^2 (M - 1)}.
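A direct check with `math.comb`, using M = 20 objects of which n = 7 are good and a sample of size N = 12 (all hypothetical values):

```python
from math import comb

M, n, N = 20, 7, 12

def pmf(k):
    """Hypergeometric pmf: C(n,k) * C(M-n, N-k) / C(M, N)."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)

lo = max(0, N - (M - n))   # smallest feasible count of good objects
hi = min(n, N)             # largest feasible count
total = sum(pmf(k) for k in range(lo, hi + 1))
mean = sum(k * pmf(k) for k in range(lo, hi + 1))   # n*N/M = 4.2
```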
A random variable has the zeta distribution (also called the zipf distribution) with parameter \alpha > 1 if its probability mass function is given by

p(k; \alpha) = \frac{1}{\zeta(\alpha) k^{\alpha}}, \quad k \ge 1,

where

\zeta(\alpha) = \sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}

is the Riemann zeta function. Other functions of this distribution are

F(x; \alpha) = \frac{1}{\zeta(\alpha)} \sum_{k=1}^{\lfloor x \rfloor} \frac{1}{k^{\alpha}}, \quad M(t) = \frac{\mathrm{Li}_{\alpha}(e^t)}{\zeta(\alpha)},

where e^t < 1 and \mathrm{Li}_{\alpha}(z) is the polylogarithm function of order \alpha defined as

\mathrm{Li}_{\alpha}(z) = \sum_{k=1}^{\infty} \frac{z^k}{k^{\alpha}}.
The logarithmic distribution with parameter p has a probability mass function with terms proportional to the Taylor series expansion of \log(1 - p):

p(k; p) = -\frac{p^k}{k \log(1 - p)}, \quad k \ge 1,

F(x; p) = -\frac{1}{\log(1 - p)} \sum_{k=1}^{\lfloor x \rfloor} \frac{p^k}{k} = 1 + \frac{p^{1 + \lfloor x \rfloor}}{\log(1 - p)} \Phi(p, 1, 1 + \lfloor x \rfloor),

where

\Phi(z, s, a) = \sum_{k=0}^{\infty} \frac{z^k}{(a + k)^s}

is the Lerch Transcendent. Also define r = \log(1 - p). Thus,

\mu = -\frac{p}{(1 - p) r}, \quad \mu_2 = -p \, \frac{p + r}{(1 - p)^2 r^2}.
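Because the terms are proportional to the Taylor series of log(1 - p), the pmf sums to one, and the mean has the closed form -p / ((1 - p) log(1 - p)); a truncated-sum check with p = 0.5 (an arbitrary choice):

```python
from math import log

p = 0.5

def pmf(k):
    """Log-series pmf: -p**k / (k * log(1 - p)), k = 1, 2, ..."""
    return -p**k / (k * log(1 - p))

# Truncate where the geometric tail is numerically negligible.
total = sum(pmf(k) for k in range(1, 100))
mean = sum(k * pmf(k) for k in range(1, 100))
mu = -p / ((1 - p) * log(1 - p))   # closed-form mean
```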
The discrete uniform distribution with parameters (a, b) constructs a random variable that has an equal probability of being any one of the integers in the half-open range [a, b). If a is not given it is assumed to be zero and the only parameter is b. Therefore,

p(k; a, b) = \frac{1}{b - a}, \quad a \le k < b,

F(x; a, b) = \frac{\lfloor x \rfloor - a + 1}{b - a}, \quad a \le x < b,

\mu = \frac{a + b - 1}{2}.
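A quick check of the uniform pmf over the half-open integer range [a, b) (a = 2 and b = 8 are chosen arbitrarily):

```python
a, b = 2, 8
support = range(a, b)          # integers a, ..., b - 1

pmf = 1 / (b - a)              # equal probability on each integer
total = pmf * len(support)     # normalization
mean = sum(k * pmf for k in support)   # (a + b - 1) / 2 = 4.5
```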
The discrete Laplacian distribution is defined over all integers for a > 0:

p(k; a) = \tanh\left(\frac{a}{2}\right) e^{-a|k|}.

Thus, the non-central moments are

\mu_n' = \tanh\left(\frac{a}{2}\right) \sum_{k=-\infty}^{\infty} k^n e^{-a|k|} = \begin{cases} 0 & n \text{ odd} \\ 2 \tanh\left(\frac{a}{2}\right) \mathrm{Li}_{-n}(e^{-a}) & n \text{ even} \end{cases}

where \mathrm{Li}_{-n}(x) is the polylogarithm function of order -n evaluated at x.
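Assuming the pmf p(k) = tanh(a/2) exp(-a|k|) used by SciPy's `dlaplace`, the normalizing constant can be verified numerically, since the geometric tails make the infinite sum easy to truncate (a = 0.8 is an arbitrary choice):

```python
from math import exp, tanh

a = 0.8

def pmf(k):
    """Discrete Laplacian pmf: tanh(a/2) * exp(-a*|k|) over all integers."""
    return tanh(a / 2) * exp(-a * abs(k))

# Truncate the doubly infinite sum; the tails decay like exp(-a*|k|).
total = sum(pmf(k) for k in range(-100, 101))
mean = sum(k * pmf(k) for k in range(-100, 101))   # zero by symmetry
```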