fit
- rv_histogram.fit(data, *args, **kwds)
Return estimates of shape (if applicable), location, and scale parameters from data. The default estimation method is Maximum Likelihood Estimation (MLE), but Method of Moments (MM) is also available.
Starting estimates for the fit are given by input arguments; for any arguments not provided with starting estimates, self._fitstart(data) is called to generate them.
One can hold some parameters fixed to specific values by passing in keyword arguments f0, f1, …, fn (for shape parameters) and floc and fscale (for location and scale parameters, respectively).
- Parameters:
- data : array_like or CensoredData instance
  Data to use in estimating the distribution parameters.
- arg1, arg2, arg3,… : floats, optional
  Starting value(s) for any shape-characterizing arguments (those not provided will be determined by a call to _fitstart(data)). No default value.
- **kwds : floats, optional
  loc: initial guess of the distribution’s location parameter.
  scale: initial guess of the distribution’s scale parameter.
  Special keyword arguments are recognized as holding certain parameters fixed:
  f0…fn : hold respective shape parameters fixed. Alternatively, shape parameters to fix can be specified by name. For example, if self.shapes == "a, b", fa and fix_a are equivalent to f0, and fb and fix_b are equivalent to f1.
  floc : hold location parameter fixed to specified value.
  fscale : hold scale parameter fixed to specified value.
  optimizer : The optimizer to use. The optimizer must take func and starting position as the first two arguments, plus args (for extra arguments to pass to the function to be optimized) and disp. The fit method calls the optimizer with disp=0 to suppress output. The optimizer must return the estimated parameters.
  method : The method to use. The default is “MLE” (Maximum Likelihood Estimate); “MM” (Method of Moments) is also available.
- Returns:
- parameter_tuple : tuple of floats
  Estimates for any shape parameters (if applicable), followed by those for location and scale. For most random variables, shape statistics will be returned, but there are exceptions (e.g. norm).
- Raises:
- TypeError, ValueError
  If an input is invalid.
- FitError
  If fitting fails or the fit produced would be invalid.
Notes
With method="MLE" (default), the fit is computed by minimizing the negative log-likelihood function. A large, finite penalty (rather than infinite negative log-likelihood) is applied for observations beyond the support of the distribution.
With method="MM", the fit is computed by minimizing the L2 norm of the relative errors between the first k raw (about zero) data moments and the corresponding distribution moments, where k is the number of non-fixed parameters. More precisely, the objective function is:
    (((data_moments - dist_moments) / np.maximum(np.abs(data_moments), 1e-8))**2).sum()
where the constant 1e-8 avoids division by zero in case of vanishing data moments. Typically, this error norm can be reduced to zero. Note that the standard method of moments can produce parameters for which some data are outside the support of the fitted distribution; this implementation does nothing to prevent this.
For either method, the returned answer is not guaranteed to be globally optimal; it may only be locally optimal, or the optimization may fail altogether. If the data contain any of np.nan, np.inf, or -np.inf, the fit method will raise a RuntimeError.
When passing a CensoredData instance to data, the log-likelihood function is defined as:
\[\begin{split}
l(\pmb{\theta}; k) & = \sum \log(f(k_u; \pmb{\theta})) + \sum \log(F(k_l; \pmb{\theta})) \\
& + \sum \log(1 - F(k_r; \pmb{\theta})) \\
& + \sum \log(F(k_{\text{high}, i}; \pmb{\theta}) - F(k_{\text{low}, i}; \pmb{\theta}))
\end{split}\]
where \(f\) and \(F\) are the pdf and cdf, respectively, of the function being fitted, \(\pmb{\theta}\) is the parameter vector, \(u\) are the indices of uncensored observations, \(l\) are the indices of left-censored observations, \(r\) are the indices of right-censored observations, subscripts “low”/“high” denote endpoints of interval-censored observations, and \(i\) are the indices of interval-censored observations.
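The censored log-likelihood above can be evaluated by hand to see how each kind of observation contributes. The following is a minimal sketch for a normal model, not the routine that fit uses internally; the helper name censored_loglik and its argument layout are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm


def censored_loglik(theta, uncensored, left, right, interval):
    """Evaluate the four-term censored log-likelihood for a normal model.

    theta      : (loc, scale) parameter vector
    uncensored : exactly observed values k_u
    left       : upper bounds k_l of left-censored observations
    right      : lower bounds k_r of right-censored observations
    interval   : rows of [k_low, k_high] for interval-censored observations
    """
    loc, scale = theta
    dist = norm(loc=loc, scale=scale)
    interval = np.asarray(interval, dtype=float).reshape(-1, 2)
    return (
        np.sum(dist.logpdf(uncensored))   # sum of log f(k_u)
        + np.sum(dist.logcdf(left))       # sum of log F(k_l)
        + np.sum(dist.logsf(right))       # sum of log(1 - F(k_r))
        + np.sum(np.log(dist.cdf(interval[:, 1]) - dist.cdf(interval[:, 0])))
    )


# One observation of each kind, evaluated under a standard normal.
ll = censored_loglik((0.0, 1.0), uncensored=[0.0], left=[0.0],
                     right=[0.0], interval=[[-1.0, 1.0]])
```

Here logsf is used for the right-censored term because it computes log(1 - F) directly, which tends to be more accurate in the upper tail than forming 1 - cdf first.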
Examples
Generate some data to fit: draw random variates from the beta distribution:
>>> import numpy as np
>>> from scipy.stats import beta
>>> a, b = 1., 2.
>>> rng = np.random.default_rng()
>>> x = beta.rvs(a, b, size=1000, random_state=rng)
Now we can fit all four parameters (a, b, loc and scale):
>>> a1, b1, loc1, scale1 = beta.fit(x)
>>> a1, b1, loc1, scale1
(1.0198945204435628, 1.9484708982737828, 4.372241314917588e-05, 0.9979078845964814)
The fit can also be done using a custom optimizer:
>>> from scipy.optimize import minimize
>>> def custom_optimizer(func, x0, args=(), disp=0):
...     res = minimize(func, x0, args, method="slsqp", options={"disp": disp})
...     if res.success:
...         return res.x
...     raise RuntimeError('optimization routine failed')
>>> a1, b1, loc1, scale1 = beta.fit(x, method="MLE", optimizer=custom_optimizer)
>>> a1, b1, loc1, scale1
(1.0198821087258905, 1.948484145914738, 4.3705304486881485e-05, 0.9979104663953395)
We can also use some prior knowledge about the dataset: let’s keep loc and scale fixed:
>>> a1, b1, loc1, scale1 = beta.fit(x, floc=0, fscale=1)
>>> loc1, scale1
(0, 1)
We can also keep shape parameters fixed by using f-keywords. To keep the zero-th shape parameter a equal to 1, use f0=1 or, equivalently, fa=1:
>>> a1, b1, loc1, scale1 = beta.fit(x, fa=1, floc=0, fscale=1)
>>> a1
1
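The Parameters section states that, for a distribution with shapes "a, b", the keywords f0, fa and fix_a all fix the same shape parameter. The sketch below checks that equivalence on the beta distribution; the seed and sample size are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.stats import beta

# Seeded sample so that the three fits below see identical data.
rng = np.random.default_rng(0)
x = beta.rvs(1.0, 2.0, size=500, random_state=rng)

# Three spellings of "hold the first shape parameter a at 1".
fit_by_index = beta.fit(x, f0=1, floc=0, fscale=1)
fit_by_name = beta.fit(x, fa=1, floc=0, fscale=1)
fit_by_fix = beta.fit(x, fix_a=1, floc=0, fscale=1)

# Only b is actually estimated; a, loc and scale come back as given.
b_estimate = fit_by_index[1]
```

Fixing a parameter by name tends to read more clearly than f0 when a distribution has several shape parameters, since the reader does not need to recall their order.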
Not all distributions return estimates for the shape parameters. norm, for example, just returns estimates for location and scale:
>>> from scipy.stats import norm
>>> x = norm.rvs(a, b, size=1000, random_state=123)
>>> loc1, scale1 = norm.fit(x)
>>> loc1, scale1
(0.92087172783841631, 2.0015750750324668)
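The Notes describe the objective minimized when method="MM". As a rough illustration, the sketch below re-implements that objective for a frozen distribution and evaluates it at the closed-form moment estimates of a normal sample, where it should vanish up to rounding. The helper moment_objective is a hypothetical name, not part of SciPy, and the seed is arbitrary.

```python
import numpy as np
from scipy.stats import norm


def moment_objective(data, dist, k):
    """Squared relative error between the first k raw data and model moments."""
    orders = range(1, k + 1)
    data_moments = np.array([np.mean(data**n) for n in orders])
    dist_moments = np.array([dist.moment(n) for n in orders])
    return (((data_moments - dist_moments)
             / np.maximum(np.abs(data_moments), 1e-8))**2).sum()


rng = np.random.default_rng(12345)
x = norm.rvs(loc=1.0, scale=2.0, size=1000, random_state=rng)

# For a normal model, matching the first two raw moments gives the sample
# mean and the population (ddof=0) standard deviation in closed form.
loc_closed, scale_closed = x.mean(), x.std()
objective_at_closed_form = moment_objective(
    x, norm(loc=loc_closed, scale=scale_closed), k=2)

# The numerical method-of-moments fit should land close to the same values.
loc_mm, scale_mm = norm.fit(x, method="MM")
```

With both loc and scale free, k is 2, so only the first two raw moments enter the objective.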