CensoredData#
- class scipy.stats.CensoredData(uncensored=None, *, left=None, right=None, interval=None)[source]#
Instances of this class represent censored data.
Instances may be passed to the
fit
method of continuous univariate SciPy distributions for maximum likelihood estimation. The only method of the univariate continuous distributions that understandsCensoredData
is thefit
method. An instance ofCensoredData
can not be passed to methods such aspdf
andcdf
.An observation is said to be censored when the precise value is unknown, but it has a known upper and/or lower bound. The conventional terminology is:
left-censored: an observation is below a certain value but it is unknown by how much.
right-censored: an observation is above a certain value but it is unknown by how much.
interval-censored: an observation lies somewhere on an interval between two values.
Left-, right-, and interval-censored data can be represented by
CensoredData
.For convenience, the class methods
left_censored
andright_censored
are provided to create aCensoredData
instance from a single one-dimensional array of measurements and a corresponding boolean array to indicate which measurements are censored. The class methodinterval_censored
accepts two one-dimensional arrays that hold the lower and upper bounds of the intervals.- Parameters:
- uncensoredarray_like, 1D
Uncensored observations.
- leftarray_like, 1D
Left-censored observations.
- rightarray_like, 1D
Right-censored observations.
- intervalarray_like, 2D, with shape (m, 2)
Interval-censored observations. Each row
interval[k, :]
represents the interval for the kth interval-censored observation.
Methods
__len__
()The number of values (censored and not censored).
interval_censored
(low, high)Create a
CensoredData
instance of interval-censored data.left_censored
(x, censored)Create a
CensoredData
instance of left-censored data.Number of censored values.
right_censored
(x, censored)Create a
CensoredData
instance of right-censored data.Notes
In the input array interval, the lower bound of the interval may be
-inf
, and the upper bound may beinf
, but at least one must be finite. When the lower bound is-inf
, the row represents a left- censored observation, and when the upper bound isinf
, the row represents a right-censored observation. If the length of an interval is 0 (i.e.interval[k, 0] == interval[k, 1]
, the observation is treated as uncensored. So one can represent all the types of censored and uncensored data ininterval
, but it is generally more convenient to use uncensored, left and right for uncensored, left-censored and right-censored observations, respectively.Examples
In the most general case, a censored data set may contain values that are left-censored, right-censored, interval-censored, and uncensored. For example, here we create a data set with five observations. Two are uncensored (values 1 and 1.5), one is a left-censored observation of 0, one is a right-censored observation of 10 and one is interval-censored in the interval [2, 3].
>>> import numpy as np >>> from scipy.stats import CensoredData >>> data = CensoredData(uncensored=[1, 1.5], left=[0], right=[10], ... interval=[[2, 3]]) >>> print(data) CensoredData(5 values: 2 not censored, 1 left-censored, 1 right-censored, 1 interval-censored)
Equivalently,
>>> data = CensoredData(interval=[[1, 1], ... [1.5, 1.5], ... [-np.inf, 0], ... [10, np.inf], ... [2, 3]]) >>> print(data) CensoredData(5 values: 2 not censored, 1 left-censored, 1 right-censored, 1 interval-censored)
A common case is to have a mix of uncensored observations and censored observations that are all right-censored (or all left-censored). For example, consider an experiment in which six devices are started at various times and left running until they fail. Assume that time is measured in hours, and the experiment is stopped after 30 hours, even if all the devices have not failed by that time. We might end up with data such as this:
Device Start-time Fail-time Time-to-failure 1 0 13 13 2 2 24 22 3 5 22 17 4 8 23 15 5 10 *** >20 6 12 *** >18
Two of the devices had not failed when the experiment was stopped; the observations of the time-to-failure for these two devices are right-censored. We can represent this data with
>>> data = CensoredData(uncensored=[13, 22, 17, 15], right=[20, 18]) >>> print(data) CensoredData(6 values: 4 not censored, 2 right-censored)
Alternatively, we can use the method
CensoredData.right_censored
to create a representation of this data. The time-to-failure observations are put the listttf
. Thecensored
list indicates which values inttf
are censored.>>> ttf = [13, 22, 17, 15, 20, 18] >>> censored = [False, False, False, False, True, True]
Pass these lists to
CensoredData.right_censored
to create an instance ofCensoredData
.>>> data = CensoredData.right_censored(ttf, censored) >>> print(data) CensoredData(6 values: 4 not censored, 2 right-censored)
If the input data is interval censored and already stored in two arrays, one holding the low end of the intervals and another holding the high ends, the class method
interval_censored
can be used to create theCensoredData
instance.This example creates an instance with four interval-censored values. The intervals are [10, 11], [0.5, 1], [2, 3], and [12.5, 13.5].
>>> a = [10, 0.5, 2, 12.5] # Low ends of the intervals >>> b = [11, 1.0, 3, 13.5] # High ends of the intervals >>> data = CensoredData.interval_censored(low=a, high=b) >>> print(data) CensoredData(4 values: 0 not censored, 4 interval-censored)
Finally, we create and censor some data from the
weibull_min
distribution, and then fitweibull_min
to that data. We’ll assume that the location parameter is known to be 0.>>> from scipy.stats import weibull_min >>> rng = np.random.default_rng()
Create the random data set.
>>> x = weibull_min.rvs(2.5, loc=0, scale=30, size=250, random_state=rng) >>> x[x > 40] = 40 # Right-censor values greater or equal to 40.
Create the
CensoredData
instance with theright_censored
method. The censored values are those where the value is 40.>>> data = CensoredData.right_censored(x, x == 40) >>> print(data) CensoredData(250 values: 215 not censored, 35 right-censored)
35 values have been right-censored.
Fit
weibull_min
to the censored data. We expect to shape and scale to be approximately 2.5 and 30, respectively.>>> weibull_min.fit(data, floc=0) (2.3575922823897315, 0, 30.40650074451254)