scipy.stats.mstats.winsorize
- scipy.stats.mstats.winsorize(a, limits=None, inclusive=(True, True), inplace=False, axis=None, nan_policy='propagate')
- Returns a Winsorized version of the input array.
- The smallest limits[0] fraction of the values is set to the smallest value that is not replaced (roughly the limits[0] quantile), and the largest limits[1] fraction is set to the largest value that is not replaced (roughly the 1 - limits[1] quantile). Masked values are skipped.
- Parameters
- a : sequence
- Input array. 
- limits : {None, tuple of float}, optional
- Tuple of the fractions to winsorize on each side of the array, relative to the number of unmasked values, given as floats between 0. and 1. If n is the number of unmasked values, the (n*limits[0]) smallest values and the (n*limits[1]) largest values are replaced, so about n*(1. - sum(limits)) values are left unchanged. Either limit can be set to None to leave that side untouched (an open interval).
- inclusive : {(True, True) tuple}, optional
- Tuple indicating whether the number of values replaced on each side is obtained by truncating (True) or rounding (False) n*limits[i].
- inplace : {False, True}, optional
- Whether to winsorize in place (True) or to use a copy (False).
- axis : {None, int}, optional
- Axis along which to winsorize. If None, the whole array is winsorized as if it were flattened, but its shape is maintained.
- nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional
- Defines how to handle input that contains nan. The following options are available (default is ‘propagate’):
  - ‘propagate’: allows nan values and may overwrite or propagate them
  - ‘raise’: throws an error
  - ‘omit’: performs the calculations ignoring nan values
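The counting rule described under limits and inclusive can be sketched in plain NumPy. This is a minimal illustration, not SciPy's implementation: the helper name winsorize_1d_sketch is hypothetical, and it assumes a 1-D array with no masked values or NaNs and limits whose sum is below 1.

```python
import numpy as np


def winsorize_1d_sketch(values, limits=(None, None), inclusive=(True, True)):
    """Hypothetical helper: winsorize a 1-D array without masks or NaNs.

    Assumes 0 <= limits[i] < 1 and sum(limits) < 1; None skips that side.
    """
    x = np.asarray(values).copy()          # never modify the caller's data
    n = x.size
    order = np.argsort(x, kind="stable")   # indices that sort x ascending
    low, up = limits

    def count(frac, truncate):
        # Number of values to replace on one side: n*frac truncated or rounded.
        return int(frac * n) if truncate else int(np.round(frac * n))

    if low:
        k_low = count(low, inclusive[0])
        # The k_low smallest values take the smallest value that is kept.
        x[order[:k_low]] = x[order[k_low]]
    if up:
        k_up = count(up, inclusive[1])
        # The k_up largest values take the largest value that is kept.
        x[order[n - k_up:]] = x[order[n - k_up - 1]]
    return x


a = np.array([10, 4, 9, 8, 5, 3, 7, 2, 1, 6])
print(winsorize_1d_sketch(a, (0.1, 0.2)))                                # [8 4 8 8 5 3 7 2 2 6]
print(winsorize_1d_sketch(a, (0.16, 0.16)))                              # truncation: one value per side
print(winsorize_1d_sketch(a, (0.16, 0.16), inclusive=(False, False)))    # rounding: two values per side
```

With n = 10 and a limit of 0.16, truncating 1.6 replaces one value on each side while rounding replaces two, which is exactly the difference the inclusive flags control.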
 
 
- Notes
- This function is used to reduce the effect of possibly spurious outliers by limiting the extreme values.
- Examples
>>> import numpy as np
>>> from scipy.stats.mstats import winsorize
- A shuffled array contains the integers from 1 to 10.
>>> a = np.array([10, 4, 9, 8, 5, 3, 7, 2, 1, 6])
- The lowest 10% of the values (i.e., 1) and the highest 20% of the values (i.e., 9 and 10) are replaced, by 2 and 8 respectively.
>>> winsorize(a, limits=[0.1, 0.2])
masked_array(data=[8, 4, 8, 8, 5, 3, 7, 2, 2, 6],
             mask=False,
       fill_value=999999)
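To illustrate the Notes and the limits and axis parameters, the sketch below calls winsorize on small made-up arrays (the data are illustrative only). It compares a mean before and after winsorizing, winsorizes only the upper tail by passing None as the lower limit, and winsorizes each row of a 2-D array independently with axis=1.

```python
import numpy as np
from scipy.stats.mstats import winsorize

# A small sample with one extreme value: 100 dominates the plain mean.
x = np.array([1, 2, 3, 4, 100])
w = winsorize(x, limits=(0.2, 0.2))   # 1 -> 2 and 100 -> 4; x itself is not modified
print(x.mean(), w.mean())             # 22.0 3.0

# limits=(None, 0.2): only the upper tail is winsorized (9 and 10 -> 8).
a = np.array([10, 4, 9, 8, 5, 3, 7, 2, 1, 6])
upper_only = winsorize(a, limits=(None, 0.2))

# axis=1: each row is winsorized on its own.
b = np.array([[1, 2, 3, 4, 100],
              [5, 6, 7, 8, -50]])
rows = winsorize(b, limits=(0.2, 0.2), axis=1)
# Rows become [2, 2, 3, 4, 4] and [5, 6, 7, 7, 5].
```

Because each value is replaced rather than removed, the result keeps the input's shape, which is what lets the winsorized mean be compared directly with the original one.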