Adding New Methods, Functions, and Classes

While adding code to SciPy is in most cases quite straightforward, there are a few places where that is not the case. This document contains detailed information on some specific situations where it may not be clear from the outset what the task involves.

Adding A New Statistics Distribution

For hundreds of years, statisticians, mathematicians and scientists have needed to understand, analyze and model data. This has led to a plethora of statistical distributions, many of which are related to one another. Modeling new types of data continues to give rise to new distributions, as does the application of theoretical considerations to new disciplines. SciPy models about a dozen discrete distributions (see Discrete Statistical Distributions) and 100 continuous distributions (see Continuous Statistical Distributions).

To add a new distribution, a good reference is needed. SciPy typically uses [JKB] as its gold standard, with Wikipedia's distribution articles often providing some extra details and/or graphical plots.

How to create a new continuous distribution

There are a few steps involved in adding a continuous distribution to SciPy (adding a discrete distribution is similar). We’ll use the fictitious “Squirrel” distribution in the instructions below.

Before Implementation

  1. See if Squirrel has already been implemented. If it has, that saves a lot of effort!

    • It may have been implemented with a different name.

    • It may have been implemented with a different parameterization (shape parameters).

    • It may be a specialization of a more general family of distributions.

    It is very common for multiple disciplines to discover or rediscover a distribution (or a specialization or different parameterization of it). A few existing SciPy distributions are specializations of other distributions; for example, the scipy.stats.arcsine distribution is a specialization of the scipy.stats.beta distribution. These duplicates exist for (very!) historical reasons and because of widespread usage. At this time, adding new specializations or reparametrizations of existing distributions to SciPy is not supported, mainly because of the increased user confusion such additions cause.

  2. Create a SciPy issue on GitHub, listing the distribution, references, and reasons for its inclusion.
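A cheap way to test whether a candidate is a specialization of an existing distribution is to compare densities numerically. For example, the arcsine density 1/(pi*sqrt(x*(1 - x))) on (0, 1) coincides with the beta density with both shape parameters equal to 1/2, which a short check with scipy.stats confirms:

```python
import numpy as np
from scipy import stats

# Evaluate both densities at interior points of the shared support (0, 1).
x = np.linspace(0.05, 0.95, 19)
arcsine_pdf = stats.arcsine.pdf(x)
beta_pdf = stats.beta.pdf(x, 0.5, 0.5)  # beta with a = b = 1/2

# Prints True: arcsine is the special case beta(1/2, 1/2).
print(np.allclose(arcsine_pdf, beta_pdf))
```

The same comparison, repeated for cdf values over a range of shape parameters, quickly reveals whether a proposed Squirrel is just a reparameterization of something SciPy already provides.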

Implementation

  1. Find an already existing distribution similar to Squirrel. Use its code as a template for Squirrel.

  2. Read the docstring for class rv_continuous in scipy/stats/_distn_infrastructure.py.

  3. Write the new code for class squirrel_gen and insert it into scipy/stats/_continuous_distns.py, which is in (mostly) alphabetical order by distribution name.

  4. Is the support of the distribution the entire real line? If not, the left and/or right endpoints a and b need to be specified in the call to squirrel_gen(name='squirrel', a=?, b=?).

  5. If the support depends upon the shape parameters, squirrel_gen._get_support() needs to be implemented.

  6. The default inherited _argcheck() implementation checks that the shape parameters are positive. If that is not the right validity condition for Squirrel, override _argcheck() with a more appropriate implementation.

  7. If squirrel_gen.ppf() is expensive to compute relative to squirrel_gen.pdf(), consider setting momtype=0 in the call to squirrel_gen(), so that generic moments are computed by integrating the pdf rather than the ppf.

  8. If squirrel_gen.rvs() is expensive to compute, consider implementing a specific squirrel_gen._rvs().

  9. Add the name to the listing in the docstring of scipy/stats/__init__.py.

  10. Add the name and a good set of example shape parameters to the distcont list in scipy/stats/_distr_params.py. These shape parameters are used both for testing and automatic documentation generation.

  11. Add the name and a deliberately invalid set of example shape parameters to the invdistcont list, also in _distr_params.py. These shape parameters are also used for testing.

  12. Add a TestSquirrel class and any specific tests to scipy/stats/tests/test_distributions.py.

  13. Run and pass(!) the tests.
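Steps 3, 4, 6 and 8 can be sketched together. Because Squirrel is fictitious, the sketch below assumes a hypothetical density f(x; c) = (c + 1) * x**c on [0, 1] with shape c > -1 (the same family as scipy.stats.powerlaw with shape c + 1, which makes the result easy to check). The finite support is declared through a and b, _argcheck() is overridden because c may be negative, and _rvs() uses NumPy's beta sampler since this density is Beta(c + 1, 1). It illustrates the rv_continuous hooks, not an actual SciPy distribution.

```python
import numpy as np
from scipy import stats
from scipy.stats import rv_continuous


class squirrel_gen(rv_continuous):
    """Hypothetical Squirrel distribution, for illustration only."""

    def _argcheck(self, c):
        # Valid shapes are c > -1, so the default "all shapes > 0" check
        # would wrongly reject values such as c = -0.5.
        return c > -1

    def _pdf(self, x, c):
        # Standardized density on [0, 1]; loc and scale are handled generically.
        return (c + 1) * x**c

    def _cdf(self, x, c):
        return x**(c + 1)

    def _ppf(self, q, c):
        # Closed-form inverse of the CDF, avoiding generic root finding.
        return q**(1.0 / (c + 1))

    def _rvs(self, c, size=None, random_state=None):
        # This density is Beta(c + 1, 1), so NumPy's beta sampler can be used.
        return random_state.beta(c + 1, 1.0, size)


# Finite support [0, 1]: pass the endpoints a and b to the constructor.
squirrel = squirrel_gen(a=0.0, b=1.0, name='squirrel')

x = np.linspace(0.1, 0.9, 9)
# Sanity check: Squirrel(c) should match scipy.stats.powerlaw with shape c + 1.
print(np.allclose(squirrel.cdf(x, 0.5), stats.powerlaw.cdf(x, 1.5)))
print(squirrel.pdf(0.5, -0.5))  # finite, because c = -0.5 passes _argcheck
```

With the default _argcheck(), squirrel.pdf(0.5, -0.5) would return nan; an invalid shape such as c = -2 still does.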
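Steps 5 and 7 can be sketched in the same way. Here the hypothetical assumption is that Squirrel's density is f(x; c) = 2x / c**2 on [0, c], so the upper endpoint depends on the shape and _get_support() must be overridden. No _ppf() is provided, so SciPy falls back to numerical root finding for the ppf; passing momtype=0 makes the generic moment calculation integrate the pdf instead.

```python
import numpy as np
from scipy.stats import rv_continuous


class squirrel_gen(rv_continuous):
    """Hypothetical Squirrel distribution with shape-dependent support [0, c]."""

    def _get_support(self, c):
        # Lower endpoint fixed at 0; the upper endpoint is the shape c itself.
        c = np.asarray(c, dtype=float)
        return np.zeros_like(c), c

    def _pdf(self, x, c):
        return 2.0 * x / c**2

    def _cdf(self, x, c):
        return (x / c)**2


# momtype=0: compute generic moments from the pdf, because the ppf here is
# only available through (comparatively expensive) numerical inversion.
squirrel = squirrel_gen(momtype=0, name='squirrel')

print(squirrel.support(3.0))  # lower and upper endpoints 0 and 3
print(squirrel.mean(3.0))     # numerically close to 2c/3 = 2.0
```

The inherited _argcheck() (all shapes positive) is already the right condition for c here, so it is not overridden.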
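For steps 12 and 13, a few generic consistency checks catch many implementation mistakes before the full test suite is run. The helper below is a hypothetical sketch, not part of SciPy's test suite: it checks that the pdf integrates to 1 over the support, that ppf inverts cdf, and that rvs() samples pass a Kolmogorov-Smirnov test. It is shown on scipy.stats.powerlaw, but any rv_continuous instance, such as a new squirrel, could be passed in the same way.

```python
import numpy as np
from scipy import integrate, stats


def basic_checks(dist, args, n=2000, seed=1234):
    """Hypothetical helper: quick consistency checks for a continuous distribution."""
    lo, hi = (float(v) for v in dist.support(*args))
    # The pdf should integrate to 1 over the support.
    total, _ = integrate.quad(dist.pdf, lo, hi, args=args)
    assert np.isclose(total, 1.0), total
    # ppf should invert cdf at interior quantiles.
    q = np.linspace(0.05, 0.95, 10)
    assert np.allclose(dist.cdf(dist.ppf(q, *args), *args), q)
    # Random variates should follow the cdf; the very low threshold keeps
    # false alarms from the Kolmogorov-Smirnov test rare.
    rng = np.random.default_rng(seed)
    sample = dist.rvs(*args, size=n, random_state=rng)
    pvalue = stats.kstest(sample, dist.cdf, args=args).pvalue
    assert pvalue > 1e-4, pvalue
    return pvalue


print(basic_checks(stats.powerlaw, (1.5,)))
```

Checks like these complement, rather than replace, the distribution-specific tests in a TestSquirrel class.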

After Implementation

  1. Add a tutorial page, doc/source/tutorial/stats/continuous_squirrel.rst.

  2. Add it to the listing of continuous distributions in doc/source/tutorial/stats/continuous.rst.

  3. Update the number of continuous distributions in the example code in doc/source/tutorial/stats.rst.

  4. Build the documentation successfully.

  5. Submit a PR.
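For step 3, the number quoted in the tutorial can be recomputed rather than counted by hand. A minimal sketch, assuming the count is taken over the rv_continuous and rv_discrete instances exposed in the scipy.stats namespace:

```python
from scipy import stats

# Names in scipy.stats that are bound to distribution instances.
dist_continu = [d for d in dir(stats)
                if isinstance(getattr(stats, d), stats.rv_continuous)]
dist_discrete = [d for d in dir(stats)
                 if isinstance(getattr(stats, d), stats.rv_discrete)]

print('number of continuous distributions: %d' % len(dist_continu))
print('number of discrete distributions:   %d' % len(dist_discrete))
```

Running this after adding squirrel should report one more continuous distribution than before.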

References

JKB

Johnson, Kotz, and Balakrishnan, “Continuous Univariate Distributions, Volume 1”, Second Edition, John Wiley and Sons, p. 173 (1994).