This document aims to give an overview of how to contribute to SciPy. It tries to answer commonly asked questions, and provide some insight into how the community process works in practice. Readers who are familiar with the SciPy community and are experienced Python coders may want to jump straight to the git workflow documentation.
If you have been working with the scientific Python toolstack for a while, you probably have some code lying around that you think could be useful to others too. Perhaps it’s a good idea then to contribute it to SciPy or another open source project. The first question to ask is then, where does this code belong? That question is hard to answer here, so we start with a more specific one: what code is suitable for putting into SciPy? Almost all of the new code added to SciPy has in common that it’s potentially useful in multiple scientific domains and it fits in the scope of existing SciPy submodules. In principle new submodules can be added too, but this is far less common. For code that is specific to a single application, there may be an existing project that can use the code. Some scikits (scikit-learn, scikit-image, statsmodels, etc.) are good examples here; they have a narrower focus and because of that more domain-specific code than SciPy.
Now if you have code that you would like to see included in SciPy, how do you go about it? After checking that your code can be distributed in SciPy under a compatible license (see FAQ for details), the first step is to discuss it on the scipy-dev mailing list. All new features, as well as changes to existing code, are discussed and decided on there. You can, and probably should, already start this discussion before your code is finished.
Assuming the outcome of the discussion on the mailing list is positive and you have a function or piece of code that does what you need it to do, what next? Before code is added to SciPy, it at least has to have good documentation, unit tests and correct code style.
In principle you should aim to create unit tests that exercise all the code that you are adding. This gives some degree of confidence that your code runs correctly, also on Python versions and hardware or OSes that you don’t have available yourself. An extensive description of how to write unit tests is given in the NumPy testing guidelines.
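As a rough illustration of the kind of test meant here, the sketch below exercises a hypothetical function my_new_func (a stand-in that computes a root mean square; it is not part of SciPy). It uses only plain assert statements and the standard library so that it runs anywhere; real SciPy tests would normally use the helpers described in the NumPy testing guidelines.

```python
import math


def my_new_func(values):
    """Hypothetical example function: root mean square of a sequence."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return math.sqrt(sum(v * v for v in values) / len(values))


def test_known_value():
    # RMS of (3, 4) is sqrt((9 + 16) / 2) = sqrt(12.5)
    assert math.isclose(my_new_func([3, 4]), math.sqrt(12.5))


def test_constant_input():
    # The RMS of a constant sequence is the absolute value of that constant
    assert math.isclose(my_new_func([-2.0, -2.0, -2.0]), 2.0)


def test_empty_input_raises():
    # The error path is part of the code and should be exercised too
    try:
        my_new_func([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")


if __name__ == "__main__":
    test_known_value()
    test_constant_input()
    test_empty_input_raises()
    print("all tests passed")
```

Note that the tests cover a typical value, a special case and the error branch, so every line of the function is executed at least once.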
Clear and complete documentation is essential in order for users to be able to find and understand the code. Documentation for individual functions and classes – which includes at least a basic description, type and meaning of all parameters and return values, and usage examples in doctest format – is put in docstrings. Those docstrings can be read within the interpreter, and are compiled into a reference guide in html and pdf format. Higher-level documentation for key (areas of) functionality is provided in tutorial format and/or in module docstrings. A guide on how to write documentation is given in how to document.
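A minimal sketch of such a docstring is shown below for a hypothetical function clip_to_unit (not a SciPy function). The section layout is meant as an illustration only; the how to document guide remains the authoritative reference for the format.

```python
import doctest


def clip_to_unit(x):
    """
    Clip a number to the interval [0, 1].

    Parameters
    ----------
    x : float
        Value to clip.

    Returns
    -------
    clipped : float
        `x` if it lies in [0, 1], otherwise the nearest interval endpoint.

    Examples
    --------
    >>> clip_to_unit(0.25)
    0.25
    >>> clip_to_unit(3.0)
    1.0
    >>> clip_to_unit(-1.5)
    0.0

    """
    return min(1.0, max(0.0, float(x)))


if __name__ == "__main__":
    # Run the usage examples embedded in the docstring as tests
    print(doctest.testmod())
```

Because the examples are in doctest format, they double as a lightweight check that the documentation and the code have not drifted apart.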
Uniformity of style in which code is written is important to others trying to understand the code. SciPy follows the standard Python guidelines for code style, PEP8. In order to check that your code conforms to PEP8, you can use the pep8 package style checker. Most IDEs and text editors have settings that can help you follow PEP8, for example by replacing tabs with four spaces. Using pyflakes to check your code is also a good idea.
At the end of this document a checklist is given that may help to check if your code fulfills all requirements for inclusion in SciPy.
Another question you may have is: where exactly do I put my code? To answer this, it is useful to understand how the SciPy public API (application programming interface) is defined. For most modules the API is two levels deep, which means your new function should appear as scipy.submodule.my_new_func. my_new_func can be put in an existing or new file under /scipy/<submodule>/. Its name is then added to the __all__ list in that file (which lists all public functions in the file), and those public functions are imported in /scipy/<submodule>/__init__.py. Any private functions/classes should have a leading underscore (_) in their name. A more detailed description of the SciPy public API is given in SciPy API.
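The layout described above might look roughly as follows. The file name my_file.py, the helper _double and the placeholder behavior of my_new_func are assumptions made for this sketch; only the use of __all__ and the leading underscore come from the text.

```python
# Contents of a hypothetical file /scipy/<submodule>/my_file.py
# (the file name is an assumption made for this sketch).

# Public names defined in this file; only these are meant to be picked up
# by the submodule's __init__.py.
__all__ = ['my_new_func']


def _double(x):
    # Private helper: the leading underscore keeps it out of the public API.
    return 2 * x


def my_new_func(x):
    """Return twice `x` plus one (placeholder behavior for illustration)."""
    return _double(x) + 1


# In /scipy/<submodule>/__init__.py the public names would then be imported
# with something along the lines of:
#
#     from .my_file import *
#
# after which users can call scipy.submodule.my_new_func.
```

With a star import, Python only exports the names listed in __all__, so _double stays internal even though it lives in the same file.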
Once you think your code is ready for inclusion in SciPy, you can send a pull request (PR) on Github. We won’t go into the details of how to work with git here; this is described well in the git workflow section of the NumPy documentation and in the Github help pages. When you send the PR for a new feature, be sure to also mention this on the scipy-dev mailing list. This can prompt interested people to help review your PR. Assuming that you have already received positive feedback on the general idea of your code/feature, the purpose of the code review is to ensure that the code is correct, efficient and meets the requirements outlined above. In many cases the code review happens relatively quickly, but it’s possible that it stalls. If you have addressed all feedback already given, it’s perfectly fine to ask on the mailing list again for review (after a reasonable amount of time, say a couple of weeks, has passed). Once the review is completed, the PR is merged into the “master” branch of SciPy.
The above describes the requirements and process for adding code to SciPy. It doesn’t yet answer the question of how exactly decisions are made. The basic answer is: decisions are made by consensus, by everyone who chooses to participate in the discussion on the mailing list. This includes developers, other users and yourself. Aiming for consensus in the discussion is important – SciPy is a project by and for the scientific Python community. In those rare cases that agreement cannot be reached, the maintainers of the module in question can decide the issue.
The previous section talked specifically about adding new functionality to SciPy. A large part of that discussion also applies to maintenance of existing code. Maintenance means fixing bugs, improving code quality or style, documenting existing functionality better, adding missing unit tests, keeping build scripts up-to-date, etc. The SciPy Trac bug tracker contains all reported bugs, build/documentation issues, etc. Fixing issues described in Trac tickets helps improve the overall quality of SciPy, and is also a good way of getting familiar with the project. You may also want to fix a bug because you ran into it and need the function in question to work correctly.
The discussion on code style and unit testing above applies equally to bug fixes. It is usually best to start by writing a unit test that shows the problem, i.e. it should pass but doesn’t. Once you have that, you can fix the code so that the test does pass. That should be enough to send a PR for this issue. Unlike when adding new code, discussing this on the mailing list may not be necessary - if the old behavior of the code is clearly incorrect, no one will object to having it fixed. It may be necessary to add some warning or deprecation message for the changed behavior. This should be part of the review process.
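The test-first approach, together with a deprecation warning for changed behavior, could look like the sketch below. The function span and its start keyword are hypothetical and exist only for this illustration; the warning itself uses the standard library warnings module.

```python
import warnings


def span(values, start=None):
    """Hypothetical function: difference between largest and smallest value.

    The (made-up) `start` keyword is kept for backwards compatibility but
    ignored, with a warning.
    """
    if start is not None:
        warnings.warn("the 'start' keyword is deprecated and ignored",
                      DeprecationWarning, stacklevel=2)
    values = list(values)
    # The buggy version computed max(values) - values[0], which is only
    # correct when the first element happens to be the smallest one.
    return max(values) - min(values)


def test_span_unsorted_input():
    # Regression test written before the fix: it failed with the old code,
    # which returned 0 for this input.
    assert span([3, 1, 2]) == 2


def test_span_start_keyword_warns():
    # The changed behavior is announced rather than silently introduced
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        assert span([3, 1, 2], start=0) == 2
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)


if __name__ == "__main__":
    test_span_unsorted_input()
    test_span_start_keyword_warns()
    print("regression tests passed")
```

Keeping the regression test in the test suite after the fix guards against the bug being reintroduced later.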
There are many ways to contribute other than contributing code. Participating in discussions on the scipy-user and scipy-dev mailing lists is a contribution in itself. The scipy.org website contains a lot of information on the SciPy community and can always use a new pair of hands. A redesign of this website is ongoing; see scipy.github.com. The redesigned website is a static site based on Sphinx, and its sources are also on Github at scipy.org-new.
The SciPy documentation is constantly being improved by many developers and users. You can contribute by sending a PR on Github that improves the documentation, but there’s also a documentation wiki that is very convenient for making edits to docstrings (and doesn’t require git knowledge). Anyone can register a username on that wiki, ask on the scipy-dev mailing list for edit rights and make edits. The documentation there is updated every day with the latest changes in the SciPy master branch, and wiki edits are regularly reviewed and merged into master. Another advantage of the documentation wiki is that you can immediately see how the reStructuredText (reST) of docstrings and other docs is rendered as html, so you can easily catch formatting errors.
Code that doesn’t belong in SciPy itself or in another package but helps users accomplish a certain task is valuable. SciPy Central is the place to share this type of code (snippets, examples, plotting code, etc.).
- Are there unit tests with good code coverage?
- Do all public functions have docstrings including examples?
- Is the code style correct (PEP8, pyflakes)?
- Is the new functionality tagged with .. versionadded:: X.Y.Z (with X.Y.Z the version number of the next release - can be found in setup.py)?
- Is the new functionality mentioned in the release notes of the next release?
- Is the new functionality added to the reference guide?
- In case of larger additions, is there a tutorial or more extensive module-level description?
- In case compiled code is added, is it integrated correctly via setup.py (and preferably also Bento/Numscons configuration files)?
- If you are a first-time contributor, did you add yourself to THANKS.txt? Please note that this is perfectly normal and desirable - the aim is to give every single contributor credit, and if you don’t add yourself it’s simply extra work for the reviewer (or worse, the reviewer may forget).
- Did you check that the code can be distributed under a BSD license?
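For the versionadded item in the checklist above, one common placement is directly below the summary line of the docstring, as in this sketch. The function my_new_func is hypothetical, and X.Y.Z is deliberately left as a placeholder for the version number of the next release.

```python
def my_new_func(x):
    """
    Return `x` unchanged (placeholder behavior for illustration).

    .. versionadded:: X.Y.Z

    Parameters
    ----------
    x : object
        Any value.

    Returns
    -------
    x : object
        The input value.

    """
    return x


# The directive is ordinary reST inside the docstring, so it is easy to check
# that it has not been forgotten:
assert ".. versionadded::" in my_new_func.__doc__
```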
- The how to document guidelines
- NumPy/SciPy testing guidelines
- SciPy API
- SciPy maintainers
- NumPy/SciPy git workflow
I based my code on existing Matlab/R/... code I found online, is this OK?
It depends. SciPy is distributed under a BSD license, so if the code that you based your code on is also BSD licensed or has a BSD-compatible license (MIT, Apache, ...) then it’s OK. Code which is GPL-licensed, has no clear license, requires citation or is free for academic use only can’t be included in SciPy. Therefore if you copied existing code with such a license or made a direct translation of it to Python, your code can’t be included. See also license compatibility.
Why is SciPy under the BSD license and not, say, the GPL?
Like Python, SciPy uses a “permissive” open source license, which allows proprietary re-use. While this allows companies to use and modify the software without giving anything back, it is felt that the larger user base results in more contributions overall, and companies often publish their modifications anyway, without being required to. See John Hunter’s BSD pitch.
How do I set up SciPy so I can edit files, run the tests and make commits?
The simplest method is setting up an in-place build. To create your local git repo and do the in-place build:
$ git clone https://github.com/scipy/scipy.git scipy
$ cd scipy
$ python setup.py build_ext -i
Then you need to either set up a symlink in your site-packages or add this directory to your PYTHONPATH environment variable, so Python can find it. Some IDEs (Spyder for example) have utilities to manage PYTHONPATH. On Linux and OS X, you can for example edit your .bash_login file to automatically add this dir on startup of your terminal. Add the line:
export PYTHONPATH="$HOME/scipy:${PYTHONPATH}"
Alternatively, to set up the symlink, use (prefix only necessary if you want to use your local instead of global site-packages dir):
$ python setupegg.py develop --prefix=${HOME}
To test that everything works, start the interpreter (not inside the scipy/ source dir) and run the tests:
$ python
>>> import scipy as sp
>>> sp.test()
Now editing a Python source file in SciPy allows you to immediately test and use your changes, by simply restarting the interpreter.
Note that while the above procedure is the most straightforward way to get started, you may want to look into using Bento or numscons for faster and more flexible building, or virtualenv to maintain development environments for multiple Python versions.
How do I set up a development version of SciPy in parallel to a released version that I use to do my job/research?
One simple way to achieve this is to install the released version in site-packages, by using a binary installer or pip for example, and set up the development version with an in-place build in a virtualenv. First install virtualenv and virtualenvwrapper, then create your virtualenv (named scipy-dev here) with:
$ mkvirtualenv scipy-dev
Now, whenever you want to switch to the virtual environment, you can use the command workon scipy-dev, while the command deactivate exits from the virtual environment and brings back your previous shell. With scipy-dev activated, follow the in-place build with the symlink install above to actually install your development version of SciPy.
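When a released and a development version live side by side, it helps to confirm which copy the interpreter actually imports. The helper below is a small sketch using only the standard library; describe_module is a name made up for this example.

```python
import importlib


def describe_module(name):
    """Import module `name` and report where it was loaded from."""
    module = importlib.import_module(name)
    return {
        "name": module.__name__,
        # Not every module defines __version__, so fall back to None
        "version": getattr(module, "__version__", None),
        # Path of the imported module; built-in modules have no __file__
        "file": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    # With the scipy-dev environment active, the reported path should point
    # into your git checkout rather than into the global site-packages.
    try:
        print(describe_module("scipy"))
    except ImportError:
        print("scipy is not importable from this environment")
```

Running this once inside and once outside the virtual environment should show two different paths if the setup worked as intended.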
Can I use a programming language other than Python to speed up my code?
Yes. The languages used in SciPy are Python, Cython, C, C++ and Fortran. All of these have their pros and cons. If Python really doesn’t offer enough performance, one of those languages can be used. Important concerns when using compiled languages are maintainability and portability. For maintainability, Cython is clearly preferred over C/C++/Fortran. Cython and C are more portable than C++/Fortran. A lot of the existing C and Fortran code in SciPy is older, battle-tested code that was only wrapped in (but not specifically written for) Python/SciPy. Therefore the basic advice is: use Cython. If there are specific reasons why C/C++/Fortran should be preferred, please discuss those reasons first.
There’s overlap between Trac and Github, which do I use for what?
Trac is the bug tracker, Github the code repository. Before the SciPy code repository moved to Github, the preferred way to contribute code was to create a patch and attach it to a Trac ticket. The overhead of this approach is much larger than sending a PR on Github, so please don’t do this anymore. Use Trac for bug reports, Github for patches.