NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
At the core of the NumPy package, is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:
The points about sequence size and speed are particularly important in scientific computing. As a simple example, consider the case of multiplying each element in a 1-D sequence with the corresponding element in another sequence of the same length. If the data are stored in two Python lists, a and b, we could iterate over each element:
c = []
for i in range(len(a)):
c.append(a[i]*b[i])
This produces the correct answer, but if a and b each contain millions of numbers, we will pay the price for the inefficiencies of looping in Python. We could accomplish the same task much more quickly in C by writing (for clarity we neglect variable declarations and initializations, memory allocation, etc.)
for (i = 0; i < rows; i++): {
c[i] = a[i]*b[i];
}
This saves all the overhead involved in interpreting the Python code and manipulating Python objects, but at the expense of the benefits gained from coding in Python. Furthermore, the coding work required increases with the dimensionality of our data. In the case of a 2-D array, for example, the C code (abridged as before) expands to
for (i = 0; i < rows; i++): {
for (j = 0; j < columns; j++): {
c[i][j] = a[i][j]*b[i][j];
}
}
NumPy gives us the best of both worlds: element-by-element operations are the “default mode” when an ndarray is involved, but the element-by-element operation is speedily executed by pre-compiled C code. In NumPy
c = a * b
does what the earlier examples do, at near-C speeds, but with the code simplicity we expect from something based on Python (indeed, the NumPy idiom is even simpler!). This last example illustrates two of NumPy’s features which are the basis of much of its power: vectorization and broadcasting.
Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” (in optimized, pre-compiled C code). Vectorized code has many advantages, among which are:
Broadcasting is the term used to describe the implicit element-by-element behavior of operations; generally speaking, in NumPy all operations (i.e., not just arithmetic operations, but logical, bit-wise, functional, etc.) behave in this implicit element-by-element fashion, i.e., they broadcast. Moreover, in the example above, a and b could be multidimensional arrays of the same shape, or a scalar and an array, or even two arrays of with different shapes. Provided that the smaller array is “expandable” to the shape of the larger in such a way that the resulting broadcast is unambiguous (for detailed “rules” of broadcasting see numpy.doc.broadcasting).
NumPy fully supports an object-oriented approach, starting, once again, with ndarray. For example, ndarray is a class, possessing numerous methods and attributes. Many, of it’s methods mirror functions in the outer-most NumPy namespace, giving the programmer has complete freedom to code in whichever paradigm she prefers and/or which seems most appropriate to the task at hand.