This page is based with thanks on the wiki page on subclassing by Pierre Gerard-Marchant - http://www.scipy.org/Subclasses.
Subclassing ndarray is relatively simple, but it has some complications compared to other Python objects. On this page we explain the machinery that allows you to subclass ndarray, and the implications for implementing a subclass.
Subclassing ndarray is complicated by the fact that new instances of ndarray classes can come about in three different ways. These are:
The last two are characteristics of ndarrays - in order to support things like array slicing. The complications of subclassing ndarray are due to the mechanisms numpy has to support these latter two routes of instance creation.
View casting is the standard ndarray mechanism by which you take an ndarray of any subclass, and return a view of the array as another (specified) subclass:
>>> import numpy as np >>> # create a completely useless ndarray subclass >>> class C(np.ndarray): pass >>> # create a standard ndarray >>> arr = np.zeros((3,)) >>> # take a view of it, as our useless subclass >>> c_arr = arr.view(C) >>> type(c_arr) <class 'C'>
New instances of an ndarray subclass can also come about by a very similar mechanism to View casting, when numpy finds it needs to create a new instance from a template instance. The most obvious place this has to happen is when you are taking slices of subclassed arrays. For example:
>>> v = c_arr[1:] >>> type(v) # the view is of type 'C' <class 'C'> >>> v is c_arr # but it's a new instance False
The slice is a view onto the original c_arr data. So, when we take a view from the ndarray, we return a new ndarray, of the same class, that points to the data in the original.
There are other points in the use of ndarrays where we need such views, such as copying arrays (c_arr.copy()), creating ufunc output arrays (see also __array_wrap__ for ufuncs), and reducing methods (like c_arr.mean().
These paths both use the same machinery. We make the distinction here, because they result in different input to your methods. Specifically, View casting means you have created a new instance of your array type from any potential subclass of ndarray. Creating new from template means you have created a new instance of your class from a pre-existing instance, allowing you - for example - to copy across attributes that are particular to your subclass.
If we subclass ndarray, we need to deal not only with explicit construction of our array type, but also View casting or Creating new from template. Numpy has the machinery to do this, and this machinery that makes subclassing slightly non-standard.
There are two aspects to the machinery that ndarray uses to support views and new-from-template in subclasses.
The first is the use of the ndarray.__new__ method for the main work of object initialization, rather then the more usual __init__ method. The second is the use of the __array_finalize__ method to allow subclasses to clean up after the creation of views and new instances from templates.
__new__ is a standard Python method, and, if present, is called before __init__ when we create a class instance. See the python __new__ documentation for more detail.
For example, consider the following Python code:
class C(object): def __new__(cls, *args): print 'Cls in __new__:', cls print 'Args in __new__:', args return object.__new__(cls, *args) def __init__(self, *args): print 'type(self) in __init__:', type(self) print 'Args in __init__:', args
meaning that we get:
>>> c = C('hello') Cls in __new__: <class 'C'> Args in __new__: ('hello',) type(self) in __init__: <class 'C'> Args in __init__: ('hello',)
When we call C('hello'), the __new__ method gets its own class as first argument, and the passed argument, which is the string 'hello'. After python calls __new__, it usually (see below) calls our __init__ method, with the output of __new__ as the first argument (now a class instance), and the passed arguments following.
As you can see, the object can be initialized in the __new__ method or the __init__ method, or both, and in fact ndarray does not have an __init__ method, because all the initialization is done in the __new__ method.
Why use __new__ rather than just the usual __init__? Because in some cases, as for ndarray, we want to be able to return an object of some other class. Consider the following:
class D(C): def __new__(cls, *args): print 'D cls is:', cls print 'D args in __new__:', args return C.__new__(C, *args) def __init__(self, *args): # we never get here print 'In D __init__'
>>> obj = D('hello') D cls is: <class 'D'> D args in __new__: ('hello',) Cls in __new__: <class 'C'> Args in __new__: ('hello',) >>> type(obj) <class 'C'>
The definition of C is the same as before, but for D, the __new__ method returns an instance of class C rather than D. Note that the __init__ method of D does not get called. In general, when the __new__ method returns an object of class other than the class in which it is defined, the __init__ method of that class is not called.
This is how subclasses of the ndarray class are able to return views that preserve the class type. When taking a view, the standard ndarray machinery creates the new ndarray object with something like:
obj = ndarray.__new__(subtype, shape, ...
where subdtype is the subclass. Thus the returned view is of the same class as the subclass, rather than being of class ndarray.
That solves the problem of returning views of the same type, but now we have a new problem. The machinery of ndarray can set the class this way, in its standard methods for taking views, but the ndarray __new__ method knows nothing of what we have done in our own __new__ method in order to set attributes, and so on. (Aside - why not call obj = subdtype.__new__(... then? Because we may not have a __new__ method with the same call signature).
__array_finalize__ is the mechanism that numpy provides to allow subclasses to handle the various ways that new instances get created.
Remember that subclass instances can come about in these three ways:
Our MySubClass.__new__ method only gets called in the case of the explicit constructor call, so we can’t rely on MySubClass.__new__ or MySubClass.__init__ to deal with the view casting and new-from-template. It turns out that MySubClass.__array_finalize__ does get called for all three methods of object creation, so this is where our object creation housekeeping usually goes.
The arguments that __array_finalize__ recieves differ for the three methods of instance creation above.
The following code allows us to look at the call sequences and arguments:
import numpy as np class C(np.ndarray): def __new__(cls, *args, **kwargs): print 'In __new__ with class %s' % cls return np.ndarray.__new__(cls, *args, **kwargs) def __init__(self, *args, **kwargs): # in practice you probably will not need or want an __init__ # method for your subclass print 'In __init__ with class %s' % self.__class__ def __array_finalize__(self, obj): print 'In array_finalize:' print ' self type is %s' % type(self) print ' obj type is %s' % type(obj)
>>> # Explicit constructor >>> c = C((10,)) In __new__ with class <class 'C'> In array_finalize: self type is <class 'C'> obj type is <type 'NoneType'> In __init__ with class <class 'C'> >>> # View casting >>> a = np.arange(10) >>> cast_a = a.view(C) In array_finalize: self type is <class 'C'> obj type is <type 'numpy.ndarray'> >>> # Slicing (example of new-from-template) >>> cv = c[:1] In array_finalize: self type is <class 'C'> obj type is <class 'C'>
The signature of __array_finalize__ is:
def __array_finalize__(self, obj):
ndarray.__new__ passes __array_finalize__ the new object, of our own class (self) as well as the object from which the view has been taken (obj). As you can see from the output above, the self is always a newly created instance of our subclass, and the type of obj differs for the three instance creation methods:
Because __array_finalize__ is the only method that always sees new instances being created, it is the sensible place to fill in instance defaults for new object attributes, among other tasks.
This may be clearer with an example.
import numpy as np class InfoArray(np.ndarray): def __new__(subtype, shape, dtype=float, buffer=None, offset=0, strides=None, order=None, info=None): # Create the ndarray instance of our type, given the usual # ndarray input arguments. This will call the standard # ndarray constructor, but return an object of our type. # It also triggers a call to InfoArray.__array_finalize__ obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset, strides, order) # set the new 'info' attribute to the value passed obj.info = info # Finally, we must return the newly created object: return obj def __array_finalize__(self, obj): # ``self`` is a new object resulting from # ndarray.__new__(InfoArray, ...), therefore it only has # attributes that the ndarray.__new__ constructor gave it - # i.e. those of a standard ndarray. # # We could have got to the ndarray.__new__ call in 3 ways: # From an explicit constructor - e.g. InfoArray(): # obj is None # (we're in the middle of the InfoArray.__new__ # constructor, and self.info will be set when we return to # InfoArray.__new__) if obj is None: return # From view casting - e.g arr.view(InfoArray): # obj is arr # (type(obj) can be InfoArray) # From new-from-template - e.g infoarr[:3] # type(obj) is InfoArray # # Note that it is here, rather than in the __new__ method, # that we set the default value for 'info', because this # method sees all creation of default objects - with the # InfoArray.__new__ constructor, but also with # arr.view(InfoArray). self.info = getattr(obj, 'info', None) # We do not need to return anything
Using the object looks like this:
>>> obj = InfoArray(shape=(3,)) # explicit constructor >>> type(obj) <class 'InfoArray'> >>> obj.info is None True >>> obj = InfoArray(shape=(3,), info='information') >>> obj.info 'information' >>> v = obj[1:] # new-from-template - here - slicing >>> type(v) <class 'InfoArray'> >>> v.info 'information' >>> arr = np.arange(10) >>> cast_arr = arr.view(InfoArray) # view casting >>> type(cast_arr) <class 'InfoArray'> >>> cast_arr.info is None True
This class isn’t very useful, because it has the same constructor as the bare ndarray object, including passing in buffers and shapes and so on. We would probably prefer the constructor to be able to take an already formed ndarray from the usual numpy calls to np.array and return an object.
Here is a class that takes a standard ndarray that already exists, casts as our type, and adds an extra attribute.
import numpy as np class RealisticInfoArray(np.ndarray): def __new__(cls, input_array, info=None): # Input array is an already formed ndarray instance # We first cast to be our class type obj = np.asarray(input_array).view(cls) # add the new attribute to the created instance obj.info = info # Finally, we must return the newly created object: return obj def __array_finalize__(self, obj): # see InfoArray.__array_finalize__ for comments if obj is None: return self.info = getattr(obj, 'info', None)
>>> arr = np.arange(5) >>> obj = RealisticInfoArray(arr, info='information') >>> type(obj) <class 'RealisticInfoArray'> >>> obj.info 'information' >>> v = obj[1:] >>> type(v) <class 'RealisticInfoArray'> >>> v.info 'information'
__array_wrap__ gets called at the end of numpy ufuncs and other numpy functions, to allow a subclass to set the type of the return value and update attributes and metadata. Let’s show how this works with an example. First we make the same subclass as above, but with a different name and some print statements:
import numpy as np class MySubClass(np.ndarray): def __new__(cls, input_array, info=None): obj = np.asarray(input_array).view(cls) obj.info = info return obj def __array_finalize__(self, obj): print 'In __array_finalize__:' print ' self is %s' % repr(self) print ' obj is %s' % repr(obj) if obj is None: return self.info = getattr(obj, 'info', None) def __array_wrap__(self, out_arr, context=None): print 'In __array_wrap__:' print ' self is %s' % repr(self) print ' arr is %s' % repr(out_arr) # then just call the parent return np.ndarray.__array_wrap__(self, out_arr, context)
We run a ufunc on an instance of our new array:
>>> obj = MySubClass(np.arange(5), info='spam') In __array_finalize__: self is MySubClass([0, 1, 2, 3, 4]) obj is array([0, 1, 2, 3, 4]) >>> arr2 = np.arange(5)+1 >>> ret = np.add(arr2, obj) In __array_wrap__: self is MySubClass([0, 1, 2, 3, 4]) arr is array([1, 3, 5, 7, 9]) In __array_finalize__: self is MySubClass([1, 3, 5, 7, 9]) obj is MySubClass([0, 1, 2, 3, 4]) >>> ret MySubClass([1, 3, 5, 7, 9]) >>> ret.info 'spam'
Note that the ufunc (np.add) has called the __array_wrap__ method of the input with the highest __array_priority__ value, in this case MySubClass.__array_wrap__, with arguments self as obj, and out_arr as the (ndarray) result of the addition. In turn, the default __array_wrap__ (ndarray.__array_wrap__) has cast the result to class MySubClass, and called __array_finalize__ - hence the copying of the info attribute. This has all happened at the C level.
But, we could do anything we wanted:
class SillySubClass(np.ndarray): def __array_wrap__(self, arr, context=None): return 'I lost your data'
>>> arr1 = np.arange(5) >>> obj = arr1.view(SillySubClass) >>> arr2 = np.arange(5) >>> ret = np.multiply(obj, arr2) >>> ret 'I lost your data'
So, by defining a specific __array_wrap__ method for our subclass, we can tweak the output from ufuncs. The __array_wrap__ method requires self, then an argument - which is the result of the ufunc - and an optional parameter context. This parameter is returned by some ufuncs as a 3-element tuple: (name of the ufunc, argument of the ufunc, domain of the ufunc). __array_wrap__ should return an instance of its containing class. See the masked array subclass for an implementation.
In addition to __array_wrap__, which is called on the way out of the ufunc, there is also an __array_prepare__ method which is called on the way into the ufunc, after the output arrays are created but before any computation has been performed. The default implementation does nothing but pass through the array. __array_prepare__ should not attempt to access the array data or resize the array, it is intended for setting the output array type, updating attributes and metadata, and performing any checks based on the input that may be desired before computation begins. Like __array_wrap__, __array_prepare__ must return an ndarray or subclass thereof or raise an error.
One of the problems that ndarray solves is keeping track of memory ownership of ndarrays and their views. Consider the case where we have created an ndarray, arr and have taken a slice with v = arr[1:]. The two objects are looking at the same memory. Numpy keeps track of where the data came from for a particular array or view, with the base attribute:
>>> # A normal ndarray, that owns its own data >>> arr = np.zeros((4,)) >>> # In this case, base is None >>> arr.base is None True >>> # We take a view >>> v1 = arr[1:] >>> # base now points to the array that it derived from >>> v1.base is arr True >>> # Take a view of a view >>> v2 = v1[1:] >>> # base points to the view it derived from >>> v2.base is v1 True
In general, if the array owns its own memory, as for arr in this case, then arr.base will be None - there are some exceptions to this - see the numpy book for more details.
The base attribute is useful in being able to tell whether we have a view or the original array. This in turn can be useful if we need to know whether or not to do some specific cleanup when the subclassed array is deleted. For example, we may only want to do the cleanup if the original array is deleted, but not the views. For an example of how this can work, have a look at the memmap class in numpy.core.