scipy.optimize.differential_evolution
- scipy.optimize.differential_evolution(func, bounds, args=(), strategy='best1bin', maxiter=1000, popsize=15, tol=0.01, mutation=(0.5, 1), recombination=0.7, seed=None, callback=None, disp=False, polish=True, init='latinhypercube', atol=0, updating='immediate', workers=1, constraints=(), x0=None, *, integrality=None, vectorized=False)
Finds the global minimum of a multivariate function.
Differential evolution is stochastic in nature. It does not use gradient methods to find the minimum, and can search large areas of candidate space, but often requires a larger number of function evaluations than conventional gradient-based techniques.
The algorithm is due to Storn and Price [1].
- Parameters
- func : callable
The objective function to be minimized. Must be in the form f(x, *args), where x is the argument in the form of a 1-D array and args is a tuple of any additional fixed parameters needed to completely specify the function. The number of parameters, N, is equal to len(x).
- bounds : sequence or Bounds
Bounds for variables. There are two ways to specify the bounds:
1. Instance of Bounds class.
2. (min, max) pairs for each element in x, defining the finite lower and upper bounds for the optimizing argument of func.
The total number of bounds is used to determine the number of parameters, N.
- args : tuple, optional
Any additional fixed parameters needed to completely specify the objective function.
- strategy : str, optional
The differential evolution strategy to use. Should be one of:
‘best1bin’
‘best1exp’
‘rand1exp’
‘randtobest1exp’
‘currenttobest1exp’
‘best2exp’
‘rand2exp’
‘randtobest1bin’
‘currenttobest1bin’
‘best2bin’
‘rand2bin’
‘rand1bin’
The default is ‘best1bin’.
- maxiter : int, optional
The maximum number of generations over which the entire population is evolved. The maximum number of function evaluations (with no polishing) is:
(maxiter + 1) * popsize * N
- popsize : int, optional
A multiplier for setting the total population size. The population has popsize * N individuals. This keyword is overridden if an initial population is supplied via the init keyword. When using init='sobol' the population size is calculated as the next power of 2 after popsize * N.
- tol : float, optional
Relative tolerance for convergence. Solving stops when np.std(population_energies) <= atol + tol * np.abs(np.mean(population_energies)), where atol and tol are the absolute and relative tolerance respectively.
- mutation : float or tuple(float, float), optional
The mutation constant. In the literature this is also known as differential weight, denoted by F. If specified as a float it should be in the range [0, 2]. If specified as a tuple (min, max), dithering is employed. Dithering randomly changes the mutation constant on a generation by generation basis. The mutation constant for that generation is taken from U[min, max). Dithering can help speed convergence significantly. Increasing the mutation constant increases the search radius, but will slow down convergence.
- recombination : float, optional
The recombination constant, should be in the range [0, 1]. In the literature this is also known as the crossover probability, being denoted by CR. Increasing this value allows a larger number of mutants to progress into the next generation, but at the risk of population stability.
- seed : {None, int, numpy.random.Generator, numpy.random.RandomState}, optional
If seed is None (or np.random), the numpy.random.RandomState singleton is used. If seed is an int, a new RandomState instance is used, seeded with seed. If seed is already a Generator or RandomState instance then that instance is used. Specify seed for repeatable minimizations.
- disp : bool, optional
Prints the evaluated func at every iteration.
- callback : callable, callback(xk, convergence=val), optional
A function to follow the progress of the minimization. xk is the best solution found so far. val represents the fractional value of the population convergence. When val is greater than one the function halts. If callback returns True, then the minimization is halted (any polishing is still carried out).
- polish : bool, optional
If True (default), then scipy.optimize.minimize with the L-BFGS-B method is used to polish the best population member at the end, which can improve the minimization slightly. If a constrained problem is being studied then the trust-constr method is used instead.
- init : str or array-like, optional
Specify which type of population initialization is performed. Should be one of:
‘latinhypercube’
‘sobol’
‘halton’
‘random’
array specifying the initial population. The array should have shape (S, N), where S is the total population size and N is the number of parameters. init is clipped to bounds before use.
The default is ‘latinhypercube’. Latin Hypercube sampling tries to maximize coverage of the available parameter space.
‘sobol’ and ‘halton’ are superior alternatives that cover the parameter space even more thoroughly. ‘sobol’ will enforce an initial population size which is calculated as the next power of 2 after popsize * N. ‘halton’ has no requirements but is a bit less efficient. See scipy.stats.qmc for more details.
‘random’ initializes the population randomly. This has the drawback that clustering can occur, preventing the whole of parameter space being covered. An array can be used to specify a population, for example to create a tight bunch of initial guesses in a location where the solution is known to exist, thereby reducing time for convergence.
- atol : float, optional
Absolute tolerance for convergence. Solving stops when np.std(population_energies) <= atol + tol * np.abs(np.mean(population_energies)), where atol and tol are the absolute and relative tolerance respectively.
- updating : {‘immediate’, ‘deferred’}, optional
If 'immediate', the best solution vector is continuously updated within a single generation [4]. This can lead to faster convergence as trial vectors can take advantage of continuous improvements in the best solution. With 'deferred', the best solution vector is updated once per generation. Only 'deferred' is compatible with parallelization or vectorization, and the workers and vectorized keywords can override this option.
New in version 1.2.0.
- workers : int or map-like callable, optional
If workers is an int the population is subdivided into workers sections and evaluated in parallel (uses multiprocessing.Pool). Supply -1 to use all available CPU cores. Alternatively supply a map-like callable, such as multiprocessing.Pool.map, for evaluating the population in parallel. This evaluation is carried out as workers(func, iterable). This option will override the updating keyword to updating='deferred' if workers != 1. This option overrides the vectorized keyword if workers != 1. Requires that func be pickleable.
New in version 1.2.0.
- constraints : {NonlinearConstraint, LinearConstraint, Bounds}
Constraints on the solver, over and above those applied by the bounds keyword. Uses the approach by Lampinen [5].
New in version 1.4.0.
- x0 : None or array-like, optional
Provides an initial guess to the minimization. Once the population has been initialized this vector replaces the first (best) member. This replacement is done even if init is given an initial population. x0.shape == (N,).
New in version 1.7.0.
- integrality : 1-D array, optional
For each decision variable, a boolean value indicating whether the decision variable is constrained to integer values. The array is broadcast to (N,). If any decision variables are constrained to be integral, they will not be changed during polishing. Only integer values lying between the lower and upper bounds are used. If there are no integer values lying between the bounds then a ValueError is raised.
New in version 1.9.0.
- vectorized : bool, optional
If vectorized is True, func is sent an x array with x.shape == (N, S), and is expected to return an array of shape (S,), where S is the number of solution vectors to be calculated. If constraints are applied, each of the functions used to construct a Constraint object should accept an x array with x.shape == (N, S), and return an array of shape (M, S), where M is the number of constraint components. This option is an alternative to the parallelization offered by workers, and may help in optimization speed by reducing interpreter overhead from multiple function calls. This keyword is ignored if workers != 1. This option will override the updating keyword to updating='deferred'. See the notes section for further discussion on when to use 'vectorized', and when to use 'workers'.
New in version 1.9.0.
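As a rough illustration of the maxiter and popsize arithmetic described above, the following sketch computes the population size and the upper bound on function evaluations (without polishing) for a hypothetical 5-parameter problem. The helper names are made up for this example, and the power-of-two rounding for init='sobol' is an assumption about how "next power of 2" is applied (a value that is already a power of two is kept unchanged here).

```python
def population_size(popsize, n_params, init="latinhypercube"):
    """Population size implied by popsize and N (hypothetical helper)."""
    members = popsize * n_params
    if init == "sobol":
        # Assumption: round up to a power of two; exact powers stay unchanged.
        members = 1 << (members - 1).bit_length()
    return members


def max_function_evaluations(maxiter, popsize, n_params):
    """Upper bound (maxiter + 1) * popsize * N, with no polishing."""
    return (maxiter + 1) * popsize * n_params


n_params = 5  # N, the number of (min, max) bounds
default_members = population_size(15, n_params)
sobol_members = population_size(15, n_params, init="sobol")
budget = max_function_evaluations(1000, 15, n_params)
print(default_members, sobol_members, budget)
```

With the default maxiter=1000 and popsize=15 the bound is large, so in practice the tol and atol convergence test usually ends the run much earlier.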
- Returns
- res : OptimizeResult
The optimization result represented as an OptimizeResult object. Important attributes are: x the solution array, success a Boolean flag indicating if the optimizer exited successfully, and message which describes the cause of the termination. See OptimizeResult for a description of other attributes. If polish was employed, and a lower minimum was obtained by the polishing, then OptimizeResult also contains the jac attribute. If the eventual solution does not satisfy the applied constraints, success will be False.
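The sketch below shows one way to pass fixed parameters through args and then inspect the returned OptimizeResult. The objective shifted_sphere, its center and scale arguments, and the chosen bounds are illustrative assumptions, not part of the library.

```python
import numpy as np
from scipy.optimize import Bounds, differential_evolution


def shifted_sphere(x, center, scale):
    """Illustrative objective f(x, *args): scaled squared distance to center."""
    return scale * np.sum((x - center) ** 2)


center = np.array([1.0, -2.0])  # hypothetical fixed parameter passed via args
bounds = Bounds([-5.0, -5.0], [5.0, 5.0])  # equivalent to [(-5, 5), (-5, 5)]

result = differential_evolution(
    shifted_sphere, bounds, args=(center, 3.0), seed=0
)

print(result.x, result.fun)            # solution array and objective value
print(result.success, result.message)  # exit flag and cause of termination
print(result.nfev, result.nit)         # function evaluations and generations
```

Checking success and message before using x is a cheap way to catch runs that stopped because maxiter was reached.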
Notes
Differential evolution is a stochastic population based method that is useful for global optimization problems. At each pass through the population the algorithm mutates each candidate solution by mixing with other candidate solutions to create a trial candidate. There are several strategies [2] for creating trial candidates, which suit some problems more than others. The ‘best1bin’ strategy is a good starting point for many systems. In this strategy two members of the population are randomly chosen. Their difference is used to mutate the best member so far (the ‘best’ in ‘best1bin’), \(b_0\):
\[b' = b_0 + mutation * (population[rand0] - population[rand1])\]
A trial vector is then constructed. Starting with a randomly chosen ith parameter the trial is sequentially filled (in modulo) with parameters from b' or the original candidate. The choice of whether to use b' or the original candidate is made with a binomial distribution (the ‘bin’ in ‘best1bin’): a random number in [0, 1) is generated. If this number is less than the recombination constant then the parameter is loaded from b', otherwise it is loaded from the original candidate. The final parameter is always loaded from b'. Once the trial candidate is built its fitness is assessed. If the trial is better than the original candidate then it takes its place. If it is also better than the best overall candidate it also replaces that.
To improve your chances of finding a global minimum use higher popsize values, with higher mutation (and dithering), but lower recombination values. This has the effect of widening the search radius, but slowing convergence.
By default the best solution vector is updated continuously within a single iteration (updating='immediate'). This is a modification [4] of the original differential evolution algorithm which can lead to faster convergence as trial vectors can immediately benefit from improved solutions. To use the original Storn and Price behaviour, updating the best solution once per iteration, set updating='deferred'. The 'deferred' approach is compatible with both parallelization and vectorization ('workers' and 'vectorized' keywords). These may improve minimization speed by using computer resources more efficiently. The 'workers' keyword distributes calculations over multiple processors. By default the Python multiprocessing module is used, but other approaches are also possible, such as the Message Passing Interface (MPI) used on clusters [6] [7]. The overhead from these approaches (creating new Processes, etc) may be significant, meaning that computational speed doesn’t necessarily scale with the number of processors used. Parallelization is best suited to computationally expensive objective functions. If the objective function is less expensive, then 'vectorized' may aid by only calling the objective function once per iteration, rather than multiple times for all the population members; the interpreter overhead is reduced.
New in version 0.15.0.
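The ‘best1bin’ trial construction described in these notes can be sketched in NumPy as follows. This is a simplified illustration, not the library's implementation: the function best1bin_trial and its arguments are hypothetical, bounds handling is omitted, and instead of the sequential modulo fill one randomly chosen parameter is forced to come from b'.

```python
import numpy as np


def best1bin_trial(population, energies, candidate_idx, mutation,
                   recombination, rng):
    """Build one trial vector following the 'best1bin' description (sketch)."""
    n_members, n_params = population.shape
    best = population[np.argmin(energies)]

    # Pick two distinct random members other than the current candidate.
    others = [i for i in range(n_members) if i != candidate_idx]
    rand0, rand1 = rng.choice(others, size=2, replace=False)

    # b' = b_0 + mutation * (population[rand0] - population[rand1])
    mutant = best + mutation * (population[rand0] - population[rand1])

    # Binomial crossover: take each parameter from b' with probability
    # `recombination`; one randomly chosen parameter always comes from b'.
    take_mutant = rng.random(n_params) < recombination
    take_mutant[rng.integers(n_params)] = True
    return np.where(take_mutant, mutant, population[candidate_idx])


rng = np.random.default_rng(0)
population = rng.uniform(-5.0, 5.0, size=(20, 3))
energies = np.sum(population ** 2, axis=1)  # sphere function as a toy objective
initial_best = energies.min()

for _ in range(100):  # generations
    for i in range(len(population)):
        trial = best1bin_trial(population, energies, i, 0.7, 0.7, rng)
        trial_energy = np.sum(trial ** 2)
        if trial_energy < energies[i]:  # keep the trial only if it is better
            population[i], energies[i] = trial, trial_energy

print(initial_best, energies.min())
```

Because the best member is looked up from the current energies on every call, this loop behaves like updating='immediate'; computing the best member once per generation would correspond to 'deferred'.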
References
- 1
Storn, R and Price, K, Differential Evolution - a Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, Journal of Global Optimization, 1997, 11, 341 - 359.
- 2
- 3
- 4
Wormington, M., Panaccione, C., Matney, K. M., Bowen, D. K., - Characterization of structures from X-ray scattering data using genetic algorithms, Phil. Trans. R. Soc. Lond. A, 1999, 357, 2827-2848
- 5
Lampinen, J., A constraint handling approach for the differential evolution algorithm. Proceedings of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No. 02TH8600). Vol. 2. IEEE, 2002.
- 6
- 7
Examples
Let us consider the problem of minimizing the Rosenbrock function. This function is implemented in rosen in scipy.optimize.
>>> from scipy.optimize import rosen, differential_evolution
>>> bounds = [(0, 2), (0, 2), (0, 2), (0, 2), (0, 2)]
>>> result = differential_evolution(rosen, bounds)
>>> result.x, result.fun
(array([1., 1., 1., 1., 1.]), 1.9216496320061384e-19)
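Because the method is stochastic, repeated runs can follow different paths unless seed is fixed. A minimal sketch, assuming a single-process run with an integer seed, that checks two runs give the same answer:

```python
import numpy as np
from scipy.optimize import rosen, differential_evolution

bounds = [(0, 2)] * 5

# Two runs with the same integer seed start from the same population
# and draw the same random numbers.
first = differential_evolution(rosen, bounds, seed=1)
second = differential_evolution(rosen, bounds, seed=1)

print(np.array_equal(first.x, second.x), first.nfev == second.nfev)
```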
Now repeat, but with parallelization.
>>> result = differential_evolution(rosen, bounds, updating='deferred',
...                                 workers=2)
>>> result.x, result.fun
(array([1., 1., 1., 1., 1.]), 1.9216496320061384e-19)
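The workers keyword also accepts a map-like callable. The sketch below assumes that the map method of a standard-library thread pool is an acceptable stand-in for multiprocessing.Pool.map. Threads avoid the need to pickle func, but a pure-Python objective is unlikely to run faster this way because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor

from scipy.optimize import rosen, differential_evolution

bounds = [(0, 2), (0, 2)]

with ThreadPoolExecutor(max_workers=2) as executor:
    # executor.map is invoked as workers(func, iterable) by the solver.
    result = differential_evolution(
        rosen, bounds, updating='deferred', workers=executor.map, seed=1
    )

print(result.x, result.fun)
```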
Let’s try a constrained minimization.
>>> import numpy as np
>>> from scipy.optimize import NonlinearConstraint, Bounds
>>> def constr_f(x):
...     return np.array(x[0] + x[1])
>>>
>>> # the sum of x[0] and x[1] must be less than 1.9
>>> nlc = NonlinearConstraint(constr_f, -np.inf, 1.9)
>>> # specify limits using a `Bounds` object.
>>> bounds = Bounds([0., 0.], [2., 2.])
>>> result = differential_evolution(rosen, bounds, constraints=(nlc),
...                                 seed=1)
>>> result.x, result.fun
(array([0.96633867, 0.93363577]), 0.0011361355854792312)
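Since the constraint above is linear in x, it could also be expressed with a LinearConstraint. The following is a sketch of that alternative formulation; the matrix [[1, 1]] encodes x[0] + x[1], and the tolerance in the final comment is an assumption rather than a documented guarantee.

```python
import numpy as np
from scipy.optimize import (Bounds, LinearConstraint,
                            differential_evolution, rosen)

# x[0] + x[1] <= 1.9 written as -inf <= A @ x <= 1.9 with A = [[1, 1]].
lc = LinearConstraint([[1.0, 1.0]], -np.inf, 1.9)
bounds = Bounds([0.0, 0.0], [2.0, 2.0])

result = differential_evolution(rosen, bounds, constraints=lc, seed=1)

print(result.x, result.fun)
print(result.x[0] + result.x[1])  # expected to stay at or below about 1.9
```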
Next find the minimum of the Ackley function (https://en.wikipedia.org/wiki/Test_functions_for_optimization).
>>> from scipy.optimize import differential_evolution
>>> import numpy as np
>>> def ackley(x):
...     arg1 = -0.2 * np.sqrt(0.5 * (x[0] ** 2 + x[1] ** 2))
...     arg2 = 0.5 * (np.cos(2. * np.pi * x[0]) + np.cos(2. * np.pi * x[1]))
...     return -20. * np.exp(arg1) - np.exp(arg2) + 20. + np.e
>>> bounds = [(-5, 5), (-5, 5)]
>>> result = differential_evolution(ackley, bounds, seed=1)
>>> result.x, result.fun, result.nfev
(array([0., 0.]), 4.440892098500626e-16, 3063)
The Ackley function is written in a vectorized manner, so the 'vectorized' keyword can be employed. Note the reduced number of function evaluations.
>>> result = differential_evolution(
...     ackley, bounds, vectorized=True, updating='deferred', seed=1
... )
>>> result.x, result.fun, result.nfev
(array([0., 0.]), 4.440892098500626e-16, 190)
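For an arbitrary number of parameters, a vectorized objective has to reduce over the parameter axis (axis 0) so that an input of shape (N, S) yields an output of shape (S,). A minimal sketch with an illustrative sphere function (not part of scipy) that works for both a single vector and a batch:

```python
import numpy as np
from scipy.optimize import differential_evolution


def sphere(x):
    """Sum of squares over the parameter axis (axis 0).

    Accepts a single vector of shape (N,) or a batch of shape (N, S),
    returning a scalar or an array of shape (S,) respectively.
    """
    return np.sum(np.asarray(x) ** 2, axis=0)


bounds = [(-5, 5)] * 3

# Shape check outside the solver: 3 parameters, 4 candidate solutions.
batch = np.ones((3, 4))
print(sphere(batch).shape)

result = differential_evolution(
    sphere, bounds, vectorized=True, updating='deferred', seed=1
)
print(result.x, result.fun, result.nfev)
```

Checking the output shape on a small batch before handing the function to the solver is a quick way to catch an objective that reduces over the wrong axis.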