Minimize a function using simulated annealing.
Schedule is a schedule class implementing the annealing schedule. Available ones are ‘fast’, ‘cauchy’, ‘boltzmann’
Parameters :  func : callable f(x, *args)
x0 : ndarray
args : tuple
schedule : base_schedule
full_output : bool
T0 : float
Tf : float
maxeval : int
maxaccept : int
maxiter : int
learn_rate : float
boltzmann : float
feps : float
quench, m, n : float
lower, upper : float or ndarray
dwell : int


Returns :  xmin : ndarray
Jmin : float
T : float
feval : int
iters : int
accept : int
retval : int

Notes
Simulated annealing is a random algorithm which uses no derivative information from the function being optimized. In practice it has been more useful in discrete optimization than continuous optimization, as there are usually better algorithms for continuous optimization problems.
Some experimentation by trying the difference temperature schedules and altering their parameters is likely required to obtain good performance.
The randomness in the algorithm comes from random sampling in numpy. To obtain the same results you can call numpy.random.seed with the same seed immediately before calling scipy.optimize.anneal.
We give a brief description of how the three temperature schedules generate new points and vary their temperature. Temperatures are only updated with iterations in the outer loop. The inner loop is over loop over xrange(dwell), and new points are generated for every iteration in the inner loop. (Though whether the proposed new points are accepted is probabilistic.)
For readability, let d denote the dimension of the inputs to func. Also, let x_old denote the previous state, and k denote the iteration number of the outer loop. All other variables not defined below are input variables to scipy.optimize.anneal itself.
In the ‘fast’ schedule the updates are
u ~ Uniform(0, 1, size=d)
y = sgn(u  0.5) * T * ((1+ 1/T)**abs(2u1) 1.0)
xc = y * (upper  lower)
x_new = x_old + xc
c = n * exp(n * quench)
T_new = T0 * exp(c * k**quench)
In the ‘cauchy’ schedule the updates are
u ~ Uniform(pi/2, pi/2, size=d)
xc = learn_rate * T * tan(u)
x_new = x_old + xc
T_new = T0 / (1+k)
In the ‘boltzmann’ schedule the updates are
std = minimum( sqrt(T) * ones(d), (upperlower) / (3*learn_rate) )
y ~ Normal(0, std, size=d)
x_new = x_old + learn_rate * y
T_new = T0 / log(1+k)