Note
Click here to download the full example code or to run this example in your browser via Binder
Interruptible optimization runs with checkpoints¶
Christian Schell, Mai 2018 Reformatted by Holger Nahrstaedt 2020
Problem statement¶
Optimization runs can take a very long time and even run for multiple days. If for some reason the process has to be interrupted results are irreversibly lost, and the routine has to start over from the beginning.
With the help of the callbacks.CheckpointSaver
callback the optimizer’s current state
can be saved after each iteration, allowing to restart from that point at any
time.
This is useful, for example,
if you don’t know how long the process will take and cannot hog computational resources forever
if there might be system failures due to shaky infrastructure (or colleagues…)
if you want to adjust some parameters and continue with the already obtained results
print(__doc__)
import sys
import numpy as np
np.random.seed(777)
import os
Simple example¶
We will use pretty much the same optimization problem as in the
Bayesian optimization with skopt
notebook. Additionally we will instantiate the callbacks.CheckpointSaver
and pass it to the minimizer:
from skopt import gp_minimize
from skopt import callbacks
from skopt.callbacks import CheckpointSaver
noise_level = 0.1
def obj_fun(x, noise_level=noise_level):
return np.sin(5 * x[0]) * (1 - np.tanh(x[0] ** 2)) + np.random.randn() \
* noise_level
checkpoint_saver = CheckpointSaver("./checkpoint.pkl", compress=9) # keyword arguments will be passed to `skopt.dump`
gp_minimize(obj_fun, # the function to minimize
[(-20.0, 20.0)], # the bounds on each dimension of x
x0=[-20.], # the starting point
acq_func="LCB", # the acquisition function (optional)
n_calls=10, # number of evaluations of f including at x0
n_random_starts=3, # the number of random initial points
callback=[checkpoint_saver],
# a list of callbacks including the checkpoint saver
random_state=777)
Out:
fun: -0.17524445239614728
func_vals: array([-0.04682088, -0.08228249, -0.00653801, -0.07133619, 0.09063509,
0.07662367, 0.08260541, -0.13236828, -0.17524445, 0.10024491])
models: [GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735)]
random_state: RandomState(MT19937) at 0x7F900BD12B40
space: Space([Real(low=-20.0, high=20.0, prior='uniform', transform='normalize')])
specs: {'args': {'func': <function obj_fun at 0x7f900af80280>, 'dimensions': Space([Real(low=-20.0, high=20.0, prior='uniform', transform='normalize')]), 'base_estimator': GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), 'n_calls': 10, 'n_random_starts': 3, 'n_initial_points': 10, 'initial_point_generator': 'random', 'acq_func': 'LCB', 'acq_optimizer': 'auto', 'x0': [-20.0], 'y0': None, 'random_state': RandomState(MT19937) at 0x7F900BD12B40, 'verbose': False, 'callback': [<skopt.callbacks.CheckpointSaver object at 0x7f900b0e0ee0>], 'n_points': 10000, 'n_restarts_optimizer': 5, 'xi': 0.01, 'kappa': 1.96, 'n_jobs': 1, 'model_queue_size': None}, 'function': 'base_minimize'}
x: [-18.660711622818603]
x_iters: [[-20.0], [5.857990176187936], [-11.97095004855501], [5.450171667295798], [10.524218484749973], [-17.111120867646513], [7.251301450238415], [-19.16709880491886], [-18.660711622818603], [-18.284297243496926]]
Now let’s assume this did not finish at once but took some long time: you started this on Friday night, went out for the weekend and now, Monday morning, you’re eager to see the results. However, instead of the notebook server you only see a blank page and your colleague Garry tells you that he had had an update scheduled for Sunday noon – who doesn’t like updates?
gp_minimize
did not finish, and there is no res
variable with the
actual results!
Restoring the last checkpoint¶
Luckily we employed the callbacks.CheckpointSaver
and can now restore the latest
result with skopt.load
(see Store and load skopt optimization results for more
information on that)
Out:
-0.17524445239614728
Continue the search¶
The previous results can then be used to continue the optimization process:
x0 = res.x_iters
y0 = res.func_vals
gp_minimize(obj_fun, # the function to minimize
[(-20.0, 20.0)], # the bounds on each dimension of x
x0=x0, # already examined values for x
y0=y0, # observed values for x0
acq_func="LCB", # the acquisition function (optional)
n_calls=10, # number of evaluations of f including at x0
n_random_starts=3, # the number of random initialization points
callback=[checkpoint_saver],
random_state=777)
Out:
fun: -0.17524445239614728
func_vals: array([-0.04682088, -0.08228249, -0.00653801, -0.07133619, 0.09063509,
0.07662367, 0.08260541, -0.13236828, -0.17524445, 0.10024491,
0.05448095, 0.18951609, -0.07693575, -0.14030959, -0.06324675,
-0.05588737, -0.12332314, -0.04395035, 0.09147873, 0.02650409])
models: [GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735)]
random_state: RandomState(MT19937) at 0x7F900BD12B40
space: Space([Real(low=-20.0, high=20.0, prior='uniform', transform='normalize')])
specs: {'args': {'func': <function obj_fun at 0x7f900af80280>, 'dimensions': Space([Real(low=-20.0, high=20.0, prior='uniform', transform='normalize')]), 'base_estimator': GaussianProcessRegressor(kernel=1**2 * Matern(length_scale=1, nu=2.5),
n_restarts_optimizer=2, noise='gaussian',
normalize_y=True, random_state=655685735), 'n_calls': 10, 'n_random_starts': 3, 'n_initial_points': 10, 'initial_point_generator': 'random', 'acq_func': 'LCB', 'acq_optimizer': 'auto', 'x0': [[-20.0], [5.857990176187936], [-11.97095004855501], [5.450171667295798], [10.524218484749973], [-17.111120867646513], [7.251301450238415], [-19.16709880491886], [-18.660711622818603], [-18.284297243496926]], 'y0': array([-0.04682088, -0.08228249, -0.00653801, -0.07133619, 0.09063509,
0.07662367, 0.08260541, -0.13236828, -0.17524445, 0.10024491]), 'random_state': RandomState(MT19937) at 0x7F900BD12B40, 'verbose': False, 'callback': [<skopt.callbacks.CheckpointSaver object at 0x7f900b0e0ee0>], 'n_points': 10000, 'n_restarts_optimizer': 5, 'xi': 0.01, 'kappa': 1.96, 'n_jobs': 1, 'model_queue_size': None}, 'function': 'base_minimize'}
x: [-18.660711622818603]
x_iters: [[-20.0], [5.857990176187936], [-11.97095004855501], [5.450171667295798], [10.524218484749973], [-17.111120867646513], [7.251301450238415], [-19.16709880491886], [-18.660711622818603], [-18.284297243496926], [5.857990176187936], [-11.97095004855501], [5.450171667295798], [-19.095152570513417], [-18.99431276746093], [-19.303491085633596], [-18.902401743872336], [-18.828069913611525], [-19.391720111674047], [-18.851948436512373]]
Possible problems¶
changes in search space: You can use this technique to interrupt the search, tune the search space and continue the optimization. Note that the optimizers will complain if
x0
contains parameter values not covered by the dimension definitions, so in many cases shrinking the search space will not work without deleting the offending runs fromx0
andy0
.
for more information on how the results get saved and possible caveats
Total running time of the script: ( 0 minutes 2.554 seconds)
Estimated memory usage: 16 MB