.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/hyperparameter-optimization.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_hyperparameter-optimization.py>`
        to download the full example code or to run this example in your browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_hyperparameter-optimization.py:


============================================
Tuning a scikit-learn estimator with `skopt`
============================================

Gilles Louppe, July 2016
Katie Malone, August 2016
Reformatted by Holger Nahrstaedt 2020

.. currentmodule:: skopt

If you are looking for a :obj:`sklearn.model_selection.GridSearchCV` replacement,
check out :ref:`sphx_glr_auto_examples_sklearn-gridsearchcv-replacement.py` instead.

Problem statement
=================

Tuning the hyper-parameters of a machine learning model is often carried out
using an exhaustive exploration of (a subset of) the space of all
hyper-parameter configurations (e.g., using
:obj:`sklearn.model_selection.GridSearchCV`), which often results in a very
time-consuming operation.

In this notebook, we illustrate how to couple :class:`gp_minimize` with sklearn's
estimators to tune hyper-parameters using sequential model-based optimisation,
hopefully resulting in equivalent or better solutions, but within fewer
evaluations.

Note: scikit-optimize provides a dedicated interface for estimator tuning via
the :class:`BayesSearchCV` class, which has an interface similar to that of
:obj:`sklearn.model_selection.GridSearchCV`. This class uses functions of skopt
to perform hyperparameter search efficiently. For example usage of this class,
see the :ref:`sphx_glr_auto_examples_sklearn-gridsearchcv-replacement.py`
example notebook.

.. GENERATED FROM PYTHON SOURCE LINES 35-38

.. code-block:: default

    print(__doc__)
    import numpy as np

.. GENERATED FROM PYTHON SOURCE LINES 39-44

Objective
=========

To tune the hyper-parameters of our model we need to define a model,
decide which parameters to optimize, and define the objective function
we want to minimize.

.. GENERATED FROM PYTHON SOURCE LINES 44-56

.. code-block:: default

    from sklearn.datasets import load_boston
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score

    boston = load_boston()
    X, y = boston.data, boston.target
    n_features = X.shape[1]

    # gradient boosted trees tend to do well on problems like this
    reg = GradientBoostingRegressor(n_estimators=50, random_state=0)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/circleci/miniconda/envs/testenv/lib/python3.9/site-packages/scikit_learn-1.0-py3.9-linux-x86_64.egg/sklearn/utils/deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.

        The Boston housing prices dataset has an ethical problem. You can refer to
        the documentation of this function for further details.

        The scikit-learn maintainers therefore strongly discourage the use of this
        dataset unless the purpose of the code is to study and educate about
        ethical issues in data science and machine learning.

        In this special case, you can fetch the dataset from the original
        source::

            import pandas as pd
            import numpy as np

            data_url = "http://lib.stat.cmu.edu/datasets/boston"
            raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
            data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
            target = raw_df.values[1::2, 2]

        Alternative datasets include the California housing dataset (i.e.
        :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
        dataset. You can load the datasets as follows::

            from sklearn.datasets import fetch_california_housing
            housing = fetch_california_housing()

        for the California housing dataset and::

            from sklearn.datasets import fetch_openml
            housing = fetch_openml(name="house_prices", as_frame=True)

        for the Ames housing dataset.

      warnings.warn(msg, category=FutureWarning)
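Since ``load_boston`` is slated for removal (it is gone in scikit-learn 1.2),
newer environments cannot run the block above as-is. Below is a minimal
sketch, not part of the generated example, of how the same setup could be
reproduced on the California housing dataset suggested by the warning; note
that the scores reported further down were obtained on the Boston data and
will differ on this dataset:

.. code-block:: default

    # sketch: swap in the California housing dataset (assumes the rest of the
    # example is unchanged; results will differ from the Boston run below)
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import GradientBoostingRegressor

    housing = fetch_california_housing()
    X, y = housing.data, housing.target
    n_features = X.shape[1]  # 8 features here, versus 13 for Boston

    # same estimator as in the example above
    reg = GradientBoostingRegressor(n_estimators=50, random_state=0)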
.. GENERATED FROM PYTHON SOURCE LINES 57-61

Next, we need to define the bounds of the dimensions of the search space
we want to explore, and pick the objective. In this case, the objective is
the cross-validated mean absolute error of a gradient boosting regressor
over the Boston dataset, as a function of its hyper-parameters.

.. GENERATED FROM PYTHON SOURCE LINES 61-85

.. code-block:: default

    from skopt.space import Real, Integer
    from skopt.utils import use_named_args


    # The list of hyper-parameters we want to optimize. For each one we define the
    # bounds, the corresponding scikit-learn parameter name, as well as how to
    # sample values from that dimension (`'log-uniform'` for the learning rate)
    space = [Integer(1, 5, name='max_depth'),
             Real(10**-5, 10**0, "log-uniform", name='learning_rate'),
             Integer(1, n_features, name='max_features'),
             Integer(2, 100, name='min_samples_split'),
             Integer(1, 100, name='min_samples_leaf')]

    # this decorator allows your objective function to receive the parameters as
    # keyword arguments. This is particularly convenient when you want to set
    # scikit-learn estimator parameters
    @use_named_args(space)
    def objective(**params):
        reg.set_params(**params)

        return -np.mean(cross_val_score(reg, X, y, cv=5, n_jobs=-1,
                                        scoring="neg_mean_absolute_error"))

.. GENERATED FROM PYTHON SOURCE LINES 86-90

Optimize all the things!
========================

With these two pieces, we are now ready for sequential model-based
optimisation. Here we use Gaussian process-based optimisation.

.. GENERATED FROM PYTHON SOURCE LINES 90-96

.. code-block:: default

    from skopt import gp_minimize

    res_gp = gp_minimize(objective, space, n_calls=50, random_state=0)

    "Best score=%.4f" % res_gp.fun

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    'Best score=2.9062'

.. GENERATED FROM PYTHON SOURCE LINES 97-107

.. code-block:: default

    print("""Best parameters:
    - max_depth=%d
    - learning_rate=%.6f
    - max_features=%d
    - min_samples_split=%d
    - min_samples_leaf=%d""" % (res_gp.x[0], res_gp.x[1],
                                res_gp.x[2], res_gp.x[3],
                                res_gp.x[4]))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Best parameters:
    - max_depth=5
    - learning_rate=0.143650
    - max_features=9
    - min_samples_split=100
    - min_samples_leaf=1

.. GENERATED FROM PYTHON SOURCE LINES 108-110

Convergence plot
================

.. GENERATED FROM PYTHON SOURCE LINES 110-115

.. code-block:: default

    from skopt.plots import plot_convergence

    plot_convergence(res_gp)

.. image-sg:: /auto_examples/images/sphx_glr_hyperparameter-optimization_001.png
   :alt: Convergence plot
   :srcset: /auto_examples/images/sphx_glr_hyperparameter-optimization_001.png
   :class: sphx-glr-single-img
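As a possible follow-up, not part of the generated example, the best point
found can be mapped back to named parameters and used to refit the regressor
on the full dataset. A minimal sketch, assuming ``space``, ``reg``, ``X``,
``y`` and ``res_gp`` from the run above; ``res_gp.x`` lists the best values
in the same order as the dimensions of ``space``:

.. code-block:: default

    # map the best point found back to named parameters; each dimension in
    # `space` was created with a `name`, and `res_gp.x` follows the same order
    best_params = {dim.name: value for dim, value in zip(space, res_gp.x)}

    # refit on the full dataset with the tuned hyper-parameters
    reg.set_params(**best_params)
    reg.fit(X, y)
    print(best_params)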
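For comparison, and again only a sketch rather than part of the generated
example, the same search can be phrased through the :class:`BayesSearchCV`
interface mentioned in the note at the top. It reuses ``Integer``, ``Real``,
``GradientBoostingRegressor``, ``n_features``, ``X`` and ``y`` from above;
its results will not match the ``gp_minimize`` run exactly, since the two
runs draw different random points:

.. code-block:: default

    from skopt import BayesSearchCV

    opt = BayesSearchCV(
        GradientBoostingRegressor(n_estimators=50, random_state=0),
        # the same search space, written as a dict of named dimensions
        {
            'max_depth': Integer(1, 5),
            'learning_rate': Real(10**-5, 10**0, prior='log-uniform'),
            'max_features': Integer(1, n_features),
            'min_samples_split': Integer(2, 100),
            'min_samples_leaf': Integer(1, 100),
        },
        n_iter=50,
        cv=5,
        scoring='neg_mean_absolute_error',
        random_state=0,
    )
    opt.fit(X, y)

    # best_score_ is the (negative) mean absolute error of the best candidate
    print("Best score=%.4f" % -opt.best_score_)
    print(opt.best_params_)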
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 24.026 seconds)

**Estimated memory usage:** 31 MB


.. _sphx_glr_download_auto_examples_hyperparameter-optimization.py:


.. only:: html

  .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/scikit-optimize/scikit-optimize/master?urlpath=lab/tree/notebooks/auto_examples/hyperparameter-optimization.ipynb
        :alt: Launch binder
        :width: 150 px


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: hyperparameter-optimization.py <hyperparameter-optimization.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: hyperparameter-optimization.ipynb <hyperparameter-optimization.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_