
skopt.learning module

Machine learning extensions for model-based optimization.

"""Machine learning extensions for model-based optimization."""

from skgarden import RandomForestRegressor
from skgarden import ExtraTreesRegressor

from .gbrt import GradientBoostingQuantileRegressor
from .gaussian_process import GaussianProcessRegressor


__all__ = ("RandomForestRegressor",
           "ExtraTreesRegressor",
           "GradientBoostingQuantileRegressor",
           "GaussianProcessRegressor")

Classes

class ExtraTreesRegressor

ExtraTreesRegressor that supports conditional standard deviation.

Parameters

n_estimators : integer, optional (default=10) The number of trees in the forest.

criterion : string, optional (default="mse") The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error.

max_features : int, float, string or None, optional (default="auto") The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.
  • If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.
  • If "auto", then max_features=n_features.
  • If "sqrt", then max_features=sqrt(n_features).
  • If "log2", then max_features=log2(n_features).
  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.

max_depth : integer or None, optional (default=None) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int, float, optional (default=2) The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.
  • If float, then min_samples_split is a percentage and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

min_samples_leaf : int, float, optional (default=1) The minimum number of samples required to be at a leaf node:

  • If int, then consider min_samples_leaf as the minimum number.
  • If float, then min_samples_leaf is a percentage and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

min_weight_fraction_leaf : float, optional (default=0.) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_leaf_nodes : int or None, optional (default=None) Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

min_impurity_decrease : float, optional (default=0.) A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

bootstrap : boolean, optional (default=False) Whether bootstrap samples are used when building trees.

oob_score : bool, optional (default=False) Whether to use out-of-bag samples to estimate the R^2 on unseen data.

n_jobs : integer, optional (default=1) The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

random_state : int, RandomState instance or None, optional (default=None) If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

verbose : int, optional (default=0) Controls the verbosity of the tree building process.

warm_start : bool, optional (default=False) When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.

Attributes

estimators_ : list of DecisionTreeRegressor The collection of fitted sub-estimators.

feature_importances_ : array of shape = [n_features] The feature importances (the higher, the more important the feature).

n_features_ : int The number of features when fit is performed.

n_outputs_ : int The number of outputs when fit is performed.

oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.

oob_prediction_ : array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

References

.. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.

class ExtraTreesRegressor(_sk_ExtraTreesRegressor):
    """
    ExtraTreesRegressor that supports conditional standard deviation.

    Parameters
    ----------
    n_estimators : integer, optional (default=10)
        The number of trees in the forest.

    criterion : string, optional (default="mse")
        The function to measure the quality of a split. Supported criteria
        are "mse" for the mean squared error, which is equal to variance
        reduction as feature selection criterion, and "mae" for the mean
        absolute error.

    max_features : int, float, string or None, optional (default="auto")
        The number of features to consider when looking for the best split:
        - If int, then consider `max_features` features at each split.
        - If float, then `max_features` is a percentage and
          `int(max_features * n_features)` features are considered at each
          split.
        - If "auto", then `max_features=n_features`.
        - If "sqrt", then `max_features=sqrt(n_features)`.
        - If "log2", then `max_features=log2(n_features)`.
        - If None, then `max_features=n_features`.
        Note: the search for a split does not stop until at least one
        valid partition of the node samples is found, even if it requires
        effectively inspecting more than ``max_features`` features.

    max_depth : integer or None, optional (default=None)
        The maximum depth of the tree. If None, then nodes are expanded until
        all leaves are pure or until all leaves contain less than
        min_samples_split samples.

    min_samples_split : int, float, optional (default=2)
        The minimum number of samples required to split an internal node:
        - If int, then consider `min_samples_split` as the minimum number.
        - If float, then `min_samples_split` is a percentage and
          `ceil(min_samples_split * n_samples)` are the minimum
          number of samples for each split.

    min_samples_leaf : int, float, optional (default=1)
        The minimum number of samples required to be at a leaf node:
        - If int, then consider `min_samples_leaf` as the minimum number.
        - If float, then `min_samples_leaf` is a percentage and
          `ceil(min_samples_leaf * n_samples)` are the minimum
          number of samples for each node.

    min_weight_fraction_leaf : float, optional (default=0.)
        The minimum weighted fraction of the sum total of weights (of all
        the input samples) required to be at a leaf node. Samples have
        equal weight when sample_weight is not provided.

    max_leaf_nodes : int or None, optional (default=None)
        Grow trees with ``max_leaf_nodes`` in best-first fashion.
        Best nodes are defined as relative reduction in impurity.
        If None then unlimited number of leaf nodes.

    min_impurity_decrease : float, optional (default=0.)
        A node will be split if this split induces a decrease of the impurity
        greater than or equal to this value.
        The weighted impurity decrease equation is the following::
            N_t / N * (impurity - N_t_R / N_t * right_impurity
                                - N_t_L / N_t * left_impurity)
        where ``N`` is the total number of samples, ``N_t`` is the number of
        samples at the current node, ``N_t_L`` is the number of samples in the
        left child, and ``N_t_R`` is the number of samples in the right child.
        ``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
        if ``sample_weight`` is passed.

    bootstrap : boolean, optional (default=False)
        Whether bootstrap samples are used when building trees.

    oob_score : bool, optional (default=False)
        whether to use out-of-bag samples to estimate
        the R^2 on unseen data.

    n_jobs : integer, optional (default=1)
        The number of jobs to run in parallel for both `fit` and `predict`.
        If -1, then the number of jobs is set to the number of cores.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

    verbose : int, optional (default=0)
        Controls the verbosity of the tree building process.

    warm_start : bool, optional (default=False)
        When set to ``True``, reuse the solution of the previous call to fit
        and add more estimators to the ensemble, otherwise, just fit a whole
        new forest.

    Attributes
    ----------
    estimators_ : list of DecisionTreeRegressor
        The collection of fitted sub-estimators.

    feature_importances_ : array of shape = [n_features]
        The feature importances (the higher, the more important the feature).

    n_features_ : int
        The number of features when ``fit`` is performed.

    n_outputs_ : int
        The number of outputs when ``fit`` is performed.

    oob_score_ : float
        Score of the training dataset obtained using an out-of-bag estimate.

    oob_prediction_ : array of shape = [n_samples]
        Prediction computed with out-of-bag estimate on the training set.

    Notes
    -----
    The default values for the parameters controlling the size of the trees
    (e.g. ``max_depth``, ``min_samples_leaf``, etc.) lead to fully grown and
    unpruned trees which can potentially be very large on some data sets. To
    reduce memory consumption, the complexity and size of the trees should be
    controlled by setting those parameter values.
    The features are always randomly permuted at each split. Therefore,
    the best found split may vary, even with the same training data,
    ``max_features=n_features`` and ``bootstrap=False``, if the improvement
    of the criterion is identical for several splits enumerated during the
    search of the best split. To obtain a deterministic behaviour during
    fitting, ``random_state`` has to be fixed.

    References
    ----------
    .. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
    """
    def __init__(self, n_estimators=10, criterion='mse', max_depth=None,
                 min_samples_split=2, min_samples_leaf=1,
                 min_weight_fraction_leaf=0.0, max_features='auto',
                 max_leaf_nodes=None, bootstrap=False, oob_score=False,
                 n_jobs=1, random_state=None, verbose=0, warm_start=False,
                 min_variance=0.0):
        self.min_variance = min_variance
        super(ExtraTreesRegressor, self).__init__(
            n_estimators=n_estimators, criterion=criterion,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            min_weight_fraction_leaf=min_weight_fraction_leaf,
            max_features=max_features, max_leaf_nodes=max_leaf_nodes,
            bootstrap=bootstrap, oob_score=oob_score,
            n_jobs=n_jobs, random_state=random_state,
            verbose=verbose, warm_start=warm_start)

    def predict(self, X, return_std=False):
        """
        Predict continuous output for X.

        Parameters
        ----------
        X : array-like of shape=(n_samples, n_features)
            Input data.

        return_std : boolean
            Whether or not to return the standard deviation.

        Returns
        -------
        predictions : array-like of shape=(n_samples,)
            Predicted values for X. If criterion is set to "mse",
            then `predictions[i] ~= mean(y | X[i])`.

        std : array-like of shape=(n_samples,)
            Standard deviation of `y` at `X`. If criterion
            is set to "mse", then `std[i] ~= std(y | X[i])`.
        """
        mean = super(ExtraTreesRegressor, self).predict(X)

        if return_std:
            if self.criterion != "mse":
                raise ValueError(
                    "Expected impurity to be 'mse', got %s instead"
                    % self.criterion)
            std = _return_std(X, self.estimators_, mean, self.min_variance)
            return mean, std

        return mean
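
A short usage sketch of the predict(X, return_std=True) extension defined above; the data and hyperparameter values are illustrative, not from the source:

import numpy as np
from skopt.learning import ExtraTreesRegressor

rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.randn(100)

# criterion must remain "mse"; predict(return_std=True) raises ValueError otherwise.
reg = ExtraTreesRegressor(n_estimators=50, min_variance=0.01, random_state=0)
reg.fit(X, y)

X_new = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = reg.predict(X_new, return_std=True)
# mean[i] ~= mean(y | X_new[i]) and std[i] ~= std(y | X_new[i]);
# `min_variance` is forwarded to the internal `_return_std` helper.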

Ancestors (in MRO)

  • ExtraTreesRegressor
  • sklearn.ensemble.forest.ExtraTreesRegressor
  • sklearn.ensemble.forest.ForestRegressor
  • abc.NewBase
  • sklearn.ensemble.forest.BaseForest
  • abc.NewBase
  • sklearn.ensemble.base.BaseEnsemble
  • abc.NewBase
  • sklearn.base.BaseEstimator
  • sklearn.base.MetaEstimatorMixin
  • sklearn.base.RegressorMixin
  • builtins.object

Instance variables

var feature_importances_

Return the feature importances (the higher, the more important the feature).

Returns

feature_importances_ : array, shape = [n_features]

var min_variance

class GaussianProcessRegressor

GaussianProcessRegressor that allows noise tunability.

The implementation is based on Algorithm 2.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.

In addition to standard scikit-learn estimator API, GaussianProcessRegressor:

  • allows prediction without prior fitting (based on the GP prior);
  • provides an additional method sample_y(X), which evaluates samples drawn from the GPR (prior or posterior) at given inputs;
  • exposes a method log_marginal_likelihood(theta), which can be used externally for other ways of selecting hyperparameters, e.g., via Markov chain Monte Carlo.

Parameters

  • kernel [kernel object]: The kernel specifying the covariance function of the GP. If None is passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that the kernel's hyperparameters are optimized during fitting.

  • alpha [float or array-like, optional (default: 0.0)]: Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to an increased noise level in the observations and reduce potential numerical issues during fitting. If an array is passed, it must have the same number of entries as the data used for fitting and is used as a datapoint-dependent noise level. Note that this is equivalent to adding a WhiteKernel with c=alpha. Allowing the noise level to be specified directly as a parameter is mainly for convenience and for consistency with Ridge.

  • optimizer [string or callable, optional (default: "fmin_l_bfgs_b")]: Can either be one of the internally supported optimizers for optimizing the kernel's parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature::

    def optimizer(obj_func, initial_theta, bounds):
        # * 'obj_func' is the objective function to be maximized, which
        #   takes the hyperparameters theta as parameter and an
        #   optional flag eval_gradient, which determines if the
        #   gradient is returned additionally to the function value
        # * 'initial_theta': the initial value for theta, which can be
        #   used by local optimizers
        # * 'bounds': the bounds on the values of theta
        ....
        # Returned are the best found hyperparameters theta and
        # the corresponding value of the target function.
        return theta_opt, func_min
    

    By default, the 'fmin_l_bfgs_b' algorithm from scipy.optimize is used. If None is passed, the kernel's parameters are kept fixed. A sketch of a custom optimizer callable is given below, after the parameter list. Available internal optimizers are:

    'fmin_l_bfgs_b'
    
  • n_restarts_optimizer [int, optional (default: 0)]: The number of restarts of the optimizer for finding the kernel's parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel's initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer == 0 implies that one run is performed.

  • normalize_y [boolean, optional (default: False)]: Whether the target values y are normalized, i.e., the mean of the observed target values becomes zero. This parameter should be set to True if the target values' mean is expected to differ considerably from zero. When enabled, the normalization effectively modifies the GP's prior based on the data, which contradicts the likelihood principle; normalization is thus disabled by default.

  • copy_X_train [bool, optional (default: True)]: If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.

  • random_state [integer or numpy.RandomState, optional]: The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.

  • noise [string, "gaussian", optional]: If set to "gaussian", then it is assumed that y is a noisy estimate of f(x) where the noise is gaussian.
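
As referenced in the optimizer parameter above, a custom optimizer can be passed as a callable. The sketch below uses scipy.optimize.minimize; the name my_optimizer is illustrative, and it assumes the scikit-learn convention that the objective handed to the callable returns the negative log-marginal likelihood and its gradient, so it is minimized:

from scipy.optimize import minimize

def my_optimizer(obj_func, initial_theta, bounds):
    # obj_func(theta, eval_gradient=True) returns the objective and its
    # gradient; jac=True tells scipy to expect that (value, gradient) pair.
    result = minimize(obj_func, initial_theta, jac=True,
                      bounds=bounds, method="L-BFGS-B")
    return result.x, result.fun

The callable would then be passed as GaussianProcessRegressor(optimizer=my_optimizer, ...).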

Attributes

  • X_train_ [array-like, shape = (n_samples, n_features)]: Feature values in training data (also required for prediction)

  • y_train_ [array-like, shape = (n_samples, [n_output_dims])]: Target values in training data (also required for prediction)

  • kernel_ [kernel object]: The kernel used for prediction. The structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters

  • L_ [array-like, shape = (n_samples, n_samples)]: Lower-triangular Cholesky decomposition of the kernel in X_train_

  • alpha_ [array-like, shape = (n_samples,)]: Dual coefficients of training data points in kernel space

  • log_marginal_likelihood_value_ [float]: The log-marginal-likelihood of self.kernel_.theta

  • noise_ [float]: Estimate of the gaussian noise. Useful only when noise is set to "gaussian".

class GaussianProcessRegressor(sk_GaussianProcessRegressor):
    """
    GaussianProcessRegressor that allows noise tunability.

    The implementation is based on Algorithm 2.1 of Gaussian Processes
    for Machine Learning (GPML) by Rasmussen and Williams.

    In addition to standard scikit-learn estimator API,
    GaussianProcessRegressor:

       * allows prediction without prior fitting (based on the GP prior);
       * provides an additional method sample_y(X), which evaluates samples
         drawn from the GPR (prior or posterior) at given inputs;
       * exposes a method log_marginal_likelihood(theta), which can be used
         externally for other ways of selecting hyperparameters, e.g., via
         Markov chain Monte Carlo.

    Parameters
    ----------
    * `kernel` [kernel object]:
        The kernel specifying the covariance function of the GP. If None is
        passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that
        the kernel's hyperparameters are optimized during fitting.

    * `alpha` [float or array-like, optional (default: 0.0)]:
        Value added to the diagonal of the kernel matrix during fitting.
        Larger values correspond to an increased noise level in the
        observations and reduce potential numerical issues during fitting.
        If an array is passed, it must have the same number of entries as the
        data used for fitting and is used as a datapoint-dependent noise
        level. Note that this is equivalent to adding a WhiteKernel with
        c=alpha. Allowing the noise level to be specified directly as a
        parameter is mainly for convenience and for consistency with Ridge.

    * `optimizer` [string or callable, optional (default: "fmin_l_bfgs_b")]:
        Can either be one of the internally supported optimizers for optimizing
        the kernel's parameters, specified by a string, or an externally
        defined optimizer passed as a callable. If a callable is passed, it
        must have the signature::

            def optimizer(obj_func, initial_theta, bounds):
                # * 'obj_func' is the objective function to be maximized, which
                #   takes the hyperparameters theta as parameter and an
                #   optional flag eval_gradient, which determines if the
                #   gradient is returned additionally to the function value
                # * 'initial_theta': the initial value for theta, which can be
                #   used by local optimizers
                # * 'bounds': the bounds on the values of theta
                ....
                # Returned are the best found hyperparameters theta and
                # the corresponding value of the target function.
                return theta_opt, func_min

        Per default, the 'fmin_l_bfgs_b' algorithm from scipy.optimize
        is used. If None is passed, the kernel's parameters are kept fixed.
        Available internal optimizers are::

            'fmin_l_bfgs_b'

    * `n_restarts_optimizer` [int, optional (default: 0)]:
        The number of restarts of the optimizer for finding the kernel's
        parameters which maximize the log-marginal likelihood. The first run
        of the optimizer is performed from the kernel's initial parameters,
        the remaining ones (if any) from thetas sampled log-uniform randomly
        from the space of allowed theta-values. If greater than 0, all bounds
        must be finite. Note that n_restarts_optimizer == 0 implies that one
        run is performed.

    * `normalize_y` [boolean, optional (default: False)]:
        Whether the target values y are normalized, i.e., the mean of the
        observed target values becomes zero. This parameter should be set to
        True if the target values' mean is expected to differ considerably
        from zero. When enabled, the normalization effectively modifies the
        GP's prior based on the data, which contradicts the likelihood
        principle; normalization is thus disabled per default.

    * `copy_X_train` [bool, optional (default: True)]:
        If True, a persistent copy of the training data is stored in the
        object. Otherwise, just a reference to the training data is stored,
        which might cause predictions to change if the data is modified
        externally.

    * `random_state` [integer or numpy.RandomState, optional]:
        The generator used to initialize the centers. If an integer is
        given, it fixes the seed. Defaults to the global numpy random
        number generator.

    * `noise` [string, "gaussian", optional]:
        If set to "gaussian", then it is assumed that `y` is a noisy
        estimate of `f(x)` where the noise is gaussian.

    Attributes
    ----------
    * `X_train_` [array-like, shape = (n_samples, n_features)]:
        Feature values in training data (also required for prediction)

    * `y_train_` [array-like, shape = (n_samples, [n_output_dims])]:
        Target values in training data (also required for prediction)

    * `kernel_` [kernel object]:
        The kernel used for prediction. The structure of the kernel is the
        same as the one passed as parameter but with optimized hyperparameters

    * `L_` [array-like, shape = (n_samples, n_samples)]:
        Lower-triangular Cholesky decomposition of the kernel in ``X_train_``

    * `alpha_` [array-like, shape = (n_samples,)]:
        Dual coefficients of training data points in kernel space

    * `log_marginal_likelihood_value_` [float]:
        The log-marginal-likelihood of ``self.kernel_.theta``

    * `noise_` [float]:
        Estimate of the gaussian noise. Useful only when noise is set to
        "gaussian".
    """
    def __init__(self, kernel=None, alpha=0.0,
                 optimizer="fmin_l_bfgs_b", n_restarts_optimizer=0,
                 normalize_y=False, copy_X_train=True, random_state=None,
                 noise=None):
        self.noise = noise
        super(GaussianProcessRegressor, self).__init__(
            kernel=kernel, alpha=alpha, optimizer=optimizer,
            n_restarts_optimizer=n_restarts_optimizer,
            normalize_y=normalize_y, copy_X_train=copy_X_train,
            random_state=random_state)

    def fit(self, X, y):
        """Fit Gaussian process regression model.

        Parameters
        ----------
        * `X` [array-like, shape = (n_samples, n_features)]:
            Training data

        * `y` [array-like, shape = (n_samples, [n_output_dims])]:
            Target values

        Returns
        -------
        * `self`:
            Returns an instance of self.
        """
        if isinstance(self.noise, str) and self.noise != "gaussian":
            raise ValueError("expected noise to be 'gaussian', got %s"
                             % self.noise)

        if self.kernel is None:
            self.kernel = ConstantKernel(1.0, constant_value_bounds="fixed") \
                          * RBF(1.0, length_scale_bounds="fixed")
        elif self.noise == "gaussian":
            self.kernel = self.kernel + WhiteKernel()
        elif self.noise:
            self.kernel = self.kernel + WhiteKernel(
                noise_level=self.noise, noise_level_bounds="fixed"
            )
        super(GaussianProcessRegressor, self).fit(X, y)

        self.noise_ = None

        if self.noise:
            # The noise component of this kernel should be set to zero
            # while estimating K(X_test, X_test)
            # Note that the term K(X, X) should include the noise but
            # this (K(X, X))^-1y is precomputed as the attribute `alpha_`.
            # (Notice the underscore).
            # This has been described in Eq 2.24 of
            # http://www.gaussianprocess.org/gpml/chapters/RW2.pdf
            # Hence this hack
            if isinstance(self.kernel_, WhiteKernel):
                self.kernel_.set_params(noise_level=0.0)

            else:
                white_present, white_param = _param_for_white_kernel_in_Sum(
                    self.kernel_)

                # This should always be true. Just in case.
                if white_present:
                    noise_kernel = self.kernel_.get_params()[white_param]
                    self.noise_ = noise_kernel.noise_level
                    self.kernel_.set_params(
                        **{white_param: WhiteKernel(noise_level=0.0)})

        # Precompute arrays needed at prediction
        L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
        self.K_inv_ = L_inv.dot(L_inv.T)

        # Fix deprecation warning #462
        if int(sklearn.__version__[2:4]) >= 19:
            self.y_train_mean_ = self._y_train_mean
        else:
            self.y_train_mean_ = self.y_train_mean

        return self

    def predict(self, X, return_std=False, return_cov=False,
                return_mean_grad=False, return_std_grad=False):
        """
        Predict output for X.

        In addition to the mean of the predictive distribution, also its
        standard deviation (return_std=True) or covariance (return_cov=True),
        the gradient of the mean and the standard-deviation with respect to X
        can be optionally provided.

        Parameters
        ----------
        * `X` [array-like, shape = (n_samples, n_features)]:
            Query points where the GP is evaluated.

        * `return_std` [bool, default: False]:
            If True, the standard-deviation of the predictive distribution at
            the query points is returned along with the mean.

        * `return_cov` [bool, default: False]:
            If True, the covariance of the joint predictive distribution at
            the query points is returned along with the mean.

        * `return_mean_grad` [bool, default: False]:
            Whether or not to return the gradient of the mean.
            Only valid when X is a single point.

        * `return_std_grad` [bool, default: False]:
            Whether or not to return the gradient of the std.
            Only valid when X is a single point.

        Returns
        -------
        * `y_mean` [array, shape = (n_samples, [n_output_dims])]:
            Mean of predictive distribution at query points.

        * `y_std` [array, shape = (n_samples,), optional]:
            Standard deviation of predictive distribution at query points.
            Only returned when return_std is True.

        * `y_cov` [array, shape = (n_samples, n_samples), optional]:
            Covariance of joint predictive distribution at query points.
            Only returned when return_cov is True.

        * `y_mean_grad` [shape = (n_samples, n_features)]:
            The gradient of the predicted mean

        * `y_std_grad` [shape = (n_samples, n_features)]:
            The gradient of the predicted std.
        """
        if return_std and return_cov:
            raise RuntimeError(
                "Not returning standard deviation of predictions when "
                "returning full covariance.")

        if return_std_grad and not return_std:
            raise ValueError(
                "Not returning std_gradient without returning "
                "the std.")

        X = check_array(X)
        if X.shape[0] != 1 and (return_mean_grad or return_std_grad):
            raise ValueError("Not implemented for n_samples > 1")

        if not hasattr(self, "X_train_"):  # Not fit; predict based on GP prior
            y_mean = np.zeros(X.shape[0])
            if return_cov:
                y_cov = self.kernel(X)
                return y_mean, y_cov
            elif return_std:
                y_var = self.kernel.diag(X)
                return y_mean, np.sqrt(y_var)
            else:
                return y_mean

        else:  # Predict based on GP posterior
            K_trans = self.kernel_(X, self.X_train_)
            y_mean = K_trans.dot(self.alpha_)    # Line 4 (y_mean = f_star)
            y_mean = self.y_train_mean_ + y_mean  # undo normal.

            if return_cov:
                v = cho_solve((self.L_, True), K_trans.T)  # Line 5
                y_cov = self.kernel_(X) - K_trans.dot(v)   # Line 6
                return y_mean, y_cov

            elif return_std:
                K_inv = self.K_inv_

                # Compute variance of predictive distribution
                y_var = self.kernel_.diag(X)
                y_var -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)

                # Check if any of the variances is negative because of
                # numerical issues. If yes: set the variance to 0.
                y_var_negative = y_var < 0
                if np.any(y_var_negative):
                    warnings.warn("Predicted variances smaller than 0. "
                                  "Setting those variances to 0.")
                    y_var[y_var_negative] = 0.0
                y_std = np.sqrt(y_var)

            if return_mean_grad:
                grad = self.kernel_.gradient_x(X[0], self.X_train_)
                grad_mean = np.dot(grad.T, self.alpha_)

                if return_std_grad:
                    grad_std = np.zeros(X.shape[1])
                    if not np.allclose(y_std, grad_std):
                        grad_std = -np.dot(K_trans,
                                           np.dot(K_inv, grad))[0] / y_std
                    return y_mean, y_std, grad_mean, grad_std

                if return_std:
                    return y_mean, y_std, grad_mean
                else:
                    return y_mean, grad_mean

            else:
                if return_std:
                    return y_mean, y_std
                else:
                    return y_mean
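
A sketch of fitting with tunable noise and querying the posterior, using a standard scikit-learn kernel; the kernel choice and data are illustrative, and a scikit-learn version compatible with this module is assumed. With noise="gaussian", fit appends a WhiteKernel, stores the estimated level in noise_, and zeroes the white component of kernel_ before prediction, as implemented above:

import numpy as np
from sklearn.gaussian_process.kernels import ConstantKernel, Matern
from skopt.learning import GaussianProcessRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.randn(30)

# noise="gaussian" makes fit() add a WhiteKernel whose level is estimated.
kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=2.5)
gpr = GaussianProcessRegressor(kernel=kernel, noise="gaussian",
                               n_restarts_optimizer=2, random_state=0)
gpr.fit(X, y)

X_new = np.linspace(-3, 3, 7).reshape(-1, 1)
y_mean, y_std = gpr.predict(X_new, return_std=True)
print(gpr.noise_)  # estimated Gaussian noise level

Note that return_mean_grad and return_std_grad accept only a single query point and rely on the kernel exposing a gradient_x method, which standard scikit-learn kernels do not provide.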

Ancestors (in MRO)

  • GaussianProcessRegressor
  • sklearn.gaussian_process.gpr.GaussianProcessRegressor
  • sklearn.base.BaseEstimator
  • sklearn.base.RegressorMixin
  • builtins.object

Static methods

def __init__(self, kernel=None, alpha=0.0, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None, noise=None)

Initialize self. See help(type(self)) for accurate signature.

def __init__(self, kernel=None, alpha=0.0,
             optimizer="fmin_l_bfgs_b", n_restarts_optimizer=0,
             normalize_y=False, copy_X_train=True, random_state=None,
             noise=None):
    self.noise = noise
    super(GaussianProcessRegressor, self).__init__(
        kernel=kernel, alpha=alpha, optimizer=optimizer,
        n_restarts_optimizer=n_restarts_optimizer,
        normalize_y=normalize_y, copy_X_train=copy_X_train,
        random_state=random_state)

def fit(self, X, y)

Fit Gaussian process regression model.

Parameters

  • X [array-like, shape = (n_samples, n_features)]: Training data

  • y [array-like, shape = (n_samples, [n_output_dims])]: Target values

Returns

  • self: Returns an instance of self.

def fit(self, X, y):
    """Fit Gaussian process regression model.
    Parameters
    ----------
    * `X` [array-like, shape = (n_samples, n_features)]:
        Training data
    * `y` [array-like, shape = (n_samples, [n_output_dims])]:
        Target values
    Returns
    -------
    * `self`:
        Returns an instance of self.
    """
    if isinstance(self.noise, str) and self.noise != "gaussian":
        raise ValueError("expected noise to be 'gaussian', got %s"
                         % self.noise)
    if self.kernel is None:
        self.kernel = ConstantKernel(1.0, constant_value_bounds="fixed") \
                      * RBF(1.0, length_scale_bounds="fixed")
    elif self.noise == "gaussian":
        self.kernel = self.kernel + WhiteKernel()
    elif self.noise:
        self.kernel = self.kernel + WhiteKernel(
            noise_level=self.noise, noise_level_bounds="fixed"
        )
    super(GaussianProcessRegressor, self).fit(X, y)
    self.noise_ = None
    if self.noise:
        # The noise component of this kernel should be set to zero
        # while estimating K(X_test, X_test)
        # Note that the term K(X, X) should include the noise but
        # this (K(X, X))^-1y is precomputed as the attribute `alpha_`.
        # (Notice the underscore).
        # This has been described in Eq 2.24 of
        # http://www.gaussianprocess.org/gpml/chapters/RW2.pdf
        # Hence this hack
        if isinstance(self.kernel_, WhiteKernel):
            self.kernel_.set_params(noise_level=0.0)
        else:
            white_present, white_param = _param_for_white_kernel_in_Sum(
                self.kernel_)
            # This should always be true. Just in case.
            if white_present:
                noise_kernel = self.kernel_.get_params()[white_param]
                self.noise_ = noise_kernel.noise_level
                self.kernel_.set_params(
                    **{white_param: WhiteKernel(noise_level=0.0)})
    # Precompute arrays needed at prediction
    L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
    self.K_inv_ = L_inv.dot(L_inv.T)
    # Fix deprecation warning #462
    if int(sklearn.__version__[2:4]) >= 19:
        self.y_train_mean_ = self._y_train_mean
    else:
        self.y_train_mean_ = self.y_train_mean
    return self

def predict(self, X, return_std=False, return_cov=False, return_mean_grad=False, return_std_grad=False)

Predict output for X.

In addition to the mean of the predictive distribution, its standard deviation (return_std=True) or covariance (return_cov=True) can be returned, along with the gradients of the mean and the standard deviation with respect to X.

Parameters

  • X [array-like, shape = (n_samples, n_features)]: Query points where the GP is evaluated.

  • return_std [bool, default: False]: If True, the standard-deviation of the predictive distribution at the query points is returned along with the mean.

  • return_cov [bool, default: False]: If True, the covariance of the joint predictive distribution at the query points is returned along with the mean.

  • return_mean_grad [bool, default: False]: Whether or not to return the gradient of the mean. Only valid when X is a single point.

  • return_std_grad [bool, default: False]: Whether or not to return the gradient of the std. Only valid when X is a single point.

Returns

  • y_mean [array, shape = (n_samples, [n_output_dims])]: Mean of predictive distribution at query points.

  • y_std [array, shape = (n_samples,), optional]: Standard deviation of predictive distribution at query points. Only returned when return_std is True.

  • y_cov [array, shape = (n_samples, n_samples), optional]: Covariance of joint predictive distribution at query points. Only returned when return_cov is True.

  • y_mean_grad [shape = (n_samples, n_features)]: The gradient of the predicted mean

  • y_std_grad [shape = (n_samples, n_features)]: The gradient of the predicted std.

def predict(self, X, return_std=False, return_cov=False,
            return_mean_grad=False, return_std_grad=False):
    """
    Predict output for X.
    In addition to the mean of the predictive distribution, also its
    standard deviation (return_std=True) or covariance (return_cov=True),
    the gradient of the mean and the standard-deviation with respect to X
    can be optionally provided.
    Parameters
    ----------
    * `X` [array-like, shape = (n_samples, n_features)]:
        Query points where the GP is evaluated.
    * `return_std` [bool, default: False]:
        If True, the standard-deviation of the predictive distribution at
        the query points is returned along with the mean.
    * `return_cov` [bool, default: False]:
        If True, the covariance of the joint predictive distribution at
        the query points is returned along with the mean.
    * `return_mean_grad` [bool, default: False]:
        Whether or not to return the gradient of the mean.
        Only valid when X is a single point.
    * `return_std_grad` [bool, default: False]:
        Whether or not to return the gradient of the std.
        Only valid when X is a single point.
    Returns
    -------
    * `y_mean` [array, shape = (n_samples, [n_output_dims])]:
        Mean of predictive distribution at query points.
    * `y_std` [array, shape = (n_samples,), optional]:
        Standard deviation of predictive distribution at query points.
        Only returned when return_std is True.
    * `y_cov` [array, shape = (n_samples, n_samples), optional]:
        Covariance of joint predictive distribution at query points.
        Only returned when return_cov is True.
    * `y_mean_grad` [shape = (n_samples, n_features)]:
        The gradient of the predicted mean
    * `y_std_grad` [shape = (n_samples, n_features)]:
        The gradient of the predicted std.
    """
    if return_std and return_cov:
        raise RuntimeError(
            "Not returning standard deviation of predictions when "
            "returning full covariance.")
    if return_std_grad and not return_std:
        raise ValueError(
            "Not returning std_gradient without returning "
            "the std.")
    X = check_array(X)
    if X.shape[0] != 1 and (return_mean_grad or return_std_grad):
        raise ValueError("Not implemented for n_samples > 1")
    if not hasattr(self, "X_train_"):  # Not fit; predict based on GP prior
        y_mean = np.zeros(X.shape[0])
        if return_cov:
            y_cov = self.kernel(X)
            return y_mean, y_cov
        elif return_std:
            y_var = self.kernel.diag(X)
            return y_mean, np.sqrt(y_var)
        else:
            return y_mean
    else:  # Predict based on GP posterior
        K_trans = self.kernel_(X, self.X_train_)
        y_mean = K_trans.dot(self.alpha_)    # Line 4 (y_mean = f_star)
        y_mean = self.y_train_mean_ + y_mean  # undo normal.
        if return_cov:
            v = cho_solve((self.L_, True), K_trans.T)  # Line 5
            y_cov = self.kernel_(X) - K_trans.dot(v)   # Line 6
            return y_mean, y_cov
        elif return_std:
            K_inv = self.K_inv_
            # Compute variance of predictive distribution
            y_var = self.kernel_.diag(X)
            y_var -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)
            # Check if any of the variances is negative because of
            # numerical issues. If yes: set the variance to 0.
            y_var_negative = y_var < 0
            if np.any(y_var_negative):
                warnings.warn("Predicted variances smaller than 0. "
                              "Setting those variances to 0.")
                y_var[y_var_negative] = 0.0
            y_std = np.sqrt(y_var)
        if return_mean_grad:
            grad = self.kernel_.gradient_x(X[0], self.X_train_)
            grad_mean = np.dot(grad.T, self.alpha_)
            if return_std_grad:
                grad_std = np.zeros(X.shape[1])
                if not np.allclose(y_std, grad_std):
                    grad_std = -np.dot(K_trans,
                                       np.dot(K_inv, grad))[0] / y_std
                return y_mean, y_std, grad_mean, grad_std
            if return_std:
                return y_mean, y_std, grad_mean
            else:
                return y_mean, grad_mean
        else:
            if return_std:
                return y_mean, y_std
            else:
                return y_mean

Instance variables

var noise

var rng

DEPRECATED: Attribute rng was deprecated in version 0.19 and will be removed in 0.21.

var y_train_mean

DEPRECATED: Attribute y_train_mean was deprecated in version 0.19 and will be removed in 0.21.

class GradientBoostingQuantileRegressor

Predict several quantiles with one estimator.

This is a wrapper around GradientBoostingRegressor's quantile regression that allows you to predict several quantiles in one go.

Parameters

  • quantiles [array-like]: Quantiles to predict. By default the 16, 50 and 84% quantiles are predicted.

  • base_estimator [GradientBoostingRegressor instance or None (default)]: Quantile regressor used to make predictions. Only instances of GradientBoostingRegressor are supported. Use this to change the hyper-parameters of the estimator.

  • n_jobs [int, default=1]: The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.

  • random_state [int, RandomState instance, or None (default)]: Set random state to something other than None for reproducible results.

class GradientBoostingQuantileRegressor(BaseEstimator, RegressorMixin):
    """Predict several quantiles with one estimator.

    This is a wrapper around `GradientBoostingRegressor`'s quantile
    regression that allows you to predict several `quantiles` in
    one go.

    Parameters
    ----------
    * `quantiles` [array-like]:
        Quantiles to predict. By default the 16, 50 and 84%
        quantiles are predicted.

    * `base_estimator` [GradientBoostingRegressor instance or None (default)]:
        Quantile regressor used to make predictions. Only instances
        of `GradientBoostingRegressor` are supported. Use this to change
        the hyper-parameters of the estimator.

    * `n_jobs` [int, default=1]:
        The number of jobs to run in parallel for `fit`.
        If -1, then the number of jobs is set to the number of cores.

    * `random_state` [int, RandomState instance, or None (default)]:
        Set random state to something other than None for reproducible
        results.
    """

    def __init__(self, quantiles=[0.16, 0.5, 0.84], base_estimator=None,
                 n_jobs=1, random_state=None):
        self.quantiles = quantiles
        self.random_state = random_state
        self.base_estimator = base_estimator
        self.n_jobs = n_jobs

    def fit(self, X, y):
        """Fit one regressor for each quantile.

        Parameters
        ----------
        * `X` [array-like, shape=(n_samples, n_features)]:
            Training vectors, where `n_samples` is the number of samples
            and `n_features` is the number of features.

        * `y` [array-like, shape=(n_samples,)]:
            Target values (real numbers in regression)
        """
        rng = check_random_state(self.random_state)

        if self.base_estimator is None:
            base_estimator = GradientBoostingRegressor(loss='quantile')
        else:
            base_estimator = self.base_estimator

            if not isinstance(base_estimator, GradientBoostingRegressor):
                raise ValueError('base_estimator has to be of type'
                                 ' GradientBoostingRegressor.')

            if not base_estimator.loss == 'quantile':
                raise ValueError('base_estimator has to use quantile'
                                 ' loss not %s' % base_estimator.loss)

        # The predictions for different quantiles should be sorted.
        # Therefore each of the regressors needs the same seed.
        base_estimator.set_params(random_state=rng)
        regressors = []
        for q in self.quantiles:
            regressor = clone(base_estimator)
            regressor.set_params(alpha=q)

            regressors.append(regressor)

        self.regressors_ = Parallel(n_jobs=self.n_jobs, backend='threading')(
            delayed(_parallel_fit)(regressor, X, y)
            for regressor in regressors)

        return self

    def predict(self, X, return_std=False, return_quantiles=False):
        """Predict.

        Predict `X` at every quantile if `return_std` is set to False.
        If `return_std` is set to True, then return the mean
        and the predicted standard deviation, which is approximated as
        the (0.84th quantile - 0.16th quantile) divided by 2.0

        Parameters
        ----------
        * `X` [array-like, shape=(n_samples, n_features)]:
            where `n_samples` is the number of samples
            and `n_features` is the number of features.
        """
        predicted_quantiles = np.asarray(
            [rgr.predict(X) for rgr in self.regressors_])
        if return_quantiles:
            return predicted_quantiles.T

        elif return_std:
            std_quantiles = [0.16, 0.5, 0.84]
            is_present_mask = np.in1d(std_quantiles, self.quantiles)
            if not np.all(is_present_mask):
                raise ValueError(
                    "return_std works only if the quantiles during "
                    "instantiation include 0.16, 0.5 and 0.84")
            low = self.regressors_[self.quantiles.index(0.16)].predict(X)
            high = self.regressors_[self.quantiles.index(0.84)].predict(X)
            mean = self.regressors_[self.quantiles.index(0.5)].predict(X)
            return mean, ((high - low) / 2.0)

        # return the mean
        return self.regressors_[self.quantiles.index(0.5)].predict(X)
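
A usage sketch of the wrapper above (the data is illustrative). Keeping the default quantiles [0.16, 0.5, 0.84] makes return_std=True available, with the standard deviation approximated as (q0.84 - q0.16) / 2:

import numpy as np
from skopt.learning import GradientBoostingQuantileRegressor

rng = np.random.RandomState(1)
X = rng.uniform(0, 5, size=(200, 1))
y = X[:, 0] ** 2 + rng.randn(200)

gbrt = GradientBoostingQuantileRegressor(random_state=0)  # default quantiles [0.16, 0.5, 0.84]
gbrt.fit(X, y)

X_new = np.array([[1.0], [2.5], [4.0]])
all_q = gbrt.predict(X_new, return_quantiles=True)  # shape (n_samples, n_quantiles)
mean, std = gbrt.predict(X_new, return_std=True)    # std = (q0.84 - q0.16) / 2
median = gbrt.predict(X_new)                        # the 0.5-quantile prediction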

Ancestors (in MRO)

  • GradientBoostingQuantileRegressor
  • sklearn.base.BaseEstimator
  • sklearn.base.RegressorMixin
  • builtins.object

Static methods

def __init__(self, quantiles=[0.16, 0.5, 0.84], base_estimator=None, n_jobs=1, random_state=None)

Initialize self. See help(type(self)) for accurate signature.

def __init__(self, quantiles=[0.16, 0.5, 0.84], base_estimator=None,
             n_jobs=1, random_state=None):
    self.quantiles = quantiles
    self.random_state = random_state
    self.base_estimator = base_estimator
    self.n_jobs = n_jobs

def fit(self, X, y)

Fit one regressor for each quantile.

Parameters

  • X [array-like, shape=(n_samples, n_features)]: Training vectors, where n_samples is the number of samples and n_features is the number of features.

  • y [array-like, shape=(n_samples,)]: Target values (real numbers in regression)

def fit(self, X, y):
    """Fit one regressor for each quantile.
    Parameters
    ----------
    * `X` [array-like, shape=(n_samples, n_features)]:
        Training vectors, where `n_samples` is the number of samples
        and `n_features` is the number of features.
    * `y` [array-like, shape=(n_samples,)]:
        Target values (real numbers in regression)
    """
    rng = check_random_state(self.random_state)
    if self.base_estimator is None:
        base_estimator = GradientBoostingRegressor(loss='quantile')
    else:
        base_estimator = self.base_estimator
        if not isinstance(base_estimator, GradientBoostingRegressor):
            raise ValueError('base_estimator has to be of type'
                             ' GradientBoostingRegressor.')
        if not base_estimator.loss == 'quantile':
            raise ValueError('base_estimator has to use quantile'
                             ' loss not %s' % base_estimator.loss)
    # The predictions for different quantiles should be sorted.
    # Therefore each of the regressors needs the same seed.
    base_estimator.set_params(random_state=rng)
    regressors = []
    for q in self.quantiles:
        regressor = clone(base_estimator)
        regressor.set_params(alpha=q)
        regressors.append(regressor)
    self.regressors_ = Parallel(n_jobs=self.n_jobs, backend='threading')(
        delayed(_parallel_fit)(regressor, X, y)
        for regressor in regressors)
    return self

def predict(self, X, return_std=False, return_quantiles=False)

Predict.

Predict X at every quantile if return_std is set to False. If return_std is set to True, return the mean and the predicted standard deviation, which is approximated as (0.84th quantile - 0.16th quantile) / 2.0.

Parameters

  • X [array-like, shape=(n_samples, n_features)]: where n_samples is the number of samples and n_features is the number of features.

def predict(self, X, return_std=False, return_quantiles=False):
    """Predict.
    Predict `X` at every quantile if `return_std` is set to False.
    If `return_std` is set to True, then return the mean
    and the predicted standard deviation, which is approximated as
    the (0.84th quantile - 0.16th quantile) divided by 2.0
    Parameters
    ----------
    * `X` [array-like, shape=(n_samples, n_features)]:
        where `n_samples` is the number of samples
        and `n_features` is the number of features.
    """
    predicted_quantiles = np.asarray(
        [rgr.predict(X) for rgr in self.regressors_])
    if return_quantiles:
        return predicted_quantiles.T
    elif return_std:
        std_quantiles = [0.16, 0.5, 0.84]
        is_present_mask = np.in1d(std_quantiles, self.quantiles)
        if not np.all(is_present_mask):
            raise ValueError(
                "return_std works only if the quantiles during "
                "instantiation include 0.16, 0.5 and 0.84")
        low = self.regressors_[self.quantiles.index(0.16)].predict(X)
        high = self.regressors_[self.quantiles.index(0.84)].predict(X)
        mean = self.regressors_[self.quantiles.index(0.5)].predict(X)
        return mean, ((high - low) / 2.0)
    # return the mean
    return self.regressors_[self.quantiles.index(0.5)].predict(X)

Instance variables

var base_estimator

var n_jobs

var quantiles

var random_state

class RandomForestRegressor

RandomForestRegressor that supports conditional std computation.

Parameters

n_estimators : integer, optional (default=10) The number of trees in the forest.

criterion : string, optional (default="mse") The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error.

max_features : int, float, string or None, optional (default="auto") The number of features to consider when looking for the best split:

  • If int, then consider max_features features at each split.
  • If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.
  • If "auto", then max_features=n_features.
  • If "sqrt", then max_features=sqrt(n_features).
  • If "log2", then max_features=log2(n_features).
  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.

max_depth : integer or None, optional (default=None) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int, float, optional (default=2) The minimum number of samples required to split an internal node:

  • If int, then consider min_samples_split as the minimum number.
  • If float, then min_samples_split is a percentage and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

min_samples_leaf : int, float, optional (default=1) The minimum number of samples required to be at a leaf node:

  • If int, then consider min_samples_leaf as the minimum number.
  • If float, then min_samples_leaf is a percentage and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

min_weight_fraction_leaf : float, optional (default=0.) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_leaf_nodes : int or None, optional (default=None) Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

min_impurity_decrease : float, optional (default=0.) A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

bootstrap : boolean, optional (default=True) Whether bootstrap samples are used when building trees.

oob_score : bool, optional (default=False) Whether to use out-of-bag samples to estimate the R^2 on unseen data.

n_jobs : integer, optional (default=1) The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

random_state : int, RandomState instance or None, optional (default=None) If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

verbose : int, optional (default=0) Controls the verbosity of the tree building process.

warm_start : bool, optional (default=False) When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.

Attributes

estimators_ : list of DecisionTreeRegressor The collection of fitted sub-estimators.

feature_importances_ : array of shape = [n_features] The feature importances (the higher, the more important the feature).

n_features_ : int The number of features when fit is performed.

n_outputs_ : int The number of outputs when fit is performed.

oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.

oob_prediction_ : array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
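For instance, a minimal sketch that bounds tree size and fixes the seed so repeated fits give the same forest; the data and parameter values below are arbitrary illustrations, not recommendations::

    import numpy as np
    from skopt.learning import ExtraTreesRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(size=(100, 3))
    y = X.sum(axis=1)

    # Bound tree depth and leaf size to limit memory use; fix random_state
    # so the randomly drawn splits are reproducible across fits.
    reg = ExtraTreesRegressor(n_estimators=100, max_depth=8,
                              min_samples_leaf=3, random_state=0)
    reg.fit(X, y)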

References

.. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.

class RandomForestRegressor(_sk_RandomForestRegressor):
    """
    RandomForestRegressor that supports conditional std computation.

    Parameters
    ----------
    n_estimators : integer, optional (default=10)
        The number of trees in the forest.

    criterion : string, optional (default="mse")
        The function to measure the quality of a split. Supported criteria
        are "mse" for the mean squared error, which is equal to variance
        reduction as feature selection criterion, and "mae" for the mean
        absolute error.

    max_features : int, float, string or None, optional (default="auto")
        The number of features to consider when looking for the best split:
        - If int, then consider `max_features` features at each split.
        - If float, then `max_features` is a percentage and
          `int(max_features * n_features)` features are considered at each
          split.
        - If "auto", then `max_features=n_features`.
        - If "sqrt", then `max_features=sqrt(n_features)`.
        - If "log2", then `max_features=log2(n_features)`.
        - If None, then `max_features=n_features`.
        Note: the search for a split does not stop until at least one
        valid partition of the node samples is found, even if it requires to
        effectively inspect more than ``max_features`` features.

    max_depth : integer or None, optional (default=None)
        The maximum depth of the tree. If None, then nodes are expanded until
        all leaves are pure or until all leaves contain less than
        min_samples_split samples.

    min_samples_split : int, float, optional (default=2)
        The minimum number of samples required to split an internal node:
        - If int, then consider `min_samples_split` as the minimum number.
        - If float, then `min_samples_split` is a percentage and
          `ceil(min_samples_split * n_samples)` are the minimum
          number of samples for each split.

    min_samples_leaf : int, float, optional (default=1)
        The minimum number of samples required to be at a leaf node:
        - If int, then consider `min_samples_leaf` as the minimum number.
        - If float, then `min_samples_leaf` is a percentage and
          `ceil(min_samples_leaf * n_samples)` are the minimum
          number of samples for each node.

    min_weight_fraction_leaf : float, optional (default=0.)
        The minimum weighted fraction of the sum total of weights (of all
        the input samples) required to be at a leaf node. Samples have
        equal weight when sample_weight is not provided.

    max_leaf_nodes : int or None, optional (default=None)
        Grow trees with ``max_leaf_nodes`` in best-first fashion.
        Best nodes are defined as relative reduction in impurity.
        If None then unlimited number of leaf nodes.

    min_impurity_decrease : float, optional (default=0.)
        A node will be split if this split induces a decrease of the impurity
        greater than or equal to this value.
        The weighted impurity decrease equation is the following::
            N_t / N * (impurity - N_t_R / N_t * right_impurity
                                - N_t_L / N_t * left_impurity)
        where ``N`` is the total number of samples, ``N_t`` is the number of
        samples at the current node, ``N_t_L`` is the number of samples in the
        left child, and ``N_t_R`` is the number of samples in the right child.
        ``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
        if ``sample_weight`` is passed.

    bootstrap : boolean, optional (default=True)
        Whether bootstrap samples are used when building trees.

    oob_score : bool, optional (default=False)
        whether to use out-of-bag samples to estimate
        the R^2 on unseen data.

    n_jobs : integer, optional (default=1)
        The number of jobs to run in parallel for both `fit` and `predict`.
        If -1, then the number of jobs is set to the number of cores.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

    verbose : int, optional (default=0)
        Controls the verbosity of the tree building process.

    warm_start : bool, optional (default=False)
        When set to ``True``, reuse the solution of the previous call to fit
        and add more estimators to the ensemble, otherwise, just fit a whole
        new forest.

    Attributes
    ----------
    estimators_ : list of DecisionTreeRegressor
        The collection of fitted sub-estimators.

    feature_importances_ : array of shape = [n_features]
        The feature importances (the higher, the more important the feature).

    n_features_ : int
        The number of features when ``fit`` is performed.

    n_outputs_ : int
        The number of outputs when ``fit`` is performed.

    oob_score_ : float
        Score of the training dataset obtained using an out-of-bag estimate.

    oob_prediction_ : array of shape = [n_samples]
        Prediction computed with out-of-bag estimate on the training set.

    Notes
    -----
    The default values for the parameters controlling the size of the trees
    (e.g. ``max_depth``, ``min_samples_leaf``, etc.) lead to fully grown and
    unpruned trees which can potentially be very large on some data sets. To
    reduce memory consumption, the complexity and size of the trees should be
    controlled by setting those parameter values.
    The features are always randomly permuted at each split. Therefore,
    the best found split may vary, even with the same training data,
    ``max_features=n_features`` and ``bootstrap=False``, if the improvement
    of the criterion is identical for several splits enumerated during the
    search of the best split. To obtain a deterministic behaviour during
    fitting, ``random_state`` has to be fixed.

    References
    ----------
    .. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
    """
    def __init__(self, n_estimators=10, criterion='mse', max_depth=None,
                 min_samples_split=2, min_samples_leaf=1,
                 min_weight_fraction_leaf=0.0, max_features='auto',
                 max_leaf_nodes=None, bootstrap=True, oob_score=False,
                 n_jobs=1, random_state=None, verbose=0, warm_start=False,
                 min_variance=0.0):
        self.min_variance = min_variance
        super(RandomForestRegressor, self).__init__(
            n_estimators=n_estimators, criterion=criterion,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            min_weight_fraction_leaf=min_weight_fraction_leaf,
            max_features=max_features, max_leaf_nodes=max_leaf_nodes,
            bootstrap=bootstrap, oob_score=oob_score,
            n_jobs=n_jobs, random_state=random_state,
            verbose=verbose, warm_start=warm_start)

    def predict(self, X, return_std=False):
        """Predict continuous output for X.

        Parameters
        ----------
        X : array of shape = (n_samples, n_features)
            Input data.

        return_std : boolean
            Whether or not to return the standard deviation.

        Returns
        -------
        predictions : array-like of shape = (n_samples,)
            Predicted values for X. If criterion is set to "mse",
            then `predictions[i] ~= mean(y | X[i])`.

        std : array-like of shape=(n_samples,)
            Standard deviation of `y` at `X`. If criterion
            is set to "mse", then `std[i] ~= std(y | X[i])`.
        """
        mean = super(RandomForestRegressor, self).predict(X)

        if return_std:
            if self.criterion != "mse":
                raise ValueError(
                    "Expected impurity to be 'mse', got %s instead"
                    % self.criterion)
            std = _return_std(X, self.estimators_, mean, self.min_variance)
            return mean, std
        return mean
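
For reference, a minimal usage sketch showing the conditional standard deviation returned by ``predict``; the toy 1-D data below is made up for illustration::

    import numpy as np
    from skopt.learning import RandomForestRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(-2, 2, size=(200, 1))
    y = np.sin(3 * X).ravel() + 0.1 * rng.randn(200)

    reg = RandomForestRegressor(n_estimators=100, min_variance=0.01,
                                random_state=0)
    reg.fit(X, y)

    # return_std=True requires the default criterion="mse" and returns the
    # per-sample mean prediction together with an estimate of std(y | X[i]).
    mean, std = reg.predict(X[:5], return_std=True)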

Ancestors (in MRO)

  • RandomForestRegressor
  • sklearn.ensemble.forest.RandomForestRegressor
  • sklearn.ensemble.forest.ForestRegressor
  • abc.NewBase
  • sklearn.ensemble.forest.BaseForest
  • abc.NewBase
  • sklearn.ensemble.base.BaseEnsemble
  • abc.NewBase
  • sklearn.base.BaseEstimator
  • sklearn.base.MetaEstimatorMixin
  • sklearn.base.RegressorMixin
  • builtins.object

Instance variables

var feature_importances_

Return the feature importances (the higher, the more important the feature).

Returns

feature_importances_ : array, shape = [n_features]

var min_variance