
# skopt.learning module

Machine learning extensions for model-based optimization.

"""Machine learning extensions for model-based optimization."""

from .forest import RandomForestRegressor
from .forest import ExtraTreesRegressor
from .gaussian_process import GaussianProcessRegressor

__all__ = ("RandomForestRegressor",
"ExtraTreesRegressor",
"GaussianProcessRegressor")


## Classes

### class ExtraTreesRegressor

ExtraTreesRegressor that supports conditional standard deviation.

## Parameters

n_estimators : integer, optional (default=10) The number of trees in the forest.

criterion : string, optional (default="mse") The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error.

max_features : int, float, string or None, optional (default="auto") The number of features to consider when looking for the best split:

- If int, then consider max_features features at each split.
- If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.
- If "auto", then max_features=n_features.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.

max_depth : integer or None, optional (default=None) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int, float, optional (default=2) The minimum number of samples required to split an internal node:

- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a percentage and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

min_samples_leaf : int, float, optional (default=1) The minimum number of samples required to be at a leaf node:

- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a percentage and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

min_weight_fraction_leaf : float, optional (default=0.) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_leaf_nodes : int or None, optional (default=None) Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

min_impurity_decrease : float, optional (default=0.) A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following::

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)

where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

bootstrap : boolean, optional (default=False) Whether bootstrap samples are used when building trees.

oob_score : bool, optional (default=False) Whether to use out-of-bag samples to estimate the R^2 on unseen data.

n_jobs : integer, optional (default=1) The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

random_state : int, RandomState instance or None, optional (default=None) If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

verbose : int, optional (default=0) Controls the verbosity of the tree building process.

warm_start : bool, optional (default=False) When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.
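The weighted impurity decrease governed by min_impurity_decrease can be evaluated directly from the formula above; the node statistics below are made up purely for illustration:

```python
# Made-up node statistics: N samples in total, N_t at the current node,
# N_t_L / N_t_R in the left / right children after the candidate split.
N, N_t, N_t_L, N_t_R = 100, 40, 25, 15
impurity, left_impurity, right_impurity = 0.5, 0.3, 0.2

decrease = N_t / N * (impurity
                      - N_t_R / N_t * right_impurity
                      - N_t_L / N_t * left_impurity)
# The node is split only when decrease >= min_impurity_decrease.
```

With these numbers the weighted decrease comes out to 0.095, so a `min_impurity_decrease` of 0.1 would veto this split.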

## Attributes

estimators_ : list of DecisionTreeRegressor The collection of fitted sub-estimators.

feature_importances_ : array of shape = [n_features] The feature importances (the higher, the more important the feature).

n_features_ : int The number of features when fit is performed.

n_outputs_ : int The number of outputs when fit is performed.

oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.

oob_prediction_ : array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.

## Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

## References

.. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
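The conditional standard deviation returned by `predict(X, return_std=True)` is not part of the scikit-learn API; conceptually it aggregates the per-tree means and leaf variances through the law of total variance, with `min_variance` acting as a floor. The sketch below illustrates the idea only; `forest_std` and its array layout are assumptions for illustration, not the actual `_return_std` helper used in the source:

```python
import numpy as np

def forest_std(tree_means, tree_vars, min_variance=0.0):
    """Aggregate per-tree predictions into a conditional std (illustrative).

    tree_means, tree_vars : arrays of shape (n_trees, n_samples) holding
    each tree's mean prediction and leaf-node variance per sample.
    """
    tree_vars = np.maximum(tree_vars, min_variance)  # variance floor
    mean = tree_means.mean(axis=0)
    # Law of total variance: Var(y) = E[var] + E[mean^2] - E[mean]^2
    var = (tree_vars + tree_means ** 2).mean(axis=0) - mean ** 2
    return np.sqrt(np.maximum(var, 0.0))

# Two trees that agree, with zero leaf variance -> zero predictive std.
tight = forest_std(np.array([[1.0], [1.0]]), np.array([[0.0], [0.0]]))
# Two trees that disagree -> the disagreement shows up as std.
spread = forest_std(np.array([[0.0], [2.0]]), np.array([[0.0], [0.0]]))
```

The second call returns a std of 1.0: the trees' means differ even though each leaf is pure, and that spread is exactly what the between-tree term of the law of total variance captures.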

class ExtraTreesRegressor(_sk_ExtraTreesRegressor):
"""
ExtraTreesRegressor that supports conditional standard deviation.

Parameters
----------
n_estimators : integer, optional (default=10)
The number of trees in the forest.

criterion : string, optional (default="mse")
The function to measure the quality of a split. Supported criteria
are "mse" for the mean squared error, which is equal to variance
reduction as feature selection criterion, and "mae" for the mean
absolute error.

max_features : int, float, string or None, optional (default="auto")
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a percentage and
int(max_features * n_features) features are considered at each
split.
- If "auto", then max_features=n_features.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than max_features features.

max_depth : integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.

min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a percentage and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.

min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node:
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a percentage and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.

min_weight_fraction_leaf : float, optional (default=0.)
The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.

max_leaf_nodes : int or None, optional (default=None)
Grow trees with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.

min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following::
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.

bootstrap : boolean, optional (default=False)
Whether bootstrap samples are used when building trees.

oob_score : bool, optional (default=False)
Whether to use out-of-bag samples to estimate
the R^2 on unseen data.

n_jobs : integer, optional (default=1)
The number of jobs to run in parallel for both fit and predict.
If -1, then the number of jobs is set to the number of cores.

random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by np.random.

verbose : int, optional (default=0)
Controls the verbosity of the tree building process.

warm_start : bool, optional (default=False)
When set to True, reuse the solution of the previous call to fit
and add more estimators to the ensemble, otherwise, just fit a whole
new forest.

Attributes
----------
estimators_ : list of DecisionTreeRegressor
The collection of fitted sub-estimators.

feature_importances_ : array of shape = [n_features]
The feature importances (the higher, the more important the feature).

n_features_ : int
The number of features when fit is performed.

n_outputs_ : int
The number of outputs when fit is performed.

oob_score_ : float
Score of the training dataset obtained using an out-of-bag estimate.

oob_prediction_ : array of shape = [n_samples]
Prediction computed with out-of-bag estimate on the training set.

Notes
-----
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore,
the best found split may vary, even with the same training data,
max_features=n_features and bootstrap=False, if the improvement
of the criterion is identical for several splits enumerated during the
search of the best split. To obtain a deterministic behaviour during
fitting, random_state has to be fixed.

References
----------
.. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
"""
def __init__(self, n_estimators=10, criterion='mse', max_depth=None,
min_samples_split=2, min_samples_leaf=1,
min_weight_fraction_leaf=0.0, max_features='auto',
max_leaf_nodes=None, bootstrap=False, oob_score=False,
n_jobs=1, random_state=None, verbose=0, warm_start=False,
min_variance=0.0):
self.min_variance = min_variance
super(ExtraTreesRegressor, self).__init__(
n_estimators=n_estimators, criterion=criterion,
max_depth=max_depth,
min_samples_split=min_samples_split,
min_samples_leaf=min_samples_leaf,
min_weight_fraction_leaf=min_weight_fraction_leaf,
max_features=max_features, max_leaf_nodes=max_leaf_nodes,
bootstrap=bootstrap, oob_score=oob_score,
n_jobs=n_jobs, random_state=random_state,
verbose=verbose, warm_start=warm_start)

def predict(self, X, return_std=False):
"""
Predict continuous output for X.

Parameters
----------
X : array-like of shape=(n_samples, n_features)
Input data.

return_std : boolean
Whether or not to return the standard deviation.

Returns
-------
predictions : array-like of shape=(n_samples,)
Predicted values for X. If criterion is set to "mse",
then predictions[i] ~= mean(y | X[i]).

std : array-like of shape=(n_samples,)
Standard deviation of y at X. If criterion
is set to "mse", then std[i] ~= std(y | X[i]).
"""
mean = super(ExtraTreesRegressor, self).predict(X)

if return_std:
if self.criterion != "mse":
raise ValueError(
"Expected impurity to be 'mse', got %s instead"
% self.criterion)
std = _return_std(X, self.estimators_, mean, self.min_variance)
return mean, std

return mean


### Ancestors (in MRO)

• ExtraTreesRegressor
• sklearn.ensemble.forest.ExtraTreesRegressor
• sklearn.ensemble.forest.ForestRegressor
• abc.NewBase
• sklearn.ensemble.forest.BaseForest
• abc.NewBase
• sklearn.ensemble.base.BaseEnsemble
• abc.NewBase
• sklearn.base.BaseEstimator
• sklearn.base.MetaEstimatorMixin
• sklearn.base.RegressorMixin
• builtins.object

### Methods

def __init__(self, n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, min_variance=0.0)



def predict(self, X, return_std=False)

Predict continuous output for X.

## Parameters

X : array-like of shape=(n_samples, n_features) Input data.

return_std : boolean Whether or not to return the standard deviation.

## Returns

predictions : array-like of shape=(n_samples,) Predicted values for X. If criterion is set to "mse", then predictions[i] ~= mean(y | X[i]).

std : array-like of shape=(n_samples,) Standard deviation of y at X. If criterion is set to "mse", then std[i] ~= std(y | X[i]).



### Instance variables

var feature_importances_

Return the feature importances (the higher, the more important the feature).

## Returns

feature_importances_ : array, shape = [n_features]

var min_variance

The min_variance value passed to the constructor, used as a lower bound on the per-tree variance estimates when computing the predictive standard deviation.

### class GaussianProcessRegressor

GaussianProcessRegressor that allows noise tunability.

The implementation is based on Algorithm 2.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.

In addition to standard scikit-learn estimator API, GaussianProcessRegressor:

• allows prediction without prior fitting (based on the GP prior);
• provides an additional method sample_y(X), which evaluates samples drawn from the GPR (prior or posterior) at given inputs;
• exposes a method log_marginal_likelihood(theta), which can be used externally for other ways of selecting hyperparameters, e.g., via Markov chain Monte Carlo.

## Parameters

• kernel [kernel object]: The kernel specifying the covariance function of the GP. If None is passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that the kernel's hyperparameters are optimized during fitting.

• alpha [float or array-like, optional (default: 1e-10)]: Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to an increased noise level in the observations and reduce potential numerical issues during fitting. If an array is passed, it must have the same number of entries as the data used for fitting and is used as datapoint-dependent noise level. Note that this is equivalent to adding a WhiteKernel with c=alpha. Allowing the noise level to be specified directly as a parameter is mainly for convenience and for consistency with Ridge.

• optimizer [string or callable, optional (default: "fmin_l_bfgs_b")]: Can either be one of the internally supported optimizers for optimizing the kernel's parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature::

def optimizer(obj_func, initial_theta, bounds):
    # * 'obj_func' is the objective function to be minimized, which
    #   takes the hyperparameters theta as parameter and an
    #   optional flag eval_gradient, which determines if the
    #   gradient is additionally returned or not
    # * 'initial_theta': the initial value for theta, which can be
    #   used by local optimizers
    # * 'bounds': the bounds on the values of theta
    ....
    # Returned are the best found hyperparameters theta and
    # the corresponding value of the target function.
    return theta_opt, func_min


By default, the 'fmin_l_bfgs_b' algorithm from scipy.optimize is used. If None is passed, the kernel's parameters are kept fixed. Available internal optimizers are::

'fmin_l_bfgs_b'

• n_restarts_optimizer [int, optional (default: 0)]: The number of restarts of the optimizer for finding the kernel's parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel's initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer == 0 implies that one run is performed.

• normalize_y [boolean, optional (default: False)]: Whether the target values y are normalized, i.e., the mean of the observed target values becomes zero. This parameter should be set to True if the target values' mean is expected to differ considerably from zero. When enabled, the normalization effectively modifies the GP's prior based on the data, which contradicts the likelihood principle; normalization is thus disabled by default.

• copy_X_train [bool, optional (default: True)]: If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.

• random_state [integer or numpy.RandomState, optional]: The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.

• noise [string, "gaussian", optional]: If set to "gaussian", then it is assumed that y is a noisy estimate of f(x) where the noise is gaussian.
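As a sketch of a custom optimizer honoring the documented callable signature, the toy below performs a seeded uniform random search within the bounds (`random_search_optimizer` is a hypothetical name; it treats obj_func as a value to minimize, consistent with the returned func_min):

```python
import numpy as np

def random_search_optimizer(obj_func, initial_theta, bounds):
    """Toy optimizer with the documented (obj_func, initial_theta, bounds)
    signature: seeded uniform random search, returning (theta_opt, func_min)."""
    rng = np.random.RandomState(0)
    bounds = np.asarray(bounds, dtype=float)
    theta_opt = np.asarray(initial_theta, dtype=float)
    func_min = obj_func(theta_opt, eval_gradient=False)
    for _ in range(200):
        theta = rng.uniform(bounds[:, 0], bounds[:, 1])
        value = obj_func(theta, eval_gradient=False)
        if value < func_min:
            theta_opt, func_min = theta, value
    return theta_opt, func_min

# A made-up stand-in objective with its minimum at theta = 1,
# accepting the eval_gradient flag like the real objective does.
def quadratic(theta, eval_gradient=False):
    return float(np.sum((np.asarray(theta) - 1.0) ** 2))

theta_opt, func_min = random_search_optimizer(quadratic, [3.0], [(-5.0, 5.0)])
```

Such a callable would be passed as `optimizer=random_search_optimizer`; a real replacement would typically wrap a gradient-based routine rather than random search.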

## Attributes

• X_train_ [array-like, shape = (n_samples, n_features)]: Feature values in training data (also required for prediction)

• y_train_ [array-like, shape = (n_samples, [n_output_dims])]: Target values in training data (also required for prediction)

• kernel_ [kernel object]: The kernel used for prediction. The structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters

• L_ [array-like, shape = (n_samples, n_samples)]: Lower-triangular Cholesky decomposition of the kernel in X_train_

• alpha_ [array-like, shape = (n_samples,)]: Dual coefficients of training data points in kernel space

• log_marginal_likelihood_value_ [float]: The log-marginal-likelihood of self.kernel_.theta

• noise_ [float]: Estimate of the gaussian noise. Useful only when noise is set to "gaussian".
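The mean and standard deviation computed by this class follow Algorithm 2.1 of GPML, referenced above. A self-contained NumPy sketch of that algorithm (the helper names `rbf_kernel` and `gp_predict` are illustrative, not part of this module):

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.5):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X_train, y_train, X_test, alpha=1e-8):
    # Algorithm 2.1 of GPML; line numbers as comments.
    K = rbf_kernel(X_train, X_train) + alpha * np.eye(len(X_train))
    L = np.linalg.cholesky(K)                               # line 2
    a = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # line 3 (alpha_)
    K_trans = rbf_kernel(X_test, X_train)
    y_mean = K_trans @ a                                    # line 4
    v = np.linalg.solve(L, K_trans.T)                       # line 5
    y_var = np.diag(rbf_kernel(X_test, X_test)) - (v ** 2).sum(axis=0)  # line 6
    return y_mean, np.sqrt(np.clip(y_var, 0.0, None))

X_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.sin(X_train).ravel()
mean_tr, std_tr = gp_predict(X_train, y_train, X_train)        # near-zero std
mean_far, std_far = gp_predict(X_train, y_train, np.array([[5.0]]))
```

Far from the training data the predictive std reverts to the prior kernel amplitude, while at the training points it collapses toward the jitter level, which is the behavior the class's `predict` exposes via `return_std=True`.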

class GaussianProcessRegressor(sk_GaussianProcessRegressor):
"""
GaussianProcessRegressor that allows noise tunability.

The implementation is based on Algorithm 2.1 of Gaussian Processes
for Machine Learning (GPML) by Rasmussen and Williams.

In addition to standard scikit-learn estimator API,
GaussianProcessRegressor:

* allows prediction without prior fitting (based on the GP prior);
* provides an additional method sample_y(X), which evaluates samples
drawn from the GPR (prior or posterior) at given inputs;
* exposes a method log_marginal_likelihood(theta), which can be used
externally for other ways of selecting hyperparameters, e.g., via
Markov chain Monte Carlo.

Parameters
----------
* kernel [kernel object]:
The kernel specifying the covariance function of the GP. If None is
passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that
the kernel's hyperparameters are optimized during fitting.

* alpha [float or array-like, optional (default: 1e-10)]:
Value added to the diagonal of the kernel matrix during fitting.
Larger values correspond to an increased noise level in the observations
and reduce potential numerical issues during fitting. If an array is
passed, it must have the same number of entries as the data used for
fitting and is used as datapoint-dependent noise level. Note that this
is equivalent to adding a WhiteKernel with c=alpha. Allowing the noise
level to be specified directly as a parameter is mainly for convenience
and for consistency with Ridge.

* optimizer [string or callable, optional (default: "fmin_l_bfgs_b")]:
Can either be one of the internally supported optimizers for optimizing
the kernel's parameters, specified by a string, or an externally
defined optimizer passed as a callable. If a callable is passed, it
must have the signature::

def optimizer(obj_func, initial_theta, bounds):
# * 'obj_func' is the objective function to be minimized, which
#   takes the hyperparameters theta as parameter and an
#   optional flag eval_gradient, which determines if the
#   gradient is additionally returned or not
# * 'initial_theta': the initial value for theta, which can be
#   used by local optimizers
# * 'bounds': the bounds on the values of theta
....
# Returned are the best found hyperparameters theta and
# the corresponding value of the target function.
return theta_opt, func_min

By default, the 'fmin_l_bfgs_b' algorithm from scipy.optimize
is used. If None is passed, the kernel's parameters are kept fixed.
Available internal optimizers are::

'fmin_l_bfgs_b'

* n_restarts_optimizer [int, optional (default: 0)]:
The number of restarts of the optimizer for finding the kernel's
parameters which maximize the log-marginal likelihood. The first run
of the optimizer is performed from the kernel's initial parameters,
the remaining ones (if any) from thetas sampled log-uniform randomly
from the space of allowed theta-values. If greater than 0, all bounds
must be finite. Note that n_restarts_optimizer == 0 implies that one
run is performed.

* normalize_y [boolean, optional (default: False)]:
Whether the target values y are normalized, i.e., the mean of the
observed target values becomes zero. This parameter should be set to
True if the target values' mean is expected to differ considerably from
zero. When enabled, the normalization effectively modifies the GP's
prior based on the data, which contradicts the likelihood principle;
normalization is thus disabled by default.

* copy_X_train [bool, optional (default: True)]:
If True, a persistent copy of the training data is stored in the
object. Otherwise, just a reference to the training data is stored,
which might cause predictions to change if the data is modified
externally.

* random_state [integer or numpy.RandomState, optional]:
The generator used to initialize the centers. If an integer is
given, it fixes the seed. Defaults to the global numpy random
number generator.

* noise [string, "gaussian", optional]:
If set to "gaussian", then it is assumed that y is a noisy
estimate of f(x) where the noise is gaussian.

Attributes
----------
* X_train_ [array-like, shape = (n_samples, n_features)]:
Feature values in training data (also required for prediction)

* y_train_ [array-like, shape = (n_samples, [n_output_dims])]:
Target values in training data (also required for prediction)

* kernel_ [kernel object]:
The kernel used for prediction. The structure of the kernel is the
same as the one passed as parameter but with optimized hyperparameters

* L_ [array-like, shape = (n_samples, n_samples)]:
Lower-triangular Cholesky decomposition of the kernel in X_train_

* alpha_ [array-like, shape = (n_samples,)]:
Dual coefficients of training data points in kernel space

* log_marginal_likelihood_value_ [float]:
The log-marginal-likelihood of self.kernel_.theta

* noise_ [float]:
Estimate of the gaussian noise. Useful only when noise is set to
"gaussian".
"""
def __init__(self, kernel=None, alpha=1e-10,
optimizer="fmin_l_bfgs_b", n_restarts_optimizer=0,
normalize_y=False, copy_X_train=True, random_state=None,
noise=None):
self.noise = noise
super(GaussianProcessRegressor, self).__init__(
kernel=kernel, alpha=alpha, optimizer=optimizer,
n_restarts_optimizer=n_restarts_optimizer,
normalize_y=normalize_y, copy_X_train=copy_X_train,
random_state=random_state)

def fit(self, X, y):
"""Fit Gaussian process regression model.

Parameters
----------
* X [array-like, shape = (n_samples, n_features)]:
Training data

* y [array-like, shape = (n_samples, [n_output_dims])]:
Target values

Returns
-------
* self:
Returns an instance of self.
"""
if isinstance(self.noise, str) and self.noise != "gaussian":
raise ValueError("expected noise to be 'gaussian', got %s"
% self.noise)

if self.kernel is None:
self.kernel = ConstantKernel(1.0, constant_value_bounds="fixed") \
* RBF(1.0, length_scale_bounds="fixed")
if self.noise == "gaussian":
self.kernel = self.kernel + WhiteKernel()
elif self.noise:
self.kernel = self.kernel + WhiteKernel(
noise_level=self.noise, noise_level_bounds="fixed"
)
super(GaussianProcessRegressor, self).fit(X, y)

self.noise_ = None

if self.noise:
# The noise component of this kernel should be set to zero
# while estimating K(X_test, X_test)
# Note that the term K(X, X) should include the noise but
# this (K(X, X))^-1y is precomputed as the attribute alpha_.
# (Notice the underscore).
# This has been described in Eq 2.24 of
# http://www.gaussianprocess.org/gpml/chapters/RW2.pdf
# Hence this hack
if isinstance(self.kernel_, WhiteKernel):
self.kernel_.set_params(noise_level=0.0)

else:
white_present, white_param = _param_for_white_kernel_in_Sum(
self.kernel_)

# This should always be true. Just in case.
if white_present:
noise_kernel = self.kernel_.get_params()[white_param]
self.noise_ = noise_kernel.noise_level
self.kernel_.set_params(
**{white_param: WhiteKernel(noise_level=0.0)})

# Precompute arrays needed at prediction
L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
self.K_inv_ = L_inv.dot(L_inv.T)

# Fix deprecation warning #462
if int(sklearn.__version__[2:4]) >= 19:
self.y_train_mean_ = self._y_train_mean
else:
self.y_train_mean_ = self.y_train_mean

return self

def predict(self, X, return_std=False, return_cov=False,
            return_mean_grad=False, return_std_grad=False):
"""
Predict output for X.

In addition to the mean of the predictive distribution, also its
standard deviation (return_std=True) or covariance (return_cov=True),
the gradient of the mean and the standard-deviation with respect to X
can be optionally provided.

Parameters
----------
* X [array-like, shape = (n_samples, n_features)]:
Query points where the GP is evaluated.

* return_std [bool, default: False]:
If True, the standard-deviation of the predictive distribution at
the query points is returned along with the mean.

* return_cov [bool, default: False]:
If True, the covariance of the joint predictive distribution at
the query points is returned along with the mean.

* return_mean_grad [bool, default: False]:
Whether or not to return the gradient of the mean.
Only valid when X is a single point.

* return_std_grad [bool, default: False]:
Whether or not to return the gradient of the std.
Only valid when X is a single point.

Returns
-------
* y_mean [array, shape = (n_samples, [n_output_dims])]:
Mean of predictive distribution at query points

* y_std [array, shape = (n_samples,), optional]:
Standard deviation of predictive distribution at query points.
Only returned when return_std is True.

* y_cov [array, shape = (n_samples, n_samples), optional]:
Covariance of joint predictive distribution at query points.
Only returned when return_cov is True.

* y_mean_grad [shape = (n_samples, n_features)]:
The gradient of the predicted mean

* y_std_grad [shape = (n_samples, n_features)]:
The gradient of the predicted std.
"""
if return_std and return_cov:
raise RuntimeError(
"Not returning standard deviation of predictions when "
"returning full covariance.")

if return_std_grad and not return_std:
    raise ValueError(
        "Not returning std_gradient without returning "
        "the std.")

X = check_array(X)
if X.shape[0] != 1 and (return_mean_grad or return_std_grad):
    raise ValueError("Not implemented for n_samples > 1")

if not hasattr(self, "X_train_"):  # Not fit; predict based on GP prior
y_mean = np.zeros(X.shape[0])
if return_cov:
y_cov = self.kernel(X)
return y_mean, y_cov
elif return_std:
y_var = self.kernel.diag(X)
return y_mean, np.sqrt(y_var)
else:
return y_mean

else:  # Predict based on GP posterior
K_trans = self.kernel_(X, self.X_train_)
y_mean = K_trans.dot(self.alpha_)    # Line 4 (y_mean = f_star)
y_mean = self.y_train_mean_ + y_mean  # undo normal.

if return_cov:
v = cho_solve((self.L_, True), K_trans.T)  # Line 5
y_cov = self.kernel_(X) - K_trans.dot(v)   # Line 6
return y_mean, y_cov

elif return_std:
K_inv = self.K_inv_

# Compute variance of predictive distribution
y_var = self.kernel_.diag(X)
y_var -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)

# Check if any of the variances is negative because of
# numerical issues. If yes: set the variance to 0.
y_var_negative = y_var < 0
if np.any(y_var_negative):
warnings.warn("Predicted variances smaller than 0. "
"Setting those variances to 0.")
y_var[y_var_negative] = 0.0
y_std = np.sqrt(y_var)

if return_mean_grad:
    grad = self.kernel_.gradient_x(X[0], self.X_train_)
    grad_mean = np.dot(grad.T, self.alpha_)

    if return_std_grad:
        grad_std = np.zeros(X.shape[1])
        if not np.allclose(y_std, grad_std):
            grad_std = -np.dot(K_trans,
                               np.dot(K_inv, grad))[0] / y_std
        return y_mean, y_std, grad_mean, grad_std

    if return_std:
        return y_mean, y_std, grad_mean
    else:
        return y_mean, grad_mean

else:
    if return_std:
        return y_mean, y_std
    else:
        return y_mean
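In the `return_std` branch above, `np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)` computes just the diagonal of `K_trans @ K_inv @ K_trans.T` without forming the full n×n product. A quick numerical check of that identity with stand-in arrays:

```python
import numpy as np

rng = np.random.RandomState(0)
K_trans = rng.randn(4, 6)   # stand-in for kernel_(X, X_train_)
A = rng.randn(6, 6)
K_inv = A @ A.T             # symmetric stand-in for K_inv_

diag_full = np.diag(K_trans @ K_inv @ K_trans.T)    # forms the full matrix
diag_einsum = np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)
assert np.allclose(diag_full, diag_einsum)
```

This is why `fit` precomputes `K_inv_`: the per-query variance then costs O(n_samples · n_train²) instead of an extra Cholesky solve per call.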


### Ancestors (in MRO)

• GaussianProcessRegressor
• sklearn.gaussian_process.gpr.GaussianProcessRegressor
• sklearn.base.BaseEstimator
• sklearn.base.RegressorMixin
• builtins.object

### Methods

def __init__(self, kernel=None, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None, noise=None)



def fit(self, X, y)

Fit Gaussian process regression model.

## Parameters

• X [array-like, shape = (n_samples, n_features)]: Training data

• y [array-like, shape = (n_samples, [n_output_dims])]: Target values

## Returns

• self: Returns an instance of self.
def fit(self, X, y):
"""Fit Gaussian process regression model.
Parameters
----------
* X [array-like, shape = (n_samples, n_features)]:
Training data
* y [array-like, shape = (n_samples, [n_output_dims])]:
Target values
Returns
-------
* self:
Returns an instance of self.
"""
if isinstance(self.noise, str) and self.noise != "gaussian":
raise ValueError("expected noise to be 'gaussian', got %s"
% self.noise)
if self.kernel is None:
self.kernel = ConstantKernel(1.0, constant_value_bounds="fixed") \
* RBF(1.0, length_scale_bounds="fixed")
if self.noise == "gaussian":
self.kernel = self.kernel + WhiteKernel()
elif self.noise:
self.kernel = self.kernel + WhiteKernel(
noise_level=self.noise, noise_level_bounds="fixed"
)
super(GaussianProcessRegressor, self).fit(X, y)
self.noise_ = None
if self.noise:
# The noise component of this kernel should be set to zero
# while estimating K(X_test, X_test)
# Note that the term K(X, X) should include the noise but
# this (K(X, X))^-1y is precomputed as the attribute alpha_.
# (Notice the underscore).
# This has been described in Eq 2.24 of
# http://www.gaussianprocess.org/gpml/chapters/RW2.pdf
# Hence this hack
if isinstance(self.kernel_, WhiteKernel):
self.kernel_.set_params(noise_level=0.0)
else:
white_present, white_param = _param_for_white_kernel_in_Sum(
self.kernel_)
# This should always be true. Just in case.
if white_present:
noise_kernel = self.kernel_.get_params()[white_param]
self.noise_ = noise_kernel.noise_level
self.kernel_.set_params(
**{white_param: WhiteKernel(noise_level=0.0)})
# Precompute arrays needed at prediction
L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
self.K_inv_ = L_inv.dot(L_inv.T)
# Fix deprecation warning #462
if int(sklearn.__version__[2:4]) >= 19:
self.y_train_mean_ = self._y_train_mean
else:
self.y_train_mean_ = self.y_train_mean
return self
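The noise handling in fit() above can be exercised without skopt itself. A minimal sketch, assuming scikit-learn is available, showing how appending a WhiteKernel (as fit() does when noise="gaussian") turns the noise variance into a fitted hyperparameter:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(50, 1))
y = np.sin(3 * X).ravel() + rng.normal(scale=0.1, size=50)

# Mirror the fit() logic: noise="gaussian" amounts to adding a
# WhiteKernel, so the noise level is estimated from the data.
kernel = ConstantKernel(1.0) * RBF(1.0) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# The fitted WhiteKernel's noise_level plays the role of self.noise_.
noise_level = gp.kernel_.k2.noise_level
```

Because the data were generated with scale-0.1 noise, the fitted noise_level should land near 0.01 (the variance), illustrating why skopt zeroes this component before computing predictive variances at test points.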


def predict(

self, X, return_std=False, return_cov=False, return_mean_grad=False, return_std_grad=False)

Predict output for X.

In addition to the mean of the predictive distribution, its standard deviation (return_std=True) or covariance (return_cov=True) can be returned; the gradients of the mean and the standard deviation with respect to X can optionally be returned as well.

## Parameters

• X [array-like, shape = (n_samples, n_features)]: Query points where the GP is evaluated.

• return_std [bool, default: False]: If True, the standard-deviation of the predictive distribution at the query points is returned along with the mean.

• return_cov [bool, default: False]: If True, the covariance of the joint predictive distribution at the query points is returned along with the mean.

• return_mean_grad [bool, default: False]: Whether or not to return the gradient of the mean. Only valid when X is a single point.

• return_std_grad [bool, default: False]: Whether or not to return the gradient of the std. Only valid when X is a single point.

## Returns

• y_mean [array, shape = (n_samples, [n_output_dims])]: Mean of predictive distribution at query points.

• y_std [array, shape = (n_samples,), optional]: Standard deviation of predictive distribution at query points. Only returned when return_std is True.

• y_cov [array, shape = (n_samples, n_samples), optional]: Covariance of joint predictive distribution at query points. Only returned when return_cov is True.

• y_mean_grad [shape = (n_samples, n_features)]: The gradient of the predicted mean.

• y_std_grad [shape = (n_samples, n_features)]: The gradient of the predicted std.

def predict(self, X, return_std=False, return_cov=False,
return_mean_grad=False, return_std_grad=False):
"""
Predict output for X.
In addition to the mean of the predictive distribution, also its
standard deviation (return_std=True) or covariance (return_cov=True),
the gradient of the mean and the standard-deviation with respect to X
can be optionally provided.
Parameters
----------
* X [array-like, shape = (n_samples, n_features)]:
Query points where the GP is evaluated.
* return_std [bool, default: False]:
If True, the standard-deviation of the predictive distribution at
the query points is returned along with the mean.
* return_cov [bool, default: False]:
If True, the covariance of the joint predictive distribution at
the query points is returned along with the mean.
* return_mean_grad [bool, default: False]:
Whether or not to return the gradient of the mean.
Only valid when X is a single point.
* return_std_grad [bool, default: False]:
Whether or not to return the gradient of the std.
Only valid when X is a single point.
Returns
-------
* y_mean [array, shape = (n_samples, [n_output_dims])]:
Mean of predictive distribution at query points
* y_std [array, shape = (n_samples,), optional]:
Standard deviation of predictive distribution at query points.
Only returned when return_std is True.
* y_cov [array, shape = (n_samples, n_samples), optional]:
Covariance of joint predictive distribution at query points.
Only returned when return_cov is True.
* y_mean_grad [shape = (n_samples, n_features)]:
The gradient of the predicted mean
* y_std_grad [shape = (n_samples, n_features)]:
The gradient of the predicted std.
"""
if return_std and return_cov:
raise RuntimeError(
"Not returning standard deviation of predictions when "
"returning full covariance.")

if return_std_grad and not return_std:
raise ValueError(
"Not returning std_gradient without returning "
"the std.")

X = check_array(X)

if X.shape[0] != 1 and (return_mean_grad or return_std_grad):
raise ValueError("Not implemented for n_samples > 1")
if not hasattr(self, "X_train_"):  # Not fit; predict based on GP prior
y_mean = np.zeros(X.shape[0])
if return_cov:
y_cov = self.kernel(X)
return y_mean, y_cov
elif return_std:
y_var = self.kernel.diag(X)
return y_mean, np.sqrt(y_var)
else:
return y_mean
else:  # Predict based on GP posterior
K_trans = self.kernel_(X, self.X_train_)
y_mean = K_trans.dot(self.alpha_)    # Line 4 (y_mean = f_star)
y_mean = self.y_train_mean_ + y_mean  # undo normal.
if return_cov:
v = cho_solve((self.L_, True), K_trans.T)  # Line 5
y_cov = self.kernel_(X) - K_trans.dot(v)   # Line 6
return y_mean, y_cov
elif return_std:
K_inv = self.K_inv_
# Compute variance of predictive distribution
y_var = self.kernel_.diag(X)
y_var -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)
# Check if any of the variances is negative because of
# numerical issues. If yes: set the variance to 0.
y_var_negative = y_var < 0
if np.any(y_var_negative):
warnings.warn("Predicted variances smaller than 0. "
"Setting those variances to 0.")
y_var[y_var_negative] = 0.0
y_std = np.sqrt(y_var)
if return_mean_grad:
grad = self.kernel_.gradient_x(X[0], self.X_train_)
grad_mean = np.dot(grad.T, self.alpha_)

if return_std_grad:
grad_std = np.zeros(X.shape[1])
if not np.allclose(y_std, grad_std):
grad_std = -np.dot(K_trans,
np.dot(K_inv, grad))[0] / y_std
return y_mean, y_std, grad_mean, grad_std

if return_std:
return y_mean, y_std, grad_mean
else:
return y_mean, grad_mean

else:
if return_std:
return y_mean, y_std
else:
return y_mean
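The return_std branch above can be reproduced by hand. A sketch, assuming scikit-learn is available, that precomputes K_inv_ the same way fit() does (from the Cholesky factor L_) and checks the einsum-based variance against the library's own return_std (optimizer=None keeps the kernel fixed so the comparison is deterministic):

```python
import numpy as np
from scipy.linalg import solve_triangular
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(20, 1))
y = np.sin(3 * X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-2,
                              optimizer=None, normalize_y=False).fit(X, y)

# Precompute K_inv as fit() does from the Cholesky factor L_.
L_inv = solve_triangular(gp.L_.T, np.eye(gp.L_.shape[0]))
K_inv = L_inv.dot(L_inv.T)

# Reproduce the return_std branch: prior variance minus the part
# explained by the training data, clipped at zero before the sqrt.
X_test = np.array([[0.3], [1.7]])
K_trans = gp.kernel_(X_test, gp.X_train_)
y_var = gp.kernel_.diag(X_test)
y_var -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)
manual_std = np.sqrt(np.maximum(y_var, 0))

_, sk_std = gp.predict(X_test, return_std=True)
```

The clipping step corresponds to the negative-variance warning in the source: K_inv is only approximately the inverse of K + alpha*I, so the subtraction can dip slightly below zero numerically.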


### Instance variables

var noise

var rng

DEPRECATED: Attribute rng was deprecated in version 0.19 and will be removed in 0.21.

var y_train_mean

DEPRECATED: Attribute y_train_mean was deprecated in version 0.19 and will be removed in 0.21.

class GradientBoostingQuantileRegressor

Predict several quantiles with one estimator.

This is a wrapper around GradientBoostingRegressor's quantile regression that allows you to predict several quantiles in one go.

## Parameters

• quantiles [array-like]: Quantiles to predict. By default the 16, 50 and 84% quantiles are predicted.

• base_estimator [GradientBoostingRegressor instance or None (default)]: Quantile regressor used to make predictions. Only instances of GradientBoostingRegressor are supported. Use this to change the hyper-parameters of the estimator.

• n_jobs [int, default=1]: The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.

• random_state [int, RandomState instance, or None (default)]: Set random state to something other than None for reproducible results.

class GradientBoostingQuantileRegressor(BaseEstimator, RegressorMixin):
"""Predict several quantiles with one estimator.

This is a wrapper around GradientBoostingRegressor's quantile
regression that allows you to predict several quantiles in
one go.

Parameters
----------
* quantiles [array-like]:
Quantiles to predict. By default the 16, 50 and 84%
quantiles are predicted.

* base_estimator [GradientBoostingRegressor instance or None (default)]:
Quantile regressor used to make predictions. Only instances
of GradientBoostingRegressor are supported. Use this to change
the hyper-parameters of the estimator.

* n_jobs [int, default=1]:
The number of jobs to run in parallel for fit.
If -1, then the number of jobs is set to the number of cores.

* random_state [int, RandomState instance, or None (default)]:
Set random state to something other than None for reproducible
results.
"""

def __init__(self, quantiles=[0.16, 0.5, 0.84], base_estimator=None,
n_jobs=1, random_state=None):
self.quantiles = quantiles
self.random_state = random_state
self.base_estimator = base_estimator
self.n_jobs = n_jobs

def fit(self, X, y):
"""Fit one regressor for each quantile.

Parameters
----------
* X [array-like, shape=(n_samples, n_features)]:
Training vectors, where n_samples is the number of samples
and n_features is the number of features.

* y [array-like, shape=(n_samples,)]:
Target values (real numbers in regression)
"""
rng = check_random_state(self.random_state)

if self.base_estimator is None:
base_estimator = GradientBoostingRegressor(loss='quantile')
else:
base_estimator = self.base_estimator

if not isinstance(base_estimator, GradientBoostingRegressor):
raise ValueError('base_estimator has to be of type'
' GradientBoostingRegressor.')

if not base_estimator.loss == 'quantile':
raise ValueError('base_estimator has to use quantile'
' loss not %s' % base_estimator.loss)

# The predictions for different quantiles should be sorted.
# Therefore each of the regressors need the same seed.
base_estimator.set_params(random_state=rng)
regressors = []
for q in self.quantiles:
regressor = clone(base_estimator)
regressor.set_params(alpha=q)

regressors.append(regressor)

self.regressors_ = Parallel(n_jobs=self.n_jobs)(
delayed(_parallel_fit)(regressor, X, y)
for regressor in regressors)

return self

def predict(self, X, return_std=False, return_quantiles=False):
"""Predict.

Predict X at every quantile if return_std is set to False.
If return_std is set to True, then return the mean
and the predicted standard deviation, which is approximated as
the (0.84th quantile - 0.16th quantile) divided by 2.0

Parameters
----------
* X [array-like, shape=(n_samples, n_features)]:
where n_samples is the number of samples
and n_features is the number of features.
"""
predicted_quantiles = np.asarray(
[rgr.predict(X) for rgr in self.regressors_])
if return_quantiles:
return predicted_quantiles.T

elif return_std:
std_quantiles = [0.16, 0.5, 0.84]
if not np.all(np.in1d(std_quantiles, self.quantiles)):
raise ValueError(
"return_std works only if the quantiles during "
"instantiation include 0.16, 0.5 and 0.84")
low = self.regressors_[self.quantiles.index(0.16)].predict(X)
high = self.regressors_[self.quantiles.index(0.84)].predict(X)
mean = self.regressors_[self.quantiles.index(0.5)].predict(X)
return mean, ((high - low) / 2.0)

# return the mean
return self.regressors_[self.quantiles.index(0.5)].predict(X)
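The wrapper's approach can be sketched without skopt itself: a minimal, hypothetical version using scikit-learn's GradientBoostingRegressor with quantile loss, one clone per quantile sharing the same seed, and the (0.84 quantile - 0.16 quantile) / 2 std approximation described above:

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# One regressor per requested quantile; sharing the seed makes the
# predicted quantiles tend to stay ordered, as the wrapper intends.
quantiles = [0.16, 0.5, 0.84]
base = GradientBoostingRegressor(loss="quantile", random_state=0)
regressors = []
for q in quantiles:
    est = clone(base)
    est.set_params(alpha=q)
    regressors.append(est.fit(X, y))

X_test = np.array([[0.0], [1.0]])
preds = {q: est.predict(X_test) for q, est in zip(quantiles, regressors)}
mean = preds[0.5]
std = (preds[0.84] - preds[0.16]) / 2.0  # the wrapper's std approximation
```

For a Gaussian, the 16th and 84th percentiles sit roughly one standard deviation either side of the median, which is why halving their gap approximates the std.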


### Static methods

def __init__(

self, quantiles=[0.16, 0.5, 0.84], base_estimator=None, n_jobs=1, random_state=None)

Initialize self. See help(type(self)) for accurate signature.

def __init__(self, quantiles=[0.16, 0.5, 0.84], base_estimator=None,
n_jobs=1, random_state=None):
self.quantiles = quantiles
self.random_state = random_state
self.base_estimator = base_estimator
self.n_jobs = n_jobs


def fit(

self, X, y)

Fit one regressor for each quantile.

## Parameters

• X [array-like, shape=(n_samples, n_features)]: Training vectors, where n_samples is the number of samples and n_features is the number of features.

• y [array-like, shape=(n_samples,)]: Target values (real numbers in regression)

def fit(self, X, y):
"""Fit one regressor for each quantile.
Parameters
----------
* X [array-like, shape=(n_samples, n_features)]:
Training vectors, where n_samples is the number of samples
and n_features is the number of features.
* y [array-like, shape=(n_samples,)]:
Target values (real numbers in regression)
"""
rng = check_random_state(self.random_state)
if self.base_estimator is None:
base_estimator = GradientBoostingRegressor(loss='quantile')
else:
base_estimator = self.base_estimator
if not isinstance(base_estimator, GradientBoostingRegressor):
raise ValueError('base_estimator has to be of type'
' GradientBoostingRegressor.')
if not base_estimator.loss == 'quantile':
raise ValueError('base_estimator has to use quantile'
' loss not %s' % base_estimator.loss)
# The predictions for different quantiles should be sorted.
# Therefore each of the regressors need the same seed.
base_estimator.set_params(random_state=rng)
regressors = []
for q in self.quantiles:
regressor = clone(base_estimator)
regressor.set_params(alpha=q)
regressors.append(regressor)
self.regressors_ = Parallel(n_jobs=self.n_jobs)(
delayed(_parallel_fit)(regressor, X, y)
for regressor in regressors)
return self


def predict(

self, X, return_std=False, return_quantiles=False)

Predict.

Predict X at every quantile if return_std is set to False. If return_std is set to True, then return the mean and the predicted standard deviation, which is approximated as the (0.84th quantile - 0.16th quantile) divided by 2.0

## Parameters

• X [array-like, shape=(n_samples, n_features)]: where n_samples is the number of samples and n_features is the number of features.
def predict(self, X, return_std=False, return_quantiles=False):
"""Predict.
Predict X at every quantile if return_std is set to False.
If return_std is set to True, then return the mean
and the predicted standard deviation, which is approximated as
the (0.84th quantile - 0.16th quantile) divided by 2.0
Parameters
----------
* X [array-like, shape=(n_samples, n_features)]:
where n_samples is the number of samples
and n_features is the number of features.
"""
predicted_quantiles = np.asarray(
[rgr.predict(X) for rgr in self.regressors_])
if return_quantiles:
return predicted_quantiles.T
elif return_std:
std_quantiles = [0.16, 0.5, 0.84]
if not np.all(np.in1d(std_quantiles, self.quantiles)):
raise ValueError(
"return_std works only if the quantiles during "
"instantiation include 0.16, 0.5 and 0.84")
low = self.regressors_[self.quantiles.index(0.16)].predict(X)
high = self.regressors_[self.quantiles.index(0.84)].predict(X)
mean = self.regressors_[self.quantiles.index(0.5)].predict(X)
return mean, ((high - low) / 2.0)
# return the mean
return self.regressors_[self.quantiles.index(0.5)].predict(X)


### Instance variables

var base_estimator

var n_jobs

var quantiles

var random_state

class RandomForestRegressor

RandomForestRegressor that supports conditional std computation.

## Parameters

n_estimators : integer, optional (default=10) The number of trees in the forest.

criterion : string, optional (default="mse") The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error.

max_features : int, float, string or None, optional (default="auto") The number of features to consider when looking for the best split: - If int, then consider max_features features at each split. - If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split. - If "auto", then max_features=n_features. - If "sqrt", then max_features=sqrt(n_features). - If "log2", then max_features=log2(n_features). - If None, then max_features=n_features. Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.

max_depth : integer or None, optional (default=None) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int, float, optional (default=2) The minimum number of samples required to split an internal node: - If int, then consider min_samples_split as the minimum number. - If float, then min_samples_split is a percentage and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

min_samples_leaf : int, float, optional (default=1) The minimum number of samples required to be at a leaf node: - If int, then consider min_samples_leaf as the minimum number. - If float, then min_samples_leaf is a percentage and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

min_weight_fraction_leaf : float, optional (default=0.) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

max_leaf_nodes : int or None, optional (default=None) Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

min_impurity_decrease : float, optional (default=0.) A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:: N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity) where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

bootstrap : boolean, optional (default=True) Whether bootstrap samples are used when building trees.

oob_score : bool, optional (default=False) whether to use out-of-bag samples to estimate the R^2 on unseen data.

n_jobs : integer, optional (default=1) The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

random_state : int, RandomState instance or None, optional (default=None) If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

verbose : int, optional (default=0) Controls the verbosity of the tree building process.

warm_start : bool, optional (default=False) When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.

## Attributes

estimators_ : list of DecisionTreeRegressor The collection of fitted sub-estimators.

feature_importances_ : array of shape = [n_features] The feature importances (the higher, the more important the feature).

n_features_ : int The number of features when fit is performed.

n_outputs_ : int The number of outputs when fit is performed.

oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.

oob_prediction_ : array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.

## Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values. The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

## References

.. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.

class RandomForestRegressor(_sk_RandomForestRegressor):
"""
RandomForestRegressor that supports conditional std computation.

Parameters
----------
n_estimators : integer, optional (default=10)
The number of trees in the forest.

criterion : string, optional (default="mse")
The function to measure the quality of a split. Supported criteria
are "mse" for the mean squared error, which is equal to variance
reduction as feature selection criterion, and "mae" for the mean
absolute error.

max_features : int, float, string or None, optional (default="auto")
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a percentage and
int(max_features * n_features) features are considered at each
split.
- If "auto", then max_features=n_features.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: the search for a split does not stop until at least one
valid partition of the node samples is found, even if it requires to
effectively inspect more than max_features features.

max_depth : integer or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until
all leaves are pure or until all leaves contain less than
min_samples_split samples.

min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a percentage and
ceil(min_samples_split * n_samples) are the minimum
number of samples for each split.

min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node:
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a percentage and
ceil(min_samples_leaf * n_samples) are the minimum
number of samples for each node.

min_weight_fraction_leaf : float, optional (default=0.)
The minimum weighted fraction of the sum total of weights (of all
the input samples) required to be at a leaf node. Samples have
equal weight when sample_weight is not provided.

max_leaf_nodes : int or None, optional (default=None)
Grow trees with max_leaf_nodes in best-first fashion.
Best nodes are defined as relative reduction in impurity.
If None then unlimited number of leaf nodes.

min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity
greater than or equal to this value.
The weighted impurity decrease equation is the following::
N_t / N * (impurity - N_t_R / N_t * right_impurity
- N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of
samples at the current node, N_t_L is the number of samples in the
left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum,
if sample_weight is passed.

bootstrap : boolean, optional (default=True)
Whether bootstrap samples are used when building trees.

oob_score : bool, optional (default=False)
whether to use out-of-bag samples to estimate
the R^2 on unseen data.

n_jobs : integer, optional (default=1)
The number of jobs to run in parallel for both fit and predict.
If -1, then the number of jobs is set to the number of cores.

random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by np.random.

verbose : int, optional (default=0)
Controls the verbosity of the tree building process.

warm_start : bool, optional (default=False)
When set to True, reuse the solution of the previous call to fit
and add more estimators to the ensemble, otherwise, just fit a whole
new forest.

Attributes
----------
estimators_ : list of DecisionTreeRegressor
The collection of fitted sub-estimators.

feature_importances_ : array of shape = [n_features]
The feature importances (the higher, the more important the feature).

n_features_ : int
The number of features when fit is performed.

n_outputs_ : int
The number of outputs when fit is performed.

oob_score_ : float
Score of the training dataset obtained using an out-of-bag estimate.

oob_prediction_ : array of shape = [n_samples]
Prediction computed with out-of-bag estimate on the training set.

Notes
-----
The default values for the parameters controlling the size of the trees
(e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and
unpruned trees which can potentially be very large on some data sets. To
reduce memory consumption, the complexity and size of the trees should be
controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore,
the best found split may vary, even with the same training data,
max_features=n_features and bootstrap=False, if the improvement
of the criterion is identical for several splits enumerated during the
search of the best split. To obtain a deterministic behaviour during
fitting, random_state has to be fixed.

References
----------
.. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
"""
def __init__(self, n_estimators=10, criterion='mse', max_depth=None,
min_samples_split=2, min_samples_leaf=1,
min_weight_fraction_leaf=0.0, max_features='auto',
max_leaf_nodes=None, bootstrap=True, oob_score=False,
n_jobs=1, random_state=None, verbose=0, warm_start=False,
min_variance=0.0):
self.min_variance = min_variance
super(RandomForestRegressor, self).__init__(
n_estimators=n_estimators, criterion=criterion,
max_depth=max_depth,
min_samples_split=min_samples_split,
min_samples_leaf=min_samples_leaf,
min_weight_fraction_leaf=min_weight_fraction_leaf,
max_features=max_features, max_leaf_nodes=max_leaf_nodes,
bootstrap=bootstrap, oob_score=oob_score,
n_jobs=n_jobs, random_state=random_state,
verbose=verbose, warm_start=warm_start)

def predict(self, X, return_std=False):
"""Predict continuous output for X.

Parameters
----------
X : array of shape = (n_samples, n_features)
Input data.

return_std : boolean
Whether or not to return the standard deviation.

Returns
-------
predictions : array-like of shape = (n_samples,)
Predicted values for X. If criterion is set to "mse",
then predictions[i] ~= mean(y | X[i]).

std : array-like of shape=(n_samples,)
Standard deviation of y at X. If criterion
is set to "mse", then std[i] ~= std(y | X[i]).
"""
mean = super(RandomForestRegressor, self).predict(X)

if return_std:
if self.criterion != "mse":
raise ValueError(
"Expected impurity to be 'mse', got %s instead"
% self.criterion)
std = _return_std(X, self.estimators_, mean, self.min_variance)
return mean, std
return mean
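For intuition, the per-tree spread behind return_std can be sketched with plain scikit-learn. This is an approximation only: skopt's _return_std also folds each leaf's own variance and the min_variance floor into the estimate.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # sklearn's, not skopt's

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(300, 1))
y = np.sin(3 * X).ravel() + rng.normal(scale=0.1, size=300)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

X_test = np.linspace(-2, 2, 5).reshape(-1, 1)
mean = rf.predict(X_test)

# Disagreement between the trees as a rough stand-in for std(y | X[i]).
per_tree = np.stack([tree.predict(X_test) for tree in rf.estimators_])
std = per_tree.std(axis=0)
```

Regions the training data covers densely should show small per-tree spread, while extrapolation regions show larger spread, which is what makes this quantity useful as an uncertainty signal in model-based optimization.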


### Ancestors (in MRO)

• RandomForestRegressor
• sklearn.ensemble.forest.RandomForestRegressor
• sklearn.ensemble.forest.ForestRegressor
• abc.NewBase
• sklearn.ensemble.forest.BaseForest
• abc.NewBase
• sklearn.ensemble.base.BaseEnsemble
• abc.NewBase
• sklearn.base.BaseEstimator
• sklearn.base.MetaEstimatorMixin
• sklearn.base.RegressorMixin
• builtins.object

### Static methods

def __init__(

self, n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, min_variance=0.0)

Initialize self. See help(type(self)) for accurate signature.

def __init__(self, n_estimators=10, criterion='mse', max_depth=None,
min_samples_split=2, min_samples_leaf=1,
min_weight_fraction_leaf=0.0, max_features='auto',
max_leaf_nodes=None, bootstrap=True, oob_score=False,
n_jobs=1, random_state=None, verbose=0, warm_start=False,
min_variance=0.0):
self.min_variance = min_variance
super(RandomForestRegressor, self).__init__(
n_estimators=n_estimators, criterion=criterion,
max_depth=max_depth,
min_samples_split=min_samples_split,
min_samples_leaf=min_samples_leaf,
min_weight_fraction_leaf=min_weight_fraction_leaf,
max_features=max_features, max_leaf_nodes=max_leaf_nodes,
bootstrap=bootstrap, oob_score=oob_score,
n_jobs=n_jobs, random_state=random_state,
verbose=verbose, warm_start=warm_start)


def predict(

self, X, return_std=False)

Predict continuous output for X.

## Parameters

X : array of shape = (n_samples, n_features) Input data.

return_std : boolean Whether or not to return the standard deviation.

## Returns

predictions : array-like of shape = (n_samples,) Predicted values for X. If criterion is set to "mse", then predictions[i] ~= mean(y | X[i]).

std : array-like of shape=(n_samples,) Standard deviation of y at X. If criterion is set to "mse", then std[i] ~= std(y | X[i]).

def predict(self, X, return_std=False):
"""Predict continuous output for X.
Parameters
----------
X : array of shape = (n_samples, n_features)
Input data.
return_std : boolean
Whether or not to return the standard deviation.
Returns
-------
predictions : array-like of shape = (n_samples,)
Predicted values for X. If criterion is set to "mse",
then predictions[i] ~= mean(y | X[i]).
std : array-like of shape=(n_samples,)
Standard deviation of y at X. If criterion
is set to "mse", then std[i] ~= std(y | X[i]).
"""
mean = super(RandomForestRegressor, self).predict(X)
if return_std:
if self.criterion != "mse":
raise ValueError(
"Expected impurity to be 'mse', got %s instead"
% self.criterion)
std = _return_std(X, self.estimators_, mean, self.min_variance)
return mean, std
return mean


### Instance variables

var feature_importances_

Return the feature importances (the higher, the more important the feature).

## Returns

feature_importances_ : array, shape = [n_features]

var min_variance