skopt.learning module
Machine learning extensions for model-based optimization.
"""Machine learning extensions for model-based optimization."""

from .forest import RandomForestRegressor
from .forest import ExtraTreesRegressor
from .gaussian_process import GaussianProcessRegressor
from .gbrt import GradientBoostingQuantileRegressor

__all__ = ("RandomForestRegressor",
           "ExtraTreesRegressor",
           "GradientBoostingQuantileRegressor",
           "GaussianProcessRegressor")
Classes
class ExtraTreesRegressor
ExtraTreesRegressor that supports conditional standard deviation.
Parameters
n_estimators : integer, optional (default=10) The number of trees in the forest.
criterion : string, optional (default="mse") The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error.
max_features : int, float, string or None, optional (default="auto")
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If "auto", then max_features=n_features.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.
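The rules above can be sketched as a small standalone function. This is only an illustration of the documented mapping, not the library's code: the name `resolve_max_features` is hypothetical, and clamping the "sqrt" and "log2" results to at least 1 is an assumption of this sketch.

```python
import math


def resolve_max_features(max_features, n_features):
    """Return how many features are examined per split (hypothetical helper)."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        # Clamped to at least 1 (an assumption of this sketch).
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # A float is read as a fraction of the available features.
        return int(max_features * n_features)
    raise ValueError("unsupported max_features: %r" % (max_features,))


print(resolve_max_features("auto", 16))  # 16
print(resolve_max_features("sqrt", 16))  # 4
print(resolve_max_features("log2", 16))  # 4
print(resolve_max_features(0.5, 16))     # 8
print(resolve_max_features(3, 16))       # 3
```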
max_depth : integer or None, optional (default=None) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node:
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
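As a quick illustration of how an int or float value for min_samples_split or min_samples_leaf translates into a sample count, here is a minimal sketch. The helper name `resolve_min_samples` is hypothetical and only mirrors the rule stated above.

```python
import math


def resolve_min_samples(value, n_samples):
    """Turn an int count or a float fraction into a sample count (hypothetical helper)."""
    if isinstance(value, int):
        # An int is used directly as the minimum number of samples.
        return value
    # A float is a fraction of the training set, rounded up.
    return int(math.ceil(value * n_samples))


print(resolve_min_samples(2, 200))     # 2
print(resolve_min_samples(0.25, 200))  # 50
print(resolve_min_samples(0.05, 150))  # 8
```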
min_weight_fraction_leaf : float, optional (default=0.) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_leaf_nodes : int or None, optional (default=None)
Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None, then unlimited number of leaf nodes.
min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following::
    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
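A worked instance of the formula may help. The function below is a direct transcription of the equation using the symbols defined above; the function name and the sample numbers are made up for illustration.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Evaluate the weighted impurity decrease of a candidate split."""
    return N_t / N * (impurity
                      - N_t_R / N_t * right_impurity
                      - N_t_L / N_t * left_impurity)


# A node holding 40 of 100 samples, split into children of 10 and 30 samples.
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=2.0, left_impurity=1.0, right_impurity=1.0)
print(decrease)  # 0.4

# The split is accepted only if the decrease reaches the threshold.
min_impurity_decrease = 0.1
print(decrease >= min_impurity_decrease)  # True
```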
bootstrap : boolean, optional (default=False) Whether bootstrap samples are used when building trees.
oob_score : bool, optional (default=False) Whether to use out-of-bag samples to estimate the R^2 on unseen data.
n_jobs : integer, optional (default=1)
The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
verbose : int, optional (default=0) Controls the verbosity of the tree building process.
warm_start : bool, optional (default=False)
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new forest.
Attributes
estimators_ : list of DecisionTreeRegressor The collection of fitted sub-estimators.
feature_importances_ : array of shape = [n_features] The feature importances (the higher, the more important the feature).
n_features_ : int The number of features when fit is performed.
n_outputs_ : int The number of outputs when fit is performed.
oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.
oob_prediction_ : array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
References
.. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
class ExtraTreesRegressor(_sk_ExtraTreesRegressor):
    """
    ExtraTreesRegressor that supports conditional standard deviation.

    Parameters
    ----------
    n_estimators : integer, optional (default=10)
        The number of trees in the forest.

    criterion : string, optional (default="mse")
        The function to measure the quality of a split. Supported criteria
        are "mse" for the mean squared error, which is equal to variance
        reduction as feature selection criterion, and "mae" for the mean
        absolute error.

    max_features : int, float, string or None, optional (default="auto")
        The number of features to consider when looking for the best split:
        - If int, then consider `max_features` features at each split.
        - If float, then `max_features` is a percentage and
          `int(max_features * n_features)` features are considered at each
          split.
        - If "auto", then `max_features=n_features`.
        - If "sqrt", then `max_features=sqrt(n_features)`.
        - If "log2", then `max_features=log2(n_features)`.
        - If None, then `max_features=n_features`.
        Note: the search for a split does not stop until at least one
        valid partition of the node samples is found, even if it requires to
        effectively inspect more than ``max_features`` features.

    max_depth : integer or None, optional (default=None)
        The maximum depth of the tree. If None, then nodes are expanded until
        all leaves are pure or until all leaves contain less than
        min_samples_split samples.

    min_samples_split : int, float, optional (default=2)
        The minimum number of samples required to split an internal node:
        - If int, then consider `min_samples_split` as the minimum number.
        - If float, then `min_samples_split` is a percentage and
          `ceil(min_samples_split * n_samples)` are the minimum
          number of samples for each split.

    min_samples_leaf : int, float, optional (default=1)
        The minimum number of samples required to be at a leaf node:
        - If int, then consider `min_samples_leaf` as the minimum number.
        - If float, then `min_samples_leaf` is a percentage and
          `ceil(min_samples_leaf * n_samples)` are the minimum
          number of samples for each node.

    min_weight_fraction_leaf : float, optional (default=0.)
        The minimum weighted fraction of the sum total of weights (of all
        the input samples) required to be at a leaf node. Samples have
        equal weight when sample_weight is not provided.

    max_leaf_nodes : int or None, optional (default=None)
        Grow trees with ``max_leaf_nodes`` in best-first fashion.
        Best nodes are defined as relative reduction in impurity.
        If None then unlimited number of leaf nodes.

    min_impurity_decrease : float, optional (default=0.)
        A node will be split if this split induces a decrease of the impurity
        greater than or equal to this value.
        The weighted impurity decrease equation is the following::

            N_t / N * (impurity - N_t_R / N_t * right_impurity
                                - N_t_L / N_t * left_impurity)

        where ``N`` is the total number of samples, ``N_t`` is the number of
        samples at the current node, ``N_t_L`` is the number of samples in the
        left child, and ``N_t_R`` is the number of samples in the right child.
        ``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
        if ``sample_weight`` is passed.

    bootstrap : boolean, optional (default=False)
        Whether bootstrap samples are used when building trees.

    oob_score : bool, optional (default=False)
        whether to use out-of-bag samples to estimate
        the R^2 on unseen data.

    n_jobs : integer, optional (default=1)
        The number of jobs to run in parallel for both `fit` and `predict`.
        If -1, then the number of jobs is set to the number of cores.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

    verbose : int, optional (default=0)
        Controls the verbosity of the tree building process.

    warm_start : bool, optional (default=False)
        When set to ``True``, reuse the solution of the previous call to fit
        and add more estimators to the ensemble, otherwise, just fit a whole
        new forest.

    Attributes
    ----------
    estimators_ : list of DecisionTreeRegressor
        The collection of fitted sub-estimators.

    feature_importances_ : array of shape = [n_features]
        The feature importances (the higher, the more important the feature).

    n_features_ : int
        The number of features when ``fit`` is performed.

    n_outputs_ : int
        The number of outputs when ``fit`` is performed.

    oob_score_ : float
        Score of the training dataset obtained using an out-of-bag estimate.

    oob_prediction_ : array of shape = [n_samples]
        Prediction computed with out-of-bag estimate on the training set.

    Notes
    -----
    The default values for the parameters controlling the size of the trees
    (e.g. ``max_depth``, ``min_samples_leaf``, etc.) lead to fully grown and
    unpruned trees which can potentially be very large on some data sets. To
    reduce memory consumption, the complexity and size of the trees should be
    controlled by setting those parameter values.
    The features are always randomly permuted at each split. Therefore,
    the best found split may vary, even with the same training data,
    ``max_features=n_features`` and ``bootstrap=False``, if the improvement
    of the criterion is identical for several splits enumerated during the
    search of the best split. To obtain a deterministic behaviour during
    fitting, ``random_state`` has to be fixed.

    References
    ----------
    .. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
    """
    def __init__(self, n_estimators=10, criterion='mse', max_depth=None,
                 min_samples_split=2, min_samples_leaf=1,
                 min_weight_fraction_leaf=0.0, max_features='auto',
                 max_leaf_nodes=None, bootstrap=False, oob_score=False,
                 n_jobs=1, random_state=None, verbose=0, warm_start=False,
                 min_variance=0.0):
        self.min_variance = min_variance
        super(ExtraTreesRegressor, self).__init__(
            n_estimators=n_estimators, criterion=criterion,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            min_weight_fraction_leaf=min_weight_fraction_leaf,
            max_features=max_features, max_leaf_nodes=max_leaf_nodes,
            bootstrap=bootstrap, oob_score=oob_score,
            n_jobs=n_jobs, random_state=random_state,
            verbose=verbose, warm_start=warm_start)

    def predict(self, X, return_std=False):
        """
        Predict continuous output for X.

        Parameters
        ----------
        X : array-like of shape=(n_samples, n_features)
            Input data.

        return_std : boolean
            Whether or not to return the standard deviation.

        Returns
        -------
        predictions : array-like of shape=(n_samples,)
            Predicted values for X. If criterion is set to "mse",
            then `predictions[i] ~= mean(y | X[i])`.

        std : array-like of shape=(n_samples,)
            Standard deviation of `y` at `X`. If criterion
            is set to "mse", then `std[i] ~= std(y | X[i])`.
        """
        mean = super(ExtraTreesRegressor, self).predict(X)

        if return_std:
            if self.criterion != "mse":
                raise ValueError(
                    "Expected impurity to be 'mse', got %s instead"
                    % self.criterion)
            std = _return_std(X, self.estimators_, mean, self.min_variance)
            return mean, std

        return mean
Ancestors (in MRO)
 ExtraTreesRegressor
 sklearn.ensemble.forest.ExtraTreesRegressor
 sklearn.ensemble.forest.ForestRegressor
 abc.NewBase
 sklearn.ensemble.forest.BaseForest
 abc.NewBase
 sklearn.ensemble.base.BaseEnsemble
 abc.NewBase
 sklearn.base.BaseEstimator
 sklearn.base.MetaEstimatorMixin
 sklearn.base.RegressorMixin
 builtins.object
Static methods
def __init__(
self, n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, min_variance=0.0)
Initialize self. See help(type(self)) for accurate signature.
def predict(
self, X, return_std=False)
Predict continuous output for X.
Parameters
X : array-like of shape=(n_samples, n_features) Input data.
return_std : boolean Whether or not to return the standard deviation.
Returns
predictions : array-like of shape=(n_samples,) Predicted values for X. If criterion is set to "mse", then predictions[i] ~= mean(y | X[i]).
std : array-like of shape=(n_samples,) Standard deviation of y at X. If criterion is set to "mse", then std[i] ~= std(y | X[i]). Only returned when return_std is True; a ValueError is raised if criterion is not "mse".
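The helper that computes the standard deviation (`_return_std`) is not shown in this section. One plausible way to obtain std(y | X[i]) from a forest is the law of total variance: average each tree's leaf variance plus its squared prediction, then subtract the squared ensemble mean. The sketch below does this for a single query point in plain Python. The function name, its inputs, and the way `min_variance` is applied as a floor on each tree's variance are assumptions made for illustration.

```python
import math


def forest_mean_and_std(tree_means, tree_variances, min_variance=0.0):
    """Combine per-tree leaf means and variances for one query point.

    Hypothetical helper: var = E[var_tree + mean_tree^2] - mean^2.
    """
    n_trees = len(tree_means)
    mean = sum(tree_means) / n_trees

    second_moment = 0.0
    for tree_mean, tree_var in zip(tree_means, tree_variances):
        # Floor each tree's variance (assumed role of min_variance).
        tree_var = max(tree_var, min_variance)
        second_moment += tree_var + tree_mean ** 2

    variance = second_moment / n_trees - mean ** 2
    # Guard against tiny negative values caused by rounding.
    return mean, math.sqrt(max(variance, 0.0))


mean, std = forest_mean_and_std([1.0, 3.0], [0.5, 0.5])
print(mean)  # 2.0
print(std)   # sqrt(1.5), about 1.2247
```

Trees that disagree on the mean and trees whose leaves are impure both increase the reported standard deviation, which is what model-based optimization needs from a surrogate.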
Instance variables
var feature_importances_
Return the feature importances (the higher, the more important the feature).
Returns
feature_importances_ : array, shape = [n_features]
var min_variance
class GaussianProcessRegressor
GaussianProcessRegressor that allows noise tunability.
The implementation is based on Algorithm 2.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.
In addition to the standard scikit-learn estimator API, GaussianProcessRegressor:
- allows prediction without prior fitting (based on the GP prior);
- provides an additional method sample_y(X), which evaluates samples drawn from the GPR (prior or posterior) at given inputs;
- exposes a method log_marginal_likelihood(theta), which can be used externally for other ways of selecting hyperparameters, e.g., via Markov chain Monte Carlo.
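For orientation, the quantities computed by Algorithm 2.1 of GPML can be summarized as follows. The notation is GPML's rather than this module's: K is the kernel matrix on the training inputs, \sigma_n^2 the noise variance, \mathbf{k}_* the vector of kernel values between a query point \mathbf{x}_* and the training inputs, and n the number of training points. The attributes L_ and alpha_ documented below appear to correspond to L and \boldsymbol{\alpha}.

```latex
\begin{aligned}
L &= \operatorname{cholesky}\!\left(K + \sigma_n^2 I\right), \\
\boldsymbol{\alpha} &= L^{\top} \backslash \left(L \backslash \mathbf{y}\right), \\
\bar{f}_* &= \mathbf{k}_*^{\top} \boldsymbol{\alpha}, \\
\mathbf{v} &= L \backslash \mathbf{k}_*, \\
\mathbb{V}[f_*] &= k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{v}^{\top} \mathbf{v}, \\
\log p(\mathbf{y} \mid X) &= -\tfrac{1}{2}\, \mathbf{y}^{\top} \boldsymbol{\alpha}
  - \sum_i \log L_{ii} - \tfrac{n}{2} \log 2\pi .
\end{aligned}
```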
Parameters

kernel
[kernel object]: The kernel specifying the covariance function of the GP. If None is passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that the kernel's hyperparameters are optimized during fitting. 
alpha
[float or array-like, optional (default: 1e-10)]: Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to increased noise level in the observations and reduce potential numerical issues during fitting. If an array is passed, it must have the same number of entries as the data used for fitting and is used as datapoint-dependent noise level. Note that this is equivalent to adding a WhiteKernel with c=alpha. Allowing the noise level to be specified directly as a parameter is mainly for convenience and for consistency with Ridge. 
optimizer
[string or callable, optional (default: "fmin_l_bfgs_b")]: Can either be one of the internally supported optimizers for optimizing the kernel's parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature::
    def optimizer(obj_func, initial_theta, bounds):
        # * 'obj_func' is the objective function to be minimized, which
        #   takes the hyperparameters theta as parameter and an
        #   optional flag eval_gradient, which determines if the
        #   gradient is returned additionally to the function value
        # * 'initial_theta': the initial value for theta, which can be
        #   used by local optimizers
        # * 'bounds': the bounds on the values of theta
        ....
        # Returned are the best found hyperparameters theta and
        # the corresponding value of the target function.
        return theta_opt, func_min
By default, the 'fmin_l_bfgs_b' algorithm from scipy.optimize is used. If None is passed, the kernel's parameters are kept fixed. Available internal optimizers are::
    'fmin_l_bfgs_b'
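To make the callable contract concrete, here is a minimal sketch of a custom optimizer with the required signature. It is a plain random search, chosen only because it needs nothing beyond the standard library. The function name, the extra keyword arguments `n_samples` and `seed`, and the toy objective are assumptions for illustration. The sketch treats `obj_func` as a quantity to minimize and uses plain lists, whereas scikit-learn passes NumPy arrays.

```python
import random


def random_search_optimizer(obj_func, initial_theta, bounds,
                            n_samples=200, seed=0):
    """Hypothetical optimizer: sample theta uniformly within the bounds."""
    rng = random.Random(seed)
    best_theta = list(initial_theta)
    best_value = obj_func(best_theta, eval_gradient=False)

    for _ in range(n_samples):
        candidate = [rng.uniform(low, high) for low, high in bounds]
        value = obj_func(candidate, eval_gradient=False)
        if value < best_value:
            best_theta, best_value = candidate, value

    # Same return contract as in the signature above.
    return best_theta, best_value


def toy_objective(theta, eval_gradient=False):
    """Stand-in for the real objective; its minimum is at theta = [1.0]."""
    return (theta[0] - 1.0) ** 2


theta_opt, func_min = random_search_optimizer(
    toy_objective, initial_theta=[4.0], bounds=[(-5.0, 5.0)])
print(theta_opt, func_min)
```

A gradient-free optimizer like this one simply ignores the gradient by always calling `obj_func` with `eval_gradient=False`.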

n_restarts_optimizer
[int, optional (default: 0)]: The number of restarts of the optimizer for finding the kernel's parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel's initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer == 0 implies that one run is performed. 
normalize_y
[boolean, optional (default: False)]: Whether the target values y are normalized, i.e., the mean of the observed target values becomes zero. This parameter should be set to True if the target values' mean is expected to differ considerably from zero. When enabled, the normalization effectively modifies the GP's prior based on the data, which contradicts the likelihood principle; normalization is thus disabled by default. 
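The effect of normalize_y can be illustrated without a GP at all: the targets are centered before fitting, and the stored mean is added back to every prediction. The helper names below are hypothetical; this is a sketch of the idea, not the library's code.

```python
def center_targets(y):
    """Subtract the mean of the observed targets before fitting."""
    y_mean = sum(y) / len(y)
    return [value - y_mean for value in y], y_mean


def undo_centering(predicted, y_mean):
    """Add the stored training mean back onto predictions."""
    return [value + y_mean for value in predicted]


y = [10.0, 12.0, 14.0]
y_centered, y_mean = center_targets(y)
print(y_mean)      # 12.0
print(y_centered)  # [-2.0, 0.0, 2.0]

# Far from the data a zero-mean GP predicts 0 on the centered scale,
# which maps back to the training mean on the original scale.
print(undo_centering([0.0], y_mean))  # [12.0]
```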
copy_X_train
[bool, optional (default: True)]: If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally. 
random_state
[integer or numpy.RandomState, optional]: The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator. 
noise
[string, "gaussian", optional]: If set to "gaussian", then it is assumed that y is a noisy estimate of f(x) where the noise is gaussian.
Attributes

X_train_
[array-like, shape = (n_samples, n_features)]: Feature values in training data (also required for prediction) 
y_train_
[array-like, shape = (n_samples, [n_output_dims])]: Target values in training data (also required for prediction) 
kernel_
[kernel object]: The kernel used for prediction. The structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters 
L_
[array-like, shape = (n_samples, n_samples)]: Lower-triangular Cholesky decomposition of the kernel in X_train_

alpha_
[array-like, shape = (n_samples,)]: Dual coefficients of training data points in kernel space 
log_marginal_likelihood_value_
[float]: The log-marginal-likelihood of self.kernel_.theta

noise_
[float]: Estimate of the gaussian noise. Useful only when noise is set to "gaussian".
class GaussianProcessRegressor(sk_GaussianProcessRegressor):
    """
    GaussianProcessRegressor that allows noise tunability.

    The implementation is based on Algorithm 2.1 of Gaussian Processes
    for Machine Learning (GPML) by Rasmussen and Williams.

    In addition to standard scikit-learn estimator API,
    GaussianProcessRegressor:

       * allows prediction without prior fitting (based on the GP prior);
       * provides an additional method sample_y(X), which evaluates samples
         drawn from the GPR (prior or posterior) at given inputs;
       * exposes a method log_marginal_likelihood(theta), which can be used
         externally for other ways of selecting hyperparameters, e.g., via
         Markov chain Monte Carlo.

    Parameters
    ----------
    * `kernel` [kernel object]:
        The kernel specifying the covariance function of the GP. If None is
        passed, the kernel "1.0 * RBF(1.0)" is used as default. Note that
        the kernel's hyperparameters are optimized during fitting.

    * `alpha` [float or array-like, optional (default: 1e-10)]:
        Value added to the diagonal of the kernel matrix during fitting.
        Larger values correspond to increased noise level in the observations
        and reduce potential numerical issues during fitting. If an array is
        passed, it must have the same number of entries as the data used for
        fitting and is used as datapoint-dependent noise level. Note that this
        is equivalent to adding a WhiteKernel with c=alpha. Allowing to specify
        the noise level directly as a parameter is mainly for convenience and
        for consistency with Ridge.

    * `optimizer` [string or callable, optional (default: "fmin_l_bfgs_b")]:
        Can either be one of the internally supported optimizers for optimizing
        the kernel's parameters, specified by a string, or an externally
        defined optimizer passed as a callable. If a callable is passed, it
        must have the signature::

            def optimizer(obj_func, initial_theta, bounds):
                # * 'obj_func' is the objective function to be minimized, which
                #   takes the hyperparameters theta as parameter and an
                #   optional flag eval_gradient, which determines if the
                #   gradient is returned additionally to the function value
                # * 'initial_theta': the initial value for theta, which can be
                #   used by local optimizers
                # * 'bounds': the bounds on the values of theta
                ....
                # Returned are the best found hyperparameters theta and
                # the corresponding value of the target function.
                return theta_opt, func_min

        Per default, the 'fmin_l_bfgs_b' algorithm from scipy.optimize
        is used. If None is passed, the kernel's parameters are kept fixed.
        Available internal optimizers are::

            'fmin_l_bfgs_b'

    * `n_restarts_optimizer` [int, optional (default: 0)]:
        The number of restarts of the optimizer for finding the kernel's
        parameters which maximize the log-marginal likelihood. The first run
        of the optimizer is performed from the kernel's initial parameters,
        the remaining ones (if any) from thetas sampled log-uniform randomly
        from the space of allowed theta-values. If greater than 0, all bounds
        must be finite. Note that n_restarts_optimizer == 0 implies that one
        run is performed.

    * `normalize_y` [boolean, optional (default: False)]:
        Whether the target values y are normalized, i.e., the mean of the
        observed target values becomes zero. This parameter should be set to
        True if the target values' mean is expected to differ considerably from
        zero. When enabled, the normalization effectively modifies the GP's
        prior based on the data, which contradicts the likelihood principle;
        normalization is thus disabled per default.

    * `copy_X_train` [bool, optional (default: True)]:
        If True, a persistent copy of the training data is stored in the
        object. Otherwise, just a reference to the training data is stored,
        which might cause predictions to change if the data is modified
        externally.

    * `random_state` [integer or numpy.RandomState, optional]:
        The generator used to initialize the centers. If an integer is
        given, it fixes the seed. Defaults to the global numpy random
        number generator.

    * `noise` [string, "gaussian", optional]:
        If set to "gaussian", then it is assumed that `y` is a noisy
        estimate of `f(x)` where the noise is gaussian.

    Attributes
    ----------
    * `X_train_` [array-like, shape = (n_samples, n_features)]:
        Feature values in training data (also required for prediction)

    * `y_train_` [array-like, shape = (n_samples, [n_output_dims])]:
        Target values in training data (also required for prediction)

    * `kernel_` [kernel object]:
        The kernel used for prediction. The structure of the kernel is the
        same as the one passed as parameter but with optimized hyperparameters

    * `L_` [array-like, shape = (n_samples, n_samples)]:
        Lower-triangular Cholesky decomposition of the kernel in ``X_train_``

    * `alpha_` [array-like, shape = (n_samples,)]:
        Dual coefficients of training data points in kernel space

    * `log_marginal_likelihood_value_` [float]:
        The log-marginal-likelihood of ``self.kernel_.theta``

    * `noise_` [float]:
        Estimate of the gaussian noise. Useful only when noise is set to
        "gaussian".
    """
    def __init__(self, kernel=None, alpha=1e-10,
                 optimizer="fmin_l_bfgs_b", n_restarts_optimizer=0,
                 normalize_y=False, copy_X_train=True, random_state=None,
                 noise=None):
        self.noise = noise
        super(GaussianProcessRegressor, self).__init__(
            kernel=kernel, alpha=alpha, optimizer=optimizer,
            n_restarts_optimizer=n_restarts_optimizer,
            normalize_y=normalize_y, copy_X_train=copy_X_train,
            random_state=random_state)

    def fit(self, X, y):
        """Fit Gaussian process regression model.

        Parameters
        ----------
        * `X` [array-like, shape = (n_samples, n_features)]:
            Training data

        * `y` [array-like, shape = (n_samples, [n_output_dims])]:
            Target values

        Returns
        -------
        * `self`:
            Returns an instance of self.
        """
        if isinstance(self.noise, str) and self.noise != "gaussian":
            raise ValueError("expected noise to be 'gaussian', got %s"
                             % self.noise)

        if self.kernel is None:
            self.kernel = ConstantKernel(1.0, constant_value_bounds="fixed") \
                * RBF(1.0, length_scale_bounds="fixed")
        if self.noise == "gaussian":
            self.kernel = self.kernel + WhiteKernel()
        elif self.noise:
            self.kernel = self.kernel + WhiteKernel(
                noise_level=self.noise, noise_level_bounds="fixed"
            )
        super(GaussianProcessRegressor, self).fit(X, y)

        self.noise_ = None

        if self.noise:
            # The noise component of this kernel should be set to zero
            # while estimating K(X_test, X_test)
            # Note that the term K(X, X) should include the noise but
            # this (K(X, X))^-1y is precomputed as the attribute `alpha_`.
            # (Notice the underscore).
            # This has been described in Eq 2.24 of
            # http://www.gaussianprocess.org/gpml/chapters/RW2.pdf
            # Hence this hack
            if isinstance(self.kernel_, WhiteKernel):
                self.kernel_.set_params(noise_level=0.0)

            else:
                white_present, white_param = _param_for_white_kernel_in_Sum(
                    self.kernel_)

                # This should always be true. Just in case.
                if white_present:
                    noise_kernel = self.kernel_.get_params()[white_param]
                    self.noise_ = noise_kernel.noise_level
                    self.kernel_.set_params(
                        **{white_param: WhiteKernel(noise_level=0.0)})

        # Precompute arrays needed at prediction
        L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
        self.K_inv_ = L_inv.dot(L_inv.T)

        # Fix deprecation warning #462
        if int(sklearn.__version__[2:4]) >= 19:
            self.y_train_mean_ = self._y_train_mean
        else:
            self.y_train_mean_ = self.y_train_mean

        return self

    def predict(self, X, return_std=False, return_cov=False,
                return_mean_grad=False, return_std_grad=False):
        """
        Predict output for X.

        In addition to the mean of the predictive distribution, also its
        standard deviation (return_std=True) or covariance (return_cov=True),
        the gradient of the mean and the standard-deviation with respect to X
        can be optionally provided.

        Parameters
        ----------
        * `X` [array-like, shape = (n_samples, n_features)]:
            Query points where the GP is evaluated.

        * `return_std` [bool, default: False]:
            If True, the standard-deviation of the predictive distribution at
            the query points is returned along with the mean.

        * `return_cov` [bool, default: False]:
            If True, the covariance of the joint predictive distribution at
            the query points is returned along with the mean.

        * `return_mean_grad` [bool, default: False]:
            Whether or not to return the gradient of the mean.
            Only valid when X is a single point.

        * `return_std_grad` [bool, default: False]:
            Whether or not to return the gradient of the std.
            Only valid when X is a single point.

        Returns
        -------
        * `y_mean` [array, shape = (n_samples, [n_output_dims])]:
            Mean of predictive distribution at query points

        * `y_std` [array, shape = (n_samples,), optional]:
            Standard deviation of predictive distribution at query points.
            Only returned when return_std is True.

        * `y_cov` [array, shape = (n_samples, n_samples), optional]:
            Covariance of joint predictive distribution at query points.
            Only returned when return_cov is True.

        * `y_mean_grad` [shape = (n_samples, n_features)]:
            The gradient of the predicted mean

        * `y_std_grad` [shape = (n_samples, n_features)]:
            The gradient of the predicted std.
        """
        if return_std and return_cov:
            raise RuntimeError(
                "Not returning standard deviation of predictions when "
                "returning full covariance.")

        if return_std_grad and not return_std:
            raise ValueError(
                "Not returning std_gradient without returning "
                "the std.")

        X = check_array(X)

        if X.shape[0] != 1 and (return_mean_grad or return_std_grad):
            raise ValueError("Not implemented for n_samples > 1")

        if not hasattr(self, "X_train_"):  # Not fit; predict based on GP prior
            y_mean = np.zeros(X.shape[0])
            if return_cov:
                y_cov = self.kernel(X)
                return y_mean, y_cov
            elif return_std:
                y_var = self.kernel.diag(X)
                return y_mean, np.sqrt(y_var)
            else:
                return y_mean

        else:  # Predict based on GP posterior
            K_trans = self.kernel_(X, self.X_train_)
            y_mean = K_trans.dot(self.alpha_)    # Line 4 (y_mean = f_star)
            y_mean = self.y_train_mean_ + y_mean  # undo normal.

            if return_cov:
                v = cho_solve((self.L_, True), K_trans.T)  # Line 5
                y_cov = self.kernel_(X) - K_trans.dot(v)   # Line 6
                return y_mean, y_cov

            elif return_std:
                K_inv = self.K_inv_

                # Compute variance of predictive distribution
                y_var = self.kernel_.diag(X)
                y_var -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)

                # Check if any of the variances is negative because of
                # numerical issues. If yes: set the variance to 0.
                y_var_negative = y_var < 0
                if np.any(y_var_negative):
                    warnings.warn("Predicted variances smaller than 0. "
                                  "Setting those variances to 0.")
                    y_var[y_var_negative] = 0.0
                y_std = np.sqrt(y_var)

            if return_mean_grad:
                grad = self.kernel_.gradient_x(X[0], self.X_train_)
                grad_mean = np.dot(grad.T, self.alpha_)

                if return_std_grad:
                    grad_std = np.zeros(X.shape[1])
                    if not np.allclose(y_std, grad_std):
                        grad_std = -np.dot(K_trans,
                                           np.dot(K_inv, grad))[0] / y_std
                    return y_mean, y_std, grad_mean, grad_std

                if return_std:
                    return y_mean, y_std, grad_mean
                else:
                    return y_mean, grad_mean

            else:
                if return_std:
                    return y_mean, y_std
                else:
                    return y_mean
Ancestors (in MRO)
 GaussianProcessRegressor
 sklearn.gaussian_process.gpr.GaussianProcessRegressor
 sklearn.base.BaseEstimator
 sklearn.base.RegressorMixin
 builtins.object
Static methods
def __init__(
self, kernel=None, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None, noise=None)
Initialize self. See help(type(self)) for accurate signature.
def fit(
self, X, y)
Fit Gaussian process regression model.
Parameters

X
[array-like, shape = (n_samples, n_features)]: Training data 
y
[array-like, shape = (n_samples, [n_output_dims])]: Target values
Returns
self: Returns an instance of self.
def fit(self, X, y):
    """Fit Gaussian process regression model.

    Parameters
    ----------
    * `X` [array-like, shape = (n_samples, n_features)]:
        Training data

    * `y` [array-like, shape = (n_samples, [n_output_dims])]:
        Target values

    Returns
    -------
    * `self`:
        Returns an instance of self.
    """
    if isinstance(self.noise, str) and self.noise != "gaussian":
        raise ValueError("expected noise to be 'gaussian', got %s"
                         % self.noise)

    if self.kernel is None:
        self.kernel = ConstantKernel(1.0, constant_value_bounds="fixed") \
                      * RBF(1.0, length_scale_bounds="fixed")
    if self.noise == "gaussian":
        self.kernel = self.kernel + WhiteKernel()
    elif self.noise:
        self.kernel = self.kernel + WhiteKernel(
            noise_level=self.noise, noise_level_bounds="fixed"
        )
    super(GaussianProcessRegressor, self).fit(X, y)

    self.noise_ = None

    if self.noise:
        # The noise component of this kernel should be set to zero
        # while estimating K(X_test, X_test)
        # Note that the term K(X, X) should include the noise but
        # this (K(X, X))^-1 y is precomputed as the attribute `alpha_`.
        # (Notice the underscore).
        # This has been described in Eq 2.24 of
        # http://www.gaussianprocess.org/gpml/chapters/RW2.pdf
        # Hence this hack
        if isinstance(self.kernel_, WhiteKernel):
            self.kernel_.set_params(noise_level=0.0)
        else:
            white_present, white_param = _param_for_white_kernel_in_Sum(
                self.kernel_)

            # This should always be true. Just in case.
            if white_present:
                noise_kernel = self.kernel_.get_params()[white_param]
                self.noise_ = noise_kernel.noise_level
                self.kernel_.set_params(
                    **{white_param: WhiteKernel(noise_level=0.0)})

    # Precompute arrays needed at prediction
    L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
    self.K_inv_ = L_inv.dot(L_inv.T)

    # Fix deprecation warning #462
    if int(sklearn.__version__[2:4]) >= 19:
        self.y_train_mean_ = self._y_train_mean
    else:
        self.y_train_mean_ = self.y_train_mean

    return self
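The precomputation of K_inv_ at the end of fit can be checked in isolation. The sketch below is illustrative only: the matrix K is a made-up positive-definite stand-in for the kernel matrix, and np.linalg.solve is used in place of the solve_triangular call in the source so that the example needs nothing beyond NumPy.

```python
import numpy as np

# Hypothetical 3x3 symmetric positive-definite matrix standing in for K(X, X).
K = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.5, 0.3],
              [0.1, 0.3, 1.0]])

# Lower-triangular Cholesky factor with K = L L^T (the role played by `L_`).
L = np.linalg.cholesky(K)

# Solve L^T Z = I, so Z = L^{-T}. The source names this result `L_inv`.
L_inv = np.linalg.solve(L.T, np.eye(L.shape[0]))

# L^{-T} L^{-1} = (L L^T)^{-1} = K^{-1}
K_inv = L_inv.dot(L_inv.T)
```

Storing the explicit inverse once lets predict evaluate variances and their gradients with plain matrix products instead of solving a new linear system for every query.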
def predict(
self, X, return_std=False, return_cov=False, return_mean_grad=False, return_std_grad=False)
Predict output for X.
In addition to the mean of the predictive distribution, its standard deviation (return_std=True) or covariance (return_cov=True), as well as the gradients of the mean and of the standard deviation with respect to X, can optionally be returned.
Parameters

X
[array-like, shape = (n_samples, n_features)]: Query points where the GP is evaluated.
return_std
[bool, default: False]: If True, the standard deviation of the predictive distribution at the query points is returned along with the mean.
return_cov
[bool, default: False]: If True, the covariance of the joint predictive distribution at the query points is returned along with the mean. 
return_mean_grad
[bool, default: False]: Whether or not to return the gradient of the mean. Only valid when X is a single point. 
return_std_grad
[bool, default: False]: Whether or not to return the gradient of the std. Only valid when X is a single point.
Returns

y_mean
[array, shape = (n_samples, [n_output_dims])]: Mean of predictive distribution at query points.
y_std
[array, shape = (n_samples,), optional]: Standard deviation of predictive distribution at query points. Only returned when return_std is True.
y_cov
[array, shape = (n_samples, n_samples), optional]: Covariance of joint predictive distribution at query points. Only returned when return_cov is True.
y_mean_grad
[shape = (n_samples, n_features)]: The gradient of the predicted mean. Only returned when return_mean_grad is True.
y_std_grad
[shape = (n_samples, n_features)]: The gradient of the predicted std. Only returned when return_std_grad is True.
def predict(self, X, return_std=False, return_cov=False,
            return_mean_grad=False, return_std_grad=False):
    """
    Predict output for X.

    In addition to the mean of the predictive distribution, also its
    standard deviation (return_std=True) or covariance (return_cov=True),
    the gradient of the mean and the standard-deviation with respect to X
    can be optionally provided.

    Parameters
    ----------
    * `X` [array-like, shape = (n_samples, n_features)]:
        Query points where the GP is evaluated.

    * `return_std` [bool, default: False]:
        If True, the standard-deviation of the predictive distribution at
        the query points is returned along with the mean.

    * `return_cov` [bool, default: False]:
        If True, the covariance of the joint predictive distribution at
        the query points is returned along with the mean.

    * `return_mean_grad` [bool, default: False]:
        Whether or not to return the gradient of the mean.
        Only valid when X is a single point.

    * `return_std_grad` [bool, default: False]:
        Whether or not to return the gradient of the std.
        Only valid when X is a single point.

    Returns
    -------
    * `y_mean` [array, shape = (n_samples, [n_output_dims])]:
        Mean of predictive distribution at query points

    * `y_std` [array, shape = (n_samples,), optional]:
        Standard deviation of predictive distribution at query points.
        Only returned when return_std is True.

    * `y_cov` [array, shape = (n_samples, n_samples), optional]:
        Covariance of joint predictive distribution at query points.
        Only returned when return_cov is True.

    * `y_mean_grad` [shape = (n_samples, n_features)]:
        The gradient of the predicted mean

    * `y_std_grad` [shape = (n_samples, n_features)]:
        The gradient of the predicted std.
    """
    if return_std and return_cov:
        raise RuntimeError(
            "Not returning standard deviation of predictions when "
            "returning full covariance.")

    if return_std_grad and not return_std:
        raise ValueError(
            "Not returning std_gradient without returning "
            "the std.")

    X = check_array(X)
    if X.shape[0] != 1 and (return_mean_grad or return_std_grad):
        raise ValueError("Not implemented for n_samples > 1")

    if not hasattr(self, "X_train_"):  # Not fit; predict based on GP prior
        y_mean = np.zeros(X.shape[0])
        if return_cov:
            y_cov = self.kernel(X)
            return y_mean, y_cov
        elif return_std:
            y_var = self.kernel.diag(X)
            return y_mean, np.sqrt(y_var)
        else:
            return y_mean

    else:  # Predict based on GP posterior
        K_trans = self.kernel_(X, self.X_train_)
        y_mean = K_trans.dot(self.alpha_)    # Line 4 (y_mean = f_star)
        y_mean = self.y_train_mean_ + y_mean  # undo normal.

        if return_cov:
            v = cho_solve((self.L_, True), K_trans.T)  # Line 5
            y_cov = self.kernel_(X) - K_trans.dot(v)   # Line 6
            return y_mean, y_cov

        elif return_std:
            K_inv = self.K_inv_

            # Compute variance of predictive distribution
            y_var = self.kernel_.diag(X)
            y_var -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)

            # Check if any of the variances is negative because of
            # numerical issues. If yes: set the variance to 0.
            y_var_negative = y_var < 0
            if np.any(y_var_negative):
                warnings.warn("Predicted variances smaller than 0. "
                              "Setting those variances to 0.")
                y_var[y_var_negative] = 0.0
            y_std = np.sqrt(y_var)

        if return_mean_grad:
            grad = self.kernel_.gradient_x(X[0], self.X_train_)
            grad_mean = np.dot(grad.T, self.alpha_)

            if return_std_grad:
                grad_std = np.zeros(X.shape[1])
                if not np.allclose(y_std, grad_std):
                    grad_std = -np.dot(K_trans,
                                       np.dot(K_inv, grad))[0] / y_std
                return y_mean, y_std, grad_mean, grad_std

            if return_std:
                return y_mean, y_std, grad_mean
            else:
                return y_mean, grad_mean

        else:
            if return_std:
                return y_mean, y_std
            else:
                return y_mean
Instance variables
var noise
var rng
DEPRECATED: Attribute rng was deprecated in version 0.19 and will be removed in 0.21.
var y_train_mean
DEPRECATED: Attribute y_train_mean was deprecated in version 0.19 and will be removed in 0.21.
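The return_std branch of GaussianProcessRegressor.predict computes the posterior variance of each query point with a single einsum call rather than building the full covariance matrix. The following is a minimal sketch of that identity, assuming a hand-written RBF kernel and made-up data; the helper rbf and the arrays X_train and X_query are hypothetical and not part of skopt.

```python
import numpy as np

def rbf(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

# Made-up training and query inputs (one feature each).
X_train = np.array([[0.0], [1.0], [2.5]])
X_query = np.array([[0.5], [2.0], [4.0]])

# A small jitter on the diagonal keeps the inverse well conditioned.
K_inv = np.linalg.inv(rbf(X_train, X_train) + 1e-10 * np.eye(len(X_train)))
K_trans = rbf(X_query, X_train)  # shape (n_query, n_train)

# The prior variance k(x, x) equals 1 for this kernel.
y_var = np.ones(len(X_query))
# Subtract k_*^T K^{-1} k_* for every query point in one call.
y_var -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)

# Reference: the diagonal of the full posterior covariance.
y_cov = rbf(X_query, X_query) - K_trans.dot(K_inv).dot(K_trans.T)
```

The einsum form only ever materialises n_query values, whereas the full covariance needs an n_query by n_query matrix.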
class GradientBoostingQuantileRegressor
Predict several quantiles with one estimator.
This is a wrapper around GradientBoostingRegressor's quantile regression that allows you to predict several quantiles in one go.
Parameters

quantiles
[array-like]: Quantiles to predict. By default the 16, 50 and 84% quantiles are predicted.
base_estimator
[GradientBoostingRegressor instance or None (default)]: Quantile regressor used to make predictions. Only instances of GradientBoostingRegressor are supported. Use this to change the hyperparameters of the estimator.
n_jobs
[int, default=1]: The number of jobs to run in parallel for fit. If -1, then the number of jobs is set to the number of cores.
random_state
[int, RandomState instance, or None (default)]: Set random state to something other than None for reproducible results.
class GradientBoostingQuantileRegressor(BaseEstimator, RegressorMixin):
    """Predict several quantiles with one estimator.

    This is a wrapper around `GradientBoostingRegressor`'s quantile
    regression that allows you to predict several `quantiles` in
    one go.

    Parameters
    ----------
    * `quantiles` [array-like]:
        Quantiles to predict. By default the 16, 50 and 84%
        quantiles are predicted.

    * `base_estimator` [GradientBoostingRegressor instance or None (default)]:
        Quantile regressor used to make predictions. Only instances
        of `GradientBoostingRegressor` are supported. Use this to change
        the hyper-parameters of the estimator.

    * `n_jobs` [int, default=1]:
        The number of jobs to run in parallel for `fit`.
        If -1, then the number of jobs is set to the number of cores.

    * `random_state` [int, RandomState instance, or None (default)]:
        Set random state to something other than None for reproducible
        results.
    """

    def __init__(self, quantiles=[0.16, 0.5, 0.84], base_estimator=None,
                 n_jobs=1, random_state=None):
        self.quantiles = quantiles
        self.random_state = random_state
        self.base_estimator = base_estimator
        self.n_jobs = n_jobs

    def fit(self, X, y):
        """Fit one regressor for each quantile.

        Parameters
        ----------
        * `X` [array-like, shape=(n_samples, n_features)]:
            Training vectors, where `n_samples` is the number of samples
            and `n_features` is the number of features.

        * `y` [array-like, shape=(n_samples,)]:
            Target values (real numbers in regression)
        """
        rng = check_random_state(self.random_state)

        if self.base_estimator is None:
            base_estimator = GradientBoostingRegressor(loss='quantile')
        else:
            base_estimator = self.base_estimator

            if not isinstance(base_estimator, GradientBoostingRegressor):
                raise ValueError('base_estimator has to be of type'
                                 ' GradientBoostingRegressor.')

            if not base_estimator.loss == 'quantile':
                raise ValueError('base_estimator has to use quantile'
                                 ' loss not %s' % base_estimator.loss)

        # The predictions for different quantiles should be sorted.
        # Therefore each of the regressors need the same seed.
        base_estimator.set_params(random_state=rng)
        regressors = []
        for q in self.quantiles:
            regressor = clone(base_estimator)
            regressor.set_params(alpha=q)
            regressors.append(regressor)

        self.regressors_ = Parallel(n_jobs=self.n_jobs, backend='threading')(
            delayed(_parallel_fit)(regressor, X, y)
            for regressor in regressors)

        return self

    def predict(self, X, return_std=False, return_quantiles=False):
        """Predict.

        Predict `X` at every quantile if `return_std` is set to False.
        If `return_std` is set to True, then return the mean
        and the predicted standard deviation, which is approximated as
        the (0.84th quantile - 0.16th quantile) divided by 2.0

        Parameters
        ----------
        * `X` [array-like, shape=(n_samples, n_features)]:
            where `n_samples` is the number of samples
            and `n_features` is the number of features.
        """
        predicted_quantiles = np.asarray(
            [rgr.predict(X) for rgr in self.regressors_])
        if return_quantiles:
            return predicted_quantiles.T

        elif return_std:
            std_quantiles = [0.16, 0.5, 0.84]
            is_present_mask = np.in1d(std_quantiles, self.quantiles)
            if not np.all(is_present_mask):
                raise ValueError(
                    "return_std works only if the quantiles during "
                    "instantiation include 0.16, 0.5 and 0.84")
            low = self.regressors_[self.quantiles.index(0.16)].predict(X)
            high = self.regressors_[self.quantiles.index(0.84)].predict(X)
            mean = self.regressors_[self.quantiles.index(0.5)].predict(X)
            return mean, ((high - low) / 2.0)

        # return the mean
        return self.regressors_[self.quantiles.index(0.5)].predict(X)
Ancestors (in MRO)
 GradientBoostingQuantileRegressor
 sklearn.base.BaseEstimator
 sklearn.base.RegressorMixin
 builtins.object
Static methods
def __init__(
self, quantiles=[0.16, 0.5, 0.84], base_estimator=None, n_jobs=1, random_state=None)
Initialize self. See help(type(self)) for accurate signature.
def __init__(self, quantiles=[0.16, 0.5, 0.84], base_estimator=None,
             n_jobs=1, random_state=None):
    self.quantiles = quantiles
    self.random_state = random_state
    self.base_estimator = base_estimator
    self.n_jobs = n_jobs
def fit(
self, X, y)
Fit one regressor for each quantile.
Parameters

X
[array-like, shape=(n_samples, n_features)]: Training vectors, where n_samples is the number of samples and n_features is the number of features.
y
[array-like, shape=(n_samples,)]: Target values (real numbers in regression)
def fit(self, X, y):
    """Fit one regressor for each quantile.

    Parameters
    ----------
    * `X` [array-like, shape=(n_samples, n_features)]:
        Training vectors, where `n_samples` is the number of samples
        and `n_features` is the number of features.

    * `y` [array-like, shape=(n_samples,)]:
        Target values (real numbers in regression)
    """
    rng = check_random_state(self.random_state)

    if self.base_estimator is None:
        base_estimator = GradientBoostingRegressor(loss='quantile')
    else:
        base_estimator = self.base_estimator

        if not isinstance(base_estimator, GradientBoostingRegressor):
            raise ValueError('base_estimator has to be of type'
                             ' GradientBoostingRegressor.')

        if not base_estimator.loss == 'quantile':
            raise ValueError('base_estimator has to use quantile'
                             ' loss not %s' % base_estimator.loss)

    # The predictions for different quantiles should be sorted.
    # Therefore each of the regressors need the same seed.
    base_estimator.set_params(random_state=rng)
    regressors = []
    for q in self.quantiles:
        regressor = clone(base_estimator)
        regressor.set_params(alpha=q)
        regressors.append(regressor)

    self.regressors_ = Parallel(n_jobs=self.n_jobs, backend='threading')(
        delayed(_parallel_fit)(regressor, X, y)
        for regressor in regressors)

    return self
def predict(
self, X, return_std=False, return_quantiles=False)
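Predict.
If return_quantiles is set to True, the predictions for every quantile are returned as an array of shape (n_samples, n_quantiles).
Otherwise, if return_std is set to True, the 0.5-quantile prediction (used as the mean) is returned together with the predicted standard deviation, which is approximated as (0.84 quantile - 0.16 quantile) / 2.0. This requires 0.16, 0.5 and 0.84 to be among the quantiles passed at instantiation.
If neither flag is set, only the 0.5-quantile prediction is returned.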
Parameters
X
[array-like, shape=(n_samples, n_features)]: where n_samples is the number of samples and n_features is the number of features.
def predict(self, X, return_std=False, return_quantiles=False):
    """Predict.

    Predict `X` at every quantile if `return_std` is set to False.
    If `return_std` is set to True, then return the mean
    and the predicted standard deviation, which is approximated as
    the (0.84th quantile - 0.16th quantile) divided by 2.0

    Parameters
    ----------
    * `X` [array-like, shape=(n_samples, n_features)]:
        where `n_samples` is the number of samples
        and `n_features` is the number of features.
    """
    predicted_quantiles = np.asarray(
        [rgr.predict(X) for rgr in self.regressors_])
    if return_quantiles:
        return predicted_quantiles.T

    elif return_std:
        std_quantiles = [0.16, 0.5, 0.84]
        is_present_mask = np.in1d(std_quantiles, self.quantiles)
        if not np.all(is_present_mask):
            raise ValueError(
                "return_std works only if the quantiles during "
                "instantiation include 0.16, 0.5 and 0.84")
        low = self.regressors_[self.quantiles.index(0.16)].predict(X)
        high = self.regressors_[self.quantiles.index(0.84)].predict(X)
        mean = self.regressors_[self.quantiles.index(0.5)].predict(X)
        return mean, ((high - low) / 2.0)

    # return the mean
    return self.regressors_[self.quantiles.index(0.5)].predict(X)
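The standard deviation returned by predict is half the spread between the 0.16 and 0.84 quantiles. For a Gaussian these two quantiles lie roughly one standard deviation below and above the mean, which is why the approximation works. The sketch below checks this with the standard library only; the helper std_from_quantiles and the chosen mean and sigma are illustrative assumptions, not part of skopt.

```python
from statistics import NormalDist

def std_from_quantiles(q16, q84):
    """Approximate a standard deviation as half the 16%-84% quantile spread."""
    return (q84 - q16) / 2.0

# Hypothetical Gaussian with known parameters.
dist = NormalDist(mu=3.0, sigma=2.0)
low = dist.inv_cdf(0.16)    # about one sigma below the mean
high = dist.inv_cdf(0.84)   # about one sigma above the mean

approx_std = std_from_quantiles(low, high)
```

For target distributions that are strongly skewed or heavy-tailed, the same formula can deviate noticeably from the true standard deviation.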
Instance variables
var base_estimator
var n_jobs
var quantiles
var random_state
class RandomForestRegressor
RandomForestRegressor that supports conditional std computation.
Parameters
n_estimators : integer, optional (default=10) The number of trees in the forest.
criterion : string, optional (default="mse") The function to measure the quality of a split. Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion, and "mae" for the mean absolute error.
max_features : int, float, string or None, optional (default="auto")
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.
- If "auto", then max_features=n_features.
- If "sqrt", then max_features=sqrt(n_features).
- If "log2", then max_features=log2(n_features).
- If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
max_depth : integer or None, optional (default=None) The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
- If int, then consider min_samples_split as the minimum number.
- If float, then min_samples_split is a percentage and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node:
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a percentage and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
min_weight_fraction_leaf : float, optional (default=0.) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_leaf_nodes : int or None, optional (default=None)
Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following::
N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
bootstrap : boolean, optional (default=True) Whether bootstrap samples are used when building trees.
oob_score : bool, optional (default=False) Whether to use out-of-bag samples to estimate the R^2 on unseen data.
n_jobs : integer, optional (default=1)
The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose : int, optional (default=0) Controls the verbosity of the tree building process.
warm_start : bool, optional (default=False)
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.
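The weighted impurity decrease listed under min_impurity_decrease can be evaluated by hand. The following sketch transcribes the formula directly; the function name and the sample counts and impurities are made up for illustration and do not come from scikit-learn or skopt.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Direct transcription of the weighted impurity decrease formula."""
    return N_t / N * (impurity
                      - N_t_R / N_t * right_impurity
                      - N_t_L / N_t * left_impurity)

# Hypothetical node: 40 of 100 samples reach it, split 10 left / 30 right.
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.2, right_impurity=0.3)

# The node is split only if the decrease reaches the threshold.
min_impurity_decrease = 0.05  # assumed threshold for this example
should_split = decrease >= min_impurity_decrease
```

With these numbers the decrease is 0.4 * (0.5 - 0.225 - 0.05) = 0.09, so the split would be accepted for any threshold up to 0.09.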
Attributes
estimators_ : list of DecisionTreeRegressor The collection of fitted sub-estimators.
feature_importances_ : array of shape = [n_features] The feature importances (the higher, the more important the feature).
n_features_ : int
The number of features when fit is performed.
n_outputs_ : int
The number of outputs when fit is performed.
oob_score_ : float Score of the training dataset obtained using an out-of-bag estimate.
oob_prediction_ : array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
References
.. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
class RandomForestRegressor(_sk_RandomForestRegressor):
    """
    RandomForestRegressor that supports conditional std computation.

    Parameters
    ----------
    n_estimators : integer, optional (default=10)
        The number of trees in the forest.

    criterion : string, optional (default="mse")
        The function to measure the quality of a split. Supported criteria
        are "mse" for the mean squared error, which is equal to variance
        reduction as feature selection criterion, and "mae" for the mean
        absolute error.

    max_features : int, float, string or None, optional (default="auto")
        The number of features to consider when looking for the best split:
        - If int, then consider `max_features` features at each split.
        - If float, then `max_features` is a percentage and
          `int(max_features * n_features)` features are considered at each
          split.
        - If "auto", then `max_features=n_features`.
        - If "sqrt", then `max_features=sqrt(n_features)`.
        - If "log2", then `max_features=log2(n_features)`.
        - If None, then `max_features=n_features`.
        Note: the search for a split does not stop until at least one
        valid partition of the node samples is found, even if it requires to
        effectively inspect more than ``max_features`` features.

    max_depth : integer or None, optional (default=None)
        The maximum depth of the tree. If None, then nodes are expanded until
        all leaves are pure or until all leaves contain less than
        min_samples_split samples.

    min_samples_split : int, float, optional (default=2)
        The minimum number of samples required to split an internal node:
        - If int, then consider `min_samples_split` as the minimum number.
        - If float, then `min_samples_split` is a percentage and
          `ceil(min_samples_split * n_samples)` are the minimum
          number of samples for each split.

    min_samples_leaf : int, float, optional (default=1)
        The minimum number of samples required to be at a leaf node:
        - If int, then consider `min_samples_leaf` as the minimum number.
        - If float, then `min_samples_leaf` is a percentage and
          `ceil(min_samples_leaf * n_samples)` are the minimum
          number of samples for each node.

    min_weight_fraction_leaf : float, optional (default=0.)
        The minimum weighted fraction of the sum total of weights (of all
        the input samples) required to be at a leaf node. Samples have
        equal weight when sample_weight is not provided.

    max_leaf_nodes : int or None, optional (default=None)
        Grow trees with ``max_leaf_nodes`` in best-first fashion.
        Best nodes are defined as relative reduction in impurity.
        If None then unlimited number of leaf nodes.

    min_impurity_decrease : float, optional (default=0.)
        A node will be split if this split induces a decrease of the impurity
        greater than or equal to this value.
        The weighted impurity decrease equation is the following::
            N_t / N * (impurity - N_t_R / N_t * right_impurity
                                - N_t_L / N_t * left_impurity)
        where ``N`` is the total number of samples, ``N_t`` is the number of
        samples at the current node, ``N_t_L`` is the number of samples in the
        left child, and ``N_t_R`` is the number of samples in the right child.
        ``N``, ``N_t``, ``N_t_R`` and ``N_t_L`` all refer to the weighted sum,
        if ``sample_weight`` is passed.

    bootstrap : boolean, optional (default=True)
        Whether bootstrap samples are used when building trees.

    oob_score : bool, optional (default=False)
        whether to use out-of-bag samples to estimate
        the R^2 on unseen data.

    n_jobs : integer, optional (default=1)
        The number of jobs to run in parallel for both `fit` and `predict`.
        If -1, then the number of jobs is set to the number of cores.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

    verbose : int, optional (default=0)
        Controls the verbosity of the tree building process.

    warm_start : bool, optional (default=False)
        When set to ``True``, reuse the solution of the previous call to fit
        and add more estimators to the ensemble, otherwise, just fit a whole
        new forest.

    Attributes
    ----------
    estimators_ : list of DecisionTreeRegressor
        The collection of fitted sub-estimators.

    feature_importances_ : array of shape = [n_features]
        The feature importances (the higher, the more important the feature).

    n_features_ : int
        The number of features when ``fit`` is performed.

    n_outputs_ : int
        The number of outputs when ``fit`` is performed.

    oob_score_ : float
        Score of the training dataset obtained using an out-of-bag estimate.

    oob_prediction_ : array of shape = [n_samples]
        Prediction computed with out-of-bag estimate on the training set.

    Notes
    -----
    The default values for the parameters controlling the size of the trees
    (e.g. ``max_depth``, ``min_samples_leaf``, etc.) lead to fully grown and
    unpruned trees which can potentially be very large on some data sets. To
    reduce memory consumption, the complexity and size of the trees should be
    controlled by setting those parameter values.
    The features are always randomly permuted at each split. Therefore,
    the best found split may vary, even with the same training data,
    ``max_features=n_features`` and ``bootstrap=False``, if the improvement
    of the criterion is identical for several splits enumerated during the
    search of the best split. To obtain a deterministic behaviour during
    fitting, ``random_state`` has to be fixed.

    References
    ----------
    .. [1] L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32, 2001.
    """
    def __init__(self, n_estimators=10, criterion='mse', max_depth=None,
                 min_samples_split=2, min_samples_leaf=1,
                 min_weight_fraction_leaf=0.0, max_features='auto',
                 max_leaf_nodes=None, bootstrap=True, oob_score=False,
                 n_jobs=1, random_state=None, verbose=0, warm_start=False,
                 min_variance=0.0):
        self.min_variance = min_variance
        super(RandomForestRegressor, self).__init__(
            n_estimators=n_estimators, criterion=criterion,
            max_depth=max_depth,
            min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            min_weight_fraction_leaf=min_weight_fraction_leaf,
            max_features=max_features, max_leaf_nodes=max_leaf_nodes,
            bootstrap=bootstrap, oob_score=oob_score,
            n_jobs=n_jobs, random_state=random_state,
            verbose=verbose, warm_start=warm_start)

    def predict(self, X, return_std=False):
        """Predict continuous output for X.

        Parameters
        ----------
        X : array of shape = (n_samples, n_features)
            Input data.

        return_std : boolean
            Whether or not to return the standard deviation.

        Returns
        -------
        predictions : array-like of shape = (n_samples,)
            Predicted values for X. If criterion is set to "mse",
            then `predictions[i] ~= mean(y | X[i])`.

        std : array-like of shape=(n_samples,)
            Standard deviation of `y` at `X`. If criterion
            is set to "mse", then `std[i] ~= std(y | X[i])`.
        """
        mean = super(RandomForestRegressor, self).predict(X)

        if return_std:
            if self.criterion != "mse":
                raise ValueError(
                    "Expected impurity to be 'mse', got %s instead"
                    % self.criterion)
            std = _return_std(X, self.estimators_, mean, self.min_variance)
            return mean, std
        return mean
Ancestors (in MRO)
 RandomForestRegressor
 sklearn.ensemble.forest.RandomForestRegressor
 sklearn.ensemble.forest.ForestRegressor
 abc.NewBase
 sklearn.ensemble.forest.BaseForest
 abc.NewBase
 sklearn.ensemble.base.BaseEnsemble
 abc.NewBase
 sklearn.base.BaseEstimator
 sklearn.base.MetaEstimatorMixin
 sklearn.base.RegressorMixin
 builtins.object
Static methods
def __init__(
self, n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, min_variance=0.0)
Initialize self. See help(type(self)) for accurate signature.
def __init__(self, n_estimators=10, criterion='mse', max_depth=None,
             min_samples_split=2, min_samples_leaf=1,
             min_weight_fraction_leaf=0.0, max_features='auto',
             max_leaf_nodes=None, bootstrap=True, oob_score=False,
             n_jobs=1, random_state=None, verbose=0, warm_start=False,
             min_variance=0.0):
    self.min_variance = min_variance
    super(RandomForestRegressor, self).__init__(
        n_estimators=n_estimators, criterion=criterion,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        min_weight_fraction_leaf=min_weight_fraction_leaf,
        max_features=max_features, max_leaf_nodes=max_leaf_nodes,
        bootstrap=bootstrap, oob_score=oob_score,
        n_jobs=n_jobs, random_state=random_state,
        verbose=verbose, warm_start=warm_start)
def predict(
self, X, return_std=False)
Predict continuous output for X.
Parameters
X : array of shape = (n_samples, n_features) Input data.
return_std : boolean Whether or not to return the standard deviation.
Returns
predictions : array-like of shape = (n_samples,)
Predicted values for X. If criterion is set to "mse", then predictions[i] ~= mean(y | X[i]).
std : array-like of shape=(n_samples,)
Standard deviation of y at X. If criterion is set to "mse", then std[i] ~= std(y | X[i]).
def predict(self, X, return_std=False):
    """Predict continuous output for X.

    Parameters
    ----------
    X : array of shape = (n_samples, n_features)
        Input data.

    return_std : boolean
        Whether or not to return the standard deviation.

    Returns
    -------
    predictions : array-like of shape = (n_samples,)
        Predicted values for X. If criterion is set to "mse",
        then `predictions[i] ~= mean(y | X[i])`.

    std : array-like of shape=(n_samples,)
        Standard deviation of `y` at `X`. If criterion
        is set to "mse", then `std[i] ~= std(y | X[i])`.
    """
    mean = super(RandomForestRegressor, self).predict(X)

    if return_std:
        if self.criterion != "mse":
            raise ValueError(
                "Expected impurity to be 'mse', got %s instead"
                % self.criterion)
        std = _return_std(X, self.estimators_, mean, self.min_variance)
        return mean, std
    return mean
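The helper _return_std is called with the fitted trees, the forest mean and min_variance, but its body is not shown in this section. One plausible way to obtain a conditional standard deviation from per-tree leaf statistics is the law of total variance; the sketch below illustrates that idea for a single query point. It is an assumption about the approach, not a copy of the library helper, and the function name and inputs are hypothetical.

```python
import math

def combine_tree_predictions(tree_means, tree_variances, min_variance=0.0):
    """Combine per-tree leaf means and variances for one query point.

    Uses Var[y] = E[Var[y | tree]] + Var[E[y | tree]], computed as the
    average second moment minus the squared forest mean.
    """
    n_trees = len(tree_means)
    forest_mean = sum(tree_means) / n_trees
    # Per-tree second moment is variance + mean^2; leaf variances are
    # floored at min_variance (assumed role of that parameter).
    second_moment = sum(max(v, min_variance) + m ** 2
                        for m, v in zip(tree_means, tree_variances)) / n_trees
    # Guard against tiny negative values caused by rounding.
    variance = max(second_moment - forest_mean ** 2, 0.0)
    return forest_mean, math.sqrt(variance)

# Made-up leaf statistics from three trees for the same query point.
mean, std = combine_tree_predictions([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In this example the combined variance is the average leaf variance (0.5) plus the variance of the tree means (2/3), so disagreement between trees increases the reported uncertainty.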
Instance variables
var feature_importances_
Return the feature importances (the higher, the more important the feature).
Returns
feature_importances_ : array, shape = [n_features]
var min_variance