Isolation Forest

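Predicts outliers based on the path length needed to isolate each sample in an ensemble of random trees, averaged over the trees. Samples that end up alone in a leaf after only a few splits are likely outliers.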
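
The isolation idea can be illustrated with a minimal pure-Python sketch. It is not the node's implementation: the helper names (isolation_depth, mean_depth) and the toy data are hypothetical, and subsampling, height limits and score normalisation are left out. It only counts how many random splits are needed before a point is alone in its partition.

```python
import random


def isolation_depth(points, target, rng, depth=0):
    """Count the random splits needed until `target` is alone in its partition."""
    if len(points) <= 1:
        return depth
    dim = rng.randrange(len(target))          # pick a random feature
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return depth                          # no split possible on this feature
    split = rng.uniform(lo, hi)               # pick a random split value
    # Keep only the points that fall on the same side of the split as the target.
    same_side = [p for p in points if (p[dim] < split) == (target[dim] < split)]
    return isolation_depth(same_side, target, rng, depth + 1)


def mean_depth(points, target, n_trees=200, seed=0):
    """Average the isolation depth over several independent random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(points, target, rng) for _ in range(n_trees))
    return total / n_trees


# Hypothetical 1-D data: ten evenly spaced values and one far-away point.
data = [(float(v),) for v in range(10)] + [(100.0,)]
outlier_depth = mean_depth(data, (100.0,))
inlier_depth = mean_depth(data, (5.0,))
print(outlier_depth, inlier_depth)
```

In this toy setting the far-away point is usually separated by the first split, while a point in the middle of the cluster always needs at least two splits, so its mean depth is larger.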

Configuration:
  • n_estimators

    The number of base estimators in the ensemble.

  • max_samples

    The number of samples to draw from X to train each base estimator, expressed as a number of samples (int) or a fraction of all samples (float). If “auto”, a maximum of 256 samples is used (fewer when fewer input samples are given).

  • contamination

    The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • max_features

    The number of features to draw from X to train each base estimator.

    • If int, then draw max_features features.
    • If float, then draw max_features * X.shape[1] features.
  • bootstrap

    If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed.

  • n_jobs

    The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

  • random_state

    If int, random_state is the seed used by the random number generator. If a RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random.
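
As a rough illustration of how these options fit together, the sketch below sets them on scikit-learn's IsolationForest directly, on the assumption that the node passes its configuration through to that class. The data set and the chosen values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# Hypothetical data: 200 points around the origin plus 10 points far away.
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 4)),
    rng.uniform(8.0, 10.0, size=(10, 4)),
])

params = dict(
    n_estimators=100,      # number of trees in the ensemble
    max_samples="auto",    # min(256, number of rows) samples per tree
    contamination=0.05,    # expected proportion of outliers
    max_features=1.0,      # float: fraction of the columns used per tree
    bootstrap=False,       # sample rows without replacement
    n_jobs=1,              # single process
    random_state=42,       # fixed seed for repeatable results
)

labels = IsolationForest(**params).fit(X).predict(X)   # +1 inlier, -1 outlier
outlier_fraction = float(np.mean(labels == -1))

# Fitting again with the same seed should give the same labels.
labels_again = IsolationForest(**params).fit(X).predict(X)
print(outlier_fraction, np.array_equal(labels, labels_again))
```

Because contamination sets the threshold on the decision function from the training data, the fraction of training rows labelled -1 should land close to the configured value.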

Attributes:
  • estimators_samples_

    The subset of drawn samples (i.e., the in-bag samples) for each base estimator.

  • max_samples_

    The actual number of samples used to train each base estimator.
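
A short sketch of how these attributes could be inspected after fitting, again using scikit-learn's IsolationForest directly rather than the node itself. The data sets are hypothetical and only meant to show the effect of max_samples="auto".

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X_small = rng.normal(size=(100, 3))    # fewer than 256 rows
X_large = rng.normal(size=(1000, 3))   # more than 256 rows

small = IsolationForest(n_estimators=10, max_samples="auto", random_state=0).fit(X_small)
large = IsolationForest(n_estimators=10, max_samples="auto", random_state=0).fit(X_large)

# max_samples_ holds the number of rows actually drawn for each tree.
n_small = small.max_samples_
n_large = large.max_samples_

# estimators_samples_ holds one collection of drawn row indices per tree.
per_tree = [len(idx) for idx in large.estimators_samples_]
print(n_small, n_large, per_tree)
```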

Inputs:
Outputs:
model : model

Model

Some of the docstrings for this module have been automatically extracted from the scikit-learn library and are covered by their respective licenses.

class node_isolationforest.IsolationForest