Isolation Forest

Predicts outliers based on the path lengths in random trees whose leaves isolate single samples. Samples that are isolated after only a few splits (short paths) are more likely to be outliers.
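The isolation idea can be illustrated with a small, self-contained sketch. This is a one-dimensional toy in plain Python, not the node's implementation; the helper names isolation_depth and mean_depth are hypothetical and exist only for this example. It repeatedly splits the data at a random value and counts how many splits are needed before a given point is alone.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count random splits needed to isolate `point` within 1-D `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth  # all remaining values are identical, cannot split further
    split = rng.uniform(lo, hi)
    # Follow only the side of the split that contains the point.
    if point < split:
        side = [x for x in data if x < split]
    else:
        side = [x for x in data if x >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)

def mean_depth(point, data, n_trees=200, seed=0):
    """Average the isolation depth over many independent random 'trees'."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
    return total / n_trees

rng = random.Random(42)
inliers = [rng.gauss(0.0, 1.0) for _ in range(100)]
data = inliers + [10.0]  # one obvious outlier far from the cluster

outlier_depth = mean_depth(10.0, data)
inlier_depth = mean_depth(inliers[0], data)
print(outlier_depth, inlier_depth)
```

In this toy setup the far-away value is typically separated from the rest after very few splits, while a value inside the cluster needs many more.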

Documentation

Attributes

estimators_samples_

The subset of drawn samples (i.e., the in-bag samples) for each base estimator.

max_samples_

The actual number of samples used to train each base estimator.
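A minimal sketch of how these attributes can be inspected, assuming the node wraps scikit-learn's sklearn.ensemble.IsolationForest (the parameter and attribute names on this page match that class, but the wrapping itself is an assumption). The data here is synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))  # synthetic data for illustration only

forest = IsolationForest(n_estimators=5, max_samples=32, random_state=0).fit(X)

# Resolved number of samples drawn for each base estimator.
print(forest.max_samples_)

# One array of row indices (the in-bag samples) per base estimator.
sample_sets = forest.estimators_samples_
print(len(sample_sets), [len(idx) for idx in sample_sets])
```

Note that estimators_samples_ is computed on access rather than stored, so it is best read once into a local variable, as above.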

Definition

Output ports

model model

Model

Configuration

Bootstrap (bootstrap)

If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed.

Contamination (contamination)

The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the scores of the samples.

  • If ‘auto’, the threshold is determined as in the original paper.

  • If float, the contamination should be in the range (0, 0.5].

Changed in version 0.22: The default value of contamination changed from 0.1 to 'auto'.
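The effect of a float contamination value can be sketched as follows, again assuming the underlying estimator is scikit-learn's IsolationForest. With contamination=0.05, roughly 5% of the training samples should end up below the threshold and be labelled as outliers. The data is synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# 190 clustered points plus 10 points scattered over a wider area.
X = np.vstack([rng.normal(size=(190, 2)), rng.uniform(-8, 8, size=(10, 2))])

forest = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = forest.predict(X)  # +1 for inliers, -1 for outliers

flagged = float(np.mean(labels == -1))
print(flagged)  # fraction of training samples labelled as outliers
```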

Number of features (max_features)

The number of features to draw from X to train each base estimator.

  • If int, then draw max_features features.

  • If float, then draw max(1, int(max_features * n_features_in_)) features.

Note: using a float less than 1.0 or an integer less than the number of features enables feature subsampling and leads to a longer runtime.
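The two rules above can be written out as a small helper. The function name resolve_max_features is hypothetical and only illustrates the arithmetic stated in the list; it is not part of the node or of any library.

```python
def resolve_max_features(max_features, n_features):
    """Return how many features are drawn for each base estimator."""
    if isinstance(max_features, int):
        # An int is taken as an absolute number of features.
        return max_features
    # A float is a fraction of all features, with at least one feature drawn.
    return max(1, int(max_features * n_features))

print(resolve_max_features(3, 10))
print(resolve_max_features(0.5, 10))
print(resolve_max_features(0.01, 10))
```

With 10 input features, both 3 and 0.5 draw fewer than all features and therefore enable feature subsampling.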

Number of samples (max_samples)

The number of samples to draw from X to train each base estimator, expressed as a number of samples (int) or as a fraction of all samples (float). If “auto”, a maximum of 256 samples is used (fewer if the input contains fewer samples).
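A rough sketch of how the three kinds of values translate into a sample count. The helper resolve_max_samples is hypothetical. Capping an int at the number of available samples and truncating the float case with int() are assumptions about the underlying estimator, not statements from this page.

```python
def resolve_max_samples(max_samples, n_samples):
    """Return the number of samples drawn for each base estimator."""
    if max_samples == "auto":
        # At most 256 samples, fewer when the input is smaller.
        return min(256, n_samples)
    if isinstance(max_samples, int):
        # Assumption: an int larger than the input is capped at n_samples.
        return min(max_samples, n_samples)
    # Assumption: a float is a fraction of all samples, truncated to an int.
    return int(max_samples * n_samples)

print(resolve_max_samples("auto", 1000))
print(resolve_max_samples("auto", 100))
print(resolve_max_samples(0.5, 1000))
```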

Number of estimators (n_estimators)

The number of base estimators in the ensemble.

Number of jobs (n_jobs)

The number of jobs to run in parallel for both fit() and predict(). None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See the scikit-learn glossary entry for n_jobs for more details.

Random seed (random_state)

Controls the pseudo-randomness of the selection of the feature and split values for each branching step and each tree in the forest.

Pass an int for reproducible results across multiple function calls. See the scikit-learn glossary entry for random_state.
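The reproducibility claim can be checked with a short sketch, assuming the underlying estimator is scikit-learn's IsolationForest. Two forests fitted on the same synthetic data with the same integer seed should produce identical scores.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.random.RandomState(0).normal(size=(200, 3))  # synthetic data

# Same seed, same data: the random feature and split choices repeat.
scores_a = IsolationForest(random_state=42).fit(X).score_samples(X)
scores_b = IsolationForest(random_state=42).fit(X).score_samples(X)

same = bool(np.array_equal(scores_a, scores_b))
print(same)
```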

Implementation

class node_isolationforest.IsolationForest