.. _`Isolation Forest`:
.. _`org.sysess.sympathy.machinelearning.isolation_forest`:

Isolation Forest
~~~~~~~~~~~~~~~~

.. image:: isolation_forest.svg
   :width: 48

Predicts outliers based on the minimum path length of random trees with single nodes in the leaves.

*Configuration*:

- *n_estimators*

  The number of base estimators in the ensemble.

- *max_samples*

  The number of samples to draw from X to train each base estimator,
  expressed as a number of samples (int) or as a fraction of all samples
  (float). If "auto", a maximum of 256 samples is used (fewer when the
  input contains fewer samples).

- *contamination*

  The amount of contamination of the data set, i.e. the proportion of
  outliers in the data set. Used when fitting to define the threshold on
  the decision function. If 'auto', the decision function threshold is
  determined as in the original paper.

  .. versionchanged:: 0.20
     The default value of ``contamination`` will change from 0.1 in 0.20
     to ``'auto'`` in 0.22.

- *max_features*

  The number of features to draw from X to train each base estimator.

  - If int, then draw `max_features` features.
  - If float, then draw `max_features * X.shape[1]` features.

- *bootstrap*

  If True, individual trees are fit on random subsets of the training
  data sampled with replacement. If False, sampling without replacement
  is performed.

- *n_jobs*

  The number of jobs to run in parallel for both `fit` and `predict`.
  ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
  ``-1`` means using all processors. See the scikit-learn glossary entry
  for ``n_jobs`` for more details.

- *random_state*

  If int, random_state is the seed used by the random number generator;
  if RandomState instance, random_state is the random number generator;
  if None, the random number generator is the RandomState instance used
  by `np.random`.

*Attributes*:

- *estimators_samples_*

  The subset of drawn samples (i.e., the in-bag samples) for each base
  estimator.

- *max_samples_*

  The actual number of samples.

*Input ports*:

*Output ports*:
    **model** : model
        Model

**n_estimators** (n_estimators)
    The number of base estimators in the ensemble.

**max_samples** (max_samples)
    The number of samples to draw from X to train each base estimator,
    expressed as a number of samples (int) or as a fraction of all samples
    (float). If "auto", a maximum of 256 samples is used (fewer when the
    input contains fewer samples).

**contamination** (contamination)
    The amount of contamination of the data set, i.e. the proportion of
    outliers in the data set. Used when fitting to define the threshold on
    the decision function. If 'auto', the decision function threshold is
    determined as in the original paper.

    .. versionchanged:: 0.20
       The default value of ``contamination`` will change from 0.1 in 0.20
       to ``'auto'`` in 0.22.

**max_features** (max_features)
    The number of features to draw from X to train each base estimator.

    - If int, then draw `max_features` features.
    - If float, then draw `max_features * X.shape[1]` features.

**bootstrap** (bootstrap)
    If True, individual trees are fit on random subsets of the training
    data sampled with replacement. If False, sampling without replacement
    is performed.

**n_jobs** (n_jobs)
    The number of jobs to run in parallel for both `fit` and `predict`.
    ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
    ``-1`` means using all processors. See the scikit-learn glossary entry
    for ``n_jobs`` for more details.

**random_state** (random_state)
    If int, random_state is the seed used by the random number generator;
    if RandomState instance, random_state is the random number generator;
    if None, the random number generator is the RandomState instance used
    by `np.random`.

.. automodule:: node_isolationforest

.. class:: IsolationForest
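
The configuration options above closely match the parameters of scikit-learn's
``sklearn.ensemble.IsolationForest``. The following is a minimal sketch of how
those options behave, assuming the node wraps that estimator (this page does
not state it explicitly). The synthetic data and the variable names ``X``,
``model``, ``labels`` and ``scores`` are illustrative only and are not part of
the node.

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Illustrative data: 200 inliers around the origin plus two obvious outliers.
    rng = np.random.RandomState(0)
    inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
    outliers = np.array([[8.0, 8.0], [-9.0, 7.5]])
    X = np.vstack([inliers, outliers])

    # Parameter names mirror the configuration options listed above.
    model = IsolationForest(
        n_estimators=100,      # number of base estimators in the ensemble
        max_samples="auto",    # at most 256 samples per tree
        contamination="auto",  # threshold chosen as in the original paper
        max_features=1.0,      # float: fraction of features drawn per tree
        bootstrap=False,       # sample without replacement
        n_jobs=None,           # 1 job unless inside a joblib backend context
        random_state=0,        # int seed for reproducibility
    )
    model.fit(X)

    labels = model.predict(X)            # +1 for inliers, -1 for outliers
    scores = model.decision_function(X)  # negative values indicate outliers

    # Fitted attributes described under *Attributes*.
    print(model.max_samples_)              # 202 here: fewer than 256 input rows
    print(len(model.estimators_samples_))  # one index array per base estimator

Because the input has fewer than 256 rows, ``max_samples_`` equals the number
of input rows, which illustrates the "auto" rule described for *max_samples*.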