.. _`Isolation Forest`:
.. _`org.sysess.sympathy.machinelearning.isolation_forest`:

Isolation Forest
~~~~~~~~~~~~~~~~

.. image:: isolation_forest.svg
   :width: 48

Predicts outliers based on the minimum path length of random trees with single
nodes in the leaves.

:Configuration:

    - *n_estimators*

      The number of base estimators in the ensemble.

    - *max_samples*

      The number of samples to draw from X to train each base estimator,
      expressed as a number of samples (int) or as a fraction of all samples
      (float). If "auto", a maximum of 256 samples is used (fewer if the
      input contains fewer samples).

    - *contamination*

      The amount of contamination of the data set, i.e. the proportion of
      outliers in the data set. Used when fitting to define the threshold on
      the decision function.

    - *max_features*

      The number of features to draw from X to train each base estimator.

      - If int, then draw ``max_features`` features.
      - If float, then draw ``max_features * X.shape[1]`` features.

    - *bootstrap*

      If True, individual trees are fit on random subsets of the training
      data sampled with replacement. If False, sampling without replacement
      is performed.

    - *n_jobs*

      The number of jobs to run in parallel for both ``fit`` and ``predict``.
      If -1, the number of jobs is set to the number of cores.

    - *random_state*

      If an int, ``random_state`` is the seed used by the random number
      generator. If a ``RandomState`` instance, ``random_state`` is the random
      number generator itself. If None, the random number generator is the
      ``RandomState`` instance used by ``np.random``.

:Attributes:

    - *estimators_samples_*

      The subset of drawn samples (i.e., the in-bag samples) for each base
      estimator.

    - *max_samples_*

      The actual number of samples.

:Inputs:
:Outputs:

    **model** : model

        Model

*Ports*:

**Outputs**:

    :model: model

        Model

*Configuration*:

    **n_estimators**
        The number of base estimators in the ensemble.

    **max_samples**
        The number of samples to draw from X to train each base estimator,
        expressed as a number of samples (int) or as a fraction of all
        samples (float). If "auto", a maximum of 256 samples is used (fewer
        if the input contains fewer samples).

    **contamination**
        The amount of contamination of the data set, i.e. the proportion of
        outliers in the data set. Used when fitting to define the threshold
        on the decision function.

    **max_features**
        The number of features to draw from X to train each base estimator.

        - If int, then draw ``max_features`` features.
        - If float, then draw ``max_features * X.shape[1]`` features.

    **bootstrap**
        If True, individual trees are fit on random subsets of the training
        data sampled with replacement. If False, sampling without replacement
        is performed.

    **n_jobs**
        The number of jobs to run in parallel for both ``fit`` and
        ``predict``. If -1, the number of jobs is set to the number of cores.

    **random_state**
        If an int, ``random_state`` is the seed used by the random number
        generator. If a ``RandomState`` instance, ``random_state`` is the
        random number generator itself. If None, the random number generator
        is the ``RandomState`` instance used by ``np.random``.

.. automodule:: node_isolationforest

.. class:: IsolationForest
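
The configuration options above match the parameters of scikit-learn's
``sklearn.ensemble.IsolationForest``. The following is a minimal sketch of how
they fit together, assuming the node wraps that estimator (an assumption based
on the parameter names, not something this page states). The data set, the
variable names and the chosen values are hypothetical and only serve as an
illustration.

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Synthetic data (illustration only): 200 points clustered around the
    # origin plus 5 points placed far away from the cluster.
    rng = np.random.RandomState(0)
    inliers = 0.3 * rng.randn(200, 2)
    outliers = np.array(
        [[6.0, 6.0], [-6.0, 6.0], [6.0, -6.0], [-6.0, -6.0], [0.0, 8.0]]
    )
    X = np.vstack([inliers, outliers])

    # Each keyword corresponds to one configuration option of the node.
    model = IsolationForest(
        n_estimators=100,     # number of trees in the ensemble
        max_samples="auto",   # at most 256 samples per tree
        contamination=0.05,   # expected proportion of outliers
        max_features=1.0,     # float: fraction of the columns of X
        bootstrap=False,      # sample without replacement
        n_jobs=1,             # no parallelism
        random_state=0,       # fixed seed for reproducibility
    )
    model.fit(X)

    labels = model.predict(X)            # +1 for inliers, -1 for outliers
    scores = model.decision_function(X)  # negative values are outliers

    # Fitted attributes described under "Attributes".
    samples_per_tree = model.max_samples_
    n_sample_sets = len(model.estimators_samples_)

With 205 input rows and ``max_samples="auto"``, ``max_samples_`` equals the
number of rows, because the input has fewer than 256 samples.
``estimators_samples_`` holds one array of sample indices per tree.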