One-Hot Encoder

Encode categorical integer features using a one-hot (also known as one-of-K) scheme.

For each categorical input feature, the encoder produces one output feature per category; exactly one of them is marked as true and the rest as false. This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels. Note: to one-hot encode y labels, use a LabelBinarizer instead.
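
The scheme described above can be illustrated with a minimal pure-Python sketch. It is not the scikit-learn implementation; the helper name one_hot_column is hypothetical and only shows that each input value maps to a row with exactly one indicator set.

```python
def one_hot_column(values):
    """Encode one categorical column as rows of 0/1 indicators."""
    # One output feature per distinct value, in sorted order.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is set per row
        rows.append(row)
    return categories, rows


categories, encoded = one_hot_column([2, 0, 1, 2])
```

Here the three distinct values become three output features, and every row of encoded sums to one.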

Configuration:

  • n_values

    Number of values per feature.

    • ‘auto’ : determine value range from training data.

    • int : number of categorical values per feature.

      Each feature value should be in range(n_values)

    • array : n_values[i] is the number of categorical values in X[:, i].

      Each feature value should be in range(n_values[i])

    Deprecated since version 0.20: The n_values keyword was deprecated in version 0.20 and will be removed in 0.22. Use categories instead.

  • categorical_features

    Specify what features are treated as categorical.

    • ‘all’: All features are treated as categorical.
    • array of indices: Array of categorical feature indices.
    • mask: Array of length n_features and with dtype=bool.

    Non-categorical features are always stacked to the right of the matrix.

    Deprecated since version 0.20: The categorical_features keyword was deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.

  • handle_unknown

    How to handle categories that were not seen during fitting when they appear during transform.

  • sparse

    Generates a sparse matrix if true. Warning: sparse matrices are not handled by all Sympathy nodes and may be silently converted to non-sparse arrays.
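
To make the handle_unknown setting more concrete, here is a small pure-Python sketch. It assumes the two option names that scikit-learn documents for this parameter, 'error' and 'ignore'; this page does not list which values the node exposes, so treat them as an assumption. The helper encode_value is hypothetical and is not part of any library.

```python
def encode_value(value, categories, handle_unknown="error"):
    """One-hot encode a single value against categories seen during fitting."""
    row = [0] * len(categories)
    if value in categories:
        row[categories.index(value)] = 1
    elif handle_unknown == "error":
        # Assumed behaviour of 'error': refuse values not seen during fitting.
        raise ValueError(f"unknown category: {value!r}")
    elif handle_unknown != "ignore":
        raise ValueError(f"unsupported handle_unknown: {handle_unknown!r}")
    # Assumed behaviour of 'ignore': the unknown value stays an all-zero row.
    return row


fitted_categories = [0, 1, 2]
known = encode_value(1, fitted_categories)
ignored = encode_value(7, fitted_categories, handle_unknown="ignore")
```

With 'ignore', an unseen value yields a row with no indicator set, so downstream nodes cannot tell it apart from "none of the known categories".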

Attributes:

  • active_features_

    Indices for active features, meaning values that actually occur in the training set. Only available when n_values is 'auto'.

    Deprecated since version 0.20: The active_features_ attribute was deprecated in version 0.20 and will be removed in 0.22.

  • feature_indices_

    Indices to feature ranges. Feature i in the original data is mapped to features from feature_indices_[i] to feature_indices_[i+1] (and then potentially masked by active_features_ afterwards).

    Deprecated since version 0.20: The feature_indices_ attribute was deprecated in version 0.20 and will be removed in 0.22.

  • n_values_

    Maximum number of values per feature.

    Deprecated since version 0.20: The n_values_ attribute was deprecated in version 0.20 and will be removed in 0.22.

  • categories_

    The categories of each feature determined during fitting (in order of the features in X and corresponding with the output of transform).
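
The relationship between per-feature categories and output column ranges can be sketched in pure Python. This is an illustration only: the helper fit_layout is hypothetical, and the offsets are computed from the categories observed in the data, which mirrors the role of feature_indices_ rather than reproducing its exact values.

```python
from itertools import accumulate


def fit_layout(X):
    """Return per-feature categories and the column offsets of each feature's block."""
    n_features = len(X[0])
    # Sorted distinct values of each column, in the order of the input features.
    categories = [sorted({row[i] for row in X}) for i in range(n_features)]
    # Feature i occupies output columns offsets[i] up to (not including) offsets[i + 1].
    offsets = [0] + list(accumulate(len(c) for c in categories))
    return categories, offsets


X = [[0, 10], [1, 20], [0, 30]]
categories, offsets = fit_layout(X)
```

In this example the first feature has two categories and the second has three, so the first feature maps to output columns 0 to 1 and the second to columns 2 to 4.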

Input ports:

Output ports:
model : model
Model
n_values (n_values)

Number of values per feature.

  • ‘auto’ : determine value range from training data.

  • int : number of categorical values per feature.

    Each feature value should be in range(n_values)

  • array : n_values[i] is the number of categorical values in X[:, i].

    Each feature value should be in range(n_values[i])

Deprecated since version 0.20: The n_values keyword was deprecated in version 0.20 and will be removed in 0.22. Use categories instead.

categorical_features (categorical_features)

Specify what features are treated as categorical.

  • ‘all’: All features are treated as categorical.
  • array of indices: Array of categorical feature indices.
  • mask: Array of length n_features and with dtype=bool.

Non-categorical features are always stacked to the right of the matrix.

Deprecated since version 0.20: The categorical_features keyword was deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
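
As a hedged sketch of the suggested replacement, the following uses scikit-learn directly (outside Sympathy) to encode only the first column with a ColumnTransformer and pass the remaining column through unchanged. The data and the step name "onehot" are made up for illustration; sparse_threshold=0 is set so the result is always a dense array.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Column 0 is categorical, column 1 is an ordinary numeric feature.
X = np.array([[0, 1.5], [1, 2.5], [2, 3.5], [1, 4.5]])

transformer = ColumnTransformer(
    [("onehot", OneHotEncoder(), [0])],  # encode column 0 only
    remainder="passthrough",             # keep the other columns as they are
    sparse_threshold=0,                  # always return a dense array
)
encoded = transformer.fit_transform(X)
```

As with categorical_features, the passed-through column ends up to the right of the encoded columns.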

handle_unknown (handle_unknown)
How to handle categories that were not seen during fitting when they appear during transform.
sparse (sparse)
Generates a sparse matrix if true. Warning: sparse matrices are not handled by all Sympathy nodes and may be silently converted to non-sparse arrays.

Some of the docstrings for this module have been automatically extracted from the scikit-learn library and are covered by their respective licenses.

class node_preprocessing.OneHotEncoder