.. _`One-Hot Encoder`:

.. _`org.sysess.sympathy.machinelearning.one_hot_encoder`:

One-Hot Encoder
~~~~~~~~~~~~~~~

.. image:: label_binarizer.svg
   :width: 48


Encode categorical integer features using a one-hot (also known as one-of-K) scheme.

For each categorical input feature, the encoder produces one output
feature per category; for every sample exactly one of these is true and
the rest are false. This encoding is needed for feeding categorical data to many
scikit-learn estimators, notably linear models and SVMs with the
standard kernels. Note: to one-hot encode y labels, use a
LabelBinarizer instead.
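
The scheme can be illustrated with a minimal pure-Python sketch. The helper
``one_hot`` below is hypothetical and is not part of Sympathy or scikit-learn;
it only shows how a single integer feature expands into one true/false column
per category.

.. code-block:: python

    def one_hot(values):
        """Return the sorted categories and one boolean row per input value."""
        # Hypothetical helper for illustration only.
        categories = sorted(set(values))
        # Each row holds exactly one True: the column of the value's category.
        rows = [[value == category for category in categories]
                for value in values]
        return categories, rows


    categories, rows = one_hot([2, 0, 1, 2])

Here the three distinct values become three output columns, and every row
contains a single true entry.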

*Configuration*:


  - *n_values*

    Number of values per feature.

    - 'auto' : determine value range from training data.
    - int : number of categorical values per feature.
      Each feature value should be in ``range(n_values)``.
    - array : ``n_values[i]`` is the number of categorical values in
      ``X[:, i]``. Each feature value should be
      in ``range(n_values[i])``.

    .. deprecated:: 0.20
        The ``n_values`` keyword was deprecated in version 0.20 and will
        be removed in 0.22. Use ``categories`` instead.


  - *categorical_features*

    Specify what features are treated as categorical.

    - 'all': All features are treated as categorical.
    - array of indices: Array of categorical feature indices.
    - mask: Array of length n_features and with dtype=bool.

    Non-categorical features are always stacked to the right of the matrix.

    .. deprecated:: 0.20
        The ``categorical_features`` keyword was deprecated in version
        0.20 and will be removed in 0.22.
        You can use the ``ColumnTransformer`` instead.


  - *handle_unknown*

    How to handle categories that were not seen during fitting when they
    occur during transform.

  - *sparse*

    Generate a sparse matrix if true.
    Warning: not all Sympathy nodes handle sparse matrices, and they may be
    silently converted to non-sparse arrays.

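The options above map onto scikit-learn roughly as in the following sketch. It
assumes scikit-learn 0.20 or later and calls scikit-learn directly, so it does
not show how the Sympathy node itself passes its configuration. The data and
variable names are made up for illustration. ``ColumnTransformer`` stands in
for the deprecated ``categorical_features`` keyword, and
``handle_unknown="ignore"`` is one of the values scikit-learn accepts.

.. code-block:: python

    import numpy as np
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder

    # Column 0 is categorical (values 0, 1, 2); column 1 is numeric.
    X_train = np.array([[0, 1.5],
                        [1, 2.5],
                        [2, 3.5]])

    # Only column 0 is one-hot encoded. The remaining column is passed
    # through unchanged and ends up to the right of the encoded columns.
    encoder = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), [0])],
        remainder="passthrough",
        sparse_threshold=0,  # always return a dense array
    )
    encoded_train = encoder.fit_transform(X_train)

    # Category 3 was not seen during fit. With handle_unknown="ignore" its
    # one-hot columns are all zeros instead of raising an error.
    encoded_new = encoder.transform(np.array([[3, 4.5]]))

Setting ``sparse_threshold=0`` forces a dense result, which avoids relying on
downstream code to handle sparse matrices.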

*Attributes*:


  - *active_features_*

    Indices for active features, meaning values that actually occur
    in the training set. Only available when n_values is ``'auto'``.

    .. deprecated:: 0.20
        The ``active_features_`` attribute was deprecated in version
        0.20 and will be removed in 0.22.


  - *feature_indices_*

    Indices to feature ranges.
    Feature ``i`` in the original data is mapped to features
    from ``feature_indices_[i]`` to ``feature_indices_[i+1]``
    (and potentially masked by ``active_features_`` afterwards).

    .. deprecated:: 0.20
        The ``feature_indices_`` attribute was deprecated in version
        0.20 and will be removed in 0.22.


  - *n_values_*

    Maximum number of values per feature.

    .. deprecated:: 0.20
        The ``n_values_`` attribute was deprecated in version
        0.20 and will be removed in 0.22.


  - *categories_*

    The categories of each feature determined during fitting
    (in order of the features in X and corresponding with the output
    of ``transform``).

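As a rough illustration of ``categories_``, the sketch below fits
scikit-learn's ``OneHotEncoder`` directly (assuming scikit-learn 0.20 or
later) on made-up data. Passing ``categories="auto"`` explicitly is the
replacement for the deprecated ``n_values="auto"``.

.. code-block:: python

    from sklearn.preprocessing import OneHotEncoder

    # Two integer features: the first takes values {0, 1},
    # the second takes values {3, 4, 5}.
    X = [[0, 3],
         [1, 5],
         [0, 4]]

    encoder = OneHotEncoder(categories="auto")
    # The default output is a sparse matrix; convert it for inspection.
    encoded = encoder.fit_transform(X).toarray()

    # One array of categories per input feature, in feature order.
    categories = [c.tolist() for c in encoder.categories_]

The first two output columns correspond to the categories of the first
feature and the last three to those of the second feature, matching the order
in ``categories_``.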

*Input ports*:


*Output ports*:
    **model** : model
        Model


**n_values** (n_values)
    Number of values per feature.

    - 'auto' : determine value range from training data.
    - int : number of categorical values per feature.
      Each feature value should be in ``range(n_values)``.
    - array : ``n_values[i]`` is the number of categorical values in
      ``X[:, i]``. Each feature value should be
      in ``range(n_values[i])``.

    .. deprecated:: 0.20
        The ``n_values`` keyword was deprecated in version 0.20 and will
        be removed in 0.22. Use ``categories`` instead.
**categorical_features** (categorical_features)
    Specify what features are treated as categorical.

    - 'all': All features are treated as categorical.
    - array of indices: Array of categorical feature indices.
    - mask: Array of length n_features and with dtype=bool.

    Non-categorical features are always stacked to the right of the matrix.

    .. deprecated:: 0.20
        The ``categorical_features`` keyword was deprecated in version
        0.20 and will be removed in 0.22.
        You can use the ``ColumnTransformer`` instead.
**handle_unknown** (handle_unknown)
    How to handle categories that were not seen during fitting when they
    occur during transform.
**sparse** (sparse)
    Generate a sparse matrix if true.
    Warning: not all Sympathy nodes handle sparse matrices, and they may be
    silently converted to non-sparse arrays.

.. automodule:: node_preprocessing

.. class:: OneHotEncoder