One-Hot Encoder
Encode categorical integer features using a one-hot (also known as one-of-K) scheme.
For each categorical input feature, the encoder produces a group of output features, of which exactly one is marked as true and the rest as false. This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels. Note: to one-hot encode y labels, use a LabelBinarizer instead.
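The scheme described above can be sketched in plain Python. This is a minimal illustration, not the scikit-learn or Sympathy implementation: the helper `one_hot_row` and its explicit `n_values` list are hypothetical names chosen for this example.

```python
def one_hot_row(row, n_values):
    """Encode one row of non-negative integer categories as 0/1 flags.

    n_values[i] is the number of categories of feature i (supplied
    explicitly here, as an assumption), so feature i contributes
    n_values[i] output columns.
    """
    encoded = []
    for value, size in zip(row, n_values):
        if not 0 <= value < size:
            raise ValueError(f"value {value} is not in range({size})")
        flags = [0] * size
        flags[value] = 1  # exactly one flag is set per input feature
        encoded.extend(flags)
    return encoded


# Two features: the first has 2 categories, the second has 3.
X = [[0, 2], [1, 0]]
encoded = [one_hot_row(row, n_values=[2, 3]) for row in X]
print(encoded)  # [[1, 0, 0, 0, 1], [0, 1, 1, 0, 0]]
```

Each input row with two features becomes a row with 2 + 3 = 5 output columns, and every feature's group of columns contains exactly one 1.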
Ports:
Outputs:
model: model
Model
Configuration:
- n_values
Number of values per feature.
- ‘auto’ : determine value range from training data.
- int : number of categorical values per feature. Each feature value should be in range(n_values).
- array : n_values[i] is the number of categorical values in X[:, i]. Each feature value should be in range(n_values[i]).
- categorical_features
Specify what features are treated as categorical.
- ‘all’ (default): All features are treated as categorical.
- array of indices: Array of categorical feature indices.
- mask: Array of length n_features and with dtype=bool.
Non-categorical features are always stacked to the right of the matrix.
- handle_unknown
How to handle unknown categories encountered during transform (as opposed to fit).
- sparse
Generates a sparse matrix if true. Warning: sparse matrices are not handled by all Sympathy nodes and may be silently converted to non-sparse (dense) arrays.
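To make the interaction of these options more concrete, here is a rough pure-Python sketch. It is not the node's actual code. The helpers `resolve_n_values` and `encode` are hypothetical, the rule that ‘auto’ means "largest observed value plus one" is an assumption, and the option names `"error"` and `"ignore"` for handle_unknown are borrowed from scikit-learn rather than stated on this page.

```python
def resolve_n_values(X, n_values="auto"):
    """Return the number of categories per column for the three accepted forms."""
    n_features = len(X[0])
    if isinstance(n_values, str) and n_values == "auto":
        # Assumption: the value range is max + 1 for each training column.
        return [max(row[i] for row in X) + 1 for i in range(n_features)]
    if isinstance(n_values, int):
        # A single int applies to every feature.
        return [n_values] * n_features
    # Otherwise treat it as an array with one entry per feature.
    return list(n_values)


def encode(X, categorical, sizes, handle_unknown="error"):
    """One-hot encode the columns listed in `categorical`.

    Encoded columns come first; non-categorical columns are stacked
    to the right, unchanged.
    """
    out = []
    for row in X:
        encoded = []
        for i, size in zip(categorical, sizes):
            flags = [0] * size
            if 0 <= row[i] < size:
                flags[row[i]] = 1
            elif handle_unknown == "error":
                raise ValueError(f"unknown category {row[i]} in column {i}")
            # With "ignore" (assumed option name) the group stays all zeros.
            encoded.extend(flags)
        passthrough = [v for i, v in enumerate(row) if i not in categorical]
        out.append(encoded + passthrough)
    return out


X_train = [[0, 5.5, 2], [1, 7.0, 0]]
categorical = [0, 2]  # array of categorical feature indices
cat_columns = [[row[i] for i in categorical] for row in X_train]
sizes = resolve_n_values(cat_columns)  # 'auto' -> [2, 3]

train_encoded = encode(X_train, categorical, sizes)
# Column 0 holds the value 3, which was not seen during training.
unseen_encoded = encode([[3, 1.0, 1]], categorical, sizes, handle_unknown="ignore")
```

In this sketch the float column (index 1) is not categorical, so it ends up as the last output column, and the unseen category produces an all-zero group instead of raising an error.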
Some of the docstrings for this module have been automatically extracted from the scikit-learn library and are covered by their respective licenses.