.. _`Logistic Regression`:
.. _`org.sysess.sympathy.machinelearning.logisticregression`:
Logistic Regression
===================
.. image:: logistic_regression.svg
:width: 48
Logistic regression of a categorical dependent variable
**Documentation**
Logistic regression of a categorical dependent variable
*Configuration*:
- *penalty*
Used to specify the norm used in the penalization. The 'newton-cg',
'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is
only supported by the 'saga' solver. If 'none' (not supported by the
liblinear solver), no regularization is applied.
.. versionadded:: 0.19
l1 penalty with SAGA solver (allowing 'multinomial' + L1)
- *dual*
Dual or primal formulation. Dual formulation is only implemented for
l2 penalty with liblinear solver. Prefer dual=False when
n_samples > n_features.
- *C*
Inverse of regularization strength; must be a positive float.
Like in support vector machines, smaller values specify stronger
regularization.
- *fit_intercept*
Specifies if a constant (a.k.a. bias or intercept) should be
added to the decision function.
- *intercept_scaling*
Useful only when the solver 'liblinear' is used
and self.fit_intercept is set to True. In this case, x becomes
[x, self.intercept_scaling],
i.e. a "synthetic" feature with constant value equal to
intercept_scaling is appended to the instance vector.
The intercept becomes ``intercept_scaling * synthetic_feature_weight``.
Note that the synthetic feature weight is subject to l1/l2 regularization
like all other features.
To lessen the effect of regularization on the synthetic feature weight
(and therefore on the intercept), intercept_scaling has to be increased.
- *class_weight*
Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are assumed to have weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.
.. versionadded:: 0.17
*class_weight='balanced'*
- *tol*
Tolerance for stopping criteria.
- *multi_class*
If the option chosen is 'ovr', then a binary problem is fit for each
label. For 'multinomial' the loss minimised is the multinomial loss fit
across the entire probability distribution, *even when the data is
binary*. 'multinomial' is unavailable when solver='liblinear'.
'auto' selects 'ovr' if the data is binary, or if solver='liblinear',
and otherwise selects 'multinomial'.
.. versionadded:: 0.18
Stochastic Average Gradient descent solver for 'multinomial' case.
.. versionchanged:: 0.22
Default changed from 'ovr' to 'auto' in 0.22.
- *max_iter*
Maximum number of iterations taken for the solvers to converge.
- *solver*
Algorithm to use in the optimization problem.
- For small datasets, 'liblinear' is a good choice, whereas 'sag' and
'saga' are faster for large ones.
- For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs'
handle multinomial loss; 'liblinear' is limited to one-versus-rest
schemes.
- 'newton-cg', 'lbfgs', 'sag' and 'saga' handle L2 or no penalty
- 'liblinear' and 'saga' also handle L1 penalty
- 'saga' also supports 'elasticnet' penalty
- 'liblinear' does not support setting ``penalty='none'``
Note that fast convergence with 'sag' and 'saga' is only guaranteed on
features with approximately the same scale. You can preprocess the data
with a scaler from sklearn.preprocessing (see the sketch after this list).
.. versionadded:: 0.17
Stochastic Average Gradient descent solver.
.. versionadded:: 0.19
SAGA solver.
.. versionchanged:: 0.22
The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
- *n_jobs*
Number of CPU cores used when parallelizing over classes if
multi_class='ovr'. Ignored when the solver is set to 'liblinear'
regardless of multi_class. If -1 is given, all cores are used.
- *random_state*
Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the
data. See the scikit-learn glossary entry for *random_state* for details.
- *warm_start*
When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
Has no effect for the liblinear solver. See the scikit-learn glossary
entry for *warm_start* for details.
.. versionadded:: 0.17
*warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.
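The configuration options above mirror scikit-learn's ``LogisticRegression``
estimator. The following is a minimal sketch in plain scikit-learn, with
made-up toy data, of how these options map onto the estimator and how the
'balanced' class weights are computed. It is an illustration under those
assumptions, not the node's internal implementation.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    # Toy binary data (hypothetical values, for illustration only).
    X = np.array([[0.5, 1.2], [1.5, 0.3], [3.1, 2.2],
                  [2.8, 0.9], [0.2, 0.4], [1.1, 1.8]])
    y = np.array([0, 0, 1, 1, 0, 1])

    # 'sag'/'saga' converge quickly only when features share roughly the
    # same scale, so standardize first (cf. the solver notes above).
    X = StandardScaler().fit_transform(X)

    clf = LogisticRegression(
        penalty='l2',             # norm used in the penalization
        C=1.0,                    # inverse of regularization strength
        solver='lbfgs',           # default solver since 0.22
        multi_class='auto',       # 'ovr' here, since the data is binary
        class_weight='balanced',  # reweight inversely to class frequency
        max_iter=100,
    )
    clf.fit(X, y)

    # The 'balanced' weights follow n_samples / (n_classes * np.bincount(y)).
    weights = len(y) / (len(np.unique(y)) * np.bincount(y))
    print(weights)  # [1. 1.] here, since both classes occur three times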
*Attributes*:
- *n_iter_*
Actual number of iterations for all classes. If binary or multinomial,
it returns only 1 element. For the liblinear solver, only the maximum
number of iterations across all classes is given.
.. versionchanged:: 0.20
In SciPy <= 1.0.0 the number of lbfgs iterations may exceed
``max_iter``. ``n_iter_`` will now report at most ``max_iter``.
- *coef_*
Coefficient of the features in the decision function.
`coef_` is of shape (1, n_features) when the given problem is binary
(see the sketch after this list).
In particular, when `multi_class='multinomial'`, `coef_` corresponds
to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False).
- *intercept_*
Intercept (a.k.a. bias) added to the decision function.
If `fit_intercept` is set to False, the intercept is set to zero.
`intercept_` is of shape (1,) when the given problem is binary.
In particular, when `multi_class='multinomial'`, `intercept_`
corresponds to outcome 1 (True) and `-intercept_` corresponds to
outcome 0 (False).
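A minimal sketch, continuing from the fitted binary classifier ``clf`` and
standardized ``X`` in the previous example, of how these attributes relate
to the decision function; the shapes follow the descriptions above.

.. code-block:: python

    import numpy as np

    # Shapes for a binary problem, as described above.
    print(clf.coef_.shape)       # (1, n_features)
    print(clf.intercept_.shape)  # (1,)
    print(clf.n_iter_)           # a single element for a binary problem

    # The decision function is intercept_ + X @ coef_.T; for the binary
    # 'ovr' case the predicted probability of the positive class is its
    # logistic sigmoid.
    scores = clf.intercept_ + X @ clf.coef_.T
    probs = 1.0 / (1.0 + np.exp(-scores))
    print(np.allclose(probs.ravel(), clf.predict_proba(X)[:, 1]))  # True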
*Input ports*:
*Output ports*:
**model** : model
Model
**Definition**
*Input ports*
*Output ports*
:model: model
Model
.. automodule:: node_regression
.. class:: LogisticRegression