Kernel Principal Component Analysis (KPCA)


Non-linear dimensionality reduction through the use of kernels

Documentation

Attributes

alphas_

Eigenvectors of the centered kernel matrix.

dual_coef_

Inverse transform matrix. Only available when fit_inverse_transform is True.

lambdas_

Eigenvalues of the centered kernel matrix in decreasing order.

X_fit_

The data used to fit the model. If copy_X=False, then X_fit_ is a reference. This attribute is used for the calls to transform.

X_transformed_fit_

Projection of the fitted data on the kernel principal components. Only available when fit_inverse_transform is True.
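The attribute names above match those of scikit-learn's KernelPCA estimator, which this node appears to wrap. A minimal sketch of the fitted attributes under that assumption (alphas_ and lambdas_ correspond to eigenvectors_ and eigenvalues_ in newer scikit-learn releases):

    import numpy as np
    from sklearn.decomposition import KernelPCA

    X = np.random.RandomState(0).rand(100, 5)          # 100 samples, 5 features

    kpca = KernelPCA(n_components=3, kernel="rbf", fit_inverse_transform=True)
    X_kpca = kpca.fit_transform(X)

    print(X_kpca.shape)                   # (100, 3): projected samples
    print(kpca.X_fit_.shape)              # (100, 5): data used to fit the model
    print(kpca.X_transformed_fit_.shape)  # (100, 3): projection of the fitted data
    print(kpca.dual_coef_.shape)          # (100, 5): inverse transform matrix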

Definition

Output ports

model : model

Model

Configuration

Ridge regression hyperparameter (alpha)

Hyperparameter of the ridge regression that learns the inverse transform (when fit_inverse_transform=True).

Independent term (poly, sigmoid) (coef0)

Independent term in poly and sigmoid kernels. Ignored by other kernels.

Poly kernel degree (degree)

Degree for poly kernels. Ignored by other kernels.
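As an illustration only, a sketch assuming the underlying scikit-learn KernelPCA, where the polynomial kernel is (gamma * <x, y> + coef0) ** degree:

    import numpy as np
    from sklearn.decomposition import KernelPCA

    X = np.random.RandomState(0).rand(80, 3)

    # degree controls the polynomial order; coef0 shifts the kernel before
    # exponentiation. Both are ignored by, e.g., the rbf kernel.
    kpca = KernelPCA(n_components=2, kernel="poly", degree=3, coef0=1.0, gamma=0.5)
    X_poly = kpca.fit_transform(X)        # shape (80, 2)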

Eigensolver (eigen_solver)

Select eigensolver to use. If n_components is much less than the number of training samples, randomized (or arpack to a smaller extent) may be more efficient than the dense eigensolver. Randomized SVD is performed according to the method of Halko et al.

auto :

the solver is selected by a default policy based on n_samples (the number of training samples) and n_components: if the number of components to extract is strictly less than 10 and the number of samples is strictly greater than 200, the ‘arpack’ method is enabled. Otherwise the exact full eigenvalue decomposition is computed and optionally truncated afterwards (the ‘dense’ method).

dense :

run the exact full eigenvalue decomposition by calling the standard LAPACK solver via scipy.linalg.eigh, and select the components by postprocessing.

arpack :

run an SVD truncated to n_components by calling the ARPACK solver via scipy.sparse.linalg.eigsh. It requires strictly 0 < n_components < n_samples.

randomized :

run randomized SVD by the method of Halko et al. The current implementation selects eigenvalues based on their modulus; therefore using this method can lead to unexpected results if the kernel is not positive semi-definite.

Changed in version 1.0: ‘randomized’ was added.
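A sketch of the trade-off, assuming the underlying scikit-learn KernelPCA (the ‘randomized’ solver needs scikit-learn 1.0 or later):

    from sklearn.datasets import make_circles
    from sklearn.decomposition import KernelPCA

    # Many samples but only two requested components: 'randomized' (or 'arpack')
    # is typically faster here than the exact 'dense' solver.
    X, _ = make_circles(n_samples=1000, factor=0.3, noise=0.05, random_state=0)

    kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10,
                     eigen_solver="randomized", random_state=0)
    X_reduced = kpca.fit_transform(X)     # shape (1000, 2), same as with 'dense'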

Fit inverse-transform (fit_inverse_transform)

Learn the inverse transform for non-precomputed kernels, i.e. learn to find the pre-image of a point by ridge regression (see the ridge regression hyperparameter alpha above).
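A sketch of pre-image reconstruction, assuming the underlying scikit-learn KernelPCA; the ridge regression that learns the inverse map is regularised by the alpha parameter described above:

    import numpy as np
    from sklearn.decomposition import KernelPCA

    X = np.random.RandomState(0).rand(200, 4)

    kpca = KernelPCA(n_components=2, kernel="rbf", gamma=2.0,
                     fit_inverse_transform=True, alpha=0.1)
    X_low = kpca.fit_transform(X)             # project to 2 components
    X_back = kpca.inverse_transform(X_low)    # approximate pre-images, shape (200, 4)

    reconstruction_error = np.mean((X - X_back) ** 2)
    print(reconstruction_error)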

Kernel coefficient (poly, rbf, sigmoid) (gamma)

Kernel coefficient for rbf, poly and sigmoid kernels. Ignored by other kernels. If gamma is None, then it is set to 1/n_features.

Kernel (kernel)

Kernel used for PCA.
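A sketch comparing kernel choices, assuming the underlying scikit-learn KernelPCA, which accepts ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘cosine’ and ‘precomputed’; gamma only affects ‘rbf’, ‘poly’ and ‘sigmoid’:

    from sklearn.datasets import make_moons
    from sklearn.decomposition import KernelPCA

    X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

    for kernel in ("linear", "poly", "rbf", "sigmoid", "cosine"):
        kpca = KernelPCA(n_components=2, kernel=kernel, gamma=15)
        X_k = kpca.fit_transform(X)       # (200, 2) for every kernel
        print(kernel, X_k.shape)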

Max iterations (max_iter)

Maximum number of iterations for arpack. If None, optimal value will be chosen by arpack.

Number of components (n_components)

Number of components. If None, all non-zero components are kept.

Number of jobs (n_jobs)

The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

New in version 0.18.

Random seed (random_state)

Used when eigen_solver == ‘arpack’ or ‘randomized’. Pass an int for reproducible results across multiple function calls.

New in version 0.18.

Remove components with zero eigenvalue (remove_zero_eig)

If True, then all components with zero eigenvalues are removed, so that the number of components in the output may be < n_components (and sometimes even zero due to numerical instability). When n_components is None, this parameter is ignored and components with zero eigenvalues are removed regardless.
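A sketch of how the output width can differ from n_components, assuming the underlying scikit-learn KernelPCA:

    import numpy as np
    from sklearn.decomposition import KernelPCA

    X = np.random.RandomState(0).rand(50, 10)

    # n_components=None: all non-zero components are kept, so the output width
    # depends on the data (at most n_samples - 1 after kernel centering).
    print(KernelPCA(n_components=None, kernel="rbf").fit_transform(X).shape)

    # remove_zero_eig=True: zero-eigenvalue components are pruned, so the second
    # dimension may end up smaller than the requested 40.
    print(KernelPCA(n_components=40, kernel="rbf",
                    remove_zero_eig=True).fit_transform(X).shape)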

Tolerance (tol)

Convergence tolerance for arpack. If 0, optimal value will be chosen by arpack.

Implementation

class node_decomposition.KernelPCA