chemometrics.fit_pca

class chemometrics.fit_pca(X, pipeline=None, cv_object=None, max_lv=10)

Auto-calibrate PCA model and generate analytical plots

A PCA model is calibrated by maximizing the coefficient of determination obtained during cross-validation (Q^2). The function provides multiple plots for assessing the model quality. The first figure addresses the model performance during cross-validation and the estimation of the optimal number of latent variables by showing R^2 and Q^2 values (R^2 as bars, Q^2 as boxplots based on the individual rotations). The second figure shows four subplots with analytical information for the optimal model: a) observed versus predicted, b) predicted versus residuals, c) leverage versus residuals, d) variable importance in projection (VIP) scores.
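The selection rule described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the library's implementation: the helper names `q2_score` and `select_n_lv` are hypothetical, the Q^2 values are made up, and it assumes that Q^2 is computed as 1 - PRESS/TSS and that the mean over the rotations decides the optimum.

```python
from statistics import mean


def q2_score(observed, predicted):
    """Coefficient of determination on held-out values: 1 - PRESS / TSS."""
    center = mean(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - center) ** 2 for o in observed)
    return 1.0 - press / tss


def select_n_lv(q2_per_lv):
    """Return the number of latent variables with the highest mean Q^2.

    q2_per_lv maps a number of latent variables to the list of Q^2 values
    obtained in the individual cross-validation rotations. Ties are resolved
    in favour of the smaller number of latent variables.
    """
    return max(sorted(q2_per_lv), key=lambda n_lv: mean(q2_per_lv[n_lv]))


# Made-up Q^2 values for three rotations, screened up to max_lv = 4
q2_per_lv = {
    1: [0.52, 0.48, 0.55],
    2: [0.71, 0.69, 0.74],
    3: [0.78, 0.80, 0.76],
    4: [0.75, 0.70, 0.77],
}
best_n_lv = select_n_lv(q2_per_lv)
print(best_n_lv)
```

In this made-up example Q^2 rises up to three latent variables and drops afterwards, which is the pattern the boxplots in the first figure are meant to reveal.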

Parameters
  • X ((n, m) ndarray) – Matrix of predictors. n samples x m predictors

  • pipeline ({None, sklearn.pipeline.Pipeline}) – A pipeline object providing a workflow of preprocessing steps and a PCA model. The last entry must be a chemometrics.PCA instance.

  • cv_object ({None, cv_object}) – An object defining how the data are split during cross-validation. Typically, it will be an instance of a class derived from sklearn.model_selection.BaseCrossValidator.

  • max_lv (int) – Number of latent variables up to which the cross-validation score will be screened.
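To make the role of cv_object more concrete, the sketch below shows the kind of train/test index rotations a cross-validator generates. It is a dependency-free stand-in written for illustration only: `k_fold_indices` is a hypothetical helper, not part of sklearn or chemometrics, and it assumes contiguous, unshuffled folds. In practice one would pass a ready-made splitter such as sklearn.model_selection.KFold.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # Distribute the remainder over the first folds so sizes differ by at most 1
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# Each rotation holds out one block of samples and trains on the rest
folds = list(k_fold_indices(n_samples=7, n_splits=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each rotation would contribute one Q^2 value per number of latent variables, so the number of splits determines how many points enter each boxplot.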

Returns

  • pipeline (Pipeline) – The calibrated model pipeline

  • summary (dict) – Summary of the model calibration.
