chemometrics.PCA

class chemometrics.PCA(n_components=2, *, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)

Bases: PCA, LVmixin

Principal component analysis with added chemometric functionality

Linear factorization of the data matrix X into scores and loadings (=components), similar to a truncated singular value decomposition. In addition to its transformer capabilities, PCA provides several metrics on the fitted latent variable model.
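The factorization can be sketched with NumPy under stated assumptions: mean centering, a plain SVD in place of the solver options of the class, and a hypothetical helper pca_scores_loadings. This is not the chemometrics implementation; it only shows how scores and loadings relate. The loadings come out with shape (n_features, n_components), the shape documented for x_loadings_.

```python
import numpy as np

def pca_scores_loadings(X, n_components=2):
    """Hypothetical helper: factorize mean-centered X into scores T and loadings P, Xc ≈ T @ P.T."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # assumption: data are mean-centered first
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                         # loadings, shape (n_features, n_components)
    T = U[:, :n_components] * S[:n_components]      # scores, shape (n_samples, n_components)
    return T, P, mean

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5)) @ rng.normal(size=(5, 8))  # 20 samples, 8 features, rank 5
T, P, mean = pca_scores_loadings(X, n_components=2)
X_hat = mean + T @ P.T                              # rank-2 reconstruction of X
print(T.shape, P.shape)                             # → (20, 2) (8, 2)
```

The loadings are orthonormal and the scores equal the centered data projected onto them, which is why the transformer and the factorization view coincide.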

__init__(n_components=2, *, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)

Methods

__init__([n_components, copy, whiten, ...])

crit_dhypx([confidence])

Calculate critical dhypx according to Hotelling's T2

crit_dmodx([confidence])

Critical distance to hyperplane based on an F2 test

dhypx(X)

Normalized distance on hyperplane

distance_plot(X[, sample_id, confidence])

Plot distances collinear with and orthogonal to the model predictor hyperplane

dmodx(X[, normalize, absolute])

Calculate distance to model hyperplane in X (DModX)

fit(X[, y])

Fit the model with X.

fit_transform(X[, y])

Fit the model with X and apply the dimensionality reduction on X.

get_covariance()

Compute data covariance with the generative model.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_params([deep])

Get parameters for this estimator.

get_precision()

Compute data precision matrix with the generative model.

inverse_transform(X)

Transform data back to its original space.

score(X[, y, scoring])

score_samples(X)

Return the log-likelihood of each sample.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Apply dimensionality reduction to X.

Attributes

x_loadings_

The loadings of X with shape (n_features, n_components).

crit_dhypx(confidence=0.95)

Calculate critical dhypx according to Hotelling’s T2
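For orientation, one common textbook form of the Hotelling's T2 control limit for a model with A components fitted on N samples is A(N - 1)/(N - A) · F(confidence; A, N - A). The sketch below computes it with SciPy; hotelling_t2_limit is a hypothetical helper, and crit_dhypx may scale or normalize the limit differently, since it returns a critical value for the normalized dhypx.

```python
from scipy import stats

def hotelling_t2_limit(n_samples, n_components, confidence=0.95):
    """Hypothetical helper: textbook Hotelling's T2 limit for A components and N samples."""
    a, n = n_components, n_samples
    f_crit = stats.f.ppf(confidence, a, n - a)      # F quantile with (A, N - A) degrees of freedom
    return a * (n - 1) / (n - a) * f_crit           # rescale the F quantile to the T2 scale

# Illustrative numbers: a 2-component model calibrated on 30 samples
print(hotelling_t2_limit(n_samples=30, n_components=2))
```

For large N the limit approaches the chi-squared quantile with A degrees of freedom, and it grows with the requested confidence.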

crit_dmodx(confidence=0.95)

Critical distance to hyperplane based on an F2 test

The critical distance to the model hyperplane is estimated based on an F2 distribution. Values above crit_dmodx may be considered outliers. Because dmodx is only approximately F2 distributed [Eriksson], the estimated critical distance is biased. It nevertheless gives a reasonable indication of which points are worth investigating.
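A minimal sketch of an F-based critical distance, assuming the squared normalized DModX is compared with an F distribution so that the critical distance is the square root of the F quantile. The degrees of freedom follow a SIMCA-style convention (K - A for a new sample, (N - A - 1)(K - A) for the calibration set) and are assumptions, as is the helper name crit_dmodx_sketch; the chemometrics implementation may choose them differently.

```python
import numpy as np
from scipy import stats

def crit_dmodx_sketch(confidence, dof_new, dof_cal):
    """Hypothetical helper: sqrt of the F(dof_new, dof_cal) quantile at the given confidence."""
    return np.sqrt(stats.f.ppf(confidence, dof_new, dof_cal))

# Hypothetical model: K = 50 features, A = 3 components, N = 40 calibration samples.
K, A, N = 50, 3, 40
dof_new = K - A                       # assumed residual degrees of freedom of one new sample
dof_cal = (N - A - 1) * (K - A)       # assumed pooled residual degrees of freedom at calibration
c95 = crit_dmodx_sketch(0.95, dof_new, dof_cal)
c99 = crit_dmodx_sketch(0.99, dof_new, dof_cal)
print(c95, c99)
```

Normalized DModX values above c95 would be flagged at the 95 % level under these assumptions.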

dhypx(X)

Normalized distance on hyperplane

Provides a distance on the hyperplane, normalized by the distance observed during calibration. It is a useful measure of whether new data is comparable to the calibration data. The normalized dhypx is slightly biased towards larger values because x_residual_std_ is slightly underestimated during model calibration [Eriksson].
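A hedged sketch of a normalized in-plane distance, assuming a Hotelling's T2-type form: each score is divided by its calibration score variance and the squared terms are summed over components. The exact normalization used by dhypx is not spelled out here, so normalized_t2 is a hypothetical helper, not the library function. For the calibration scores themselves, the mean of this distance equals A(N - 1)/N, which gives a simple reference point.

```python
import numpy as np

def normalized_t2(T_new, T_cal):
    """Hypothetical helper: sum of squared scores, each scaled by its calibration score variance."""
    var_cal = T_cal.var(axis=0, ddof=1)             # per-component score variance at calibration
    return ((T_new ** 2) / var_cal).sum(axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(25, 6))
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T                                        # loadings of a 2-component model
T_cal = Xc @ P                                      # calibration scores
d_cal = normalized_t2(T_cal, T_cal)
print(d_cal.mean())  # mathematically A * (N - 1) / N = 2 * 24 / 25 = 1.92, up to rounding
```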

distance_plot(X, sample_id=None, confidence=0.95)

Plot distances collinear with and orthogonal to the model predictor hyperplane

Generates a figure with two subplots that show how X behaves compared to the calibration data:

  • Subplot 1 – distance in the model hyperplane of predictors. Provides insight into the magnitude of variation within the hyperplane compared to the calibration data. Large values indicate samples which are outside of the calibration space but may be described by linearly scaled latent variables.

  • Subplot 2 – distance orthogonal to the model hyperplane. Provides insight into the magnitude of variation orthogonal to the model hyperplane compared to the calibration data. Large values indicate samples which show a significant trend not observed in the calibration data.
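The two-panel layout can be sketched with Matplotlib. Unlike distance_plot, which takes X and computes the distances itself, this hypothetical distance_plot_sketch takes precomputed in-plane and orthogonal distances plus their critical values; the dashed critical lines, axis labels, and random demo data are illustrative choices, not the library's output.

```python
import matplotlib
matplotlib.use("Agg")                               # non-interactive backend for scripts
import matplotlib.pyplot as plt
import numpy as np

def distance_plot_sketch(d_hyp, d_mod, crit_hyp, crit_mod, sample_id=None):
    """Hypothetical helper: top panel in-plane distance, bottom panel orthogonal distance."""
    sample_id = np.arange(len(d_hyp)) if sample_id is None else np.asarray(sample_id)
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
    ax1.plot(sample_id, d_hyp, "o")
    ax1.axhline(crit_hyp, color="red", linestyle="--")  # critical in-plane distance
    ax1.set_ylabel("distance in hyperplane")
    ax2.plot(sample_id, d_mod, "o")
    ax2.axhline(crit_mod, color="red", linestyle="--")  # critical orthogonal distance
    ax2.set_ylabel("distance to hyperplane")
    ax2.set_xlabel("sample")
    return fig

# Random stand-in distances and critical values, for illustration only
rng = np.random.default_rng(2)
fig = distance_plot_sketch(rng.chisquare(2, 30), rng.chisquare(5, 30) / 5, 5.99, 1.5)
```

Points above the dashed line in the top panel are unusual within the model plane; points above it in the bottom panel carry structure the model does not describe.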

dmodx(X, normalize=True, absolute=False)

Calculate distance to model hyperplane in X (DModX)

DModX provides the distance to the model hyperplane spanned by the loading vectors. Any information in the predictors that is not captured by the model contributes to DModX. If DModX is normalized, it is divided by the mean residual variance of X observed during model calibration.

Parameters
  • X ((n, m) ndarray) – matrix of predictors. n samples x m predictors

  • normalize ({True (default); False}) – normalize DModX by the residual error in X observed during calibration

  • absolute ({True; False (default)}) – return the absolute distance to the model plane (not normalized by degrees of freedom)

Returns

dmodx – distance of n samples to model hyperplane

Return type

(n, ) ndarray
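A minimal NumPy sketch of DModX, assuming the SIMCA-style definition: the residual of each sample after projection onto the loadings, converted to a standard deviation with K - A degrees of freedom and optionally divided by the pooled calibration residual standard deviation s0. dmodx_sketch and the s0 degrees of freedom are assumptions; the normalize flag loosely mirrors the documented parameter, and the absolute option is not reproduced. A point built from the mean plus a combination of loadings has a DModX close to zero, while a point displaced orthogonally to the plane gets a large value.

```python
import numpy as np

def dmodx_sketch(X_new, mean, P, s0, normalize=True):
    """Hypothetical helper: per-sample residual std orthogonal to the loading plane."""
    Xc = X_new - mean
    E = Xc - Xc @ P @ P.T                           # residual after projection onto the loadings
    K, A = P.shape
    d = np.sqrt((E ** 2).sum(axis=1) / (K - A))     # residual std with K - A degrees of freedom
    return d / s0 if normalize else d

rng = np.random.default_rng(3)
N, K, A = 40, 10, 2
X = rng.normal(size=(N, A)) @ rng.normal(size=(A, K)) + 0.01 * rng.normal(size=(N, K))
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
P = Vt[:A].T                                        # loadings of the calibration model
E_cal = (X - mean) - (X - mean) @ P @ P.T
s0 = np.sqrt((E_cal ** 2).sum() / ((N - A - 1) * (K - A)))  # assumed pooled calibration residual std

in_plane = mean + P @ np.array([1.0, -2.0])         # lies exactly in the model plane
direction = rng.normal(size=K)
direction -= P @ (P.T @ direction)                  # remove the in-plane component
off_plane = mean + direction / np.linalg.norm(direction)
d = dmodx_sketch(np.vstack([in_plane, off_plane]), mean, P, s0)
print(d)
```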

fit(X, y=None)

Fit the model with X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Ignored.

Returns

self – Returns the instance itself.

Return type

object

fit_transform(X, y=None)

Fit the model with X and apply the dimensionality reduction on X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.

  • y (Ignored) – Ignored.

Returns

X_new – Transformed values.

Return type

ndarray of shape (n_samples, n_components)

Notes

This method returns a Fortran-ordered array. To convert it to a C-ordered array, use ‘np.ascontiguousarray’.
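A short usage sketch of fit_transform, using scikit-learn's sklearn.decomposition.PCA as a stand-in for the base class (an assumption based on the inherited method list). It also shows the np.ascontiguousarray conversion mentioned in the note; with a deterministic solver, the result matches fit followed by transform.

```python
import numpy as np
from sklearn.decomposition import PCA  # assumption: stands in for the base class of chemometrics.PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 6))

pca = PCA(n_components=3, svd_solver="full")
X_new = pca.fit_transform(X)                        # fit and project in one call
X_c = np.ascontiguousarray(X_new)                   # C-ordered copy if downstream code needs it
print(X_new.shape, X_c.flags["C_CONTIGUOUS"])       # → (30, 3) True
```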

get_covariance()

Compute data covariance with the generative model.

cov = components_.T * S**2 * components_ + sigma2 * eye(n_features) where S**2 contains the explained variances, and sigma2 contains the noise variances.

Returns

cov – Estimated covariance of data.

Return type

array of shape (n_features, n_features)
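The covariance formula can be checked against the sample covariance, again assuming scikit-learn's PCA as the base class. When all components are kept, sigma2 (noise_variance_) is zero and get_covariance() reduces to the sample covariance of X; with fewer components, sigma2 is the mean of the discarded explained variances, so the trace of the model covariance still equals the total variance.

```python
import numpy as np
from sklearn.decomposition import PCA  # assumption: stands in for the base class of chemometrics.PCA

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))
sample_cov = np.cov(X, rowvar=False)                # sample covariance with ddof=1

# All components kept: sigma2 is zero and the model covariance equals the sample covariance.
pca_full = PCA(n_components=4, svd_solver="full").fit(X)
print(np.allclose(pca_full.get_covariance(), sample_cov))

# Fewer components: discarded variance is spread evenly over the diagonal via sigma2.
pca_2 = PCA(n_components=2, svd_solver="full").fit(X)
print(pca_2.noise_variance_ > 0)
```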

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters

input_features (array-like of str or None, default=None) – Only used to validate feature names with the names seen in fit().

Returns

feature_names_out – Transformed feature names.

Return type

ndarray of str objects

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

get_precision()

Compute data precision matrix with the generative model.

Equals the inverse of the covariance but computed with the matrix inversion lemma for efficiency.

Returns

precision – Estimated precision of data.

Return type

array of shape (n_features, n_features)
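A quick consistency check, assuming scikit-learn's PCA as the base class: the precision matrix from get_precision() should match a direct inverse of get_covariance(), even though it is computed via the matrix inversion lemma.

```python
import numpy as np
from sklearn.decomposition import PCA  # assumption: stands in for the base class of chemometrics.PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 5))
pca = PCA(n_components=2).fit(X)

precision = pca.get_precision()
# Compare with an explicit inverse of the model covariance.
print(np.allclose(precision, np.linalg.inv(pca.get_covariance())))
```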

inverse_transform(X)

Transform data back to its original space.

In other words, return an input X_original whose transform would be X.

Parameters

X (array-like of shape (n_samples, n_components)) – New data, where n_samples is the number of samples and n_components is the number of components.

Returns

X_original – Original data, where n_samples is the number of samples and n_features is the number of features.

Return type

array-like of shape (n_samples, n_features)

Notes

If whitening is enabled, inverse_transform will compute the exact inverse operation, which includes reversing whitening.
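A round-trip sketch with scikit-learn's PCA (assumed base class). Keeping every component, inverse_transform undoes transform exactly, including whitening; with fewer components, the round trip returns the projection of X onto the model plane, so applying it a second time changes nothing.

```python
import numpy as np
from sklearn.decomposition import PCA  # assumption: stands in for the base class of chemometrics.PCA

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 5))

# All components kept, whitening on: the round trip recovers X.
pca = PCA(n_components=5, whiten=True).fit(X)
X_back = pca.inverse_transform(pca.transform(X))
print(np.allclose(X_back, X))

# Fewer components: the round trip projects X onto the 2-dimensional model plane.
pca2 = PCA(n_components=2).fit(X)
X_proj = pca2.inverse_transform(pca2.transform(X))
```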

score(X, y=None, scoring='r2')

score_samples(X)

Return the log-likelihood of each sample.

See “Pattern Recognition and Machine Learning” by C. Bishop, section 12.2.1, p. 574, or http://www.miketipping.com/papers/met-mppca.pdf

Parameters

X (array-like of shape (n_samples, n_features)) – The data.

Returns

ll – Log-likelihood of each sample under the current model.

Return type

ndarray of shape (n_samples,)
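score_samples treats PCA as a probabilistic model with a Gaussian density whose mean is mean_ and whose covariance is get_covariance(). The sketch below, assuming scikit-learn's PCA as the base class, compares it with SciPy's multivariate normal log-density. score has a different signature in chemometrics.PCA (scoring='r2'), so it is deliberately not used here.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.decomposition import PCA  # assumption: stands in for the base class of chemometrics.PCA

rng = np.random.default_rng(8)
X = rng.normal(size=(60, 4))
pca = PCA(n_components=2).fit(X)

ll = pca.score_samples(X)                           # log-likelihood per sample
# Reference: Gaussian log-density with the model mean and covariance.
ref = multivariate_normal(mean=pca.mean_, cov=pca.get_covariance()).logpdf(X)
print(np.allclose(ll, ref))
```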

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
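The <component>__<parameter> syntax can be illustrated with a scikit-learn Pipeline. scikit-learn's PCA stands in for chemometrics.PCA here (an assumption); the step names "scale" and "pca" are arbitrary.

```python
from sklearn.decomposition import PCA  # assumption: stands in for chemometrics.PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])

# Nested parameters are addressed as <step name>__<parameter>.
pipe.set_params(pca__n_components=3, scale__with_std=False)
params = pipe.get_params(deep=True)
print(params["pca__n_components"], params["scale__with_std"])  # → 3 False
```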

transform(X)

Apply dimensionality reduction to X.

X is projected on the first principal components previously extracted from a training set.

Parameters

X (array-like of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.

Returns

X_new – Projection of X onto the first principal components, where n_samples is the number of samples and n_components is the number of the components.

Return type

array-like of shape (n_samples, n_components)
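A sketch of what transform computes without whitening, assuming scikit-learn's PCA as the base class: new data are centered with the training mean_ and projected onto the rows of components_.

```python
import numpy as np
from sklearn.decomposition import PCA  # assumption: stands in for the base class of chemometrics.PCA

rng = np.random.default_rng(10)
X_train = rng.normal(size=(40, 6))
X_test = rng.normal(size=(5, 6))

pca = PCA(n_components=2).fit(X_train)
# Center with the training mean, then project onto the principal axes.
manual = (X_test - pca.mean_) @ pca.components_.T
print(np.allclose(pca.transform(X_test), manual))
```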

property x_loadings_

The loadings of X with shape (n_features, n_components).