chemometrics.PCA¶
- class chemometrics.PCA(n_components=2, *, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)¶
Bases: PCA, LVmixin
Principal component analysis with added chemometric functionality
Linear factorization of the data matrix X into scores and loadings (= components), similar to a truncated singular value decomposition. In addition to its transformer capabilities, PCA provides several metrics on the fitted latent variable model.
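The factorization into scores and loadings can be illustrated with plain NumPy. This is a minimal sketch and not the library's implementation: it assumes mean-centering followed by a truncated SVD, and the names `scores`, `loadings`, and `residual` are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # 20 samples, 5 features (synthetic data)
n_components = 2

# Center the data, then decompose: X_centered = U @ diag(s) @ Vt
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep only the leading components
scores = U[:, :n_components] * s[:n_components]   # shape (n_samples, n_components)
loadings = Vt[:n_components].T                    # shape (n_features, n_components)

# Rank-2 approximation of the centered data and the part it does not capture
X_hat = scores @ loadings.T
residual = X_centered - X_hat
```

In this sketch the scores are the projection of the centered data onto the loadings, and the residual holds the variation orthogonal to the model hyperplane.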
- __init__(n_components=2, *, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)¶
Methods
__init__([n_components, copy, whiten, ...])
crit_dhypx([confidence]) – Calculate critical dhypx according to Hotelling's T2
crit_dmodx([confidence]) – Critical distance to hyperplane based on an F2 test
dhypx(X) – Normalized distance on hyperplane
distance_plot(X[, sample_id, confidence]) – Plot distances collinear and orthogonal to model predictor hyperplane
dmodx(X[, normalize, absolute]) – Calculate distance to model hyperplane in X (DModX)
fit(X[, y]) – Fit the model with X.
fit_transform(X[, y]) – Fit the model with X and apply the dimensionality reduction on X.
get_covariance() – Compute data covariance with the generative model.
get_feature_names_out([input_features]) – Get output feature names for transformation.
get_params([deep]) – Get parameters for this estimator.
get_precision() – Compute data precision matrix with the generative model.
inverse_transform(X) – Transform data back to its original space.
score(X[, y, scoring])
score_samples(X) – Return the log-likelihood of each sample.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Apply dimensionality reduction to X.
Attributes
x_loadings_ – The loadings of X with shape (n_features, n_components).
- crit_dhypx(confidence=0.95)¶
Calculate critical dhypx according to Hotelling’s T2
- crit_dmodx(confidence=0.95)¶
Critical distance to hyperplane based on an F2 test
The critical distance to the model hyperplane is estimated based on an F2 distribution. Values above crit_dmodx may be considered outliers. Because dmodx is only approximately F2 distributed [Eriksson], the estimated critical distance is biased. It nevertheless gives a reasonable indication of points worth investigating.
- dhypx(X)¶
Normalized distance on hyperplane
Provides a distance on the hyperplane, normalized by the distance observed during calibration. It can be a useful measure to see whether new data is comparable to the calibration data. The normalized dhypx is slightly biased towards larger values since the estimated x_residual_std_ is slightly underestimated during model calibration [Eriksson].
- distance_plot(X, sample_id=None, confidence=0.95)¶
Plot distances collinear and orthogonal to model predictor hyperplane
Generates a figure with two subplots that show how X behaves compared to the calibration data.
1) Distance in model hyperplane of predictors. Provides insight into the magnitude of variation within the hyperplane compared to the calibration data. Large values indicate samples which are outside of the calibration space but may be described by linearly scaled latent variables.
2) Distance orthogonal to model hyperplane. Provides insight into the magnitude of variation orthogonal to the model hyperplane compared to the calibration data. Large values indicate samples which show a significant trend not observed in the calibration data.
- dmodx(X, normalize=True, absolute=False)¶
Calculate distance to model hyperplane in X (DModX)
DModX provides the distance to the model hyperplane spanned by the loading vectors. Any information in the predictors that is not captured by the model contributes to DModX. If DModX is normalized, it is divided by the mean residual variance of X observed during model calibration.
- Parameters
X ((n, m) ndarray) – matrix of predictors. n samples x m predictors
normalize ({True (default); False}) – normalization of DModX by error in X during calibration
absolute ({True; False (default)}) – return the absolute distance to the model plane (not normalized by degrees of freedom)
- Returns
dmodx – distance of n samples to model hyperplane
- Return type
(n, ) ndarray
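The idea behind a normalized distance to the model hyperplane can be sketched with NumPy. This is a sketch under stated assumptions, not the library's code: it follows a common textbook form of DModX (per-sample residual standard deviation divided by the pooled residual standard deviation of the calibration set), and the helpers `fit_loadings`, `residuals`, and `dmodx_sketch` are hypothetical names introduced here. The exact degrees-of-freedom corrections used by `dmodx` may differ.

```python
import numpy as np

def fit_loadings(X_cal, n_components):
    """Return the calibration mean and loadings from a truncated SVD."""
    mean = X_cal.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def residuals(X, mean, loadings):
    """Part of the centered data that lies outside the model hyperplane."""
    Xc = X - mean
    return Xc - (Xc @ loadings) @ loadings.T

def dmodx_sketch(X, X_cal, n_components):
    """Hypothetical helper: residual distance normalized by the calibration level."""
    n_cal, n_features = X_cal.shape
    mean, loadings = fit_loadings(X_cal, n_components)
    dof = n_features - n_components
    # Per-sample residual standard deviation
    s_i = np.sqrt((residuals(X, mean, loadings) ** 2).sum(axis=1) / dof)
    # Pooled residual standard deviation of the calibration set
    # (assumes one extra degree of freedom is lost to mean-centering)
    E_cal = residuals(X_cal, mean, loadings)
    s_0 = np.sqrt((E_cal ** 2).sum() / ((n_cal - n_components - 1) * dof))
    return s_i / s_0

rng = np.random.default_rng(1)
# Calibration data with two dominant directions plus small noise
latent = rng.normal(size=(30, 2))
mixing = rng.normal(size=(2, 6))
X_cal = latent @ mixing + 0.05 * rng.normal(size=(30, 6))

# One new sample that follows the model and one with off-model variation
x_ok = rng.normal(size=(1, 2)) @ mixing
x_off = x_ok + 2.0 * rng.normal(size=(1, 6))
d = dmodx_sketch(np.vstack([x_ok, x_off]), X_cal, n_components=2)
```

In this sketch the second sample carries variation that the two-component model cannot describe, so its distance is much larger than that of the first sample.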
- fit(X, y=None)¶
Fit the model with X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Ignored.
- Returns
self – Returns the instance itself.
- Return type
object
- fit_transform(X, y=None)¶
Fit the model with X and apply the dimensionality reduction on X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Training data, where n_samples is the number of samples and n_features is the number of features.
y (Ignored) – Ignored.
- Returns
X_new – Transformed values.
- Return type
ndarray of shape (n_samples, n_components)
Notes
This method returns a Fortran-ordered array. To convert it to a C-ordered array, use ‘np.ascontiguousarray’.
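The conversion mentioned in the note can be checked with NumPy alone. The array below is only a stand-in for the output of fit_transform, built with `np.asfortranarray` so that the sketch runs without a fitted model.

```python
import numpy as np

# Stand-in for the Fortran-ordered array returned by fit_transform
X_new = np.asfortranarray(np.arange(10, dtype=float).reshape(5, 2))
print(X_new.flags["F_CONTIGUOUS"], X_new.flags["C_CONTIGUOUS"])

# Convert to C order; the values are unchanged, only the memory layout differs
X_c = np.ascontiguousarray(X_new)
print(X_c.flags["C_CONTIGUOUS"])
```

The conversion copies the data, which matters mainly when downstream code requires row-major memory.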
- get_covariance()¶
Compute data covariance with the generative model.
cov = components_.T * S**2 * components_ + sigma2 * eye(n_features)
where S**2 contains the explained variances, and sigma2 contains the noise variances.
- Returns
cov – Estimated covariance of data.
- Return type
array of shape=(n_features, n_features)
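The covariance formula can be evaluated by hand for a simple case. The sketch below assumes that S**2 stands for a diagonal matrix of explained variances and that all components are retained, so the noise variance sigma2 is zero. It uses plain NumPy rather than the estimator, and how the library combines S**2 and sigma2 for a truncated model is not shown here.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
n_samples, n_features = X.shape

Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                               # all components kept, shape (4, 4)
explained_variance = s**2 / (n_samples - 1)   # variance along each component
sigma2 = 0.0                                  # nothing discarded, so no noise term

cov = (components.T @ np.diag(explained_variance) @ components
       + sigma2 * np.eye(n_features))
```

Under these assumptions the result agrees with the sample covariance from `np.cov(X, rowvar=False)`.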
- get_feature_names_out(input_features=None)¶
Get output feature names for transformation.
- Parameters
input_features (array-like of str or None, default=None) – Only used to validate feature names with the names seen in fit().
- Returns
feature_names_out – Transformed feature names.
- Return type
ndarray of str objects
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- get_precision()¶
Compute data precision matrix with the generative model.
Equals the inverse of the covariance but computed with the matrix inversion lemma for efficiency.
- Returns
precision – Estimated precision of data.
- Return type
array, shape=(n_features, n_features)
- inverse_transform(X)¶
Transform data back to its original space.
In other words, return an input X_original whose transform would be X.
- Parameters
X (array-like of shape (n_samples, n_components)) – New data, where n_samples is the number of samples and n_components is the number of components.
- Returns
X_original – Original data, where n_samples is the number of samples and n_features is the number of features.
- Return type
array-like of shape (n_samples, n_features)
Notes
If whitening is enabled, inverse_transform will compute the exact inverse operation, which includes reversing whitening.
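The relation between the forward and the inverse mapping can be sketched with NumPy. The example assumes no whitening and defines its own `transform` and `inverse_transform` functions for illustration; they mimic the described behavior but are not the estimator's methods.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 4))
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]                  # shape (n_components, n_features)

def transform(X):
    """Project centered data onto the retained components."""
    return (X - mean) @ components.T

def inverse_transform(T):
    """Map scores back to the original feature space."""
    return T @ components + mean

T = transform(X)
X_back = inverse_transform(T)
```

Transforming `X_back` again returns the same scores `T`, while `X_back` itself differs from `X` by the variation in the discarded components.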
- score(X, y=None, scoring='r2')¶
- score_samples(X)¶
Return the log-likelihood of each sample.
See “Pattern Recognition and Machine Learning” by C. Bishop, Section 12.2.1, p. 574, or http://www.miketipping.com/papers/met-mppca.pdf
- Parameters
X (array-like of shape (n_samples, n_features)) – The data.
- Returns
ll – Log-likelihood of each sample under the current model.
- Return type
ndarray of shape (n_samples,)
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- transform(X)¶
Apply dimensionality reduction to X.
X is projected on the first principal components previously extracted from a training set.
- Parameters
X (array-like of shape (n_samples, n_features)) – New data, where n_samples is the number of samples and n_features is the number of features.
- Returns
X_new – Projection of X in the first principal components, where n_samples is the number of samples and n_components is the number of the components.
- Return type
array-like of shape (n_samples, n_components)
- property x_loadings_¶
The loadings of X with shape (n_features, n_components).