chemometrics.Whittaker
- class chemometrics.Whittaker(penalty='auto', constraint_order=2, deriv=0)
Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator
Smooth X with a Whittaker smoother.
Whittaker smooths X with a Whittaker smoother. The smoother fits a non-parametric line to the data, constrained by the smoothness of its derivative. penalty defines the penalty on non-smoothness. The Whittaker smoother is very efficient and a useful drop-in replacement for Savitzky-Golay smoothing.
- Parameters
penalty (float or 'auto' (default)) – Scaling factor of the penalty term for non-smoothness. If 'auto' is given, the penalty is estimated by an algorithmically optimized leave-one-out cross-validation.
constraint_order (int) – Order of the derivative on which the constraint acts.
deriv (int) – Order of the derivative of the data to compute. Default: 0 (no derivative). Note: deriv should always be <= constraint_order.
- estimate_penalty
True if penalty is estimated.
- Type
boolean
- penalty_
The applied penalty for smoothing.
- Type
float
- solve1d_
Solver for smoothing of a 1D vector.
- Type
function
Notes
Whittaker uses sparse matrices for efficiency reasons. X may, however, be a full matrix. In contrast to the algorithm proposed by Eilers [1], no Cholesky decomposition is used. The reason is twofold. First, the Cholesky decomposition is not implemented for sparse matrices in NumPy/SciPy. Second, Eilers uses the Cholesky decomposition to prevent Matlab from “reordering the sparse equation systems for minimal bandwidth”. Matlab seems to rely on UMFPACK for sparse matrix division [2], which implements column reordering for sparsity preservation. As the sparse matrix we are working with is square and positive-definite, we can rely on the built-in factorized routine of scipy.sparse.linalg, which solves with UMFPACK if installed, otherwise with SuperLU.
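The sparse solve described above can be sketched as follows. This is a minimal illustration and not the library's actual code: the function name whittaker_smooth and the way the difference matrix is built are assumptions made for the example. It sets up the system (I + penalty * D'D) z = y with a sparse difference matrix D and solves it through scipy.sparse.linalg.factorized.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import factorized


def whittaker_smooth(y, penalty=100.0, constraint_order=2):
    """Hypothetical helper: smooth a 1D signal y by solving (I + penalty * D'D) z = y."""
    n = y.shape[0]
    # Build the sparse difference matrix by repeated row differencing.
    D = sparse.identity(n, format="csc")
    for _ in range(constraint_order):
        D = D[1:] - D[:-1]
    # The system matrix is square, banded and positive-definite.
    A = (sparse.identity(n) + penalty * (D.T @ D)).tocsc()
    # factorized() returns a solver based on a sparse LU factorization.
    solve = factorized(A)
    return solve(y)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
noisy = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
smoothed = whittaker_smooth(noisy, penalty=10.0)
```

Because the factorization is computed once, the returned solver can be reused for every row of X, which is presumably why a 1D solver is stored on the fitted object.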
Derivatives are implemented by multiplying the smoothed matrix with a (local) difference matrix. This is not explicitly described in [1]. However, the approach is consistent with the underlying idea of the Whittaker smoother, as the (local) differences are used in the derivation of the filter. Note: the derivative order should always be smaller than or equal to the constraint order, since the Whittaker filter does not explicitly penalize fluctuations in derivatives higher than the constraint order.
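As a rough sketch of the derivative step, the example below smooths a 1D signal and then multiplies the result with a difference matrix of order deriv. The helper names difference_matrix and smooth_derivative are hypothetical, unit spacing between points is assumed, and the output here is simply deriv points shorter than the input. How the library treats the edges is not stated in this documentation.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def difference_matrix(n, order):
    """Hypothetical helper: sparse difference matrix of shape (n - order, n)."""
    D = sparse.identity(n, format="csc")
    for _ in range(order):
        D = D[1:] - D[:-1]
    return D


def smooth_derivative(y, penalty=10.0, constraint_order=2, deriv=1):
    """Hypothetical helper: smooth y, then difference the smoothed signal."""
    if deriv > constraint_order:
        raise ValueError("deriv should be <= constraint_order")
    n = y.shape[0]
    D = difference_matrix(n, constraint_order)
    A = (sparse.identity(n) + penalty * (D.T @ D)).tocsc()
    z = spsolve(A, y)  # smoothed signal
    # Apply the (local) differences to the smoothed signal, not to the raw data.
    return difference_matrix(n, deriv) @ z


# A straight line with slope 2 keeps its slope after smoothing and differencing.
slope = smooth_derivative(2.0 * np.arange(20.0))
```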
References
- [1]
Paul H. Eilers, A perfect smoother, Anal. Chem., vol. 75, no. 14, pp. 3631-3636, 2003.
- [2]
UMFPACK, https://en.wikipedia.org/wiki/UMFPACK, accessed 3 May 2020.
- __init__(penalty='auto', constraint_order=2, deriv=0)
Methods
__init__([penalty, constraint_order, deriv])
fit(X[, y]) – Calculate regression matrix for later use.
fit_transform(X[, y]) – Fit to data, then transform it.
get_params([deep]) – Get parameters for this estimator.
plot(X[, logpenalty]) – Plot CV performance over given range.
score(X[, y]) – Calculate cross-validation error of Whittaker smoother.
set_params(**params) – Set the parameters of this estimator.
transform(X[, copy]) – Do Whittaker smoothing.
- fit(X, y=None)
Calculate regression matrix for later use.
- Parameters
X ((n, m) ndarray) – Data to be pretreated. n samples x m variables (typically wavelengths).
y – Ignored.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns
X_new – Transformed array.
- Return type
ndarray of shape (n_samples, n_features_new)
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- plot(X, logpenalty=[-4, 4])
Plot CV performance over given range.
Provides an analytical plot of the Whittaker filter score depending on the penalty. Each score is estimated with a leave-one-out approach (see also score).
- score(X, y=None)
Calculate cross-validation error of Whittaker smoother.
Computes the cross-validation error of a Whittaker smoother by a leave-one-out cross-validation scheme. The algorithm uses an approximation scheme and does not perform the explicit leave-one-out cross-validation. Users should be careful when applying this cross-validation scheme to data with autocorrelated noise, because the algorithm then tends to undersmooth the data.
- Parameters
X ((n, m) ndarray) – Data. n samples x m variables (typically wavelengths).
y – Ignored.
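To illustrate how a leave-one-out error can be obtained without refitting the smoother n times, the sketch below uses a well-known shortcut for linear smoothers: the leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is a diagonal element of the hat matrix. Whether this matches the approximation scheme used by score is an assumption, since the scheme is not detailed here. The function name loo_cv_error, the dense matrices and the log10 penalty grid are illustrative choices only.

```python
import numpy as np


def loo_cv_error(y, penalty, constraint_order=2):
    """Hypothetical helper: leave-one-out CV error for a 1D signal y."""
    n = y.shape[0]
    # Dense difference operator; acceptable for a short illustrative signal.
    D = np.diff(np.eye(n), n=constraint_order, axis=0)
    # Hat matrix H maps the data onto the smoothed signal: z = H y.
    H = np.linalg.inv(np.eye(n) + penalty * (D.T @ D))
    z = H @ y
    # Shortcut: leave-one-out residual = residual / (1 - h_ii).
    loo_residuals = (y - z) / (1.0 - np.diag(H))
    return np.sqrt(np.mean(loo_residuals ** 2))


rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

# Scan penalties on a log10 grid and keep the one with the lowest CV error.
penalties = 10.0 ** np.linspace(-4, 4, 17)
errors = np.array([loo_cv_error(y, p) for p in penalties])
best_penalty = penalties[np.argmin(errors)]
```

The noise in this example is uncorrelated. With autocorrelated noise, the selected penalty would tend to be too small, as cautioned above.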
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
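The <component>__<parameter> convention can be illustrated with a small scikit-learn pipeline. The steps below are stand-ins taken from scikit-learn itself, and the step names "scale" and "pca" are arbitrary labels chosen for this sketch. A Whittaker instance used as a pipeline step would presumably be addressed the same way, for example through a key built from its step name and penalty.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in steps; the names "scale" and "pca" are arbitrary labels.
pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=5))])

# <component>__<parameter>: update n_components of the step named "pca".
pipe.set_params(pca__n_components=2)

# get_params exposes nested parameters under the same naming scheme.
nested_value = pipe.get_params()["pca__n_components"]
```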
- transform(X, copy=True)
Do Whittaker smoothing.
- Parameters
X ((n, m) ndarray) – Data to be pretreated. n samples x m variables (typically wavelengths).
copy (bool (True default)) – Whether to generate a copy of the input data or calculate in place.