chemometrics.Whittaker

class chemometrics.Whittaker(penalty='auto', constraint_order=2, deriv=0)

Bases: TransformerMixin, BaseEstimator

Smooth with a Whittaker filter

Whittaker smooths X with a Whittaker filter. The filter fits a non-parametric line to the data, constrained by the smoothness of its derivative. penalty sets the penalty on non-smoothness. The Whittaker smoother is very efficient and a useful drop-in replacement for Savitzky-Golay smoothing.
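The penalized least-squares idea described above can be sketched with plain NumPy. This is a minimal dense illustration, not the library's implementation: the helper name whittaker_dense is hypothetical, and it assumes the standard formulation in which the smoothed signal z minimizes |y - z|^2 + penalty * |D z|^2, where D is the difference matrix of order constraint_order.

```python
import numpy as np


def whittaker_dense(y, penalty=1.0, constraint_order=2):
    """Smooth a 1D signal by penalized least squares (hypothetical helper, dense)."""
    y = np.asarray(y, dtype=float)
    m = y.size
    # Difference matrix D of shape (m - constraint_order, m):
    # D @ z gives the finite differences of z of order constraint_order.
    D = np.diff(np.eye(m), n=constraint_order, axis=0)
    # Minimizing |y - z|^2 + penalty * |D z|^2 leads to the linear system
    # (I + penalty * D.T @ D) z = y.
    A = np.eye(m) + penalty * (D.T @ D)
    return np.linalg.solve(A, y)


# Noisy sine wave as stand-in data (one spectrum with 200 variables).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
noisy = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
smooth = whittaker_dense(noisy, penalty=100.0)
```

A larger penalty gives a smoother result. With constraint_order=2, a straight line passes through unchanged because its second differences are zero.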

Parameters

penalty (float or 'auto' (default)) – Scaling factor of the penalty term for non-smoothness. If 'auto' is given, the penalty is estimated by an algorithmically optimized leave-one-out cross-validation.
constraint_order (int) – Order of the derivative on which the constraint acts.
deriv (int) – Order of the derivative of the data to compute. Default: 0 (no derivative). Note: deriv should always be <= constraint_order.

estimate_penalty

True if penalty is estimated.

Type: boolean

penalty_

The applied penalty for smoothing.

Type: float

solve1d_

Solver for smoothing of 1D vector

Type: function

Notes

Whittaker uses sparse matrices for efficiency reasons. X may, however, be a full matrix. In contrast to the algorithm proposed by Eilers [1], no Cholesky decomposition is used. The reason is twofold. First, the Cholesky decomposition is not implemented for sparse matrices in NumPy/SciPy. Second, Eilers uses the Cholesky decomposition to prevent Matlab from “reordering the sparse equation systems for minimal bandwidth”. Matlab seems to rely on UMFPACK for sparse matrix division [2], which implements column reordering for sparsity preservation. As the sparse matrix we are working with is square and positive-definite, we can rely on the built-in factorized function of scipy.sparse.linalg, which solves with UMFPACK if installed, otherwise with SuperLU.
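The sparse approach described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper name make_solver is hypothetical, and the construction of the difference matrix is one of several possible ways. scipy.sparse.linalg.factorized returns a reusable solve function, so the factorization is computed once and applied to every row of X.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import factorized


def make_solver(m, penalty=1.0, constraint_order=2):
    """Return a function that smooths vectors of length m (hypothetical helper)."""
    # Sparse difference matrix, built by repeated row differencing.
    D = sp.identity(m, format="csr")
    for _ in range(constraint_order):
        D = D[1:, :] - D[:-1, :]
    # The system matrix is banded, symmetric and positive-definite.
    A = sp.identity(m, format="csr") + penalty * (D.T @ D)
    # factorized() expects CSC format and returns a solve(b) callable.
    return factorized(sp.csc_matrix(A))


# Three random-walk "spectra" with 500 variables each as stand-in data.
rng = np.random.default_rng(1)
X = np.cumsum(rng.standard_normal((3, 500)), axis=1)
solve = make_solver(X.shape[1], penalty=50.0)
X_smooth = np.vstack([solve(row) for row in X])
```

Because the system matrix depends only on the number of variables, the penalty and the constraint order, the same solver can be reused for any number of samples.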

Derivatives are implemented by multiplying the smoothed matrix with a (local) difference matrix. This is not explicitly described in [1]. However, the approach is consistent with the underlying idea of the Whittaker smoother, as the (local) differences are used in the derivation of the filter. Note: the derivative order should always be smaller than or equal to the constraint order, since the Whittaker filter does not explicitly penalize fluctuations in derivatives of higher order than the constraint order.
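The derivative step can be illustrated with a small dense sketch. The helper name whittaker_derivative is hypothetical. The output length (m - deriv) and the absence of any rescaling by the sampling step are assumptions of this sketch, since the text above does not specify how the library handles them.

```python
import numpy as np


def whittaker_derivative(y, penalty=1.0, constraint_order=2, deriv=1):
    """Smooth y, then apply a difference matrix of order deriv (hypothetical helper)."""
    if deriv > constraint_order:
        raise ValueError("deriv should be <= constraint_order")
    y = np.asarray(y, dtype=float)
    m = y.size
    # Smoothing step: solve (I + penalty * D.T @ D) z = y.
    D = np.diff(np.eye(m), n=constraint_order, axis=0)
    z = np.linalg.solve(np.eye(m) + penalty * (D.T @ D), y)
    # Derivative step: local difference matrix of shape (m - deriv, m).
    D_deriv = np.diff(np.eye(m), n=deriv, axis=0)
    return D_deriv @ z


# A straight line with slope 3 sampled at step 0.01:
# its first differences are 3 * 0.01 everywhere.
x = np.linspace(0, 1, 101)
slope = whittaker_derivative(3.0 * x + 1.0, penalty=10.0, deriv=1)
```

Dividing the result by the sampling step raised to the power deriv converts the differences into derivative units.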

References

[1] Paul H. Eilers, A perfect smoother, Anal. Chem., vol. 75, no. 14, pp. 3631-3636, 2003.
[2] UMFPACK, https://en.wikipedia.org/wiki/UMFPACK, accessed 3 May 2020.

__init__(penalty='auto', constraint_order=2, deriv=0)

Methods

`__init__`([penalty, constraint_order, deriv])
`fit`(X[, y])	Calculate regression matrix for later use.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_params`([deep])	Get parameters for this estimator.
`plot`(X[, logpenalty])	Plot CV performance over given range
`score`(X[, y])	Calculate cross-validation error of Whittaker filter
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X[, copy])	Do Whittaker smoothing

fit(X, y=None)

Calculate regression matrix for later use.

Parameters

X ((n, m) ndarray) – Data to be pretreated. n samples x m variables (typically wavelengths)
y – Ignored

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

plot(X, logpenalty=[-4, 4])

Plot CV performance over given range

Provides an analytical plot of the Whittaker filter score as a function of the penalty. Each score is estimated with a leave-one-out approach (see also score).

score(X, y=None)

Calculate cross-validation error of Whittaker filter

Computes the cross-validation error of a Whittaker filter by a leave-one-out cross-validation scheme. The algorithm uses an approximation scheme and does not perform the explicit leave-one-out cross-validation. Users should be careful when applying this cross-validation scheme to data with autocorrelated noise, as the algorithm then tends to undersmooth the data.
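The following sketch shows one way to obtain a leave-one-out error without refitting, using the diagonal of the hat matrix of the smoother. It is an assumption that this resembles what the library does: the exact approximation scheme is not specified here, the helper name loo_cv_error is hypothetical, and the dense inverse is only practical for short signals.

```python
import numpy as np


def loo_cv_error(y, penalty=1.0, constraint_order=2):
    """Leave-one-out CV error of a Whittaker smoother (hypothetical helper, dense)."""
    y = np.asarray(y, dtype=float)
    m = y.size
    D = np.diff(np.eye(m), n=constraint_order, axis=0)
    # Hat matrix H maps the data to the smoothed signal: z = H @ y.
    H = np.linalg.inv(np.eye(m) + penalty * (D.T @ D))
    z = H @ y
    # Leave-one-out residuals follow from the diagonal of H without refitting.
    loo_residuals = (y - z) / (1.0 - np.diag(H))
    return np.sqrt(np.mean(loo_residuals ** 2))


# Scan penalties on a logarithmic grid and keep the one with the lowest error.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 150)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
log_penalties = np.linspace(-4, 4, 17)
errors = [loo_cv_error(y, 10.0 ** lp) for lp in log_penalties]
best_penalty = 10.0 ** log_penalties[int(np.argmin(errors))]
```

The noise in this example is uncorrelated. With autocorrelated noise, the selected penalty is expected to be too small, in line with the warning above.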

Parameters

X ((n, m) ndarray) – Data. n samples x m variables (typically wavelengths)
y – Ignored

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

transform(X, copy=True)

Do Whittaker smoothing

Parameters

X ((n, m) ndarray) – Data to be pretreated. n samples x m variables (typically wavelengths)
copy (bool (True default)) – Whether to generate a copy of the input data or calculate in place.
