chemometrics.Whittaker

class chemometrics.Whittaker(penalty='auto', constraint_order=2, deriv=0)

Bases: TransformerMixin, BaseEstimator

Smooth with a Whittaker filter

Whittaker smooths X with a Whittaker filter. The filter fits a non-parametric curve to the data, constrained by the smoothness of its derivative. penalty defines the penalty on non-smoothness. The Whittaker smoother is very efficient and a useful drop-in replacement for Savitzky-Golay smoothing.

Parameters
  • penalty (float or 'auto' (default)) – Scaling factor of the penalty term for non-smoothness. If 'auto' is given, the penalty is estimated by an algorithmically optimized leave-one-out cross-validation.

  • constraint_order (int) – Order of the derivative on which the smoothness constraint acts.

  • deriv (int) – Order of the derivative of the data to return. Default: 0 (no derivative). Note: deriv should always be <= constraint_order.

estimate_penalty

True if penalty is estimated.

Type

boolean

penalty_

The applied penalty for smoothing.

Type

float

solve1d_

Solver for smoothing of 1D vector

Type

function

Notes

Whittaker uses sparse matrices for efficiency. X may nevertheless be a full (dense) matrix. In contrast to the algorithm proposed by Eilers [1], no Cholesky decomposition is used. The reason is twofold. First, the Cholesky decomposition is not implemented for sparse matrices in NumPy/SciPy. Second, Eilers uses the Cholesky decomposition to prevent Matlab from “reordering the sparse equation systems for minimal bandwidth”. Matlab seems to rely on UMFPACK for sparse matrix division [2], which implements column reordering for sparsity preservation. As the sparse matrix we are working with is square and positive-definite, we can rely on the built-in factorize method, which solves with UMFPACK if installed, otherwise with SuperLU.
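The sparse solve described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the "factorize method" refers to scipy.sparse.linalg.factorized, and that the system matrix has the form I + penalty * D'D from Eilers' paper, where D is the difference matrix of order constraint_order. The helper name whittaker_solver is hypothetical.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import factorized


def whittaker_solver(m, penalty=100.0, constraint_order=2):
    """Return a callable that smooths a length-m vector (hypothetical helper)."""
    # Build the sparse difference matrix D of shape (m - constraint_order, m)
    # by differencing the rows of the identity matrix repeatedly.
    D = sparse.identity(m, format="csr")
    for _ in range(constraint_order):
        D = D[1:] - D[:-1]
    # System matrix I + penalty * D'D: square, banded and positive definite.
    A = (sparse.identity(m, format="csr") + penalty * (D.T @ D)).tocsc()
    # factorized() pre-factorizes A once and returns a solve(rhs) function.
    return factorized(A)


# Noisy sine wave as example data.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

solve = whittaker_solver(x.size, penalty=100.0)
z = solve(y)  # smoothed signal, same length as y
```

Because the factorization is computed once, the returned solve function can be reused for every row of X that has the same length.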

Derivatives are implemented by multiplying the smoothed matrix with a (local) difference matrix. This is not explicitly described in [1]. However, the approach is consistent with the underlying idea of the Whittaker smoother, as the (local) differences are used in the derivation of the filter. Note: the derivative order should always be less than or equal to the constraint order, because the Whittaker filter does not explicitly penalize fluctuations in derivatives of higher order than the constraint order.
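As an illustration of the difference-matrix idea, the sketch below builds dense difference matrices with NumPy and applies them to an already smooth signal. The helper name difference_matrix is hypothetical, and the output here is shortened by deriv points; whether the library pads or rescales the result is not stated above.

```python
import numpy as np


def difference_matrix(m, order):
    """Dense (m - order, m) difference matrix (hypothetical helper)."""
    # Differencing the rows of the identity gives rows such as [-1, 1]
    # for order 1 and [1, -2, 1] for order 2.
    return np.diff(np.eye(m), n=order, axis=0)


# A parabola stands in for the output of the smoother.
z = np.arange(6.0) ** 2            # 0, 1, 4, 9, 16, 25

first = difference_matrix(z.size, 1) @ z    # successive differences of z
second = difference_matrix(z.size, 2) @ z   # differences of the differences

# For a matrix Z with one spectrum per row, multiply from the right.
Z = np.vstack([z, 2 * z])
Z_deriv = Z @ difference_matrix(z.size, 1).T   # shape (2, 5)
```

The same difference matrix appears in the penalty term of the filter, which is why a derivative order above the constraint order is not controlled by the smoother.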

References

[1] Paul H. Eilers, A perfect smoother, Anal. Chem., vol 75, 14, pp. 3631-3636, 2003.

[2] UMFPACK, https://en.wikipedia.org/wiki/UMFPACK, accessed 03.May.2020.

__init__(penalty='auto', constraint_order=2, deriv=0)

Methods

__init__([penalty, constraint_order, deriv])

fit(X[, y])

Calculate regression matrix for later use.

fit_transform(X[, y])

Fit to data, then transform it.

get_params([deep])

Get parameters for this estimator.

plot(X[, logpenalty])

Plot CV performance over given range

score(X[, y])

Calculate cross-validation error of Whittaker filter

set_params(**params)

Set the parameters of this estimator.

transform(X[, copy])

Do Whittaker smoothing

fit(X, y=None)

Calculate regression matrix for later use.

Parameters
  • X ((n, m) ndarray) – Data to be pretreated. n samples x m variables (typically wavelengths)

  • y – Ignored

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

plot(X, logpenalty=[-4, 4])

Plot CV performance over given range

Provides an analytical plot of the Whittaker filter score as a function of the penalty. Each score is estimated with a leave-one-out approach (see also score).

score(X, y=None)

Calculate cross-validation error of Whittaker filter

Computes the cross-validation error of a Whittaker filter with a leave-one-out cross-validation scheme. The algorithm uses an approximation scheme and does not perform an explicit leave-one-out cross-validation. Users should be careful when applying this cross-validation scheme to data with autocorrelated noise, as the algorithm then tends to undersmooth the data.

Parameters
  • X ((n, m) ndarray) – Data. n samples x m variables (typically wavelengths)

  • y – Ignored
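The reason an explicit leave-one-out loop can be avoided is a standard identity for linear smoothers: the leave-one-out residual equals the ordinary residual divided by 1 - h_ii, where h_ii is the i-th diagonal element of the hat matrix. The sketch below shows this on a small dense problem and compares it with the explicit loop. It is an assumption-laden illustration: the function names are hypothetical, the hat diagonal is computed exactly through a dense inverse rather than by whatever approximation the library uses, and the root-mean-square scaling of the error is a choice made here.

```python
import numpy as np


def loo_cv_error(y, penalty, constraint_order=2):
    """LOO-CV error via the hat-matrix shortcut (hypothetical helper)."""
    m = y.size
    D = np.diff(np.eye(m), n=constraint_order, axis=0)
    # Hat matrix H maps the data to the smoothed values: z = H @ y.
    H = np.linalg.inv(np.eye(m) + penalty * D.T @ D)
    z = H @ y
    # Leave-one-out residuals without refitting.
    r = (y - z) / (1.0 - np.diag(H))
    return np.sqrt(np.mean(r ** 2))


def explicit_loo_cv_error(y, penalty, constraint_order=2):
    """Same quantity by refitting with point i given zero weight."""
    m = y.size
    D = np.diff(np.eye(m), n=constraint_order, axis=0)
    P = penalty * D.T @ D
    r = np.empty(m)
    for i in range(m):
        w = np.ones(m)
        w[i] = 0.0  # leave point i out of the fit
        z = np.linalg.solve(np.diag(w) + P, w * y)
        r[i] = y[i] - z[i]
    return np.sqrt(np.mean(r ** 2))


rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Scan a grid of penalties and keep the one with the lowest CV error.
penalties = 10.0 ** np.arange(-4, 5)
errors = [loo_cv_error(y, p) for p in penalties]
best_penalty = penalties[int(np.argmin(errors))]
```

The noise in this example is independent from point to point. With autocorrelated noise, the minimum of such a scan tends to sit at too small a penalty, which is the undersmoothing mentioned above.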

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

transform(X, copy=True)

Do Whittaker smoothing

Parameters
  • X ((n, m) ndarray) – Data to be pretreated. n samples x m variables (typically wavelengths)

  • copy (bool, default=True) – Whether to generate a copy of the input data or calculate in place.