Baseline: Asymmetric Least Squares


The last method for baseline correction I talked about used polynomials fit to user-defined spectral regions. The drawback, of course, is that it is necessary to find wavelength ranges that never contain features.

In this post I will take a look at a set of methods that do not require parts of the spectrum to be completely free of bands. The basic idea of these methods is to use least squares to fit the spectrum with a smooth line. This post largely follows the very accessible 2015 paper by Baek, Park, Ahn and Choo (Baek et al.).

That means we have to balance closely fitting the data with keeping the resulting curve smooth, or, to put it into an equation, we want to minimize the following expression: $$S(\mathbf{z}) = (\mathbf{y} - \mathbf{z})^T\mathbf{W}(\mathbf{y} - \mathbf{z}) + \lambda (\mathbf{Dz})^T \mathbf{Dz},$$ where $\mathbf{y}$ is a column vector containing the spectral intensities and $\mathbf{z}$ is a column vector containing the baseline. $\mathbf{W}$ is a diagonal matrix of weights that describes how closely we want to fit the spectrum at each point: it contains 1 on the diagonal for points that are definitely part of the baseline and 0 for points that are definitely part of a feature. Multiplying $\mathbf{z}$ with the difference matrix $\mathbf{D}$ gives its first differences, i.e. a discrete first derivative. $\lambda$ is a free parameter that determines how important the smoothness of the baseline is to us; higher values mean it is more important.

Minimizing $S(\mathbf{z})$ means finding the baseline where the derivative $\frac{\partial S(\mathbf{z})}{\partial \mathbf{z}^T}$ is zero: $$-2 \mathbf{W} (\mathbf{y} -\mathbf{z}) + 2 \lambda \mathbf{D}^T\mathbf{D}\mathbf{z} = \mathbf{0}$$ $$ ( \mathbf{W} + \lambda \mathbf{D}^T\mathbf{D}) \mathbf{z} = \mathbf{Wy}$$

We can now calculate $\mathbf{z}$ by "dividing" the right side by $(\mathbf{W} + \lambda \mathbf{D}^T\mathbf{D})$, or rather, since we are dealing with matrices here, by multiplying both sides with $(\mathbf{W} + \lambda \mathbf{D}^T\mathbf{D})^{-1}$ from the left.
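Before applying this to a real spectrum, a tiny toy calculation may help make the pieces concrete. The snippet below is just an illustrative sketch (the numbers and names are arbitrary): it builds a first-order difference matrix for five points, solves the system above, and evaluates the two competing terms of $S(\mathbf{z})$.

import numpy as np
from numpy.linalg import solve

# toy signal: five points, one of which sticks out
y = np.array([1.0, 1.1, 3.0, 1.3, 1.4])
w = np.ones(5)    # for now, treat every point as part of the baseline
lam = 10.0

# first-order difference matrix: (Dz)_i = z_{i+1} - z_i
D = np.diff(np.eye(5), axis=0)
W = np.diag(w)

# solve (W + lam * D^T D) z = W y
z = solve(W + lam * D.T @ D, W @ y)

# the two competing terms of S(z): closeness to the data vs. roughness of the fit
fidelity = (y - z) @ W @ (y - z)
roughness = lam * (D @ z) @ (D @ z)
print(z)
print(fidelity, roughness)

Increasing lam in this sketch flattens z, trading a larger misfit for a smoother curve.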

Let's see how to apply this expression with an example consisting of a sharp spectral feature, a slowly changing baseline and some noise:

In [1]:
import numpy as np
from numpy.random import randn

import matplotlib.pyplot as plt

%matplotlib notebook


wn = np.linspace(0,1000, 1000)

baseline = np.cos(2*np.pi*wn/500)
feature = np.exp(-(wn-500)**2/20)
noise = 0.1*randn(len(wn))

signal = baseline + feature + noise

plt.figure()

plt.plot(wn, baseline, label="baseline")
plt.plot(wn, feature, label="feature")
plt.plot(wn, noise, label="noise")

plt.plot(wn, signal, label="signal")
plt.legend()
Out[1]:
<matplotlib.legend.Legend at 0x7f2d915b6908>
In [2]:
from numpy.linalg import solve


def apply_fit(y, w, lam):
    """
    apply smooth least squares fit to signal y
    
    y... signal, 1D vector
    w... weights [0,1], length of y
    lam... smoothing factor, >>0
    """
    # create difference matrix
    D = np.zeros((len(y), len(y)))
    diag_row, diag_col = np.diag_indices(len(y)-1)
    diag_row = diag_row + 1
    D[diag_row, diag_col] = 1
    D[diag_row, diag_col+1] = -1
    # create weight matrix
    W = np.diag(w, 0)
    # create matrix A for least squares
    A = (W + lam * D.T @ D)
    z = solve(A, W @ y)
    return z


#our feature is pretty much concentrated between 490 and 510:
w = (wn<=490)|(wn>=510)

z = apply_fit(signal, w, 1E4)




plt.figure()
plt.plot(wn, feature, label="original feature")
plt.plot(wn, signal - z, label="baseline corrected")
plt.legend()



plt.figure()
plt.plot(wn, baseline, label="baseline actual")
plt.plot(wn,z, label="baseline fit")
plt.legend()
Out[2]:
<matplotlib.legend.Legend at 0x7f2d914e8c88>

We use the apply_fit function to wrap the steps needed to apply the fit. The first few lines set up the difference matrix, nothing really exciting happening there. Then the weight matrix is created, again pretty straightforward. The interesting part is how $\mathbf{z}$ is calculated. In general it is not a good idea to compute the inverse of a matrix explicitly; it is slower and numerically less stable than solving the linear system directly. Instead we use numpy.linalg.solve to find the $\mathbf{z}$ that solves $\mathbf{A z} = \mathbf{W y}$.
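For a 1000-point spectrum the dense solve above is perfectly fine, but the matrices grow with the square of the number of points, while $\mathbf{W}$ is diagonal and $\mathbf{D}^T\mathbf{D}$ is banded. For long spectra it can therefore pay off to switch to sparse matrices. The following is a minimal sketch of such a variant, assuming scipy is available; the name apply_fit_sparse is just for illustration:

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def apply_fit_sparse(y, w, lam):
    """same fit as apply_fit, but built from sparse matrices"""
    n = len(y)
    # first-order difference matrix of shape (n-1, n): (Dz)_i = z_i - z_{i+1}
    D = sparse.diags([1.0, -1.0], [0, 1], shape=(n-1, n))
    # diagonal weight matrix
    W = sparse.diags(np.asarray(w, dtype=float), 0)
    A = sparse.csc_matrix(W + lam * D.T @ D)
    return spsolve(A, W @ y)

For the short synthetic spectrum used here the dense version is more than fast enough, so I will keep using apply_fit below.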

While we are at it, let's also check what $\lambda$ does:

In [3]:
plt.figure()
for lam in [1,1E1, 1E2, 1E3, 1E8, 1E12]:
    z = apply_fit(signal, w, lam)
    plt.plot(wn, z, label=r"$\lambda$={}".format(lam))
plt.legend()
Out[3]:
<matplotlib.legend.Legend at 0x7f2d91530f98>

As we can see, there is quite a lot of latitude in the possible values of $\lambda$, with higher values leading to smoother baselines but larger deviations from the data.
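If you prefer numbers over plots, the trade-off can also be made explicit by comparing the size of the residual $\mathbf{y} - \mathbf{z}$ with the roughness of the fit for a few values of $\lambda$. A small sketch, reusing signal, w and apply_fit from the cells above:

# residual vs. roughness of the fitted baseline for a few values of lambda
for lam in [1, 1E2, 1E4, 1E8, 1E12]:
    z = apply_fit(signal, w, lam)
    residual = np.linalg.norm(signal - z)    # how far the fit is from the data
    roughness = np.linalg.norm(np.diff(z))   # how wiggly the fit is
    print("lambda = {:g}: residual = {:.3f}, roughness = {:.5f}".format(lam, residual, roughness))

The residual grows and the roughness shrinks as $\lambda$ increases, which is exactly the balance encoded in $S(\mathbf{z})$.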

Determining $\mathbf{W}$

The fit above is somewhat useful for calculating baselines of spectra with bands in known positions, but it would be vastly more useful if we didn't have to know beforehand where to expect bands. Otherwise the calculated baseline will always try to fit the features as well as the baseline; the smoothness requirement ensures that it won't fit the features perfectly, but the result is still unusable as a baseline.

In [4]:
w = np.ones(len(signal))
z = apply_fit(signal, w, 10000)


plt.figure()
plt.plot(wn, signal, label="signal")
plt.plot(wn, z, label="baseline without feature selection")
plt.legend()
Out[4]:
<matplotlib.legend.Legend at 0x7f2d91333978>

AsLS: automatically finding features

Eilers and Boelens (Eilers & Boelens n.d.) deal with this problem by noting that the regions where the fit is imperfect can themselves be used to detect features. They introduce an additional user-selected parameter $p$ and assign the weights according to $$w_i = \begin{cases} p, &y_i > z_i\\ 1-p, &y_i \leq z_i \end{cases}$$ together with a number of iterations $N$.

In [5]:
def AsLS(y, lam, p, N):
    """asymmetric least squares baseline: signal y, smoothness lam, asymmetry p (0 < p < 1), N iterations"""
    w = np.ones(len(y))
    for n in range(N):
        z = apply_fit(y, w, lam)
        # points above the current fit are treated as features, points below as baseline
        w[y > z] = p
        w[y <= z] = 1 - p
    return z

z = AsLS(signal, 100, .25, 10)


plt.figure()
plt.plot(wn, signal, label="signal")
plt.plot(wn, z, label="AsLS baseline")
plt.legend()
Out[5]:
<matplotlib.legend.Legend at 0x7f2d912a2a20>

The important step here is to balance $\lambda$ and $p$. If $\lambda$ is too low, the baseline just follows the shape of the spectrum, features included; if it is too high, the fit tends towards a straight, horizontal line. If $p$ is too low, the baseline stays below the spectrum at all times, so that after subtraction the noise around the baseline ends up sitting above it; if it is too high, features are weighted the same way as points that belong to the baseline.

See the plots below for a visualization of the influence of the fitting parameters.

In [6]:
plt.figure()
plt.plot(wn, signal, label="signal")
plt.title("Influence of $p$")
for p in [1E-3, 1E-2, 1E-1, .25, 0.5]:
    z = AsLS(signal, 1E2, p, 100)
    plt.plot(wn, z, label="p = {}".format(p))
plt.legend()

plt.figure()
plt.plot(wn, signal, label="signal")
plt.title(r"Influence of $\lambda$")
for lam in [1, 1E2, 1E4, 1E6, 1E8]:
    z = AsLS(signal, lam, .1, 100)
    plt.plot(wn, z, label="lam = {}".format(lam))
plt.legend()
Out[6]:
<matplotlib.legend.Legend at 0x7f2d91233a58>

arPLS: getting rid of $p$

Baek, Park, Ahn and Choo (Baek et al.) propose an improved method for baseline correction that does not need a $p$ parameter. They argue that signals slightly above the baseline, e.g. due to noise, are probably still part of the baseline; in AsLS, however, they would be considered part of a feature and weighted with $p$ instead of $1 - p$. This is what leads to the problems in selecting $p$.

To remedy this problem, they propose what they call asymmetrically reweighted penalized least squares (arPLS). Instead of giving all points above the baseline a single user-selected weight, each point is assigned a weight that depends on its distance from the baseline. The weights are assigned according to a logistic function.

In [7]:
def logistic(d, m, sigma):
    # weight as a function of the residual d, given the mean m and standard deviation sigma of the negative residuals
    return 1 / (1 + np.exp(2*(d-(2*sigma-m))/sigma))

ds = np.linspace(0,10)

plt.figure()
plt.plot(ds, logistic(ds, 1, 3))
Out[7]:
[<matplotlib.lines.Line2D at 0x7f2d91129048>]

The parameters of the logistic function are determined from the deviation of the signal around the baseline: the m argument is the mean of the differences $y - z$ for the data points below the fitted curve, and the sigma argument is their standard deviation.

The only parameters the user has to choose are $\lambda$ (smoothness), a maximum number of iterations and a cutoff value for the relative change between two successive $\mathbf{w}$ vectors that determines when to stop iterating.

In [8]:
def arPLS(y, lam, N, ratio):
    """arPLS baseline: signal y, smoothness lam, at most N iterations, convergence cutoff ratio"""
    w = np.ones(len(y))
    for n in range(N):
        z = apply_fit(y, w, lam)
        diff = y - z
        diff_neg = diff[diff < 0]   # residuals of the points below the current baseline
        diff_pos = diff[diff >= 0]  # residuals of the points above the current baseline
        w_old = w.copy()
        # points above the baseline are weighted by the logistic function, points below get weight 1
        w[y >= z] = logistic(diff_pos, np.mean(diff_neg), np.std(diff_neg))
        w[y < z] = 1
        # stop once the weights change by less than the given ratio
        if np.linalg.norm(w_old - w)/np.linalg.norm(w_old) < ratio:
            print("stopped at n = {}".format(n))
            break
    return z, w_old


z, w_old = arPLS(signal, 100, 100, .001)


plt.figure()
plt.plot(wn, signal, label="signal")
plt.plot(wn, z, label="arPLS baseline")
plt.legend()
stopped at n = 6
Out[8]:
<matplotlib.legend.Legend at 0x7f2d91102c50>

The cutoff value appears to have little influence on the shape of the baseline and needs no fine-tuning.

In [9]:
plt.figure()
plt.plot(wn, signal, label="signal")

for ratio in [1E-1, 1E-2, 1E-3, 1E-4, 1E-5]:
    z, w_old = arPLS(signal, 100, 100, ratio)
    plt.plot(wn, z, label="ratio = {}".format(ratio))
plt.legend()
stopped at n = 1
stopped at n = 4
stopped at n = 6
stopped at n = 9
stopped at n = 12
Out[9]:
<matplotlib.legend.Legend at 0x7f2d912113c8>

Bibliography

Baek, S.-J. et al., 2015. Baseline correction using asymmetrically reweighted penalized least squares smoothing. The Analyst, 140(1), pp.250–257. Available at: http://xlink.rsc.org/?DOI=C4AN01061B.

Eilers, P.H.C. & Boelens, H.F.M., n.d. Baseline Correction with Asymmetric Least Squares Smoothing.