Savitzky Golay Filter
The Savitzky-Golay (S-G) filter (Savitzky & Golay) is a convolution based filter, like the moving average filter. However, whereas the moving average filter weighs all all data points inside the window the same, the S-G filter gives each of the points inside the window a different weight. The weights are chosen such that the point at the center of the window is replaced with the value at polynomial least squares fit through the data points has at this position.
The upside of this scheme is, that it can reduce noise in the spectra while still preserving sharp features. In addition to the window size, the S-G filter also allows to choose a degree of polynomial. A zeroth degree S-G filter is exactly the same as a moving average filter, as the degree of the polynomial increases, sharper and sharper features are preserved, as you can see in the plot below.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
from spectroscopy_data.pretreatment import SG_smoothing
data = SG_smoothing()
wn = data["wn"]
noiseless = data["noiseless"]
noisy = data["noisy"]
plt.figure()
plt.plot(wn, noisy, label="noisy")
plt.plot(wn, noiseless, label="noiseless")
plt.legend()
scipy.signal
contains a function to apply an S-G filter to a numpy array,
savgol_filter
:
from scipy.signal import savgol_filter
window_Ns = [2,4,8]
polyorders = [1,2,3]
plt.figure()
for idx, N in enumerate(window_Ns):
ax = plt.gcf().add_subplot(3,1,idx+1)
ax.set_title('N={}'.format(N))
ax.plot(wn, noiseless, color='lightgrey')
for polyorder in polyorders:
if polyorder < 2*N + 1:
if polyorder < 2*N + 1:
smoothed = savgol_filter(noisy,
window_length=2*N+1,
polyorder=polyorder)
plt.plot(wn, smoothed, label='order={}'.format(polyorder))
ax.legend(loc="center right")
plt.tight_layout()
The above plot also shows a draw back of S-G filters: they can introduce ringing at the edge of sharp features. So choosing incorrect parameters for the S-G function will not only not remove noise (or remove features) but it can also introduce artifacts into the dataset. One approach to selecting the correct parameters is of course, to hand tune them, and use check the filtered spectra visually for loss of information and ringing. Enke and Nieman (Enke & Nieman) test the improvement in signal-to-noise ratio due to a S-G filter with given parameters by applying it to simulated data sets - purse signal spectra containing a single Gaussian band and pure noise spectra. For second and third order filters, they suggest to use a window length that is at most 0.7 times the full width at half maximum (FWHM) for Lorentzian peaks and 0.9 times the full width at half maximum for Gaussian peaks, in order to keep the distortion of the peak shape small (> 1% change of the peak height). A guide-line for the best compromise between noise reduction and conservation of peak shape is a window size proportional to 2 times the FWHM of bands in the spectrum (Enke & Nieman)
There are also methods for automatic selection of the S-G parameters. For example, in (Vivó-Truyols & Schoenmakers) the noise characteristic of the measurement is used to select a SG windows size that smoothes out the noise but not the signal. The prerequisite of this procedure is that the noise characteristic of the measurement has been determined, usually from a blank. The noise characteristic is determined via autocorrelation $\rho_1$ of the signal: $$\rho_h = 1-\frac{1}{2}\frac{\sum_{i=1+h}^n (e_i-e_{i-h})^2}{\sum_{i=1}^n (e_i)^2}\frac{n}{n-h}$$ Here $h$ is the lag of the autocorrelation and $e_i$ are individual data point and n is the number of data points in each spectrum.
If the smoothing removes only noise, than the the differences between the smoothed spectrum and the original spectrum should have the same noise characteristic as the blank. Therefore, the measurement spectra are smoothed with a series of window sizes and the noise characteristic of the removed noise is compared to the noise characteristic of the blank. Whichever window size leads to the closest noise characteristic to the blank noise is the one that should be used to smooth the spectra.
def autocorr(e, h):
"""calculates autocorrelation of vector e for lag h"""
diff = np.sum((e[:-h]-e[h:])**2)
rho_h = 1 - diff/np.sum(e**2)*len(e)/(len(e)-h)/2
return rho_h
polyorder = 2
fig_spectra = plt.figure()
ax_spectra = fig_spectra.add_subplot(111)
ax_spectra.plot(wn, noiseless, color='darkgrey', label='noise')
noise = .1 * np.random.randn(*noiseless.shape)
ax_spectra.plot(wn, noise, color='lightgrey', label='noiseless')
# here we create a spectrum with the same noise characteristic but not
# the same noise
noisy = noiseless + .1 * np.random.randn(*noiseless.shape)
ax_spectra.plot(wn, noisy, color='black', label='noisy')
tested_Ns = list(range(1, 10))
corrs = []
# calculate noise characteristic of the residual.
for N in tested_Ns:
smoothed = savgol_filter(noisy,
window_length=2*N+1,
polyorder=polyorder)
corrs.append(autocorr(noisy-smoothed, 1))
fig_rho = plt.figure()
ax_rho = fig_rho.add_subplot(111)
ax_rho.plot(tested_Ns, corrs, label='noise characteristic difference')
noise_characteristic = autocorr(noise, 1)
ax_rho.axhline(noise_characteristic, linestyle=":",
label='noise characteristic')
# choose most similar error characteristic
selected_window = np.argmin(np.abs(corrs-noise_characteristic))
ax_spectra.plot(wn, savgol_filter(noisy,
window_length=2*tested_Ns[selected_window]+1,
polyorder=polyorder), color='red',
label='smoothed')
ax_spectra.legend()
ax_rho.legend()
In addition to smoothing spectra, the SG method can also be used to calculate derivatives of spectra. It is usually a good idea to use SG instead of the difference quotient to calculate derivatives, because derivation leads to high frequency noise, which SG suppresses.
Bibliography¶
Savitzky, A. & Golay, M.J.E., Smoothing and Differentiation of Data by Simplified Least Squares Procedures.. Analytical Chemistry, 36(8), pp.1627–1639.
Enke, C.G. & Nieman, T.A., Signal-to-Noise Ratio Enhancement by Least-Squares Polynomial Smoothing. Analytical Chemistry, 48(8), pp.705A–712A.
Vivó-Truyols, G. & Schoenmakers, P.J., Automatic Selection of Optimal Savitzky-Golay Smoothing. Analytical Chemistry, 78(13), pp.4598–4608.