Smoothing: moving average
Moving average is the most straight forward way of smoothing data. Each point of the spectrum is replaced by the average of $N$ points to its left and right. The easiest way to apply a moving average filter is to convolve the spectrum with an array of length $2 N + 1$ of the value $\frac{1}{2 N + 1}$.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
First, let's create a dataset to visualize the effect of moving average smoothing:
# create dataset
gaussian_band = lambda wn, A, s, m: A/s*np.sqrt(2*np.pi)*np.exp(-(wn-m)**2/2/s**2)
wn = np.arange(1460, 1621)
spectrum = sum([
gaussian_band(wn, A, s, m)
for A, s, m in ((1, 1, 1485), (2.5, 2.5, 1515), (10, 10, 1590))
], np.zeros(wn.shape))
# add normal distributed noise to spectrum
noise = 0.1*np.random.randn(*spectrum.shape)
spectrum_noisy = spectrum + noise
plt.figure()
plt.plot(wn, spectrum, label="spectrum")
plt.plot(wn, spectrum_noisy, label="spectrum + noise")
plt.xlabel("wavenumber")
plt.ylabel("absorption");
plt.legend();
I use the np.convolve(a, v, mode='full')
function to apply the moving average filter. This function convolves the array a
with the spectrum v
. Convolve means, that a
is flipped and then the last element of a
is multiplied with the first element in v
. This is the first element of the result vector. For the next element in the result vector, the one but last element of a
is multiplied with the first element of v
and the last element of a
is multiplied the second element of v
. The sum of these products is the second element of the result vector. And so forth.
For a moving average filter I set all the values in a
to $\frac{1}{N}$, where $N$ is the length of a
.
The mode
argument selects how the filter deals with data points at the edges of the spectrum. The default is 'full'
, which means that that the result vector contains data at its edges that where not all elements of a
have been multiplied with actual data in 'v'. See the plot below for the resulting vector:
plt.figure()
N = 10
v = np.ones(20)
a = np.ones(N)/N
plt.plot(np.convolve(a,v,mode='full'))
plt.xlabel("index")
plt.ylabel("convolution")
The easiest solution is to extend the edges of the spectrum with $N$ copies of the first or the last values, respectively, so that the outermost values of the smoothed spectrum don't point towards zero. In case of highly noisy data, it can also be a good idea, to fill with the average of the $N+1$ values closest to the edge instead:
plt.figure()
N = 10
v = np.ones(20)
a = np.ones(N)/N
v_extended = np.zeros(2*N + len(v))
v_extended[N:-N] = v
v_extended[:N] = np.mean(v[:N+1])
v_extended[-N:] = np.mean(v[-N-1:])
plt.plot(np.convolve(a,v,mode='valid'))
plt.xlabel("index")
plt.ylabel("convolution");
To make the moving average filter reusable, I wrap it in a function like so:
def moving_average(data, N):
"""
`data`: input data array
`N`: data points to average, integer
"""
# create convolution array
window_arr = np.ones(2*N + 1)/(2*N + 1)
#extend spectrum with mean of N+1 values at the end
data_extended = np.zeros(2*N + len(data))
data_extended[N:-N] = data
data_extended[:N] = np.mean(data[:N+1])
data_extended[-N:] = np.mean(data[-N-1:])
smoothed = np.convolve(window_arr, data_extended, mode='valid')
return smoothed
When I apply the moving average filter to the noise test spectrum from the beginning of this article, this is what happens:
plt.figure()
plt.plot(wn, noise, label="unsmoothed")
for N in (1,2,4,8, 16, 32):
plt.plot(wn, moving_average(noise, N), label='$N=${}'.format(N))
plt.xlabel("wavenumber")
plt.ylabel("absorption");
plt.legend();
Obviously, higher higher $N$ means less noise. On the flip side, here is what happens to the bands in the spectrum when I aggressively smooth them:
plt.figure()
plt.plot(wn, spectrum_noisy, label="unsmoothed")
for N in (1,2,4,8, 16, 32):
plt.plot(wn, moving_average(spectrum_noisy, N), label='$N=${}'.format(N))
plt.xlabel("wavenumber")
plt.ylabel("absorption");
plt.legend();
Obviously, higher order filters make the spectrum smoother but at the same time, but they also significantly distort spectra, with narrower bands being affected more noticeably than broader bands.
Moving average filters don't offer a lot of options for customization, but they are the simplest form of a finite impulse response (FIR) filter. These can be tailored to specific problems by selecting frequency windows that contain information and those that don't. The Savitzky-Golay (SG) filter is another type of FIR filter, that has found widespread use in spectroscopy. I will write about these filters more in detail in a future post.