Integration

  |   Source

Integration of bands is one of the most fundamental operations in spectroscopy. Here, we take look at how to integrate spectra in python.

The physical justification for comparing band integrals instead of the absorption or intensity values at a single wavelength is that the band integral accounts for line broadening that the band is subject to. Comparing integrals instead of single point values also decreases noise, as numerical integration is essentially a weighted average of the data points.

Limits for the integral depend on the width and shape of the band. For Gaussian shaped bands, one FWHM to each side of the maximum covers 98% of the band integral. Lorentzian bands are more spread out, here 15.9 FWHM to each side have to be used to cover the same fraction of the integral of the band. However, if the shape of the band does not change between spectra, then covering a specific fraction of the integral is not important and limits can be chosen as expedient.

When the integrated band overlaps with another band, baseline correction should be applied before integration. Since the baseline only needs to be accurate across a small part of the spectrum, linear baselines are often used. The points used for the baseline should be to both sides of the band for stability.

The scipy.integrate.simps and scipy.integrate.trapz functions provide convenient ways to calculate the integrals of the discrete data. The difference between them is the way the numerical integral is calculated. trapz calculates the area of the trapezoids determined by each pair of successive data points, simps uses Simpson's rule to integrate the spectrum, i.e. the data is treated as a series of second order polynomials which are then integrated (note: this is what the underlying idea of Simpson's rule is, not how it is actually calculated).

The dataset

First, we'll grab the test dataset from the web. It consists of spectra of two made up consecutive reactions taken over some time after the start of the reaction. The reaction follows these kinetics:

$$\mathrm{A}\xrightarrow{k_a}\mathrm{B}\xrightarrow{k_B}\mathrm{C}$$

Each of the reactants has a specific spectrum. The intensities of the bands are proportional to the concentration of each of the reagents.

In [1]:
# set up environment
import numpy as np
import scipy as sp

from scipy.integrate import simps

import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.gridspec as gridspec
%matplotlib notebook

import urllib.request
from io import BytesIO
In [2]:
#get dataset
from spectroscopy_data.reduction import integration
data = integration()
wn = data["wn"]
dataset = data["dataset"]
specA = data["specA"]
specB = data["specB"]
specC = data["specC"]

This is what the series of spectra in this dataset looks like:

In [3]:
plt.figure()
gs = gridspec.GridSpec(1,2,width_ratios=[9, 1])
ax = plt.subplot(gs[0,0])
# plot spectra with color dependent on spectrum number
cmap = mpl.cm.RdYlGn
for spec_num, row in enumerate(dataset):
    plt.plot(wn, row, color=cmap(spec_num/(len(dataset)-1)))
plt.xlabel("wavenumber / $\mathrm{cm^{-1}}$")
# add colorbar (see: https://matplotlib.org/examples/api/colorbar_only.html)
norm = norm = mpl.colors.Normalize(vmin=0, vmax=1)
cb1 = mpl.colorbar.ColorbarBase(plt.subplot(gs[0,1]), cmap=cmap,
                                norm=norm,
                                orientation='vertical')
cb1.set_label("reaction time / a.u.")

Writing a function for easier reuse

Since we will need to integrate several bands and multiple spectra, we will wrap the integration procedure into a function. This allows us to perform similar operations (her integration with different limits applied to different spectra) in a very succinct way.

In [4]:
def integrate_band(wn, I, band, baseline=None, baseline_degree=1, show=False):
    """
    wn: wavenumber
    I: spectrum channel
    band: tuple, containing left and right edge of band
    baseline: list containing tuples with ranges (left and right edge)
        used for baseline
    baseline_degree: degree of polynomial used for baseline
    """
    band = sorted(band)
    # create logical index for band:
    sel_band = (wn >= band[0]) & (wn <= band[1])
    # if baseline regions have been defined, create baseline logical index
    if baseline is not None:
        sel_baseline = np.zeros(wn.shape, bool)
        for base_range in baseline:
            base_range = sorted(base_range)
            sel_baseline = sel_baseline ^ ((wn >= base_range[0]) &
                                           (wn <= base_range[1]))
        poly = np.polyfit(wn[sel_baseline], I[sel_baseline], baseline_degree)
        baseline = np.polyval(poly, wn)
    # if no baseline ranges have been defined, create a zero baseline
    else:
        baseline = np.zeros(sel_band.shape)
    # integrate band
    integral = simps(I[sel_band] - baseline[sel_band], x=wn[sel_band])
    if show:
        # output additional data for trouble shooting
        return integral, baseline, sel_band, sel_baseline
    return integral

Let's have a look how this function is when applied to our spectra. First

In [5]:
data_idx = 2

integral, baseline, sel_band, sel_baseline = integrate_band(
    wn,
    dataset[data_idx],
    band=[1120, 1175],
    baseline=[[1114, 1117], [1174, 1176]],
    show=True)

plt.figure()
plt.plot(wn, dataset[data_idx], label="spectrum")
plt.plot(wn[sel_band], baseline[sel_band], label="baseline")
plt.plot(
    wn[sel_baseline],
    dataset[data_idx][sel_baseline],
    "x",
    label="baseline anchor")
plt.fill_between(
    wn[sel_band],
    baseline[sel_band],
    dataset[data_idx][sel_band],
    alpha=.7,
    label='integral')

plt.annotate("integral = {:.3f}".format(integral),
             (np.mean(wn[sel_band]), np.amax(dataset[data_idx][sel_band])),
            ha="center")
plt.legend()
Out[5]:
<matplotlib.legend.Legend at 0x7f872fa6e7b8>

If apply the same function to a spectrum a bit later in the series, the integral has decreased, as expected.

In [6]:
data_idx = 6

integral, baseline, sel_band, sel_baseline = integrate_band(
    wn,
    dataset[data_idx],
    band=[1120, 1175],
    baseline=[[1114, 1117], [1174, 1176]],
    show=True)

plt.figure()
plt.plot(wn, dataset[data_idx], label="spectrum")
plt.plot(wn[sel_band], baseline[sel_band], label="baseline")
plt.plot(
    wn[sel_baseline],
    dataset[data_idx][sel_baseline],
    "x",
    label="baseline anchor")
plt.fill_between(
    wn[sel_band],
    baseline[sel_band],
    dataset[data_idx][sel_band],
    alpha=.7,
    label='integral')

plt.annotate("integral = {:.3f}".format(integral),
             (np.mean(wn[sel_band]), np.amax(dataset[data_idx][sel_band])),
            ha="center")
plt.legend();

Integrating the full dataset

To apply the function to all spectra, we can use a for loop. The integral for each spectrum is then appended to the list integrals.

In [7]:
integrals = []

for spectrum in dataset:
    integral= integrate_band(
    wn,
    spectrum,
    band=[1120, 1175],
    baseline=[[1114, 1117], [1174, 1176]])
    integrals.append(integral)
plt.figure()
plt.plot(integrals)
plt.xlabel("spectrum index / 1")
plt.ylabel("integral / a.u.");

When integrating multiple bands, we need to specify the integration ranges and baseline ranges for all bands. Below, I am using dictionaries to store these parameters. I could be using lists as well, but the advantage of dictionaries in this case is, that we cannot accidentally get confused about the order of parameters and get completely wrong results, without even noticing.

In [8]:
integral_parameters = {
    "1155":{
        "band":[1120, 1175],
        "baseline":[[1114, 1117], [1174, 1176]]
    },
    "1200":{
        "band":[1175, 1230],
        "baseline":[[1172, 1174], [1230, 1232]]
    }
    ,
    "1500":{
        "band":[1450, 1550],
        "baseline":[[1445, 1450], [1550, 1550]]
    }
}
In [9]:
all_integrals = dict()

for name, parameters in integral_parameters.items():
    integrals = []
    for spectrum in dataset:
        integrals.append(integrate_band(wn, spectrum, **parameters))
    all_integrals[name] = integrals

plt.figure()
for name, integrals in all_integrals.items():
    plt.plot(integrals, label=name)
plt.legend()
plt.xlabel("spectrum index / 1")
plt.ylabel("integral / a.u.");

And, as we can see in the plot above, the three bands trace out the changes in concentration of three components in the two step reaction.

In a future post, I will take a close look at possible problems when integrating bands.