Loading data from text files
Of course the first in any kind of data evaluation is loading data. To make this harder, every spectrometer vendor creates their own file format - which often lack a public documentation. In this case the last resort to get files into python is often to export them to a text file. numpy
comes with a few functions that read files into arrays.
First, let's get some test data.
from spectroscopy_data.pretreatment import loading_data
load_dir, csv_path = loading_data()
import os
These files consist of two columns of data, the first one being the wavenumber axis and the second one the intensity values. The data looks as follows:
1.000000000000000000e+03;0.000000000000000000e+00
1.020408163265306143e+03;0.000000000000000000e+00
1.040816326530612287e+03;0.000000000000000000e+00
1.061224489795918316e+03;0.000000000000000000e+00
1.081632653061224573e+03;0.000000000000000000e+00
1.102040816326530603e+03;0.000000000000000000e+00
1.122448979591836633e+03;0.000000000000000000e+00
and so on.
The filenames look "measurement_series 0.csv" to "measurement_series 99.csv". The main function to load such comma separated files into numpy arrays is np.genfromtxt
. It's first input is either the path to the file we want to load or its contents as a string.
It has a few options to make sure the file is loaded correctly. The most important ones probably are those for skipping the first few rows skiprows
and the option to state select the column separator character.
The column separator in files for this exercise is ;
.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
first_file_path = os.path.join(csv_path, "measurement_series 0.csv")
data = np.genfromtxt(first_file_path, delimiter=";")
plt.figure()
plt.plot(data[:,0], data[:,1])
In the case of column wise data, it can also make sense to use the unpack
option as well. It transposes the loaded data, so that columns are the first index of the array:
first_file_path = os.path.join(csv_path, "measurement_series 0.csv")
data = np.genfromtxt(first_file_path, delimiter=";", unpack=True)
plt.figure()
plt.plot(data[0], data[1])
The folder with raw data contains 100 files in total. It would be tedious to load all of them by hand. If we know what the files are called like, we can of course load them in a loop, like so:
generic_file_path = os.path.join(csv_path, "measurement_series {}.csv")
all_data = []
for file_idx in range(100):
data = np.genfromtxt(generic_file_path.format(file_idx),
delimiter=";",
unpack=True)
all_data.append(data)
plt.figure()
for data in all_data:
plt.plot(data[0], data[1])
Of course, not always is there a straight forward way in which
Optimizing Code
In this post, I will discuss optimizing python code for numeric math on the example of an asymmetric least squares baseline correction algorithm.
Baseline: Asymmetric Least Squares
The last method for baseline correction I talked about used polynomials fit to user defined spectral regions. The drawback of course is, that it is necessary find wavelength ranges that never contain features.
In this post I will take a look at a set of methods that work without the need for parts of the spectrum never containing bands.
Savitzky Golay Filter
The Savitzky-Golay (S-G) filter (Savitzky & Golay) is a convolution based filter, like the moving average filter. However, whereas the moving average filter weighs all all data points inside the window the same, the S-G filter gives each of the points inside the window a different weight. The weights are chosen such that the point at the center of the window is replaced with the value at polynomial least squares fit through the data points has at this position.
Smoothing: moving average
Moving average is the most straight forward way of smoothing data. Each point of the spectrum is replaced by the average of $N$ points to its left and right. The easiest way to apply a moving average filter is to convolve the spectrum with an array of length $2 N + 1$ of the value $\frac{1}{2 N + 1}$.
Integration
Integration of bands is one of the most fundamental operations in spectroscopy. Here, we take look at how to integrate spectra in python.
Baseline Correction 1: Polynomial Baseline
Baseline correction¶
Baseline correction removes slow changes from the system. These can either be offsets in the base line, such as a change in the overall transmission in an absorption measurement, or broad features in the spectrum, such as fluorescence in Raman spectra.