BoxSERS is a complete, ready-to-use Python library for applying data augmentation, dimensionality reduction, spectral correction, machine learning, and other methods designed and adapted for vibrational spectra (Raman, FTIR, SERS, etc.).
BoxSERS Installation
From PyPI
pip install boxsers
From Github
pip install git+https://github.com/ALebrun-108/BoxSERS.git
Requirements
The main packages needed to run the code are listed below:
- scikit-learn
- SciPy
- NumPy
- pandas
- Matplotlib
- TensorFlow (GPU or CPU)
Labels associated with spectra can take one of the following three forms:
Label Type | Examples |
---|---|
Text | Cholic, Deoxycholic, Lithocholic, ... |
Integer | 0, 3, 1, ... |
Binary | [1 0 0 0], [0 0 0 1], [0 1 0 0], ... |
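The three label forms can be converted into one another. The following is a minimal NumPy sketch of one way to do so; the label values and variable names are illustrative assumptions and are not part of BoxSERS.

```python
import numpy as np

# Hypothetical text labels, one per spectrum
text_labels = np.array(['Cholic', 'Lithocholic', 'Cholic', 'Deoxycholic'])

# Text -> integer: np.unique returns the sorted class names and, with
# return_inverse=True, the index of each label in that sorted list
class_names, int_labels = np.unique(text_labels, return_inverse=True)

# Integer -> binary: select rows of the identity matrix
bin_labels = np.eye(len(class_names), dtype=int)[int_labels]

# Binary -> integer -> text, to recover the original labels
recovered = class_names[bin_labels.argmax(axis=1)]
```

Note that the integer codes follow the alphabetical order of the class names, so the same class list must be reused when decoding predictions.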
Included Features
Module boxsers.misc_tools
This module provides functions for a variety of utilities.
- data_split: Randomly splits an initial set of spectra into two new subsets, named subset A and subset B in this function.
- load_rruff: Exports a subset of Raman spectra from the RRUFF database as three related lists containing Raman shifts, intensities, and mineral names.
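To illustrate the kind of random split described for data_split, here is a minimal NumPy sketch. The helper `split_two_subsets` and its `seed` argument are hypothetical and only mirror the idea of a subset B fraction; this is not the BoxSERS implementation.

```python
import numpy as np

def split_two_subsets(spec, lab, b_size=0.4, seed=None):
    """Randomly split spectra and labels into subset A and subset B.

    Hypothetical helper for illustration; b_size is the fraction of
    spectra assigned to subset B.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(spec))        # shuffled row indices
    n_b = int(round(b_size * len(spec)))      # number of spectra in subset B
    idx_b, idx_a = order[:n_b], order[n_b:]
    return spec[idx_a], spec[idx_b], lab[idx_a], lab[idx_b]

spec = np.random.default_rng(0).random((10, 50))   # 10 synthetic spectra, 50 points each
lab = np.arange(10) % 2                            # synthetic integer labels
spec_a, spec_b, lab_a, lab_b = split_two_subsets(spec, lab, b_size=0.4, seed=1)
```

Shuffling the row indices once and slicing them keeps each spectrum paired with its label in both subsets.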
Module boxsers.visual_tools
This module provides different tools to visualize vibrational spectra quickly.
- spectro_plot: Returns a plot of the selected spectrum(s).
- random_plot: Plots a number of randomly selected spectra from a set of spectra.
- distribution_plot: Returns a bar plot that represents the distribution of spectra for each class in a given set of spectra.
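from boxsers.misc_tools import data_split
from boxsers.visual_tools import spectro_plot, random_plot, distribution_plot
# Assumed inputs: wn is the x-axis (e.g. Raman shifts), spec is an array with one spectrum per row, and lab holds the labels
# Randomly split the spectra and labels into training and test subsets
(spec_train, spec_test, lab_train, lab_test) = data_split(spec, lab, b_size=0.4)
# Plot the class distribution of the training set
distribution_plot(lab_train, title='Train set distribution')
# Plot 4 randomly selected spectra
random_plot(wn, spec, random_spectra=4)
# Plot the first and third spectra
spectro_plot(wn, spec[0], spec[2])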
Module boxsers.preprocessing
This module provides multiple functions to preprocess vibrational spectra. These features
improve spectrum quality and can improve performance for machine learning applications.
- baseline_substraction: Subtracts the baseline signal from the spectrum(s) using Asymmetric Least Squares estimation.
- intensity_normalization: Normalizes the spectrum(s) using one of the norms available in this function.
- savgol_smoothing: Smooths the spectrum(s) using a Savitzky-Golay polynomial filter.
- spectral_cut: Removes or sets to zero a delimited spectral region of the spectrum(s).
- spline_interpolation: Performs a one-dimensional spline interpolation on the spectra to reproduce them with a new x-axis.
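import numpy as np
from boxsers.preprocessing import als_baseline_cor, spectral_cut, spectral_normalization, spline_interpolation
# Interpolate the spectra onto a new x-axis of 1000 points between 500 and 3000
new_wn = np.linspace(500, 3000, 1000)
spec_cor = spline_interpolation(spec, wn, new_wn)
# Subtract the baseline estimated by asymmetric least squares
(spec_cor, baseline) = als_baseline_cor(spec, lam=1e4, p=0.001, niter=10)
# Normalize the spectra
spec_cor = spectral_normalization(spec)
# Cut the spectral region between wn_start and wn_end (both must be defined beforehand)
spec_cor, wn_cor = spectral_cut(spec, wn, wn_start, wn_end)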
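For readers unfamiliar with asymmetric least squares baseline estimation, the sketch below shows the general idea with dense NumPy matrices on a synthetic spectrum. The function `als_baseline` is a hypothetical stand-in written for illustration, not the BoxSERS implementation; it reuses the `lam`, `p`, and `niter` values from the example above and is only practical for short spectra.

```python
import numpy as np

def als_baseline(y, lam=1e4, p=0.001, niter=10):
    """Estimate the baseline of one spectrum by asymmetric least squares.

    Dense-matrix sketch for short spectra (hypothetical helper).
    """
    n_pts = len(y)
    d = np.diff(np.eye(n_pts), n=2, axis=0)   # second-difference operator, shape (n_pts - 2, n_pts)
    penalty = lam * d.T @ d                   # smoothness penalty
    w = np.ones(n_pts)
    for _ in range(niter):
        # Weighted least-squares fit with the smoothness penalty
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the fit (peaks) get weight p, points below get 1 - p
        w = np.where(y > z, p, 1 - p)
    return z

# Synthetic spectrum: one Gaussian peak on a linear baseline
wn = np.linspace(500, 3000, 200)
spec = 1e-4 * (wn - 500) + np.exp(-0.5 * ((wn - 1500) / 20) ** 2)
baseline = als_baseline(spec)
spec_cor = spec - baseline
```

A larger `lam` gives a stiffer baseline, while a smaller `p` makes the fit ignore peaks more strongly; production code would typically use sparse matrices instead of the dense solve shown here.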
Module boxsers.data_augmentation
This module provides several data augmentation methods that generate new spectra by adding
different variations to existing spectra.
- aug_mixup: Randomly generates new spectra by mixing together several spectra with weights drawn from a Dirichlet probability distribution.
- aug_noise: Randomly generates new spectra with Gaussian noise added.
- aug_multiplier: Randomly generates new spectra with multiplicative factors applied.
- aug_ioffset: Randomly generates new spectra shifted in intensity.
- aug_xshift: Randomly generates new spectra shifted along the x-axis.
- aug_linslope: Randomly generates new spectra with additional linear slopes.
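import numpy as np
from sklearn.utils import shuffle
from boxsers.data_augmentation import aug_noise, aug_multiplier
from boxsers.visual_tools import spectro_plot
# Generate a noisy and a rescaled version of the spectra and compare them with the originals
spec_nse, lab_nse = aug_noise(spec, lab, snr=10)
spec_mult, lab_mult = aug_multiplier(spec, lab, 0.15)
legend = ['spec', 'spec_nse', 'spec_mult']
spectro_plot(wn, spec, spec_nse, spec_mult, legend=legend)
# Generate two augmented spectra per original spectrum with each method
spec_nse, lab_nse = aug_noise(spec, lab, snr=10, quantity=2, mode='random')
spec_mul, lab_mul = aug_multiplier(spec, lab, 0.15, quantity=2, mode='random')
# Stack the original and augmented spectra (lab is assumed to hold binary labels)
spec_aug = np.vstack((spec, spec_nse, spec_mul))
lab_aug = np.vstack((lab, lab_nse, lab_mul))
# Shuffle spectra and labels together
spec_aug, lab_aug = shuffle(spec_aug, lab_aug)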
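The mixup idea, blending several spectra with Dirichlet-distributed weights, can be sketched in plain NumPy as follows. The `mixup` helper, its parameters (`n_new`, `n_mix`, `alpha`), and the synthetic data are assumptions made for illustration and do not reflect the aug_mixup signature.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(spec, lab, n_new, n_mix=2, alpha=0.5):
    """Blend n_mix randomly chosen spectra into each new spectrum.

    Hypothetical helper: mixing weights are drawn from a symmetric
    Dirichlet(alpha) distribution, and the binary labels are blended
    with the same weights.
    """
    idx = rng.integers(0, len(spec), size=(n_new, n_mix))          # spectra to blend
    weights = rng.dirichlet(alpha * np.ones(n_mix), size=n_new)    # each row sums to 1
    spec_new = np.einsum('ij,ijk->ik', weights, spec[idx])
    lab_new = np.einsum('ij,ijk->ik', weights, lab[idx])
    return spec_new, lab_new

spec = rng.random((6, 100))            # 6 synthetic spectra, 100 points each
lab = np.eye(3)[[0, 0, 1, 1, 2, 2]]    # binary labels for 3 classes
spec_mix, lab_mix = mixup(spec, lab, n_new=4)
```

Because the weights of each new spectrum sum to one, the blended labels remain valid class proportions.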
Module boxsers.dimension_reduction
This module provides different techniques to perform dimensionality reduction of
vibrational spectra.
- SpectroPCA: Principal component analysis (PCA) model.
- SpectroICA: Independent component analysis (ICA) model.
Dimensionality Reduction
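from boxsers.machine_learning.dimension_reduction import SpectroPCA, SpectroICA
# Fit a 50-component PCA model on the training spectra
pca_model = SpectroPCA(n_comp=50)
pca_model.fit_model(spec_train)
# Scatter plot of the test spectra along components 1 and 2
pca_model.scatter_plot(spec_test, lab_test, targets=classnames, component_x=1, component_y=2)
# Plot the second component as a function of wn
pca_model.component_plot(wn, component=2)
# Project the test spectra onto the PCA components
spec_pca = pca_model.transform_spectra(spec_test)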
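As background for the fit, component, and transform steps above, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. The synthetic data and variable names are assumptions; BoxSERS may compute PCA differently.

```python
import numpy as np

rng = np.random.default_rng(0)
spec_train = rng.random((40, 200))   # synthetic training spectra
spec_test = rng.random((10, 200))    # synthetic test spectra

n_comp = 5
mean_spec = spec_train.mean(axis=0)

# Rows of vt are the principal components, ordered by decreasing variance
_, sing_vals, vt = np.linalg.svd(spec_train - mean_spec, full_matrices=False)
components = vt[:n_comp]
explained_ratio = sing_vals[:n_comp] ** 2 / np.sum(sing_vals ** 2)

# Project the centered test spectra onto the components to get their scores
spec_pca = (spec_test - mean_spec) @ components.T
```

The test spectra are centered with the training mean so that both sets are expressed in the same component space.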
Unsupervised Machine Learning
from boxsers.machine_learning import SpectroGmixture, SpectroKmeans
# Fit a k-means model with 5 clusters on the training spectra
kmeans_model = SpectroKmeans(n_cluster=5)
kmeans_model.fit_model(spec_train)
# Scatter plot of the test spectra
kmeans_model.scatter_plot(spec_test)
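The clustering step can be illustrated with a short NumPy version of the standard k-means iteration (Lloyd's algorithm). The `kmeans` function, its arguments, and the synthetic two-group data are hypothetical and are not the SpectroKmeans implementation.

```python
import numpy as np

def kmeans(spec, n_cluster, n_iter=50, seed=0):
    """Cluster spectra with Lloyd's algorithm (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Start from n_cluster distinct spectra chosen at random
    centers = spec[rng.choice(len(spec), size=n_cluster, replace=False)]
    for _ in range(n_iter):
        # Assign each spectrum to its nearest center (Euclidean distance)
        dist = np.linalg.norm(spec[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        # Move each center to the mean of its spectra; empty clusters stay in place
        new_centers = np.array([
            spec[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
            for k in range(n_cluster)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign, centers

# Two well-separated groups of synthetic spectra
rng = np.random.default_rng(1)
spec = np.vstack((rng.normal(0.0, 0.1, size=(20, 30)),
                  rng.normal(5.0, 0.1, size=(20, 30))))
assign, centers = kmeans(spec, n_cluster=2)
```

The result depends on the random initial centers, so real analyses usually repeat the procedure with several seeds and keep the best run.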
Supervised Machine Learning
- Convolutional Neural Network (3 x 1D convolutional layers, 2 x dense layers)
from boxsers.pca_model import SpectroPCA, SpectroFA, SpectroICA
ica_model = SpectroICA(n_comp=50)
ica_model.fit_model(x_train)
ica_model.scatter_plot(x_test, y_test, targets=classnames, comp_x=1, comp_y=2)
ica_model.pca_component(wn, 2)
x_ica = ica_model.transform_spectra(x_train)
Module boxsers.validation_metrics
This module provides different tools to evaluate the quality of a model’s predictions.
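As an example of what evaluating predictions can involve, the sketch below computes a confusion matrix, accuracy, recall, and precision from integer labels with NumPy. The `confusion_matrix` helper and the label arrays are illustrative assumptions and do not describe the functions of this module.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_class):
    """Count spectra per (true class, predicted class) pair from integer labels."""
    cm = np.zeros((n_class, n_class), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # rows: true class, columns: predicted class
    return cm

y_true = np.array([0, 0, 1, 1, 2, 2])   # hypothetical true labels
y_pred = np.array([0, 1, 1, 1, 2, 0])   # hypothetical model predictions

cm = confusion_matrix(y_true, y_pred, n_class=3)
accuracy = np.trace(cm) / cm.sum()            # fraction of correct predictions
recall = np.diag(cm) / cm.sum(axis=1)         # per class: correct / true count
precision = np.diag(cm) / cm.sum(axis=0)      # per class: correct / predicted count
```

Per-class recall and precision are worth reporting alongside accuracy when the classes are unevenly represented.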