4 Fourier-transform infrared spectroscopy of soil samples

4.1 Protocol and devices

We analysed all of our 29 samples from 2017-2018 campaigns and 532 from 2022-2023 campaigns with mid-infrared spectroscopy to serve as predictors for the prediction model on the soil properties. These techniques are now well knowned and commonly used in soil science as they allow spare time and money on the different laboratory measurements (Wadoux et al. 2021; Viscarra Rossel et al. 2016; Ng et al. 2022; Ge, Wadoux, and Peng 2022; Stenberg et al. 2010; Bahrami, Danesh, and Bahrami 2022).

Before being analysed with FTIR the samples were ground under 1 µm with a Pulverisette 5/4, classic line (Fritsh, Idar-Oberstein, Germany) in a 250 ml stainless steel hardness container (ISO: X105CrMo17) with five 20 mm sintered corrodium (99.7 % Al₂O₃) grinding balls. The settings were set at 350 turns per minute for 12 minutes in total.

The last step before FTIR analyse was to realise lenses from the samples with KBr pressling method. 250 mg of potassium brodime (KBr) and 1-1.3 mg of sample substance are mixed and then loaded in a hydraulic press under vacuum with around 10 - 11 tonnes and pressed for 1-2 minutes to form a transparent tablet, e.g. 10 mm in diameter and 1 mm thick.

This tablet was analysed under mid-infrared Fourier-transform spectroscopy with a Vertex 80v (Bruker OPTIK GmbH; Germany) under a control environment. The measurement resolution is 4 cm^-1 in the interval of 375 - 4500 cm ^-1, the spectrum is in absorbance and the source is MIR with the optic filter. To calibrate these samples, one control tablet made of 100% of KBr was measured at the beginning of each measurement session.

4.2 Raw spectra production

A script was written to produce a raw spectra in absorbance for 375 - 4500 cm ^-1 interval with a 4 cm^-1 resolution and export it under a .txt format.

# 0 Environment setup ##########################################################

# 0.1 Prepare environment ======================================================
# Folder check
getwd()

# Select folder
setwd(getwd())

# Clean up workspace
rm(list = ls(all.names = TRUE))

# 0.2 Install packages =========================================================

# Load packages
install.packages("pacman")  #Install and load the "pacman" package (allow easier download of packages)
library(pacman)
pacman::p_load(prospectr, remotes, caret , dplyr, readr)   ### Install required packages
remotes::install_github("philipp-baumann/simplerspec") #Install package from Baumann for spectral analysis
library(simplerspec)

# 0.3 Show session infos =======================================================
sessionInfo()

## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22621)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8   
## [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=French_France.utf8    
## 
## time zone: Asia/Tbilisi
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] simplerspec_0.2.1 foreach_1.5.2     dplyr_1.1.4       caret_6.0-94     
## [5] lattice_0.22-6    ggplot2_3.5.1     readr_2.1.5       remotes_2.5.0    
## [9] prospectr_0.2.7  
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6         xfun_0.46            bslib_0.8.0         
##  [4] recipes_1.1.0        mathjaxr_1.6-0       tzdb_0.4.0          
##  [7] vctrs_0.6.5          tools_4.4.0          generics_0.1.3      
## [10] curl_5.2.1           stats4_4.4.0         parallel_4.4.0      
## [13] proxy_0.4-27         tibble_3.2.1         fansi_1.0.6         
## [16] ModelMetrics_1.2.2.2 pkgconfig_2.0.3      Matrix_1.7-0        
## [19] data.table_1.15.4    lifecycle_1.0.4      stringr_1.5.1       
## [22] compiler_4.4.0       munsell_0.5.1        codetools_0.2-20    
## [25] htmltools_0.5.8.1    class_7.3-22         sass_0.4.9          
## [28] yaml_2.3.10          prodlim_2024.06.25   pillar_1.9.0        
## [31] jquerylib_0.1.4      MASS_7.3-60.2        cachem_1.1.0        
## [34] gower_1.0.1          iterators_1.0.14     rpart_4.1.23        
## [37] nlme_3.1-164         parallelly_1.38.0    lava_1.8.0          
## [40] tidyselect_1.2.1     digest_0.6.36        stringi_1.8.4       
## [43] future_1.34.0        reshape2_1.4.4       purrr_1.0.2         
## [46] bookdown_0.40        listenv_0.9.1        splines_4.4.0       
## [49] fastmap_1.2.0        grid_4.4.0           colorspace_2.1-1    
## [52] cli_3.6.3            magrittr_2.0.3       survival_3.5-8      
## [55] utf8_1.2.4           e1071_1.7-14         future.apply_1.11.2 
## [58] withr_3.0.2          scales_1.3.0         lubridate_1.9.3     
## [61] timechange_0.3.0     rmarkdown_2.27       globals_0.16.3      
## [64] nnet_7.3-19          timeDate_4032.109    hms_1.1.3           
## [67] evaluate_0.24.0      knitr_1.48           hardhat_1.4.0       
## [70] rlang_1.1.4          Rcpp_1.0.13          glue_1.7.0          
## [73] pROC_1.18.5          ipred_0.9-15         rstudioapi_0.16.0   
## [76] jsonlite_1.8.8       R6_2.5.1             plyr_1.8.9

# 01 Prepare and import data ---------------------------------------------------

# 01.1 Import the raw spectra files ############################################
lfMIR <- read_opus_univ(fnames = dir("./data/spectra/", full.names = TRUE), extract = c("spc"))

# 01.2 Preparing the MIR Data ##################################################

MIRspec_tbl <- lfMIR %>%
  gather_spc() %>%    # Gather list of spectra data into tibble data frame
  resample_spc(wn_lower = 375, wn_upper = 4500, wn_interval = 4) %>%     # Resample spectra to new wavenumber interval
  average_spc(by = "sample_id") # Average replicate scans per sample_id

MIRspec_tbl_rs <- MIRspec_tbl[seq(1, nrow(MIRspec_tbl)), c("sample_id", "metadata", "wavenumbers_rs", "spc_mean")]
MIRspec_tbl_rs <- MIRspec_tbl_rs[,-c(2,3)]  

# Create a data frame and change the columns names
MIRspec_wn <- data.frame(matrix(unlist(MIRspec_tbl_rs$spc_mean), nrow = nrow(MIRspec_tbl_rs), byrow = TRUE), stringsAsFactors = FALSE)
rownames(MIRspec_wn) <- substring(MIRspec_tbl$sample_id, 1)[seq(1, nrow(MIRspec_tbl))]
wn <- list(names(MIRspec_tbl_rs[[2]][[1]]))
wn <- unlist(wn)
names(MIRspec_wn) <- wn
MIRspec_wn <- MIRspec_wn[, -c(1)] # Remove NA values from the 4499  cm-1

# 01.3 Export the MIR raw data #################################################
write.table(MIRspec_wn, "./export/spectra/Spectra_raw_Wavenumber.txt", dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE)

4.3 Prepare the spectra regarding state of the art

4.3.1 Interval interferances

The MIR spectra present interference in different intervals such as 375 - 499 cm^-1 area and 2451 - 2500 cm ^-1 area (Ng et al. 2018).

# 02 Prepare the spectra regarding the state of the art ------------------------

# 2.1 First cleaning of the spectra ============================================

MIRspec_tbl <- lfMIR %>%
  gather_spc() %>%    # Gather list of spectra data into tibble data frame
  resample_spc(wn_lower = 499, wn_upper = 4500, wn_interval = 4) %>%     # Resample spectra to new wavenumber interval
  average_spc(by = "sample_id") # Average replicate scans per sample_id

MIRspec_tbl_rs <- MIRspec_tbl[seq(1, nrow(MIRspec_tbl)), c("sample_id", "metadata", "wavenumbers_rs", "spc_mean")]
MIRspec_tbl_rs <- MIRspec_tbl_rs[,-c(2,3)]  

# Create a data frame and change the columns names
MIRspec_wn <- data.frame(matrix(unlist(MIRspec_tbl_rs$spc_mean), nrow = nrow(MIRspec_tbl_rs), byrow = TRUE), stringsAsFactors = FALSE)
rownames(MIRspec_wn) <- substring(MIRspec_tbl$sample_id, 1)[seq(1, nrow(MIRspec_tbl))]
wn <- list(names(MIRspec_tbl_rs[[2]][[1]]))
wn <- unlist(wn)
names(MIRspec_wn) <- wn
MIRspec_wn <- MIRspec_wn[, -c(1)] # Remove NA values from the 4499  cm-1

# 2.1 Remove interference =====================================================

MIRspec_wn <- MIRspec_wn[, -c(500:511)]

4.3.2 Outlier values

Some values of the spectra can also present high interference and therefore should be removed or at least be noticed (Ng et al. 2018). This concerned values over 2 or lower than -2. Here we export the different rows containing values over 2.

# Check max value and remove it
max(MIRspec_wn)  # Remove value higher than + 2 (Ng et al., 2018; Curran et al., 1996)
MIRspec_wn_remove_up <- MIRspec_wn[apply(MIRspec_wn, 1, function(row) any(row > 2)), ] 

# Check lower value and remove it
min(MIRspec_wn)  # Remove value lower than - 2 (Ng et al., 2018; Curran et al., 1996)
MIRspec_wn <- MIRspec_wn[!rowSums(MIRspec_wn < -2),]
MIRspec_wn_remove_low <- MIRspec_wn[apply(MIRspec_wn, 1, function(row) any(row < - 2)), ]

MIRspec_wn_remove <- cbind(MIRspec_wn_remove_low, MIRspec_wn_remove_up)
write.table(MIRspec_wn_remove, "./export/Interference_spectra.txt", dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE)

4.4 Convert into different spectra variation

4.4.1 Convert into wavelength

# 03 Convert into different spectra variation ----------------------------------

# 3.1 Convert in Wavelength ####################################################
MIRspec_wn <- anti_join(MIRspec_wn, MIRspec_wn_remove, by = names(MIRspec_wn))
MIRspec_wl <- MIRspec_wn
wn <- (1 / as.numeric(names(MIRspec_wn))) * 1e7
wn <- round(wn, digits=0)
names(MIRspec_wl) <- wn 
MIRspec_wl <- na.omit(MIRspec_wl)

4.4.2 Spectra transformations

We converted the MIR data into a total of 14 spectra transformations according to literature review (Ludwig et al. 2023; Ng et al. 2018)

# 3.2 Prepare the functions ####################################################

# Remove near zero variable
remove_nzv <- function(x){
  y <- nearZeroVar(x, saveMetrics = TRUE)
  ifelse(sum(y$nzv == TRUE) == 0, x <- x, x <- x[-nearZeroVar(x)])
  return(x)
}       

# Remove variable with high correlation
remove_hcd <- function(x){
  y <- findCorrelation(x, cutoff = .98)
  ifelse(sum(y) == 0, x <- x, x <- x[,-y])
  return(x)
}

# 3.3 Convert in other spectra #################################################

Spectra <- MIRspec_wn
#Different transformation of the Spectra according to Ludwig et al. 2023
IRspectraList_Raw <- list("SG 1.5" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 1, w = 5)),
                          "SG 1.11" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 1, w = 11)),
                          "SG 1.17" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 1, w = 17)),
                          "SG 1.23" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 1, w = 23)),
                          "SG 2.5" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 5)),
                          "SG 2.11" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 11)),
                          "SG 2.17" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 17)),
                          "SG 2.23" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 23)),
                          "moving averages 5" = as.data.frame(prospectr::movav(Spectra, w = 5)),
                          "moving averages 11" = as.data.frame(prospectr::movav(Spectra, w = 11)),
                          "moving averages 17" = as.data.frame(prospectr::movav(Spectra, w = 17)),
                          "moving averages 23" = as.data.frame(prospectr::movav(Spectra, w = 23)),
                          "SNV-SG" = as.data.frame(prospectr::standardNormalVariate(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 11))), #Best option for machine learning treatment (See Ng et al. 2018)
                          "continuum removal" = as.data.frame(prospectr::continuumRemoval(Spectra, as.numeric(colnames(Spectra)), type = "R")))

IRspectraList <- IRspectraList_Raw

# Remove the near zero and highly correlated values
for (i in 1:length(IRspectraList)) {
  IRspectraList[[i]] <- remove_hcd(IRspectraList[[i]])
  IRspectraList[[i]] <- remove_nzv(IRspectraList[[i]])
}

IRspectraList <- c(IRspectraList, list("raw" = as.data.frame(Spectra)))

# 3.4 Export all the spectra ###################################################

#Export csv of files
for (i in names(IRspectraList)) {
  write.table(IRspectraList[i], file = paste0("./export/spectra/Full_spectra_",names(IRspectraList[i]),".txt"), dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE, fileEncoding = "UTF-8")
}

write.table(MIRspec_wn, "./export/spectra/Spectra_Wavenumber.txt", dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE)
write.table(MIRspec_wl, "./export/spectra/Spectra_Wavelength.txt", dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE)

4.5 Plot the spectra

4.5.1 The spectra in absorbance

# 04 Plot the Spectrum of the spectra ------------------------------------------

# 4.1 Plot the absorbance ######################################################

IRSpectra <- as.data.frame(row.names(MIRspec_wl))
IRSpectra$wn <- MIRspec_wn
IRSpectra$wnA<- log(1/MIRspec_wn) # Convert to wave number absorbance
IRSpectra$wl <- MIRspec_wl
IRSpectra$wlA <- log(1/MIRspec_wl) # Convert to wavelength absorbance
rownames(IRSpectra) <- IRSpectra$`row.names(MIRspec_wl)`

png("Soil spectra in wavenumbers absorbance.png", width = 297, height = 210, units = "mm", res = 300)
pdf("Soil spectra in wavenumbers absorbance.pdf", width = 10*2, height = 6*2)

matplot(x = colnames(IRSpectra$wn), y = t(IRSpectra$wnA),
        xlab = expression(paste("Wavenumber ", "(cm"^"-1", ")")),
        ylab = "Absorbance",
        type = "l",   #"l" = ligne
        lty = 1,
        col = 1:nrow(IRSpectra$wn))
dev.off()


png("Soil spectra in wavelength absorbance.png", width = 297, height = 210, units = "mm", res = 300)
pdf("Soil spectra in wavelength absorbance.pdf", width = 10*2, height = 6*2)

matplot(x = colnames(IRSpectra$wl), y = t(IRSpectra$wlA),
        xlab = "Wavelength /nm ",
        ylab = "Absorbance",
        type = "l",   #"l" = ligne
        lty = 1,
        col = 1:nrow(IRSpectra$wl))
dev.off()

4.5.2 The spectra in reflectance

# 4.2 Plot the reflectance #####################################################
png("Soil spectra in wavenumbers reflectance.png", width = 297, height = 210, units = "mm", res = 300)
pdf("Soil spectra in wavenumbers reflectance.pdf", width = 10*2, height = 6*2)

matplot(x = colnames(IRSpectra$wn), y = t(IRSpectra$wn),
        xlab = expression(paste("Wavenumber ", "(cm"^"-1", ")")),
        ylab = "Reflectance",
        type = "l",   #"l" = ligne
        lty = 1,
        col = 1:nrow(IRSpectra$wn))
dev.off()


png("Soil spectra in wavelength reflectance.png", width = 297, height = 210, units = "mm", res = 300)
pdf("Soil spectra in wavelength reflectance.pdf", width = 10*2, height = 6*2)

matplot(x = colnames(IRSpectra$wl), y = t(IRSpectra$wl),
        xlab = "Wavelength /nm ",
        ylab = "Reflectance",
        type = "l",   #"l" = ligne
        lty = 1,
        col = 1:nrow(IRSpectra$wl))
dev.off()

4.6 Kennard Stone sampling

The Kennard stone sampling is used to select a wide range of a population that will represent the diversity of the individuals (Kennard and Stone 1969). This sampling strategy has proven to be efficient in the soils spectrometry context (Ramirez-Lopez et al. 2014). We selected different numbers of samples for each depth increment with an over-representation of the topsoil (0 - 10 cm). All the samples from 2017 - 2018 were analysed, so no sampling strategy has been applied to them.

(#tab:sampling table)Ratio of C factor between different land uses.
Depth (cm)	Year	Number
0 - 10	2022	30
10 - 30	2022	5
30 - 50	2022	5
50 - 70	2022	5
70 - 100	2022	5
0 - 10	2023	10
10 - 30	2023	4
30 - 50	2023	6
50 - 70	2023	6
70 - 100	2023	3

In the selection of samples, you can choose the year and depth increment. Here, we show a selection for the 2022 campaign at 0 - 10 cm depth. The table Samples_info use the information collected during the field campaign and the Field_observations file. The different entries are:

Lab_ID Laboratory number gave at the samples when enter into TÃ¼bingen Soil Science and Geomrophology laboratory inventory.
Site_name the site’s name according to the CLHS sampling.
Depth_cm depth increment of the sampling (0 - 10 - 30 - 50 - 70 - 100 cm).
X_WGS84 longitude in WGS84 (epsg:4326), measured from Garmin, GPSMAP 60Cx with ± 10 - 3 m accuracy (depending of satellite coverage).
Y_WGS84 latitude in WGS84 (epsg:4326), measured from Garmin, GPSMAP 60Cx with ± 10 - 3 m accuracy (depending of satellite coverage).

# 05 Kennard Stone sampling ----------------------------------------------------
# 5.1 Select the samples #######################################################

Samples_info <- read_delim("./data/Samples_info.csv", delim = ";")
selection <- Samples_info[Samples_info$Depth_cm == "0_10" & grepl("B07_2022", Samples_info$Site_name), ] # You can select the year and the depth

# Select the samples to analyse
MIRsample <- MIRspec_wl[rownames(MIRspec_wl) %in% selection$Lab_ID,]

# 5.2 Run the sampling #########################################################
sample <- kenStone(MIRsample, k = 30, metric = "euclid") # Make samples with Kennard Stone in euclidian distance

MIRsample <- MIRsample[sample$model,]
MIRsample <- row.names(MIRsample)
write.table(MIRsample, "./export/samples/B07_2022 Samples 0 - 10 cm.txt", dec = ".", sep = ";", row.names = FALSE, col.names = FALSE, append = FALSE)

References

Bahrami, Amir, Majid Danesh, and Mehdi Bahrami. 2022. “Studying Sand Component of Soil Texture Using the Spectroscopic Method.” Infrared Physics & Technology 122 (May): 104056. https://doi.org/10.1016/j.infrared.2022.104056.

Ge, Y., Alexandre M. J.-C. Wadoux, and Y Peng. 2022. A Primer on Soil Analysis Using Visible and Near-Infrared (Vis-NIR) and Mid-Infrared (MIR) Spectroscopy. FAO.

Kennard, R. W., and L. A. Stone. 1969. “Computer Aided Design of Experiments.” Technometrics 11 (1): 137–48. https://doi.org/10.2307/1266770.

Ludwig, Bernard, Isabel Greenberg, Michael Vohland, and Kerstin Michel. 2023. “Optimised Use of Data Fusion and Memory‐based Learning with an Austrian Soil Library for Predictions with Infrared Data.” European Journal of Soil Science 74 (4): e13394. https://doi.org/10.1111/ejss.13394.

Ng, Wartini, Brendan Malone, Budiman Minasny, and Sangho Jeon. 2022. “Near and Mid Infrared Soil Spectroscopy.” In Reference Module in Earth Systems and Environmental Sciences, B9780128229743000227. Elsevier. https://doi.org/10.1016/B978-0-12-822974-3.00022-7.

Ng, Wartini, Budiman Minasny, Brendan Malone, and Patrick Filippi. 2018. “In Search of an Optimum Sampling Algorithm for Prediction of Soil Properties from Infrared Spectra.” PeerJ 6 (October): e5722. https://doi.org/10.7717/peerj.5722.

Ramirez-Lopez, Leonardo, Karsten Schmidt, Thorsten Behrens, Bas van Wesemael, Jose A. M. Demattê, and Thomas Scholten. 2014. “Sampling Optimal Calibration Sets in Soil Infrared Spectroscopy.” Geoderma 226-227 (August): 140–50. https://doi.org/10.1016/j.geoderma.2014.02.002.

Stenberg, Bo, Raphael A. Viscarra Rossel, Abdul Mounem Mouazen, and Johanna Wetterlind. 2010. “Visible and Near Infrared Spectroscopy in Soil Science.” In Advances in Agronomy, 107:163–215. Elsevier. https://doi.org/10.1016/S0065-2113(10)07005-7.

Viscarra Rossel, R. A., T. Behrens, E. Ben-Dor, D. J. Brown, J. A. M. Demattê, K. D. Shepherd, Z. Shi, et al. 2016. “A Global Spectral Library to Characterize the World’s Soil.” Earth-Science Reviews 155 (April): 198–230. https://doi.org/10.1016/j.earscirev.2016.01.012.

Wadoux, Alexandre M. J.-C., Brendan Malone, Budiman Minasny, Mario Fajardo, and Alex B. McBratney. 2021. Soil Spectral Inference with R Analysing Digital Soil Spectra. S.l.: SPRINGER NATURE. http://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=2760028.