4 Fourier-transform infrared spectroscopy of soil samples
4.1 Protocol and devices
We analysed all of our 29 samples from 2017-2018 campaigns and 532 from 2022-2023 campaigns with mid-infrared spectroscopy to serve as predictors for the prediction model on the soil properties. These techniques are now well knowned and commonly used in soil science as they allow spare time and money on the different laboratory measurements (Wadoux et al. 2021; Viscarra Rossel et al. 2016; Ng et al. 2022; Ge, Wadoux, and Peng 2022; Stenberg et al. 2010; Bahrami, Danesh, and Bahrami 2022).
Before being analysed with FTIR the samples were ground under 1 µm with a Pulverisette 5/4, classic line (Fritsh, Idar-Oberstein, Germany) in a 250 ml stainless steel hardness container (ISO: X105CrMo17) with five 20 mm sintered corrodium (99.7 % Al2O3) grinding balls. The settings were set at 350 turns per minute for 12 minutes in total.
The last step before FTIR analyse was to realise lenses from the samples with KBr pressling method. 250 mg of potassium brodime (KBr) and 1-1.3 mg of sample substance are mixed and then loaded in a hydraulic press under vacuum with around 10 - 11 tonnes and pressed for 1-2 minutes to form a transparent tablet, e.g. 10 mm in diameter and 1 mm thick.
This tablet was analysed under mid-infrared Fourier-transform spectroscopy with a Vertex 80v (Bruker OPTIK GmbH; Germany) under a control environment. The measurement resolution is 4 cm-1 in the interval of 375 - 4500 cm -1, the spectrum is in absorbance and the source is MIR with the optic filter. To calibrate these samples, one control tablet made of 100% of KBr was measured at the beginning of each measurement session.
4.2 Raw spectra production
A script was written to produce a raw spectra in absorbance for 375 - 4500 cm -1 interval with a 4 cm-1 resolution and export it under a .txt format.
# 0 Environment setup ##########################################################
# 0.1 Prepare environment ======================================================
# Folder check
getwd()
# Select folder
setwd(getwd())
# Clean up workspace
rm(list = ls(all.names = TRUE))
# 0.2 Install packages =========================================================
# Load packages
install.packages("pacman") #Install and load the "pacman" package (allow easier download of packages)
library(pacman)
pacman::p_load(prospectr, remotes, caret , dplyr, readr) ### Install required packages
remotes::install_github("philipp-baumann/simplerspec") #Install package from Baumann for spectral analysis
library(simplerspec)
## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22621)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=French_France.utf8 LC_CTYPE=French_France.utf8
## [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C
## [5] LC_TIME=French_France.utf8
##
## time zone: Asia/Tbilisi
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] simplerspec_0.2.1 foreach_1.5.2 dplyr_1.1.4 caret_6.0-94
## [5] lattice_0.22-6 ggplot2_3.5.1 readr_2.1.5 remotes_2.5.0
## [9] prospectr_0.2.7
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.46 bslib_0.8.0
## [4] recipes_1.1.0 mathjaxr_1.6-0 tzdb_0.4.0
## [7] vctrs_0.6.5 tools_4.4.0 generics_0.1.3
## [10] curl_5.2.1 stats4_4.4.0 parallel_4.4.0
## [13] proxy_0.4-27 tibble_3.2.1 fansi_1.0.6
## [16] ModelMetrics_1.2.2.2 pkgconfig_2.0.3 Matrix_1.7-0
## [19] data.table_1.15.4 lifecycle_1.0.4 stringr_1.5.1
## [22] compiler_4.4.0 munsell_0.5.1 codetools_0.2-20
## [25] htmltools_0.5.8.1 class_7.3-22 sass_0.4.9
## [28] yaml_2.3.10 prodlim_2024.06.25 pillar_1.9.0
## [31] jquerylib_0.1.4 MASS_7.3-60.2 cachem_1.1.0
## [34] gower_1.0.1 iterators_1.0.14 rpart_4.1.23
## [37] nlme_3.1-164 parallelly_1.38.0 lava_1.8.0
## [40] tidyselect_1.2.1 digest_0.6.36 stringi_1.8.4
## [43] future_1.34.0 reshape2_1.4.4 purrr_1.0.2
## [46] bookdown_0.40 listenv_0.9.1 splines_4.4.0
## [49] fastmap_1.2.0 grid_4.4.0 colorspace_2.1-1
## [52] cli_3.6.3 magrittr_2.0.3 survival_3.5-8
## [55] utf8_1.2.4 e1071_1.7-14 future.apply_1.11.2
## [58] withr_3.0.2 scales_1.3.0 lubridate_1.9.3
## [61] timechange_0.3.0 rmarkdown_2.27 globals_0.16.3
## [64] nnet_7.3-19 timeDate_4032.109 hms_1.1.3
## [67] evaluate_0.24.0 knitr_1.48 hardhat_1.4.0
## [70] rlang_1.1.4 Rcpp_1.0.13 glue_1.7.0
## [73] pROC_1.18.5 ipred_0.9-15 rstudioapi_0.16.0
## [76] jsonlite_1.8.8 R6_2.5.1 plyr_1.8.9
# 01 Prepare and import data ---------------------------------------------------
# 01.1 Import the raw spectra files ############################################
lfMIR <- read_opus_univ(fnames = dir("./data/spectra/", full.names = TRUE), extract = c("spc"))
# 01.2 Preparing the MIR Data ##################################################
MIRspec_tbl <- lfMIR %>%
gather_spc() %>% # Gather list of spectra data into tibble data frame
resample_spc(wn_lower = 375, wn_upper = 4500, wn_interval = 4) %>% # Resample spectra to new wavenumber interval
average_spc(by = "sample_id") # Average replicate scans per sample_id
MIRspec_tbl_rs <- MIRspec_tbl[seq(1, nrow(MIRspec_tbl)), c("sample_id", "metadata", "wavenumbers_rs", "spc_mean")]
MIRspec_tbl_rs <- MIRspec_tbl_rs[,-c(2,3)]
# Create a data frame and change the columns names
MIRspec_wn <- data.frame(matrix(unlist(MIRspec_tbl_rs$spc_mean), nrow = nrow(MIRspec_tbl_rs), byrow = TRUE), stringsAsFactors = FALSE)
rownames(MIRspec_wn) <- substring(MIRspec_tbl$sample_id, 1)[seq(1, nrow(MIRspec_tbl))]
wn <- list(names(MIRspec_tbl_rs[[2]][[1]]))
wn <- unlist(wn)
names(MIRspec_wn) <- wn
MIRspec_wn <- MIRspec_wn[, -c(1)] # Remove NA values from the 4499 cm-1
# 01.3 Export the MIR raw data #################################################
write.table(MIRspec_wn, "./export/spectra/Spectra_raw_Wavenumber.txt", dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE)
4.3 Prepare the spectra regarding state of the art
4.3.1 Interval interferances
The MIR spectra present interference in different intervals such as 375 - 499 cm-1 area and 2451 - 2500 cm -1 area (Ng et al. 2018).
# 02 Prepare the spectra regarding the state of the art ------------------------
# 2.1 First cleaning of the spectra ============================================
MIRspec_tbl <- lfMIR %>%
gather_spc() %>% # Gather list of spectra data into tibble data frame
resample_spc(wn_lower = 499, wn_upper = 4500, wn_interval = 4) %>% # Resample spectra to new wavenumber interval
average_spc(by = "sample_id") # Average replicate scans per sample_id
MIRspec_tbl_rs <- MIRspec_tbl[seq(1, nrow(MIRspec_tbl)), c("sample_id", "metadata", "wavenumbers_rs", "spc_mean")]
MIRspec_tbl_rs <- MIRspec_tbl_rs[,-c(2,3)]
# Create a data frame and change the columns names
MIRspec_wn <- data.frame(matrix(unlist(MIRspec_tbl_rs$spc_mean), nrow = nrow(MIRspec_tbl_rs), byrow = TRUE), stringsAsFactors = FALSE)
rownames(MIRspec_wn) <- substring(MIRspec_tbl$sample_id, 1)[seq(1, nrow(MIRspec_tbl))]
wn <- list(names(MIRspec_tbl_rs[[2]][[1]]))
wn <- unlist(wn)
names(MIRspec_wn) <- wn
MIRspec_wn <- MIRspec_wn[, -c(1)] # Remove NA values from the 4499 cm-1
# 2.1 Remove interference =====================================================
MIRspec_wn <- MIRspec_wn[, -c(500:511)]
4.3.2 Outlier values
Some values of the spectra can also present high interference and therefore should be removed or at least be noticed (Ng et al. 2018). This concerned values over 2 or lower than -2. Here we export the different rows containing values over 2.
# Check max value and remove it
max(MIRspec_wn) # Remove value higher than + 2 (Ng et al., 2018; Curran et al., 1996)
MIRspec_wn_remove_up <- MIRspec_wn[apply(MIRspec_wn, 1, function(row) any(row > 2)), ]
# Check lower value and remove it
min(MIRspec_wn) # Remove value lower than - 2 (Ng et al., 2018; Curran et al., 1996)
MIRspec_wn <- MIRspec_wn[!rowSums(MIRspec_wn < -2),]
MIRspec_wn_remove_low <- MIRspec_wn[apply(MIRspec_wn, 1, function(row) any(row < - 2)), ]
MIRspec_wn_remove <- cbind(MIRspec_wn_remove_low, MIRspec_wn_remove_up)
write.table(MIRspec_wn_remove, "./export/Interference_spectra.txt", dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE)
4.4 Convert into different spectra variation
4.4.1 Convert into wavelength
# 03 Convert into different spectra variation ----------------------------------
# 3.1 Convert in Wavelength ####################################################
MIRspec_wn <- anti_join(MIRspec_wn, MIRspec_wn_remove, by = names(MIRspec_wn))
MIRspec_wl <- MIRspec_wn
wn <- (1 / as.numeric(names(MIRspec_wn))) * 1e7
wn <- round(wn, digits=0)
names(MIRspec_wl) <- wn
MIRspec_wl <- na.omit(MIRspec_wl)
4.4.2 Spectra transformations
We converted the MIR data into a total of 14 spectra transformations according to literature review (Ludwig et al. 2023; Ng et al. 2018)
# 3.2 Prepare the functions ####################################################
# Remove near zero variable
remove_nzv <- function(x){
y <- nearZeroVar(x, saveMetrics = TRUE)
ifelse(sum(y$nzv == TRUE) == 0, x <- x, x <- x[-nearZeroVar(x)])
return(x)
}
# Remove variable with high correlation
remove_hcd <- function(x){
y <- findCorrelation(x, cutoff = .98)
ifelse(sum(y) == 0, x <- x, x <- x[,-y])
return(x)
}
# 3.3 Convert in other spectra #################################################
Spectra <- MIRspec_wn
#Different transformation of the Spectra according to Ludwig et al. 2023
IRspectraList_Raw <- list("SG 1.5" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 1, w = 5)),
"SG 1.11" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 1, w = 11)),
"SG 1.17" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 1, w = 17)),
"SG 1.23" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 1, w = 23)),
"SG 2.5" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 5)),
"SG 2.11" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 11)),
"SG 2.17" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 17)),
"SG 2.23" = as.data.frame(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 23)),
"moving averages 5" = as.data.frame(prospectr::movav(Spectra, w = 5)),
"moving averages 11" = as.data.frame(prospectr::movav(Spectra, w = 11)),
"moving averages 17" = as.data.frame(prospectr::movav(Spectra, w = 17)),
"moving averages 23" = as.data.frame(prospectr::movav(Spectra, w = 23)),
"SNV-SG" = as.data.frame(prospectr::standardNormalVariate(prospectr::savitzkyGolay(Spectra, m = 1, p = 2, w = 11))), #Best option for machine learning treatment (See Ng et al. 2018)
"continuum removal" = as.data.frame(prospectr::continuumRemoval(Spectra, as.numeric(colnames(Spectra)), type = "R")))
IRspectraList <- IRspectraList_Raw
# Remove the near zero and highly correlated values
for (i in 1:length(IRspectraList)) {
IRspectraList[[i]] <- remove_hcd(IRspectraList[[i]])
IRspectraList[[i]] <- remove_nzv(IRspectraList[[i]])
}
IRspectraList <- c(IRspectraList, list("raw" = as.data.frame(Spectra)))
# 3.4 Export all the spectra ###################################################
#Export csv of files
for (i in names(IRspectraList)) {
write.table(IRspectraList[i], file = paste0("./export/spectra/Full_spectra_",names(IRspectraList[i]),".txt"), dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE, fileEncoding = "UTF-8")
}
write.table(MIRspec_wn, "./export/spectra/Spectra_Wavenumber.txt", dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE)
write.table(MIRspec_wl, "./export/spectra/Spectra_Wavelength.txt", dec = ".", sep = ";", row.names = TRUE, col.names = TRUE, append = FALSE)
4.5 Plot the spectra
4.5.1 The spectra in absorbance
# 04 Plot the Spectrum of the spectra ------------------------------------------
# 4.1 Plot the absorbance ######################################################
IRSpectra <- as.data.frame(row.names(MIRspec_wl))
IRSpectra$wn <- MIRspec_wn
IRSpectra$wnA<- log(1/MIRspec_wn) # Convert to wave number absorbance
IRSpectra$wl <- MIRspec_wl
IRSpectra$wlA <- log(1/MIRspec_wl) # Convert to wavelength absorbance
rownames(IRSpectra) <- IRSpectra$`row.names(MIRspec_wl)`
png("Soil spectra in wavenumbers absorbance.png", width = 297, height = 210, units = "mm", res = 300)
pdf("Soil spectra in wavenumbers absorbance.pdf", width = 10*2, height = 6*2)
matplot(x = colnames(IRSpectra$wn), y = t(IRSpectra$wnA),
xlab = expression(paste("Wavenumber ", "(cm"^"-1", ")")),
ylab = "Absorbance",
type = "l", #"l" = ligne
lty = 1,
col = 1:nrow(IRSpectra$wn))
dev.off()
png("Soil spectra in wavelength absorbance.png", width = 297, height = 210, units = "mm", res = 300)
pdf("Soil spectra in wavelength absorbance.pdf", width = 10*2, height = 6*2)
matplot(x = colnames(IRSpectra$wl), y = t(IRSpectra$wlA),
xlab = "Wavelength /nm ",
ylab = "Absorbance",
type = "l", #"l" = ligne
lty = 1,
col = 1:nrow(IRSpectra$wl))
dev.off()
4.5.2 The spectra in reflectance
# 4.2 Plot the reflectance #####################################################
png("Soil spectra in wavenumbers reflectance.png", width = 297, height = 210, units = "mm", res = 300)
pdf("Soil spectra in wavenumbers reflectance.pdf", width = 10*2, height = 6*2)
matplot(x = colnames(IRSpectra$wn), y = t(IRSpectra$wn),
xlab = expression(paste("Wavenumber ", "(cm"^"-1", ")")),
ylab = "Reflectance",
type = "l", #"l" = ligne
lty = 1,
col = 1:nrow(IRSpectra$wn))
dev.off()
png("Soil spectra in wavelength reflectance.png", width = 297, height = 210, units = "mm", res = 300)
pdf("Soil spectra in wavelength reflectance.pdf", width = 10*2, height = 6*2)
matplot(x = colnames(IRSpectra$wl), y = t(IRSpectra$wl),
xlab = "Wavelength /nm ",
ylab = "Reflectance",
type = "l", #"l" = ligne
lty = 1,
col = 1:nrow(IRSpectra$wl))
dev.off()
4.6 Kennard Stone sampling
The Kennard stone sampling is used to select a wide range of a population that will represent the diversity of the individuals (Kennard and Stone 1969). This sampling strategy has proven to be efficient in the soils spectrometry context (Ramirez-Lopez et al. 2014). We selected different numbers of samples for each depth increment with an over-representation of the topsoil (0 - 10 cm). All the samples from 2017 - 2018 were analysed, so no sampling strategy has been applied to them.
Depth (cm) | Year | Number |
---|---|---|
0 - 10 | 2022 | 30 |
10 - 30 | 2022 | 5 |
30 - 50 | 2022 | 5 |
50 - 70 | 2022 | 5 |
70 - 100 | 2022 | 5 |
0 - 10 | 2023 | 10 |
10 - 30 | 2023 | 4 |
30 - 50 | 2023 | 6 |
50 - 70 | 2023 | 6 |
70 - 100 | 2023 | 3 |
In the selection of samples, you can choose the year and depth increment. Here, we show a selection for the 2022 campaign at 0 - 10 cm depth. The table Samples_info use the information collected during the field campaign and the Field_observations file. The different entries are:
- Lab_ID Laboratory number gave at the samples when enter into Tübingen Soil Science and Geomrophology laboratory inventory.
- Site_name the site’s name according to the CLHS sampling.
- Depth_cm depth increment of the sampling (0 - 10 - 30 - 50 - 70 - 100 cm).
- X_WGS84 longitude in WGS84 (epsg:4326), measured from Garmin, GPSMAP 60Cx with ± 10 - 3 m accuracy (depending of satellite coverage).
- Y_WGS84 latitude in WGS84 (epsg:4326), measured from Garmin, GPSMAP 60Cx with ± 10 - 3 m accuracy (depending of satellite coverage).
# 05 Kennard Stone sampling ----------------------------------------------------
# 5.1 Select the samples #######################################################
Samples_info <- read_delim("./data/Samples_info.csv", delim = ";")
selection <- Samples_info[Samples_info$Depth_cm == "0_10" & grepl("B07_2022", Samples_info$Site_name), ] # You can select the year and the depth
# Select the samples to analyse
MIRsample <- MIRspec_wl[rownames(MIRspec_wl) %in% selection$Lab_ID,]
# 5.2 Run the sampling #########################################################
sample <- kenStone(MIRsample, k = 30, metric = "euclid") # Make samples with Kennard Stone in euclidian distance
MIRsample <- MIRsample[sample$model,]
MIRsample <- row.names(MIRsample)
write.table(MIRsample, "./export/samples/B07_2022 Samples 0 - 10 cm.txt", dec = ".", sep = ";", row.names = FALSE, col.names = FALSE, append = FALSE)