6 Spectra model prediction
Of the 15 spectra produced in the FTIR part, we used only the raw spectra and three spectral transformations. These transformations improve the quality of the prediction model (Ludwig et al. 2023). In total, we had four spectra:
- Raw spectra.
- SG-2-11: Savitzky-Golay filter with a polynomial order of 2 and a window size of 11.
- Moving average with a window of 11.
- SNV-SG: standard normal variate transformation applied to the Savitzky-Golay spectra (polynomial order of 2 and window size of 11).
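The four spectra listed above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the synthetic `raw` matrix, the edge padding of the moving average, and `deriv=0` for the Savitzky-Golay filter (pure smoothing, no derivative) are all assumptions, since the text does not specify them.

```python
import numpy as np
from scipy.signal import savgol_filter


def savitzky_golay(spectra, window=11, polyorder=2):
    """Smooth each spectrum (row) with a Savitzky-Golay filter.

    deriv=0 (smoothing only) is an assumption; the text does not say
    whether a derivative was taken.
    """
    return savgol_filter(spectra, window_length=window, polyorder=polyorder,
                         deriv=0, axis=1)


def moving_average(spectra, window=11):
    """Centered moving average along the wavenumber axis (odd window).

    Edges are padded by repeating the first/last value so the output
    keeps the input shape (the padding choice is an assumption).
    """
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, padded
    )


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


# Synthetic stand-in for a (n_samples, n_wavenumbers) absorbance matrix.
rng = np.random.default_rng(0)
raw = rng.random((5, 200))

transformed = {
    "Raw": raw,
    "SG-2-11": savitzky_golay(raw),
    "Moving average 11": moving_average(raw),
    "SNV-SG": snv(savitzky_golay(raw)),  # SNV applied to the SG spectra
}
```

Each entry of `transformed` has the same shape as `raw`, so the four versions can be passed to the same modelling code.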
6.1 Model implementation
We implemented the prediction model in Python. A sanity check was performed to assess the conformity of the data set (see Sanity_Check_code).

We split the data into an 85% training set and a 15% testing set.
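An 85%/15% split of this kind could look like the sketch below. It assumes scikit-learn's `train_test_split`; the synthetic `X` and `y` arrays and the `random_state` value are placeholders, and the text does not say which tool was actually used for the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholders: 100 samples, 50 spectral variables.
rng = np.random.default_rng(0)
X = rng.random((100, 50))  # spectra
y = rng.random(100)        # one soil property

# test_size=0.15 reserves 15% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)
```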
Because the EC values contained outliers, we removed all samples with values higher than 400 µS/cm: 53802 (827 µS/cm), 53871 (857 µS/cm), 53872 (932 µS/cm), 53902 (718 µS/cm), 53938 (447 µS/cm), 53954 (426 µS/cm) and 55852 (636 µS/cm).
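The EC filter can be illustrated with a small pandas sketch. The seven outlier IDs and values come from the text; the two remaining rows (IDs 1 and 2) are hypothetical and only show that samples at or below the threshold are kept. The column names are assumptions.

```python
import pandas as pd

EC_THRESHOLD = 400.0  # µS/cm, threshold stated in the text

ec = pd.DataFrame(
    {
        # First seven IDs are the reported outliers; 1 and 2 are hypothetical.
        "sample_id": [53802, 53871, 53872, 53902, 53938, 53954, 55852, 1, 2],
        "EC": [827, 857, 932, 718, 447, 426, 636, 150, 400],
    }
)

# Keep samples whose EC is not higher than the threshold.
kept = ec[ec["EC"] <= EC_THRESHOLD].reset_index(drop=True)
removed_ids = sorted(ec.loc[ec["EC"] > EC_THRESHOLD, "sample_id"].tolist())
```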
All properties were predicted with the Cubist model in a Python environment (see Cubist_Kfold_code). We used a 5-fold stratified k-fold cross-validation with a total of 9 hyperparameter combinations. Stratification helps ensure a higher diversity of samples in both the training and testing sets.
n_rules | n_committees |
---|---|
20 | 5 |
30 | 5 |
40 | 5 |
20 | 10 |
30 | 10 |
40 | 10 |
20 | 15 |
30 | 15 |
40 | 15 |
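The grid above and a stratified k-fold for a continuous target can be sketched as follows. Several points are assumptions: that "5 time" means five folds, that strata are quantile bins of the target (the text does not say how strata were defined), and the helper name `stratified_folds`. The model fit itself is omitted; a Cubist regressor accepting `n_rules` and `n_committees` keyword arguments (names taken from the table) would be fitted inside the loop.

```python
from itertools import product

import numpy as np
from sklearn.model_selection import StratifiedKFold

# The 3 x 3 grid from the table: n_rules varies fastest.
n_rules_values = [20, 30, 40]
n_committees_values = [5, 10, 15]
param_grid = [
    {"n_rules": r, "n_committees": c}
    for c, r in product(n_committees_values, n_rules_values)
]


def stratified_folds(y, n_splits=5, n_bins=5, seed=0):
    """Yield (train_idx, test_idx) pairs stratified on quantile bins of y.

    Binning a continuous target into quantiles is an assumption made
    for this sketch.
    """
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    yield from skf.split(np.zeros((len(y), 1)), bins)


rng = np.random.default_rng(0)
y = rng.random(100)  # synthetic target
folds = list(stratified_folds(y))

for params in param_grid:
    for train_idx, test_idx in folds:
        # A Cubist regressor configured with **params would be fitted on
        # train_idx and scored on test_idx here (omitted in this sketch).
        pass
```

Each of the 9 combinations is evaluated on the same 5 folds, so their scores are directly comparable.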
6.2 Model evaluation
6.2.1 Metrics for all spectra transformations
Metric | MSE | MAE | R2 | RMSE | CCC | RPIQ |
---|---|---|---|---|---|---|
pH | 0.0219010 | 0.1171735 | 0.4401298 | 0.1479898 | 0.6082283 | 1.2423942 |
CaCO3 | 16.9688594 | 2.8067966 | 0.8882458 | 4.1193275 | 0.9485314 | 2.3192395 |
Nt | 0.0029586 | 0.0325372 | 0.6229612 | 0.0543926 | 0.7375194 | 1.4595850 |
Ct | 0.3130813 | 0.3976554 | 0.8913525 | 0.5595367 | 0.9495172 | 2.3885632 |
Corg | 0.4488793 | 0.4225908 | 0.5264148 | 0.6699846 | 0.6461363 | 0.9218613 |
EC | 4402.7957520 | 52.3580817 | 0.1141583 | 66.3535662 | 0.3278390 | 0.9523699 |
Sand | 54.1693978 | 5.1840530 | 0.7674734 | 7.3599863 | 0.8707758 | 2.3624346 |
Silt | 86.6205805 | 6.5901216 | 0.2136148 | 9.3070178 | 0.4181861 | 0.8782774 |
Clay | 63.5131858 | 5.2104216 | 0.6930271 | 7.9695160 | 0.8342033 | 2.3912120 |
MWD | 0.0029143 | 0.0373586 | 0.4745772 | 0.0539841 | 0.6278402 | 1.2472350 |
Metric | MSE | MAE | R2 | RMSE | CCC | RPIQ |
---|---|---|---|---|---|---|
pH | 0.0210390 | 0.1141724 | 0.4621653 | 0.1450483 | 0.6475573 | 1.221859 |
CaCO3 | 17.9060238 | 2.8147659 | 0.8820737 | 4.2315510 | 0.9460955 | 2.389729 |
Nt | 0.0028760 | 0.0288061 | 0.6334873 | 0.0536280 | 0.7473087 | 1.410630 |
Ct | 0.2587904 | 0.3585550 | 0.9101929 | 0.5087145 | 0.9600705 | 2.228431 |
Corg | 0.4632358 | 0.4024382 | 0.5112682 | 0.6806143 | 0.6651811 | 1.087405 |
EC | 4018.7082500 | 48.6017677 | 0.1914366 | 63.3932824 | 0.4001273 | 1.114983 |
Sand | 62.0563611 | 5.4404186 | 0.7336180 | 7.8775860 | 0.8500844 | 2.300951 |
Silt | 98.1877111 | 7.2681208 | 0.1086026 | 9.9089712 | 0.3801260 | 0.875066 |
Clay | 49.4520568 | 4.9724694 | 0.7609876 | 7.0322156 | 0.8670415 | 2.343124 |
MWD | 0.0029262 | 0.0377074 | 0.4724332 | 0.0540942 | 0.6365206 | 1.331620 |
Metric | MSE | MAE | R2 | RMSE | CCC | RPIQ |
---|---|---|---|---|---|---|
pH | 0.0231528 | 0.1177269 | 0.4081296 | 0.1521604 | 0.5814332 | 0.9712110 |
CaCO3 | 14.9984789 | 2.6152336 | 0.9012224 | 3.8727870 | 0.9545313 | 2.7690478 |
Nt | 0.0050750 | 0.0467659 | 0.3532399 | 0.0712392 | 0.4579116 | 0.4787131 |
Ct | 0.4564530 | 0.4259481 | 0.8415988 | 0.6756130 | 0.9176226 | 1.8099295 |
Corg | 0.7257209 | 0.5595446 | 0.2343362 | 0.8518925 | 0.3608645 | 0.3591207 |
EC | 4740.0612760 | 55.0806415 | 0.0463005 | 68.8481029 | 0.3026389 | 0.9539687 |
Sand | 55.5155791 | 5.6998147 | 0.7616948 | 7.4508777 | 0.8668046 | 2.4868337 |
Silt | 74.3838628 | 6.0993695 | 0.3247059 | 8.6246080 | 0.5037383 | 0.9129198 |
Clay | 58.4387460 | 5.9952896 | 0.7175530 | 7.6445239 | 0.8360514 | 2.4276119 |
MWD | 0.0031091 | 0.0395931 | 0.4394541 | 0.0557593 | 0.6241677 | 1.1867736 |
Metric | MSE | MAE | R2 | RMSE | CCC | RPIQ |
---|---|---|---|---|---|---|
pH | 0.0235093 | 0.1187346 | 0.3990151 | 0.1533275 | 0.6013606 | 0.8891613 |
CaCO3 | 16.8746809 | 2.8344541 | 0.8888660 | 4.1078803 | 0.9489888 | 2.7618729 |
Nt | 0.0013671 | 0.0217745 | 0.8257789 | 0.0369741 | 0.8966433 | 1.7119021 |
Ct | 0.2524407 | 0.3139760 | 0.9123964 | 0.5024348 | 0.9616470 | 2.7212452 |
Corg | 0.3671160 | 0.3711759 | 0.6126784 | 0.6059009 | 0.7584114 | 1.1525632 |
EC | 3507.4818920 | 45.0053330 | 0.2942953 | 59.2239976 | 0.4909250 | 0.9349021 |
Sand | 44.1217193 | 4.7525580 | 0.8106039 | 6.6424182 | 0.8969430 | 2.6565763 |
Silt | 68.4702337 | 6.2101832 | 0.3783928 | 8.2746742 | 0.5599477 | 0.9772474 |
Clay | 63.5502746 | 5.6798514 | 0.6928479 | 7.9718426 | 0.8251282 | 2.2699483 |
MWD | 0.0029995 | 0.0373378 | 0.4592173 | 0.0547675 | 0.6325722 | 1.1640412 |
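The six metrics reported in the tables above can be computed as in the following sketch. The definitions are assumptions, because the text does not state them: R2 is taken as the coefficient of determination, CCC as Lin's concordance correlation coefficient with population moments, and RPIQ as the interquartile range of the observed values divided by the RMSE. The function name is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, R2, RMSE, CCC and RPIQ for one property."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = np.mean(residuals ** 2)
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(mse)
    # Coefficient of determination (assumed definition of R2).
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

    # Lin's concordance correlation coefficient (population moments).
    cov = np.mean((y_true - y_true.mean()) * (y_pred - y_pred.mean()))
    ccc = 2.0 * cov / (
        y_true.var() + y_pred.var() + (y_true.mean() - y_pred.mean()) ** 2
    )

    # RPIQ: interquartile range of the observations divided by the RMSE.
    q1, q3 = np.percentile(y_true, [25, 75])
    rpiq = (q3 - q1) / rmse

    return {"MSE": mse, "MAE": mae, "R2": r2,
            "RMSE": rmse, "CCC": ccc, "RPIQ": rpiq}


# Toy example: every prediction is 0.5 too high.
metrics = regression_metrics([1, 2, 3, 4, 5], [1.5, 2.5, 3.5, 4.5, 5.5])
```

The relation RMSE = sqrt(MSE) can be checked against the tables, for example sqrt(0.0219010) ≈ 0.14799 for pH in the first table.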
6.2.2 Selected transformed spectra
Based on the prediction results, we selected the following spectra for each variable:
- pH: Raw spectra.
- CaCO\(_3\): SG-2-11 spectra.
- Nt: SNV-SG spectra.
- Ct: SNV-SG spectra.
- SOC (Corg): SNV-SG spectra.
- EC: SNV-SG spectra.
- Sand: SNV-SG spectra.
- Silt: SNV-SG spectra.
- Clay: SG-2-11 spectra.
- MWD: Raw spectra.
6.3 Predicted values
6.3.1 Full predicted values table
The full table of predicted values for the 2022 and 2023 samples is shown below. It is also available in the deposit as the Mir_spectra_prediction file, or at: Bellat, Mathias; Glissmann, Benjamin; Rentschler, Tobias; Sconzo, Paola; Pfälzner, Peter; Brifkany, Bekas; Scholten, Thomas (2024): “Soil properties predicted on mid-infrared (MIR) spectroscopy measurements in North-Western Kurdistan region, Iraq [dataset]”. PANGAEA, https://doi.org/10.1594/PANGAEA.973700
We removed the sample with ID 55864, which is an outlier.
6.3.2 Boxplot of the predicted values
6.3.3 Texture of the predicted samples
We plotted the soil texture according to the WRB (2006) classification.