6 Spectra model prediction

Of the 15 spectra produced in the FTIR part, we used only the raw spectra and three spectra transformations. These transformations improve the quality of the prediction model (Ludwig et al. 2023). In total, we had four spectra:

Raw spectra.
SG-2-11: Savitzky-Golay filter with a polynomial order of 2 and a window size of 11.
Moving average with a window size of 11.
SNV-SG: standard normal variate transformation applied to the Savitzky-Golay spectra (polynomial order of 2, window size of 11).
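
The four spectra variants listed above can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not the preprocessing code used in the FTIR part: the function names (`savgol_coeffs`, `smooth`, `snv`), the edge padding and the random input are assumptions made for the example. In practice, `scipy.signal.savgol_filter` is a common choice for the Savitzky-Golay step.

```python
import numpy as np


def savgol_coeffs(window: int = 11, polyorder: int = 2) -> np.ndarray:
    """Smoothing coefficients of a Savitzky-Golay filter (no derivative)."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix with columns 1, x, x^2, ... for the local polynomial fit.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # The first row of the pseudo-inverse gives the fitted value at the window centre.
    return np.linalg.pinv(design)[0]


def smooth(spectra: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply a symmetric kernel along each spectrum (one spectrum per row)."""
    half = len(kernel) // 2
    # Assumption: edges are padded by repeating the first and last values.
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


# Synthetic stand-in for the raw spectra (5 samples x 200 wavenumbers).
rng = np.random.default_rng(0)
raw = rng.normal(loc=1.0, scale=0.1, size=(5, 200))

sg = smooth(raw, savgol_coeffs(window=11, polyorder=2))  # SG-2-11
moving_avg = smooth(raw, np.full(11, 1 / 11))            # moving average of 11
snv_sg = snv(sg)                                         # SNV applied to SG-2-11
```

Each transformation keeps the shape of the input matrix, so the four variants can be passed to the same modelling code.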

6.1 Model implementation

We implemented the prediction model in Python. A sanity check was performed to assess the conformity of the data set (Sanity_Check_code).

Density plot of pH

We split the data into an 85% training set and a 15% testing set.
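
An 85% / 15% split of this kind can be sketched as a random permutation of the sample indices. The function name `split_indices`, the seed and the sample count of 200 are hypothetical. The source does not state how the split was drawn, so the purely random draw here is an assumption.

```python
import numpy as np


def split_indices(n_samples: int, test_fraction: float = 0.15, seed: int = 42):
    """Return (train_idx, test_idx) from a random permutation of the samples."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    # The first n_test shuffled indices form the testing set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set size, used only to illustrate the proportions.
train_idx, test_idx = split_indices(n_samples=200)
```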

Because the EC values contained outliers, we removed all samples with values higher than 400 µS/cm: 53802 (827 µS/cm), 53871 (857 µS/cm), 53872 (932 µS/cm), 53902 (718 µS/cm), 53938 (447 µS/cm), 53954 (426 µS/cm) and 55852 (636 µS/cm).
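
The threshold filter described above can be sketched as follows. The seven outlier IDs and values come from the text. The two samples with IDs 99901 and 99902 are hypothetical and stand in for the retained samples.

```python
EC_THRESHOLD = 400.0  # µS/cm, threshold stated in the text

# EC values by sample ID; 99901 and 99902 are hypothetical non-outliers.
ec_by_sample = {
    53802: 827.0, 53871: 857.0, 53872: 932.0, 53902: 718.0,
    53938: 447.0, 53954: 426.0, 55852: 636.0,
    99901: 150.0, 99902: 310.0,
}

# Keep only the samples at or below the threshold.
kept = {sid: ec for sid, ec in ec_by_sample.items() if ec <= EC_THRESHOLD}
removed = sorted(set(ec_by_sample) - set(kept))
```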

All properties were predicted with the Cubist model in Python (Cubist_Kfold_code). We used a stratified k-fold cross-validation with 5 folds and a total of 9 hyperparameter combinations (Table 6.1). The stratified k-fold helps ensure a more diverse sampling in both the training and testing sets.
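
Stratification is usually defined for class labels, so for continuous soil properties one common approach is to bin the target into quantiles and balance the folds across these bins. The sketch below follows that approach under stated assumptions: five folds, five quantile bins, a hypothetical function name `stratified_kfold_indices` and synthetic pH-like values. The source does not specify how the strata were built, so this is not necessarily what Cubist_Kfold_code does.

```python
import numpy as np


def stratified_kfold_indices(y, n_splits: int = 5, n_bins: int = 5, seed: int = 0):
    """Yield (train_idx, val_idx) pairs balanced across quantile bins of y."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Inner quantile edges define the strata of the continuous target.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    strata = np.digitize(y, edges)
    fold_of = np.empty(len(y), dtype=int)
    for stratum in np.unique(strata):
        members = rng.permutation(np.flatnonzero(strata == stratum))
        # Deal the shuffled members of each stratum round-robin across the folds.
        fold_of[members] = np.arange(len(members)) % n_splits
    for fold in range(n_splits):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)


# Synthetic pH-like target, for illustration only.
y = np.random.default_rng(1).normal(loc=7.8, scale=0.2, size=100)
folds = list(stratified_kfold_indices(y, n_splits=5))
```

Every sample appears in exactly one validation fold, and each fold covers the low, middle and high ranges of the target.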

Table 6.1: Hyperparameters of the Cubist model.
n_rules	n_committees
20	5
30	5
40	5
20	10
30	10
40	10
20	15
30	15
40	15
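
The nine combinations in Table 6.1 are the Cartesian product of three values per hyperparameter, which can be generated as below. The names `n_rules` and `n_committees` are taken from the table. Storing each combination as a dictionary is an assumption about how it would be passed on to the model.

```python
from itertools import product

n_rules_grid = [20, 30, 40]
n_committees_grid = [5, 10, 15]

# Iterate committees in the outer loop to reproduce the row order of Table 6.1.
param_grid = [
    {"n_rules": n_rules, "n_committees": n_committees}
    for n_committees, n_rules in product(n_committees_grid, n_rules_grid)
]
```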

6.2 Model evaluation

6.2.1 All spectra transformations metrics

Cubist model evaluation metrics on raw spectra.
Property	MSE	MAE	R²	RMSE	CCC	RPIQ
pH	0.0219010	0.1171735	0.4401298	0.1479898	0.6082283	1.2423942
CaCO3	16.9688594	2.8067966	0.8882458	4.1193275	0.9485314	2.3192395
Nt	0.0029586	0.0325372	0.6229612	0.0543926	0.7375194	1.4595850
Ct	0.3130813	0.3976554	0.8913525	0.5595367	0.9495172	2.3885632
Corg	0.4488793	0.4225908	0.5264148	0.6699846	0.6461363	0.9218613
EC	4402.7957520	52.3580817	0.1141583	66.3535662	0.3278390	0.9523699
Sand	54.1693978	5.1840530	0.7674734	7.3599863	0.8707758	2.3624346
Silt	86.6205805	6.5901216	0.2136148	9.3070178	0.4181861	0.8782774
Clay	63.5131858	5.2104216	0.6930271	7.9695160	0.8342033	2.3912120
MWD	0.0029143	0.0373586	0.4745772	0.0539841	0.6278402	1.2472350

Cubist model evaluation metrics on moving average 11 transformed spectra.
Property	MSE	MAE	R²	RMSE	CCC	RPIQ
pH	0.0210390	0.1141724	0.4621653	0.1450483	0.6475573	1.221859
CaCO3	17.9060238	2.8147659	0.8820737	4.2315510	0.9460955	2.389729
Nt	0.0028760	0.0288061	0.6334873	0.0536280	0.7473087	1.410630
Ct	0.2587904	0.3585550	0.9101929	0.5087145	0.9600705	2.228431
Corg	0.4632358	0.4024382	0.5112682	0.6806143	0.6651811	1.087405
EC	4018.7082500	48.6017677	0.1914366	63.3932824	0.4001273	1.114983
Sand	62.0563611	5.4404186	0.7336180	7.8775860	0.8500844	2.300951
Silt	98.1877111	7.2681208	0.1086026	9.9089712	0.3801260	0.875066
Clay	49.4520568	4.9724694	0.7609876	7.0322156	0.8670415	2.343124
MWD	0.0029262	0.0377074	0.4724332	0.0540942	0.6365206	1.331620

Cubist model evaluation metrics on SG transformed spectra.
Property	MSE	MAE	R²	RMSE	CCC	RPIQ
pH	0.0231528	0.1177269	0.4081296	0.1521604	0.5814332	0.9712110
CaCO3	14.9984789	2.6152336	0.9012224	3.8727870	0.9545313	2.7690478
Nt	0.0050750	0.0467659	0.3532399	0.0712392	0.4579116	0.4787131
Ct	0.4564530	0.4259481	0.8415988	0.6756130	0.9176226	1.8099295
Corg	0.7257209	0.5595446	0.2343362	0.8518925	0.3608645	0.3591207
EC	4740.0612760	55.0806415	0.0463005	68.8481029	0.3026389	0.9539687
Sand	55.5155791	5.6998147	0.7616948	7.4508777	0.8668046	2.4868337
Silt	74.3838628	6.0993695	0.3247059	8.6246080	0.5037383	0.9129198
Clay	58.4387460	5.9952896	0.7175530	7.6445239	0.8360514	2.4276119
MWD	0.0031091	0.0395931	0.4394541	0.0557593	0.6241677	1.1867736

Cubist model evaluation metrics on SNV-SG transformed spectra.
Property	MSE	MAE	R²	RMSE	CCC	RPIQ
pH	0.0235093	0.1187346	0.3990151	0.1533275	0.6013606	0.8891613
CaCO3	16.8746809	2.8344541	0.8888660	4.1078803	0.9489888	2.7618729
Nt	0.0013671	0.0217745	0.8257789	0.0369741	0.8966433	1.7119021
Ct	0.2524407	0.3139760	0.9123964	0.5024348	0.9616470	2.7212452
Corg	0.3671160	0.3711759	0.6126784	0.6059009	0.7584114	1.1525632
EC	3507.4818920	45.0053330	0.2942953	59.2239976	0.4909250	0.9349021
Sand	44.1217193	4.7525580	0.8106039	6.6424182	0.8969430	2.6565763
Silt	68.4702337	6.2101832	0.3783928	8.2746742	0.5599477	0.9772474
Clay	63.5502746	5.6798514	0.6928479	7.9718426	0.8251282	2.2699483
MWD	0.0029995	0.0373378	0.4592173	0.0547675	0.6325722	1.1640412
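
The six metrics reported in the tables above can be computed from observed and predicted values as sketched below. The definitions are the standard ones and are assumed here rather than taken from the authors' code: R² as the coefficient of determination, CCC as Lin's concordance correlation coefficient, and RPIQ as the interquartile range of the observed values divided by the RMSE. The function name and the example values are hypothetical.

```python
import numpy as np


def evaluation_metrics(y_true, y_pred) -> dict:
    """Compute MSE, MAE, R2, RMSE, CCC and RPIQ for one predicted property."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(residuals))
    # Coefficient of determination: 1 - SS_res / SS_tot.
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # Lin's concordance correlation coefficient (population variances).
    cov = np.mean((y_true - y_true.mean()) * (y_pred - y_pred.mean()))
    ccc = 2.0 * cov / (
        y_true.var() + y_pred.var() + (y_true.mean() - y_pred.mean()) ** 2
    )
    # Ratio of performance to interquartile range of the observed values.
    q1, q3 = np.percentile(y_true, [25, 75])
    rpiq = (q3 - q1) / rmse

    return {"MSE": mse, "MAE": mae, "R2": r2, "RMSE": rmse, "CCC": ccc, "RPIQ": rpiq}


# Hypothetical observed and predicted values, for illustration only.
metrics = evaluation_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
```

Since RMSE is the square root of MSE, the two columns carry the same information on different scales. RPIQ and CCC add context about the spread of the observed values and about the agreement with the 1:1 line.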

6.2.2 Selected transformed spectra

Based on the prediction results, we selected the following spectra for each variable:

pH: raw spectra.
CaCO\(_3\): SG-2-11 spectra.
Nt: SNV-SG spectra.
Ct: SNV-SG spectra.
Corg (SOC): SNV-SG spectra.
EC: SNV-SG spectra.
Sand: SNV-SG spectra.
Silt: SNV-SG spectra.
Clay: SG-2-11 spectra.
MWD: raw spectra.

6.3 Predicted values

6.3.1 Full predicted values table

The full table of predicted values for the 2022 and 2023 samples is shown below. It is also available in the repository as the Mir_spectra_prediction file, or at: Bellat, Mathias; Glissmann, Benjamin; Rentschler, Tobias; Sconzo, Paola; Pfälzner, Peter; Brifkany, Bekas; Scholten, Thomas (2024): “Soil properties predicted on mid-infrared (MIR) spectroscopy measurements in North-Western Kurdistan region, Iraq [dataset]”. PANGAEA, https://doi.org/10.1594/PANGAEA.973700

We removed the sample with ID 55864, which is an outlier.

6.3.2 Boxplot of the predicted values

6.3.3 Texture of the predicted samples

We plotted the soil texture according to the WRB (2006) classification.

6.4 Spatial plotting

6.4.1 0 - 10 cm depth

6.4.2 10 - 30 cm depth

6.4.3 30 - 50 cm depth

6.4.4 50 - 70 cm depth

6.4.5 70 - 100 cm depth

References

Ludwig, Bernard, Isabel Greenberg, Michael Vohland, and Kerstin Michel. 2023. “Optimised Use of Data Fusion and Memory‐based Learning with an Austrian Soil Library for Predictions with Infrared Data.” European Journal of Soil Science 74 (4): e13394. https://doi.org/10.1111/ejss.13394.

WRB, IUSS Working Group. 2006. Guidelines for Soil Description. 4th ed. Rome: Food and Agriculture Organization of the United Nations.