2 Conditioned Latin hypercube sampling

2.1 Introduction

2.1.1 Purpose

This document presents the results of the sampling design for the 2022 and 2023 summer field campaigns of the B07 subproject in CRC1070. We used a specific sampling technique called conditioned Latin hypercube sampling (cLHS). This sampling strategy based on combinations of different strata/layers gave a diversity of samplings according to the heterogeneity of the layers. We tried different numbers of sampling locations (100, 150, 200, 250, 300, 350 and 400) to finally agree on a set of 100 samplings (We majored the number of samples by 10% - to reach 110 locations - to avoid any issue on the file in case of non-accessible area). The study area covers over 830 km\(^2\) in 2022 and 490 km\(^2\) in 2023. The size of area C explored in 2023 is equal to 1450 km\(^2\). Still, due to inaccessible places in high mountains or conflict zones between PKK (Kurdistan Workers’ Party) and the Turkish army (Ertan 2022), we reduce the explored area.

Sampling areas for 2022 and 2023
Sampling areas for 2022 and 2023

2.1.2 Procedure

We first produced several maps composed of different factors before using them as layers for the conditioned Latin hypercube sampling. We used different software such as GIS 3.24.1 version and SAGA GIS 7.8.2, GoogleEarthEngine (GEE) and R 4.4 with Rstudio. The procedure is partially based on the Sampling package from Dick Brus and its reference book (Brus 2022).

###Input data

All the data listed below are available freely online excepted for the digitized maps (geomorphology).

(#tab:imput_2)Data used in the production of the conditioned Latin hypercube sampling.
Name/ID Original resolution (m) Type/Unit Source
DEM/TE.5 25 Meters ESA and Airbus (2022)
RUSLE map/OT.11 25 Mg/ha\(^{-1}\) year\(^{-1}\) Mathias Bellat
Slope/TE.16 25 Radians ESA and Airbus (2022)
TWI/TE.22 25 - ESA and Airbus (2022)
Soil estimation/OT.12 10 - Nafiseh Kakhani / USGS (2018)
Geomorphology/OT.3 1 : 250 000 - Forti et al. (2021)
Landsat 8 Green/LA.3 30 0.53 - 0.59 µm USGS (2018)
Landsat 8 Red/LA.4 30 0.64 - 0.67 µm USGS (2018)
Landsat 8 NIR/LA.5 30 0.85 - 0.88 µm USGS (2018)
Landsat 8 SWIR1/LA.6 30 1.57 - 1.65 µm USGS (2018)
Landsat 8 SWIR2/LA.7 30 2.11 - 2.29 µm USGS (2018)

2.2 Data treatement

2.2.1 Soil prediction map (Nafiseh Kakhani)

The Landsat 8 images were collected via a Google Earth Engine script on a period covering 2020, the median of the composite image from Tier 1 TOA collection was used. To produce the soil estimation map we based our prediction on a Google Earth Engine written script. The javascript codes for scraping these images and computing the soil estimation indexes are available in the supplementary file inside the “2 - CLHS/code” folder. We computed four soil indexes:

\[ Landsat~8~Clay~index = \frac{SWIR1}{SWIR2} \] Hengl (2007), USDA/GO (former GTAC)

\[ Landsat~8~Ferrous~minerals~ratio = \frac{SWIR1}{NIR} \]

Al-Mousawi, Fouad, and Sissakian (2007), Henrich et al. (2012) (Ferric Oxides), USDA/GO (former GTAC)

\[ Landsat~8~Carbonate~index = \frac{Red- Green}{Red +Green} \]

Pouget et al. (1991) (Color, Hematite/ goethite), USDA/GO (former GTAC)

\[ Landsat~8~Rocks~outcrop~index = \frac{SWIR1-Green}{SWIR1+Green} \]

USDA/GO (former GTAC)

2.2.2 Soil erosion based on RUSLE model

The RUSLE map was previously computed in Chapter 1.

2.2.3 DEM, slope and topographic wetness index

The DEM is on a 25 x 25 m scale based on the ESA data (ESA and Airbus 2022). The slope under SAGA GIS 7.8.2 with the function Slope, Aspect, Curvature with the method by Zevenbergen, L.W., Thorne, C.R. (1987). For the topographic wetness index the Topographic Wetness Index (One step) command was used with Multiflow direction parameters

2.2.4 Geomorphological map

The geomorphological map is computed based on (Forti et al. 2021). We created 17 different layers for each of the geomorphons present.

2.3 conditioned Latin hypercube sampling script

2.3.1 Preparation of the R environement

# 0 Environment setup ##########################################################

# 0.1 Prepare environment ======================================================
# Folder check
getwd()

# Select folder
setwd(getwd())

# Clean up workspace
rm(list = ls(all.names = TRUE))

# 0.2 Install packages =========================================================

# Load packages
library(pacman) #Easier way of loading packages
pacman::p_load(sf, mapview,raster, clhs) # Specify required packages and download it if needed
# 0.3 Show session infos =======================================================
sessionInfo()
## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22621)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8   
## [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=French_France.utf8    
## 
## time zone: Asia/Tbilisi
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bookdown_0.40  tufte_0.13     rmarkdown_2.27 knitr_1.48     ggplot2_3.5.1 
##  [6] mapview_2.11.2 clhs_0.9.0     sf_1.0-16      raster_3.6-26  sp_2.1-4      
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       xfun_0.46          bslib_0.8.0        htmlwidgets_1.6.4 
##  [5] lattice_0.22-6     vctrs_0.6.5        tools_4.4.0        crosstalk_1.2.1   
##  [9] generics_0.1.3     stats4_4.4.0       tibble_3.2.1       proxy_0.4-27      
## [13] fansi_1.0.6        highr_0.11         cluster_2.1.6      pkgconfig_2.0.3   
## [17] KernSmooth_2.23-22 satellite_1.0.5    leaflet_2.2.2      lifecycle_1.0.4   
## [21] compiler_4.4.0     stringr_1.5.1      munsell_0.5.1      terra_1.7-78      
## [25] codetools_0.2-20   htmltools_0.5.8.1  class_7.3-22       sass_0.4.9        
## [29] yaml_2.3.10        pillar_1.9.0       jquerylib_0.1.4    classInt_0.4-10   
## [33] cachem_1.1.0       tidyselect_1.2.1   digest_0.6.36      stringi_1.8.4     
## [37] dplyr_1.1.4        reshape2_1.4.4     fastmap_1.2.0      grid_4.4.0        
## [41] colorspace_2.1-1   cli_3.6.3          magrittr_2.0.3     base64enc_0.1-3   
## [45] utf8_1.2.4         leafem_0.2.3       e1071_1.7-14       withr_3.0.2       
## [49] scales_1.3.0       png_0.1-8          evaluate_0.24.0    rlang_1.1.4       
## [53] Rcpp_1.0.13        glue_1.7.0         DBI_1.2.3          rstudioapi_0.16.0 
## [57] jsonlite_1.8.8     R6_2.5.1           plyr_1.8.9         units_0.8-5

2.3.2 Preparation of the data

# 1 Import and prepare the data  -----------------------------------------------
# 1.1 Select the the files #####################################################
Area <- st_read("./data/Sampling_area_2022.gpkg", layer = "Sampling_area_2022")  # Change the area for 2022 and 2023 campaigns
DEM <- raster("./data/DEM.tif")
Geomorpho <- raster("./data/Geomorphology.tif")
Soil_erosion <- raster("./data/RUSLE_map.tif")
Soil_prediction <- raster("./data/Soil_prediction.tif")
Slope <- raster("./data/Slope.tif")
Wetness <- raster("./data/TWI.tif")

# 1.2 Clean and prepare the DEM ################################################

# Crop the DEM to the study area
DEM <- crop(DEM, Area, inverse=FALSE, updatevalue=NA, updateNA=TRUE)

# Mask the DEM to the study area
DEM <- mask(DEM, Area, inverse=FALSE, updatevalue=NA, updateNA=TRUE)

# 1.3 Clean and prepare the covariates #########################################

# Create a list of the different variables
list <- list(Geomorpho, Soil_erosion, Soil_prediction, Slope, Wetness) 
names(list) <- c("Geomorpho","Soil_erosion","Soil_prediction","Slope", "Wetness")

# Set the CRS WGS84 UTM38N to all raster
ZoneUTM <- c("+init=epsg:32638")
for (i in names(list)) { 
  r <- list[[i]]
  list[[i]] <- projectRaster(r, crs = ZoneUTM)
}

# Resample according to the DEM 
# With NGB only for the categorical raster
list[[1]]  <- resample(list[[1]], DEM, "ngb")
list[[3]]  <- resample(list[[3]], DEM, "ngb")

# With bilinear only for the numerical raster
list[[2]]  <- resample(list[[2]], DEM, "bilinear")
list[[4]]  <- resample(list[[4]], DEM, "bilinear")
list[[5]]  <- resample(list[[5]], DEM, "bilinear")

# 1.4 Stack all the predictors #################################################
stack <- stack(DEM, list$Geomorpho, list$Soil_erosion,list$Soil_prediction,
              list$Slope, list$Wetness)  # stack all the data in a stacked raster

plot(stack$DEM)
# 1.5 Save and export the predictors ###########################################
writeRaster(stack ,filename = "./export/predictors.tif", format="GTiff",overwrite=T)

rm(list = ls())

2.3.3 Calculation of the conditioned Latin hypercube sampling

# 2 Create the Conditioned Latin Hypercube Sampling  ---------------------------
# 2.1 Import and prepare the files #############################################
predictors <- raster::stack("./export/predictors.tif")
names(predictor@layers) <- c("DEM","Geomorpho","Soil_erosion","Soil_prediction","Slope", "Wetness" )

# Convert the file into Data Frame
PredForMap  <- as(predictors, 'SpatialPixelsDataFrame')
PredForMap <- data.frame(PredForMap)
stack_frame <- PredForMap[complete.cases(PredForMap),] # Remove NA values

names(stack_frame) <- c("DEM","Geomorpho","Soil_erosion","Soil_prediction","Slope", "Wetness", "x", "y")
stack_frame$Geomorpho <- round(stack_frame$Geomorpho, digits=0) # If ever the categorical classification did not work
stack_frame$Soil_prediction <- round(stack_frame$Soil_prediction, digits=0) # If ever the categorical classification did not work
stack_frame <- na.omit(stack_frame) 

save.image("./export/Pre-process.RData") # Save the image

# 2.2 Conditioned Latin Hypercube sampling parameters ##########################

# Select the predictors for the Conditioned Latin Hypercube
preds <- c("DEM","Geomorpho","Soil_erosion","Soil_prediction","Slope","Wetness") 

# Set the size of the sampling 
c = 110  

# Set a seed
set.seed(1070) 

# Run the sampling
res <- clhs(stack_frame[, preds], size = c,  tdecrease = 0.95, iter = 50000, progress = FALSE, simple = FALSE) 

# 2.3 Export the results #######################################################

# Fit the results with the line in the table
CLHS_sampled_res <- stack_frame[res$index_samples, ]
# Convert to spatial data frame point
CLHS_sampled_df <- SpatialPoints(coords =  CLHS_sampled_res[, c("x", "y")], proj4string=CRS("+init=epsg:32638")) 

# Export the results in a CSV format
write.table(CLHI_sampled_df, "./export/CLHS_2022.csv", col.names = TRUE, sep = ";", row.names = FALSE, 
          fileEncoding = "UTF-8") #Replace 2022 with 2023 for the 2023 summer sampling campaign

2.4 Sampling locations

Some sampling locations were located in inaccessible areas so not all the spots have been sampled. Only 101 in 2022 and 21 in 2023. Despite this lack of few samples sites, the distribution of the cLHS variables was similar to better than a classical random sampling method.

Plot for 2022 campaign.

Plot for 2023 campaign.

References

Al-Mousawi, Hala, Saffa Fouad, and Varoujan Sissakian. 2007. “Geological Map of Zakho Quadrangle.” Geological map. Baghdad: GEOSURV.
Brus, Dick J. 2022. Spatial Sampling with R. 1st ed. Boca Raton: Chapman; Hall/CRC. https://doi.org/10.1201/9781003258940.
Ertan, Nazlan. 2022. “Turkey Launches Offensive Against PKK Targets in Northern Iraq.” Al-Monitor, April. https://www.al-monitor.com/originals/2022/04/turkey-launches-offensive-against-pkk-targets-northern-iraq.
ESA, and Airbus. 2022. “Copernicus DEM.” https://doi.org/10.5270/ESA-c5d3d65.
Forti, Luca, Alessandro Perego, Filippo Brandolini, Guido S. Mariani, Mjahid Zebari, Kathleen Nicoll, Eleonora Regattieri, et al. 2021. “Geomorphology of the Northwestern Kurdistan Region of Iraq: Landscapes of the Zagros Mountains Drained by the Tigris and Great Zab Rivers.” Journal of Maps 17 (2): 225–36. https://doi.org/10.1080/17445647.2021.1906339.
Hengl, Tomislav. 2007. A Practical Guide to Geostatistical Mapping of Environmental Variables. Edited by European Commission. Luxembourg: Publications Office.
Henrich, V., G. Krauss, C. Götze, and C. Sandow. 2012. IDB – Www.indexdatabase.de, Entwicklung Einer Datenbank Für Fernerkundungsindizes.” Poster. Bochum. https://www.indexdatabase.de/info/credits.php.
Pouget, M., José Madeira Netto, É L. Floc’h, and S. Kamal. 1991. “Caractéristiques Spectrales Des Surfaces Sableuses de La Région Côtière Nord-Ouest de l’Egypte : Application Aux Données Satellitaires SPOT.” In. https://www.semanticscholar.org/paper/Caract%C3%A9ristiques-spectrales-des-surfaces-sableuses-Pouget-Netto/082a77649ddd9d2b2427f96a3fb2659acb1c2883.
USGS. 2018. “Landsat Collections.” 2018-3049. U.S. Geological Survey. https://doi.org/10.3133/fs20183049.