Statistically rigorous framework for Digital Soil Mapping – RoVaD

This public work investigates how modern machine-learning methods can be used to produce Digital Soil Mapping (DSM) products that support statistical inference, both at the local scale (individual locations) and at higher aggregation levels such as land-use classes. It was driven by the following key questions:

  • Which statistical quantities must be estimated for DSM maps to support meaningful inference and decision making?
  • What are the current limitations of Random Forest–based DSM approaches for uncertainty quantification and statistical inference?
  • Can DSM-derived estimators complement or even outperform traditional “design-based” estimators for carbon stock monitoring?
  • How can reproducibility and transparency of DSM workflows be improved?
  • What is the relation between predictive performance (e.g., R²) and inferential validity?

The focus was on establishing a statistically rigorous framework. Special emphasis was placed on compositional soil properties (sand, silt and clay), uncertainty quantification, model validation and reproducibility.
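Sand, silt and clay fractions are compositional: they are non-negative and sum to a constant, so modelling each fraction independently can yield predictions that violate the constraint. The following is a minimal sketch of one common remedy, the additive log-ratio (ALR) transform. It is an illustration only: the summary does not state which transform the framework uses, and the function names `alr` and `alr_inverse` are hypothetical.

```python
import numpy as np

def alr(composition):
    """Additive log-ratio transform; the last component (here clay) is the reference."""
    composition = np.asarray(composition, dtype=float)
    return np.log(composition[..., :-1] / composition[..., -1:])

def alr_inverse(z, total=100.0):
    """Map unconstrained ALR coordinates back to fractions that sum to `total`."""
    z = np.asarray(z, dtype=float)
    # Append a zero coordinate for the reference component.
    expanded = np.concatenate([z, np.zeros(z.shape[:-1] + (1,))], axis=-1)
    # Subtract the row maximum before exponentiating for numerical stability.
    expanded = expanded - expanded.max(axis=-1, keepdims=True)
    weights = np.exp(expanded)
    return total * weights / weights.sum(axis=-1, keepdims=True)

# Hypothetical texture observations: sand, silt, clay in percent.
texture = np.array([[60.0, 30.0, 10.0],
                    [20.0, 50.0, 30.0]])
coords = alr(texture)            # unconstrained values a model could be trained on
recovered = alr_inverse(coords)  # back-transformed fractions, each row sums to 100
```

A model fitted on `coords` can predict any real values, and the back-transform guarantees that the resulting fractions are positive and sum to 100. The transform is undefined when a fraction is exactly zero, so zeros would need separate handling.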

The framework was applied to the 2025 Cmon dataset to produce maps of soil organic carbon, bulk density, pH and soil texture across multiple depth intervals in Flanders.

The work builds upon the ISRIC Seedling DSM platform. The main contributions include:

  • Standard error estimation for expected values using infinitesimal jackknife and delta-method approximations.
  • Aleatoric variability estimation through modelling of the mean squared prediction error (MSPE).
  • Improved feature and model selection inspired by glmnet principles.
  • Variance estimation of cross-validation–based generalisation error.
  • Estimation of aggregated statistics and distributions at land-use level.
  • Self-contained HTML reports documenting the complete modelling workflow (with embedded GeoTIFF layers).
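To make the delta-method item above more concrete, the sketch below propagates the standard errors of two estimated quantities through a smooth function using a first-order approximation, Var[g(θ)] ≈ ∇gᵀ Σ ∇g. Everything in it is an assumption made for illustration: the helper `delta_method_se`, the simplified stock formula (concentration × bulk density × layer thickness, ignoring coarse fragments), the numbers, and the zero covariance between the two inputs. It is not the framework's implementation.

```python
import numpy as np

def delta_method_se(g, theta_hat, cov, eps=1e-6):
    """First-order (delta-method) standard error of the scalar g(theta_hat)."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    grad = np.empty_like(theta_hat)
    # Central finite differences approximate the gradient of g at theta_hat.
    for i in range(theta_hat.size):
        step = np.zeros_like(theta_hat)
        step[i] = eps
        grad[i] = (g(theta_hat + step) - g(theta_hat - step)) / (2 * eps)
    return float(np.sqrt(grad @ cov @ grad))

def soc_stock(theta, thickness_m=0.3):
    """Simplified SOC stock in kg/m^2 from SOC (g/kg) and bulk density (g/cm^3)."""
    soc, bulk_density = theta
    return soc * bulk_density * thickness_m

# Hypothetical estimates and standard errors, assumed uncorrelated.
theta_hat = np.array([20.0, 1.3])        # SOC in g/kg, bulk density in g/cm^3
cov = np.diag([2.0 ** 2, 0.05 ** 2])     # squared standard errors on the diagonal

stock = soc_stock(theta_hat)
stock_se = delta_method_se(soc_stock, theta_hat, cov)
```

In practice the off-diagonal covariance between the inputs matters and would have to be estimated as well; the diagonal matrix here only keeps the example short.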

Map-derived estimates were validated against design-based statistics at land-use level. The resulting report can be found here.
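As a rough illustration of such a comparison, the sketch below computes a design-based mean and standard error for one land-use class and compares it with a map-derived estimate. It assumes simple random sampling within the class and statistical independence between the two estimates; neither assumption is stated in this summary, and the function names and all numbers are hypothetical.

```python
import math
from statistics import mean, stdev

def srs_estimate(values):
    """Design-based mean and standard error under simple random sampling."""
    n = len(values)
    return mean(values), stdev(values) / math.sqrt(n)

def z_score(map_mean, map_se, design_mean, design_se):
    """Standardised difference between two estimates, assumed independent."""
    return (map_mean - design_mean) / math.sqrt(map_se ** 2 + design_se ** 2)

# Hypothetical SOC stock observations (t/ha) sampled within one land-use class.
samples = [52.0, 48.0, 55.0, 45.0, 50.0]
design_mean, design_se = srs_estimate(samples)

# Hypothetical map-derived mean and standard error for the same class.
z = z_score(51.0, 1.0, design_mean, design_se)
```

A standardised difference that is small in absolute value indicates that the two estimates agree within their combined uncertainty. If the map was trained on the same samples, the independence assumption fails and the comparison would need to account for that.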