How Machine Learning Is Transforming Soil Sampling and Soil Mapping


In this week’s Passion Academy, we reviewed three scientific articles about soil sampling and soil mapping for geographical applications.
Soil sampling is about finding the minimum number of sample points that still gives a good understanding of an area. Soil mapping is about using those points to predict soil properties in places where no samples were taken. These methods are widely used in agriculture, environmental studies and climate research to monitor land and better understand soils.
The articles explore different machine learning techniques used for these problems: clustering methods and genetic algorithms for sampling, Random Forests for soil mapping, and Convolutional Neural Networks for analyzing historical satellite images. These approaches help reduce time, effort and cost. This is a great example of how data science and geography work together.
Soil data is often collected to measure attributes such as:
The challenge is deciding where these sampling points should be placed.
A naive approach would simply spread sampling points evenly across an area. While straightforward, this often creates two major problems:
Large agricultural regions, environmental monitoring projects and land development initiatives all face this challenge.
The goal is simple: collect as little data as possible while learning as much as possible.
A major principle behind modern soil sampling is the so-called Third Law of Geography:
"The more similar environmental conditions are between two locations, the more similar their geographical properties are likely to be."
In practical terms: If two locations have similar rainfall, elevation, vegetation and climate conditions, their soil properties may also be similar. This allows researchers to make smarter sampling decisions rather than treating every location equally.
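To make the principle concrete, here is a minimal sketch of a similarity-weighted estimate. It assumes each location is described by a few standardised environmental covariates and that similarity decays exponentially with distance in that covariate space. The function names, the similarity measure and all numbers are hypothetical and chosen for illustration; they are not taken from the articles.

```python
import math

def env_similarity(a, b):
    """Similarity in (0, 1]: 1 for identical covariates, lower as they differ.
    The exponential form is an assumption made for this sketch."""
    return math.exp(-math.dist(a, b))

def predict_from_similar_sites(target_env, samples):
    """Similarity-weighted average of measured soil values.

    samples: list of (covariates, soil_value) pairs.
    """
    weights = [env_similarity(target_env, env) for env, _ in samples]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, samples)) / total

# Hypothetical standardised covariates: (rainfall, elevation, vegetation index)
samples = [
    ((0.2, 0.1, 0.3), 1.8),  # site A with its measured soil value
    ((0.9, 0.8, 0.7), 3.5),  # site B with its measured soil value
]
target = (0.25, 0.15, 0.3)   # unsampled location, environmentally close to site A
estimate = predict_from_similar_sites(target, samples)
```

Because the target location resembles site A far more than site B, the estimate lands much closer to site A's value.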
One approach highlighted in the session was:
Adaptive Uncertainty Guided Stepwise Sampling (AUGSS)
The goal is to reduce uncertainty while using as few samples as possible.
The process works by:
This repeats until:
Rather than sampling blindly, the system continuously learns where new information matters most.
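The general idea of uncertainty-guided stepwise sampling can be sketched in a few lines. This is not the AUGSS algorithm from the article, only a simplified loop in the same spirit. It assumes that uncertainty at a candidate location is one minus its best environmental similarity to any site already sampled, and that sampling stops once every remaining candidate falls below a tolerance or a sample budget is reached. The uncertainty measure, the stopping rule and the parameter names are all assumptions.

```python
import math

def similarity(a, b):
    """Assumed similarity measure in covariate space."""
    return math.exp(-math.dist(a, b))

def uncertainty(candidate, sampled):
    """1 minus the best similarity to any sampled site (assumed measure)."""
    if not sampled:
        return 1.0
    return 1.0 - max(similarity(candidate, s) for s in sampled)

def stepwise_sampling(candidates, max_samples, tolerance):
    """Repeatedly sample the candidate that is currently least well represented."""
    sampled = []
    remaining = list(candidates)
    while remaining and len(sampled) < max_samples:
        worst = max(remaining, key=lambda c: uncertainty(c, sampled))
        if uncertainty(worst, sampled) <= tolerance:
            break  # every remaining location is already well represented
        sampled.append(worst)
        remaining.remove(worst)
    return sampled

# Hypothetical candidates described by two standardised covariates
candidates = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (3.0, 0.5)]
chosen = stepwise_sampling(candidates, max_samples=4, tolerance=0.2)
```

In this toy data set, near-duplicate locations such as (0.0, 0.0) and (0.1, 0.0) do not both get sampled, because sampling one already lowers the uncertainty of the other.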
Models used in this process include:
Another interesting approach explored was: Genetic Algorithm Sampling (GAS)
This method balances two competing goals:
Rather than adding samples step by step, GAS searches for optimal sampling strategies using evolutionary optimisation techniques.
It was tested in flat terrain environments and compared against methods such as:
This is particularly useful when trying to optimise large scale land surveys.
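For readers unfamiliar with evolutionary optimisation, the sketch below shows a small genetic algorithm that selects sampling sites from a candidate grid. It is not the GAS method from the article. The two objectives used here (covering the area well and keeping the number of samples low) are placeholders chosen for illustration, and the population size, mutation rate and cost weight are arbitrary assumptions.

```python
import math
import random

def fitness(mask, sites, cost_per_sample=0.05):
    """Higher is better: good coverage with few samples (assumed objectives)."""
    chosen = [s for s, keep in zip(sites, mask) if keep]
    if not chosen:
        return float("-inf")
    # Coverage error: mean distance from each site to its nearest chosen sample.
    error = sum(min(math.dist(s, c) for c in chosen) for s in sites) / len(sites)
    return -error - cost_per_sample * len(chosen)

def evolve(sites, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Evolve boolean masks over `sites`; True means 'take a sample here'."""
    rng = random.Random(seed)
    n = len(sites)
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, sites), reverse=True)
        history.append(fitness(population[0], sites))
        parents = population[: pop_size // 2]   # truncation selection
        children = [parents[0][:]]              # elitism: carry the best forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each gene with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    best = max(population, key=lambda m: fitness(m, sites))
    return best, history

# Hypothetical 4 x 4 grid of candidate sampling locations
sites = [(x, y) for x in range(4) for y in range(4)]
best_mask, history = evolve(sites)
selected_sites = [s for s, keep in zip(sites, best_mask) if keep]
```

Unlike the stepwise approach, the whole sampling design is evaluated at once, so the algorithm can trade one site for another instead of only adding new ones.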
Collecting samples is only half the challenge. Once physical measurements are collected, researchers still need to predict conditions across the rest of the landscape. This is known as soil mapping.
Machine learning models help interpolate between known sampling points and estimate values in areas where no physical testing has occurred.
These predictions can be used to map:
Without interpolation, organisations would need significantly more expensive physical sampling.
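The interpolation step can be illustrated with inverse distance weighting (IDW), a much simpler technique than the Random Forest models discussed in the articles. It is used here only as a stand-in because it fits in a few lines without external libraries. The sample coordinates and values are made up for the example.

```python
import math

def idw_predict(target, samples, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    samples: list of ((x, y), measured_value) pairs.
    """
    numerator = 0.0
    denominator = 0.0
    for location, value in samples:
        d = math.dist(target, location)
        if d == 0.0:
            return value  # exactly on a sampled point: use the measurement
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

def map_grid(samples, width, height):
    """Predict a value for every cell of a width x height grid (rows indexed by y)."""
    return [[idw_predict((x, y), samples) for x in range(width)]
            for y in range(height)]

# Hypothetical measurements of a soil property at three locations
samples = [((0, 0), 10.0), ((4, 0), 20.0), ((2, 3), 30.0)]
grid = map_grid(samples, width=5, height=4)
```

Every grid cell now has an estimate even though only three locations were physically sampled. A model such as a Random Forest would additionally use environmental covariates rather than distance alone.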
One of the most interesting developments is combining soil sampling with historical satellite data.
Researchers explored using:
These datasets were combined with physical soil measurements to predict soil organic carbon at regional scale.
In one study:
This creates a much richer picture than relying purely on physical sampling.
Better soil mapping has major implications for industries like agriculture and environmental science.
It can help organisations:
And the efficiency gains are significant.
Smarter sampling methods can reduce required sampling points by around 30% while maintaining similar levels of information quality. That is a meaningful operational improvement when working across large geographic areas.
This is a good example of where machine learning creates value outside traditional AI conversations. By combining spatial reasoning, optimisation algorithms and environmental data, machine learning is helping organisations understand physical environments faster and at lower cost.
As climate monitoring, agriculture and land management become increasingly data driven, this type of work will likely become far more important.