LiDAR, Localisation and Machine Learning


Autonomous systems need to answer two fundamental questions at all times: where am I, and what is around me? In a recent Passion Academy session, Ivo Kosa broke down how LiDAR sensors tackle both of these challenges, what makes the data they produce so difficult to work with and how machine learning is changing what's possible.
LiDAR is at its core a range measurement sensor. It fires pulses of light and measures the time taken for each pulse to return. Do this fast enough in every direction and you get a dense snapshot of the surrounding environment at each timestep.
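As a minimal sketch of the time-of-flight idea, the range follows from the round-trip time and the speed of light. The function name and the 100 nanosecond example are illustrative assumptions, and the calculation uses the vacuum speed of light, ignoring the slightly slower propagation in air or water.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # vacuum value; an approximation in air


def range_from_time_of_flight(round_trip_seconds: float) -> float:
    """Distance to the reflecting surface in metres (hypothetical helper)."""
    # The pulse travels to the surface and back, so halve the round trip.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# A return arriving 100 nanoseconds after the pulse was fired.
distance_m = range_from_time_of_flight(100e-9)
print(f"{distance_m:.2f} m")  # roughly 15 metres
```

A target only 15 metres away returns its pulse in about a ten-millionth of a second, which gives a sense of the timing precision involved.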
In 2D, each point is described by an angle and a distance. In 3D, each point gets full X, Y, Z coordinates with the sensor as the origin. The result is what's known as a point cloud: a spatial fingerprint of the world around the sensor.
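The conversion from raw angle and distance readings to coordinates can be sketched as follows. The axis convention here (x forward, z up, elevation measured from the horizontal plane) is an assumption for illustration; real sensors document their own conventions.

```python
import numpy as np


def scan_2d_to_xy(angles_rad, ranges_m):
    """Convert a 2D scan (angle, distance) to an (N, 2) array of x, y points."""
    angles = np.asarray(angles_rad, dtype=float)
    ranges = np.asarray(ranges_m, dtype=float)
    return np.column_stack((ranges * np.cos(angles), ranges * np.sin(angles)))


def scan_3d_to_xyz(azimuth_rad, elevation_rad, ranges_m):
    """Convert azimuth, elevation and distance to an (N, 3) point cloud."""
    azimuth = np.asarray(azimuth_rad, dtype=float)
    elevation = np.asarray(elevation_rad, dtype=float)
    ranges = np.asarray(ranges_m, dtype=float)
    # Project the range onto the horizontal plane, then split into x and y.
    horizontal = ranges * np.cos(elevation)
    return np.column_stack((
        horizontal * np.cos(azimuth),
        horizontal * np.sin(azimuth),
        ranges * np.sin(elevation),
    ))


# Two beams: one straight ahead at 1 m, one to the left at 2 m.
points = scan_2d_to_xy([0.0, np.pi / 2], [1.0, 2.0])
print(points.shape)
```

In both cases the sensor sits at the origin, so every point is expressed relative to where the scan was taken.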
Point clouds are rich but awkward. They can be represented in three main ways: as raw point clouds, as meshes where surfaces are defined by connected faces, or as voxels where 3D space is divided into a uniform grid of occupied or empty cells. Each representation trades off detail, structure and computational cost differently.
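To make the voxel representation concrete, here is a minimal sketch that turns a raw point cloud into an occupancy grid. The function name, the choice of the cloud's minimum corner as grid origin and the 0.5 m cell size are all illustrative assumptions.

```python
import numpy as np


def voxelize(points_xyz, voxel_size):
    """Return a boolean occupancy grid and the grid origin (hypothetical helper)."""
    points = np.asarray(points_xyz, dtype=float)
    origin = points.min(axis=0)  # place the grid at the cloud's minimum corner
    # Integer cell index of each point along x, y and z.
    indices = np.floor((points - origin) / voxel_size).astype(int)
    grid = np.zeros(indices.max(axis=0) + 1, dtype=bool)
    grid[indices[:, 0], indices[:, 1], indices[:, 2]] = True
    return grid, origin


cloud = [[0.0, 0.0, 0.0], [0.1, 0.1, 0.1], [1.0, 0.0, 0.0]]
grid, origin = voxelize(cloud, voxel_size=0.5)
print(grid.shape, int(grid.sum()))  # three cells along x, two of them occupied
```

The trade-off shows up immediately: the first two points collapse into a single cell, so detail is lost, but the result is a regular grid that grid-based methods can consume.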
The deeper problem is that point clouds don't behave like the data most neural networks are designed for. There's no fixed ordering (shuffling the points doesn't change the shape), but most networks treat input order as meaningful. Different scans produce wildly different numbers of points, so a fixed-size input layer can't accept them without resampling or padding. Points aren't evenly distributed, with nearby surfaces densely sampled and distant areas sparse. And CNNs, which rely on sliding a kernel across a regular grid, have nothing to grip onto.
These aren't minor inconveniences. They're the reason entirely new architectures had to be developed.
PointNet addressed the ordering problem by processing each point independently through shared MLP layers, then aggregating the per-point features with a symmetric function (max pooling). Because the maximum doesn't depend on the order of its inputs, the resulting representation is invariant to input order. PointNet++ improved on this by applying the same idea hierarchically to local neighbourhoods of the point cloud, capturing both local and global structure.
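The order-invariance idea can be sketched in a few lines of NumPy: the same small MLP is applied to every point, and a max over the points collapses them into one global feature. This is an illustration only, with random untrained weights and arbitrary layer sizes (16 and 32), and it leaves out other parts of the published architecture such as the learned input transforms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared MLP weights: every point passes through the same two layers.
# Random values stand in for trained weights (an assumption for this sketch).
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)


def global_feature(points_xyz):
    """Map an (N, 3) point cloud to a single 32-dimensional feature vector."""
    points = np.asarray(points_xyz, dtype=float)
    hidden = np.maximum(points @ W1 + b1, 0.0)      # per-point layer 1 + ReLU
    per_point = np.maximum(hidden @ W2 + b2, 0.0)   # per-point layer 2 + ReLU
    # Max pooling over the point axis is a symmetric function, so the
    # result does not depend on the order of the rows.
    return per_point.max(axis=0)


cloud = rng.normal(size=(100, 3))
shuffled = rng.permutation(cloud)  # same points, different row order

assert np.allclose(global_feature(cloud), global_feature(shuffled))
# Clouds with a different number of points still give a fixed-size output.
print(global_feature(rng.normal(size=(37, 3))).shape)
```

The same pooling step also deals with the variable point count: whether the scan has 37 points or 100, the output has the same size.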
Both networks can be used for scene classification (labelling the environment as a whole) or for per-point segmentation, assigning a category to every individual point. That second capability is what makes them particularly powerful in real-world applications.
The session covered four application areas that illustrate the range of the technology.
In autonomous vehicles, real-time semantic segmentation of LiDAR scans distinguishes road surface, pedestrians, cyclists and other vehicles as the system moves. In construction and infrastructure, Building Information Modelling uses scanned point clouds to automatically label walls, floors, ceilings, pipes and beams in existing structures. Subsea, LiDAR-based inspection systems detect structural deformation along pipeline runs without human divers. And across robotics broadly, LiDAR is the primary sensor for real-time mapping and localisation during autonomous navigation.
It's worth noting that no sensor is perfect. LiDAR offers high resolution and works in the dark, but loses accuracy in fog or deep water. Sonar has long range underwater but low resolution. Radar penetrates occluded environments well but is similarly low resolution. Depth cameras return highly structured data that CNNs can use directly, but their depth images are often grainy and unreliable. Real systems tend to fuse several of these together.
Getting a robot to navigate autonomously requires solving two problems at once: building a map of the environment, and figuring out where within that map you currently are. The catch is that you need the map to localise and you need your location to build the map. This chicken-and-egg challenge is known as Simultaneous Localisation and Mapping, or SLAM.
Two main approaches were covered.
Kalman Filter SLAM solves this incrementally. At each step, the system takes a LiDAR scan, uses a motion model to predict where the robot has moved, aligns the new scan with the existing map, then uses the Kalman Filter to combine the predicted pose with the measured one to produce an improved estimate. The map is updated and the loop repeats.
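The predict-then-correct loop can be sketched with a one-dimensional Kalman filter, where the robot's pose is a single position along a corridor. All names and numbers here are illustrative assumptions; a real system tracks a multi-dimensional pose, typically needs a nonlinear variant of the filter, and obtains the measured pose from scan alignment, which is not shown.

```python
def kalman_step(mean, variance, motion, motion_variance,
                measurement, measurement_variance):
    """One predict/update cycle for a scalar position estimate."""
    # Predict: apply the motion model; uncertainty grows.
    predicted_mean = mean + motion
    predicted_variance = variance + motion_variance

    # Update: blend prediction and measurement, weighted by their variances.
    gain = predicted_variance / (predicted_variance + measurement_variance)
    new_mean = predicted_mean + gain * (measurement - predicted_mean)
    new_variance = (1.0 - gain) * predicted_variance
    return new_mean, new_variance


# Start at 0 m, command a 1 m move, then scan alignment reports 1.2 m.
mean, variance = kalman_step(
    mean=0.0, variance=1.0,
    motion=1.0, motion_variance=1.0,
    measurement=1.2, measurement_variance=2.0,
)
print(mean, variance)
```

With these numbers the prediction and the measurement are equally uncertain, so the estimate lands halfway between them, and its variance is lower than either source alone.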
Factor Graph SLAM takes a fundamentally different view. Rather than asking "given my last position and this new observation, where am I now?", it asks "given every pose, motion and observation across the entire trajectory, what is the most consistent path the robot could have taken?" The whole problem is modelled as a graph where nodes represent poses and landmarks and edges represent sensor measurements or motion constraints. It's more computationally expensive, but produces significantly more accurate maps over long trajectories where small errors would otherwise accumulate.
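A toy version of the whole-trajectory view: four poses along a line, three odometry edges that each report a 1 m step, and one extra edge that measures the last pose as 2.7 m from the first. The sketch below solves for all poses at once by linear least squares. Equal weighting of every edge, the one-dimensional poses and the helper name are assumptions for illustration; real factor graph solvers handle nonlinear constraints and per-edge uncertainty.

```python
import numpy as np


def solve_pose_graph_1d(num_poses, relative_constraints):
    """Least-squares poses given edges (i, j, measured x_j - x_i).

    Hypothetical helper: pose 0 is anchored at the origin so the
    solution is unique.
    """
    anchor = np.zeros(num_poses)
    anchor[0] = 1.0
    rows, targets = [anchor], [0.0]
    for i, j, measured in relative_constraints:
        row = np.zeros(num_poses)
        row[i], row[j] = -1.0, 1.0  # encodes x_j - x_i
        rows.append(row)
        targets.append(measured)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return solution


edges = [
    (0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0),  # odometry: three 1 m steps
    (0, 3, 2.7),                            # direct measurement of pose 3 from pose 0
]
print(solve_pose_graph_1d(4, edges))
```

The odometry alone would put the last pose at 3.0 m. The extra edge disagrees, and the optimisation spreads the 0.3 m discrepancy across every step of the trajectory rather than dumping it on the final pose, which is how accumulated drift gets corrected.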
LiDAR sits at the intersection of physics, geometry and machine learning in a way few other sensing technologies do. The hardware captures the world in three dimensions. The algorithms have to make sense of data that breaks most of the assumptions standard neural networks rely on. And the applications (from self-driving vehicles to subsea inspection to autonomous construction surveying) span some of the most demanding environments imaginable.
The field is moving quickly and architectures like PointNet++ are still being refined. But the core insight is already clear: to work with the physical world at scale, AI systems need to get comfortable with the messy, unordered, variable-density data that the real world actually produces.
H. Zhang et al. Deep Learning-Based 3D Point Cloud Classification: A Systematic Survey and Outlook
C. Qi et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Leica Geosystems. Mapping Underwater Terrain with Bathymetric LiDAR
F. Dellaert et al. Square Root SAM: Simultaneous Localization and Mapping via Square Root Information Smoothing