Reinforcement Learning for Battery Fast-Charging Protocols

In a recent edition of the Passion Academy, we explored how reinforcement learning (RL) can be applied to a real-world, safety-critical problem: battery fast charging, with a particular focus on electric vehicles — though the ideas extend to any battery-powered system.
The session examined not only how reinforcement learning can be used in this domain, but also why careful problem formulation matters more than the learning algorithm itself.
From a user experience perspective, faster charging is always better. But batteries don’t like being rushed.
Charging too aggressively can lead to:
- lithium plating on the anode, which permanently reduces capacity
- excessive heat generation and the safety risks that come with it
- accelerated degradation, shortening the battery's usable life
Today’s charging strategies rely heavily on heuristics. Common approaches include constant current–constant voltage (CCCV) charging — fast at first, then slower near full capacity — and thermal throttling, where charging power is reduced when temperature exceeds certain thresholds.
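As a rough illustration, a CCCV-style rule with thermal throttling can be sketched in a few lines of Python. The current limits, voltage cutoff and temperature threshold below are illustrative placeholders, not values from any specific battery:

```python
def heuristic_charge_current(voltage, temperature,
                             i_max=4.0,        # fast-charge current limit (A), illustrative
                             v_cutoff=4.2,     # constant-voltage threshold (V), illustrative
                             t_throttle=45.0): # thermal throttling threshold (°C), illustrative
    """CCCV-style heuristic: constant current until the voltage cutoff,
    then a crude taper; reduce current further if the cell runs hot."""
    if voltage < v_cutoff:
        current = i_max                          # constant-current phase
    else:
        # crude taper toward zero as voltage approaches/exceeds the cutoff
        current = max(0.0, i_max * (1.05 - voltage / v_cutoff))
    if temperature > t_throttle:
        current *= 0.5                           # thermal throttling
    return current
```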
These methods work reasonably well, but they are hand-crafted rules. The natural question is: can learning-based methods do better?
Reinforcement learning provides a natural framework for sequential decision-making under trade-offs.
In this setup:
- the agent selects the charging current (or power) at each time step
- the state describes the battery's measurable condition, such as voltage, current and temperature
- the reward balances charging speed against degradation and safety
Training typically happens in simulation. Learning directly on real batteries would be slow, expensive and destructive (literally wearing out batteries in the process).
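A minimal sketch of such a simulated training loop is below. `BatteryCellSim` and its dynamics are hypothetical stand-ins for whatever electrochemical or equivalent-circuit model is actually used, and the random policy is just a placeholder for a learning agent:

```python
import random

class BatteryCellSim:
    """Toy stand-in for a battery simulator (hypothetical dynamics)."""
    def reset(self):
        self.soc, self.temp, self.volt = 0.2, 25.0, 3.6
        return (self.soc, self.temp, self.volt)

    def step(self, current):
        # extremely simplified dynamics: charging raises SoC, voltage and temperature
        self.soc = min(1.0, self.soc + 0.01 * current)
        self.temp += 0.05 * current ** 2 - 0.1      # heating minus passive cooling
        self.volt = 3.0 + 1.2 * self.soc
        reward = -1.0                               # time penalty: finish fast
        if self.temp > 45.0:
            reward -= 5.0                           # soft penalty for overheating
        done = self.soc >= 0.8
        return (self.soc, self.temp, self.volt), reward, done

env = BatteryCellSim()
for _episode in range(3):
    state, done = env.reset(), False
    while not done:
        action = random.uniform(0.0, 4.0)           # placeholder policy: random current
        state, reward, done = env.step(action)
```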
For an RL agent to make good decisions, its state representation must capture everything relevant.
In battery charging, we cannot directly observe internal chemical processes like lithium plating. Instead, we rely on indirect, imperfect signals such as:
- terminal voltage and charging current
- surface temperature
- an estimated state of charge
This means the environment is only partially observable, which already makes the problem more challenging. However, with careful design, these signals can still provide enough information for meaningful learning.
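In code, the observation handed to the agent is typically just a small vector of these measurable quantities, often stacked with a short history of past readings to compensate for partial observability. The field names and window length here are illustrative assumptions:

```python
import numpy as np

def build_observation(voltage, current, surface_temp, soc_estimate, history):
    """Stack the latest measurements with a short history of past readings
    (a mutable list kept by the caller) to give the agent some memory."""
    now = np.array([voltage, current, surface_temp, soc_estimate])
    history.append(now)
    if len(history) > 4:                 # keep only the last 4 time steps
        history.pop(0)
    # pad with copies of the oldest reading until the window is full
    window = history + [history[0]] * (4 - len(history))
    return np.concatenate(window)        # flat vector fed to the policy
```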
At first glance, the reward function seems straightforward: reward the agent for charging quickly, and penalise it for damaging the battery or violating safety limits.
In practice, this quickly becomes tricky.
If penalties are too small, the agent may learn that slightly damaging the battery is worth the speed gain. If penalties are too large, learning can become unstable or overly conservative.
Some approaches use degradation-aware rewards, where the agent is penalised based on estimated battery degradation derived from physical or empirical models. However, these degradation models are uncertain and errors in them can lead the agent to learn the wrong behaviour entirely.
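A degradation-aware reward in this spirit might look like the sketch below, where `estimate_degradation` would be supplied by an (uncertain) physical or empirical ageing model and the weights are hypothetical tuning knobs, not values from the paper:

```python
def reward(delta_soc, temperature, estimated_degradation,
           w_speed=1.0, w_temp=0.1, w_deg=10.0, t_limit=45.0):
    """Trade off charging speed against estimated damage (illustrative weights).
    If w_deg is too small the agent trades battery life for speed;
    if it is too large the policy becomes overly conservative."""
    r = w_speed * delta_soc                          # reward progress toward full charge
    r -= w_temp * max(0.0, temperature - t_limit)    # soft thermal penalty
    r -= w_deg * estimated_degradation               # penalise model-estimated ageing
    return r
```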
An alternative approach from recent research is: don’t reward safety, enforce it.
Instead of letting the agent choose any action and penalising unsafe ones after the fact, the agent is only allowed to choose from a set of safe actions. Unsafe actions are removed entirely from the action space.
This is achieved through a concept called shielding: before each action is executed, a safety layer predicts its consequences and filters out any action that would push temperature or voltage beyond safe limits.
Crucially, this requires predicting the future impact of an action, i.e. what will happen to temperature or voltage after the action is applied. To do this, the system uses a surrogate model, such as a Gaussian Process, which can provide both predictions and uncertainty estimates.
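A minimal sketch of such a shield is given below, assuming a Gaussian Process surrogate (here scikit-learn's `GaussianProcessRegressor`) that has already been fitted to predict the next temperature from the current state plus a candidate charging current. Actions whose predicted upper confidence bound exceeds the limit are simply removed from the choice set; the limit and confidence multiplier are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def safe_actions(gp: GaussianProcessRegressor, state, candidate_currents,
                 temp_limit=45.0, kappa=2.0):
    """Return only the charging currents whose predicted next temperature,
    plus kappa standard deviations of model uncertainty, stays below the limit."""
    safe = []
    for current in candidate_currents:
        x = np.array([[*state, current]])            # features: state + candidate action
        mean, std = gp.predict(x, return_std=True)
        if mean[0] + kappa * std[0] <= temp_limit:   # conservative: penalise uncertainty
            safe.append(current)
    # always keep the gentlest option so the agent is never left without an action
    return safe if safe else [min(candidate_currents)]
```

The agent then picks its action only from this filtered set, so constraint violations are prevented by construction rather than discouraged after the fact.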
Shielding changes how the agent learns:
- the agent never visits unsafe states during training, so safety does not depend on the policy eventually converging
- exploration is confined to the safe region of the action space
- the reward can focus on charging speed instead of encoding safety penalties
This approach is especially valuable in safety-critical systems, where relying on an agent to “learn the right thing eventually” is not acceptable.
One of the strongest takeaways is that the hardest part of reinforcement learning is rarely the algorithm itself.
The real challenge lies in:
- defining a state representation that captures what actually matters
- designing rewards that the agent cannot game
- handling partial observability and model uncertainty
- enforcing safety constraints that cannot be left to trial and error
In complex, real-world systems, solving the problem often requires extending the standard RL framework, not just plugging in a learning method and hoping for the best.
Reinforcement learning has enormous potential beyond games and simulations, but only when applied thoughtfully.
Battery fast charging is a powerful example of how AI, physics, and safety engineering must work together. Sometimes, the smartest learning system is one that knows which decisions it is never allowed to make.
Reference
Chowdhury, Myisha A., Saif S. S. Al-Wahaibi, and Qiugang Lu. "Adaptive safe reinforcement learning-enabled optimization of battery fast-charging protocols." AIChE Journal 71.1 (2025): e18605.