Reinforcement Learning for Battery Fast-Charging Protocols

Research
Dr Nadine Kroher
Chief Scientific Officer

In a recent edition of the Passion Academy, we explored how reinforcement learning (RL) can be applied to a real-world, safety-critical problem: battery fast charging, with a particular focus on electric vehicles — though the ideas extend to any battery-powered system.

The session examined not only how reinforcement learning can be used in this domain, but also why careful problem formulation matters more than the learning algorithm itself.

The Core Problem: Fast Charging vs. Battery Health

From a user experience perspective, faster charging is always better. But batteries don’t like being rushed.

Charging too aggressively can lead to:

  • Lithium plating, often called the silent death of a battery, where capacity degrades over time without obvious warning.
  • Excess heat, which shortens battery lifespan even if it never reaches dangerous levels.
  • High voltage stress, which can accelerate degradation and increase safety risks.

Today’s charging strategies rely heavily on heuristics. Common approaches include constant current–constant voltage (CCCV) charging — fast at first, then slower near full capacity — and thermal throttling, where charging power is reduced when temperature exceeds certain thresholds.
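
To make the baseline concrete, the sketch below shows what such a hand-crafted rule might look like in code. All thresholds, current levels and the taper shape are illustrative placeholders rather than values from any particular charger or from the referenced paper.

```python
def heuristic_charging_current(voltage_v: float,
                               temperature_c: float,
                               max_current_a: float = 60.0,
                               v_limit: float = 4.2,
                               temp_limit_c: float = 45.0) -> float:
    """Toy CCCV-style heuristic with thermal throttling (illustrative only)."""
    current = max_current_a

    # Constant-voltage phase: taper the current linearly as the cell
    # approaches its voltage limit.
    if voltage_v >= 0.95 * v_limit:
        taper = max(0.0, (v_limit - voltage_v) / (0.05 * v_limit))
        current *= taper

    # Thermal throttling: cut the current in half above the temperature threshold.
    if temperature_c > temp_limit_c:
        current *= 0.5

    return current
```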

These methods work reasonably well, but they are hand-crafted rules. The natural question is: can learning-based methods do better?

Framing Battery Charging as a Reinforcement Learning Problem

Reinforcement learning provides a natural framework for sequential decision-making under trade-offs.

In this setup:

  • The agent is the charging controller.
  • The actions are charging decisions — how much current or power to apply at each time step.
  • The environment is the battery and its thermal and electrical dynamics.
  • An episode represents a full charging session, for example from 20% to 80% state of charge.

Training typically happens in simulation. Learning directly on real batteries would be slow, expensive and destructive (literally wearing out batteries in the process).
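
To make the formulation concrete, here is a minimal sketch of what such a simulated environment could look like. The dynamics (coulomb counting for state of charge, a crude open-circuit-voltage curve and a first-order thermal model) are toy approximations chosen for readability; they are not the simulator used in the referenced work, and all constants are illustrative.

```python
import numpy as np

class ToyBatteryChargingEnv:
    """Toy gym-style charging environment (illustrative dynamics only).

    State: [state of charge, terminal voltage, cell temperature].
    Action: charging current in amperes.
    Episode: one charging session from 20% to 80% SOC.
    """

    def __init__(self, capacity_ah=50.0, dt_s=30.0, ambient_c=25.0):
        self.capacity_ah = capacity_ah
        self.dt_s = dt_s
        self.ambient_c = ambient_c
        self.reset()

    def reset(self):
        self.soc = 0.2
        self.temp_c = self.ambient_c
        self.voltage = self._terminal_voltage(0.0)
        return np.array([self.soc, self.voltage, self.temp_c])

    def _terminal_voltage(self, current_a, r_internal=0.002):
        # Crude open-circuit voltage curve plus ohmic drop.
        ocv = 3.4 + 0.8 * self.soc
        return ocv + current_a * r_internal

    def step(self, current_a):
        # Coulomb counting for SOC, first-order thermal model for temperature.
        self.soc += current_a * self.dt_s / 3600.0 / self.capacity_ah
        heat = 0.0005 * current_a ** 2         # I^2 R heating (toy coefficient)
        cooling = 0.05 * (self.temp_c - self.ambient_c)
        self.temp_c += (heat - cooling) * self.dt_s / 60.0
        self.voltage = self._terminal_voltage(current_a)

        obs = np.array([self.soc, self.voltage, self.temp_c])
        done = self.soc >= 0.8
        reward = -self.dt_s                    # placeholder: penalise elapsed time
        return obs, reward, done, {}
```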

Designing the State Space: What the Agent Can See

For an RL agent to make good decisions, its state representation must capture everything relevant.

In battery charging, we cannot directly observe internal chemical processes like lithium plating. Instead, we rely on indirect, imperfect signals such as:

  • State of charge (SOC)
  • Voltage
  • Temperature
  • Possibly previous actions, elapsed charging time, ambient conditions, or estimated internal resistance

This means the environment is only partially observable, which already makes the problem more challenging. However, with careful design, these signals can still provide enough information for meaningful learning.
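
In code, the observation the agent sees might simply be these indirect signals stacked into a single vector, as in the illustrative helper below (the exact set of features is an assumption, not a prescription).

```python
import numpy as np

def build_observation(soc, voltage_v, temp_c,
                      prev_current_a, elapsed_s, ambient_c):
    """Stack indirectly measurable signals into one observation vector.

    Internal states such as lithium plating are not observable and
    therefore cannot appear here; the agent only ever sees proxies.
    """
    return np.array([soc, voltage_v, temp_c,
                     prev_current_a, elapsed_s / 3600.0, ambient_c],
                    dtype=np.float32)
```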

The Hard Part: Reward Design

At first glance, the reward function seems straightforward:

  • Penalise time to encourage faster charging.
  • Penalise unsafe behaviour like excessive temperature or voltage.

In practice, this quickly becomes tricky.

If penalties are too small, the agent may learn that slightly damaging the battery is worth the speed gain. If penalties are too large, learning can become unstable or overly conservative.

Some approaches use degradation-aware rewards, where the agent is penalised based on estimated battery degradation derived from physical or empirical models. However, these degradation models are uncertain, and errors in them can lead the agent to learn the wrong behaviour entirely.
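
A soft-penalty reward of this kind might be sketched as follows. The weights and limits are placeholders, and, as discussed above, choosing them well is precisely the hard part.

```python
def charging_reward(dt_s, temp_c, voltage_v, est_degradation,
                    temp_limit_c=45.0, v_limit=4.25,
                    w_time=1.0, w_temp=10.0, w_volt=10.0, w_deg=100.0):
    """Illustrative soft-penalty reward; all weights are placeholders."""
    reward = -w_time * dt_s / 60.0                        # penalise elapsed time
    reward -= w_temp * max(0.0, temp_c - temp_limit_c)    # excess temperature
    reward -= w_volt * max(0.0, voltage_v - v_limit)      # voltage overshoot
    reward -= w_deg * est_degradation                     # degradation-aware term
    return reward
```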

A Different Philosophy: Hard Safety Constraints via Shielding

An alternative approach from recent research is: don’t reward safety, enforce it.

Instead of letting the agent choose any action and penalising unsafe ones after the fact, the agent is only allowed to choose from a set of safe actions. Unsafe actions are removed entirely from the action space.

This is achieved through a concept called shielding:

  • The policy proposes an action.
  • A safety module evaluates whether that action is safe given the current state.
  • If unsafe, the action is either rejected or replaced with the closest safe alternative.

Crucially, this requires predicting the future impact of an action, i.e. what will happen to temperature or voltage after the action is applied. To do this, the system uses a surrogate model, such as a Gaussian Process, which can provide both predictions and uncertainty estimates.
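
A minimal sketch of such a shield is shown below. It assumes a fitted Gaussian Process surrogate (here via scikit-learn's GaussianProcessRegressor) that maps the current state and a candidate charging current to a predicted next-step temperature with an uncertainty estimate; the back-off strategy and the temperature limit are illustrative choices, not the method from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def shield_action(gp_temp: GaussianProcessRegressor,
                  state: np.ndarray,
                  proposed_current_a: float,
                  temp_limit_c: float = 45.0,
                  kappa: float = 2.0,
                  step_a: float = 1.0) -> float:
    """Replace an unsafe proposed action with the closest safe alternative.

    `gp_temp` is assumed to predict next-step temperature from
    [state, current]. Any current whose upper confidence bound
    (mean + kappa * std) exceeds the limit is rejected, and the
    shield backs off until a safe current is found.
    """
    current = proposed_current_a
    while current > 0.0:
        x = np.concatenate([state, [current]]).reshape(1, -1)
        mean, std = gp_temp.predict(x, return_std=True)
        if mean[0] + kappa * std[0] <= temp_limit_c:
            return current          # safe: keep (possibly reduced) action
        current -= step_a           # unsafe: try the next lower current
    return 0.0                      # fall back to pausing the charge
```

Using the upper confidence bound rather than the mean prediction means the shield becomes more conservative exactly where the surrogate model is uncertain.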

Why This Matters

Shielding changes how the agent learns:

  • Unsafe actions effectively do not exist from the agent’s perspective.
  • The agent never receives reward from risky behaviour because it can never execute it.
  • Safety becomes a hard constraint, not a trade-off.

This approach is especially valuable in safety-critical systems, where relying on an agent to “learn the right thing eventually” is not acceptable.

The Bigger Lesson: Problem Formulation Beats Algorithms

One of the strongest takeaways is that the hardest part of reinforcement learning is rarely the algorithm itself.

The real challenge lies in:

  • Designing meaningful state representations
  • Defining actions realistically
  • Constructing environments that reflect reality
  • Deciding which constraints should be learned and which must be enforced

In complex, real-world systems, solving the problem often requires extending the standard RL framework, not just plugging in a learning method and hoping for the best.

Final Thoughts

Reinforcement learning has enormous potential beyond games and simulations, but only when applied thoughtfully.

Battery fast charging is a powerful example of how AI, physics, and safety engineering must work together. Sometimes, the smartest learning system is one that knows which decisions it is never allowed to make.

Reference

Chowdhury, Myisha A., Saif S. S. Al-Wahaibi, and Qiugang Lu. "Adaptive safe reinforcement learning-enabled optimization of battery fast-charging protocols." AIChE Journal 71.1 (2025): e18605.
