Age-Mem: Teaching AI Agents to Manage Their Own Memory

Research

Dr. Fabio Rodríguez

Senior ML Engineer

AI agents are getting better at complex, multi-step tasks. But one fundamental constraint doesn't get talked about enough: every agent has a finite context window, so there's only so much information it can hold in mind at once. In a recent Passion Academy session, Dr. Fabio Rodríguez walked through Age-Mem, a research framework that tackles this problem head-on by teaching agents to actively manage their own memory using reinforcement learning.

This session builds on our earlier exploration of A-MEM, where Dr. Fabio Rodríguez introduced how agents can store and organise memory as a living, evolving structure (article here). Age-Mem takes that foundation further, asking not just how memory is organised but how an agent learns to manage it intelligently under pressure.

The Problem: Too Much Information, Not Enough Space

Imagine asking an AI agent for a focused three-day ML crash course on face recognition, but in the same conversation you've also mentioned quantum computing, blockchain, robotics, sourdough bread and latte art. A well-functioning agent needs to filter all of that out and focus on what actually matters. A poorly designed one will either get overwhelmed or treat everything as equally important.

This is the core challenge of long-horizon reasoning. As tasks get longer and more complex, agents accumulate noise. Without a principled way to manage what goes in and out of memory, performance degrades. The context window fills up, relevant information gets crowded out and the agent loses track of what it was actually trying to do.

How Memory Has Been Handled Until Now

Previous approaches to agentic memory fall into two broad categories, both with significant limitations.

The first approach keeps long-term and short-term memory as entirely separate systems, with pre-programmed rules deciding when to store something. The problem is that those rules are static (they can't adapt to context), and the agent has no real control over what ends up in memory.

The second approach adds a separate Memory Manager model to handle long-term storage, but short-term memory is still managed rigidly via retrieval-augmented generation. The overhead is high and the two systems don't coordinate well.

Age-Mem takes a different approach entirely. A single LLM policy manages both long-term and short-term memory simultaneously, using tool calls to take direct memory actions. Rather than following fixed rules, the agent decides for itself what to store, update, retrieve or delete.

What the Agent Can Actually Do

Age-Mem gives the agent six memory operations it can call at any point during a task.

On the long-term memory side: retrieve relevant facts into active context, store new knowledge and refine existing entries.

On the short-term memory side: cache important content, filter out irrelevant noise based on semantic thresholds and summarise history to save space.
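
To make the six operations concrete, here is a minimal Python sketch of a memory object that exposes them as methods. Everything in it is an illustrative assumption rather than the framework's actual interface: the class name, the method names and signatures, and especially the word-overlap score, which is only a crude stand-in for the semantic threshold mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy memory exposing six operations (all names here are hypothetical)."""
    long_term: dict[str, str] = field(default_factory=dict)  # key -> stored fact
    context: list[str] = field(default_factory=list)         # short-term working context

    # Long-term memory operations
    def retrieve(self, query: str) -> list[str]:
        """Copy long-term facts that mention the query into the active context."""
        hits = [f for f in self.long_term.values() if query.lower() in f.lower()]
        self.context.extend(hits)
        return hits

    def store(self, key: str, fact: str) -> None:
        """Save new knowledge under a key."""
        self.long_term[key] = fact

    def refine(self, key: str, new_fact: str) -> bool:
        """Update an existing entry; returns False if the key is unknown."""
        if key not in self.long_term:
            return False
        self.long_term[key] = new_fact
        return True

    # Short-term memory operations
    def cache(self, content: str) -> None:
        """Keep important content in the working context."""
        self.context.append(content)

    def filter_noise(self, topic: str, threshold: float = 0.2) -> int:
        """Drop context items scoring below the threshold; returns how many were removed.

        Word overlap is an assumed placeholder for a real semantic similarity score.
        """
        topic_words = set(topic.lower().split())

        def score(item: str) -> float:
            return len(set(item.lower().split()) & topic_words) / max(len(topic_words), 1)

        kept = [item for item in self.context if score(item) >= threshold]
        removed = len(self.context) - len(kept)
        self.context = kept
        return removed

    def summarise(self, keep_last: int = 3) -> None:
        """Collapse older context items into a single placeholder line."""
        if len(self.context) > keep_last:
            older = self.context[:-keep_last]
            self.context = [f"[summary of {len(older)} earlier items]"] + self.context[-keep_last:]


memory = AgentMemory()
memory.cache("face recognition needs labelled face images")
memory.cache("sourdough bread needs a starter")
removed = memory.filter_noise("face recognition course")
print(removed, memory.context)
```

In this toy run the sourdough line shares no words with the topic, so it is the only item filtered out. A trained agent would issue calls like these as tool calls, choosing which one to make based on the task.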

The key shift here is that memory management becomes part of the agent's decision-making, not a separate system bolted on top. The LLM is fine-tuned to treat memory operations the same way it treats any other action: as something to reason about and execute based on what the task actually requires.

The Reinforcement Learning Challenge

This is where it gets interesting, and where standard RL runs into a real problem.

When an agent stores a fact, it often doesn't know immediately whether that was a useful thing to do. The value of that memory action might only become clear a hundred steps later, when the fact is needed to answer a question. If you reward the agent immediately for storing things, it learns to hoard, filling memory with irrelevant information just to accumulate reward.
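
The hoarding incentive is easy to see in a toy comparison. The Python sketch below is not Age-Mem's reward. It contrasts a naive scheme that pays out on every store action with an outcome-based scheme that only pays once the stored facts turn out to be needed, minus an assumed small cost per stored item. The function names and the cost value are made up for illustration.

```python
def immediate_reward(stored: list[str]) -> float:
    """Naive scheme: +1 for every store action, regardless of later usefulness."""
    return float(len(stored))


def delayed_reward(stored: list[str], needed: set[str], cost_per_item: float = 0.1) -> float:
    """Outcome-based scheme: pay only if the needed facts are available at query time.

    The per-item cost is an assumed penalty for memory bloat.
    """
    success = 1.0 if needed.issubset(stored) else 0.0
    return success - cost_per_item * len(stored)


hoarder = ["face datasets", "quantum computing", "blockchain", "sourdough", "latte art"]
selective = ["face datasets"]
needed = {"face datasets"}

for name, stored in [("hoarder", hoarder), ("selective", selective)]:
    print(name, immediate_reward(stored), round(delayed_reward(stored, needed), 2))
```

Under the immediate scheme the hoarder scores higher. Under the delayed scheme the selective agent does, which is the behaviour the training is meant to encourage.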

Age-Mem addresses this with a three-stage progressive training strategy:

In stage one, the agent learns to build long-term memory by storing relevant facts.
In stage two, it learns short-term memory control: filtering noise and summarising history.
In stage three, both systems are used together as the agent faces real queries that require retrieving and reasoning over everything it has accumulated.

By separating the stages, the agent learns each skill properly before being asked to combine them.
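
The staged structure can be sketched as a simple curriculum loop. This is a rough Python outline under stated assumptions, not the actual training code: the stage names, the tool names attached to each stage and the `dummy_episode` placeholder (which stands in for a real rollout plus policy update) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    focus_tools: tuple[str, ...]  # hypothetical tool names practised in this stage


# Stage order follows the description above; the tool groupings are an assumption.
CURRICULUM = (
    Stage("long_term_memory", ("store",)),
    Stage("short_term_control", ("filter", "summarise")),
    Stage("integrated_reasoning",
          ("retrieve", "store", "refine", "cache", "filter", "summarise")),
)


def run_curriculum(
    train_episode: Callable[[Stage, int], float],
    episodes_per_stage: int = 2,
) -> list[tuple[str, int, float]]:
    """Run every stage in order, finishing one before starting the next."""
    history = []
    for stage in CURRICULUM:
        for episode in range(episodes_per_stage):
            reward = train_episode(stage, episode)  # placeholder for a real RL update
            history.append((stage.name, episode, reward))
    return history


def dummy_episode(stage: Stage, episode: int) -> float:
    """Stand-in for a rollout plus policy update; returns a fake reward."""
    return 0.1 * len(stage.focus_tools) + 0.01 * episode


history = run_curriculum(dummy_episode)
for name, episode, reward in history:
    print(f"{name} episode {episode}: reward {reward:.2f}")
```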

How Reward is Calculated

The reward signal is deliberately composite, balancing four things at once.

Task completion is scored by a separate LLM acting as a judge, evaluating the accuracy of the final answer.
Context management rewards the agent for compressing efficiently, avoiding overflow and preserving content that matters.
Memory quality rewards selective, high-quality storage and penalises keeping stale or irrelevant facts.
Penalties are applied when the agent exceeds dialogue length limits or triggers context overflow.

In other words, the reward system acts like a teacher evaluating not just whether the answer was right but whether the agent studied intelligently to get there.
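
As a rough illustration of how such a composite signal might be assembled, the Python sketch below combines three scores and two penalties into one number. The field names, weights and penalty sizes are assumptions chosen for readability, not values from the framework.

```python
from dataclasses import dataclass


@dataclass
class EpisodeStats:
    judge_score: float    # task completion, as rated by a judge model (0 to 1)
    context_score: float  # quality of context management (0 to 1)
    memory_score: float   # quality of what was stored (0 to 1)
    dialogue_turns: int   # turns actually used
    max_turns: int        # dialogue length limit
    overflowed: bool      # whether the context window overflowed


def composite_reward(
    stats: EpisodeStats,
    weights: tuple[float, float, float] = (1.0, 0.5, 0.5),  # assumed weights
    turn_penalty: float = 0.5,                              # assumed penalty size
    overflow_penalty: float = 1.0,                          # assumed penalty size
) -> float:
    """Weighted sum of the three scores, minus penalties for limit violations."""
    w_task, w_context, w_memory = weights
    reward = (
        w_task * stats.judge_score
        + w_context * stats.context_score
        + w_memory * stats.memory_score
    )
    if stats.dialogue_turns > stats.max_turns:
        reward -= turn_penalty
    if stats.overflowed:
        reward -= overflow_penalty
    return reward


clean = EpisodeStats(1.0, 1.0, 1.0, dialogue_turns=10, max_turns=20, overflowed=False)
sloppy = EpisodeStats(1.0, 1.0, 1.0, dialogue_turns=25, max_turns=20, overflowed=True)
print(composite_reward(clean), composite_reward(sloppy))
```

Both episodes reach the right answer, but the second loses reward for running over the turn limit and overflowing its context, so a correct answer alone does not earn full marks.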

The Broader Picture

Memory is increasingly the bottleneck for capable AI agents. Getting the right answer at the end of a long task depends on having managed context well throughout: knowing what to hold onto, what to discard and when to retrieve something that was stored many steps earlier.

Age-Mem points toward a future where agents don't just use memory passively but manage it as an active skill. That matters for any application where tasks are long, context is noisy, and the cost of losing track is high (which, as AI agents take on more real-world work, is most of them).

If this raised questions about how agentic memory is structured in the first place, our earlier session on A-MEM (see it here) covers the architecture underneath, including how memory units are built, linked and evolved over time.
