When Embeddings Hit Their Limits

Text embeddings sit quietly behind almost every AI system you use, from chatbots and search tools to recommendation engines. They’re what allow systems to “understand” similarity: when two pieces of text mean roughly the same thing, their embeddings sit close together in a mathematical space.
A new paper from DeepMind takes a closer look at the limits of that idea. It shows that, under certain conditions, even very large embeddings can’t perfectly capture all possible relationships between queries and documents, and it backs that claim up with a clever experiment.
So, what does this mean for people building retrieval-augmented generation (RAG) systems every day?
When you ask a question, say, “How do I reset my company email?”, the system converts that question into a vector, a kind of numerical fingerprint.
It does the same for all documents in its knowledge base. Then it finds which vectors are closest together; those are assumed to be the most relevant results.
That’s the basic mechanism behind everything from internal search tools to support chatbots.
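Here’s a minimal sketch of that mechanism. The `embed` function below is only a stand-in for whatever embedding model a real system would call; with this dummy the scores are meaningless, but the shape of the pipeline is the same.

```python
# Minimal sketch of embedding-based retrieval (illustrative only).
# `embed` is a placeholder: it returns a deterministic-per-run random vector,
# so the ranking is meaningless until a real embedding model is plugged in.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)          # a real model would return a learned vector
    return v / np.linalg.norm(v)      # unit-normalize so dot product = cosine similarity

docs = [
    "To reset your company email, open the account portal and choose 'Reset password'.",
    "The cafeteria is open from 8am to 3pm on weekdays.",
    "VPN access requires a ticket to the IT helpdesk.",
]
doc_vecs = np.stack([embed(d) for d in docs])   # one vector per document

query_vec = embed("How do I reset my company email?")
scores = doc_vecs @ query_vec                   # cosine similarity to each document
best = int(np.argmax(scores))                   # closest vector = assumed most relevant
print(docs[best])
```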
The DeepMind team started by asking a theoretical question:
Are there cases where no fixed-size embedding space can represent all the right relationships between questions and answers?
Their answer was yes, there are always some patterns that can’t be captured, no matter how many dimensions the embedding has.
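Roughly, and paraphrasing the flavor of the argument rather than the paper’s exact theorem, you can write the “right relationships” as a binary relevance matrix that d-dimensional inner products have to reproduce:

```latex
% My paraphrase of the flavor of the argument, not the paper's exact statement.
% Queries get embeddings u_i and documents get embeddings v_j, both in R^d.
\[
  A \in \{0,1\}^{m \times n}, \qquad
  A_{ij} = 1 \iff \text{document } j \text{ is a correct answer to query } i .
\]
% Retrieval gets every query exactly right only if each query can separate its
% relevant documents from the rest by score, i.e. there are thresholds \tau_i with
\[
  \mathrm{sign}\!\left(u_i^{\top} v_j - \tau_i\right) = 2A_{ij} - 1
  \quad \text{for all } i, j .
\]
% Results of this kind tie the smallest workable d to the sign rank of 2A - 1,
% and there are relevance patterns whose sign rank grows with the number of
% documents, so no fixed embedding dimension can capture them all.
```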
To test this in practice, they built what they call a “limit dataset.”
It’s intentionally simple:
Each question has exactly two correct answers.
The challenge: can an embedding model place those two correct documents close to the question and everything else far away?
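To get a feel for why this is hard, here is a small toy version of the setup (my own sketch in PyTorch, not the paper’s code or data): a handful of documents, one query for every pair of documents, and query/document vectors optimized directly, with no language model involved. Whatever fraction of queries stays unsolved at a given dimension is then a limit of the vector space itself, not of any particular embedding model.

```python
# Toy version of the "two correct answers" setup (my own sketch, not the paper's code).
# Query and document vectors are optimized directly ("free embeddings"), so any
# failure reflects the capacity of a d-dimensional space, not a particular model.
import itertools
import torch

def fraction_solved(n_docs: int, dim: int, steps: int = 2000, seed: int = 0) -> float:
    torch.manual_seed(seed)
    pairs = list(itertools.combinations(range(n_docs), 2))   # one query per pair of docs
    relevant = torch.tensor(pairs)                            # (n_queries, 2)
    docs = torch.randn(n_docs, dim, requires_grad=True)
    queries = torch.randn(len(pairs), dim, requires_grad=True)

    # Boolean mask that is True at the irrelevant documents of each query.
    irrelevant = torch.ones(len(pairs), n_docs, dtype=torch.bool)
    irrelevant[torch.arange(len(pairs)).unsqueeze(1), relevant] = False

    opt = torch.optim.Adam([docs, queries], lr=0.05)
    for _ in range(steps):
        scores = queries @ docs.T                             # (n_queries, n_docs)
        pos = scores.gather(1, relevant)                      # scores of the 2 correct docs
        neg = scores.masked_fill(~irrelevant, float("-inf")).max(dim=1).values
        # Hinge loss: each correct document should beat the best incorrect one by a margin.
        loss = torch.relu(1.0 + neg.unsqueeze(1) - pos).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        top2 = (queries @ docs.T).topk(2, dim=1).indices
        solved = (top2.sort(dim=1).values == relevant.sort(dim=1).values).all(dim=1)
        return solved.float().mean().item()

for d in (2, 4, 16):
    print(f"dim={d}: fraction of queries with the correct top-2: {fraction_solved(12, d):.2f}")
```

The paper’s claim is stronger than anything this toy can show, but the exercise makes the question concrete: below some dimension, no arrangement of vectors solves every query, no matter how freely you are allowed to choose them.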
Surprisingly, today’s high-performing neural embedding models (the kind used across modern AI systems) struggled with this setup.
It’s a fascinating result because it demonstrates, with both theory and experiment, that embeddings have structural limits.
Some relationships between questions and answers are just too constrained to fit neatly into a single-vector space.
Does any of this matter for most real-world retrieval systems? Usually not.
Typical applications (customer support, document search, chat assistants) don’t look like this highly structured “two-correct-answers” puzzle.
In practice, embedding models work well across messy, overlapping human language.
When retrieval fails, it’s usually due to query phrasing, chunking, or missing data, not the geometry of the embedding space itself.
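As a concrete example of one of those mundane failure modes, a fixed-size chunker (hypothetical settings below) can split the sentence that answers a question across two chunks, so neither chunk’s embedding ends up close to the query:

```python
# Illustration of a common, non-geometric failure mode: naive fixed-size chunking
# can cut the answering sentence in half, leaving no single chunk that matches the query.
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

doc = ("Password resets are handled in the account portal. "
       "For company email specifically, choose 'Email' then 'Reset password'.")
for i, c in enumerate(chunk(doc, size=60, overlap=10)):
    print(i, repr(c))   # note how the email-reset instruction is split across chunks
```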
This research is important because it reminds us that embedding-based retrieval isn’t magic; it’s an approximation.
It works beautifully most of the time, but not for every possible structure of text relationships.
It also highlights why alternative retrieval designs (multi-vector models, sparse methods such as BM25, and rerankers among them) are worth exploring alongside single-vector embeddings.
AI progress often comes from practical experiments, from what works and what doesn’t, rather than from theoretical proofs. Even so, studies like this one from DeepMind are essential. They help us understand the boundaries of the tools we rely on and remind us where creativity in model design still matters.
For most builders, the message isn’t to abandon embeddings; it’s to know when they might need help.
And for researchers, it’s another step towards grounding our everyday tools in solid theory.
Reference
Weller, Orion, et al. “On the theoretical limitations of embedding-based retrieval.” arXiv preprint arXiv:2508.21038 (2025).