When LLMs Aren’t Enough: The Subset Sum Problem

Large language models like ChatGPT are incredibly versatile. But not every problem is suited to them, especially when the computation gets hard. This article looks at where LLMs fall short, using a deceptively simple example: the Subset Sum Problem.
Imagine you have a group of T people that you need to allocate across N houses. Each house has a different capacity, and your goal is to pick the right combination of houses to fit everyone perfectly.
Mathematically, this is the subset sum problem: given the house capacities c1, …, cN, find a subset of them that adds up exactly to a target T (here, the total number of people).
Sounds easy? It’s not.
Subset sum is what computer scientists call NP-complete, meaning no efficient (polynomial-time) algorithm for it is known. In the worst case, the only guaranteed approach is to check possible combinations, and with N elements there are 2^N of them: the count grows exponentially with the size of your list.
Even with just 50 or 60 elements, brute force becomes hopeless: 2^60 is roughly 1.15 × 10^18 subsets, and at a billion checks per second that works out to over 30 years on a normal computer.
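To make the brute force concrete, here is a minimal sketch in Python; the function name and the toy instance are mine, for illustration only:

```python
from itertools import combinations

def subset_sum_bruteforce(capacities, target):
    # Try every combination of houses until one fits everyone exactly.
    for r in range(1, len(capacities) + 1):
        for combo in combinations(range(len(capacities)), r):
            if sum(capacities[i] for i in combo) == target:
                return combo  # indices of the chosen houses
    return None  # no exact fit exists

# Toy instance: 27 people, five houses.
print(subset_sum_bruteforce([4, 7, 9, 11, 14], 27))  # (0, 2, 4): 4 + 9 + 14 = 27
```

The inner loop visits up to 2^N combinations, which is exactly the exponential blow-up described above.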
NP-hard problems show up everywhere, from the travelling salesman problem (TSP) to routing and resource allocation. These are real-world challenges that can’t easily be cracked by pattern recognition alone.
And that’s where LLMs hit their limits.
In a recent internal experiment, we compared three approaches to solving a version of this problem: prompting an LLM directly, running a constraint solver, and using dynamic programming.
The results were clear: the LLM hit its limits as the problem size grew.
Meanwhile, the solver and dynamic programming methods handled large cases within seconds, remained stable, and didn’t require GPUs or API calls.
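The dynamic programming idea is worth a sketch: build up the set of reachable totals one house at a time, in roughly O(N × T) time rather than O(2^N). The code below is a minimal illustration under that idea, not the experiment’s actual implementation; the names and toy instance are assumptions:

```python
def subset_sum_dp(capacities, target):
    # reachable maps each achievable total to the list of house
    # indices that produces it; start with the empty allocation.
    reachable = {0: []}
    for i, c in enumerate(capacities):
        # Iterate over a snapshot so each house is used at most once.
        for total, chosen in list(reachable.items()):
            new_total = total + c
            if new_total <= target and new_total not in reachable:
                reachable[new_total] = chosen + [i]
    return reachable.get(target)  # None if no exact fit exists

print(subset_sum_dp([4, 7, 9, 11, 14], 27))  # [1, 2, 3]: 7 + 9 + 11 = 27
```

This runs in pseudo-polynomial time: fast whenever the target T is modest, which is exactly the house-allocation setting above.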
The solver even provided partial solutions when interrupted, something LLMs can’t yet do.
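The article doesn’t name the solver, so treat the following as one plausible setup: a sketch using Google’s OR-Tools CP-SAT. Maximizing the number of people housed under the capacity budget means that even when the time limit cuts the search short, the solver reports the best allocation found so far:

```python
# pip install ortools
from ortools.sat.python import cp_model

def subset_sum_cpsat(capacities, target, time_limit_s=10.0):
    model = cp_model.CpModel()
    # One boolean decision per house: include it or not.
    use = [model.NewBoolVar(f"use_{i}") for i in range(len(capacities))]
    housed = sum(c * u for c, u in zip(capacities, use))
    model.Add(housed <= target)  # never overshoot the target
    model.Maximize(housed)       # reaches the target exactly when a fit exists

    solver = cp_model.CpSolver()
    solver.parameters.max_time_in_seconds = time_limit_s  # interruptible
    status = solver.Solve(model)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [i for i, u in enumerate(use) if solver.Value(u)]
    return None

print(subset_sum_cpsat([4, 7, 9, 11, 14], 27))  # e.g. [0, 2, 4], an exact fit
```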
LLMs are powerful reasoning engines, but they’re not all-purpose problem solvers.
When it comes to hard computational problems, especially those with exponential complexity, traditional algorithms and well-designed heuristics still win.
Sometimes, the best AI solution isn’t about scaling up the model; it’s about picking the right tool for the job.