When LLMs Aren’t Enough: The Subset Sum Problem

Large language models like ChatGPT are incredibly versatile. But not every problem is suited to them, especially when the computation gets hard. This article looks at where LLMs fall short, using a deceptively simple example: the Subset Sum Problem.
Imagine you have a group of T people that you need to allocate across N houses. Each house has a different capacity, and your goal is to pick the right combination of houses to fit everyone perfectly.
Mathematically, this is the subset sum problem: given the house capacities c1, …, cN, find a subset of them that adds up exactly to a target T (here, the total number of people).
Sounds easy? It’s not.
Subset sum is what computer scientists call NP-complete, meaning no efficient (polynomial-time) algorithm for it is known. In the worst case, the only guaranteed approach is to check possible combinations, and with N elements there are 2^N of them: the count grows exponentially with the size of your list.
Even with just 50 or 60 elements, brute force becomes hopeless: 2^60 is roughly 1.15 × 10^18 subsets, and at a billion checks per second that works out to over 30 years on a normal computer.
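To make the brute force concrete, here is a minimal sketch in Python; the function name and the toy instance are mine, for illustration only:

```python
from itertools import combinations

def subset_sum_bruteforce(capacities, target):
    # Try every combination of houses until one fits everyone exactly.
    for r in range(1, len(capacities) + 1):
        for combo in combinations(range(len(capacities)), r):
            if sum(capacities[i] for i in combo) == target:
                return combo  # indices of the chosen houses
    return None  # no exact fit exists

# Toy instance: 27 people, five houses.
print(subset_sum_bruteforce([4, 7, 9, 11, 14], 27))  # (0, 2, 4): 4 + 9 + 14 = 27
```

The inner loop visits up to 2^N combinations, which is exactly the exponential blow-up described above.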
NP-hard problems show up everywhere, from the travelling salesman problem (TSP) to routing and resource allocation. These are real-world challenges that can’t easily be cracked by pattern recognition alone.
And that’s where LLMs hit their limits.
In a recent internal experiment, we compared three approaches to solving a version of this problem: prompting an LLM directly, running a constraint solver, and using dynamic programming.
The results were clear: the LLM hit its limits as the problem size grew.
Meanwhile, the solver and dynamic programming methods handled large cases within seconds, remained stable, and didn’t require GPUs or API calls.
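The dynamic programming idea is worth a sketch: build up the set of reachable totals one house at a time, in roughly O(N × T) time rather than O(2^N). The code below is a minimal illustration under that idea, not the experiment’s actual implementation; the names and toy instance are assumptions:

```python
def subset_sum_dp(capacities, target):
    # reachable maps each achievable total to the list of house
    # indices that produces it; start with the empty allocation.
    reachable = {0: []}
    for i, c in enumerate(capacities):
        # Iterate over a snapshot so each house is used at most once.
        for total, chosen in list(reachable.items()):
            new_total = total + c
            if new_total <= target and new_total not in reachable:
                reachable[new_total] = chosen + [i]
    return reachable.get(target)  # None if no exact fit exists

print(subset_sum_dp([4, 7, 9, 11, 14], 27))  # [1, 2, 3]: 7 + 9 + 11 = 27
```

This runs in pseudo-polynomial time: fast whenever the target T is modest, which is exactly the house-allocation setting above.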
The solver even provided partial solutions when interrupted, something LLMs can’t yet do.
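The article doesn’t name the solver, so treat the following as one plausible setup: a sketch using Google’s OR-Tools CP-SAT. Maximizing the number of people housed under the capacity budget means that even when the time limit cuts the search short, the solver reports the best allocation found so far:

```python
# pip install ortools
from ortools.sat.python import cp_model

def subset_sum_cpsat(capacities, target, time_limit_s=10.0):
    model = cp_model.CpModel()
    # One boolean decision per house: include it or not.
    use = [model.NewBoolVar(f"use_{i}") for i in range(len(capacities))]
    housed = sum(c * u for c, u in zip(capacities, use))
    model.Add(housed <= target)  # never overshoot the target
    model.Maximize(housed)       # reaches the target exactly when a fit exists

    solver = cp_model.CpSolver()
    solver.parameters.max_time_in_seconds = time_limit_s  # interruptible
    status = solver.Solve(model)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [i for i, u in enumerate(use) if solver.Value(u)]
    return None

print(subset_sum_cpsat([4, 7, 9, 11, 14], 27))  # e.g. [0, 2, 4], an exact fit
```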
LLMs are powerful reasoning engines, but they’re not all-purpose problem solvers.
When it comes to hard computational problems, especially those with exponential complexity, traditional algorithms and well-designed heuristics still win.
Sometimes, the best AI solution isn’t about scaling up the model; it’s about picking the right tool for the job.