
How LLMs Are Changing Databases

Research
George Marmaras
Junior ML Engineer

Databases sit at the core of almost every modern system, yet interacting with them has barely changed in decades. We still write complex SQL, tune performance by hand and leave vast amounts of data unused simply because accessing it is too difficult.

In a recent Passion Academy, Machine Learning Engineer George Marmaras explored how large language models (LLMs) are beginning to change this picture: not by replacing databases, but by reshaping how we interact with and optimise them.


What a DBMS Actually Does (and Why It’s So Complex)

At its core, a Database Management System (DBMS) is software that stores data, indexes it, queries it efficiently and enforces consistency. Popular examples include relational systems like PostgreSQL and MySQL, as well as NoSQL systems like MongoDB and Firebase.

Under the hood, however, a DBMS is anything but simple. A single SQL query flows through a long pipeline: parsing, semantic analysis, access control, cardinality estimation, cost modelling, plan selection, execution, memory management and recovery. Each step relies on carefully engineered heuristics and rules built up over decades.
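
Part of this pipeline is visible from the outside. In PostgreSQL, for instance, EXPLAIN runs a query through parsing, analysis, cardinality estimation and plan selection without executing it. A minimal sketch using the psycopg2 driver (the database name and query are placeholder examples):

    # Ask PostgreSQL which plan its optimiser chose for a query.
    # Placeholder connection details: adjust the DSN to your setup.
    import psycopg2

    conn = psycopg2.connect("dbname=shop user=postgres")
    with conn.cursor() as cur:
        # EXPLAIN exercises the planner (parsing, analysis, cardinality
        # estimation, cost-based plan selection) but skips execution.
        cur.execute("EXPLAIN SELECT * FROM orders WHERE total > 100")
        for (line,) in cur.fetchall():
            print(line)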

This complexity is precisely where LLMs start to become interesting.

Why Integrate LLMs with Databases?

From a user perspective, the motivation is straightforward: databases are powerful but hard to use.

Many business users know what they want to ask but not how to express it in SQL. As a result, more than 50% of enterprise data reportedly goes unused, simply because accessing it is too complex.

LLMs promise a more natural interface:

  • Asking questions in plain English instead of writing SQL
  • Exploring data without deep schema knowledge
  • Lowering the barrier between questions and answers

But usability is only part of the story.

From a system perspective, LLMs can also help databases:

  • Predict query costs more accurately
  • Adapt optimisations to real workloads
  • Automate tuning and configuration
  • Move beyond static, rule-based optimisers toward learned behaviour

In short: instead of following fixed rules, databases can begin to learn from experience.
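
To make "learn from experience" concrete, here is a toy sketch of the idea behind learned cost models: featurise past queries, record their observed runtimes and fit a regressor on the pairs. The features and numbers below are invented for illustration; real systems learn from far richer plan representations:

    # Toy learned cost model: predict runtime from simple query features.
    # Features and training data are invented for illustration.
    from sklearn.ensemble import GradientBoostingRegressor

    # Each row: (join count, predicate count, estimated input rows)
    past_queries = [
        (0, 1, 1_000),
        (1, 2, 50_000),
        (2, 3, 200_000),
        (3, 5, 1_000_000),
    ]
    observed_runtime_ms = [2.0, 40.0, 350.0, 4200.0]

    model = GradientBoostingRegressor().fit(past_queries, observed_runtime_ms)

    # Estimate the cost of a new query: 2 joins, 4 predicates, 500k rows.
    print(model.predict([(2, 4, 500_000)]))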

Two Ways LLMs Meet Databases

The presentation outlines two fundamentally different integration strategies.

1. LLM as an External Plug-in

In this setup, the LLM sits outside the database as middleware between the user and the DBMS.

The LLM can:

  • Inspect schemas, statistics and query plans (read-only)
  • Generate or rewrite SQL from natural language
  • Estimate how expensive a query might be

The database remains fully in control of execution and correctness. The primary use case today is text-to-SQL.
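
A minimal text-to-SQL plug-in can be sketched in a few lines. The call_llm function below is a stand-in for whichever model API you use, and the schema is a made-up example; the important property is that the LLM only reads schema information and proposes SQL, while the DBMS keeps control of execution:

    # Sketch of an external text-to-SQL plug-in. call_llm is a stand-in
    # for a real model API; the schema is a made-up example.
    SCHEMA = """
    orders(id INT, customer_id INT, total NUMERIC, created_at DATE)
    customers(id INT, name TEXT, country TEXT)
    """

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM provider here")

    def question_to_sql(question: str) -> str:
        prompt = (
            "Given this schema:\n" + SCHEMA
            + "\nWrite one PostgreSQL query answering: " + question
            + "\nReturn only SQL."
        )
        return call_llm(prompt)

    # The DBMS, not the LLM, executes the result and enforces correctness:
    # cur.execute(question_to_sql("Total revenue per country last month?"))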

Research systems use techniques such as:

  • Query decomposition into smaller steps
  • Multi-agent collaboration
  • Self-correction loops (sketched below)
  • Smart example retrieval from past queries

Despite rapid progress, accuracy on realistic enterprise workloads remains limited. Even state-of-the-art systems struggle with complex schemas and business logic, highlighting a large gap between lab demos and production readiness.
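
The self-correction loop from the list above is worth sketching, because it shows how the plug-in can use the database itself as a checker: validate the generated SQL with EXPLAIN, and if the database rejects it, feed the error message back to the model and retry. Again, call_llm stands in for a real model API:

    # Sketch of a self-correction loop: validate generated SQL with EXPLAIN
    # and feed any database error back to the model. call_llm is a stand-in.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM provider here")

    def sql_with_retries(question: str, cursor, max_attempts: int = 3) -> str:
        prompt = f"Write one PostgreSQL query answering: {question}\nReturn only SQL."
        for _ in range(max_attempts):
            sql = call_llm(prompt)
            try:
                # EXPLAIN checks the query against the real schema
                # without executing it.
                cursor.execute("EXPLAIN " + sql)
                return sql
            except Exception as error:
                cursor.connection.rollback()  # clear the aborted transaction
                prompt += f"\nThis attempt failed:\n{sql}\nError: {error}\nFix it."
        raise RuntimeError("could not produce valid SQL")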

2. LLM Embedded in the DBMS

The more radical approach places the LLM inside the database engine itself.

Here, the LLM becomes part of the execution stack and can:

  • Replace or augment rule-based optimisers
  • Predict cardinalities and query costs from past workloads
  • Enable learned indexes for faster lookups
  • Automatically tune database configuration parameters

In this model, the database doesn’t just execute queries; it adapts over time.
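
The learned-index idea is the easiest of these to see in miniature: instead of walking a B-tree, fit a model that maps a key to its approximate position in a sorted array, then correct the guess within the model’s worst-case error. A deliberately tiny sketch with one linear model and synthetic keys (real learned indexes use hierarchies of small models):

    # Toy learned index: a linear model predicts a key's position in a
    # sorted array; a bounded binary search fixes up the prediction.
    import bisect
    import numpy as np

    keys = np.sort(np.random.randint(0, 1_000_000, size=10_000))
    positions = np.arange(len(keys))

    # Fit position ~ slope * key + intercept and record the worst error.
    slope, intercept = np.polyfit(keys, positions, deg=1)
    max_error = int(np.max(np.abs(slope * keys + intercept - positions))) + 1

    def lookup(key: int) -> int:
        guess = int(slope * key + intercept)
        lo = max(0, guess - max_error)
        hi = min(len(keys), guess + max_error + 1)
        # Search only within the model's error bound, not the whole array.
        return lo + bisect.bisect_left(keys[lo:hi].tolist(), key)

    i = lookup(int(keys[1234]))
    assert keys[i] == keys[1234]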

Research systems already demonstrate significant gains:

  • Learned indexes replacing B-trees with faster lookups and less memory
  • Bao (MIT) steering the PostgreSQL query optimiser with learned hints
  • GPTuner automatically finding optimal DB configurations far faster than manual tuning
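
Knob tuning of the kind GPTuner automates can be pictured as a simple search loop, even though real tuners explore the space far more cleverly. In the sketch below, apply_config and run_benchmark are hypothetical placeholders for setting DBMS parameters and measuring workload latency:

    # Bare-bones configuration tuning loop; real tuners search smarter.
    # apply_config and run_benchmark are hypothetical placeholders.
    import itertools
    import random

    def apply_config(config: dict) -> None:
        pass  # would issue e.g. ALTER SYSTEM SET ... against the DBMS

    def run_benchmark() -> float:
        return random.uniform(10.0, 100.0)  # would replay a real workload

    search_space = {
        "shared_buffers_mb": [256, 1024, 4096],
        "work_mem_mb": [4, 64, 256],
    }

    best_latency, best_config = float("inf"), None
    for values in itertools.product(*search_space.values()):
        config = dict(zip(search_space, values))
        apply_config(config)
        latency = run_benchmark()
        if latency < best_latency:
            best_latency, best_config = latency, config

    print(best_config, best_latency)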

The trade-off is complexity: embedding LLMs requires deep changes to database internals and is primarily pursued by database vendors and infrastructure teams.

Plug-in vs Embedded: A Practical Comparison

The presentation compared the two approaches across architectural and business dimensions.

  • External plug-ins offer fast time-to-market, low engineering cost and high portability. However, they provide weaker guarantees around correctness and scalability.
  • Embedded LLMs provide stronger control, performance and governance. However, they come with higher integration cost, slower rollout and greater vendor lock-in.

In practice, this leads to a clear pattern:

  • Start with external LLMs for exploration and natural-language interfaces
  • Move toward embedded LLMs when performance, scale and reliability matter most

What This Means Going Forward

LLMs won’t replace databases, but they are changing what databases can be.

In the near term, we’ll see:

  • More natural ways to query data
  • Human-in-the-loop systems for enterprise analytics
  • Smarter automation of operational tasks

In the longer term, the most interesting shift is philosophical: databases evolving from static systems into learning systems that adapt to their workloads over time.

As with many advances in AI infrastructure, the real challenge isn’t the model… it’s integration, correctness and knowing where learning helps and where hard guarantees still matter most.
