
How LLMs Are Changing Databases

Research
George Marmaras
Junior ML Engineer

Databases sit at the core of almost every modern system, yet interacting with them has barely changed in decades. We still write complex SQL, tune performance by hand and leave vast amounts of data unused simply because accessing it is too difficult.

In a recent Passion Academy, Machine Learning Engineer George Marmaras explored how large language models (LLMs) are beginning to change this picture: not by replacing databases, but by reshaping how we interact with and optimise them.


What a DBMS Actually Does (and Why It’s So Complex)

At its core, a Database Management System (DBMS) is software that stores data, indexes it, queries it efficiently and enforces consistency. Popular examples include relational systems like PostgreSQL and MySQL, as well as NoSQL systems like MongoDB and Firebase.

Under the hood, however, a DBMS is anything but simple. A single SQL query flows through a long pipeline: parsing, semantic analysis, access control, cardinality estimation, cost modelling, plan selection, execution, memory management and recovery. Each step relies on carefully engineered heuristics and rules built up over decades.
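
Part of this pipeline is visible from the outside. In PostgreSQL, for instance, EXPLAIN runs a query through parsing, analysis, cardinality estimation and plan selection without executing it. A minimal sketch using the psycopg2 driver (the database name and query are placeholder examples):

    # Ask PostgreSQL which plan its optimiser chose for a query.
    # Placeholder connection details: adjust the DSN to your setup.
    import psycopg2

    conn = psycopg2.connect("dbname=shop user=postgres")
    with conn.cursor() as cur:
        # EXPLAIN exercises the planner (parsing, analysis, cardinality
        # estimation, cost-based plan selection) but skips execution.
        cur.execute("EXPLAIN SELECT * FROM orders WHERE total > 100")
        for (line,) in cur.fetchall():
            print(line)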

This complexity is precisely where LLMs start to become interesting.

Why Integrate LLMs with Databases?

From a user perspective, the motivation is straightforward: databases are powerful but hard to use.

Many business users know what they want to ask but not how to express it in SQL. As a result, more than 50% of enterprise data reportedly goes unused, simply because accessing it is too complex.

LLMs promise a more natural interface:

  • Asking questions in plain English instead of writing SQL
  • Exploring data without deep schema knowledge
  • Lowering the barrier between questions and answers

But usability is only part of the story.

From a system perspective, LLMs can also help databases:

  • Predict query costs more accurately
  • Adapt optimisations to real workloads
  • Automate tuning and configuration
  • Move beyond static, rule-based optimisers toward learned behaviour

In short: instead of following fixed rules, databases can begin to learn from experience.
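
To make "learn from experience" concrete, here is a toy sketch of the idea behind learned cost models: featurise past queries, record their observed runtimes and fit a regressor on the pairs. The features and numbers below are invented for illustration; real systems learn from far richer plan representations:

    # Toy learned cost model: predict runtime from simple query features.
    # Features and training data are invented for illustration.
    from sklearn.ensemble import GradientBoostingRegressor

    # Each row: (join count, predicate count, estimated input rows)
    past_queries = [
        (0, 1, 1_000),
        (1, 2, 50_000),
        (2, 3, 200_000),
        (3, 5, 1_000_000),
    ]
    observed_runtime_ms = [2.0, 40.0, 350.0, 4200.0]

    model = GradientBoostingRegressor().fit(past_queries, observed_runtime_ms)

    # Estimate the cost of a new query: 2 joins, 4 predicates, 500k rows.
    print(model.predict([(2, 4, 500_000)]))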

Two Ways LLMs Meet Databases

The presentation outlines two fundamentally different integration strategies.

1. LLM as an External Plug-in

In this setup, the LLM sits outside the database as middleware between the user and the DBMS.

The LLM can:

  • Inspect schemas, statistics and query plans (read-only)
  • Generate or rewrite SQL from natural language
  • Estimate how expensive a query might be

The database remains fully in control of execution and correctness. The primary use case today is text-to-SQL.
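
A minimal text-to-SQL plug-in can be sketched in a few lines. The call_llm function below is a stand-in for whichever model API you use, and the schema is a made-up example; the important property is that the LLM only reads schema information and proposes SQL, while the DBMS keeps control of execution:

    # Sketch of an external text-to-SQL plug-in. call_llm is a stand-in
    # for a real model API; the schema is a made-up example.
    SCHEMA = """
    orders(id INT, customer_id INT, total NUMERIC, created_at DATE)
    customers(id INT, name TEXT, country TEXT)
    """

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM provider here")

    def question_to_sql(question: str) -> str:
        prompt = (
            "Given this schema:\n" + SCHEMA
            + "\nWrite one PostgreSQL query answering: " + question
            + "\nReturn only SQL."
        )
        return call_llm(prompt)

    # The DBMS, not the LLM, executes the result and enforces correctness:
    # cur.execute(question_to_sql("Total revenue per country last month?"))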

Research systems use techniques such as:

  • Query decomposition into smaller steps
  • Multi-agent collaboration
  • Self-correction loops (sketched below)
  • Smart example retrieval from past queries

Despite rapid progress, accuracy on realistic enterprise workloads remains limited. Even state-of-the-art systems struggle with complex schemas and business logic, highlighting a large gap between lab demos and production readiness.
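
The self-correction loop from the list above is worth sketching, because it shows how the plug-in can use the database itself as a checker: validate the generated SQL with EXPLAIN, and if the database rejects it, feed the error message back to the model and retry. Again, call_llm stands in for a real model API:

    # Sketch of a self-correction loop: validate generated SQL with EXPLAIN
    # and feed any database error back to the model. call_llm is a stand-in.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your LLM provider here")

    def sql_with_retries(question: str, cursor, max_attempts: int = 3) -> str:
        prompt = f"Write one PostgreSQL query answering: {question}\nReturn only SQL."
        for _ in range(max_attempts):
            sql = call_llm(prompt)
            try:
                # EXPLAIN checks the query against the real schema
                # without executing it.
                cursor.execute("EXPLAIN " + sql)
                return sql
            except Exception as error:
                cursor.connection.rollback()  # clear the aborted transaction
                prompt += f"\nThis attempt failed:\n{sql}\nError: {error}\nFix it."
        raise RuntimeError("could not produce valid SQL")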

2. LLM Embedded in the DBMS

The more radical approach places the LLM inside the database engine itself.

Here, the LLM becomes part of the execution stack and can:

  • Replace or augment rule-based optimisers
  • Predict cardinalities and query costs from past workloads
  • Enable learned indexes for faster lookups
  • Automatically tune database configuration parameters

In this model, the database doesn’t just execute queries; it adapts over time.
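
The learned-index idea is the easiest of these to see in miniature: instead of walking a B-tree, fit a model that maps a key to its approximate position in a sorted array, then correct the guess within the model’s worst-case error. A deliberately tiny sketch with one linear model and synthetic keys (real learned indexes use hierarchies of small models):

    # Toy learned index: a linear model predicts a key's position in a
    # sorted array; a bounded binary search fixes up the prediction.
    import bisect
    import numpy as np

    keys = np.sort(np.random.randint(0, 1_000_000, size=10_000))
    positions = np.arange(len(keys))

    # Fit position ~ slope * key + intercept and record the worst error.
    slope, intercept = np.polyfit(keys, positions, deg=1)
    max_error = int(np.max(np.abs(slope * keys + intercept - positions))) + 1

    def lookup(key: int) -> int:
        guess = int(slope * key + intercept)
        lo = max(0, guess - max_error)
        hi = min(len(keys), guess + max_error + 1)
        # Search only within the model's error bound, not the whole array.
        return lo + bisect.bisect_left(keys[lo:hi].tolist(), key)

    i = lookup(int(keys[1234]))
    assert keys[i] == keys[1234]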

Research systems already demonstrate significant gains:

  • Learned indexes replacing B-trees with faster lookups and less memory
  • Bao (MIT) steering the PostgreSQL query optimiser with learned hints
  • GPTuner automatically finding optimal DB configurations far faster than manual tuning
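
Knob tuning of the kind GPTuner automates can be pictured as a simple search loop, even though real tuners explore the space far more cleverly. In the sketch below, apply_config and run_benchmark are hypothetical placeholders for setting DBMS parameters and measuring workload latency:

    # Bare-bones configuration tuning loop; real tuners search smarter.
    # apply_config and run_benchmark are hypothetical placeholders.
    import itertools
    import random

    def apply_config(config: dict) -> None:
        pass  # would issue e.g. ALTER SYSTEM SET ... against the DBMS

    def run_benchmark() -> float:
        return random.uniform(10.0, 100.0)  # would replay a real workload

    search_space = {
        "shared_buffers_mb": [256, 1024, 4096],
        "work_mem_mb": [4, 64, 256],
    }

    best_latency, best_config = float("inf"), None
    for values in itertools.product(*search_space.values()):
        config = dict(zip(search_space, values))
        apply_config(config)
        latency = run_benchmark()
        if latency < best_latency:
            best_latency, best_config = latency, config

    print(best_config, best_latency)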

The trade-off is complexity: embedding LLMs requires deep changes to database internals and is primarily pursued by database vendors and infrastructure teams.

Plug-in vs Embedded: A Practical Comparison

The presentation compared the two approaches across architectural and business dimensions.

  • External plug-ins offer fast time-to-market, low engineering cost and high portability. However, they provide weaker guarantees around correctness and scalability.
  • Embedded LLMs provide stronger control, performance and governance. However, they come with higher integration cost, slower rollout and greater vendor lock-in.

In practice, this leads to a clear pattern:

  • Start with external LLMs for exploration and natural-language interfaces
  • Move toward embedded LLMs when performance, scale and reliability matter most

What This Means Going Forward

LLMs won’t replace databases, but they are changing what databases can be.

In the near term, we’ll see:

  • More natural ways to query data
  • Human-in-the-loop systems for enterprise analytics
  • Smarter automation of operational tasks

In the longer term, the most interesting shift is philosophical: databases evolving from static systems into learning systems that adapt to their workloads over time.

As with many advances in AI infrastructure, the real challenge isn’t the model… it’s integration, correctness and knowing where learning helps and where hard guarantees still matter most.
