We Built a Text-to-SQL Solution in Two Hours. Here’s What That Actually Taught Us.


By Willo van der Merwe, Technical Competency Lead

Recently we ran an internal session where we explored a fairly simple-sounding problem: how do you let someone query a relational database using nothing but natural language? In other words, instead of writing SQL, a user could ask something like: “Tell me how many hours were spent working on Project X last month.” The system would interpret the question, generate the appropriate SQL, execute it, and return the answer. That’s Text-to-SQL in a nutshell.

Key Takeaways

  • Text-to-SQL prototypes are easy; production-ready systems are not.
  • The real challenge is ambiguity in business language, not SQL generation.
  • LLMs don’t invent data; they reflect the clarity (or confusion) in your model.
  • Weak data modelling is exposed immediately.
  • Guardrails, governance, and architecture discipline are essential.
  • When the fundamentals are strong, the payoff is significant.

Some key terms, before we start

We use a few technical terms in this article. For clarity, and in keeping with our ‘no tech jargon here’ approach, here are the definitions you’ll need.

  • SQL is simply the language databases understand. It’s how you ask a database a clear, structured question so it can give you the exact information you’re looking for.
  • MCP (Model Context Protocol) is the set of rules that lets an AI system safely connect to other tools, like a database. It makes sure the AI knows what it’s allowed to access and how to use it properly.
  • A repo, short for repository, is just the main project folder where all the code for a piece of software lives. It keeps everything organised and tracks changes as the system evolves.
  • An API is a way for different software systems to talk to each other. It’s like a well-defined handshake that lets one system ask another for information or to perform an action.
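To make the SQL definition concrete, here is a small sketch using Python’s built-in sqlite3 module. The timesheet table and its data are invented for illustration; the query is the structured form of the plain-language question from the introduction, with “last month” pinned to May 2024 for the example.

```python
import sqlite3

# Hypothetical timesheet table, created in-memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timesheet (project TEXT, hours REAL, logged_on TEXT)")
conn.executemany(
    "INSERT INTO timesheet VALUES (?, ?, ?)",
    [("Project X", 6.5, "2024-05-02"),
     ("Project X", 8.0, "2024-05-10"),
     ("Project Y", 4.0, "2024-05-03")],
)

# The SQL equivalent of "How many hours were logged on Project X last month?"
total = conn.execute(
    "SELECT SUM(hours) FROM timesheet "
    "WHERE project = 'Project X' AND logged_on LIKE '2024-05-%'"
).fetchone()[0]
print(total)  # 14.5
```

The question is precise because the query is precise: the project, the time window, and the aggregation are all spelled out.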

 

Why Text-to-SQL is Not Just a Demo Use Case

Demand for real-time management information

For many of our clients, this is not a novelty use case. There is a constant demand for relevant management information drawn directly from operational data. Leadership teams want timely, reliable answers without waiting for a reporting cycle or a developer to write a query.

 

The pressure is always the same: reduce the lag between a business question and a defensible answer. Text-to-SQL sits squarely in that gap, which is why it’s such a topical subject.

 

How We Built a Working Text-to-SQL Prototype in Two Hours

We decided to build a working version of that using MCP: give ourselves two hours and see what happens. No slide decks. No big promises. Just a database, a scaffolded repo, and a mix of developers and business consultants in the room.

I wasn’t interested in whether it could work. We all know it can. What I wanted to understand was what starts to break once you move past the demo.

 

The first surprise was how quickly we could get something running. With modern LLM tooling and a clean integration pattern, you can wire up natural language to SQL and have results coming back faster than you’d expect. From a distance, it looks impressive. Someone types a question, SQL appears, the database responds. It feels like progress.
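The wiring really is that thin. Here is a minimal sketch of the pattern, where `call_llm` is a placeholder standing in for a real model API call, and the staff table is invented for the demo:

```python
import sqlite3

def call_llm(question: str, schema: str) -> str:
    """Placeholder for a real LLM call; returns canned SQL for this demo.
    In practice this would send the schema and question to a model API."""
    return "SELECT name, allocation FROM staff WHERE allocation < 0.8"

def answer(question: str, conn: sqlite3.Connection) -> list:
    # 1. Describe the schema to the model.
    schema = "\n".join(
        row[0] for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'")
    )
    # 2. Ask the model to translate intent into SQL.
    sql = call_llm(question, schema)
    # 3. Execute the generated query and return the raw rows.
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, allocation REAL)")
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("Alice", 0.6), ("Bob", 0.95)])
print(answer("Who's under-allocated right now?", conn))  # [('Alice', 0.6)]
```

Three steps, maybe thirty minutes of work, and questions start coming back with answers. Which is exactly why the demo is misleading.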

 

The Illusion of Simplicity in Natural Language Queries

But then we started asking normal business questions. “Who’s under-allocated right now?”
That sounds harmless enough. Until you ask: what does “right now” mean? This week? This sprint? The current month? And what exactly qualifies as “under-allocated”? Less than 60%? Less than 80%? Does planned leave count? Does forecast count?
The model will happily generate SQL for you. The real work is deciding what the question actually means.
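The point is easy to demonstrate: two defensible readings of the same question produce different, equally “correct” answers. The allocations table and the thresholds below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE allocations (name TEXT, allocation REAL, on_leave INTEGER)")
conn.executemany("INSERT INTO allocations VALUES (?, ?, ?)",
                 [("Alice", 0.7, 0), ("Bob", 0.5, 1), ("Carol", 0.9, 0)])

# Reading 1: under-allocated means "below 80%".
r1 = conn.execute(
    "SELECT name FROM allocations WHERE allocation < 0.8 ORDER BY name"
).fetchall()
# Reading 2: "below 60%, and planned leave doesn't count".
r2 = conn.execute(
    "SELECT name FROM allocations WHERE allocation < 0.6 AND on_leave = 0"
).fetchall()
print(r1)  # [('Alice',), ('Bob',)]
print(r2)  # []
```

Both queries are syntactically perfect and both faithfully answer *a* question. Whether either answers *the* question is a business decision the model cannot make for you.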

 

Core Insight: Text-to-SQL Reflects Ambiguity

One important distinction became clear very quickly: the data itself is still coming directly from the database. The LLM is not inventing numbers. Text-to-SQL, when working correctly, simply translates intent into a query and returns what the data contains. The risk lies elsewhere. If the prompt is ambiguous, the query can faithfully execute the wrong interpretation. The system can be technically correct and semantically wrong at the same time.

 

That was the consistent pattern throughout the session. The LLM wasn’t the limiting factor. Our shared understanding of intent was.

 

We also ran straight into the reality of our data model. Nothing dramatic. Just normal enterprise data. A bit denormalised here, a field that only makes sense if you’ve worked with it before, a relationship that isn’t obvious from the table name alone.

Text-to-SQL doesn’t hide those things. It exposes them immediately.

When the schema doesn’t map cleanly to business language, the model has to guess. Sometimes it guesses well. Sometimes it doesn’t. And validating whether it guessed correctly takes longer than writing the SQL yourself would have.

That’s when it becomes obvious that this isn’t primarily an AI problem. It’s a modelling problem. If the data isn’t well-described and semantically aligned with the business, the LLM can’t magically fix that. It will just reflect the ambiguity back to you.
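One low-cost mitigation is schema enrichment: pairing raw DDL with the business meaning that, until now, only lived in people’s heads, before any of it reaches the prompt. The table, column names, and notes below are hypothetical, a sketch of the shape rather than a real schema.

```python
# Business meaning that a bare CREATE TABLE statement cannot convey.
schema_notes = {
    "ts_entry.proj_cd": "Project code; joins to project.code, NOT project.id.",
    "ts_entry.hrs_dec": "Hours as a decimal; already net of breaks.",
    "ts_entry.stat": "Status: 'A' = approved, 'S' = submitted, 'R' = rejected.",
}

def enriched_schema(ddl: str, notes: dict) -> str:
    """Append column annotations to the DDL before it goes into the prompt."""
    lines = [ddl, "-- Column notes:"]
    lines += [f"-- {col}: {note}" for col, note in notes.items()]
    return "\n".join(lines)

prompt_schema = enriched_schema(
    "CREATE TABLE ts_entry (proj_cd TEXT, hrs_dec REAL, stat TEXT)",
    schema_notes,
)
```

Every note you write down is one less guess the model has to make, and one less guess you have to validate afterwards.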

Another interesting moment was the difference between “it works” and “we’d be comfortable running this in production.”

Within two hours, most teams had something that worked in the narrow sense. Queries executed. Results came back. They looked plausible. But would we expose that directly against operational data? Not without guardrails.

Once you move beyond experimentation, the conversation shifts very quickly to scoping, validation, and control. Which tables are accessible? How do we prevent runaway queries? How do we ensure the model doesn’t infer joins that technically exist but make no business sense? How do we explain where an answer came from if someone challenges it? That’s not prompt tweaking. That’s system design.
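As a sketch of what one such control might look like: a stdlib-only validator that enforces read-only queries, an explicit table allowlist, and a row cap. A production system would use a real SQL parser rather than regexes, and names like `ALLOWED_TABLES` are our invention, not something from the session.

```python
import re

ALLOWED_TABLES = {"timesheet", "staff"}  # explicit allowlist, not the whole DB
MAX_ROWS = 500

def validate(sql: str) -> str:
    """Reject anything that isn't a bounded, read-only query on approved tables.
    Regex-based table detection is a sketch; use a real SQL parser in production."""
    stripped = sql.strip().rstrip(";")
    if not re.match(r"(?is)^select\b", stripped):
        raise ValueError("Only SELECT statements are allowed")
    referenced = set(
        re.findall(r"(?i)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped))
    if not referenced <= ALLOWED_TABLES:
        raise ValueError(
            f"Query touches unapproved tables: {referenced - ALLOWED_TABLES}")
    if not re.search(r"(?i)\blimit\s+\d+\b", stripped):
        stripped += f" LIMIT {MAX_ROWS}"  # cap runaway result sets
    return stripped
```

For example, `validate("SELECT * FROM staff")` passes and gains a row cap, while `DROP TABLE staff` or a query against an unlisted table is rejected before it ever reaches the database.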

We also saw that model choice genuinely affects outcomes. Some models adhered more closely to the schema. Some were more disciplined. Some were more “creative.” If you’re responsible for architecture decisions, that matters. It’s not just a cost conversation. It’s about predictability and reliability under real-world ambiguity.

 

How AI Tooling Accelerates Engineering, But Doesn’t Replace It

One thing that did stand out in a positive way was how much AI-assisted development tooling accelerated the build itself. Iteration was faster. Boilerplate was easier. Refactoring felt lighter. But it was also clear that the teams who moved confidently had solid engineering habits to begin with. Good Git hygiene. Comfort with APIs. A clear approach to debugging. The tools amplified competence. They didn’t replace it.

And the two-hour constraint? It did exactly what it was supposed to do. It forced us to focus. It prevented endless abstraction. It made trade-offs visible very quickly.

But it also made something else clear: two hours is enough to understand the shape of the problem. It is not enough to make it robust.

 

If we had kept going, the work would have shifted into refining metadata, tightening definitions, adding validation layers, and hardening guardrails. In other words, the unglamorous work that turns a demo into something dependable.

 

Technical Leadership Takeaways

If you’re a technical leader thinking about Text-to-SQL or similar LLM integrations, the takeaway isn’t that it’s risky or overhyped. It’s that it behaves like every other serious system integration.
  • It rewards clarity.
  • It exposes ambiguity.
  • It depends on data quality.
  • It needs constraints.
The technology is impressive. It really is. But the fundamentals haven’t changed.
  • Clear modelling still matters.
  • Shared definitions still matter.
  • Architecture discipline still matters.
The difference now is that when those things are in place, the payoff can be substantial. When they aren’t, the cracks show faster.
And in a way, that’s the most useful thing the two hours gave us – not a flashy demo, but a clearer understanding of where the real work lives.

 

FAQs

What is Text-to-SQL?

Text-to-SQL is the process of translating natural language questions into structured SQL queries that execute directly against a relational database. Instead of writing SQL manually, a user can ask a question in plain language, such as “How many hours were logged on Project X last month?”, and an LLM generates the corresponding query, executes it, and returns the result. Importantly, the model does not generate new data. It produces SQL that retrieves data that already exists in the database.

How does Text-to-SQL work in practice?

At a high level, Text-to-SQL systems combine:

  • A large language model (LLM)
  • Access to database schema metadata
  • A query execution layer
  • Guardrails and validation controls

The user submits a natural language question. The model interprets intent, generates SQL based on the schema, and the system executes the query against the database. The result is returned to the user. In production environments, additional constraints are required to manage scope, prevent unsafe queries, and validate outputs.

What are the biggest challenges in implementing Text-to-SQL?

The most common challenges are:

  • Ambiguous business terminology
  • Poorly documented schemas
  • Inconsistent data definitions
  • Lack of governance and validation
  • Overexposing operational databases without constraints

In most cases, these are not AI problems. They are data architecture and modelling problems that AI simply makes visible.

How do you make Text-to-SQL reliable?

Reliable implementations typically include:

  • Restricted table and schema access
  • Query validation layers
  • Predefined semantic mappings
  • Metadata enrichment
  • Monitoring and logging
  • Guardrails to prevent expensive or unsafe queries

In short, it requires the same architectural discipline as any other production integration.



Saratoga Software