We Built a Text-to-SQL Solution in Two Hours. Here’s What That Actually Taught Us.
By Willo van der Merwe – Technical Competency Lead. Recently we ran an internal session where we explored a fairly simple-sounding problem: How do you let someone query a relational database using nothing but natural language? In other words, instead of writing SQL, a user could ask something like: “Tell me how many hours were spent working on Project X last month.” The system would interpret the question, generate the appropriate SQL, execute it, and return the answer. That’s Text-to-SQL in a nutshell.
Key Takeaways
- Text-to-SQL prototypes are easy; production-ready systems are not.
- The real challenge is ambiguity in business language, not SQL generation.
- LLMs don’t invent data; they reflect the clarity (or confusion) in your model.
- Weak data modelling is exposed immediately.
- Guardrails, governance, and architecture discipline are essential.
- When the fundamentals are strong, the payoff is significant.
Some key terms, before we start
We throw around a few terms in this article. For added clarity, and in keeping with our ‘no tech jargon here’ approach, here are a few definitions that will be useful. Just so the rest of this article makes more sense.
- SQL is simply the language databases understand. It’s how you ask a database a clear, structured question so it can give you the exact information you’re looking for.
- MCP is the set of rules that lets an AI system safely connect to other tools, like a database. It makes sure the AI knows what it’s allowed to access and how to use it properly.
- A repo, short for repository, is just the main project folder where all the code for a piece of software lives. It keeps everything organised and tracks changes as the system evolves.
- An API is a way for different software systems to talk to each other. It’s like a well-defined handshake that lets one system ask another for information or to perform an action.
Why Text-to-SQL is Not Just a Demo Use Case
Demand for real-time management information
How We Built a Working Text-to-SQL Prototype in Two Hours
I wasn’t interested in whether it could work. We all know it can. What I wanted to understand was what starts to break once you move past the demo.
The first surprise was how quickly we could get something running. With modern LLM tooling and a clean integration pattern, you can wire up natural language to SQL and have results coming back faster than you’d expect. From a distance, it looks impressive. Someone types a question, SQL appears, the database responds. It feels like progress.
The Illusion of Simplicity in Natural Language Queries
Core Insight: Text-to-SQL Reflects Ambiguity
Text-to-SQL doesn’t hide those things. It exposes them immediately.
That’s when it becomes obvious that this isn’t primarily an AI problem. It’s a modelling problem. If the data isn’t well-described and semantically aligned with the business, the LLM can’t magically fix that. It will just reflect the ambiguity back to you.
Another interesting moment was the difference between “it works” and “we’d be comfortable running this in production.”
Within two hours, most teams had something that worked in the narrow sense. Queries executed. Results came back. They looked plausible. But would we expose that directly against operational data? Not without guardrails.
Once you move beyond experimentation, the conversation shifts very quickly to scoping, validation, and control. Which tables are accessible? How do we prevent runaway queries? How do we ensure the model doesn’t infer joins that technically exist but make no business sense? How do we explain where an answer came from if someone challenges it? That’s not prompt tweaking. That’s system design.
We also saw that model choice genuinely affects outcomes. Some models adhered more closely to the schema. Some were more disciplined. Some were more “creative.” If you’re responsible for architecture decisions, that matters. It’s not just a cost conversation. It’s about predictability and reliability under real-world ambiguity.
How AI Tooling Accelerates Engineering, But Doesn’t Replace It
And the two-hour constraint? It did exactly what it was supposed to do. It forced us to focus. It prevented endless abstraction. It made trade-offs visible very quickly.
But it also made something else clear: two hours is enough to understand the shape of the problem. It is not enough to make it robust.
Technical Leadership Takeaways
- It rewards clarity.
- It exposes ambiguity.
- It depends on data quality.
- It needs constraints.
- Clear modelling still matters.
- Shared definitions still matter.
- Architecture discipline still matters.
FAQs
What is Text-to-SQL?
Text-to-SQL is the process of translating natural language questions into structured SQL queries that execute directly against a relational database. Instead of writing SQL manually, a user can ask a question in plain language, such as “How many hours were logged on Project X last month?”, and an LLM generates the corresponding query, executes it, and returns the result. Importantly, the model does not generate new data. It produces SQL that retrieves data that already exists in the database.
How does Text-to-SQL work in practice?
At a high level, Text-to-SQL systems combine:
- A large language model (LLM)
- Access to database schema metadata
- A query execution layer
- Guardrails and validation controls
The user submits a natural language question. The model interprets intent, generates SQL based on the schema, and the system executes the query against the database. The result is returned to the user. In production environments, additional constraints are required to manage scope, prevent unsafe queries, and validate outputs.
What are the biggest challenges in implementing Text-to-SQL?
The most common challenges are:
- Ambiguous business terminology
- Poorly documented schemas
- Inconsistent data definitions
- Lack of governance and validation
- Overexposing operational databases without constraints
In most cases, these are not AI problems. They are data architecture and modelling problems that AI simply makes visible.
How do you make Text-to-SQL reliable?
Reliable implementations typically include:
- Restricted table and schema access
- Query validation layers
- Predefined semantic mappings
- Metadata enrichment
- Monitoring and logging
- Guardrails to prevent expensive or unsafe queries
In short, it requires the same architectural discipline as any other production integration.


