The Reality of Building with LLMs: It's Not Magic, It's Systems
I used to think that building AI features was all about writing the perfect prompt. Just say the right incantation, and the LLM will output exactly what you want, right?
Yeah, no.
After spending the last few months deeply integrating LLMs into real-world workflows, my biggest "aha!" moment has been surprisingly mundane: building with AI is mostly just traditional systems engineering.
The Prompt is the Easy Part
When you first start playing with the OpenAI or Anthropic APIs, you spend hours tweaking prompts. You add "You are a helpful assistant" and "Think step by step." It feels magical.
But when you try to put that into a production app, the magic fades fast.
The LLM will hallucinate. It will ignore your instructions. It will format the JSON wrong 5% of the time. The API will time out. You'll hit rate limits.
Suddenly, your prompt isn't the bottleneck. The wrapper around the prompt is the bottleneck.
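The "wrapper" part starts with something as unglamorous as retry logic. Here's a minimal sketch of a resilient call wrapper with exponential backoff; `call_model` is a hypothetical stand-in for whatever SDK call you actually use, and a generic exception stands in for timeouts and rate-limit errors:

```python
import random
import time

def call_with_retries(call_model, prompt, max_attempts=4, base_delay=1.0):
    """Retry a flaky model call with exponential backoff plus jitter.

    `call_model` is a placeholder for your real SDK call; any exception
    it raises stands in for timeouts and rate-limit responses.
    """
    for attempt in range(max_attempts):
        try:
            return call_model(prompt)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller's fallback handle it
            # Wait base, 2x base, 4x base... plus jitter, then retry.
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
```

In production you'd likely catch specific exception types and respect any `Retry-After` header the API returns, but the shape of the loop is the point.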
It's All About Scaffolding
The actual prompt engineering is maybe 10% of the work. The other 90% is building the scaffolding to make the LLM reliable:
- Validation & Retries: If the LLM returns invalid JSON, you don't throw an error. You catch it, send the error back to the LLM, and tell it to fix it.
- Context Assembly: The LLM is only as smart as the context you give it. Building robust data pipelines to fetch the right user data, format it cleanly, and inject it into the prompt is where the real value lies. (RAG is just a fancy term for this).
- Fallbacks: What happens when the API is down? You need graceful degradation.
- State Management: Tracking the history of a multi-turn conversation or a multi-step agent workflow is pure, boring state management.
Treat LLMs Like Unreliable Interns
The best mental model I've found is to treat the LLM like an incredibly fast but easily distracted intern.
You wouldn't hand an intern a massive, vague task and expect perfection on the first try. You'd break it down into small, isolated steps. You'd give them clear checklists. You'd review their work at every stage.
Building with LLMs is the same. Instead of one massive prompt trying to do everything, chain together smaller, specialized calls. Have one call extract the data, another format it, and a third verify it.
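That extract → format → verify chain can be sketched as a tiny pipeline. The prompts and the `call_model` function here are illustrative assumptions, not a real API:

```python
def run_pipeline(call_model, raw_text):
    """Chain three small, specialized calls instead of one mega-prompt.

    Each call gets one narrow job. `call_model` is a placeholder for
    your SDK call; the prompts are illustrative, not prescriptive.
    """
    # Step 1: extraction only -- no formatting instructions to distract it.
    extracted = call_model(f"Extract the key facts from:\n{raw_text}")
    # Step 2: formatting only, working from the extracted facts.
    formatted = call_model(f"Format these facts as a bullet list:\n{extracted}")
    # Step 3: verification -- review the work before it ships.
    verdict = call_model(
        "Does this list faithfully reflect the source? Answer YES or NO.\n"
        f"Source:\n{raw_text}\nList:\n{formatted}"
    )
    if not verdict.strip().upper().startswith("YES"):
        raise ValueError("verification step rejected the output")
    return formatted
```

Each step is also independently testable and retryable, which is exactly what the checklist-for-interns analogy suggests.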
The magic of AI isn't in the model itself. The magic is in the robust, resilient systems we build around it.