How we built the ZonaMundial AI Coach: RAG architecture and optimized prompts
Real AI case study in production: RAG architecture, Claude API, embeddings and prompts for a fantasy football AI Coach built for the 2026 World Cup.
At SprintMarkt we usually publish theoretical guides about AI. Today is different: we are opening up one of our own in-house products in production. ZonaMundial is a fantasy football platform focused on the 2026 World Cup, fully built, operated and commercialised by SprintMarkt. Inside it lives a flagship module we call AI Coach. This post covers how we built it, what technical decisions we made and what we learned along the way — and why running an in-house product gives us an edge when applying that same muscle on client projects.
The initial challenge
Because this is an in-house product, we set the constraints ourselves: audience mainly in Latin America (Mexico, Colombia, Argentina), live sports data feeds required, and a tight deadline to make it in time for the qualifying rounds. The fourth constraint, self-imposed, was the most ambitious: the AI Coach had to go beyond a generic chatbot and give advice truly grounded in FIFA rules, historical stats and the user's own fantasy squad.
That last bit was the hardest part. A raw LLM hallucinates sports data all the time. Wrong player names, wrong clubs, invented stats. For a fantasy product where users bet reputation and (sometimes) money, that is unacceptable.
Chosen stack
After a couple of days of technical spikes we locked in the stack:
RAG architecture, in plain English
RAG (Retrieval Augmented Generation) is not magic: it just pulls relevant chunks of information before asking the LLM, so the model answers based on real data instead of its statistical memory.
ZonaMundial has three separate corpora indexed in pgvector:
When a user asks the Coach something like 'who should I captain on Saturday?', the flow is:
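To make the idea concrete, here is a minimal sketch of that retrieve-then-ask flow. It uses a toy bag-of-words "embedding" and an in-memory cosine search as stand-ins for the real embedding model and the pgvector query; all names and the sample corpus are illustrative, not the production code.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; production uses a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, corpus: list[str], k: int = 3) -> list[str]:
    # In production this is a pgvector similarity query over the three corpora.
    q = embed(question)
    ranked = sorted(corpus, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    # Retrieved chunks are injected as grounding context before the question,
    # so the model answers from real data instead of its statistical memory.
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

corpus = [
    "Your captain scores double points each gameweek.",
    "Transfers are limited to two per gameweek.",
    "The 2026 World Cup features 48 teams.",
]
chunks = retrieve("who should I captain on Saturday?", corpus, k=2)
prompt = build_prompt("who should I captain on Saturday?", chunks)
```

The real pipeline swaps `embed` for an embedding API call and `retrieve` for a pgvector `ORDER BY embedding <=> query` over the indexed corpora; the shape of the flow is the same.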
The (dark) art of prompting
The AI Coach system prompt went through roughly fifteen iterations before stabilizing. Key takeaways:
Costs: the part nobody talks about
An AI Coach in production is not free. The real cost breakdown:
- Claude API: variable, dominant once the user base grows.
- Embeddings: dirt cheap by comparison, but they add up with weekly corpus re-indexing.
- pgvector: extra CPU and memory on the Postgres box.
- Image generation: charged per image by Stability AI, controlled with a hard monthly budget.
The biggest lever for controlling costs was **semantic caching**: many fantasy questions are variations of the same five or six themes (captain pick, transfer advice, gameweek strategy). We cache answers keyed by the question embedding and return the cached response when similarity clears a high threshold. That meaningfully lowered token spend with no perceptible UX change.
Latency and UX
Even when streamed, a full LLM answer takes several seconds to complete. To avoid the app feeling slow, we fire three things in parallel as soon as the user starts typing: an animated skeleton with messages like 'analyzing lineups', a prefetch of the most likely RAG context based on the page they are on, and token-by-token streaming with progressive markdown rendering. The full answer still takes the same time, but the perception changes radically.
Security and moderation
A fantasy platform with anonymous users attracts trolls. We put three layers in place:
Infrastructure and deploy
Everything goes through Cloudflare as the first hop: aggressive static caching for assets, WAF rules to block scraping bots, and per-IP rate limiting to protect AI endpoints. Application servers live at SWHosting, with SSL delegated to Cloudflare for sponsor subdomains. Deploys go via Git + automated SFTP, with a hook that purges Cloudflare cache only for the routes that changed.
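The selective purge in that hook boils down to building a purge-by-URL request for Cloudflare's `POST /zones/{zone_id}/purge_cache` endpoint. A sketch of the payload-building step, with illustrative route and domain names (the HTTP call itself is shown only as a comment):

```python
import json

def build_purge_payload(changed_routes: list[str], base_url: str) -> dict:
    # Purge only the routes touched by this deploy instead of the whole zone,
    # so unchanged assets keep their cache hit rate.
    return {
        "files": [f"{base_url.rstrip('/')}/{r.lstrip('/')}" for r in changed_routes]
    }

payload = build_purge_payload(["/coach", "/stats/players"], "https://example.com")
body = json.dumps(payload)
# In the deploy hook, this body is POSTed to
#   https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache
# with an "Authorization: Bearer <api_token>" header.
```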
Lessons learned
If we started again today we would do three things differently: add semantic caching from day one instead of bolting it on late, design the RAG corpus around 300-500 token chunks from the start (we tried 1,000 and precision dropped), and run a 5% traffic canary before any prompt change to catch regressions.
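The chunking lesson can be sketched as a simple sliding window with overlap. This version counts words as a rough proxy for the 300-500 token target; production chunking would use the embedding model's own tokenizer, and the parameter values here are illustrative.

```python
def chunk_words(text: str, max_words: int = 400, overlap: int = 50) -> list[str]:
    # Word count approximates the 300-500 token sweet spot; oversized chunks
    # (we tried ~1,000) dilute the embedding and hurt retrieval precision.
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks

doc = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_words(doc, max_words=400, overlap=50)
```

The overlap matters: without it, a rule or stat that straddles a chunk boundary becomes unretrievable from either side.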
Result
The AI Coach has been running in production for months, serving ZonaMundial's LATAM community. It is not perfect, but users actually use it, mention it on social and come back gameweek after gameweek. As an in-house product it sets us apart from other fantasy platforms, and it doubles as a live laboratory for the AI agents we build for our clients.
Want something like this for your business?
If you are considering an AI agent with RAG, a conversational assistant on top of your own content, or a vertical coach for your industry, we do this at SprintMarkt. We always start with a 490€ AI audit to understand if it makes sense, how much it would cost and what ROI to expect. No hype, no 'digital transformation' buzzwords. Drop us a line.