Artificial Intelligence · April 17, 2026 · 16 min read

How we built the ZonaMundial AI Coach: RAG architecture and optimized prompts

A real AI case study in production: RAG architecture, the Claude API, embeddings and prompts for a fantasy football AI Coach built for the 2026 World Cup.

SprintMarkt AI Team

At SprintMarkt we usually publish theoretical guides about AI. Today is different: we are lifting the hood on one of our own in-house products running in production. ZonaMundial is a fantasy football platform focused on the 2026 World Cup, fully built, operated and commercialised by SprintMarkt. Inside it lives a flagship module we call AI Coach. This post covers how we built it, the technical decisions we made, what we learned along the way, and why running an in-house product gives us an edge when we apply the same muscle to client projects.

The initial challenge

Because this is an in-house product, we set the constraints ourselves: audience mainly in Latin America (Mexico, Colombia, Argentina), live sports data feeds required, and a tight deadline to make it in time for the qualifying rounds. The fourth constraint, self-imposed, was the most ambitious: the AI Coach had to go beyond a generic chatbot and give advice truly grounded in FIFA rules, historical stats and the user's own fantasy squad.

That last bit was the hardest part. A raw LLM hallucinates sports data all the time. Wrong player names, wrong clubs, invented stats. For a fantasy product where users bet reputation and (sometimes) money, that is unacceptable.

Chosen stack

After a couple of days of technical spikes we locked in the stack:

- Frontend: React on top of our own PHP base, new components in TypeScript and occasional 3D rendering with Three.js for stadium visualizations.
- Backend: PHP 8.2 as the main business layer, with Node.js microservices for anything touching AI.
- Sports data providers: commercial real-time stats APIs (matches, lineups, events) with a Redis caching layer.
- LLM: Claude Sonnet 4.6 as the Coach's main engine, with a cheaper fallback for short, repetitive answers.
- RAG: pgvector on top of PostgreSQL (we owned the stack from day one, so Postgres was the natural choice).
- Image generation: Stability AI for avatars and weekly Coach newsletter covers.
- Infrastructure: Cloudflare as CDN and WAF on the edge, application servers on SWHosting, and delegated SSL so sponsor subdomains can work.

RAG architecture, in plain English

RAG (Retrieval Augmented Generation) is not magic: it just pulls relevant chunks of information before asking the LLM, so the model answers based on real data instead of its statistical memory.

ZonaMundial has three separate corpora indexed in pgvector:

1. **FIFA rulebook and fantasy terms**: official 2026 rules, the fantasy scoring system, terms and conditions.
2. **Player and team history**: aggregated stats from recent cycles, group vs. knockout performance, recurring injuries.
3. **Dynamic user context**: their current squad, captains, and position in the private league they play in.
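To make the corpora concrete, here is a sketch of what one pgvector-backed table could look like. The 1536 dimension matches `text-embedding-3-small`; the table and column names are invented for illustration, not taken from ZonaMundial's schema.

```typescript
// Illustrative DDL for a pgvector corpus table (names are assumptions).
const createCorpusTable = `
CREATE TABLE IF NOT EXISTS corpus_chunks (
  id        BIGSERIAL PRIMARY KEY,
  corpus    TEXT NOT NULL,          -- 'rules' | 'history' | 'user_context'
  source    TEXT NOT NULL,          -- citation label shown to the user
  text      TEXT NOT NULL,
  embedding VECTOR(1536) NOT NULL   -- text-embedding-3-small output size
);
CREATE INDEX ON corpus_chunks
  USING hnsw (embedding vector_cosine_ops);
`.trim();
```

The HNSW index with `vector_cosine_ops` keeps cosine-similarity retrieval fast as the corpora grow.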

When a user asks the Coach something like 'who should I captain on Saturday?', the flow is:

1. We embed the question with `text-embedding-3-small`.
2. We run retrieval with cosine similarity over the three corpora, top-k = 8 with a minimum threshold.
3. We add live context: probable lineups, match weather, the last five encounters.
4. We build the final prompt for Claude with a fixed `system`, the retrieved context and the user's question.
5. We return the answer with citations so the user can verify it.
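Steps 2 and 4 can be sketched roughly like this. The table name, the 0.25 similarity floor and the helper signature are assumptions for illustration; `<=>` is pgvector's cosine-distance operator, so `1 - (a <=> b)` is the similarity.

```typescript
type Chunk = { text: string; source: string; score: number };

// Step 2: cosine-similarity retrieval, top-k = 8 with a minimum threshold
// (the 0.25 floor and corpus_chunks table are illustrative).
const retrievalSql = `
  SELECT text, source, 1 - (embedding <=> $1) AS score
  FROM corpus_chunks
  WHERE 1 - (embedding <=> $1) > 0.25
  ORDER BY embedding <=> $1
  LIMIT 8`;

// Step 4: assemble the user-facing part of the prompt from retrieved
// chunks, live context and the question; the system prompt stays fixed.
function buildUserPrompt(chunks: Chunk[], liveContext: string, question: string): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source}) ${c.text}`)
    .join("\n");
  return [
    "Retrieved context:",
    context,
    "Live data:",
    liveContext,
    `Question: ${question}`,
    "Answer only from the context above and cite sources as [n].",
  ].join("\n\n");
}
```

Numbering the chunks `[1]`, `[2]`, … is what makes step 5's citations possible: the model cites `[n]` and the UI links it back to the source chunk.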

The (dark) art of prompting

The AI Coach system prompt went through roughly fifteen iterations before stabilizing. Key takeaways:

**Personality without overdoing it**: the Coach sounds like a close Latin American manager, drops mild cultural references and avoids the corporate 'it seems you might consider' tone.
**Hard rules first**: 'Do not invent stats. If the data is not in the retrieved context, say you do not have it.' This rule, repeated three times with different wording, cut stats hallucinations by over 80% in our internal tests.
**Predictable format**: we ask the model to always return a two-line summary, then details, then a short disclaimer. That lets us render the chat with consistent components.
**Regional language**: we detect country by IP (with consent) and tweak vocabulary slightly. It is not translation, it is localization.
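Condensed into one artifact, the kind of system prompt those takeaways produce looks roughly like this. The wording is illustrative, not the production prompt.

```typescript
// A condensed sketch of the Coach's system prompt (wording is assumed).
const systemPrompt = `
You are the ZonaMundial AI Coach, a close, plain-spoken Latin American
fantasy manager.

Hard rules (non-negotiable):
- Do not invent stats. If the data is not in the retrieved context, say
  you do not have it.
- Never state a number that does not appear in the context.
- If the context is empty, answer that you lack the data for this question.

Format, always:
1. A two-line summary.
2. Details, citing sources as [n].
3. A one-line disclaimer.
`.trim();
```

Note how the no-invented-stats rule appears more than once in slightly different wording, which is the repetition trick described above.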


Costs: the part nobody talks about

An AI Coach in production is not free. The real cost breakdown:

- Claude API: variable, dominant once the user base grows.

- Embeddings: dirt cheap by comparison, but they add up with weekly corpus re-indexing.

- pgvector: extra CPU and memory on the Postgres box.

- Image generation: charged per image by Stability AI, controlled with a hard monthly budget.

The biggest lever to control costs was **semantic caching**: many fantasy questions are variations of the same five or six (captain, transfer, gameweek). We cache answers keyed by the question embedding and return the cached response if similarity clears a high threshold. That meaningfully lowered token spend with no perceptible UX change.
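The cache lookup itself is simple: compare the incoming question's embedding against cached entries and return the stored answer when similarity clears the bar. This in-memory sketch and the 0.95 threshold are illustrative, not the production values.

```typescript
type CacheEntry = { embedding: number[]; answer: string };

// Plain cosine similarity between two embedding vectors.
function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the cached answer for the closest entry above the threshold,
// or null to fall through to the full RAG + LLM pipeline.
function semanticCacheLookup(
  cache: CacheEntry[],
  queryEmbedding: number[],
  threshold = 0.95
): string | null {
  let best: CacheEntry | null = null;
  let bestScore = threshold;
  for (const entry of cache) {
    const s = cosineSim(entry.embedding, queryEmbedding);
    if (s >= bestScore) {
      bestScore = s;
      best = entry;
    }
  }
  return best ? best.answer : null;
}
```

A high threshold matters here: 'who should I captain?' and 'who should I transfer out?' embed fairly close together, and serving one answer for the other would be worse than a cache miss.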

Latency and UX

A streaming LLM takes seconds to answer. To avoid the app feeling slow, we fire three things in parallel as soon as the user starts typing: animated skeleton with messages like 'analyzing lineups', prefetch of the most likely RAG context based on the page they are on, and token-by-token streaming with progressive markdown rendering. The full answer still takes the same time, but perception changes radically.
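The streaming part of that trio boils down to consuming text deltas as they arrive and re-rendering the partial answer each time. In this sketch the delta source is a stand-in for the real streaming API, so the UI wiring is reduced to collecting partials.

```typescript
// Stand-in for a streaming LLM response: an async iterable of text deltas.
async function* fakeDeltas(): AsyncGenerator<string> {
  for (const t of ["Captain ", "Messi ", "this ", "week."]) yield t;
}

// Accumulate deltas and emit a growing partial answer after each one;
// in the real UI each partial triggers a progressive markdown re-render.
async function collectPartials(deltas: AsyncIterable<string>): Promise<string[]> {
  const partials: string[] = [];
  let buffer = "";
  for await (const d of deltas) {
    buffer += d;
    partials.push(buffer);
  }
  return partials;
}
```

The full answer takes exactly as long either way; the win is that the user sees the first words within a few hundred milliseconds instead of staring at a spinner.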

Security and moderation

A fantasy platform with anonymous users attracts trolls. We put three layers in place:

1. Input filtering with a light regex blacklist (serious slurs, spam).
2. Classification with a small LLM to detect prompt injection before it reaches the Coach.
3. Post-filtering of Claude's answer to ensure it does not leak the user's personal data (the model sometimes tries to be friendly and uses the real name if it has seen it in context).
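Layers 1 and 3 are cheap, deterministic code; only layer 2 needs a model call. A minimal sketch of those two bookends, with simplified patterns standing in for the real lists:

```typescript
// Layer 1: regex blacklist pre-filter (patterns here are placeholders;
// the real list covers serious slurs and spam).
const blacklist: RegExp[] = [/\bspam\b/i];

function passesInputFilter(message: string): boolean {
  return !blacklist.some((re) => re.test(message));
}

// Layer 3: post-filter so the answer never echoes the user's real name,
// even when the model picked it up from the retrieved context.
function redactPersonalData(answer: string, realName: string): string {
  if (!realName) return answer;
  const escaped = realName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return answer.replace(new RegExp(escaped, "gi"), "[redacted]");
}
```

Keeping the bookends deterministic means the only nondeterministic moderation step (the small-LLM injection classifier) sits in the middle, where a false positive just costs one extra refusal.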

Infrastructure and deploy

Everything goes through Cloudflare as the first hop: aggressive static caching for assets, WAF rules to block scraping bots, and per-IP rate limiting to protect AI endpoints. Application servers live at SWHosting, with SSL delegated to Cloudflare for sponsor subdomains. Deploys go via Git + automated SFTP, with a hook that purges Cloudflare cache only for the routes that changed.
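The selective purge in that deploy hook amounts to mapping changed routes onto full URLs and posting them to Cloudflare's `purge_cache` endpoint. The payload builder below is a sketch; the base URL is a placeholder and only the endpoint shape comes from Cloudflare's public API.

```typescript
// Build the body for Cloudflare's selective cache purge:
// POST https://api.cloudflare.com/client/v4/zones/<zone_id>/purge_cache
// with JSON { "files": [ ...absolute URLs... ] }.
function buildPurgePayload(baseUrl: string, changedRoutes: string[]): { files: string[] } {
  return {
    files: changedRoutes.map((route) => new URL(route, baseUrl).toString()),
  };
}
```

Purging only the changed routes (instead of the whole zone) keeps cache hit rates high for everything the deploy did not touch.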

Lessons learned

If we started again today we would do three things differently: semantic caching from day one instead of adding it late, designing the RAG corpus around 300-500 token chunks (we tried 1,000 and precision dropped), and running a 5% traffic canary before any prompt change to catch regressions.
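The chunking lesson is easy to show in code. This naive version approximates tokens as whitespace-separated words (real tokenizers differ) and adds a small overlap so facts straddling a boundary land in both chunks; the 400/50 defaults are illustrative.

```typescript
// Split text into ~maxTokens-word chunks with an overlap between
// consecutive chunks (word count as a rough stand-in for tokens).
function chunkText(text: string, maxTokens = 400, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += maxTokens - overlap) {
    chunks.push(words.slice(start, start + maxTokens).join(" "));
    if (start + maxTokens >= words.length) break;
  }
  return chunks;
}
```

Smaller chunks win here because retrieval returns whole chunks: a 1,000-token chunk that matches on one sentence drags 900 tokens of noise into the prompt, which is exactly the precision drop described above.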

Result

The AI Coach has been running in production for months, serving ZonaMundial's LATAM community. It is not perfect, but users actually use it, mention it on social and come back gameweek after gameweek. As an in-house product it sets us apart from other fantasy platforms, and it doubles as a live laboratory for the AI agents we build for our clients.

Want something like this for your business?

If you are considering an AI agent with RAG, a conversational assistant on top of your own content, or a vertical coach for your industry, we do this at SprintMarkt. We always start with a 490€ AI audit to understand if it makes sense, how much it would cost and what ROI to expect. No hype, no 'digital transformation' buzzwords. Drop us a line.

#success story #RAG #Claude API #Three.js #Stability AI #Next.js #architecture #LATAM

Have a project in mind?

Tell us your idea and we'll help make it happen. No-obligation quote.

No-obligation quote
