How we shipped a three-brand AI assistant on a single VPS

Three production sites. Three AI personas. One small ARM VPS. The architecture, the retrieval index, and the trade-offs we made to keep the LLM bill at zero.

DevPilotX, PaisaReality, and Value.Codes are three very different sites. One is a studio portfolio. One is an Indian personal finance hub. One is a developer tools library. They share one thing: every page now has a small floating assistant that actually knows the site it lives on.

The constraints were non-negotiable: no LLM bill, no third-party AI API, and one 4 GB ARM VPS shared with three Next.js apps and three databases. So the whole stack had to be retrieval-only, locally hosted, and tiny enough to start in under a second.

The system is three personas registered in a single assistants table. DevSage answers code and snippet questions on Value.Codes. Yojana Mitra explains government schemes and bank rates on PaisaReality. DevPilotX Builder answers questions about the studio itself. Each persona has its own colour, voice, knowledge cutoff, and, most importantly, its own knowledge namespace.
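To make the registry concrete, here is a minimal sketch of what a single assistants table could look like. The column names, slugs, namespace strings, colours, voices, and cutoff dates are all illustrative assumptions; only the persona names and the sites they serve come from the description above.

```python
import sqlite3
from typing import Optional


def open_registry() -> sqlite3.Connection:
    """Create an in-memory registry with one row per persona."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    # Hypothetical schema: one row per persona, one unique namespace each.
    conn.execute(
        """
        CREATE TABLE assistants (
            slug             TEXT PRIMARY KEY,
            name             TEXT NOT NULL,
            site             TEXT NOT NULL,
            colour           TEXT NOT NULL,
            voice            TEXT NOT NULL,
            knowledge_cutoff TEXT NOT NULL,
            namespace        TEXT NOT NULL UNIQUE
        )
        """
    )
    # Placeholder values: colours, voices, and dates are invented for the sketch.
    conn.executemany(
        "INSERT INTO assistants VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("devsage", "DevSage", "Value.Codes",
             "#000001", "placeholder voice", "1970-01-01", "valuecodes"),
            ("yojana-mitra", "Yojana Mitra", "PaisaReality",
             "#000002", "placeholder voice", "1970-01-01", "paisareality"),
            ("devpilotx-builder", "DevPilotX Builder", "DevPilotX",
             "#000003", "placeholder voice", "1970-01-01", "devpilotx"),
        ],
    )
    return conn


def load_persona(conn: sqlite3.Connection, slug: str) -> Optional[dict]:
    """Return the persona config for a slug, or None if it is not registered."""
    row = conn.execute(
        "SELECT * FROM assistants WHERE slug = ?", (slug,)
    ).fetchone()
    return dict(row) if row is not None else None
```

The UNIQUE constraint on the namespace column is one way to enforce the rule that no two personas ever read from the same knowledge store.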

The knowledge layer is a simple but disciplined ingestion pipeline. Every published page on each site is crawled, normalised, split into chunks, hashed, and stored in a per-namespace SQLite table. Term-frequency scoring with a small inverse-document-frequency boost picks the top chunks for any incoming query. The whole thing fits comfortably in memory and adds about 30 ms to a response. There is no neural model in the loop.
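The pipeline and the scoring step can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the production code: the chunk size, the tokeniser, the exact IDF formula, and every function name are hypothetical, and the sketch uses one table with a namespace column where the real system keeps a table per namespace.

```python
import hashlib
import math
import re
import sqlite3
from collections import Counter


def normalise(text: str) -> str:
    """Collapse runs of whitespace so identical content hashes identically."""
    return re.sub(r"\s+", " ", text).strip()


def split_chunks(text: str, max_words: int = 80) -> list:
    """Split text into fixed-size word windows (80 words is an assumed size)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenise(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def ingest(conn: sqlite3.Connection, namespace: str, url: str, page_text: str) -> None:
    """Normalise, chunk, hash, and store one crawled page."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "namespace TEXT, url TEXT, sha256 TEXT, body TEXT, "
        "PRIMARY KEY (namespace, sha256))"
    )
    for chunk in split_chunks(normalise(page_text)):
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        # The content hash makes re-crawling an unchanged page a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO chunks VALUES (?, ?, ?, ?)",
            (namespace, url, digest, chunk),
        )


def top_chunks(conn: sqlite3.Connection, namespace: str, query: str, k: int = 3) -> list:
    """Rank one namespace's chunks by term frequency with an IDF boost."""
    rows = conn.execute(
        "SELECT url, body FROM chunks WHERE namespace = ?", (namespace,)
    ).fetchall()
    docs = [(url, body, Counter(tokenise(body))) for url, body in rows]
    terms = set(tokenise(query))
    # Document frequency: in how many chunks does each query term appear?
    df = {t: sum(1 for _, _, tf in docs if tf[t] > 0) for t in terms}
    scored = []
    for url, body, tf in docs:
        length = sum(tf.values()) or 1
        score = sum(
            (tf[t] / length) * math.log(1 + len(docs) / df[t])
            for t in terms
            if tf[t] > 0
        )
        if score > 0:
            scored.append((score, url, body))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

Because the query is always filtered by namespace before scoring, a question asked on one site can never surface a chunk crawled from another.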

The answer composer is intentionally conservative. It quotes from the retrieved chunks, never invents new facts, and always shows the source URL. If retrieval finds nothing relevant, the assistant says so and offers the contact form. That single rule killed an entire category of hallucination bugs that more ambitious systems run into.
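A composer with that rule can be very small. The sketch below is one possible shape, assuming retrieval hands back scored chunks; the Hit type, the score threshold, the quote limit, and the fallback wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical relevance floor; a real value would be tuned against real queries.
MIN_SCORE = 0.05

FALLBACK = (
    "I could not find that on this site. "
    "The contact form is the best way to reach a person."
)


@dataclass
class Hit:
    score: float
    url: str
    body: str


def compose_answer(hits: List[Hit], min_score: float = MIN_SCORE, max_quotes: int = 2) -> dict:
    """Quote retrieved chunks verbatim with their source, or admit there is no answer."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    relevant = [h for h in ranked if h.score >= min_score][:max_quotes]
    if not relevant:
        # Nothing relevant: say so rather than generate anything.
        return {"answer": FALLBACK, "sources": [], "found": False}
    # Every sentence in the answer is copied from a chunk; nothing is generated.
    lines = ['"{}" (source: {})'.format(h.body, h.url) for h in relevant]
    return {
        "answer": "\n".join(lines),
        "sources": [h.url for h in relevant],
        "found": True,
    }
```

The important property is structural: there is no code path that returns text which did not come from either a stored chunk or the fixed fallback string.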

The widget on the page is one self-contained script tag. It loads the persona config, paints the bubble in the right brand colour, and posts to a single ask endpoint with the persona slug. Because it carries no framework, it embeds cleanly on a Next.js page, a server-rendered EJS page, and a vanilla PHP page without surprises.
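On the server side, the contract the widget relies on is small: one endpoint, a persona slug, and a question. The following sketch shows one way such a handler might validate and route a request. The JSON field names, the status codes, the slug-to-namespace map, and the answer_fn callback are assumptions made for illustration, and the handler is written framework-free so it does not imply any particular web stack.

```python
import json
from typing import Callable, Tuple

# Hypothetical slug -> knowledge namespace map; in practice this would be
# read from the assistants table.
PERSONA_NAMESPACES = {
    "devsage": "valuecodes",
    "yojana-mitra": "paisareality",
    "devpilotx-builder": "devpilotx",
}


def handle_ask(raw_body: str, answer_fn: Callable[[str, str], dict]) -> Tuple[int, dict]:
    """Validate an ask request and route it to the persona's namespace.

    answer_fn(namespace, question) stands in for retrieval plus composition.
    Returns an (HTTP status, JSON-serialisable body) pair.
    """
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, {"error": "body must be JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}

    slug = payload.get("persona")
    question = payload.get("question")
    if not isinstance(slug, str) or slug not in PERSONA_NAMESPACES:
        return 404, {"error": "unknown persona"}
    if not isinstance(question, str) or not question.strip():
        return 400, {"error": "question is required"}

    # The slug alone decides which namespace is searched; the client never
    # names a namespace directly.
    result = answer_fn(PERSONA_NAMESPACES[slug], question.strip())
    return 200, {"persona": slug, **result}
```

Keeping the namespace lookup on the server means a widget embedded on one site cannot be pointed at another site's knowledge by editing the request.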

Operationally, the win is that we never pay per token. A traffic spike on Value.Codes costs the same as a quiet weekend. The cost ceiling is the VPS itself, which is fixed. The trade-off is exactly what you would expect: the assistant cannot reason across documents or write fresh prose. For a site assistant whose job is to surface the right answer fast, that is the better trade.

If you want to copy the pattern, the three things that mattered most were a strict per-namespace index, a composer that refuses to invent, and an embed widget light enough to drop on any framework. Everything else is fungible.