2d ago

Building a local RAG over my portfolio without an LLM bill

How the Ask Dipanshu widget on this site retrieves relevant resume and project snippets without calling any external model.

Most portfolio chat widgets call a hosted LLM and pay for every request. This one does not. The goal was to keep the site's assistant useful, fast, and free to operate, even when traffic spikes.

The retrieval index lives in-process. On boot, the corpus is tokenized once: every project description, every resume bullet, every service deliverable, and the about text. Each document keeps a term-frequency map. A query is tokenized the same way and scored against every document with a small log-weighted sum, and the top matches are returned, each with a snippet window.
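
The flow above can be sketched in a few lines. This is a rough illustration in Python, not the code running on this site: the function names, the `1 + log(tf)` weighting, and the three-document corpus are all assumptions made up for the example.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics; used for corpus and queries alike."""
    return TOKEN_RE.findall(text.lower())


def build_index(corpus: dict[str, str]) -> dict[str, Counter]:
    """Tokenize every document once and keep a term-frequency map per document."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}


def score(query: str, tf: Counter) -> float:
    """Log-weighted sum (assumed form): each matching query term adds 1 + log(tf)."""
    return sum(1.0 + math.log(tf[t]) for t in set(tokenize(query)) if tf[t] > 0)


def search(query: str, index: dict[str, Counter], top_k: int = 3) -> list[tuple[str, float]]:
    """Score the query against every document and return the best non-zero matches."""
    scored = [(score(query, tf), doc_id) for doc_id, tf in index.items()]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
    return [(doc_id, s) for s, doc_id in scored[:top_k]]


# Hypothetical corpus, invented for this example.
CORPUS = {
    "project:search-widget": "In-process search widget with term frequency scoring",
    "resume:backend": "Built backend services and search APIs in Python",
    "about": "Engineer who likes fast, cheap, boring infrastructure",
}

INDEX = build_index(CORPUS)  # done once, at boot
results = search("search scoring", INDEX)
print(results)
```

The log keeps a document that repeats one word twenty times from outranking a document that matches two different query terms once each. Scanning every document on each query is fine at portfolio scale, where the corpus is a few dozen short texts.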

There is no neural model and no API call. The site responds in single-digit milliseconds and never pays a cold-start cost to an external provider. For portfolio Q&A, that is more than enough. You can see it working on every page through the floating Ask Dipanshu bubble.

The trade-off is that the assistant cannot reason. It cannot summarize across documents or write fresh prose. What it can do is surface the right document fast and show where the answer came from. For an interview signal or a recruiter scan, that is the better default: cite the source, do not hallucinate it.
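
For illustration, here is one way "show where the answer came from" could look in code. It is a sketch under assumptions: `snippet_window`, `answer`, the 40-character radius, and the sample text are hypothetical and not taken from the widget itself.

```python
import re

WORD_RE = re.compile(r"[A-Za-z0-9]+")


def snippet_window(text: str, query: str, radius: int = 40) -> str:
    """Return the text around the first query-term hit, or the opening of the text if nothing matches."""
    terms = {t.lower() for t in WORD_RE.findall(query)}
    for match in WORD_RE.finditer(text):
        if match.group().lower() in terms:
            start = max(0, match.start() - radius)
            end = min(len(text), match.end() + radius)
            prefix = "..." if start > 0 else ""
            suffix = "..." if end < len(text) else ""
            return prefix + text[start:end].strip() + suffix
    return text[: 2 * radius].strip()


def answer(doc_id: str, text: str, query: str) -> dict[str, str]:
    """Pair the snippet with the id of the document it was cut from."""
    return {"source": doc_id, "snippet": snippet_window(text, query)}


# Hypothetical document text, invented for this example.
TEXT = "Built a search widget that scores documents by term frequency."
reply = answer("project:search-widget", TEXT, "term frequency")
print(reply["source"], "->", reply["snippet"])
```

Every reply carries its `source`, and the snippet is cut verbatim from the indexed text, so nothing shown to the visitor is generated.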