This article began as a casual chat with a former colleague who works on a platform team - the folks who build the shared tools and infrastructure used across all product teams at his company.
His company had recently rolled out an AI platform. It gave access to multiple LLMs and a built-in chatbot interface, along with enterprise features like access control and common integrations to enterprise tools for data ingestion.
The company was encouraging all employees to use the AI platform and build tools to make themselves more productive. He had a simple goal:
I want to build an internal chatbot that can help my team - and devs on other teams - quickly find answers from internal docs, wikis, and changelogs when they want to integrate with the platform.
On a recent weekend, he reached out to me to brainstorm how he could do this. He had heard about RAG and had a rough idea that he needed to use his company's AI platform and upload documents specific to his own platform. He had tried a few things, but the results were not what he expected. So we got talking, and what I have captured here is the conversation we had around RAG and how he could use it in his project.
Language models like GPT are amazing, but they're trained on static datasets and often don't know your internal product details or recent changes. They hallucinate when they're unsure, confidently making up answers.
RAG fixes that by letting the model look things up in real-time. It retrieves relevant content (like docs, help articles, code), then feeds it to the LLM as context. So the model's answer is grounded in your actual data - not guesswork.
That's where RAG really earns its keep.
Even if the model has never "seen" the product during training, it can still generate accurate answers - as long as you give it the right context through retrieval.
Think of it like this: you're asking the model a question and handing it a cheat sheet at the same time. It reads the cheat sheet and gives you an answer based on that, not just its pre-trained knowledge.
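To make that concrete, here's a minimal sketch of the retrieve-then-generate loop. The `embed()` and `ask_llm()` helpers are stand-ins for whatever endpoints your AI platform exposes (they're not real APIs here) - the shape of the flow is what matters:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for your platform's embedding endpoint."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Stand-in for your platform's chat/completion endpoint."""
    raise NotImplementedError

# 1. Index time: embed every chunk of your docs once.
chunks = ["Auth tokens expire after 24h...", "To integrate, call the ingest API..."]
chunk_vecs = [embed(c) for c in chunks]

def answer(question: str, top_k: int = 3) -> str:
    # 2. Query time: find the chunks most similar to the question.
    q = embed(question)
    scores = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
              for v in chunk_vecs]
    best = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:top_k]

    # 3. Hand the model its "cheat sheet" and ask it to stay grounded.
    context = "\n\n".join(chunks[i] for i in best)
    prompt = (f"Answer using ONLY the context below. "
              f"If the answer isn't there, say you don't know.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return ask_llm(prompt)
```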
Yes. Just like people get overwhelmed with too many browser tabs open, LLMs can get confused if you dump in a wall of irrelevant or noisy text. You'll dilute the signal the model needs, burn tokens (and latency) on filler, and raise the odds that it anchors on the wrong passage.
The goal isn't to give it everything - it's to give it only what it needs. Think signal over noise. The best RAG setups retrieve and inject just a few high-quality chunks, not the entire knowledge base.
We debated for a while whether he should build one RAG system per product or a single combined one. The answer is pretty clear:
Go with one RAG system per product.
Why? Each product gets its own focused index, so retrieval never pulls in look-alike docs from a different product, and you can tune, evaluate, and update each system independently.
The only time you might want to combine them is if your products are tightly integrated - or you've built a solid routing layer to figure out which product the question is about. Depending on the strategy you pick, the quality of the answers from RAG/LLM can swing wildly.
This is where routing comes in. You've got a few options: keyword or rule-based matching, a lightweight classifier that predicts the product from the question, an LLM that picks the right index, or simply letting the user choose the product up front.
In practice, a hybrid approach works best: cheap rules (or an explicit user choice) for the obvious cases, with an LLM-based router as the fallback for ambiguous questions - see the sketch below.
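Here's what that hybrid might look like, reusing the `ask_llm()` helper from the earlier sketch. The two products and their keyword sets are made up for illustration:

```python
KEYWORDS = {
    "payments": {"refund", "invoice", "charge", "settlement"},
    "notifications": {"webhook", "email", "push", "digest"},
}

def route(question: str) -> str:
    words = set(question.lower().split())
    # Cheap pass: keyword rules catch the obvious cases for free.
    hits = {product for product, kw in KEYWORDS.items() if words & kw}
    if len(hits) == 1:
        return hits.pop()
    # Fallback: let the LLM classify ambiguous or multi-hit questions.
    prompt = (f"Which product is this question about: payments or notifications? "
              f"Reply with one word.\n\nQuestion: {question}")
    return ask_llm(prompt).strip().lower()

# route("How do I retry a failed webhook?")       -> "notifications" via keywords
# route("Why didn't the customer get an update?") -> falls back to the LLM
```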
Not necessarily, but it often helps to think that way. You can either keep a completely separate index (and vector store) per product, or use one shared store and tag every chunk with a product metadata field that you filter on at query time.
Both approaches work. The first keeps things simple. The second is more flexible if you want to support cross-product search later.
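If you go the shared-store route, the key is tagging every chunk and filtering at query time. A sketch using Chroma (with its default embedding function) - any vector store with metadata filters works the same way, and the documents here are made up:

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client in production
docs = client.create_collection("internal-docs")

# One shared collection, but every chunk carries a product tag.
docs.add(
    ids=["pay-001", "notif-001"],
    documents=[
        "Refunds settle within 3 business days via the refunds endpoint.",
        "Webhooks retry 5 times with exponential backoff.",
    ],
    metadatas=[{"product": "payments"}, {"product": "notifications"}],
)

# The `where` filter keeps retrieval scoped to one product per query...
results = docs.query(
    query_texts=["How long do refunds take?"],
    n_results=2,
    where={"product": "payments"},
)
# ...and dropping it gives you cross-product search later.
```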
Chunking is so underrated - get this wrong and everything else falls apart.
Here's what works: split on natural boundaries (headings, sections, paragraphs) rather than blind character counts, keep chunks roughly 200-500 tokens with a small overlap so thoughts aren't cut in half, and attach metadata (source, product, section) to every chunk.
Most modern RAG frameworks support smart chunking out of the box (LangChain, LlamaIndex, Haystack, etc.).
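For example, LangChain's recursive splitter tries progressively smaller separators (paragraphs, then sentences, then words) so chunks break on natural boundaries. A sketch - the exact package layout varies across LangChain versions, and the filename is hypothetical:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # target size in characters; tune per your docs
    chunk_overlap=50,    # small overlap so context isn't cut mid-thought
    separators=["\n\n", "\n", ". ", " "],  # prefer paragraph breaks first
)

with open("integration-guide.md") as f:
    chunks = splitter.split_text(f.read())

# Keep source metadata with every chunk - you'll want it for filtering
# and for citing where an answer came from.
records = [{"text": c, "source": "integration-guide.md"} for c in chunks]
```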
Unless you have a very specific use case (like legal summaries or structured formats), you probably don't need fine-tuning.
Fine-tuning is expensive, needs carefully curated training data, and bakes knowledge in at a single point in time - the moment your docs change, the model is stale again.
Start by improving your retrieval pipeline: better chunking, a stronger embedding model, hybrid (keyword + vector) search, a reranking step, and query rewriting all tend to move the needle.
You'll get most of the performance gains there - no GPU cluster required.
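As one example of a cheap retrieval upgrade, here's a sketch of hybrid search blending keyword scores (via the rank_bm25 package) with the vector scores from the earlier sketch. The 50/50 weighting is just a starting point to tune:

```python
import numpy as np
from rank_bm25 import BM25Okapi

# Reuses `chunks`, `chunk_vecs`, and `embed()` from the first sketch.
bm25 = BM25Okapi([c.lower().split() for c in chunks])

def hybrid_search(question: str, top_k: int = 3, alpha: float = 0.5):
    # Keyword side: classic BM25 over tokenized chunks.
    kw = np.array(bm25.get_scores(question.lower().split()))
    kw = kw / (kw.max() or 1.0)  # rough normalization to [0, 1]

    # Vector side: cosine similarity against precomputed chunk embeddings.
    q = embed(question)
    vec = np.array([np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
                    for v in chunk_vecs])

    # Blend: alpha balances exact-term matches against semantic matches.
    combined = alpha * kw + (1 - alpha) * vec
    return [chunks[i] for i in np.argsort(combined)[::-1][:top_k]]
```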
Treat each product's RAG system like a mini product in itself. For each one, build a small golden set of real questions with known good answers, run it after every change, and review the failures.
Track things like retrieval hit rate (did the right chunk come back?), answer accuracy and groundedness, and how often the bot admits it doesn't know versus making something up.
You can use LangChain's evals, LlamaIndex evals, or just a simple spreadsheet + eyeballs.
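Even the spreadsheet-and-eyeballs approach can be semi-automated. A minimal sketch of a retrieval hit-rate check, assuming a `retrieve()` function that returns chunks tagged with their source (the questions and filenames are placeholders):

```python
# A handful of real questions your team actually asks, with the doc
# that should come back for each. Grow this set whenever the bot fails.
GOLDEN_SET = [
    ("How do I rotate an API key?", "auth-guide.md"),
    ("What's the rate limit on ingestion?", "limits.md"),
]

def retrieval_hit_rate(retrieve) -> float:
    hits = 0
    for question, expected_source in GOLDEN_SET:
        sources = {chunk["source"] for chunk in retrieve(question)}
        if expected_source in sources:
            hits += 1
        else:
            print(f"MISS: {question!r} did not retrieve {expected_source}")
    return hits / len(GOLDEN_SET)

# Run this after every change to chunking, embeddings, or prompts.
# A drop here tells you retrieval broke before any user notices.
```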
Your docs and knowledge base aren't static - your RAG system shouldn't be either.
To keep things fresh, re-ingest docs on a schedule (or on every merge to the docs repo), re-embed only what changed, and evict chunks whose source pages have been deleted.
Think of RAG as a living knowledge layer, not a one-time dump.
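One simple way to keep that layer alive is to hash each document and only re-embed what actually changed. A sketch, assuming an `index_document()` helper that chunks and embeds a file:

```python
import hashlib
import json
from pathlib import Path

STATE_FILE = Path("index_state.json")

def refresh_index(docs_dir: str, index_document) -> None:
    seen = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    current = {}
    for path in Path(docs_dir).rglob("*.md"):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        current[str(path)] = digest
        # Re-embed only new or changed files; unchanged ones are skipped.
        if seen.get(str(path)) != digest:
            index_document(path)
    # Anything in the old state but not on disk should be evicted from
    # the vector store so deleted docs stop showing up in answers.
    for stale_path in set(seen) - set(current):
        print(f"TODO: remove chunks for deleted doc {stale_path}")
    STATE_FILE.write_text(json.dumps(current))

# Run on a schedule (cron, or CI on every merge to the docs repo).
```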
If you're building internal tooling with RAG: keep one system per product, invest in chunking and retrieval before reaching for fine-tuning, evaluate each system like a mini product, and keep the index in sync with your docs.
It's not about making the LLM "smarter" - it's about feeding it the right stuff at the right time.