Building a production RAG system that doesn't lie to users
A production-grade RAG pipeline needs ingestion state, chunk metadata, vector isolation, citations, queue-based indexing, and honest failure modes.
Most RAG tutorials show a 10-line PDF upload demo. That works until the second document. Then someone uploads a 400-page contract and the answer cites page 287 with no way to verify it. Then someone uploads confidential HR data and the system exposes it, because the vectors themselves carry no access control. Production RAG is a data pipeline with real consequences.
PostgreSQL for metadata, Qdrant for vectors
PostgreSQL stores organizations, documents, chunks, sources, and permissions. Qdrant stores raw vectors keyed by chunk IDs. This split means you can audit who uploaded what, which chunks were retrieved, and which source produced an answer, all from PostgreSQL and without touching the vector store for security queries.
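One way to picture the split at query time is below: the vector store returns bare chunk IDs, and the metadata store decides which of them the caller may see. This is a minimal sketch under stated assumptions, not the article's implementation. The `ChunkRow` fields, the `authorizeHits` helper, and the in-memory `Map` standing in for a PostgreSQL table are all hypothetical.

```typescript
// Hypothetical row shape for a PostgreSQL `chunks` table (field names are assumptions).
interface ChunkRow {
  id: string;
  documentId: string;
  organizationId: string;
}

// A raw hit from the vector store: just an ID and a similarity score.
interface VectorHit {
  chunkId: string;
  score: number;
}

// Resolve hits against the metadata store and keep only chunks the caller's
// organization owns. Unknown IDs (for example stale vectors) are dropped as well.
function authorizeHits(
  hits: VectorHit[],
  orgId: string,
  chunks: Map<string, ChunkRow>,
): ChunkRow[] {
  const allowed: ChunkRow[] = [];
  for (const hit of hits) {
    const row = chunks.get(hit.chunkId);
    if (row !== undefined && row.organizationId === orgId) {
      allowed.push(row);
    }
  }
  return allowed;
}

// In-memory stand-in for the metadata table.
const chunkTable = new Map<string, ChunkRow>([
  ["c1", { id: "c1", documentId: "d1", organizationId: "org-a" }],
  ["c2", { id: "c2", documentId: "d2", organizationId: "org-b" }],
]);

const sampleHits: VectorHit[] = [
  { chunkId: "c1", score: 0.91 },
  { chunkId: "c2", score: 0.88 }, // belongs to another organization
  { chunkId: "c9", score: 0.8 }, // no metadata row exists
];

const visible = authorizeHits(sampleHits, "org-a", chunkTable);
console.log(visible.map((row) => row.id)); // only "c1" survives the check
```

Because the ownership decision lives next to the metadata, the same lookup can feed an audit log of which chunks were retrieved for which organization.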
async function indexDocument(doc: Document, orgId: string) {
  // 1. Create doc record with 'processing' status
  const docId = await db.documents.create({ organizationId: orgId, ... });
  // 2. Enqueue indexing job; never block the upload request
  await queue.enqueue('index:document', { docId, orgId });
  return { docId, status: 'processing' };
}

// Worker processes asynchronously:
async function processIndex(job: IndexJob) {
  try {
    // The job carries only docId and orgId, so extraction loads the stored file by ID
    const text = await extractText(job.docId);
    const chunks = chunkText(text, { maxTokens: 800, overlap: 100 });
    for (const [i, chunk] of chunks.entries()) {
      const chunkId = await db.chunks.create({ documentId: job.docId, ... });
      const embedding = await getEmbedding(chunk);
      // Upsert shape assumes the official @qdrant/js-client-rest client
      await qdrant.upsert(collection, { points: [{ id: chunkId, vector: embedding }] });
    }
    await db.documents.update(job.docId, { status: 'indexed' });
  } catch (err) {
    // Surface the failure instead of leaving the document stuck in 'processing'
    await db.documents.update(job.docId, { status: 'failed' });
    throw err; // let the queue retry or dead-letter the job
  }
}

Citations are not optional
Every answer must cite which documents and chunks produced it. This is the difference between a tool users trust and a tool they abandon after one wrong answer. Citations let users verify claims, let operators debug retrieval quality, and let compliance teams trace data lineage. The system should validate that cited sources exist in retrieval results before showing them.
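The validation step in the last sentence can be sketched roughly as follows. This is an illustration under assumptions rather than a definitive implementation: the `Citation` and `RetrievedChunk` shapes and the `validateCitations` helper are hypothetical names, and the model's cited IDs are assumed to have already been parsed out of its answer.

```typescript
// Hypothetical shapes: what retrieval returned, and what the model claims to cite.
interface RetrievedChunk {
  chunkId: string;
  documentId: string;
  text: string;
}

interface Citation {
  chunkId: string;
  documentId: string;
}

interface CitationCheck {
  valid: Citation[];
  rejected: Citation[];
}

// Keep only citations that point at a chunk actually present in the retrieval
// results for this request; anything else is treated as unverifiable.
function validateCitations(
  cited: Citation[],
  retrieved: RetrievedChunk[],
): CitationCheck {
  const retrievedKeys = new Set(
    retrieved.map((chunk) => `${chunk.documentId}:${chunk.chunkId}`),
  );
  const valid: Citation[] = [];
  const rejected: Citation[] = [];
  for (const citation of cited) {
    const key = `${citation.documentId}:${citation.chunkId}`;
    if (retrievedKeys.has(key)) {
      valid.push(citation);
    } else {
      rejected.push(citation);
    }
  }
  return { valid, rejected };
}

const retrievedChunks: RetrievedChunk[] = [
  { chunkId: "c1", documentId: "d1", text: "Termination requires 30 days notice." },
  { chunkId: "c2", documentId: "d1", text: "Either party may renew annually." },
];

const citedByModel: Citation[] = [
  { chunkId: "c1", documentId: "d1" }, // present in retrieval results
  { chunkId: "c7", documentId: "d3" }, // never retrieved, so it is rejected
];

const check = validateCitations(citedByModel, retrievedChunks);
console.log(check.valid.length, check.rejected.length);
```

Rejected citations are worth logging rather than silently dropping, since they tell operators when the model is citing sources it never saw.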