Knowledge Base · 13 min read

Building a production RAG system that doesn't lie to users

A production-grade RAG pipeline needs ingestion state, chunk metadata, vector isolation, citations, queue-based indexing, and honest failure modes.

Nguyen Son Everestt, Founder & Engineering Lead, VeloxAI
#rag #knowledge-base #qdrant
RAG architecture

Most RAG tutorials show a 10-line PDF upload demo. That works until the second document. Then someone uploads a 400-page contract and the answer cites page 287 with no way to verify it. Then someone uploads confidential HR data and the system exposes it because the vector store enforces no access control of its own. Production RAG is a data pipeline with real consequences.

PostgreSQL for metadata, Qdrant for vectors

PostgreSQL stores organizations, documents, chunks, sources, and permissions. Qdrant stores raw vectors keyed by chunk IDs. This split means you can audit who uploaded what, which chunks were retrieved, and which source produced an answer — without touching the vector store for security queries.
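
One way to picture the split: the vector search returns only chunk IDs and scores, and the metadata store decides what the caller may see. The sketch below uses an in-memory map as a stand-in for the PostgreSQL chunk table; `ChunkRecord`, `VectorHit`, and `authorizeHits` are hypothetical names for illustration, not VeloxAI's actual schema.

```typescript
// Hypothetical record shape; in the real system this row lives in PostgreSQL.
interface ChunkRecord {
  chunkId: string;
  documentId: string;
  organizationId: string;
  page: number;
}

// What a vector search returns in this sketch: an ID and a similarity score.
interface VectorHit {
  chunkId: string;
  score: number;
}

interface AuthorizedHit extends VectorHit {
  documentId: string;
  page: number;
}

// Keep only hits whose metadata exists and belongs to the caller's organization.
// Unknown IDs (for example vectors left behind after a delete) are dropped, not trusted.
function authorizeHits(
  hits: VectorHit[],
  chunks: Map<string, ChunkRecord>,
  orgId: string,
): AuthorizedHit[] {
  const allowed: AuthorizedHit[] = [];
  for (const hit of hits) {
    const record = chunks.get(hit.chunkId);
    if (record === undefined || record.organizationId !== orgId) continue;
    allowed.push({ ...hit, documentId: record.documentId, page: record.page });
  }
  return allowed;
}

const chunkTable = new Map<string, ChunkRecord>([
  ["c1", { chunkId: "c1", documentId: "contract", organizationId: "org-a", page: 287 }],
  ["c2", { chunkId: "c2", documentId: "hr-file", organizationId: "org-b", page: 3 }],
]);

const visible = authorizeHits(
  [
    { chunkId: "c1", score: 0.91 },
    { chunkId: "c2", score: 0.88 },
    { chunkId: "orphan", score: 0.8 },
  ],
  chunkTable,
  "org-a",
);
console.log(visible.map((h) => h.chunkId)); // only "c1" survives
```

A check like this complements whatever isolation the vector store itself provides (separate collections or payload filters, for example); it does not replace it.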

async function indexDocument(doc: Document, orgId: string) {
  // 1. Create doc record with 'processing' status
  const docId = await db.documents.create({ organizationId: orgId, ... });
  // 2. Enqueue indexing job — never block the upload request
  await queue.enqueue('index:document', { docId, orgId });
  return { docId, status: 'processing' };
}

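// Worker processes asynchronously:
async function processIndex(job: IndexJob) {
  // The job payload carries only IDs, so extractText is assumed to load the stored file by document ID
  const text = await extractText(job.docId);
  const chunks = chunkText(text, { maxTokens: 800, overlap: 100 });
  for (const [i, chunk] of chunks.entries()) {
    // Metadata row first, so the vector can be keyed by the chunk's ID
    const chunkId = await db.chunks.create({ documentId: job.docId, ... });
    const embedding = await getEmbedding(chunk);
    // Assuming `qdrant` is the official JS client, upsert takes a list of points
    await qdrant.upsert(collection, { points: [{ id: chunkId, vector: embedding }] });
  }
  await db.documents.update(job.docId, { status: 'indexed' });
}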
Queue-backed indexing with metadata-vector split
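
The `chunkText` helper called in the worker is not defined in the snippet. A minimal sketch of sliding-window chunking with overlap follows, assuming whitespace-separated words as a stand-in for tokens (a real pipeline would count tokens with the embedding model's tokenizer). `chunkWords` and the `Chunk` shape are illustrative names.

```typescript
interface Chunk {
  index: number;     // position of the chunk within the document
  startWord: number; // offset of the first word, useful when building citations
  text: string;
}

function chunkWords(text: string, opts: { maxTokens: number; overlap: number }): Chunk[] {
  const { maxTokens, overlap } = opts;
  if (maxTokens <= 0 || overlap < 0 || overlap >= maxTokens) {
    throw new Error("require 0 <= overlap < maxTokens");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = maxTokens - overlap; // each window starts this many words after the previous one
  const chunks: Chunk[] = [];
  for (let start = 0; start < words.length; start += step) {
    const slice = words.slice(start, start + maxTokens);
    chunks.push({ index: chunks.length, startWord: start, text: slice.join(" ") });
    if (start + maxTokens >= words.length) break; // this window already reached the end
  }
  return chunks;
}

const sample = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9";
const parts = chunkWords(sample, { maxTokens: 4, overlap: 1 });
console.log(parts.map((c) => c.text));
// [ "w0 w1 w2 w3", "w3 w4 w5 w6", "w6 w7 w8 w9" ]
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary is still retrievable from at least one chunk.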

Citations are not optional

Every answer must cite which documents and chunks produced it. This is the difference between a tool users trust and a tool they abandon after one wrong answer. Citations let users verify claims, let operators debug retrieval quality, and let compliance teams trace data lineage. The system should validate that cited sources exist in retrieval results before showing them.
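
The validation step can be sketched as a filter over the citations the model claims. This assumes the model returns chunk IDs alongside its answer; `Citation`, `RetrievedChunk`, and `validateCitations` are hypothetical names, not part of any library.

```typescript
interface RetrievedChunk {
  chunkId: string;
  documentId: string;
  page: number;
}

interface Citation {
  chunkId: string;
}

interface ValidatedCitations {
  verified: RetrievedChunk[]; // safe to show, with document and page attached
  rejected: string[];         // chunk IDs the model cited but retrieval never returned
}

function validateCitations(cited: Citation[], retrieved: RetrievedChunk[]): ValidatedCitations {
  const byId = new Map<string, RetrievedChunk>();
  for (const chunk of retrieved) byId.set(chunk.chunkId, chunk);

  const verified: RetrievedChunk[] = [];
  const rejected: string[] = [];
  const seen = new Set<string>();
  for (const { chunkId } of cited) {
    if (seen.has(chunkId)) continue; // show each source once
    seen.add(chunkId);
    const match = byId.get(chunkId);
    if (match === undefined) rejected.push(chunkId);
    else verified.push(match);
  }
  return { verified, rejected };
}

const result = validateCitations(
  [{ chunkId: "c1" }, { chunkId: "c9" }, { chunkId: "c1" }],
  [
    { chunkId: "c1", documentId: "contract", page: 287 },
    { chunkId: "c2", documentId: "contract", page: 12 },
  ],
);
console.log(result.verified.map((c) => `${c.documentId} p.${c.page}`), result.rejected);
```

What to do with a non-empty `rejected` list is a product decision this sketch leaves open: the answer could be regenerated, shown with a warning, or withheld.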
