RAG and internal knowledge base for B2B teams: pgvector + hybrid search in Lovable

by Federico · May 28, 2026 · 6 min read

ai
rag
knowledge-base

“Can we build a chatbot that answers questions over our documentation?” is probably the most frequent request of 2025-2026. The correct answer is: it depends. It depends on how much documentation you have, how often it changes, how many users actually consult it, and — most of all — what they do today when they can’t find the answer in Notion.

In this article I describe how we implemented an internal RAG system for a B2B team using Supabase pgvector and hybrid search, and when this choice makes sense compared to a simple “enhanced search” over the existing wiki. The example comes from a project in the resale sector, but the patterns apply to any B2B team with layered operational documentation.

What we actually mean by RAG

RAG — retrieval-augmented generation — means: before asking the LLM to answer, I retrieve the most relevant chunks of documentation for the question and pass them as context. The model doesn’t “know” your documentation; it reads it on the fly at every request.

Sounds trivial, but the critical point is “retrieve the most relevant chunks”. If retrieval is poor, the model hallucinates or answers generically. If retrieval is precise, the chatbot is as useful as a senior colleague who has read everything.
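To make the flow concrete, here is a minimal sketch of the retrieve-then-generate loop. The `Chunk` type, the prompt wording, and the `retrieve` and `generate` callables are assumptions for illustration: in a real system they would wrap your search layer and your LLM client.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    document_id: str
    text: str


def build_prompt(question: str, chunks: List[Chunk]) -> str:
    """Put the retrieved chunks in front of the question as numbered context."""
    context = "\n\n".join(
        f"[{i + 1}] ({c.document_id}) {c.text}" for i, c in enumerate(chunks)
    )
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer(
    question: str,
    retrieve: Callable[[str, int], List[Chunk]],  # hypothetical search function
    generate: Callable[[str], str],               # hypothetical LLM call
    k: int = 5,
) -> str:
    """Retrieve the top-k chunks first, then ask the model with that context."""
    chunks = retrieve(question, k)
    return generate(build_prompt(question, chunks))
```

Everything the model knows about your documentation arrives through `chunks`, which is why the quality of `retrieve` dominates the quality of the answer.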

Why pgvector and not a dedicated vector database

For a B2B team with a knowledge base under a million chunks, a dedicated vector database (Pinecone, Weaviate, Qdrant) is almost always overkill. You add another service, another source of truth, another network hop. pgvector inside Postgres comfortably handles volumes that most B2B teams will never reach, and it lets you join with the rest of the database in a single query.

In the reference project we have around 12,000 chunks (PDFs and Word docs ingested) and queries stay under 50ms at P95 on a standard Supabase instance. There’s no need for more.

The pattern: hybrid search beats pure-semantic

Mistake number one we see in production: using only semantic similarity via embeddings. It works beautifully in demos and poorly in production when the user searches for a proper name, an SKU code, an internal acronym or abbreviation. The embeddings for “AB-1247-PRO” and “AB-1248-PRO” are nearly identical, so semantic search can’t reliably tell them apart, yet they’re different SKUs.

The solution is hybrid search: you combine semantic retrieval (cosine similarity on embeddings) with full-text search (Postgres tsvector with ts_rank). Each retrieved chunk gets a combined score — in our case, a weighted sum with weights 0.6 semantic / 0.4 keyword, tuned on a test set of real questions.
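As a rough sketch of the weighted sum, the function below combines two score maps keyed by chunk id. The 0.6 / 0.4 weights come from the text; the min-max normalization is an assumption added here, because cosine similarity and ts_rank are not on the same scale and the article does not say how the two were made comparable. The chunk ids and scores in the usage note are invented.

```python
from typing import Dict, List, Tuple

SEMANTIC_WEIGHT = 0.6
KEYWORD_WEIGHT = 0.4


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two signals can be added together."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie (or there is only one): treat them as full matches.
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_rank(
    semantic: Dict[str, float],  # chunk id -> cosine similarity
    keyword: Dict[str, float],   # chunk id -> full-text rank (e.g. ts_rank)
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; a chunk missing from one list gets 0 there."""
    sem, kw = min_max(semantic), min_max(keyword)
    combined = {
        chunk_id: SEMANTIC_WEIGHT * sem.get(chunk_id, 0.0)
        + KEYWORD_WEIGHT * kw.get(chunk_id, 0.0)
        for chunk_id in set(sem) | set(kw)
    }
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


ranked = hybrid_rank(
    semantic={"sku-1247": 0.91, "sku-1248": 0.92, "faq": 0.40},
    keyword={"sku-1247": 0.8},
)
```

In this toy input the semantic scores slightly prefer the wrong SKU, but only `sku-1247` matches the keyword search, so the keyword term lifts it to the top of `ranked`.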

The practical result: on 200 test questions representative of real traffic, pure-semantic delivered answer relevance of 71%; hybrid search rose to 87% on the same test set. It’s not an algorithmic revolution, it’s the difference between a “cute” chatbot and one the team actually uses.

Chunking: the other half of the work

The other blind spot is chunking. Most tutorials say “split every 500 tokens”. In production that works badly: it cuts sentences mid-flow, breaks tables, and separates a question from its answer.

The pattern that works:

Structural segmentation first. If the document has sections, paragraphs, tables, use them as natural boundaries. A technical PDF segmented by H2 + H3 produces much better chunks than a blind sliding window.
Controlled overlap. 10-15% overlap between consecutive chunks to avoid losing context at the cut.
Structured metadata. Each chunk carries document_id, section, type (procedure/policy/glossary), last_updated. Filter on this metadata before searching to shrink the candidate pool.
Re-chunk when the source changes. Automatic trigger: when a PDF is updated, chunking restarts and old embeddings are replaced.
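The first three points can be sketched as follows. This assumes the PDF or Word file has already been converted to Markdown-style text where H2 and H3 appear as `##` and `###`; sizes are counted in words rather than tokens to keep the example dependency-free. The function names and defaults are illustrative, and re-chunking on source change is not covered here.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

HEADING = re.compile(r"^(#{2,3})\s+(.*)$")  # H2 and H3 act as boundaries


@dataclass
class Chunk:
    document_id: str
    section: str
    type: str          # procedure / policy / glossary
    last_updated: str
    text: str


def split_sections(markdown: str) -> List[Tuple[str, str]]:
    """Split on H2/H3 headings, returning (section title, body) pairs."""
    sections, title, lines = [], "Introduction", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if any(l.strip() for l in lines):
                sections.append((title, "\n".join(lines).strip()))
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):
        sections.append((title, "\n".join(lines).strip()))
    return sections


def windows(words: List[str], max_words: int, overlap_ratio: float) -> List[List[str]]:
    """Sliding window in which consecutive windows share overlap_ratio of their words."""
    overlap = round(max_words * overlap_ratio)
    step = max(1, max_words - overlap)
    out = []
    for start in range(0, len(words), step):
        out.append(words[start:start + max_words])
        if start + max_words >= len(words):
            break
    return out


def chunk_document(document_id: str, markdown: str, doc_type: str, last_updated: str,
                   max_words: int = 200, overlap_ratio: float = 0.125) -> List[Chunk]:
    chunks = []
    for section, body in split_sections(markdown):
        words = body.split()
        if len(words) <= max_words:
            pieces = [body]  # short sections stay intact, line breaks and tables included
        else:
            pieces = [" ".join(w) for w in windows(words, max_words, overlap_ratio)]
        chunks.extend(
            Chunk(document_id, section, doc_type, last_updated, p) for p in pieces
        )
    return chunks
```

The sliding window only kicks in for sections longer than `max_words`, so most chunks follow the document's own structure.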

The AI Assistant with SQL tool-use

On another project (fashion ecommerce) we built an interesting variant: a streaming AI assistant with Gemini 2.5 Pro that, in addition to doing RAG on documentation, has access to “safe” SQL tools to query live catalog and order data.

“Safe” means the model doesn’t write SQL freely. It has access to a set of predefined tools (get_orders_by_status, check_inventory(sku), customer_summary(email)), each with typed parameters and row-level security (RLS) enforced for the caller’s role. The LLM decides which tool to call; the database receives only validated parameters, never free text.

This is the difference between “AI asks questions in natural language to the DB” (injection risk, disastrous queries) and “AI orchestrates tools the backend already knows how to execute”. The first is a demo, the second is a product feature.
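A minimal sketch of the second approach, assuming a registry of tools with per-parameter validators. The tool names come from the text; the table names, the SKU pattern, the status values, and the returned dictionary shape are hypothetical. A real backend would execute the parameterized query under the caller's role so that RLS applies.

```python
import re
from typing import Any, Callable, Dict

SKU_PATTERN = re.compile(r"[A-Z0-9-]{3,32}")           # assumed SKU format
ALLOWED_STATUSES = {"pending", "shipped", "returned"}  # assumed status values


def validate_sku(value: Any) -> str:
    if not isinstance(value, str) or not SKU_PATTERN.fullmatch(value):
        raise ValueError("invalid sku")
    return value


def validate_status(value: Any) -> str:
    if value not in ALLOWED_STATUSES:
        raise ValueError("invalid status")
    return value


def check_inventory(role: str, sku: str) -> Dict[str, Any]:
    # The SQL text is fixed; the model only ever supplies the bound parameter.
    return {"role": role,
            "sql": "select sku, quantity from inventory where sku = %s",
            "params": (sku,)}


def get_orders_by_status(role: str, status: str) -> Dict[str, Any]:
    return {"role": role,
            "sql": "select id, status from orders where status = %s",
            "params": (status,)}


TOOLS: Dict[str, Dict[str, Any]] = {
    "check_inventory": {"params": {"sku": validate_sku}, "run": check_inventory},
    "get_orders_by_status": {"params": {"status": validate_status},
                             "run": get_orders_by_status},
}


def dispatch(tool_name: str, arguments: Dict[str, Any], role: str) -> Dict[str, Any]:
    """Run a tool call proposed by the LLM, rejecting anything off the allowlist."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        raise ValueError(f"unknown tool: {tool_name}")
    if set(arguments) != set(tool["params"]):
        raise ValueError("unexpected or missing parameters")
    clean = {name: check(arguments[name]) for name, check in tool["params"].items()}
    return tool["run"](role=role, **clean)
```

The model's output is treated as a request to be checked, not as code to be run: an unknown tool name or a malformed SKU is rejected before anything reaches the database.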

Versioning prompts: the AI Library

One thing that gets systematically underestimated: prompts are code. If the “master prompt” orchestrating your RAG changes, the chatbot’s behavior changes. Without versioning, three months in nobody knows why the bot answers differently than at launch.

We built an internal “AI Library”: a UI where prompts are versioned, each with metadata (author, date, release notes) and an integrated testing area. When you modify a prompt, you pre-flight it on a test set before promoting to production. It’s exactly the same pattern as a feature flag on code, applied to prompts.
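The core of such a library can be sketched in a few lines. This is an illustration of the idea, not the actual AI Library: the class and method names are assumptions, storage is in memory, and `preflight` stands in for whatever test-set run decides whether a version may be promoted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class PromptVersion:
    version: int
    text: str
    author: str
    created: date
    release_notes: str


class PromptLibrary:
    def __init__(self) -> None:
        self._versions: Dict[str, List[PromptVersion]] = {}
        self._production: Dict[str, int] = {}

    def add_version(self, name: str, text: str, author: str, release_notes: str,
                    created: Optional[date] = None) -> PromptVersion:
        """Append an immutable version; earlier versions are never overwritten."""
        history = self._versions.setdefault(name, [])
        version = PromptVersion(len(history) + 1, text, author,
                                created or date.today(), release_notes)
        history.append(version)
        return version

    def promote(self, name: str, version: int,
                preflight: Callable[[PromptVersion], bool]) -> bool:
        """Promote to production only if the pre-flight check passes."""
        candidate = self._versions[name][version - 1]
        if not preflight(candidate):
            return False
        self._production[name] = version
        return True

    def production(self, name: str) -> Optional[PromptVersion]:
        version = self._production.get(name)
        return self._versions[name][version - 1] if version else None
```

Because every version is kept with its author and release notes, "why does the bot answer differently than at launch" becomes a question you can answer by comparing two versions.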

The voice gate: eval before going to production

Above the AI Library sits an eval layer: a set of standard questions the chatbot must answer correctly, with evaluation criteria (relevance, faithfulness to the retrieved context, brand voice). Every new prompt or model change goes through the gate. If regression crosses a threshold (for example, losing on more than 2 critical questions), the release halts.

On a test set of 50 critical questions, we blocked 3 releases in the last quarter for regressions that would have been invisible in manual tests. It’s unglamorous and it’s what separates an experimental RAG system from one the team relies on for daily work.
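The gate itself reduces to a small comparison. In this sketch each eval run is a map from question id to pass/fail, and a regression is a critical question that passed on the current production setup but fails on the candidate. The threshold of 2 is the example from the text; the data shape and function names are assumptions, and how each answer is graded for relevance, faithfulness, and brand voice is out of scope.

```python
from typing import Dict


def count_regressions(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> int:
    """Count questions that passed on the baseline but fail on the candidate.

    A question missing from the candidate run counts as a failure.
    """
    return sum(
        1 for question, passed in baseline.items()
        if passed and not candidate.get(question, False)
    )


def release_allowed(baseline: Dict[str, bool], candidate: Dict[str, bool],
                    max_regressions: int = 2) -> bool:
    """Halt the release when more than max_regressions critical questions regress."""
    return count_regressions(baseline, candidate) <= max_regressions
```

Wired into the promotion step, a check like this runs on every prompt or model change, which is how regressions get caught that a quick manual test would miss.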

When RAG is actually worth it (and when not)

This is the most important question, and the one where we see clients getting stuck.

RAG makes sense when:

Documentation is over 100 documents and growing.
Answers require pulling from multiple sources to be complete.
The team consults documents multiple times a day and wastes time searching.
There’s a disambiguation need (acronyms, codes, product variants).

RAG doesn’t make sense when:

Documentation is static and Notion search is enough. In that case, improve the wiki structure before thinking about RAG.
Documents change every day and you don’t have an automatic re-ingestion pipeline.
The audience is a single person, who already knows where to look.

A realistic metric: in the reference project, across roughly 40 active internal users, the RAG chatbot receives around 600 queries/week (about 15 per user) and has reduced internal support requests to the senior team by 35%. The investment paid for itself in three months, and it did so because the upfront work on chunking and hybrid search was done right, not because of “AI magic”.

If you’re evaluating whether your B2B team is ready for a RAG system, the first thing to look at isn’t the technology. It’s how many times a day someone asks a colleague “where’s that thing, I can’t find it”. If the answer is “many”, let’s talk.