A 6-Attorney Firm Cuts Contract Review Time by 70% Using RAG Over Their Own Precedent Library

The problem with legal knowledge

Most law firms’ institutional knowledge lives in three places: the heads of senior partners, a folder structure nobody can remember, and a search box that returns results by filename.

This firm had 14 years of commercial contracts (NDAs, SaaS agreements, employment contracts, licensing deals, shareholder agreements) across 23,000 documents in Google Drive. When a junior associate needed to find how the firm had handled a particular indemnification clause in a previous deal, the options were to ask a senior partner or spend two hours searching. Both were expensive.

The partners were not opposed to using AI. They were opposed to using AI that made things up. One partner described a competing firm that cited non-existent case law in a client brief after using an off-the-shelf legal AI tool. That story was circulating at every bar event that year.

What made this different from “just use a chatbot”

The requirement that shaped everything: every answer had to show the source document, the exact page, and the exact clause number. Not a summary, but the document itself. Attorneys would verify every citation before relying on anything.

This is a retrieval problem, not a generation problem. The model’s job is not to generate law; it is to find the right precedent and surface it accurately.

The architecture:

Ingestion: We built a custom pipeline that extracts text from PDFs (including scanned ones via Tesseract), segments by clause type (indemnification, limitation of liability, IP ownership, governing law, etc.) using a fine-tuned classifier, and stores chunks with rich metadata: document type, parties, jurisdiction, date signed, practice area, signing attorney.
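To make the chunk metadata concrete, here is a minimal sketch of what one stored record could look like. The field names are illustrative rather than the actual schema; clause_number, page, and the ocr flag are assumptions added because the citation requirement needs them, and to_payload is a hypothetical helper.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class ClauseChunk:
    """One clause-level chunk plus the metadata fields described above."""
    text: str
    clause_type: str         # e.g. "indemnification", assigned by the classifier
    clause_number: str       # assumption: kept so answers can cite it
    page: int                # assumption: 1-based page in the source PDF
    document_name: str
    document_type: str       # e.g. "SaaS agreement"
    parties: tuple[str, ...]
    jurisdiction: str
    date_signed: date
    practice_area: str
    signing_attorney: str
    ocr: bool = False        # assumption: True if the text came from Tesseract


def to_payload(chunk: ClauseChunk) -> dict:
    """Flatten a chunk into a JSON-friendly dict for a vector-store payload."""
    payload = asdict(chunk)
    payload["date_signed"] = chunk.date_signed.isoformat()
    payload["parties"] = list(chunk.parties)
    return payload
```

Keeping jurisdiction and date as structured fields, rather than leaving them buried in the clause text, is what allows them to be used as filters or shown in citations later.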

Retrieval: Hybrid search that combines dense retrieval (Voyage AI legal embeddings) with sparse retrieval (BM25 over clause text), with Cohere Rerank v4 as the final filter. The reranker turned out to be critical: without it, semantically similar but jurisdictionally irrelevant clauses ranked too high.
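The write-up does not say how the dense and sparse result lists were merged before reranking. One common option is reciprocal rank fusion, sketched below as an assumption rather than as the method actually used; the chunk ids and the constant k=60 are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one fused ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense_hits = ["c7", "c2", "c9"]    # hypothetical ids from the embedding search
sparse_hits = ["c2", "c4", "c7"]   # hypothetical ids from BM25
candidates = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

The fused candidate list would then be passed to the reranker, which makes the final ordering decision.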

Answer generation: Claude Sonnet 4.6 with a strict prompt that says: “Return only what you found in the retrieved documents. If the documents do not answer the question, say so explicitly. Always cite document name, date, and clause number.”
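A minimal sketch of how the retrieved chunks might be packaged for the model is shown below. The system prompt text is the one quoted above; the message layout, the dictionary keys, and the build_user_message helper are assumptions for illustration, and no model API call is shown.

```python
SYSTEM_PROMPT = (
    "Return only what you found in the retrieved documents. "
    "If the documents do not answer the question, say so explicitly. "
    "Always cite document name, date, and clause number."
)


def build_user_message(question, chunks):
    """Format retrieved chunks as numbered sources followed by the question."""
    if not chunks:
        # Say so explicitly, so the model can report that nothing was found.
        sources = "(no documents retrieved)"
    else:
        sources = "\n\n".join(
            f"[{i}] {c['document_name']} | signed {c['date_signed']} | "
            f"clause {c['clause_number']} | page {c['page']}\n{c['text']}"
            for i, c in enumerate(chunks, start=1)
        )
    return f"Retrieved documents:\n{sources}\n\nQuestion: {question}"
```

Putting the document name, date, and clause number in a header line above each chunk gives the model the exact strings it is asked to cite, instead of leaving it to infer them from the clause text.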

Output format: The answer appears in their case management system with collapsible source citations that link directly to the relevant page in Drive.

What happened during the pilot

We ran a 6-week pilot with 3 associates and 1 partner. We gave them 40 real questions they had actually needed to answer in the past six months, questions that had taken between 1 and 4 hours to research manually.

The RAG system answered 34 of 40 correctly on the first try. Of the 6 failures:

3 occurred because the relevant document was a scan with poor OCR quality (we fixed the OCR pipeline after week 2)
2 were genuine gaps in the precedent library: the firm had never handled that clause type before, so the correct answer was that no precedent exists
1 cited the right document but the wrong clause number, which we corrected by adding a post-retrieval verification step that cross-references the cited clause back to the source text
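The verification step in the last item can be sketched roughly as follows. This is a simplified illustration, assuming the answer carries a quoted snippet and that each source document is available as a mapping from clause number to clause text; the function names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so OCR line breaks do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_citation(cited_clause, quoted_text, clauses_by_number):
    """Check that quoted_text really appears under cited_clause.

    Returns (ok, clause_number): the cited number if it checks out,
    the correct number if the quote is found under a different clause,
    or None if the quote is not in the document at all.
    """
    quote = normalize(quoted_text)
    if not quote:
        return False, None
    if quote in normalize(clauses_by_number.get(cited_clause, "")):
        return True, cited_clause
    for number, text in clauses_by_number.items():
        if quote in normalize(text):
            return False, number
    return False, None
```

A mismatch that still finds the quote elsewhere in the document can be repaired automatically by swapping in the returned clause number; a quote that is not found anywhere is a stronger signal and is better flagged for an attorney.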

After fixes, the pilot closed at 39/40. The one remaining miss involved a highly specific question about sub-licensing rights under a 2013 deal with unusual dual-jurisdiction clauses. A senior partner reviewed it manually, as they would have done regardless.

The results that mattered

Average contract review time dropped from 11.2 hours to 3.2 hours, a reduction of roughly 71%
Junior associates stopped interrupting senior partners for precedent searches, which was the largest qualitative improvement
The system surfaced a favorable indemnification clause from 2018 that a senior associate had forgotten existed, which directly influenced a negotiation and saved the client an estimated $220K in exposure
$380K in previously uncaptured billable hours was recovered in the first year (conservative estimate: 2 hours per week per attorney, 6 attorneys, $250 per hour blended rate)
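As a sanity check on the figures above, the arithmetic can be reproduced in a few lines. The 52-week year is an assumption; the other inputs are the ones stated in the list.

```python
def percent_reduction(before, after):
    """Percentage drop from before to after."""
    return 100.0 * (before - after) / before


def annual_recovered_value(hours_per_week, attorneys, rate, weeks_per_year=52):
    """Billable value recovered per year under the stated assumptions."""
    return hours_per_week * attorneys * rate * weeks_per_year


reduction = percent_reduction(11.2, 3.2)
recovered = annual_recovered_value(2, 6, 250)
print(f"{reduction:.1f}% reduction")   # prints "71.4% reduction"
print(f"${recovered:,.0f} per year")   # prints "$156,000 per year"
```

The review-time figure matches the headline of roughly 70%. The billable-hours inputs, taken at face value over 52 weeks, give $156,000 per year, so the $380K figure cannot be reproduced from those three inputs alone.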

What we would do differently

The clause classifier we trained took 6 weeks and a significant amount of manual labeling. In hindsight, Claude Opus 4.5 handles zero-shot clause classification well enough that we would use it instead of a custom classifier on a new engagement.
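For illustration, zero-shot classification amounts to a prompt with a fixed label set plus a strict parser for the reply. The sketch below assumes the clause types named earlier plus an "other" bucket; the prompt wording and helper names are hypothetical, and the model call itself is omitted.

```python
CLAUSE_TYPES = (
    "indemnification",
    "limitation of liability",
    "IP ownership",
    "governing law",
    "other",  # assumption: catch-all for anything outside the list
)


def build_classification_prompt(clause_text, labels=CLAUSE_TYPES):
    """Build a zero-shot prompt that asks for exactly one label."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the contract clause below into exactly one of these types. "
        "Reply with the type name only.\n\n"
        f"Types:\n{options}\n\nClause:\n{clause_text}"
    )


def parse_label(reply, labels=CLAUSE_TYPES):
    """Map a model reply onto a known label; fall back to 'other'."""
    cleaned = reply.strip().strip(".").lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return "other"
```

Validating the reply against the label set means an unexpected answer lands in "other" for human review instead of creating a new, unplanned clause type in the index.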

OCR quality on pre-2015 scanned documents was a persistent problem. We now recommend clients run a document quality audit before ingestion and either re-scan or accept lower recall on old documents.
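A document quality audit can start with something very crude. The sketch below scores extracted text by the fraction of tokens that look like ordinary words or numbers; the heuristic, the 0.8 threshold, and the function names are all assumptions for illustration, and a real audit would likely also use the OCR engine's own confidence scores.

```python
import re
import string


def looks_like_word(token):
    """True for tokens such as 'Supplier', '9.2', 'sub-licensing' or '(a)'."""
    core = token.strip(string.punctuation)
    parts = re.split(r"[-./']", core)
    return bool(core) and all(p.isalpha() or p.isdigit() for p in parts)


def word_like_ratio(text):
    """Fraction of whitespace-separated tokens that pass looks_like_word."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(looks_like_word(t) for t in tokens) / len(tokens)


def needs_rescan(text, threshold=0.8):
    """Flag a document whose extracted text is mostly garbled or empty."""
    return word_like_ratio(text) < threshold
```

Running this over every document before ingestion gives a rough list of candidates for re-scanning, which is cheaper than discovering poor OCR through failed queries.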

Tech used

LlamaIndex · Claude Sonnet 4.6 · Voyage AI voyage-law-2 embeddings · Qdrant · Cohere Rerank v4 · Tesseract OCR · custom PDF ingestion pipeline · Google Drive API · their existing case management system (Matter365)