Why "chat with PDF" breaks for legal documents — and how we fixed it


The core problem with most document chat tools is not that they give wrong answers. It is that you cannot tell when they are wrong.

For a general knowledge question, a fluent but slightly inaccurate answer is annoying. For a contract review, it is a liability. When a lawyer or a compliance team asks whether a non-compete clause covers a specific jurisdiction, they need to point to the exact text in the document — not trust that the AI interpreted it correctly.

Most PDF chat tools are built for convenience. LexReviewer is built for trust.

Why generic RAG breaks on legal documents

Standard retrieval-augmented generation pipelines have three failure modes that become critical in legal workflows.

Flat, one-size-fits-all retrieval

Most systems run every query through the same pipeline: embed the question, find the nearest chunks in vector space, return them. That works for semantic questions like "what are the payment terms" but fails completely on queries like "Section 4.2(b)". Exact clause references need exact-phrase lookup. Semantic search alone cannot reliably find them.
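The two query types can be told apart mechanically. A minimal sketch, assuming a simple pattern-based detector (the regex and function name are illustrative, not LexReviewer's actual code):

```python
import re

# Matches literal clause references such as "Section 4.2(b)" or "Article 12.1".
# Illustrative pattern; a production system would cover more citation styles.
CLAUSE_REF = re.compile(
    r"\b(?:Section|Clause|Article)\s+\d+(?:\.\d+)*(?:\([a-z]\))?",
    re.IGNORECASE,
)

def needs_exact_lookup(query: str) -> bool:
    """True when the query contains a literal clause reference that
    embedding similarity alone is unlikely to retrieve reliably."""
    return bool(CLAUSE_REF.search(query))

print(needs_exact_lookup("What are the payment terms?"))    # semantic -> False
print(needs_exact_lookup("What does Section 4.2(b) say?"))  # exact -> True
```

A query flagged this way should be served by exact-phrase lookup, not nearest-neighbour search.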

No linked document awareness

Real contracts do not live in isolation. A services agreement references an MSA. That MSA references a data processing addendum. The addendum references a list of approved subprocessors. When you ask a question that spans those documents, a system that only knows about the document you uploaded gives you a partial answer and presents it as complete.

A partial answer presented as complete is worse than no answer — especially in a contract review.

Citations as an afterthought

In most systems, citations are appended at the end if they appear at all. They are metadata, not the primary output. In legal workflows, the citation is the point. If you cannot instantly see which clause the answer came from and verify it in the original document, the answer is not usable.

How LexReviewer solves each of these

Hybrid retrieval: vector search plus BM25

LexReviewer combines Qdrant vector search with BM25 keyword retrieval. Semantic questions hit the vector index. Exact-phrase lookups hit BM25. Both run on every query, and the results are merged. You do not have to choose between a system that understands meaning and one that finds exact text. You get both.
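One standard way to merge two ranked result lists is reciprocal rank fusion. The sketch below assumes each retriever returns chunk IDs in rank order; RRF is a common technique for this, not necessarily LexReviewer's exact merge strategy:

```python
from collections import defaultdict

def rrf_merge(vector_hits: list[str], bm25_hits: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk IDs with reciprocal rank fusion.
    A chunk ranked highly by either retriever scores well overall."""
    scores: dict[str, float] = defaultdict(float)
    for hits in (vector_hits, bm25_hits):
        for rank, chunk_id in enumerate(hits):
            scores[chunk_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(
    vector_hits=["c7", "c2", "c9"],  # semantic neighbours
    bm25_hits=["c2", "c4", "c7"],    # exact-phrase matches
)
print(merged)  # ['c2', 'c7', 'c4', 'c9'] — chunks both retrievers agree on rise to the top
```

The constant `k` damps the advantage of a single top rank, so agreement between retrievers outweighs one retriever's enthusiasm.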

LangGraph agent routing

Instead of running every query through the same pipeline, a LangGraph agent decides how to answer each question. It picks from three tools: search within the current document, fetch and query a linked document, or combine both. The agent also maintains conversation state across follow-up questions, so context does not evaporate after the second message.

The practical difference: focused answers instead of retrieval noise, and the agent does not lose track of what was being discussed three messages ago.
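The routing decision itself can be illustrated without the framework. This is a hand-rolled sketch, not the LangGraph agent LexReviewer actually runs, and the tool names are hypothetical:

```python
def route(query: str, linked_docs: list[str]) -> str:
    """Pick a retrieval tool for a query. LexReviewer delegates this
    decision to a LangGraph agent; here it is reduced to two heuristics."""
    q = query.lower()
    mentions_linked = any(name.lower() in q for name in linked_docs)
    spans_both = mentions_linked and ("this agreement" in q or "current" in q)
    if spans_both:
        return "search_current_and_linked"   # hypothetical tool name
    if mentions_linked:
        return "search_linked_document"      # hypothetical tool name
    return "search_current_document"         # hypothetical tool name

print(route("What does the MSA say about liability caps?", ["MSA", "DPA"]))
# -> search_linked_document
```

A real agent makes this call with an LLM rather than string matching, which is what lets it handle follow-ups whose intent depends on conversation state.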

Citations built into the stream

Answers stream as NDJSON. Each chunk includes the answer text, the agent's reasoning, and reference positions with bounding boxes. You see where the answer lives in the source PDF while it is still generating — not after. Bounding boxes are stored in MongoDB alongside the full chunk text, so the UI can highlight the exact passage in the original document without a second round-trip.
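What one streamed line might look like — a hypothetical shape with illustrative field names, since the exact schema is defined by the LexReviewer API:

```python
import json

# Hypothetical shape of one streamed chunk; field names are illustrative.
chunk = {
    "answer": "The non-compete applies only within the State of Delaware...",
    "reasoning": "Matched clause 7.3 via exact-phrase lookup.",
    "references": [
        {
            "page": 12,
            "bbox": {"x0": 72.0, "y0": 310.5, "x1": 523.4, "y1": 362.1},
            "chunk_id": "doc1-p12-c3",
        }
    ],
}

# NDJSON: each line is one complete JSON object, so a client can parse
# and render references as soon as each line arrives.
line = json.dumps(chunk)
parsed = json.loads(line)
print(parsed["references"][0]["page"])  # 12
```

Because every line is independently parseable, the UI can draw the bounding-box highlight for a reference before the full answer has finished streaming.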

Linked document awareness

When a contract references an amendment, schedule, or MSA, the agent can fetch and query those documents too. This closes the gap that makes most document chat tools impractical for real contracts. The answer you get reflects the full contract context, not just the document you happened to upload first.
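One way to think about this is as a reachability problem over a registry of document links. The registry structure below is an assumption for illustration, not LexReviewer's internal representation:

```python
# Hypothetical registry mapping a document to the documents it references.
LINKS = {
    "services_agreement": ["msa"],
    "msa": ["dpa"],
    "dpa": ["subprocessor_list"],
}

def reachable_docs(root: str) -> list[str]:
    """All documents the agent may need to consult, following
    reference links transitively from the uploaded document."""
    seen, stack = [], [root]
    while stack:
        doc = stack.pop()
        if doc not in seen:
            seen.append(doc)
            stack.extend(LINKS.get(doc, []))
    return seen

print(reachable_docs("services_agreement"))
# ['services_agreement', 'msa', 'dpa', 'subprocessor_list']
```

The agent does not fetch everything up front; it queries a linked document only when the question requires it.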

The architecture in brief

LexReviewer is a FastAPI backend. PDFs go in through the upload endpoint, get chunked by Unstructured.io (which preserves layout and bounding box metadata), get embedded with OpenAI, and get indexed into Qdrant. Full chunk text and metadata go into MongoDB as the source of truth.
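The split between the two stores can be sketched as follows. Field names here are hypothetical; the actual schemas are defined by LexReviewer:

```python
def split_chunk(chunk_id, text, embedding, page, bbox):
    """One parsed chunk becomes two records: a slim vector point for
    Qdrant and a full-text source-of-truth document for MongoDB."""
    qdrant_point = {
        "id": chunk_id,
        "vector": embedding,        # OpenAI embedding of the chunk text
        "payload": {"page": page},  # just enough metadata to filter on
    }
    mongo_doc = {
        "_id": chunk_id,
        "text": text,               # full chunk text, the source of truth
        "page": page,
        "bbox": bbox,               # bounding box for UI highlighting
    }
    return qdrant_point, mongo_doc

point, doc = split_chunk("doc1-p3-c0", "Payment due within 30 days.",
                         [0.01, -0.2, 0.4], 3, [72, 100, 520, 140])
print(doc["text"])  # Payment due within 30 days.
```

Keeping the full text and bounding boxes in MongoDB means a retrieval hit from Qdrant can be expanded into a highlightable citation with a single lookup by `chunk_id`.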

When a question comes in, the LangGraph agent decides which tools to call, composes a prompt with retrieved context, runs the LLM, and streams back the response with structured references. Chat history is persisted in MongoDB by session, so multi-turn conversations stay grounded.

Observability is built in via Langfuse. You can trace retrieval, reasoning, and responses without adding instrumentation yourself.

Who it is for

LexReviewer is for developers building legal AI tools, document copilots, or compliance systems where trust and verifiability are non-negotiable. If you are using LangGraph or LangChain, it integrates directly. If you have a custom agent framework, the API endpoints give you everything you need to build on top.

It is open source and self-hostable. You run MongoDB and Qdrant, configure your environment variables, and the service is up. For teams that want hosted infrastructure, LexStack provides that with usage-based pricing and no subscription required to start.

Getting started

The repo is live at github.com/LexStack-AI/LexReviewer. The README covers setup, environment variables, and the full API reference. There is also a built-in Streamlit UI if you want to test it without building a frontend first.

If you are building something in legal AI and want to discuss architecture decisions or real-world use cases, reach out at lexstack.lexcounsel.ai/contact.

LexStack is open-source infrastructure for legal AI. It includes LexReviewer for document RAG, Law MCP for structured legal tools, and MicroEvals for CI-native evaluation.