Building a legal AI agent from scratch with LexStack

Most legal AI demos look impressive. Most legal AI systems in production look very different from the demo. The gap is almost always in the infrastructure layer — the retrieval, the citation handling, the evaluation — not in the LLM itself.
This guide walks through building a legal AI agent using LexStack from the ground up. By the end you will have a working agent that ingests legal PDFs, retrieves relevant content with hybrid search, routes queries through a LangGraph agent, and streams answers with citations, plus a CI evaluation pipeline that catches regressions before they reach production.
What you are building
A FastAPI backend that accepts legal PDF uploads, indexes them for retrieval, and exposes a streaming chat endpoint. The agent handles multi-turn conversations, can follow references to linked documents, and returns structured citations alongside every answer.
Stack: Python, FastAPI, LangGraph, LangChain, Qdrant, MongoDB, Unstructured.io, OpenAI.
Step 1: Environment setup
Clone the LexReviewer repo and set up your environment:
git clone https://github.com/LexStack-AI/LexReviewer
cd LexReviewer
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
Copy the environment template and fill in your keys:
cp .env.example .env
You need four services running: MongoDB (chat history and document store), Qdrant (vector embeddings), OpenAI (LLM and embeddings), and Unstructured.io (PDF chunking). The .env.example file documents every variable. At minimum you need:
OPENAI_API_KEY: for LLM and text-embedding-3-large
UNSTRUCTURED_API_KEY: for PDF parsing and chunk layout
MONGODB_URL: default mongodb://localhost:27017
QDRANT_URL: your Qdrant instance endpoint
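Assembled into a .env, a minimal configuration looks something like this. The key values below are placeholders, and the Qdrant port is just the default; .env.example in the repo documents the full set of variables:

```shell
# Minimal .env sketch; values are placeholders.
OPENAI_API_KEY=sk-your-key-here
UNSTRUCTURED_API_KEY=your-unstructured-key
MONGODB_URL=mongodb://localhost:27017
QDRANT_URL=http://localhost:6333
```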
Step 2: Starting the service
Once your environment is configured, start the API:
python app.py
This starts a Uvicorn server with hot reload at http://0.0.0.0:8000. You can also run the built-in Streamlit UI in a second terminal to interact with the agent visually while you build:
streamlit run ui/ui_app.py
Step 3: Ingesting a legal document
The ingestion pipeline is what makes the agent useful. Send a POST request to /upload-documents with the PDF file and a document ID:
curl -X POST http://localhost:8000/upload-documents \
  -H "document-id: contract-001" \
  -F "file=@/path/to/contract.pdf"
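The same upload can be scripted from Python. This sketch uses only the standard library; the endpoint and document-id header come from the API above, while build_multipart and its field names are illustrative assumptions:

```python
# Sketch: upload a PDF to /upload-documents with a document-id header,
# mirroring the curl call. build_multipart() encodes a single "file"
# form field; it is an illustrative helper, not part of LexReviewer.
import urllib.request
import uuid

def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: application/pdf\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def upload_document(path: str, document_id: str,
                    base_url: str = "http://localhost:8000") -> bytes:
    with open(path, "rb") as f:
        body, content_type = build_multipart("file", path.split("/")[-1], f.read())
    req = urllib.request.Request(
        f"{base_url}/upload-documents",
        data=body,
        headers={"document-id": document_id, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```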
Under the hood this does four things in sequence:
1. Unstructured.io chunks the PDF, preserving layout metadata, bounding boxes, and page positions
2. Chunks are optionally summarised, creating compact index entries that improve retrieval precision
3. OpenAI generates embeddings using text-embedding-3-large for high-quality semantic vectors
4. Qdrant indexes the embeddings, tagged with document_id for filtered retrieval
The full chunk text and all metadata (bounding boxes, page numbers, section headers) are stored in MongoDB. Qdrant holds the vectors. MongoDB holds the source of truth. This separation matters — it means you can retrieve by vector similarity and still surface exact passage locations for UI highlighting.
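The storage split can be pictured as two records keyed by the same chunk ID. The field names below are illustrative, not LexReviewer's actual schema:

```python
# Illustrative sketch of the storage split: Qdrant holds the vector plus
# a minimal payload for filtering; MongoDB holds the full text and layout
# metadata. Field names are assumptions, not the real schema.
chunk_id = "contract-001-chunk-0042"

qdrant_point = {
    "id": chunk_id,
    "vector": [0.012, -0.087, 0.153],  # truncated; text-embedding-3-large vectors are 3072-dim
    "payload": {"document_id": "contract-001"},  # enables filtered retrieval
}

mongo_doc = {
    "_id": chunk_id,
    "document_id": "contract-001",
    "text": "Either party may terminate this Agreement upon thirty (30) days written notice...",
    "page": 7,
    "bbox": [72.0, 140.5, 540.0, 212.3],  # bounding box for UI highlighting
    "section": "12. Termination",
}

# Retrieval joins the two: vector search in Qdrant returns chunk IDs,
# then MongoDB supplies the exact passage and its location on the page.
```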
Step 4: Understanding the retrieval layer
LexReviewer uses hybrid retrieval: Qdrant vector search combined with BM25 keyword search. Both run on every query and results are merged.
Vector search understands meaning. BM25 finds exact phrases. Legal documents need both — semantic questions and exact clause references are equally common in real workflows.
The retrieval layer always filters by document_id so results are scoped to the document being queried. For linked document retrieval, the agent calls a separate tool that fetches and queries referenced documents via the LINKED_DOCUMENT_FETCH_URL endpoint.
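Merging the two ranked result lists is the interesting part. Reciprocal rank fusion is one common way to do it; the sketch below is illustrative and not necessarily the merge strategy LexReviewer itself uses:

```python
# Sketch of reciprocal rank fusion (RRF): chunks appearing near the top
# of either list score well, and chunks appearing in both lists get a
# combined score. One common hybrid-merge strategy, shown for illustration.
def rrf_merge(vector_hits: list[str], bm25_hits: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for hits in (vector_hits, bm25_hits):
        for rank, chunk_id in enumerate(hits):
            # Each list contributes 1 / (k + rank); k damps the effect of rank 0.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(["c3", "c1", "c7"], ["c1", "c9", "c3"])
# "c1" and "c3" appear in both lists, so they outrank the single-list hits
```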
Step 5: The LangGraph agent
The agent is the core of the system. Rather than routing every query through the same pipeline, a LangGraph graph manages conversation state and selects which tools to call per question.
The graph has three key nodes:
Required tools generator: analyses the question and decides which tools to invoke, whether in-document retrieval, linked document retrieval, or both
Agent prompt generator: composes a prompt that injects retrieved context and follows the legal-answer template
Agent node: runs the OpenAI LLM, optionally in reasoning mode, and streams partial answers with structured references
Conversation state is managed as AgentState across turns. Chat history is persisted in MongoDB using session IDs scoped to user_id and document_id, so follow-up questions stay grounded in prior context.
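The real AgentState is defined in the LexReviewer repo; this is a minimal sketch of the kind of state such a graph carries between nodes and across turns, with illustrative field names:

```python
# Minimal sketch of per-turn agent state for a LangGraph-style graph.
# Field names are illustrative, not LexReviewer's actual AgentState.
from typing import TypedDict

class AgentState(TypedDict):
    session_id: str            # scopes chat history in MongoDB
    user_id: str
    document_id: str
    question: str
    chat_history: list[dict]   # prior turns, loaded from MongoDB
    required_tools: list[str]  # output of the required-tools-generator node
    retrieved_chunks: list[dict]
    answer: str

# A turn starts with the question and history filled in; the graph's
# nodes populate required_tools, retrieved_chunks, and finally answer.
state: AgentState = {
    "session_id": "user-123:contract-001",
    "user_id": "user-123",
    "document_id": "contract-001",
    "question": "What are the termination rights for the service provider?",
    "chat_history": [],
    "required_tools": ["in_document_retrieval"],
    "retrieved_chunks": [],
    "answer": "",
}
```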
Step 6: Querying the agent
Send questions to the /ask endpoint with the document ID and user context:
curl -X POST http://localhost:8000/ask \
  -H "document-id: contract-001" \
  -H "user-id: user-123" \
  -H "username: alex" \
  -d '{"question": "What are the termination rights for the service provider?"}'
The response streams as NDJSON. Each chunk contains one of three types:
chunk: a fragment of the answer text as it generates
thought: the agent's reasoning commentary (optional, depends on model mode)
reference_positions: structured citation data including page, bounding box, and passage text
The reference_positions field is what makes citations real rather than cosmetic. It contains everything a frontend needs to highlight the exact passage in the source PDF as the answer streams.
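A consumer of the stream just dispatches on which key each NDJSON line carries. The chunk type names below come from the API above; handle_line is an illustrative helper kept pure so it can be exercised without a running server:

```python
# Sketch of consuming the /ask NDJSON stream: parse each line and
# dispatch on its type. The keys "chunk", "thought", and
# "reference_positions" are the three event types the endpoint emits.
import json

def handle_line(line: str) -> tuple[str, object]:
    """Parse one NDJSON line into (event_type, payload)."""
    event = json.loads(line)
    if "chunk" in event:
        return "chunk", event["chunk"]                      # answer text fragment
    if "thought" in event:
        return "thought", event["thought"]                  # reasoning commentary
    if "reference_positions" in event:
        return "references", event["reference_positions"]   # citation data
    return "unknown", event

sample = '{"chunk": "The provider may terminate "}'
kind, payload = handle_line(sample)
```

Wired to the HTTP response, this becomes a loop over the streamed lines (for example `for line in response.iter_lines()` with the requests library), appending chunk fragments to the answer as they arrive.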
Step 7: Adding MicroEvals to CI
A working agent is not enough. You need to know it keeps working after every change. MicroEvals gives you that.
Create an agent wrapper that exposes the run_agent interface:
def run_agent(document: str, question: str) -> str:
    # call your /ask endpoint or invoke the agent directly
    response = call_lexreviewer(document, question)
    return response['answer']
Then run the evaluator against a dataset:
python cli.py my_agent.py nda_basic
Add this to your GitHub Actions workflow:
- name: Run MicroEvals
  run: python cli.py my_agent.py nda_basic
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
Set score thresholds in your evaluator config. If citation accuracy drops below 0.85 or hallucination rate exceeds 0.05, the build fails. You know before it reaches production.
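The gate itself is simple logic. This sketch shows the kind of check such a config drives; the numbers mirror the thresholds above, but the metric names and config shape are illustrative, not MicroEvals' actual format:

```python
# Sketch of a CI score gate: fail the build if any metric crosses its
# threshold. Metric names and the config shape are assumptions.
THRESHOLDS = {
    "citation_accuracy": ("min", 0.85),   # fail if below
    "hallucination_rate": ("max", 0.05),  # fail if above
}

def gate(scores: dict[str, float]) -> list[str]:
    """Return a list of threshold violations; an empty list means the build passes."""
    failures = []
    for metric, (kind, limit) in THRESHOLDS.items():
        value = scores[metric]
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            failures.append(f"{metric}={value} violates {kind} threshold {limit}")
    return failures

failures = gate({"citation_accuracy": 0.91, "hallucination_rate": 0.08})
# hallucination_rate exceeds 0.05, so the build would fail
```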
Step 8: Observability
LexReviewer has Langfuse integration built in. Set LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY in your .env and every retrieval, reasoning, and answer step gets traced automatically. For production error tracking, add your SENTRY_DSN.
This is not optional for a legal AI system in production. When an answer is wrong, you need to be able to trace exactly what was retrieved, what prompt was composed, and where the reasoning diverged. Without tracing, debugging is guesswork.
What you have at the end
A production-ready legal AI agent that ingests any legal PDF, retrieves relevant content with hybrid search, routes queries intelligently, streams answers with verifiable citations, and has automated regression testing in CI.
Every component is open source. Every piece is extensible. The architecture is designed so you can replace the LLM, swap the vector store, or add new legal tool integrations without rebuilding the rest.
Repo: github.com/LexStack-AI/LexReviewer
LexStack is open-source infrastructure for legal AI. It includes LexReviewer for document RAG, Law MCP for structured legal tools, and MicroEvals for CI-native evaluation.