Why persistent chat history is non-negotiable for legal AI


Contract review is not a single question. It is a conversation.

A lawyer reviewing a services agreement might start by asking about the termination rights, then ask a follow-up about the notice period, then ask how that interacts with the liability cap, then ask whether the same terms apply to the auto-renewal clause. Each of those questions builds on the last. The context established in question one matters to question four.

Most document chat tools do not preserve that context. Each question is processed independently. The system has no memory of what came before. By question three, the user is re-establishing context that was already set, and by question four, the system is giving answers that contradict what it said in question one because it has no idea what it said in question one.

A legal review session is a reasoning process. Every question informs the next. A system without memory forces the user to carry all the context themselves — which defeats the purpose of having an AI assistant.

What stateless document chat actually costs

The cost of stateless chat is not just inconvenience. In legal workflows it creates specific failure modes that erode trust.

Contradictory answers across a session

Without history, the same question asked twice in slightly different ways can produce different answers, depending on which passages happen to be retrieved each time. For a legal professional, getting two different answers to the same question from the same system is not a minor inconsistency. It is a reason to stop trusting the system entirely.

Forced context repetition

If the system has no memory, the user has to re-establish the relevant context in every question: 'Given that the termination clause requires 30 days' notice, does the auto-renewal clause override that?' The natural phrasing, 'Does the auto-renewal clause override the notice period we established earlier?', cannot be answered correctly by a stateless system, because it has no record of what was established earlier.

No support for multi-step analysis

Many legal analysis tasks are inherently multi-step. Reviewing an indemnification clause properly requires understanding the definitions, the scope of obligations, the carve-outs, and how they interact. A user cannot work through this systematically in a stateless system because each step forgets the previous one.

How LexReviewer handles chat history

LexReviewer persists chat history in MongoDB using MongoDBChatMessageHistory. Each session is scoped by a combination of user_id and document_id, which means history is isolated by both user and document simultaneously.

session_id = f"{user_id}_{document_id}"

This scoping matters for two reasons. First, it ensures that different users reviewing the same document have independent conversation histories — one user's session does not bleed into another's. Second, it ensures that a user reviewing multiple documents has separate histories for each one, so context from one contract does not contaminate answers about another.
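To make the scoping concrete, here is a minimal sketch. The in-memory dictionary is a hypothetical stand-in for the MongoDB-backed store, and the helper names (make_session_id, append_message) and sample IDs are illustrative assumptions, not LexReviewer's actual code.

```python
def make_session_id(user_id: str, document_id: str) -> str:
    """Build the history key for one user's review of one document."""
    return f"{user_id}_{document_id}"


# Hypothetical in-memory stand-in for the persistent store:
# one message list per session_id.
histories: dict[str, list[str]] = {}


def append_message(user_id: str, document_id: str, text: str) -> None:
    """Add a message to the history for this (user, document) pair only."""
    session_id = make_session_id(user_id, document_id)
    histories.setdefault(session_id, []).append(text)


# Two users on the same document, and one user on two documents.
append_message("alice", "msa-2024", "What are the termination rights?")
append_message("bob", "msa-2024", "Is there a liability cap?")
append_message("alice", "nda-7", "Who are the parties?")

# Alice's history for the MSA contains only her own question.
alice_msa = histories[make_session_id("alice", "msa-2024")]
```

One thing to watch with a plain underscore separator: if IDs can themselves contain underscores, two different pairs can produce the same string (user a_b with document c, and user a with document b_c). IDs in a format that rules this out, such as UUIDs, avoid the problem.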

What is stored and how it is used

Each turn in the conversation is stored as a message pair: the user's question and the agent's answer, along with the references that were surfaced. When a new question arrives, the agent prompt generator node reads the full history for the current session and injects it into the prompt.

The LLM then has access to everything that was established in the session. Follow-up questions that reference earlier answers work because the earlier answers are actually in the prompt. A question like 'What did we say about the liability cap earlier?' becomes answerable.
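The store-then-inject flow described above might look roughly like this. The Turn dataclass, the build_prompt function, the prompt layout, and the sample clause reference are all assumptions made for illustration; the real prompt generator node may structure things differently.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One stored exchange: question, answer, and the references surfaced."""
    question: str
    answer: str
    references: list[str] = field(default_factory=list)


def build_prompt(history: list[Turn], new_question: str) -> str:
    """Render prior turns ahead of the new question so follow-ups can resolve."""
    lines = ["Conversation so far:"]
    for turn in history:
        lines.append(f"User: {turn.question}")
        lines.append(f"Assistant: {turn.answer}")
        if turn.references:
            lines.append("References: " + "; ".join(turn.references))
    lines.append(f"User: {new_question}")
    return "\n".join(lines)


# Made-up sample data for illustration only.
history = [
    Turn(
        question="What is the notice period for termination?",
        answer="Termination requires 30 days' written notice.",
        references=["Clause 12.2"],
    )
]
prompt = build_prompt(history, "Does the auto-renewal clause override that?")
```

Because the earlier answer is part of the prompt, the word "that" in the follow-up has something to refer to.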

Handling long conversations: history summarisation

Injecting the full conversation history into every prompt works well for short sessions. In a long review session, such as a lawyer working through a 300-page contract over two hours, the history eventually grows larger than the model's context window.

LexReviewer handles this with a history summariser that condenses older turns into a compact summary while keeping recent turns in full. The summary preserves the essential conclusions and context established earlier in the session without consuming the full context window.

# Older turns get summarised
summary = summarise_history(turns[:n])
# Recent turns stay in full
recent = turns[n:]
# Both injected into the prompt
context = [summary] + recent

The threshold at which summarisation kicks in is configurable. For most legal review sessions, keeping the last 10 to 15 turns in full and summarising everything before that strikes the right balance between context fidelity and context window efficiency.
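A runnable sketch of that threshold logic follows. The keep_recent parameter name and the build_context function are assumptions, and summarise_history here is a placeholder that only reports how many turns it was given; a real summariser would call an LLM.

```python
def summarise_history(turns: list[str]) -> str:
    """Placeholder: a real summariser would call an LLM here."""
    return f"[Summary of {len(turns)} earlier turns]"


def build_context(turns: list[str], keep_recent: int = 10) -> list[str]:
    """Keep the last `keep_recent` turns verbatim; summarise anything older."""
    if len(turns) <= keep_recent:
        return list(turns)  # short session: nothing to condense
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    return [summarise_history(older)] + recent


# A 25-turn session with the threshold set to 10 recent turns.
turns = [f"turn {i}" for i in range(1, 26)]
context = build_context(turns, keep_recent=10)
```

Here the first 15 turns collapse into a single summary entry and turns 16 to 25 are passed through unchanged, so the prompt carries 11 items instead of 25.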

History management endpoints

LexReviewer exposes a full set of history management endpoints so applications built on top of it have complete control over session state:

  • GET /get-history returns the full chat history for a session

  • POST /save-message-in-history persists or updates a specific message

  • POST /revert-history truncates history to a specific index — useful for branching reviews

  • DELETE /clear-history clears all history for a session

The revert endpoint is particularly useful for legal workflows. A reviewer might explore one line of questioning, decide to approach the analysis differently, and want to revert to an earlier point in the session without starting over entirely. This kind of non-linear review workflow is common in practice and impossible without a history management layer that supports it.
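The four operations can be modelled in memory to show how a branching review works. This is a sketch under stated assumptions: the SessionHistory class is hypothetical, the request and response shapes of the real endpoints are not shown here, and treating the revert index as "keep everything before this position" is a guess at the semantics.

```python
from typing import Optional


class SessionHistory:
    """In-memory model of the four history operations for one session."""

    def __init__(self) -> None:
        self._messages: list[str] = []

    def get(self) -> list[str]:
        """Return a copy of the full history (cf. GET /get-history)."""
        return list(self._messages)

    def save(self, message: str, index: Optional[int] = None) -> None:
        """Append a message, or overwrite the one at `index` if given."""
        if index is None:
            self._messages.append(message)
        else:
            self._messages[index] = message

    def revert(self, index: int) -> None:
        """Keep messages before `index` and drop the rest (assumed semantics)."""
        if not 0 <= index <= len(self._messages):
            raise IndexError(f"index {index} out of range")
        del self._messages[index:]

    def clear(self) -> None:
        """Remove every message in the session."""
        self._messages.clear()


history = SessionHistory()
for message in ["Q1: termination rights?", "A1", "Q2: notice period?", "A2"]:
    history.save(message)

history.revert(2)                    # abandon the notice-period branch
history.save("Q2b: liability cap?")  # continue from the earlier point
```

After the revert, the session still holds the first exchange, so the new line of questioning builds on it without replaying the session from the start.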

Session isolation and privacy

Because sessions are scoped to user_id and document_id, one session's history is never loaded into another's, so there is no cross-user or cross-document contamination at the history layer. Each user's review of each document is a completely isolated session.

For teams running LexReviewer as a shared service — multiple users reviewing documents from the same corpus — this isolation is not optional. Legal document review involves confidential information. A system where one user's queries could influence another user's answers is not deployable in a legal context.

Why this is infrastructure, not a feature

Persistent, isolated, summarisation-aware chat history is not a feature you add to a legal AI system. It is part of the foundation. Without it, multi-step analysis does not work, follow-up questions are broken, long sessions degrade, and the system cannot be trusted for anything beyond simple one-shot lookups.

LexReviewer includes this as part of the core backend precisely because it is foundational. The history layer, the session scoping, the summarisation logic, and the management endpoints are all built in. Teams building on top do not need to implement any of this themselves.

Repo: LexReviewer

LexStack is open-source infrastructure for legal AI. It includes LexReviewer for document RAG, Law MCP for structured legal tools, and MicroEvals for CI-native evaluation.