Auditing AI Legal Work Product

The Mata v. Avianca discipline, and what a real audit produces.

The Mata v. Avianca sanctions order is now the cautionary tale every litigator knows, but the operational lesson is not "do not use AI." It is "do not file what the AI produced without verifying every citation." This article describes the audit discipline SpotlightAICore applies, both to its own work and to AI-generated work the customer produced elsewhere.

The actual failure mode

Mata and the cluster of cases that followed it share a consistent shape. Counsel uses a general-purpose LLM to research and draft; the LLM produces text that reads as competent legal analysis; the text includes case citations that look like real citations; and some material number of those citations resolve to cases that do not exist, to paraphrases the model invented, or to passages substantially altered from the cited source. The filing goes out, opposing counsel discovers the issue, and the court sanctions counsel and orders disclosure.

The defect is not that AI was used. The defect is that no one verified the citations against the cited sources before the filing was made. The verification step is the missing discipline.

What a citation audit produces

A real audit reads every citation in the deliverable, locates the cited source in the document set, compares the deliverable's assertion to what the source actually says, and produces a pass/fail call for each citation. The aggregate is a pass rate. The SpotlightAICore threshold is 0.95: a deliverable with a pass rate below 0.95 is held for founder review and does not ship until the failed citations are corrected or removed.
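The gate described above is mechanically simple. As a minimal sketch, assuming hypothetical names (CitationCheck, ship_decision; this is illustrative, not SpotlightAICore's actual implementation):

```python
# Hypothetical sketch of the 0.95 pass-rate gate described above.
# All names here are illustrative assumptions, not production code.
from dataclasses import dataclass

PASS_THRESHOLD = 0.95  # deliverables below this are held for founder review

@dataclass
class CitationCheck:
    citation: str  # e.g. "Doc 14 at p. 3"
    passed: bool   # did the assertion match what the source says?

def pass_rate(checks: list[CitationCheck]) -> float:
    if not checks:
        return 0.0  # nothing verified means nothing to rely on
    return sum(c.passed for c in checks) / len(checks)

def ship_decision(checks: list[CitationCheck]) -> str:
    rate = pass_rate(checks)
    if rate < PASS_THRESHOLD:
        return f"HOLD for founder review (pass rate {rate:.2f})"
    return f"SHIP (pass rate {rate:.2f})"
```

Note the boundary behavior: a deliverable at exactly 0.95 ships, and one failed citation out of twenty is enough to clear the threshold only barely, which is the point.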

The failure modes the audit looks for are specific:

Hallucinated source. The cited document does not exist in the source set at all. This is the Mata failure.

Paraphrase mismatch. The source exists; the deliverable's paraphrase departs from what the source actually says in ways that change the meaning. Often the source supports a narrower or qualified proposition than the deliverable's paraphrase asserts.

Page-number error. The citation points to a page or paragraph that does not contain the quoted material. The source exists, the quote exists somewhere in it, but not at the cited locator. Easy to fix; embarrassing if filed.

Quote substantially altered. The deliverable presents something in quotation marks that does not match the source verbatim. Sometimes a minor punctuation drift; sometimes a material rewrite.

Document not in source set. The citation references a document outside the production. Could be intentional (an external authority) or could be a hallucination dressed up as external authority.

Auditing third-party AI work product

SpotlightAICore builds D33 — a Citation Verification Audit deliverable specifically for AI-generated work the customer produced elsewhere, whether from Harvey, CoCounsel, Legora, Eve, or a do-it-yourself Claude or ChatGPT pipeline. The deliverable reports verified citations, failed citations with their specific failure mode, and a plain-language verdict on whether the work product is safe to rely on in a filing.

The prompt for D33 forbids language that assigns blame to the third-party AI work. The audit reports facts and remediation, not fault. That posture is intentional: the customer commissioned the audit because they want to know whether to file the work, not to run an inquest on their existing vendor.

Why founder review still matters

The 0.95 pass-rate threshold catches mechanical failures. It does not catch reasoning failures — a deliverable whose citations all resolve correctly but whose argument depends on a misreading of the cited material. That is what founder review is for, and it is the discipline the catalog's published methodology commits to. Citation verification is necessary; it is not sufficient.

See the Citation Verification Audit Free First Matter on Cat 8.