# Lease Data Extractor with AI

A research prototype that uses multi-modal AI to extract structured data from images of tenancy agreements (typed, neatly handwritten, and sloppily handwritten).

## Project Description

**Multi-Modal LLMs for Justice: Extracting Legal Data from Paper Forms**

This project, led by Hannes Westermann and Jaromir Savelka, investigates how multi-modal large language models (LLMs) can help laypeople and self-represented litigants overcome the challenges of dealing with legal documents—especially when relevant information exists only in paper form. The study explores whether state-of-the-art models like GPT-4o can accurately extract structured data from images of handwritten or printed forms, even when the data is messy, incomplete, or captured in low-light conditions.

## Problem Addressed

People navigating the legal system must often interpret and extract information from official documents like leases, benefit forms, and letters—many of which exist only in paper format. This is a major barrier to accessing justice. Tools like LLMs can help answer questions or fill out forms, but they usually rely on the user to type in the necessary information correctly. This creates friction and limits usability.

## Approach

To test the potential of LLMs to solve this challenge, the team created a benchmark dataset consisting of filled-out versions of Ontario's Residential Tenancy Agreement (Standard Form of Lease). The forms were completed under three information scenarios of varying complexity and missing data. For each scenario, the form was rendered in five formats (see the sketch after this list):
- Typed PDF (screenshot)
- Neatly handwritten, high-quality image
- Sloppily handwritten, high-quality image
- Neatly handwritten, low-quality image
- Sloppily handwritten, low-quality image
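
The benchmark thus spans a 3 × 5 matrix of scenarios and formats. As a rough sketch of how that matrix might be enumerated in Python (the scenario and format identifiers and the file layout here are illustrative assumptions, not the paper's):

```python
# Hypothetical enumeration of the 3 x 5 benchmark matrix.
# Scenario/format names and file layout are illustrative, not the paper's.
from itertools import product
from pathlib import Path

SCENARIOS = ["scenario_1", "scenario_2", "scenario_3"]  # varying complexity and missing data
FORMATS = [
    "typed_pdf",
    "neat_high_quality",
    "sloppy_high_quality",
    "neat_low_quality",
    "sloppy_low_quality",
]

def benchmark_images(root: Path) -> list[Path]:
    """Return the expected image path for every scenario/format pair (15 in total)."""
    return [root / scenario / f"{fmt}.jpg" for scenario, fmt in product(SCENARIOS, FORMATS)]
```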
The GPT-4o model was given instructions and base64-encoded images to extract 14 key data fields, including landlord and tenant names, addresses, and unit details.
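
A minimal sketch of such a call, assuming the official `openai` Python client; the exact prompt and the field list below are illustrative assumptions (a subset of the 14 fields), not the paper's harness:

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative subset of the 14 fields; the paper's exact field names may differ.
FIELDS = ["landlord_name", "tenant_name", "unit_number", "street_number",
          "street_name", "city", "province", "postal_code"]

def extract_lease_fields(image_path: str) -> str:
    """Send one lease image to GPT-4o and ask for the fields back as JSON."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # constrain output to valid JSON
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Extract the following fields from this lease form and "
                          f"return them as a JSON object: {', '.join(FIELDS)}. "
                          "Use null for any field you cannot read.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```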

## Key Findings

- Overall Accuracy: The model correctly extracted 73% of all fields across scenarios.
- Typed forms performed best: With nearly 98% accuracy, digital forms yielded near-perfect extraction.
- Messy handwriting and poor image quality reduced performance, but the model could still locate fields and partial content in most cases.
- Certain fields like city, province, and street name were more robust, while street number and uncommon names posed greater difficulty.
- The model favored common names (e.g., transcribing the uncommon "Jame" as "Jane"), revealing potential bias embedded in token distributions.
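
For context, a figure like the 73% overall accuracy can be computed with a simple field-level exact-match score. The dict-based gold/prediction format and the normalization below are assumptions for illustration, not the paper's evaluation code:

```python
# Minimal field-level accuracy sketch: fraction of fields whose predicted
# value exactly matches the gold value after light normalization.
def field_accuracy(gold: dict[str, str | None], predicted: dict[str, str | None]) -> float:
    matches = sum(
        1 for field, value in gold.items()
        if _normalize(predicted.get(field)) == _normalize(value)
    )
    return matches / len(gold)

def _normalize(value: str | None) -> str | None:
    return value.strip().lower() if isinstance(value, str) else value

gold = {"city": "Toronto", "tenant_name": "Jane Doe", "street_number": "124"}
pred = {"city": "toronto", "tenant_name": "Jane Doe", "street_number": "12h"}
print(field_accuracy(gold, pred))  # 2 of 3 fields match -> ~0.67
```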

## Implications

This study is a promising step toward developing user-facing AI systems that can simply ask someone to “take a picture” of a lease or document, then help extract and reuse key information to:
- Populate forms
- Explain rights
- Generate legal drafts
It highlights multi-modal LLMs as a tool for reducing cognitive and technical barriers in justice processes, particularly for those without legal training or strong digital literacy.

## Future Work

- Larger-scale evaluation with a broader range of forms
- Refining prompts and model selection to boost accuracy
- Integrating this capability into complete legal help systems
This work adds to a growing body of research demonstrating how AI and LLMs can increase access to justice—not just by interpreting law, but by helping people handle the messy paperwork required to use it.
Read the full paper by Hannes Westermann and Jaromir Savelka: https://arxiv.org/html/2412.15260v1