Learned Hands: Labeled Dataset of Legal Issues in Reddit Problem Stories

A crowdsourced, expert-reviewed dataset of legal issue labels on real-world problem narratives, created to support machine learning and research on legal needs and access to justice.
Description
The Learned Hands Dataset is a publicly available, expert-reviewed dataset of over 3,400 anonymized, real-world legal problem descriptions, each labeled across 20 legal issue categories. Each problem story comes from the r/legaladvice subreddit and is labeled with standardized legal issue codes from the LIST taxonomy.
Developed by the Stanford Legal Design Lab and the Suffolk LIT Lab with support from the Pew Charitable Trusts, the dataset serves as a foundational resource for building AI models that can detect legal issues in free-text narratives—whether from Reddit, intake forms, chat transcripts, or email.
The dataset was created using Learned Hands, an online game that invites legal professionals and students to spot legal issues in anonymous stories submitted online. Players label each text with one or more standardized legal issue codes, contributing to a growing training set of human-labeled data. To date, over 35,550 individual label “votes” have been submitted, generating confidence-scored annotations that are included in the dataset in both binary and continuous forms. These confidence scores help quantify how likely it is that a majority of reviewers would agree that a specific legal issue is present.
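This description does not say exactly how the confidence scores are computed. The following minimal sketch shows one way such a score could be derived from yes/no votes on a single issue. It assumes a Bayesian Beta-Binomial model with a uniform prior, which is a swapped-in technique and not necessarily the method Learned Hands uses. The functions `majority_confidence` and `confident_label` are hypothetical names, and the 0.95 cutoff is an illustrative stand-in for a "95% confidence" rule.

```python
from math import comb


def majority_confidence(yes: int, no: int) -> float:
    """Posterior probability that more than half of reviewers would answer "yes".

    Assumption: a uniform Beta(1, 1) prior, so the posterior share of "yes"
    answers is Beta(yes + 1, no + 1). For integer parameters,
    P(share > 1/2) equals P(Y <= yes) with Y ~ Binomial(yes + no + 1, 1/2),
    so no special functions are needed.
    """
    if yes < 0 or no < 0:
        raise ValueError("vote counts must be non-negative")
    n = yes + no + 1
    return sum(comb(n, k) for k in range(yes + 1)) / 2 ** n


def confident_label(yes: int, no: int, level: float = 0.95):
    """Return 1 (issue present), 0 (issue absent), or None (not yet confident).

    The default cutoff is a hypothetical stand-in for a 95% confidence rule.
    """
    p = majority_confidence(yes, no)
    if p >= level:
        return 1
    if p <= 1 - level:
        return 0
    return None
```

Under this model, `majority_confidence(3, 0)` returns 0.9375, so three unanimous votes would not yet clear a 95% bar, while four unanimous votes (0.96875) would. The binomial identity keeps the sketch dependency-free. A library Beta CDF would give the same numbers.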
The current release includes several CSV files:
- Best Guess (3,459 texts / 35,550 labels): Each label represents the estimated agreement level among reviewers.
- 95% Confidence (1,506 texts / 16,014 labels): Labels included only when statistical confidence is high that a majority view exists.
Both are provided in binary and continuous formats, offering flexibility for researchers working on different classification approaches.
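The exact column layout of the CSV files is not described here. The following minimal sketch therefore assumes a hypothetical layout: one row per text, an `id` column, a `text` column, and one column per LIST code that holds a continuous score between 0 and 1. It reads a continuous-format file with Python's standard `csv` module and derives binary labels using an adjustable cutoff. The names `load_continuous`, `binarize`, and `NON_LABEL_COLUMNS` are illustrative.

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical metadata columns; adjust to the actual CSV headers.
NON_LABEL_COLUMNS = {"id", "text"}


def load_continuous(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse continuous-format CSV lines into {"id": ..., "scores": {code: float}} rows.

    `lines` can be an open file (opened with newline="") or any iterable of strings.
    """
    rows = []
    for raw in csv.DictReader(lines):
        scores = {
            code: float(value)
            for code, value in raw.items()
            # skip metadata columns, overflow fields (key None), and blank cells
            if code is not None and code not in NON_LABEL_COLUMNS and value not in ("", None)
        }
        rows.append({"id": raw.get("id"), "scores": scores})
    return rows


def binarize(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, int]:
    """Map each continuous score to 1 if it reaches the cutoff, else 0."""
    return {code: int(score >= threshold) for code, score in scores.items()}
```

For example, `binarize({"code_a": 0.8, "code_b": 0.3})` returns `{"code_a": 1, "code_b": 0}`. The code names here are placeholders, not real LIST identifiers. Keeping the continuous scores unthresholded is another option for models that accept soft targets.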
This dataset is more than a technical asset: it also reflects a methodological innovation in legal AI development. Each story has been assessed to determine which LIST taxonomy codes are present and which are absent.
The dataset supports broader research questions in access to justice: What kinds of legal needs are people expressing? How are these needs articulated outside of formal legal contexts? What types of issues tend to cluster together? Beyond AI training, the Learned Hands project also supports topic modeling and taxonomy development, exploring latent structures in legal need expressions and supporting the design of legal health diagnostics, content taxonomies, and triage tools.
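A simple first step toward the question of which issues cluster together is to count how often each pair of codes is labeled present in the same story. The following sketch assumes the binary labels have already been parsed into one set of present codes per story, which is a hypothetical intermediate format. `cooccurrence` is an illustrative name. Raw pair counts like these are only a starting point before normalized association measures or topic modeling.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def cooccurrence(stories: Iterable[Set[str]]) -> Counter:
    """Count how often each unordered pair of issue codes appears in the same story."""
    counts: Counter = Counter()
    for codes in stories:
        # sort first so ("a", "b") and ("b", "a") are counted as the same pair
        for pair in combinations(sorted(codes), 2):
            counts[pair] += 1
    return counts
```

With placeholder codes, `cooccurrence([{"housing", "money"}, {"money", "housing", "family"}, {"family"}])` counts `("housing", "money")` twice and `("family", "housing")` and `("family", "money")` once each. Stories with a single code contribute no pairs.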
- Learned Hands Game: https://learnedhands.law.stanford.edu