Legal Help Synthetic Query Pack - LHSQ115

A set of 115 fully synthetic legal-help queries drafted from patterns observed in 2025 online legal help questions from members of the US public, labeled with LIST issue codes, language, and key metadata for safe sharing and evaluation.

Description

The LHSQ115 dataset contains 115 example user questions and prompts designed to reflect the kinds of messages that arrive in online legal help services.

All items are synthetic: Legal Design Lab staff wrote them to reflect real-world phrasing and scenarios, drawing on common patterns and themes, and no record contains real user text. Because the set contains no real user text or identifiers, it is intended to be safer to share for research, evaluation, and tool development.

Each query is labeled with the LIST Legal Issues Taxonomy at both the top-level parent category and the more specific issue code, supporting consistent filtering, benchmarking, and cross-jurisdiction comparisons. Records also include metadata such as the language of the query and flags indicating whether a query mentions a jurisdiction (state/city/court) or other contextual details. The dataset is suitable for use in training small routing models, building evaluation suites for triage and referral tools, and testing multilingual experiences.

The dataset is distributed via Airtable and is available on request. For access and reuse permissions, contact the Stanford Legal Design Lab.

Field-level data dictionary

User Query

Type: Long text
What it is: The synthetic question or search prompt, written to resemble what a real user would type into an online legal help service.

Sophistication of Query

Type: Single select (or lookup/single select)
What it is: Coarse complexity level of the query's language and detail.
Values: a 1–4 scale, running from short, basic queries to technically specific ones:

  • 1 = Basic (few words)
  • 2 = Medium (single sentence)
  • 3 = Lengthy scenario (multiple sentences)
  • 4 = Expert (technical terms, procedural specificity)
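The scale above can be represented as a small enumeration when loading the data. This is a minimal sketch: the names `Sophistication` and `parse_sophistication` are hypothetical, and it assumes the exported cell begins with the digit 1 to 4 (for example "3" or "3 = Lengthy scenario"), which the source does not confirm.

```python
import re
from enum import IntEnum


class Sophistication(IntEnum):
    """The four levels listed in the data dictionary."""
    BASIC = 1             # few words
    MEDIUM = 2            # single sentence
    LENGTHY_SCENARIO = 3  # multiple sentences
    EXPERT = 4            # technical terms, procedural specificity


def parse_sophistication(cell: str) -> Sophistication:
    """Read the leading digit of a cell; assumes the cell starts with 1-4."""
    match = re.match(r"\s*([1-4])", cell)
    if match is None:
        raise ValueError(f"unrecognized sophistication value: {cell!r}")
    return Sophistication(int(match.group(1)))
```

Using an `IntEnum` keeps the levels comparable as numbers, so a filter such as `level >= Sophistication.LENGTHY_SCENARIO` works directly.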

Legal issue

Type: Text (issue name drawn from the Legal Issues Taxonomy, LIST)
What it is: The specific LIST issue term(s) assigned to the query.

Issue Parents Category

Type: Text (high-level category name for the specific issues)
What it is: Parent/top-level category for the linked LIST issue term(s).
Common values: Housing, Family, Public Benefits, Crime and Prisons, etc.
Note: If multiple specific issues are selected, this field may list the same parent category more than once unless it is deduplicated.
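The duplicate-parent note can be handled with a small cleanup step at load time. The sketch below assumes the field is exported as a single comma-separated string, which is an assumption about the export format and not something the source states; `dedupe_parents` is a hypothetical helper name.

```python
def dedupe_parents(cell: str, sep: str = ",") -> list[str]:
    """Split a parent-category cell and drop repeats, keeping first-seen order.

    Assumes values are joined by `sep`; comparison ignores case and
    surrounding whitespace.
    """
    seen: dict[str, str] = {}
    for part in cell.split(sep):
        name = part.strip()
        key = name.casefold()
        if name and key not in seen:
            seen[key] = name
    return list(seen.values())
```

For a cell such as "Housing, Housing, Family", this returns the two distinct categories in their original order.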

LIST Code

Type: Text (standardized code for the specific LIST issue)
What it is: The LIST code(s) corresponding to the specific Legal issue term(s).
Example pattern: HO-05-03-00-00
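A format check can catch malformed codes before evaluation. The sketch below infers the shape only from the single example above (two uppercase letters followed by four two-digit groups); real LIST codes may not all follow it, so treat the pattern as an assumption. `LIST_CODE_RE` and `parse_list_code` are hypothetical names.

```python
import re

# Assumed shape, inferred only from the example "HO-05-03-00-00":
# a two-letter prefix followed by four two-digit groups.
LIST_CODE_RE = re.compile(r"^([A-Z]{2})-(\d{2})-(\d{2})-(\d{2})-(\d{2})$")


def parse_list_code(code: str):
    """Return (prefix, levels) for a code matching the assumed shape, else None."""
    match = LIST_CODE_RE.match(code.strip())
    if match is None:
        return None
    prefix, *levels = match.groups()
    return prefix, [int(level) for level in levels]
```

Splitting off the prefix makes it easy to group records by the leading letters, for instance when checking them against the Issue Parents Category field.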

Language

Type: Single select
What it is: Language of the query text (e.g., English, Spanish).
Use: Language parity analysis and multilingual routing tests.
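One way to run a language parity check is to compare a classifier's accuracy per language. This is a minimal sketch: it assumes each record is a dict with the "Language" and "LIST Code" fields from this dictionary plus a hypothetical "predicted" key holding the system's output.

```python
from collections import defaultdict


def accuracy_by_language(rows):
    """Return {language: share of rows where predicted == LIST Code}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        language = row["Language"]
        totals[language] += 1
        # "predicted" is a hypothetical key added by the system under test.
        if row["predicted"] == row["LIST Code"]:
            correct[language] += 1
    return {language: correct[language] / totals[language] for language in totals}
```

A large gap between languages in the returned dict suggests the system handles some languages less reliably, though with 115 records per-language counts may be small.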

Real or Synthetic?

What it is: Provenance indicator.
Use: Makes governance explicit for public sharing.

Jdx Mentioned

Type: Checkbox
What it is: Whether the query explicitly mentions a jurisdiction (state, city, county, court).
Use: Helps evaluate whether systems detect and localize jurisdiction.
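To score a jurisdiction detector against this checkbox, precision and recall are a natural starting point. The sketch below assumes the checkbox values and the detector's outputs have already been converted to parallel lists of booleans; `precision_recall` is a hypothetical helper.

```python
def precision_recall(gold: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Compare detector output to the Jdx Mentioned flags.

    Returns (precision, recall); each is 0.0 when its denominator is zero.
    """
    pairs = list(zip(gold, predicted))
    true_pos = sum(g and p for g, p in pairs)
    false_pos = sum((not g) and p for g, p in pairs)
    false_neg = sum(g and (not p) for g, p in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```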

Entity Mentioned

Type: Text name
What it is: The specific organization, company, or other entity named in the query, if any (e.g., “Wells Fargo,” “Comcast,” “SSA”).
Use: Supports redaction checks, retrieval routing, and entity-aware safety policies.

Legal Term of Art?

Type: Checkbox
What it is: Whether the query uses legal jargon/terms of art (e.g., “motion to set aside,” “unlawful detainer,” “prima facie”).
Use: Measures accessibility and tests model behavior on jargon vs colloquial phrasing.

Query ID

Type: unique ID
What it is: Record identifier for referencing items in evaluation sets.
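The fields above can be read into plain Python records for filtering. This sketch assumes a CSV export whose column headers match the field names in this dictionary, and that checked boxes export as one of a few truthy strings; both are assumptions about the export, not documented behavior. The sample rows and the "Q1"/"Q2" IDs are made up for illustration and are not records from the dataset.

```python
import csv
import io

# Assumed encodings for a checked box in the export; empty means unchecked.
TRUTHY = {"checked", "true", "1", "yes"}
CHECKBOX_FIELDS = ("Jdx Mentioned", "Legal Term of Art?")


def load_queries(handle):
    """Read rows from a CSV file object and convert checkbox fields to bool."""
    rows = []
    for row in csv.DictReader(handle):
        for field in CHECKBOX_FIELDS:
            row[field] = (row.get(field) or "").strip().lower() in TRUTHY
        rows.append(row)
    return rows


# Made-up rows for illustration only.
SAMPLE = io.StringIO(
    "Query ID,User Query,Language,Jdx Mentioned,Legal Term of Art?\n"
    "Q1,how do i stop an eviction in ohio,English,checked,\n"
    "Q2,motion to set aside default,English,,checked\n"
)
rows = load_queries(SAMPLE)
with_jurisdiction = [row["Query ID"] for row in rows if row["Jdx Mentioned"]]
```

Referencing items by Query ID, as in the last line, keeps evaluation subsets stable even if the row order changes between exports.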

Access the Dataset

https://airtable.com/appukFbwYnTMxuibS/shr6G7PdeDw0YY1cg