# About the Plain Language Assistant
The [Plain Language Assistant](https://plainlanguage.joename.com) is a RAG-powered writing tool that helps transform dense regulatory and legal text into clear, accessible language without losing legally required meaning. It's built on the U.S. Department of Labor's plain language guidelines and resources from plainlanguage.gov.
The system is powered by a four-stage Retrieval-Augmented Generation (RAG) pipeline backed by a curated knowledge base of plain language resources, regulatory source material, and real-world simplification examples.
---
## The Knowledge Base
The application draws on a relational database of over 18,000 curated records spanning four collections:
- **Terminology** — A glossary of government and legal jargon mapped to plain language alternatives, sourced from DOL, CMS, and plainlanguage.gov resources.
- **Patterns** — Decomposed plain language writing techniques covering word choice, sentence structure, document organization, and audience focus, with before-and-after examples.
- **Source Material** — Regulatory text from unemployment insurance handbooks and federal notice templates, structured by chapter, section, and regulatory topic.
- **Exemplars** — Thousands of quality-scored complex-to-simple text pairs that demonstrate effective simplification across a range of complexity levels.
Every record was processed through a multi-stage preparation pipeline that cleaned, classified, and enriched the raw source data before embedding it into a vector database for semantic retrieval.
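As a rough illustration of that preparation pipeline, the sketch below shows what cleaning and classifying one terminology record might look like. The field names, the `TermRecord` shape, and the keyword heuristic are all assumptions for illustration, not the application's actual schema or logic.

```python
from dataclasses import dataclass

# Hypothetical shape of one terminology record; field names are
# illustrative, not the application's actual schema.
@dataclass
class TermRecord:
    term: str
    plain_alternative: str
    definition: str
    domain: str   # e.g. "appeals"
    source: str   # e.g. "plainlanguage.gov"

def clean(raw: dict) -> dict:
    """Cleaning step: normalize whitespace in every field of the raw record."""
    return {k: " ".join(str(v).split()) for k, v in raw.items()}

def classify(record: dict) -> dict:
    """Classification step: attach a coarse domain label (placeholder heuristic)."""
    text = record.get("definition", "").lower()
    record["domain"] = "appeals" if "appeal" in text else "general"
    return record

raw = {"term": "adjudication ", "plain_alternative": "decision",
       "definition": "The formal appeal  process.", "source": "DOL"}
record = classify(clean(raw))
print(record["domain"])  # "appeals"
```

A real pipeline would add an enrichment step and then embed each cleaned record for semantic retrieval; this sketch only shows the clean-and-classify shape.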
---
## How It Works: Four-Stage RAG
When you submit text, the system runs a four-stage pipeline:
1. **Embed** — Your input text is converted into a semantic vector using an embedding model.
2. **Retrieve** — The vector is used to search across all four knowledge collections in parallel, pulling the most relevant terminology, techniques, regulatory context, and simplification examples.
3. **Assemble** — Retrieved results are combined into a structured context package tailored to the task.
4. **Generate** — An LLM produces the final output using the assembled context, grounded in real plain language guidelines and examples rather than generic training data alone.
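The four stages above can be sketched as a chain of functions. Everything here is a stand-in: the embedding, search, and generation calls are stubs, since the document does not name the actual vector database or LLM API.

```python
# Minimal sketch of the four-stage RAG pipeline; all stages are stubs.

def embed(text: str) -> list[float]:
    # Stage 1: stand-in embedding; a real system calls an embedding model.
    return [float(ord(c)) for c in text[:8]]

def retrieve(vector: list[float], collections: list[str]) -> dict[str, list[str]]:
    # Stage 2: search each collection in parallel; stubbed as canned hits.
    return {name: [f"top match from {name}"] for name in collections}

def assemble(results: dict[str, list[str]]) -> str:
    # Stage 3: flatten retrieved results into one structured context string.
    return "\n".join(hit for hits in results.values() for hit in hits)

def generate(context: str, user_text: str) -> str:
    # Stage 4: a real system sends context plus input to an LLM.
    snippets = len(context.splitlines())
    return f"[rewrite of {user_text!r} grounded in {snippets} retrieved snippets]"

COLLECTIONS = ["terminology", "patterns", "source_material", "exemplars"]

def simplify(text: str) -> str:
    return generate(assemble(retrieve(embed(text), COLLECTIONS)), text)

print(simplify("herein"))
```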
This approach means every rewrite and recommendation is informed by actual DOL regulatory text, established plain language techniques, and proven simplification patterns — not just the LLM's general knowledge.
---
## Features
### Simplify
Paste complex government or legal text and receive a plain language rewrite. The system identifies jargon, retrieves relevant writing techniques, finds similar simplification examples for few-shot guidance, and generates a rewrite that preserves all legally required meaning. You also get a list of key changes and the reasoning behind each one.
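One way to picture the few-shot guidance step is a prompt built from retrieved complex-to-simple pairs. The prompt wording and the example exemplar below are hypothetical; the real system assembles its context from the Exemplars collection.

```python
# Sketch: fold retrieved (complex, plain) exemplar pairs into a few-shot prompt.

def build_prompt(exemplars: list[tuple[str, str]], user_text: str) -> str:
    shots = "\n\n".join(
        f"Complex: {complex_}\nPlain: {plain}" for complex_, plain in exemplars
    )
    return (
        "Rewrite the text in plain language, preserving legally required meaning.\n\n"
        f"{shots}\n\nComplex: {user_text}\nPlain:"
    )

prompt = build_prompt(
    [("Remuneration shall be disbursed forthwith.", "We will pay you right away.")],
    "The claimant must furnish documentation.",
)
print(prompt)
```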
### Analyze
Submit a paragraph or document for readability analysis. The system scores your text using the Flesch-Kincaid readability formula, breaks it into chunks with per-chunk complexity scores, identifies jargon throughout, and recommends applicable plain language techniques. You get a before-and-after grade level comparison showing exactly where the difficult passages are.
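For reference, the Flesch-Kincaid grade level mentioned above is a standard formula over word, sentence, and syllable counts. The sketch below uses a crude vowel-group syllable heuristic; production readability tools use more careful rules.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels (including y).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59
```

Short, common-word sentences score at or below early grade levels, while long sentences full of polysyllabic jargon score far higher, which is what makes the before-and-after grade comparison meaningful.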
### Comply
Check a draft notice or form against regulatory plain language requirements. Optionally filter by one of 11 regulatory topics (eligibility, benefits, appeals, coverage, and more). The system retrieves relevant regulatory source material, compares your draft against it, and flags gaps with specific suggestions for improvement.
### Glossary
Look up any term or phrase. The system performs a semantic search across the full terminology collection — you don't need an exact match. Results include the formal definition, a plain language alternative, and the regulatory domain each term belongs to.
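The reason no exact match is needed is that lookup ranks glossary entries by embedding similarity rather than string equality. The tiny hand-made vectors below stand in for real embeddings; the glossary entries are hypothetical.

```python
import math

# Sketch: semantic glossary lookup via cosine similarity over embeddings.
# Each entry maps a term to (embedding vector, plain language alternative).
GLOSSARY = {
    "remuneration": ([1.0, 0.2], "pay"),
    "adjudication": ([0.1, 1.0], "decision"),
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def lookup(query_vec: list[float]) -> tuple[str, str]:
    # Return the glossary term whose embedding is closest to the query.
    term, (vec, plain) = max(GLOSSARY.items(), key=lambda kv: cosine(query_vec, kv[1][0]))
    return term, plain

print(lookup([1.0, 0.0]))  # ('remuneration', 'pay')
```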
---
## Technology
The application is a lightweight, single-binary web server with a minimal frontend — no JavaScript frameworks and no build step. On the backend, it connects to a vector database for fast semantic search, a relational database for deep record lookups, and an LLM API for embedding and text generation.