MESA Validate

A Streamlit application for human-in-the-loop validation of LLM-extracted structured outputs against Pydantic schemas within the MESA (Medical-concept Extraction with Schema Alignment) framework.

Repo Structure

```
presto-validate/
├── Home.py                   # Entry point
├── pages/
│   ├── 1_Sessions.py         # Session management
│   ├── 2_Validate.py         # Validation interface
│   └── 3_Analysis.py         # Results and metrics views
├── utils/
│   ├── models.py             # Data model definitions
│   ├── session_manager.py    # Session CRUD functions
│   ├── schema_loader.py      # Schema config loading
│   ├── schema_inspector.py   # Schema introspection
│   ├── predictions_loader.py # Prediction file handling
│   ├── validation_ui.py      # UI generation
│   ├── metrics.py            # Metric calculation
│   └── styles.py             # CSS styles
├── sessions/                 # Session data
├── predictions/              # Prediction JSON files, in subfolders
└── schemas.yaml              # Schema configuration
```

Installation

Install the dependencies, including the schema package(s):
```
pip install -r requirements.txt
```

Configure schemas in schemas.yaml:

```yaml
schemas:
  - module: your_schema_module_name
    root_class: YourRootClassName
```
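One plausible way to resolve each schemas.yaml entry to its root class is importlib. This is a sketch of what utils/schema_loader.py might do, not its actual code. The helper name load_root_classes is hypothetical, and the config is assumed to be already parsed into a dict (for example with PyYAML's yaml.safe_load).

```python
import importlib
from typing import Any


def load_root_classes(config: dict[str, Any]) -> dict[str, type]:
    """Resolve each `schemas` entry to its root class.

    Hypothetical helper: assumes `config` is the parsed contents of
    schemas.yaml and that every entry has `module` and `root_class` keys.
    """
    classes: dict[str, type] = {}
    for entry in config.get("schemas", []):
        # Import the schema package installed via requirements.txt.
        module = importlib.import_module(entry["module"])
        # Look up the root class by name; raises AttributeError if missing.
        root = getattr(module, entry["root_class"])
        classes[f"{entry['module']}.{entry['root_class']}"] = root
    return classes


if __name__ == "__main__":
    # Stand-in config using a stdlib class so the example runs anywhere.
    demo = {"schemas": [{"module": "collections", "root_class": "OrderedDict"}]}
    print(load_root_classes(demo))
```

A misspelled module or class name fails loudly at startup (ImportError or AttributeError) rather than partway through a validation session.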

Add prediction files to a subdirectory of predictions/. For additional sample predictions, run git submodule update --init.

Launch

```
streamlit run Home.py
```

Default URL: http://localhost:8501

Prediction File Format

The preferred format uses document-level fields. Source content and inference output should be provided in separate aggregate JSON or JSONL files, with records sharing a document_id.

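Example content file (JSON):

```json
[
  {
    "document_id": "doc-1",
    "document_content": "The original document text..."
  }
]
```

Example inference file (JSONL, one record per line):

```json
{"document_id": "doc-1", "document_inference": {"field1": "value1", "field2": {}}}
```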

document_content: Original document text (string)
document_inference: LLM extraction result (object matching your Pydantic schema; JSON-encoded strings are also accepted)
document_id: Used to join split content and inference records
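The join on document_id can be sketched as follows, assuming each file has already been read into a string. parse_records and join_predictions are hypothetical names, not functions from utils/predictions_loader.py. The sketch treats text beginning with [ as a JSON array and anything else as JSONL, and decodes JSON-encoded document_inference strings.

```python
import json
from typing import Any


def parse_records(text: str) -> list[dict[str, Any]]:
    """Parse a JSON array or JSONL text (one object per non-blank line)."""
    stripped = text.strip()
    if stripped.startswith("["):
        return json.loads(stripped)
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def join_predictions(content_text: str, inference_text: str) -> dict[str, dict[str, Any]]:
    """Join content and inference records on document_id (hypothetical helper)."""
    content = {
        rec["document_id"]: rec["document_content"]
        for rec in parse_records(content_text)
    }
    joined: dict[str, dict[str, Any]] = {}
    for rec in parse_records(inference_text):
        inference = rec["document_inference"]
        # JSON-encoded strings are decoded into objects before validation.
        if isinstance(inference, str):
            inference = json.loads(inference)
        doc_id = rec["document_id"]
        joined[doc_id] = {
            "document_id": doc_id,
            # None when no content record shares this document_id.
            "document_content": content.get(doc_id),
            "document_inference": inference,
        }
    return joined


if __name__ == "__main__":
    content_json = '[{"document_id": "doc-1", "document_content": "The original document text..."}]'
    inference_jsonl = '{"document_id": "doc-1", "document_inference": {"field1": "value1", "field2": {}}}\n'
    print(join_predictions(content_json, inference_jsonl))
```

Keying the result by document_id keeps lookup simple when the validation UI steps through documents. An inference record with no matching content record is kept with document_content set to None instead of being dropped silently.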

Legacy per-document files containing both content and output are still supported; they are normalised to the document-level format internally.
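A minimal sketch of that normalisation is shown below. The legacy key names content and output, the helper name normalise_legacy, and the use of a fallback ID (such as the file name stem) when a record has no document_id are all assumptions, not details confirmed by this README.

```python
import json
from typing import Any


def normalise_legacy(record: dict[str, Any], fallback_id: str) -> dict[str, Any]:
    """Map a legacy per-document record to the document-level fields.

    Hypothetical: assumes the legacy keys are `content` and `output`, and
    uses `fallback_id` (e.g. the file name stem) when no document_id exists.
    """
    output = record["output"]
    # Accept JSON-encoded output strings, as for document_inference.
    if isinstance(output, str):
        output = json.loads(output)
    return {
        "document_id": record.get("document_id", fallback_id),
        "document_content": record["content"],
        "document_inference": output,
    }


if __name__ == "__main__":
    legacy = {"content": "The original document text...", "output": '{"field1": "value1"}'}
    print(normalise_legacy(legacy, fallback_id="doc-1"))
```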
