
Add MedLingo dataset + jargon expansion task (with tests, docs, example) #1111

Open
anuragraychowdhury wants to merge 2 commits into sunlabuiuc:master from anuragraychowdhury:medlingo-dataset-task


@anuragraychowdhury

Summary

  • Add MedLingoDataset implementation + config YAML
  • Add MedLingoJargonExpansionTask for prompt/label construction (supports both shot modes)
  • Add fast synthetic unit tests for dataset + task (tmp dirs, no network)
  • Add API docs stubs and index toctree entries
  • Add an example script demonstrating a default synthetic smoke run and an optional TransformersModel forward pass behind an env flag
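The PR text does not show the task's actual prompt format. As a rough sketch of what prompt/label construction with two shot modes could look like, assuming "both shot modes" means zero-shot and few-shot, here is a standalone example. `JargonRecord`, `INSTRUCTION`, `build_prompt` and `build_sample` are hypothetical names for illustration, not the API of `MedLingoJargonExpansionTask`.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class JargonRecord:
    # Hypothetical record shape: a jargon term and its reference expansion.
    jargon: str
    expansion: str


# Hypothetical instruction text; the real task may word this differently.
INSTRUCTION = "Expand the following clinical jargon or abbreviation."


def build_prompt(
    query: JargonRecord,
    shots: Optional[Sequence[JargonRecord]] = None,
) -> str:
    """Build a prompt for one query.

    No shots -> zero-shot prompt; otherwise the shots are prepended as
    solved demonstrations (few-shot).
    """
    lines: List[str] = [INSTRUCTION, ""]
    for shot in shots or []:
        lines.append(f"Jargon: {shot.jargon}")
        lines.append(f"Expansion: {shot.expansion}")
        lines.append("")
    # The query itself is left open for the model to complete.
    lines.append(f"Jargon: {query.jargon}")
    lines.append("Expansion:")
    return "\n".join(lines)


def build_sample(
    query: JargonRecord,
    shots: Optional[Sequence[JargonRecord]] = None,
) -> dict:
    # The label is the reference answer string; it never appears in the prompt.
    return {"prompt": build_prompt(query, shots), "label": query.expansion}
```

Passing no shots yields the zero-shot prompt; passing solved examples prepends them as demonstrations, while the label stays the query's own reference expansion in both modes.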

Test plan
PYTHONPATH="$(pwd)" pytest -q tests/test_medlingo_dataset.py tests/test_medlingo_jargon_expansion_task.py
PYTHONPATH="$(pwd)" python examples/medlingo_medlingo_jargon_expansion_transformersmodel.py
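The unit tests in this plan are described as synthetic: they run against temporary directories with no network access. A minimal sketch of that pattern using only the standard library follows. The CSV layout (`jargon` and `expansion` columns), `load_medlingo_csv` and `write_synthetic_csv` are assumptions for illustration and do not reflect the real `MedLingoDataset` interface or file format.

```python
import csv
import os
import tempfile
from typing import Dict, List


def load_medlingo_csv(root: str, filename: str = "medlingo.csv") -> List[Dict[str, str]]:
    # Hypothetical loader: reads jargon/expansion rows from a local CSV file.
    path = os.path.join(root, filename)
    with open(path, newline="", encoding="utf-8") as f:
        return [dict(row) for row in csv.DictReader(f)]


def write_synthetic_csv(
    root: str, rows: List[Dict[str, str]], filename: str = "medlingo.csv"
) -> str:
    # Writes a tiny fixture file so the test never touches real data.
    path = os.path.join(root, filename)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["jargon", "expansion"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def test_loader_reads_synthetic_rows() -> None:
    rows = [
        {"jargon": "SOB", "expansion": "shortness of breath"},
        {"jargon": "BP", "expansion": "blood pressure"},
    ]
    # Everything lives in a throwaway directory that is deleted afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        write_synthetic_csv(tmp, rows)
        loaded = load_medlingo_csv(tmp)
    assert loaded == rows


if __name__ == "__main__":
    test_loader_reads_synthetic_rows()
    print("ok")
```

Because the fixture is generated in-process, a test like this runs in milliseconds and is safe for CI; pytest would also collect `test_loader_reads_synthetic_rows` automatically.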

Notes / scope vs paper
This implementation treats the task as multiclass classification over answer strings, whereas the paper's MedLingo evaluation uses open-ended generation (often judged by an LLM), so metrics are not expected to match that protocol out of the box.
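To illustrate why classification metrics diverge from an LLM-judged generation protocol, here is a hypothetical sketch of scoring over a fixed vocabulary of answer strings: each distinct answer becomes one class, and a prediction counts only if its argmax matches the gold index exactly. The helper names (`build_label_vocab`, `predict_classes`, `accuracy`) are illustrative and not taken from the PR.

```python
from typing import Dict, List, Sequence


def build_label_vocab(answers: Sequence[str]) -> Dict[str, int]:
    # Each distinct answer string becomes one class; sorted for a stable order.
    return {ans: i for i, ans in enumerate(sorted(set(answers)))}


def predict_classes(scores: Sequence[Sequence[float]]) -> List[int]:
    # Pick the highest-scoring class per example (argmax over the vocabulary).
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def accuracy(pred: Sequence[int], gold: Sequence[int]) -> float:
    # Exact class match only; no partial credit for near-synonyms.
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


answers = ["shortness of breath", "blood pressure", "shortness of breath"]
vocab = build_label_vocab(answers)
gold = [vocab[a] for a in answers]
pred = predict_classes([[0.2, 0.8], [0.9, 0.1], [0.6, 0.4]])
print(vocab, gold, pred, accuracy(pred, gold))
```

Under this scheme a model cannot get credit for a valid paraphrase that is not in the vocabulary (for example "breathlessness"), which an open-ended, LLM-judged evaluation might accept.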

Original Paper: https://arxiv.org/abs/2505.15024
