Finalized example and slight fixes #1129

Open
guanedwa wants to merge 3 commits into sunlabuiuc:master from guanedwa:DSA-dataset-and-task



[Dataset + Task] UCI Daily and Sports Activities (DSA) — DPAM Replication

Contributor: Edward Guan (edwardg2), edwardg2@illinois.edu

Type: Dataset + Task (reproducibility contribution)

Paper: Zhang et al. "Daily Physical Activity Monitoring — Adaptive Learning from Multi-source Motion Sensor Data." CHIL 2024, PMLR 248:39-54. https://proceedings.mlr.press/v248/zhang24a.html


Description

This PR adds PyHealth support for the UCI Daily and Sports Activities (DSA) dataset and implements the Inter-domain Pairwise Distance (IPD) transfer learning framework from Zhang et al. (CHIL 2024). The dataset contains motion sensor recordings of 19 activities from 8 subjects across 5 simultaneously worn body-part sensors (torso, right arm, left arm, right leg, left leg), each producing 9-channel time series at 25 Hz in 5-second windows.
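The window dimensions implied by this description can be checked with a few lines of arithmetic. The sketch below is illustrative only: `split_by_unit` is a hypothetical helper, not part of this PR, and it assumes the 45 columns are grouped by sensor unit in the order listed above.

```python
# Constants taken from the dataset description above.
SENSOR_UNITS = ["torso", "right_arm", "left_arm", "right_leg", "left_leg"]
CHANNELS_PER_UNIT = 9
SAMPLING_HZ = 25
WINDOW_SECONDS = 5

TIMESTEPS = SAMPLING_HZ * WINDOW_SECONDS                 # rows per window
TOTAL_CHANNELS = len(SENSOR_UNITS) * CHANNELS_PER_UNIT   # columns per row


def split_by_unit(segment):
    """Split one window (a list of rows) into one sub-window per sensor unit.

    Assumption: columns are grouped by unit, in the order of SENSOR_UNITS.
    """
    per_unit = {}
    for i, unit in enumerate(SENSOR_UNITS):
        lo = i * CHANNELS_PER_UNIT
        hi = lo + CHANNELS_PER_UNIT
        per_unit[unit] = [row[lo:hi] for row in segment]
    return per_unit


# Dummy window whose values equal their column index, to make slicing visible.
segment = [[float(c) for c in range(TOTAL_CHANNELS)] for _ in range(TIMESTEPS)]
per_unit = split_by_unit(segment)

print(TIMESTEPS, TOTAL_CHANNELS)
print(len(per_unit["left_leg"]), len(per_unit["left_leg"][0]))
```

Treating each body-part sensor as its own sub-window is what makes a per-sensor "domain" view of the data possible.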

The implementation provides two classification tasks: binary one-vs-rest (replicating the paper's protocol) and 19-class multiclass (our extension), along with the full IPD computation pipeline for cross-domain transfer weighting. Where the paper's description and the authors' released code diverge, this contribution follows the released code; all discrepancies are documented in module-level docstrings.
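To make the difference between the two label schemes concrete, here is a minimal sketch. The function names and the 1-based activity ids are assumptions made for illustration; the task classes in `pyhealth/tasks/dsa.py` may encode labels differently.

```python
NUM_ACTIVITIES = 19  # from the dataset description


def multiclass_label(activity_id):
    """Map a 1-based activity id (1..19) to a 0-based class index."""
    if not 1 <= activity_id <= NUM_ACTIVITIES:
        raise ValueError(f"activity_id out of range: {activity_id}")
    return activity_id - 1


def one_vs_rest_label(activity_id, positive_activity):
    """Return 1 for the chosen positive activity and 0 for every other one."""
    if not 1 <= activity_id <= NUM_ACTIVITIES:
        raise ValueError(f"activity_id out of range: {activity_id}")
    return 1 if activity_id == positive_activity else 0


activity_ids = list(range(1, NUM_ACTIVITIES + 1))
multiclass = [multiclass_label(a) for a in activity_ids]
binary = [one_vs_rest_label(a, positive_activity=7) for a in activity_ids]

# Share of positives when every activity contributes equally many windows.
positive_fraction = sum(binary) / len(binary)
print(positive_fraction)
```

If every activity contributes the same number of windows, only 1 in 19 samples is positive in the one-vs-rest setting, so whether the evaluation set is rebalanced matters a great deal for the reported metrics.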

An ablation study in the example notebook investigates how target domain selection, input scaling, and activity difficulty each affect LSTM classification performance, finding that the paper's transfer gains are sensitive to the balanced evaluation protocol and do not generalise to natural class distributions or the multiclass setting.



File Guide

Core Implementation

  • pyhealth/datasets/dsa.py — DSADataset class: download, verification, indexing, time series loading, subject-split utilities
  • pyhealth/tasks/dsa.py — both task classes + full IPD pipeline + ExperimentConfig / ExperimentResult
  • pyhealth/datasets/configs/dsa.yaml — BaseDataset YAML config
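The idea behind a subject-level split is that all windows from one subject land on the same side, so no subject appears in both train and test. The sketch below illustrates that idea only: `split_by_subject` and the `subject_id` key are hypothetical and are not the utilities shipped in `pyhealth/datasets/dsa.py`.

```python
def split_by_subject(samples, test_subjects):
    """Partition samples so that no subject appears in both train and test.

    `samples` is a list of dicts that each carry a "subject_id" key
    (a hypothetical schema used only for this example).
    """
    test_subjects = set(test_subjects)
    train = [s for s in samples if s["subject_id"] not in test_subjects]
    test = [s for s in samples if s["subject_id"] in test_subjects]
    return train, test


# Toy index: 8 subjects, 3 windows each.
samples = [
    {"subject_id": f"p{p}", "segment": seg}
    for p in range(1, 9)
    for seg in range(3)
]

train, test = split_by_subject(samples, test_subjects=["p7", "p8"])
train_subjects = {s["subject_id"] for s in train}
held_out_subjects = {s["subject_id"] for s in test}
print(len(train), len(test))
```

Splitting at the window level instead would let windows from the same recording appear on both sides, which tends to inflate test scores.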

Tests

  • tests/core/test_dsa_dataset.py — synthetic-data tests: metadata indexing, patient/event structure, time series loading, scaling, subject splits, error handling
  • tests/core/test_dsa_tasks.py — synthetic-data tests: both task classes, full IPD pipeline, config/result dataclasses
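For readers unfamiliar with synthetic-data testing, the following sketch shows one way to build a small stand-in directory tree without downloading the real dataset. The `aXX/pY/sZZ.txt` layout and comma-separated rows are assumptions based on the public UCI archive; the fixtures in this PR's tests may be built differently, and both helper names are hypothetical.

```python
import random
import tempfile
from pathlib import Path


def make_synthetic_dsa(root, n_activities=2, n_subjects=2, n_segments=3,
                       timesteps=125, channels=45, seed=0):
    """Write random segment files under root/aXX/pY/sZZ.txt (assumed layout)."""
    rng = random.Random(seed)
    paths = []
    for a in range(1, n_activities + 1):
        for p in range(1, n_subjects + 1):
            seg_dir = Path(root) / f"a{a:02d}" / f"p{p}"
            seg_dir.mkdir(parents=True, exist_ok=True)
            for s in range(1, n_segments + 1):
                path = seg_dir / f"s{s:02d}.txt"
                # One line per timestep, one comma-separated value per channel.
                lines = [
                    ",".join(f"{rng.gauss(0, 1):.4f}" for _ in range(channels))
                    for _ in range(timesteps)
                ]
                path.write_text("\n".join(lines) + "\n")
                paths.append(path)
    return paths


def load_segment(path):
    """Read one segment file back into a list of float rows."""
    text = Path(path).read_text()
    return [[float(v) for v in line.split(",")] for line in text.splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    paths = make_synthetic_dsa(tmp)
    n_files = len(paths)
    segment = load_segment(paths[0])
    shape = (len(segment), len(segment[0]))

print(n_files, shape)
```

Keeping the fixture tiny (a couple of activities and subjects) keeps the test suite fast while still exercising indexing and loading logic.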

Documentation

  • docs/api/datasets/pyhealth.datasets.DSADataset.rst
  • docs/api/tasks/pyhealth.tasks.DSAActivityClassification.rst
  • docs/api/tasks/pyhealth.tasks.DSABinaryActivityClassification.rst
  • docs/api/datasets.rst — updated table of contents
  • docs/api/tasks.rst — updated table of contents

Example / Ablation

  • examples/dsa_activity_classification.ipynb — end-to-end replication of No Transfer / Naive Transfer / Weighted Transfer conditions with ablations across distance metrics and epoch-weighting strategies
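As a rough intuition for distance-weighted transfer, the sketch below scores each source domain by its mean pairwise distance to the target domain and turns those scores into normalized weights. This is a simplified stand-in, not the IPD definition from the paper or from `pyhealth/tasks/dsa.py`: the Euclidean metric, the inverse-distance weighting, and all function names are assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(source, target, dist=euclidean):
    """Average distance over every (source sample, target sample) pair."""
    total = sum(dist(s, t) for s in source for t in target)
    return total / (len(source) * len(target))


def inverse_distance_weights(distances, eps=1e-8):
    """Turn per-domain distances into weights that sum to 1 (closer = larger)."""
    inverse = {name: 1.0 / (d + eps) for name, d in distances.items()}
    norm = sum(inverse.values())
    return {name: v / norm for name, v in inverse.items()}


# Toy 2-D features: one target domain and two candidate source domains.
target = [[0.0, 0.0], [0.0, 2.0]]
sources = {
    "right_arm": [[0.0, 1.0]],   # close to the target samples
    "left_leg": [[3.0, 1.0]],    # farther from the target samples
}

distances = {
    name: mean_pairwise_distance(samples, target)
    for name, samples in sources.items()
}
weights = inverse_distance_weights(distances)
print(weights)
```

Swapping `euclidean` for another metric is how an ablation across distance metrics could be structured in principle; the notebook's actual metrics and weighting strategies should be taken from the notebook itself.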

Dependency

  • pyproject.toml — added optional [dsa] dependency group for pyts>=0.12.0
