feat: incorporate ECViT backbones into DINOv3 LTDETR #778

Draft
gabrielfruet wants to merge 2 commits into gabriel-trn-2143-port-ecvit-to-lightlytrain from gabriel-trn-2144-add-ecvit-ltdetr-model-support

gabrielfruet (Contributor) commented Jun 12, 2026

Summary

Incorporates the EdgeCrafter ECViT backbones into the DINOv3 LTDETR object detection task model, exposing four new model names under the edgecrafter/ package prefix:

  • edgecrafter/ecvitt-ltdetr
  • edgecrafter/ecvittplus-ltdetr
  • edgecrafter/ecvits-ltdetr
  • edgecrafter/ecvitsplus-ltdetr

Each is a drop-in replacement for the DINOv3 ViT LTDETR backbones — only the model argument changes:

import lightly_train

lightly_train.train_object_detection(
    out="out/my_experiment",
    model="edgecrafter/ecvitt-ltdetr",  # only change vs. dinov3/*-ltdetr
    data={...},
)

Stacking

This PR stacks on #772 (gabriel-trn-2143-port-ecvit-to-lightlytrain), which moved the ECViT backbone port into src/lightly_train/_models/ecvit/. #772 must be merged first. The base branch for this PR is gabriel-trn-2143-port-ecvit-to-lightlytrain, not main.

Design

  • Incorporation strategy: extended the existing DINOv3LTDETRObjectDetection task model to dispatch on package_helpers.parse_model_name(model_name). Accepts both "dinov3" and "edgecrafter" packages. No new task model class.
  • No new encoder/decoder configs. The ECViT wrapper emits (P3, P4, P5) at strides 8/16/32 with per-level channel counts equal to proj_dim (192 for ecvitt, 256 for the other three). These match the existing DINOv3 ViT-shaped HybridEncoder / RTDETRTransformerv2 / DFINETransformer configs by proj_dim, so the configs are reused:
    • ecvitt → _DINOv3LTDETRObjectDetectionViTTConfig (in_channels 192)
    • ecvittplus / ecvits / ecvitsplus → _DINOv3LTDETRObjectDetectionViTTPlusConfig (in_channels 256)
  • use_sta=False. ECViT already fuses its own pyramid internally; no SpatialPriorModulev2 is instantiated. The wrapper is a pass-through: forward(x) → ECViTWrapper(x).
  • No mask_token freeze. ECViT has no mask_token; the DINOv3 ViT branch's backbone.mask_token.requires_grad = False line is skipped.
  • Fixed patch_size=16. ECViT-NN uses a ConvPyramidPatchEmbed that only supports patch_size=16. The train/val transform's image size and scale_jitter.divisible_by are resolved to match (multiples of 32).
  • Fixed _expected_input_channels=3. Multi-channel input is intentionally not supported for ECViT on this first cut. The constructor ignores backbone_args["in_chans"] and image_normalize-driven channel counts on the edgecrafter branch.
  • Pretrained weights. Loaded via ECVIT_PRETRAINED_URLS through the new EdgeCrafterPackage.get_model, which downloads to Env.LIGHTLY_TRAIN_MODEL_CACHE_DIR and loads into ECViTWrapper.backbone via the existing _load_backbone_weights path. No hash check on first cut (matches the DINOv3 TODO).
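
The dispatch and config-reuse rules above can be sketched in plain Python. This is only an illustration under stated assumptions: `split_model_name` is a hypothetical stand-in for `package_helpers.parse_model_name` (whose real signature is not shown in this PR), and `ECVIT_PROJ_DIM`, `SUPPORTED_PACKAGES`, and `expected_pyramid_shapes` are names invented for the example. The channel counts and strides are taken from the Design notes.

```python
from __future__ import annotations

# Hypothetical lookup tables for illustration only. Values come from the
# Design notes: proj_dim is 192 for ecvitt and 256 for the other three.
ECVIT_PROJ_DIM = {
    "ecvitt": 192,
    "ecvittplus": 256,
    "ecvits": 256,
    "ecvitsplus": 256,
}
SUPPORTED_PACKAGES = ("dinov3", "edgecrafter")
PYRAMID_STRIDES = (8, 16, 32)  # P3, P4, P5


def split_model_name(model_name: str) -> tuple[str, str]:
    """Split "package/backbone-ltdetr" into (package, backbone).

    Assumed stand-in for package_helpers.parse_model_name.
    """
    package, sep, rest = model_name.partition("/")
    if not sep or package not in SUPPORTED_PACKAGES:
        raise ValueError(f"Unsupported model name: {model_name!r}")
    return package, rest.removesuffix("-ltdetr")


def expected_pyramid_shapes(
    model_name: str, height: int, width: int
) -> list[tuple[int, int, int]]:
    """Return (channels, h, w) for P3, P4, P5 on the edgecrafter branch."""
    package, backbone = split_model_name(model_name)
    if package != "edgecrafter":
        raise ValueError("This sketch only covers the edgecrafter branch.")
    if height % 32 or width % 32:
        raise ValueError("Image size must be a multiple of 32.")
    channels = ECVIT_PROJ_DIM[backbone]
    return [(channels, height // s, width // s) for s in PYRAMID_STRIDES]


print(expected_pyramid_shapes("edgecrafter/ecvitt-ltdetr", 640, 640))
# [(192, 80, 80), (192, 40, 40), (192, 20, 20)]
```

Because every level carries `proj_dim` channels, the encoder config only needs to agree on that one number, which is why the existing ViT-T (192) and ViT-T+ (256) configs can be reused.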

Files

New (3):

  • src/lightly_train/_models/ecvit/ecvit_package.py: EdgeCrafterPackage (public name "edgecrafter", alias ECVIT_PACKAGE)
  • src/lightly_train/_task_models/dinov3_ltdetr_object_detection/ecvit_vit_wrapper.py: ECViTBackboneWrapper (pass-through adapter)
  • tests/_task_models/dinov3_ltdetr_object_detection/test_ecvit_vit_wrapper.py: parametrized forward-shape parity, patch_size=16, backbone_model identity, no mask_token

Edited (4):

  • src/lightly_train/_models/package_helpers.py: register EDGE_CRAFTER_PACKAGE
  • src/lightly_train/_task_models/dinov3_ltdetr_object_detection/task_model.py: dispatch, list_model_names, parse_model_name, constructor
  • src/lightly_train/_task_models/dinov3_ltdetr_object_detection/train_model.py: resolve_auto ECViT branch → patch_size=16
  • tests/_task_models/dinov3_ltdetr_object_detection/test_task_model.py: ECViT parametrizations (is_supported_model, parse_model_name, _create_train_model, set_train_mode, resolve_auto)

No Makefile change required: the blanket add-header walk stamps the Lightly header on every .py under src/ unless excluded by -x, and the new files are not excluded.

Verification (local, Python 3.13)

  • make format
  • make static-checks ✓ — mypy: "Success: no issues found in 557 source files"
  • Focused tests: 63 passed, 2 skipped in 9.67s (uv run --frozen pytest tests/_task_models/dinov3_ltdetr_object_detection/test_ecvit_vit_wrapper.py tests/_task_models/dinov3_ltdetr_object_detection/test_task_model.py -q). The 2 skips are pre-existing ONNX tests that need onnx/onnxruntime/GPU.
  • make test was intentionally skipped locally per the established pattern (CI runs it).

Out of scope (intentional)

  • No documentation changes in this PR. The edgecrafter/ package page (docs/source/pretrain_distill/models/edgecrafter.md), the toctree/bullet entry in docs/source/pretrain_distill/models/index.md, and the "LTDETR EdgeCrafter (ECViT) Models" subsection in docs/source/object_detection.md are deferred to a separate docs-only follow-up PR stacked on top of this one.
  • No ECViTWrapper → ECViTModelWrapper rename in this PR. The backbone-level naming standardization will land in a separate follow-up PR.
  • No *-ltdetr-coco rows in the docs table — we have no COCO-fine-tuned ECViT weights yet.
  • No mAP numbers in the docs table.
  • No Quick Start snippet for ECViT in object_detection.md (the DINOv3 snippet applies verbatim; only the model argument changes).
  • No multi-channel support for ECViT (always 3 input channels).
  • No new _HybridEncoder* / _RTDETRTransformerv2* / _DFINETransformer* configs — the existing DINOv3 ViT-shaped configs are reused.
  • No new ECViT-specific backbone-wrapper config (_RTDETRBackboneWrapperViTT*Config) — those are STA-specific and we have use_sta=False.

Related

Test plan for reviewer

  1. uv run --frozen pytest tests/_task_models/dinov3_ltdetr_object_detection/ -q → expect 63 passed, 2 skipped.
  2. uv run --frozen pytest tests/_models/ecvit/ -q → expect existing ECViT tests to still pass (no changes to ecvit.py).
  3. lightly_train.list_models() should include the four new edgecrafter/ecvit*-ltdetr names.
  4. lightly_train.load_model("edgecrafter/ecvitt-ltdetr") should download pretrained weights and instantiate without error (requires network for the first run).
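
Step 3 can be turned into a small check. The helper below is a sketch, not part of this PR: `missing_ecvit_names` and `EXPECTED_ECVIT_LTDETR_NAMES` are hypothetical names, and the helper only assumes that `lightly_train.list_models()` returns an iterable of model-name strings.

```python
from __future__ import annotations

from collections.abc import Iterable

# The four model names this PR is expected to register.
EXPECTED_ECVIT_LTDETR_NAMES = (
    "edgecrafter/ecvitt-ltdetr",
    "edgecrafter/ecvittplus-ltdetr",
    "edgecrafter/ecvits-ltdetr",
    "edgecrafter/ecvitsplus-ltdetr",
)


def missing_ecvit_names(available: Iterable[str]) -> list[str]:
    """Return the expected ECViT LTDETR names absent from `available`."""
    available_set = set(available)
    return [n for n in EXPECTED_ECVIT_LTDETR_NAMES if n not in available_set]


# With the library installed, a reviewer could run (assumed usage):
#   import lightly_train
#   assert missing_ecvit_names(lightly_train.list_models()) == []
```

Returning the missing names instead of a boolean makes a failed check self-explanatory in the assertion output.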

Stacks on PR #772 (gabriel-trn-2143-port-ecvit-to-lightlytrain) and wires the
existing ECViTWrapper into the DINOv3 LTDETR object detection task model as
four new backbones under the `edgecrafter/` package prefix:

  - edgecrafter/ecvitt-ltdetr
  - edgecrafter/ecvittplus-ltdetr
  - edgecrafter/ecvits-ltdetr
  - edgecrafter/ecvitsplus-ltdetr

Design:
- Single new EdgeCrafterPackage (with name="edgecrafter"); ECViT presets map
  to existing DINOv3 ViT-shaped encoder/decoder configs by proj_dim
  (ecvitt -> 192, ecvitt*/ecvits* -> 256). No new ECViT-specific configs.
- DINOv3LTDETRObjectDetection dispatches on
  package_helpers.parse_model_name(model_name); accepts both "dinov3" and
  "edgecrafter" packages.
- For ECViT: use_sta=False (no SpatialPriorModule); fixed patch_size=16
  (ECViT-NN uses a ConvPyramidPatchEmbed); no mask_token freeze (ECViT has
  none); fixed _expected_input_channels=3 (no multi-channel support).
- Pretrained weights load via ECVIT_PRETRAINED_URLS through the new package's
  get_model (mirrors DINOV3_PACKAGE._maybe_download_weights flow; no hash
  check on first cut, matching the DINOv3 TODO).

Files:
- new: src/lightly_train/_models/ecvit/ecvit_package.py
- new: src/lightly_train/_task_models/dinov3_ltdetr_object_detection/ecvit_vit_wrapper.py
- new: tests/_task_models/dinov3_ltdetr_object_detection/test_ecvit_vit_wrapper.py
- new: docs/source/pretrain_distill/models/edgecrafter.md
- edit: src/lightly_train/_models/package_helpers.py (register EDGE_CRAFTER_PACKAGE)
- edit: src/lightly_train/_task_models/dinov3_ltdetr_object_detection/task_model.py
  (dispatch, list_model_names, parse_model_name, constructor)
- edit: src/lightly_train/_task_models/dinov3_ltdetr_object_detection/train_model.py
  (resolve_auto ECViT branch -> patch_size=16)
- edit: tests/_task_models/dinov3_ltdetr_object_detection/test_task_model.py
  (ECViT parametrizations)
- edit: docs/source/object_detection.md (LTDETR EdgeCrafter Models subsection)
- edit: docs/source/pretrain_distill/models/index.md (toctree + bullet)

No COCO rows, no mAP table, no Quick Start snippet in docs (per user
instruction). Multi-channel input is intentionally not supported for ECViT
on this first cut.

Verified locally on Python 3.13:
- make format
- make static-checks (mypy: "Success: no issues found in 557 source files")
- focused tests (63 passed, 2 skipped; skips are pre-existing ONNX tests)
- make test skipped locally per the TRN-2143 pattern; CI will run it.

Remove the three doc files added in the previous commit:

- docs/source/pretrain_distill/models/edgecrafter.md (new file)
- docs/source/pretrain_distill/models/index.md (toctree + bullet)
- docs/source/object_detection.md (LTDETR EdgeCrafter Models subsection)

These will land in a separate docs-only follow-up PR stacked on this one.

The ECViTWrapper -> ECViTModelWrapper rename is also deferred to its own
follow-up PR for backbone-level naming standardization.