feat: incorporate ECViT backbones into DINOv3 LTDETR by gabrielfruet · Pull Request #778 · lightly-ai/lightly-train

gabrielfruet · 2026-06-12T14:58:47Z

Summary

Incorporates the EdgeCrafter ECViT backbones into the DINOv3 LTDETR object detection task model, exposing four new model names under the edgecrafter/ package prefix:

edgecrafter/ecvitt-ltdetr
edgecrafter/ecvittplus-ltdetr
edgecrafter/ecvits-ltdetr
edgecrafter/ecvitsplus-ltdetr

Each is a drop-in replacement for the DINOv3 ViT LTDETR backbones — only the model argument changes:

lightly_train.train_object_detection(
    out="out/my_experiment",
    model="edgecrafter/ecvitt-ltdetr",  # only change vs. dinov3/*-ltdetr
    data={...},
)

Stacking

This PR stacks on #772 (gabriel-trn-2143-port-ecvit-to-lightlytrain), which moved the ECViT backbone port into src/lightly_train/_models/ecvit/. #772 must be merged first. The base branch for this PR is gabriel-trn-2143-port-ecvit-to-lightlytrain, not main.

Design

Incorporation strategy: the existing DINOv3LTDETRObjectDetection task model is extended to dispatch on the package returned by package_helpers.parse_model_name(model_name), and it now accepts both the "dinov3" and "edgecrafter" packages. No new task model class is added.
No new encoder/decoder configs. The ECViT wrapper emits (P3, P4, P5) at strides 8/16/32 with per-level channel counts equal to proj_dim (192 for ecvitt, 256 for the other three). These match the existing DINOv3 ViT-shaped HybridEncoder / RTDETRTransformerv2 / DFINETransformer configs by proj_dim, so the configs are reused:
- ecvitt → _DINOv3LTDETRObjectDetectionViTTConfig (in_channels 192)
- ecvittplus / ecvits / ecvitsplus → _DINOv3LTDETRObjectDetectionViTTPlusConfig (in_channels 256)
use_sta=False. ECViT already fuses its own pyramid internally; no SpatialPriorModulev2 is instantiated. The wrapper is a pass-through: forward(x) → ECViTWrapper(x).
No mask_token freeze. ECViT has no mask_token; the DINOv3 ViT branch's backbone.mask_token.requires_grad = False line is skipped.
Fixed patch_size=16. ECViT-NN uses a ConvPyramidPatchEmbed that only supports patch_size=16. The train/val transform's image size and scale_jitter.divisible_by are resolved to match (multiples of 32).
Fixed _expected_input_channels=3. Multi-channel input is intentionally not supported for ECViT on this first cut. The constructor ignores backbone_args["in_chans"] and image_normalize-driven channel counts on the edgecrafter branch.
Pretrained weights. Loaded via ECVIT_PRETRAINED_URLS through the new EdgeCrafterPackage.get_model, which downloads to Env.LIGHTLY_TRAIN_MODEL_CACHE_DIR and loads into ECViTWrapper.backbone via the existing _load_backbone_weights path. No hash check on first cut (matches the DINOv3 TODO).
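The design points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in this PR: split_model_name is a hypothetical stand-in for package_helpers.parse_model_name, and resolve_ecvit_settings and pyramid_shapes are names invented for this example. Only the preset-to-proj_dim mapping, the strides, and the fixed settings come from the description.

```python
from typing import Optional

# Packages the task model accepts, per the PR description.
SUPPORTED_PACKAGES = ("dinov3", "edgecrafter")

# proj_dim per ECViT preset, as listed in the PR description.
ECVIT_PROJ_DIM = {
    "ecvitt": 192,
    "ecvittplus": 256,
    "ecvits": 256,
    "ecvitsplus": 256,
}

# Strides of the (P3, P4, P5) feature maps emitted by the ECViT wrapper.
PYRAMID_STRIDES = (8, 16, 32)


def split_model_name(model_name: str) -> tuple[str, str]:
    """Hypothetical stand-in for package_helpers.parse_model_name."""
    package, sep, rest = model_name.partition("/")
    if not sep or package not in SUPPORTED_PACKAGES:
        raise ValueError(f"Unsupported model name: {model_name!r}")
    return package, rest


def resolve_ecvit_settings(model_name: str) -> Optional[dict[str, object]]:
    """Return the fixed ECViT settings, or None for the dinov3 branch."""
    package, rest = split_model_name(model_name)
    if package != "edgecrafter":
        return None  # DINOv3 path is unchanged and not sketched here.
    backbone = rest.removesuffix("-ltdetr")
    if backbone == rest or backbone not in ECVIT_PROJ_DIM:
        raise ValueError(f"Unknown ECViT model name: {model_name!r}")
    return {
        "backbone": backbone,
        "in_channels": ECVIT_PROJ_DIM[backbone],  # selects the reused config
        "use_sta": False,  # no SpatialPriorModulev2
        "patch_size": 16,
        "input_channels": 3,  # multi-channel input is not supported
    }


def pyramid_shapes(height: int, width: int, proj_dim: int) -> list[tuple[int, int, int]]:
    """Expected (channels, H, W) of P3, P4, P5 for a size divisible by 32."""
    if height % 32 or width % 32:
        raise ValueError("image height and width must be multiples of 32")
    return [(proj_dim, height // s, width // s) for s in PYRAMID_STRIDES]
```

For example, resolve_ecvit_settings("edgecrafter/ecvitt-ltdetr")["in_channels"] is 192, and pyramid_shapes(640, 640, 192) gives (192, 80, 80), (192, 40, 40), (192, 20, 20).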

Files

New (3):

src/lightly_train/_models/ecvit/ecvit_package.py — EdgeCrafterPackage (public name "edgecrafter", alias ECVIT_PACKAGE)
src/lightly_train/_task_models/dinov3_ltdetr_object_detection/ecvit_vit_wrapper.py — ECViTBackboneWrapper (pass-through adapter)
tests/_task_models/dinov3_ltdetr_object_detection/test_ecvit_vit_wrapper.py — parametrized forward-shape parity, patch_size=16, backbone_model identity, no mask_token

Edited (4):

src/lightly_train/_models/package_helpers.py — register EDGE_CRAFTER_PACKAGE
src/lightly_train/_task_models/dinov3_ltdetr_object_detection/task_model.py — dispatch, list_model_names, parse_model_name, constructor
src/lightly_train/_task_models/dinov3_ltdetr_object_detection/train_model.py — resolve_auto ECViT branch → patch_size=16
tests/_task_models/dinov3_ltdetr_object_detection/test_task_model.py — ECViT parametrizations (is_supported_model, parse_model_name, _create_train_model, set_train_mode, resolve_auto)

No Makefile change required: the blanket add-header walk stamps the Lightly header on every .py under src/ unless excluded by -x, and the new files are not excluded.

Verification (local, Python 3.13)

make format ✓
make static-checks ✓ — mypy: "Success: no issues found in 557 source files"
Focused tests: 63 passed, 2 skipped in 9.67s (uv run --frozen pytest tests/_task_models/dinov3_ltdetr_object_detection/test_ecvit_vit_wrapper.py tests/_task_models/dinov3_ltdetr_object_detection/test_task_model.py -q). The 2 skips are pre-existing ONNX tests that need onnx/onnxruntime/GPU.
make test was intentionally skipped locally per the established pattern (CI runs it).

Out of scope (intentional)

No documentation changes in this PR. The edgecrafter/ package page (docs/source/pretrain_distill/models/edgecrafter.md), the toctree/bullet entry in docs/source/pretrain_distill/models/index.md, and the "LTDETR EdgeCrafter (ECViT) Models" subsection in docs/source/object_detection.md are deferred to a separate docs-only follow-up PR stacked on top of this one.
No ECViTWrapper → ECViTModelWrapper rename in this PR. The backbone-level naming standardization will land in a separate follow-up PR.
No *-ltdetr-coco rows in the docs table — we have no COCO-fine-tuned ECViT weights yet.
No mAP numbers in the docs table.
No Quick Start snippet for ECViT in object_detection.md (the DINOv3 snippet applies verbatim; only the model argument changes).
No multi-channel support for ECViT (always 3 input channels).
No new _HybridEncoder* / _RTDETRTransformerv2* / _DFINETransformer* configs — the existing DINOv3 ViT-shaped configs are reused.
No new ECViT-specific backbone-wrapper config (_RTDETRBackboneWrapperViTT*Config) — those are STA-specific and we have use_sta=False.

Related

Parent PR: feat: add ECViT LTDETR backbone wrapper #772 (gabriel-trn-2143-port-ecvit-to-lightlytrain), which moves the ECViT backbone port into src/lightly_train/_models/ecvit/. Must be merged first.
Upstream: EdgeCrafter (capsule2077/edgecrafter).
Task model: DINOv3LTDETRObjectDetection (the class name is unchanged; ECViT is dispatched internally).

Test plan for reviewer

uv run --frozen pytest tests/_task_models/dinov3_ltdetr_object_detection/ -q → expect 63 passed, 2 skipped.
uv run --frozen pytest tests/_models/ecvit/ -q → expect existing ECViT tests to still pass (no changes to ecvit.py).
lightly_train.list_models() should include the four new edgecrafter/ecvit*-ltdetr names.
lightly_train.load_model("edgecrafter/ecvitt-ltdetr") should download pretrained weights and instantiate without error (requires network for the first run).
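The model-name check in the test plan can be scripted. The helper below is a small sketch and not part of this PR: missing_ecvit_models is a name invented here, and it assumes lightly_train.list_models() returns an iterable of model-name strings.

```python
# The four model names this PR exposes, per the Summary.
EXPECTED_ECVIT_MODELS = (
    "edgecrafter/ecvitt-ltdetr",
    "edgecrafter/ecvittplus-ltdetr",
    "edgecrafter/ecvits-ltdetr",
    "edgecrafter/ecvitsplus-ltdetr",
)


def missing_ecvit_models(available):
    """Return the expected ECViT LTDETR names absent from `available`."""
    available_set = set(available)
    return [name for name in EXPECTED_ECVIT_MODELS if name not in available_set]
```

On this branch, missing_ecvit_models(lightly_train.list_models()) should return an empty list.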

Stacks on PR #772 (gabriel-trn-2143-port-ecvit-to-lightlytrain) and wires the existing ECViTWrapper into the DINOv3 LTDETR object detection task model as four new backbones under the `edgecrafter/` package prefix:
- edgecrafter/ecvitt-ltdetr
- edgecrafter/ecvittplus-ltdetr
- edgecrafter/ecvits-ltdetr
- edgecrafter/ecvitsplus-ltdetr

Design:
- Single new EdgeCrafterPackage (with name="edgecrafter"); ECViT presets map to existing DINOv3 ViT-shaped encoder/decoder configs by proj_dim (ecvitt -> 192, ecvitt*/ecvits* -> 256). No new ECViT-specific configs.
- DINOv3LTDETRObjectDetection dispatches on package_helpers.parse_model_name(model_name); accepts both "dinov3" and "edgecrafter" packages.
- For ECViT: use_sta=False (no SpatialPriorModule); fixed patch_size=16 (ECViT-NN uses a ConvPyramidPatchEmbed); no mask_token freeze (ECViT has none); fixed _expected_input_channels=3 (no multi-channel support).
- Pretrained weights load via ECVIT_PRETRAINED_URLS through the new package's get_model (mirrors DINOV3_PACKAGE._maybe_download_weights flow; no hash check on first cut, matching the DINOv3 TODO).

Files:
- new: src/lightly_train/_models/ecvit/ecvit_package.py
- new: src/lightly_train/_task_models/dinov3_ltdetr_object_detection/ecvit_vit_wrapper.py
- new: tests/_task_models/dinov3_ltdetr_object_detection/test_ecvit_vit_wrapper.py
- new: docs/source/pretrain_distill/models/edgecrafter.md
- edit: src/lightly_train/_models/package_helpers.py (register EDGE_CRAFTER_PACKAGE)
- edit: src/lightly_train/_task_models/dinov3_ltdetr_object_detection/task_model.py (dispatch, list_model_names, parse_model_name, constructor)
- edit: src/lightly_train/_task_models/dinov3_ltdetr_object_detection/train_model.py (resolve_auto ECViT branch -> patch_size=16)
- edit: tests/_task_models/dinov3_ltdetr_object_detection/test_task_model.py (ECViT parametrizations)
- edit: docs/source/object_detection.md (LTDETR EdgeCrafter Models subsection)
- edit: docs/source/pretrain_distill/models/index.md (toctree + bullet)

No COCO rows, no mAP table, no Quick Start snippet in docs (per user instruction). Multi-channel input is intentionally not supported for ECViT on this first cut.

Verified locally on Python 3.13:
- make format
- make static-checks (mypy: "Success: no issues found in 557 source files")
- focused tests (63 passed, 2 skipped; skips are pre-existing ONNX tests)
- make test skipped locally per the TRN-2143 pattern; CI will run it.

Remove the three doc files added in the previous commit:
- docs/source/pretrain_distill/models/edgecrafter.md (new file)
- docs/source/pretrain_distill/models/index.md (toctree + bullet)
- docs/source/object_detection.md (LTDETR EdgeCrafter Models subsection)

These will land in a separate docs-only follow-up PR stacked on this one. The ECViTWrapper -> ECViTModelWrapper rename is also deferred to its own follow-up PR for backbone-level naming standardization.

gabrielfruet added 2 commits June 12, 2026 11:42

feat: incorporate ECViT backbones into DINOv3 LTDETR #778
gabrielfruet wants to merge 2 commits into gabriel-trn-2143-port-ecvit-to-lightlytrain from gabriel-trn-2144-add-ecvit-ltdetr-model-support
