feat: incorporate ECViT backbones into DINOv3 LTDETR #778
Draft
gabrielfruet wants to merge 2 commits into
Stacks on PR #772 (gabriel-trn-2143-port-ecvit-to-lightlytrain) and wires the existing ECViTWrapper into the DINOv3 LTDETR object detection task model as four new backbones under the `edgecrafter/` package prefix:
- edgecrafter/ecvitt-ltdetr
- edgecrafter/ecvittplus-ltdetr
- edgecrafter/ecvits-ltdetr
- edgecrafter/ecvitsplus-ltdetr
Design:
- Single new EdgeCrafterPackage (with name="edgecrafter"). ECViT presets map to existing DINOv3 ViT-shaped encoder/decoder configs by proj_dim (ecvitt -> 192, ecvittplus/ecvits/ecvitsplus -> 256). No new ECViT-specific configs.
- DINOv3LTDETRObjectDetection dispatches on package_helpers.parse_model_name(model_name) and accepts both the "dinov3" and "edgecrafter" packages.
- For ECViT: use_sta=False (no SpatialPriorModule); fixed patch_size=16 (ECViT-NN uses a ConvPyramidPatchEmbed); no mask_token freeze (ECViT has none); fixed _expected_input_channels=3 (no multi-channel support).
- Pretrained weights load via ECVIT_PRETRAINED_URLS through the new package's get_model (mirrors the DINOV3_PACKAGE._maybe_download_weights flow; no hash check on this first cut, matching the DINOv3 TODO).
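As a minimal sketch of the dispatch and config reuse described above, the snippet below splits a model name into package and preset, then resolves the ECViT-specific settings. The helpers (`parse_model_name` here is a simplified stand-in, plus `resolve_ecvit_options` and `EcvitOptions`) are hypothetical and do not reflect the actual `package_helpers` or task-model code; only the preset-to-`proj_dim` values, the config names, and the fixed ECViT settings come from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass

# ECViT preset -> proj_dim (channel count of every pyramid level), per this PR.
ECVIT_PROJ_DIM = {
    "ecvitt": 192,
    "ecvittplus": 256,
    "ecvits": 256,
    "ecvitsplus": 256,
}

# Existing DINOv3 ViT-shaped config reused for each proj_dim, per this PR.
CONFIG_BY_PROJ_DIM = {
    192: "_DINOv3LTDETRObjectDetectionViTTConfig",
    256: "_DINOv3LTDETRObjectDetectionViTTPlusConfig",
}


@dataclass(frozen=True)
class EcvitOptions:
    """Hypothetical bundle of the ECViT-branch settings."""

    preset: str
    proj_dim: int
    config_name: str
    use_sta: bool = False  # no SpatialPriorModule; ECViT fuses its own pyramid
    patch_size: int = 16  # ConvPyramidPatchEmbed only supports 16
    freeze_mask_token: bool = False  # ECViT has no mask_token
    expected_input_channels: int = 3  # multi-channel input not supported


def parse_model_name(model_name: str) -> tuple[str, str]:
    """Simplified stand-in: split "<package>/<model>" into its two parts."""
    package, sep, model = model_name.partition("/")
    if not sep or not package or not model:
        raise ValueError(f"Expected '<package>/<model>', got {model_name!r}")
    return package, model


def resolve_ecvit_options(model_name: str) -> EcvitOptions:
    """Resolve settings for an edgecrafter/* name; other packages are rejected."""
    package, model = parse_model_name(model_name)
    if package != "edgecrafter":
        # The "dinov3" package keeps its existing code path in the task model.
        raise ValueError(f"Not an EdgeCrafter model: {model_name!r}")
    preset = model.removesuffix("-ltdetr")
    if preset not in ECVIT_PROJ_DIM:
        raise ValueError(f"Unknown ECViT preset: {preset!r}")
    proj_dim = ECVIT_PROJ_DIM[preset]
    return EcvitOptions(
        preset=preset,
        proj_dim=proj_dim,
        config_name=CONFIG_BY_PROJ_DIM[proj_dim],
    )
```

For example, `resolve_ecvit_options("edgecrafter/ecvits-ltdetr")` yields `proj_dim=256` and the `ViTTPlus` config name, while a `dinov3/...` name raises `ValueError`. Keying the config lookup on `proj_dim` rather than on the preset mirrors the PR's point that no ECViT-specific configs are needed.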
Files:
- new: src/lightly_train/_models/ecvit/ecvit_package.py
- new: src/lightly_train/_task_models/dinov3_ltdetr_object_detection/ecvit_vit_wrapper.py
- new: tests/_task_models/dinov3_ltdetr_object_detection/test_ecvit_vit_wrapper.py
- new: docs/source/pretrain_distill/models/edgecrafter.md
- edit: src/lightly_train/_models/package_helpers.py (register EDGE_CRAFTER_PACKAGE)
- edit: src/lightly_train/_task_models/dinov3_ltdetr_object_detection/task_model.py (dispatch, list_model_names, parse_model_name, constructor)
- edit: src/lightly_train/_task_models/dinov3_ltdetr_object_detection/train_model.py (resolve_auto ECViT branch -> patch_size=16)
- edit: tests/_task_models/dinov3_ltdetr_object_detection/test_task_model.py (ECViT parametrizations)
- edit: docs/source/object_detection.md (LTDETR EdgeCrafter Models subsection)
- edit: docs/source/pretrain_distill/models/index.md (toctree + bullet)
No COCO rows, no mAP table, and no Quick Start snippet in the docs (per user instruction). Multi-channel input is intentionally not supported for ECViT on this first cut.
Verified locally on Python 3.13:
- make format
- make static-checks (mypy: "Success: no issues found in 557 source files")
- focused tests (63 passed, 2 skipped; the skips are pre-existing ONNX tests)
- make test skipped locally per the TRN-2143 pattern; CI will run it.
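The `resolve_auto` ECViT branch pins `patch_size=16`, and the PR keeps the transform's image size and `scale_jitter.divisible_by` at multiples of 32 (the stride of the deepest pyramid level). A minimal sketch of that resolution, assuming image sizes are rounded *up* to the next multiple of 32; the function names and the returned dict keys are hypothetical, not the actual `train_model.py` code:

```python
from __future__ import annotations

ECVIT_PATCH_SIZE = 16  # fixed by ECViT-NN's ConvPyramidPatchEmbed
ECVIT_MAX_STRIDE = 32  # stride of the deepest pyramid level (P5)


def round_up_to_multiple(value: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= value."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    # Ceiling division via negated floor division, then scale back up.
    return -(-value // multiple) * multiple


def resolve_ecvit_transform_args(image_size: tuple[int, int]) -> dict[str, object]:
    """Hypothetical ECViT branch of an auto-resolution step."""
    height, width = (round_up_to_multiple(s, ECVIT_MAX_STRIDE) for s in image_size)
    return {
        "patch_size": ECVIT_PATCH_SIZE,
        "image_size": (height, width),
        "scale_jitter_divisible_by": ECVIT_MAX_STRIDE,
    }
```

With this sketch, a requested `(500, 333)` resolves to `(512, 352)`, while `(640, 640)` is left unchanged. Rounding up is only one option; the PR does not state which direction the real branch uses.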
Remove the doc changes to three files from the previous commit:
- docs/source/pretrain_distill/models/edgecrafter.md (new file)
- docs/source/pretrain_distill/models/index.md (toctree + bullet)
- docs/source/object_detection.md (LTDETR EdgeCrafter Models subsection)
These will land in a separate docs-only follow-up PR stacked on this one. The ECViTWrapper -> ECViTModelWrapper rename is also deferred to its own follow-up PR for backbone-level naming standardization.
Summary
Incorporates the EdgeCrafter ECViT backbones into the DINOv3 LTDETR object detection task model, exposing four new model names under the `edgecrafter/` package prefix:
- `edgecrafter/ecvitt-ltdetr`
- `edgecrafter/ecvittplus-ltdetr`
- `edgecrafter/ecvits-ltdetr`
- `edgecrafter/ecvitsplus-ltdetr`
Each is a drop-in replacement for the DINOv3 ViT LTDETR backbones; only the `model` argument changes.
Stacking
This PR stacks on #772 (`gabriel-trn-2143-port-ecvit-to-lightlytrain`), which moved the ECViT backbone port into `src/lightly_train/_models/ecvit/`. #772 must be merged first. The base branch for this PR is `gabriel-trn-2143-port-ecvit-to-lightlytrain`, not `main`.
Design
- Extend the existing `DINOv3LTDETRObjectDetection` task model to dispatch on `package_helpers.parse_model_name(model_name)`. It accepts both the `"dinov3"` and `"edgecrafter"` packages. No new task model class.
- ECViT produces `(P3, P4, P5)` at strides 8/16/32 with per-level channel counts equal to `proj_dim` (192 for `ecvitt`, 256 for the other three). These match the existing DINOv3 ViT-shaped `HybridEncoder`/`RTDETRTransformerv2`/`DFINETransformer` configs by `proj_dim`, so the configs are reused:
  - `ecvitt` → `_DINOv3LTDETRObjectDetectionViTTConfig` (in_channels 192)
  - `ecvittplus`/`ecvits`/`ecvitsplus` → `_DINOv3LTDETRObjectDetectionViTTPlusConfig` (in_channels 256)
- `use_sta=False`. ECViT already fuses its own pyramid internally, so no `SpatialPriorModulev2` is instantiated. The wrapper is a pass-through: `forward(x) → ECViTWrapper(x)`.
- No `mask_token` freeze. ECViT has no `mask_token`, so the DINOv3 ViT branch's `backbone.mask_token.requires_grad = False` line is skipped.
- Fixed `patch_size=16`. ECViT-NN uses a `ConvPyramidPatchEmbed` that only supports `patch_size=16`. The train/val transform's image size and `scale_jitter.divisible_by` are resolved to match (multiples of 32).
- Fixed `_expected_input_channels=3`. Multi-channel input is intentionally not supported for ECViT on this first cut. The constructor ignores `backbone_args["in_chans"]` and `image_normalize`-driven channel counts on the `edgecrafter` branch.
- Pretrained weights load via `ECVIT_PRETRAINED_URLS` through the new `EdgeCrafterPackage.get_model`, which downloads to `Env.LIGHTLY_TRAIN_MODEL_CACHE_DIR` and loads into `ECViTWrapper.backbone` via the existing `_load_backbone_weights` path. No hash check on this first cut (matches the DINOv3 TODO).
Files
New (3):
- `src/lightly_train/_models/ecvit/ecvit_package.py`: `EdgeCrafterPackage` (public name `"edgecrafter"`, alias `ECVIT_PACKAGE`)
- `src/lightly_train/_task_models/dinov3_ltdetr_object_detection/ecvit_vit_wrapper.py`: `ECViTBackboneWrapper` (pass-through adapter)
- `tests/_task_models/dinov3_ltdetr_object_detection/test_ecvit_vit_wrapper.py`: parametrized forward-shape parity, `patch_size=16`, `backbone_model` identity, no `mask_token`
Edited (4):
- `src/lightly_train/_models/package_helpers.py`: register `EDGE_CRAFTER_PACKAGE`
- `src/lightly_train/_task_models/dinov3_ltdetr_object_detection/task_model.py`: dispatch, `list_model_names`, `parse_model_name`, constructor
- `src/lightly_train/_task_models/dinov3_ltdetr_object_detection/train_model.py`: `resolve_auto` ECViT branch → `patch_size=16`
- `tests/_task_models/dinov3_ltdetr_object_detection/test_task_model.py`: ECViT parametrizations (`is_supported_model`, `parse_model_name`, `_create_train_model`, `set_train_mode`, `resolve_auto`)
No Makefile change required: the blanket `add-header` walk stamps the Lightly header on every `.py` file under `src/` unless it is excluded by `-x`, and the new files are not excluded.
Verification (local, Python 3.13)
- `make format` ✓
- `make static-checks` ✓ (mypy: "Success: no issues found in 557 source files")
- Focused tests: 63 passed, 2 skipped (`uv run --frozen pytest tests/_task_models/dinov3_ltdetr_object_detection/test_ecvit_vit_wrapper.py tests/_task_models/dinov3_ltdetr_object_detection/test_task_model.py -q`). The 2 skips are pre-existing ONNX tests that need `onnx`/`onnxruntime`/GPU.
- `make test` was intentionally skipped locally per the established pattern (CI runs it).
Out of scope (intentional)
- The `edgecrafter/` package page (`docs/source/pretrain_distill/models/edgecrafter.md`), the toctree/bullet entry in `docs/source/pretrain_distill/models/index.md`, and the "LTDETR EdgeCrafter (ECViT) Models" subsection in `docs/source/object_detection.md` are deferred to a separate docs-only follow-up PR stacked on top of this one.
- No `ECViTWrapper` → `ECViTModelWrapper` rename in this PR. The backbone-level naming standardization will land in a separate follow-up PR.
- No `*-ltdetr-coco` rows in the docs table; we have no COCO-fine-tuned ECViT weights yet.
- No Quick Start snippet in `object_detection.md` (the DINOv3 snippet applies verbatim; only the `model` argument changes).
- No new `_HybridEncoder*`/`_RTDETRTransformerv2*`/`_DFINETransformer*` configs; the existing DINOv3 ViT-shaped configs are reused.
- No new backbone-wrapper configs (`_RTDETRBackboneWrapperViTT*Config`); those are STA-specific and we have `use_sta=False`.
Related
- #772 (`gabriel-trn-2143-port-ecvit-to-lightlytrain`): moves the ECViT backbone port into `src/lightly_train/_models/ecvit/`. Must be merged first.
- `DINOv3LTDETRObjectDetection` (the class name is unchanged; ECViT is dispatched internally).
Test plan for reviewer
- `uv run --frozen pytest tests/_task_models/dinov3_ltdetr_object_detection/ -q` → expect 63 passed, 2 skipped.
- `uv run --frozen pytest tests/_models/ecvit/ -q` → expect the existing ECViT tests to still pass (no changes to `ecvit.py`).
- `lightly_train.list_models()` should include the four new `edgecrafter/ecvit*-ltdetr` names.
- `lightly_train.load_model("edgecrafter/ecvitt-ltdetr")` should download pretrained weights and instantiate without error (requires network access on the first run).
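For a quick offline sanity check alongside the test plan, the sketch below derives the four expected model names and the per-level feature shapes implied by the design (strides 8/16/32, channels equal to `proj_dim`). The helper names are hypothetical; feeding the output of `lightly_train.list_models()` into `missing_model_names` assumes it returns an iterable of name strings.

```python
from __future__ import annotations

from collections.abc import Iterable

# ECViT preset -> proj_dim, per this PR.
ECVIT_PROJ_DIM = {"ecvitt": 192, "ecvittplus": 256, "ecvits": 256, "ecvitsplus": 256}
PYRAMID_STRIDES = (8, 16, 32)  # P3, P4, P5


def expected_model_names() -> list[str]:
    """The four edgecrafter/*-ltdetr names this PR registers."""
    return [f"edgecrafter/{preset}-ltdetr" for preset in ECVIT_PROJ_DIM]


def missing_model_names(available: Iterable[str]) -> list[str]:
    """Expected names that do not appear in `available`."""
    available_set = set(available)
    return [name for name in expected_model_names() if name not in available_set]


def expected_feature_shapes(
    preset: str, batch: int, height: int, width: int
) -> list[tuple[int, int, int, int]]:
    """NCHW shapes of (P3, P4, P5) for an input of size height x width."""
    max_stride = PYRAMID_STRIDES[-1]
    if height % max_stride or width % max_stride:
        raise ValueError(f"height and width must be multiples of {max_stride}")
    channels = ECVIT_PROJ_DIM[preset]
    return [(batch, channels, height // s, width // s) for s in PYRAMID_STRIDES]
```

Under these assumptions, `expected_feature_shapes("ecvitt", 2, 640, 640)` gives `[(2, 192, 80, 80), (2, 192, 40, 40), (2, 192, 20, 20)]`, which is what a forward-shape check on `ecvitt` would compare against.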