xFormers not available → model imports OK, but infer_image returns all-zero depth map (min=0.0, max=0.0) #312

Hi @heyoeyo — thanks for the help earlier. Uninstalling xFormers removed the import error, but now I'm seeing the model produce an all-zero depth map.

### Environment
- OS: Ubuntu 22.04
- Python: 3.10
- Device: CPU only (no CUDA; torch.device -> 'cpu')
- xFormers: uninstalled (or disabled via `XFORMERS_DISABLE_MEMORY_EFFICIENT_ATTENTION=1`)
- Repo version: Depth-Anything-V2 (local copy)

### Encoder / checkpoint
- Encoder used: `vits` (I also tested `vitb`, `vitg` variants)
- Checkpoint: `metric_depth/checkpoints/depth_anything_v2_metric_hypersim_vits.pth` (loaded with `map_location='cpu'`)

### Minimal repro (my code)
```python
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
    'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
}

encoder = 'vits'
dataset = 'hypersim'
model = DepthAnythingV2(**model_configs[encoder])
checkpoint_path = f'/path/to/checkpoints/depth_anything_v2_metric_{dataset}_{encoder}.pth'
model.load_state_dict(torch.load(checkpoint_path, map_location='cpu'))
model = model.to(DEVICE).eval()

image_path = "/home/wasiq/Pictures/2m_Depth_Distance.jpeg"
raw_img = cv2.imread(image_path)
depth = model.infer_image(raw_img)
print(f"The minimum depth is {depth.min()}")
print(f"The maximum depth is {depth.max()}")
```

### Observed behavior
- The warning `xFormers not available` is printed twice in the logs.
- The script then prints `The minimum depth is 0.0`
- and `The maximum depth is 0.0`

### Expected behavior
- A non-constant depth map with varied values (for my test image, the ground truth near a marked point is ~2.0 m).
- `depth.min() < depth.max()`, with meaningful spatial variation.
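
The expected-behavior condition above can be written as a small check. This is only a sketch: `is_degenerate_depth` and its `eps` tolerance are names made up for illustration, not part of the Depth-Anything-V2 repo, and it assumes `depth` is a NumPy array like the one `infer_image` returned in the repro.

```python
import numpy as np

def is_degenerate_depth(depth, eps=1e-6):
    """Return True if the depth map is empty, non-finite, or (near-)constant."""
    depth = np.asarray(depth, dtype=np.float64)
    if depth.size == 0 or not np.all(np.isfinite(depth)):
        return True
    # A usable depth map should span more than eps between min and max.
    return float(depth.max() - depth.min()) <= eps

# The all-zero map from this report counts as degenerate.
zeros = np.zeros((4, 4), dtype=np.float32)
varied = np.linspace(0.5, 2.0, 16, dtype=np.float32).reshape(4, 4)
print(is_degenerate_depth(zeros))   # True
print(is_degenerate_depth(varied))  # False
```

It only flags constant or invalid output; it does not say whether the metric scale (the ~2.0 m reading) is correct.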

### Diagnostics I already tried
- Confirmed the checkpoint loads without crashing (no obvious exceptions), but I haven't validated the `state_dict` key names yet.
- Tried different encoder names (`vits`, `vitb`), each matching its checkpoint filename.
- Ran with xFormers either uninstalled or disabled via `XFORMERS_DISABLE_MEMORY_EFFICIENT_ATTENTION=1`.
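
For the key-name validation that is still outstanding, something like the following could work. It is a sketch under assumptions: `diff_state_dict_keys` is a hypothetical helper, not a repo function, and the key names in the example are made up. It compares names only and says nothing about tensor shapes or values.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Compare parameter names expected by a model with those in a checkpoint.

    Returns (missing, unexpected): names the model expects but the checkpoint
    lacks, and names the checkpoint has but the model does not use.
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected

# Toy example with made-up key names (not real Depth-Anything-V2 parameters).
model_keys = ["encoder.weight", "head.weight", "head.bias"]
checkpoint_keys = ["encoder.weight", "head.weight", "extra.scale"]
missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)        # missing: ['head.bias']
print("unexpected:", unexpected)  # unexpected: ['extra.scale']
```

With the repro above, the inputs would be `model.state_dict().keys()` and the keys of the object returned by `torch.load(checkpoint_path, map_location='cpu')`, assuming the file stores a flat state dict. Since `load_state_dict` defaults to `strict=True` and did not raise, both lists should come back empty; the check would only confirm that.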

