xFormers not available → model imports OK, but infer_image returns all-zero depth map (min=0.0, max=0.0) #312

@Wasiq1123

Hi @heyoeyo — thanks for the help earlier. Uninstalling xFormers removed the import error, but now I'm seeing the model produce an all-zero depth map.

### Environment

  • OS: Ubuntu 22.04
  • Python: 3.10
  • Device: CPU only (no CUDA; torch.device -> 'cpu')
  • xFormers: uninstalled (or disabled via XFORMERS_DISABLE_MEMORY_EFFICIENT_ATTENTION=1)
  • Repo version: Depth-Anything-V2 (local copy)

### Encoder / checkpoint

  • Encoder used: vits (I also tested vitb, vitg variants)
  • Checkpoint: metric_depth/checkpoints/depth_anything_v2_metric_hypersim_vits.pth (loaded with map_location='cpu')

### Minimal repro (my code)

import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
    'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
}

encoder = 'vits'
dataset = 'hypersim'
model = DepthAnythingV2(**model_configs[encoder])
checkpoint_path = f'/path/to/checkpoints/depth_anything_v2_metric_{dataset}_{encoder}.pth'
model.load_state_dict(torch.load(checkpoint_path, map_location='cpu'))
model = model.to(DEVICE).eval()

image_path = "/home/wasiq/Pictures/2m_Depth_Distance.jpeg"
raw_img = cv2.imread(image_path)
depth = model.infer_image(raw_img)
print(f"The minimum depth is {depth.min()}")
print(f"The maximum depth is {depth.max()}")
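The repro does not show the working directory, so it is unclear which depth_anything_v2 package the import resolves to. The sketch below is one way to check this. It assumes the local copy follows the upstream layout, in which metric_depth/ ships its own depth_anything_v2 package whose DepthAnythingV2 constructor takes a max_depth argument. The helper name describe_model_class is hypothetical and not part of the repo.

```python
import inspect
import sys


def describe_model_class(cls, parameter="max_depth"):
    """Report where a class was imported from and whether its constructor accepts `parameter`."""
    module = sys.modules.get(cls.__module__)
    # __file__ is missing for built-in or interactively defined modules, hence the default.
    source_file = getattr(module, "__file__", None)
    accepts = parameter in inspect.signature(cls.__init__).parameters
    return source_file, accepts


# Hypothetical usage with the class from the repro:
#   from depth_anything_v2.dpt import DepthAnythingV2
#   print(describe_model_class(DepthAnythingV2))
```

If the second value is False, the imported class is probably not the metric variant (under the layout assumption above). That would be worth ruling out before looking at xFormers.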

### Observed behavior

  • "xFormers not available" is printed twice in the logs.
  • The minimum depth is 0.0
  • The maximum depth is 0.0

### Expected behavior

  • A non-constant depth map with varied values (for my test image, the ground-truth distance near a marked point is ~2.0 m)
  • depth.min() < depth.max(), with meaningful spatial variation
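The expectation can be made a little more concrete than comparing min and max. The following is a minimal sketch, assuming depth is the NumPy array returned by infer_image. The function name summarize_depth is hypothetical.

```python
import numpy as np


def summarize_depth(depth):
    """Collect basic statistics that separate a degenerate depth map from a usable one."""
    depth = np.asarray(depth, dtype=np.float64)
    return {
        "shape": depth.shape,
        "min": float(depth.min()),
        "max": float(depth.max()),
        "std": float(depth.std()),
        # Fraction of pixels that are exactly zero; 1.0 matches the reported failure.
        "zero_fraction": float(np.mean(depth == 0)),
        # False if any pixel is NaN or infinite.
        "all_finite": bool(np.isfinite(depth).all()),
        "is_constant": bool(depth.max() == depth.min()),
    }


# Hypothetical usage with the array from the repro:
#   print(summarize_depth(depth))
```

A zero_fraction of exactly 1.0 with all_finite True means the output is being clamped or zeroed somewhere, which is different from a NaN-producing numerical failure.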

### Diagnostics I already tried

  • Confirmed that the checkpoint loads without crashing (no obvious exceptions), but I haven't validated the state_dict key names yet.
  • Tried different encoder names (vits, vitb), each matching its checkpoint filename.
  • Either set XFORMERS_DISABLE_MEMORY_EFFICIENT_ATTENTION=1 or uninstalled xFormers.
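The outstanding state_dict key check could look like the sketch below. It only compares key names, and the helper compare_state_dict_keys is hypothetical. Note that load_state_dict defaults to strict=True, so a load that finishes without an exception already implies the names match. Matching names still say nothing about parameter-free layers such as activations, which never appear in a state_dict.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Return keys the model expects but the checkpoint lacks, and vice versa."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    # Expected by the model, absent from the checkpoint file.
    missing = sorted(model_keys - checkpoint_keys)
    # Present in the checkpoint file, unknown to the model.
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical usage with the objects from the repro:
#   state = torch.load(checkpoint_path, map_location='cpu')
#   missing, unexpected = compare_state_dict_keys(model.state_dict().keys(), state.keys())
#   print(len(missing), "missing;", len(unexpected), "unexpected")
```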
