xFormers memory-efficient attention not supported on CPU  #310

Description

@Wasiq1123

I’m trying to run Depth-Anything-V2 with xFormers on my system (CPU only).
I get the following error:

NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
query : shape=(1, 1531, 6, 64) (torch.float32)
key : shape=(1, 1531, 6, 64) (torch.float32)
value : shape=(1, 1531, 6, 64) (torch.float32)
attn_bias : <class 'NoneType'>
p : 0.0
fa3F@2.8.3-133-gde1584b is not supported because:
device=cpu (supported: {'cuda'})
dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})

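According to the trace, the listed operator (fa3F) only supports CUDA devices with bfloat16 or float16 inputs, so xFormers memory-efficient attention cannot run on my CPU with float32 tensors.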
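To make the two rejection reasons concrete, here is a minimal sketch of a backend check, not anything taken from xFormers or Depth-Anything-V2. The names `unsupported_reasons`, `choose_attention_backend`, and `sdpa_fallback` are hypothetical, and the supported device/dtype sets are copied only from the fa3F lines in the trace above (other xFormers operators may have different constraints). The fallback assumes PyTorch 2.0 or newer, where `torch.nn.functional.scaled_dot_product_attention` exists.

```python
# Constraints copied from the fa3F lines of the error message (assumption:
# other xFormers operators are not covered by this sketch).
SUPPORTED_DEVICES = {"cuda"}
SUPPORTED_DTYPES = {"torch.bfloat16", "torch.float16"}


def unsupported_reasons(device_type: str, dtype_name: str) -> list[str]:
    """Return the reasons the operator would reject these inputs (empty if none)."""
    reasons = []
    if device_type not in SUPPORTED_DEVICES:
        reasons.append(f"device={device_type}")
    if dtype_name not in SUPPORTED_DTYPES:
        reasons.append(f"dtype={dtype_name}")
    return reasons


def choose_attention_backend(device_type: str, dtype_name: str,
                             xformers_installed: bool = True) -> str:
    """Pick 'xformers' only when nothing blocks it, else plain PyTorch attention."""
    if xformers_installed and not unsupported_reasons(device_type, dtype_name):
        return "xformers"
    return "torch_sdpa"


def sdpa_fallback(q, k, v):
    """Attention for tensors shaped (batch, seq, heads, head_dim), usable on CPU."""
    import torch.nn.functional as F  # lazy import; needs PyTorch >= 2.0

    # xFormers uses (batch, seq, heads, dim); SDPA expects (batch, heads, seq, dim).
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2)  # back to the xFormers layout


# The inputs from the trace: CPU tensors in float32.
print(unsupported_reasons("cpu", "torch.float32"))
print(choose_attention_backend("cpu", "torch.float32"))
```

With the inputs from my trace, both checks fail, so the sketch selects the plain PyTorch path. Whether Depth-Anything-V2 exposes a switch for this is exactly what I am asking about.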

My environment:

  • PyTorch 2.5.7 / 2.8.3
  • xFormers installed from pip
  • Running on CPU only (no GPU available)
  • Python 3.10, Ubuntu 22.04

Question:
Is there a way to run Depth-Anything-V2 on a CPU-only machine? Do I have to disable memory-efficient attention, and if so, how do I do that?

Below is my code:

# If this file raises an import error, run it from this directory: /testing_model/depth_models/src/Depth-Anything-V2

import cv2
import torch
import sys
sys.path.append('/home/wasiq/testing_model/depth_models/src/Depth-Anything-V2')
from depth_anything_v2.dpt import DepthAnythingV2

model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]}
}

encoder = 'vitb' # or 'vits', 'vitl'
dataset = 'hypersim' # 'hypersim' for indoor model, 'vkitti' for outdoor model
max_depth = 20 # 20 for indoor model, 80 for outdoor model

# Build the model once, passing max_depth for the metric-depth checkpoint.
model = DepthAnythingV2(**{**model_configs[encoder], 'max_depth': max_depth})
model.load_state_dict(torch.load(f'/home/wasiq/testing_model/depth_models/src/Depth-Anything-V2/metric_depth/checkpoints/depth_anything_v2_metric_{dataset}_{encoder}.pth', map_location='cpu'))
model.eval()

raw_img = cv2.imread('your/image/path')
depth = model.infer_image(raw_img) # HxW depth map in meters in numpy
