
Any good way to get this running on RTX 20-series cards? #29

realisticdreamer114514 opened this issue Mar 7, 2025 · 1 comment


@realisticdreamer114514

I'm using these flags: `python gradio_server.py --i2v --profile 5 --attention xformers --precision fp16 --server-name 127.0.0.1 --open-browser`
But it still returns this error:

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 49140, 24, 128) (torch.float32)
     key         : shape=(1, 49140, 24, 128) (torch.float32)
     value       : shape=(1, 49140, 24, 128) (torch.float32)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalPaddedKeysMask'>
     p           : 0.0
`[email protected]` is not supported because:
    requires device with capability > (9, 0) but your GPU has capability (7, 5) (too old)
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
    operator wasn't built - see `python -m xformers.info` for more info
`[email protected]` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`cutlassF-pt` is not supported because:
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalPaddedKeysMask'>

This is clearly an old-GPU issue. Can you suggest a config to use, or add code to support older GPUs? Or should I take this to the Hyvideo team's repo if it's out of the scope of your optimizations?
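
For what it's worth, the trace shows the Q/K/V tensors reaching xformers are still `torch.float32` even though I passed `--precision fp16`. A minimal standalone sketch of the dtype constraint (my own snippet with smaller, made-up shapes, not code from this repo):

```python
# Sketch only: xformers' memory-efficient attention on a Turing card
# (compute capability 7.5) only dispatches when Q/K/V are fp16 or bf16;
# the same call with float32 tensors raises the NotImplementedError above.
import torch
import xformers.ops as xops

# hypothetical small shapes (batch, seq_len, heads, head_dim)
q = torch.randn(1, 1024, 24, 128, device="cuda", dtype=torch.float16)
k = torch.randn(1, 1024, 24, 128, device="cuda", dtype=torch.float16)
v = torch.randn(1, 1024, 24, 128, device="cuda", dtype=torch.float16)

out = xops.memory_efficient_attention(q, k, v)  # dispatches on fp16
print(out.shape)

# Casting the inputs to torch.float32 and calling again reproduces
# "No operator found for memory_efficient_attention_forward".
```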

@MarxMelencio

MarxMelencio commented Mar 8, 2025

I used sdpa instead of xformers, didn't compile, and didn't enable on-the-fly quantization. With profile 2 I got it running on my lone 2080 Ti (11 GB). My machine has a 13th-gen i9 CPU and 128 GB RAM, and I run a PyTorch NGC v24.05 Docker container on an Ubuntu 22.04 headless server with PyTorch 2.6, Python 3.10, and CUDA 12.4.

I ran into GPU memory issues, so the simplest way to get it running is a smaller resolution (I just upscale later with another deep-learning computer-vision model), using the model's scaling factor (4:3 / 3:4). I implemented this in /gradio_server.py, since smaller resolutions aren't among the hardcoded options; a rough sketch of the tweak is below. At 512x288, one 2080 Ti (11 GB) and 128 GB RAM can generate 33 frames at a time, and one video takes ~12 minutes on my end.
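
Roughly, the change looks like this (a sketch from memory; the names and structure in the actual gradio_server.py differ, so treat everything here as placeholder):

```python
# Sketch only: extend the hardcoded resolution dropdown with low-VRAM options.
import gradio as gr

# placeholder names -- the real file defines its own resolution list
BASE_RESOLUTIONS = [
    ("1280x720 (16:9)", (1280, 720)),
    ("960x544 (16:9)", (960, 544)),
]

# extra options small enough for an 11 GB card like a 2080 Ti
LOW_VRAM_RESOLUTIONS = [
    ("512x288 (16:9)", (512, 288)),
    ("384x288 (4:3)", (384, 288)),
    ("288x384 (3:4)", (288, 384)),
]

RESOLUTION_CHOICES = BASE_RESOLUTIONS + LOW_VRAM_RESOLUTIONS

resolution = gr.Dropdown(
    choices=[label for label, _ in RESOLUTION_CHOICES],
    value="512x288 (16:9)",
    label="Resolution",
)
```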
