
Any good way to get this running on RTX 20-series cards? #29

realisticdreamer114514 opened this issue Mar 7, 2025 · 1 comment


@realisticdreamer114514

I'm using these flags: `python gradio_server.py --i2v --profile 5 --attention xformers --precision fp16 --server-name 127.0.0.1 --open-browser`
But it still returns this error:

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 49140, 24, 128) (torch.float32)
     key         : shape=(1, 49140, 24, 128) (torch.float32)
     value       : shape=(1, 49140, 24, 128) (torch.float32)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalPaddedKeysMask'>
     p           : 0.0
`[email protected]` is not supported because:
    requires device with capability > (9, 0) but your GPU has capability (7, 5) (too old)
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
    operator wasn't built - see `python -m xformers.info` for more info
`[email protected]` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
`cutlassF-pt` is not supported because:
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalPaddedKeysMask'>

This is clearly an old-GPU issue. Can you suggest a config to use, or add code to support older GPUs? Or should I take this to the Hyvideo team's repo if it's out of the scope of your optimizations?
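
For what it's worth, the trace shows the Q/K/V tensors reaching xformers are still `torch.float32` even though I passed `--precision fp16`. A minimal standalone sketch of the dtype constraint (my own snippet with smaller, made-up shapes, not code from this repo):

```python
# Sketch only: xformers' memory-efficient attention on a Turing card
# (compute capability 7.5) only dispatches when Q/K/V are fp16 or bf16;
# the same call with float32 tensors raises the NotImplementedError above.
import torch
import xformers.ops as xops

# hypothetical small shapes (batch, seq_len, heads, head_dim)
q = torch.randn(1, 1024, 24, 128, device="cuda", dtype=torch.float16)
k = torch.randn(1, 1024, 24, 128, device="cuda", dtype=torch.float16)
v = torch.randn(1, 1024, 24, 128, device="cuda", dtype=torch.float16)

out = xops.memory_efficient_attention(q, k, v)  # dispatches on fp16
print(out.shape)

# Casting the inputs to torch.float32 and calling again reproduces
# "No operator found for memory_efficient_attention_forward".
```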

@MarxMelencio

MarxMelencio commented Mar 8, 2025

I used sdpa instead of xformers, didn't compile, and didn't enable on-the-fly quantization. With profile 2 I got it running on my lone 2080 Ti (11 GB). My machine has a 13th-gen i9 CPU and 128 GB RAM, and I run a PyTorch NGC v24.05 Docker container on an Ubuntu 22.04 headless server with PyTorch 2.6, Python 3.10, and CUDA 12.4.

I ran into GPU memory issues, so the simplest way to get it running is a smaller resolution (I just upscale later with another deep-learning computer-vision model), using the model's scaling factor (4:3 / 3:4). I implemented this in /gradio_server.py, since smaller resolutions aren't among the hardcoded options; a rough sketch of the tweak is below. At 512x288, one 2080 Ti (11 GB) and 128 GB RAM can generate 33 frames at a time, and one video takes ~12 minutes on my end.
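
Roughly, the change looks like this (a sketch from memory; the names and structure in the actual gradio_server.py differ, so treat everything here as placeholder):

```python
# Sketch only: extend the hardcoded resolution dropdown with low-VRAM options.
import gradio as gr

# placeholder names -- the real file defines its own resolution list
BASE_RESOLUTIONS = [
    ("1280x720 (16:9)", (1280, 720)),
    ("960x544 (16:9)", (960, 544)),
]

# extra options small enough for an 11 GB card like a 2080 Ti
LOW_VRAM_RESOLUTIONS = [
    ("512x288 (16:9)", (512, 288)),
    ("384x288 (4:3)", (384, 288)),
    ("288x384 (3:4)", (288, 384)),
]

RESOLUTION_CHOICES = BASE_RESOLUTIONS + LOW_VRAM_RESOLUTIONS

resolution = gr.Dropdown(
    choices=[label for label, _ in RESOLUTION_CHOICES],
    value="512x288 (16:9)",
    label="Resolution",
)
```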
