
Consistently getting noise as output with Intel Arc #556

Closed
bvhari opened this issue Apr 23, 2023 · 17 comments

@bvhari
Contributor

bvhari commented Apr 23, 2023

I set up ComfyUI following the tutorial for Intel Arc. However, I am consistently getting noise as output.
System spec: Windows 10 WSL, Ubuntu 22.04.2 LTS, Python 3.10, Arc A770

@bvhari bvhari changed the title Consistently getting noise as output with Intel Arc Graphics Consistently getting noise as output with Intel Arc Apr 23, 2023
@comfyanonymous
Owner

Are you getting this with all samplers/schedulers?

@kwaa
Contributor

kwaa commented Apr 24, 2023

I think this is more likely to be an upstream (Intel Extension for PyTorch, Intel Compute Runtime, etc.) issue.
Anyway, did you start with the --use-split-cross-attention argument?

In general, lower resolutions (e.g. 512x512 or lower) have a lower probability of producing noise.

Possibly related: intel/intel-extension-for-pytorch#325
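
For reference, starting ComfyUI with that flag would look something like this (a sketch assuming the usual main.py entry point; adjust for your setup):

    python main.py --use-split-cross-attention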

@bvhari
Contributor Author

bvhari commented Apr 24, 2023

@comfyanonymous Yes, I tried multiple combinations.
@kwaa Quite possible, as the system started slowing down after 3-4 back-to-back generations. I had to quit Comfy and restart the GPU driver to fix this.
I am already testing at 512x512, with --use-split-cross-attention active.
I tried with a 2 GB model; same result.

@WASasquatch
Contributor

WASasquatch commented Apr 24, 2023

> I think this is more likely to be an upstream (Intel Extension for PyTorch, Intel Compute Runtime, etc.) issue. Anyway, did you start with the --use-split-cross-attention argument?
>
> In general, lower resolutions (e.g. 512x512 or lower) have a lower probability of producing noise.
>
> Possibly related: intel/intel-extension-for-pytorch#325

Anything under 512x512 will actually give you latent noise artifacting and strange results if it's not img2img. I think this is because of the internal upscaling from the source 64x64 latent (for 1.4/1.5; the VAE downsamples pixels 8x, so 512x512 decodes from a 64x64 latent).

@NoAvailableAlias

Can confirm. I finally got XPU acceleration working via the guide from this merge:
#409
But I'm only getting 3.4 it/s (half of what other people are getting in the webui oneapi fork), and it also only outputs noise...
Thanks, Intel.

@kwaa
Contributor

kwaa commented Apr 25, 2023

> But I'm only getting 3.4 it/s (half of what other people are getting in the webui oneapi fork)

I didn't add ipex.optimize; if I do, it reports an error
(or maybe I just didn't find the right place to add it).
IPEX v2.0.0+xpu may solve this problem, but what I'm really hoping for is ipexrun xpu.
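
If/when that launcher lands, usage would presumably be along these lines (purely a sketch; ipexrun ships with IPEX, and its XPU support and flags depend on the release):

    ipexrun xpu main.py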

@kwaa
Contributor

kwaa commented Apr 25, 2023

> Anything under 512x512 will actually give you latent noise artifacting and strange results if it's not img2img. I think this is because of the internal upscaling from the source 64x64 latent (for 1.4/1.5; the VAE downsamples pixels 8x, so 512x512 decodes from a 64x64 latent).

Between the strange results and the pure noise/black images, I think I have to choose the former...

So, Intel: F--k You!

Note: The last time I used ComfyUI, the maximum size was about 768x704; exceeding that would produce a black image.

@bvhari
Contributor Author

bvhari commented Apr 28, 2023

Finally got it working using the wheel files from this tutorial: https://github.com/TotalDay/Intel_ARC_GPU_WSL_Stable_Diffusion_WEBUI
It basically combines the oneapi branch of this fork: https://github.com/jbaboval/stable-diffusion-webui
with some presumably custom-built wheels.
Interestingly, ipex.optimize is enabled in that fork, and the fork worked when I tried it.
So I tried enabling ipex.optimize in Comfy as well. Unfortunately, I am only getting around half the performance of the A1111 fork. However, the Karras schedule is working.
Hopefully the devs here can figure out the reason for the performance discrepancy.

@kwaa
Contributor

kwaa commented Apr 28, 2023

> So I tried enabling ipex.optimize in Comfy as well. Unfortunately, I am only getting around half the performance of the A1111 fork.

Where did you add ipex.optimize?

> However, the Karras schedule is working.

It looks like TotalDay/Intel_ARC_GPU_WSL_Stable_Diffusion_WEBUI ships an IPEX build that has not been released yet, so this is in line with expectations.

Btw, are you now able to generate high resolution images?

@bvhari
Contributor Author

bvhari commented Apr 28, 2023

> Where did you add ipex.optimize?

In comfy/model_management.py, in the function load_model_gpu.
At the beginning of the function I added the declaration:

    global xpu_available

Then, after the line

    real_model.to(get_torch_device())

I added:

    if xpu_available:
        # hand the freshly loaded model to IPEX for XPU-specific optimization
        ipex.optimize(real_model, inplace=True)

YMMV on the improvement from ipex.optimize though.
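
For anyone reproducing this outside ComfyUI, here is a minimal standalone sketch of the same call (the toy model is just a stand-in for ComfyUI's real_model):

    import torch
    import torch.nn as nn
    import intel_extension_for_pytorch as ipex  # registers the "xpu" device

    # Toy stand-in for the loaded diffusion model
    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU()).eval().to("xpu")

    # ipex.optimize applies XPU-specific weight-layout and kernel
    # optimizations; inplace=True mutates the module instead of copying it.
    model = ipex.optimize(model, dtype=torch.float16, inplace=True)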

> Btw, are you now able to generate high resolution images?

Yes, but only up to 1024x1024.
Beyond that, the driver crashes or the output is noise.
I can go up to 1280x1280 in the A1111 DirectML fork.
Maybe I should try the ComfyUI DirectML fork. I might be able to hit 1536x1536, since Comfy has tiled VAE.

@kwaa
Contributor

kwaa commented May 1, 2023

IPEX has released v1.13.120+xpu (why not v2.0.0?); I'll see what I can do.

@simonlui
Contributor

So, good news, months later: Intel finally released an XPU version of their PyTorch extension with PyTorch 2.0 support, v2.0.110+xpu, and it solves the noise issue. You can get something out without much trouble... as long as you are generating one image at a time, with some other caveats. I'll write up a post later in the discussion thread related to this. But the base issue should be solved.

[Screenshot: 2023-08-13 18-53-08]

@BA8F0D39

@simonlui
IPEX v2.0.110+xpu solves the black images and weird noise.
However, generating an image larger than 512x768 makes it all black, even though not all of the VRAM is used.

@simonlui
Contributor

@BA8F0D39 Yeah, I've hit that, but I'm pretty sure it isn't a ComfyUI issue; it's an Intel issue, specifically with how they handle allocation on the GPU in order to preserve their stateful addressing model. I'm currently digging into their stack and have seen the bug reports you filed about 4GB being the maximum you can allocate. But remember, that is the limit for one single allocation. I will probably open an issue or two and update those, so keep an eye out.

In the meantime, you can try to get the program to chunk its allocations into smaller units so it doesn't hit the limit and uses VRAM better. Use FP16 where possible, and use memory-saving nodes in your workflow, such as the testing Tiled VAE Encode/Decode and TomePatchModel nodes. ComfyUI's latest change, as of barely an hour ago, also helps by allowing text encoder weights to be stored in FP16. With that, I am able to use SD1.5 and generate at 768x768, then latent upscale to 1024x1024, without hitting any image corruption or blackout issues.
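
For concreteness, the FP16 part of that is just a launch flag, roughly like this (flag names from ComfyUI at the time; double-check against python main.py --help):

    python main.py --force-fp16 --use-split-cross-attention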


@BrosnanYuen

@simonlui
I looked through the IPEX code base, and it doesn't seem to allocate arrays directly.
I think memory allocation happens in oneDNN, and allocations larger than 4GB are disabled by default.
They deleted "-cl-intel-greater-than-4GB-buffer-required" in oneDNN, the build option that enables arrays larger than 4GB:

oneapi-src/oneDNN@42a1895#diff-21a382a12fc4d58cceb2ab97c73746f53439a1f739f1573ccdd6060ea62949e1

I think we can try to enable 4GB-and-greater allocations using
intel/compute-runtime#627

@simonlui
Contributor

simonlui commented Aug 23, 2023

@BrosnanYuen Both IPEX and oneDNN use SYCL for allocation. See here and here respectively. That's not the main issue here, though. If you read https://github.com/intel/compute-runtime/blob/master/programmers-guide/ALLOCATIONS_GREATER_THAN_4GB.md, you will realize that there are two requirements for making >4GB allocations happen: you need the build flags, true, but you also need to pass a flag or struct along with the allocation function call. The document only covers Level Zero and OpenCL. IPEX and oneDNN both use SYCL instead, and that's the problem, since SYCL has no provision for doing the same thing. I've opened an enhancement report in Intel's LLVM to try to get something propagated through for this limitation, but it will take a long time, if Intel even considers it.

Again, what can be done right now is mitigation, so that IPEX never makes a single allocation above 4GB; that lets more VRAM be used via the strategies mentioned, but hitting the limit is unavoidable for big images, batches, and more complex workflows, which limits what the GPU can do at this time. The other option is for ComfyUI itself to split allocations into 4GB chunks where possible, but I think it is untenable to ask projects that use IPEX to mitigate something that shouldn't be their problem and implement what is essentially manual memory management in Python.
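
To illustrate the chunking idea (purely a sketch; decode_fn stands in for whatever operation would otherwise trigger one huge allocation):

    import torch

    def run_in_chunks(decode_fn, batch, chunk_size=2):
        # Process the batch a few items at a time so no single
        # intermediate tensor approaches the 4GB per-allocation ceiling.
        outputs = []
        for start in range(0, batch.shape[0], chunk_size):
            outputs.append(decode_fn(batch[start:start + chunk_size]))
        return torch.cat(outputs, dim=0)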

@bvhari
Contributor Author

bvhari commented Aug 8, 2024

Closing, as this has long been fixed.

@bvhari bvhari closed this as completed Aug 8, 2024