Intel Arc Graphics Thread #476
Replies: 65 comments 175 replies
-
UPDATE:
Until Intel updates it, the code needs to be modified like this to avoid errors:
- match operation:
-     case "multiply":
-         output[top:bottom, left:right] = destination_portion * source_portion
-     case "add":
-         output[top:bottom, left:right] = destination_portion + source_portion
-     case "subtract":
-         output[top:bottom, left:right] = destination_portion - source_portion
+ if operation == "multiply":
+     output[top:bottom, left:right] = destination_portion * source_portion
+ elif operation == "add":
+     output[top:bottom, left:right] = destination_portion + source_portion
+ elif operation == "subtract":
+     output[top:bottom, left:right] = destination_portion - source_portion
Of course, leaving it unmodified doesn't seem to affect normal usage.
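For anyone wondering why this change helps: `match` statements only parse on Python 3.10 and newer, so an older interpreter rejects the whole file, while an `if`/`elif` chain runs everywhere. That the Python in use here predates 3.10 is my assumption; the thread does not say. Below is a minimal sketch of the same dispatch, using a hypothetical `blend` helper on plain numbers instead of the real image slices.

```python
def blend(destination, source, operation):
    """Combine two values the way the patched branch does (hypothetical helper)."""
    if operation == "multiply":
        return destination * source
    elif operation == "add":
        return destination + source
    elif operation == "subtract":
        return destination - source
    # Unknown operations are rejected explicitly instead of silently ignored.
    raise ValueError(f"unknown operation: {operation!r}")


print(blend(2.0, 3.0, "multiply"))
```

The same function body works unchanged on NumPy or torch arrays, since it only uses the `*`, `+` and `-` operators.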
-
About the Karras scheduler: using it currently causes a -996 "uses-fp64-math" error, which may be fixed in the next IPEX release (intel/intel-extension-for-pytorch#285). Until then, you can specify device="cpu" when computing the sigmas:
# comfy/samplers.py
- sigmas = k_diffusion_sampling.get_sigmas_karras(n=steps, sigma_min=self.sigma_min, sigma_max=self.sigma_max, device=self.device)
+ sigmas = k_diffusion_sampling.get_sigmas_karras(n=steps, sigma_min=self.sigma_min, sigma_max=self.sigma_max, device="cpu")
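Moving this one call to the CPU should cost very little, because the schedule is just a short list of `steps + 1` numbers. Here is a pure-Python sketch of the Karras schedule formula as I understand it from k-diffusion. The `rho=7.0` default and the example sigma values are assumptions for illustration, and this is not ComfyUI's actual code.

```python
def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    """Return n noise levels from sigma_max down to sigma_min, plus a final 0.0."""
    if n < 2:
        raise ValueError("n must be at least 2")
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    sigmas = []
    for i in range(n):
        ramp = i / (n - 1)  # goes from 0.0 to 1.0 inclusive
        sigmas.append((max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho)
    sigmas.append(0.0)  # sampling ends at zero noise
    return sigmas


# Illustrative values only; Python floats are 64-bit and live on the CPU.
for sigma in karras_sigmas(10, 0.03, 14.6):
    print(f"{sigma:.4f}")
```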
-
Is it Linux only, or can you use this on a Windows 10 machine?
-
So again, sharing the good news: there is now an XPU version of Intel's PyTorch extension with PyTorch 2.0 support. Hats off to ComfyUI for being the only Stable Diffusion UI able to use it at the moment, but from the research I have done there are a bunch of caveats with running Arc and Stable Diffusion right now. As of the time of posting:
1.) Setup can still be complicated in some respects and can randomly fail to work, because Intel is only really verifying enterprise Linux distributions and Ubuntu. My base Fedora 38 install has the RPMs installed, but I get a cryptic error regarding the runtime saying
2.) From what I have read at intel/intel-extension-for-pytorch#398, there is no native Windows version for some wheels to get
3.) Intel Arc currently has a huge problem allocating more than 4GB of VRAM in IPEX, even though the A750/A770 cards have more VRAM, according to intel/intel-extension-for-pytorch#325. Elsewhere in Intel's oneAPI stack there seem to be mitigations: with their OpenCL compute runtime, you can get around the restriction in code you compile yourself via some flags and code changes, according to intel/compute-runtime#627. For IPEX there seem to be no workarounds until someone gets Intel to fix it.
4.) Because of that, from experimenting with other Stable Diffusion UIs, it seems that Arc will occasionally run out of VRAM if you don't use the lower VRAM flags and throw a
5.) Intel has an equivalent to
I suspect Intel GPU support will not be a big priority for the project at the moment, given that the big blocker at this point is the 4GB VRAM limit, which needs to be fixed by Intel. That being said, things should be working a lot better than they are at the moment: maybe not ready for the average user, but ready for any mildly technical person. I really want to play with ComfyUI more, but I really don't want to restart the application server every few images I generate, or roll the dice on whether SDXL actually finishes a workflow. But the Arc cards are strong. I managed to equal the Nvidia RTX 3070 Ti in certain Stable Diffusion workflows with my 16 GB Intel Arc A770, so I look forward to the future when things are more mature and all the stars are aligned. I'm also not sure which issues need to be opened here, but there are potentially 2-3 that could be made from my report. Edit: Added a caveat I forgot to mention, filled in some information, and fixed some typos.
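To give a feel for how quickly a single buffer can hit the 4GB limit from caveat 3, here is a back-of-the-envelope sketch. It assumes an unsliced self-attention score tensor of shape (heads, tokens, tokens), 8 heads, fp16 values and an 8x latent downscale. Those numbers and the `attention_scores_bytes` helper are my assumptions for illustration, not measurements from IPEX or ComfyUI.

```python
FOUR_GIB = 4 * 1024**3


def attention_scores_bytes(width, height, heads=8, bytes_per_value=2, downscale=8):
    """Bytes needed for one (heads, tokens, tokens) score tensor (assumed layout)."""
    tokens = (width // downscale) * (height // downscale)
    return heads * tokens * tokens * bytes_per_value


for side in (512, 768, 1024):
    size = attention_scores_bytes(side, side)
    print(f"{side}x{side}: {size / 1024**3:.2f} GiB, reaches 4 GiB: {size >= FOUR_GIB}")
```

Under these assumptions the size grows with the fourth power of the image side, which would explain why lower-VRAM or split-attention modes matter so much on these cards.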
-
So this took me a bit of time, but I have published the Docker image I used here, so hopefully someone can find it useful. Some things to note that I've found while poking at and experimenting with things. 1.) A lot of the issues mentioned by @kwaa months ago are gone, like the Karras scheduler not working (it works now, even with the new dpmpp_3 schedulers) or the noise issues when not using split attention, which are gone for the most part unless your graphics driver has crashed too many times, in which case restarting the computer fixes it. Anyways, I hope people have as much success with it as I did. I might try to see why ipexrun is failing, but for now I am going to take a break. Edit: Added a caveat I forgot to mention and fixed some typos.
-
In the event anyone else missed it: I seem to be missing the file that gets sourced from /opt/intel, too... may edit this post once I figure it out. |
-
Well, short story: I spent a whole day installing Arch Linux on Windows 10 WSL (WSL 2 in my case). After all the procedures, I got stuck on
The installer reaches 87% and starts rolling back all changes, because intel-oneapi-basekit does not contain the library libtbb.so.12 that is needed to install oneAPI AI Analytics Toolkit v2023. The package that contains the needed library is intel-oneapi-compiler-shared-runtime, but it conflicts with intel-oneapi-basekit. Because the install procedure can't finish with intel-oneapi-basekit, I installed intel-oneapi-compiler-shared-runtime instead. ComfyUI runs: [{user}@pc ArchLinux]$
But if I try to generate an image, I get the following error:
As you can see, I tried parameters like --force-fp16 --bf16-vae --lowvram --use-split-cross-attention --highvram, but they have no effect. So the models are loaded, but KSampler crashes. That's my dead end, because I have no idea what's going on, neither in ComfyUI itself nor, even more so, in Arch Linux. But I suppose this can serve as a kind of report.
-
I don't use Windows, but all the pieces should now be in place to run ComfyUI on Windows without horrible downsides like no AOT compilation or missing packages, since a new unofficial Intel Extension for PyTorch package has been released that bundles everything together and needs no external dependencies installed. It remains to be seen whether there will ever be an official package that does this, so this package is the best chance anyone has at actually using Arc on Windows natively. According to reports, it is a bit faster than WSL2 but slower than native Linux. The steps should be roughly the same as the process outlined in the opening post for Linux, minus platform-specific things.
1.) Make sure you install an Intel driver that is version 4952 or newer. The latest driver can be found here
python -m ensurepip --upgrade
3.) Install git from here using the GUI installer or other means.
cd <location where to place ComfyUI>
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
5.) Download the unofficial Intel Extension for PyTorch packages here and install them with pip. Assuming they are all placed in the root of the ComfyUI repository, run this command:
pip install intel_extension_for_pytorch-2.0.110+git0f2597b-cp310-cp310-win_amd64.whl torch-2.0.0a0+gite9ebda2-cp310-cp310-win_amd64.whl torchvision-0.15.2a0+fa99a53-cp310-cp310-win_amd64.whl
Again, I need to remind you that this is unofficial and carries a degree of risk, but it should be mostly safe.
6.) Finish the rest of the installation:
pip install -r requirements.txt
At this point, the installation should be done. To run ComfyUI each time, open a terminal/command prompt and run the following commands, replacing <> with your own input:
cd <location of ComfyUI>
python main.py <Any extra ComfyUI arguments you want to use>
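One thing that can trip people up with the wheels in step 5 is that their filenames encode the Python version and platform they were built for (`cp310` and `win_amd64` here), and pip refuses wheels that don't match the running interpreter. Here is a small sketch that reads those tags from a filename and compares them with the current Python. The helper names are made up for this example, and `running_python_tag` assumes CPython.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are the compatibility tags.
    _, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    return python_tag, abi_tag, platform_tag


def running_python_tag():
    """Tag of the running interpreter, assuming CPython (e.g. cp310)."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


wheel = "intel_extension_for_pytorch-2.0.110+git0f2597b-cp310-cp310-win_amd64.whl"
python_tag, abi_tag, platform_tag = wheel_tags(wheel)
print(f"wheel targets {python_tag} on {platform_tag}; this interpreter is {running_python_tag()}")
```

If the two tags differ, installing a matching Python version before running pip is probably the simplest way out.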
-
I reinstalled in forced mode and the needed DLL is in place, but I still get this error. Could it be related to folder permissions, or to upper/lower case in "user"?
-
There really needs to be a better, more up-to-date tutorial for this. Much of the information is scattered across the thread, with Linux and Windows instructions mixed together.
-
To install on Ubuntu:
1.) Install the Linux drivers with the instructions provided by Intel here
sudo apt install python3-pip git
3.) Install ComfyUI with the following terminal commands, replacing the <> portion with a location of your choice.
cd <Location to put ComfyUI>
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
4.) Install all the Intel Extension for PyTorch pip packages first with this terminal command:
python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel-extension-for-pytorch==2.0.110+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
5.) Finish the rest of the dependency installation with this terminal command:
pip install -r requirements.txt
Installation should be done at this point. To run ComfyUI, type the following terminal commands, replacing <> with your own input:
cd <location of ComfyUI>
python main.py <Any extra ComfyUI arguments you want to use>
For Linux distros other than Arch Linux or Ubuntu, you will need to manually install the Intel compute runtime, git, pip, and a Python version supported by Intel Extension for PyTorch (3.10 as of this writing, or Intel's Python from the AI Kit), plus any other required dependencies, using your distribution's package manager or install scripts. After that, you should be able to follow step 3 onwards without any issue.
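After the steps above, a quick sanity check before launching ComfyUI can save some confusion. This is only a sketch: `describe_xpu` is a made-up helper, and it assumes the IPEX-based setup from this guide, where importing `intel_extension_for_pytorch` is what makes `torch.xpu` usable.

```python
def describe_xpu():
    """Return a one-line summary of whether PyTorch can see an Intel XPU."""
    try:
        import torch
    except Exception as error:  # missing package or broken native libraries
        return f"torch could not be imported: {error}"
    try:
        import intel_extension_for_pytorch  # noqa: F401  (registers the XPU backend)
    except Exception as error:
        return f"intel_extension_for_pytorch could not be imported: {error}"
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return "no XPU device is visible to PyTorch"
    return f"XPU available, {xpu.device_count()} device(s)"


print(describe_xpu())
```

Run it inside the same environment you use for `python main.py`, since a different Python may see different packages.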
-
I've updated the Arch Linux setup guide for the latest IPEX and ComfyUI, without the oneAPI AI Kit. If anyone wants to uninstall a previously installed AI Kit, here is how:
cd /opt/intel/oneapi/installer
sudo ./installer --action remove --product-id intel.oneapi.lin.aikit.product --product-ver 2023.1.0-31760
# if you don't need it
paru -Rsc libxcrypt-compat
-
New Patch! Thanks to vladmandic/automatic for the code. contributors
Basically, you just need to copy
Then modify
  try:
      import intel_extension_for_pytorch as ipex
      if torch.xpu.is_available():
          xpu_available = True
+         from attention import attention_init
+         ok, e = attention_init()
  except:
      pass
It could partially fix intel/intel-extension-for-pytorch#325. (so f**k you, intel)
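The patch code itself is not reproduced in this thread, so the following is only a sketch of the general idea I assume it relies on: splitting one large piece of work into slices so that no single allocation exceeds a size limit. `split_rows` and its parameters are hypothetical names, not functions from IPEX or vladmandic/automatic.

```python
def split_rows(total_rows, bytes_per_row, limit_bytes):
    """Yield (start, stop) row ranges whose size stays within limit_bytes."""
    if bytes_per_row <= 0:
        raise ValueError("bytes_per_row must be positive")
    if bytes_per_row > limit_bytes:
        raise ValueError("a single row already exceeds the limit")
    rows_per_chunk = limit_bytes // bytes_per_row
    for start in range(0, total_rows, rows_per_chunk):
        yield start, min(start + rows_per_chunk, total_rows)


# Illustrative numbers: 16384 rows of 256 KiB each, kept under 1 GiB per slice.
for start, stop in split_rows(16384, 256 * 1024, 1024**3):
    print(f"process rows {start}..{stop - 1}")
```

Each slice would then be processed on its own and the results stitched back together, trading a bit of speed for staying under the limit.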
-
Cool... Got it working with WSL2 using Arch. I followed the instructions at the top, and also installed jemalloc via paru (root/native environment) and openmp via pip (venv). There are issues with the env vars, as Arch is pulling my Windows env vars through, so I sometimes have to re-run setvars.sh. I get warnings about libpng when I run via python or ipexrun, but everything still works. I generally run through ipexrun but add xpu to the command, i.e.
Apart from some memory overruns, I've had no issues the last few days.
-
Can anybody tell me where Intel stores their patches for PyTorch?
-
For me it varies. I have a 16 GB card and 32 GB of RAM. Once the model is
loaded, it takes 3-6 minutes for 512x512. I haven't done much with 1024.
On Thu, Dec 19, 2024 at 3:20 PM rifux wrote:
Can I ask what are the speeds for SDXL 1024x1024 on your card? Just
curious.
-
does not work for me for some reason...
-
Is there any current fix for the 4GB VRAM error? Windows native, no WSL or Linux.
-
What about support for Battlemage GPUs?
-
Is there anything I can do to make FLUX models work more efficiently with an A770 16 GB? So far I have only managed to run a Q3 checkpoint at a reasonable speed; anything bigger than that fills up all the VRAM and runs very slowly.
-
When using a B580 with ComfyUI, I get the message "IPEX - INFO - Currently split master weight for XPU only supports SGD". It seems that the XPU only supports the SGD optimizer. How should I use the SGD optimizer in ComfyUI?
-
So I decided to try a bare-metal Ubuntu install because of this same issue in WSL.
miniconda3/envs/comfy/lib/python3.12/site-packages/torch/xpu/__init__.py:60: UserWarning: XPU device count is zero! (Triggered internally at /pytorch/c10/xpu/XPUFunctions.cpp:60.)
I tried ipex 2.31, ipex 2.5.1, PyTorch for XPU test (2.6), and PyTorch nightly (2.7); all of them kept giving me the same error. I finally found the issue: it was a Miniconda issue. The problem is the version of libstdc++.so that Miniconda installs on Linux.
The FIX:
System versions:
Miniconda installs libstdc++.so.6.0.29, which makes PyTorch and IPEX unable to find the XPU. If it is 6.0.29 in conda, UPDATE IT!!! Run this in the base env and any conda env in which you are using torch on Intel XPU:
Guessing this is a Linux-only conda issue, because I didn't come across this problem on Windows.
** edited conda version find command due to typo
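To check which libstdc++ version an environment ships before changing anything, something like the sketch below may help. It only lists versions and does not apply the fix. `libstdcxx_versions` is a made-up helper, and looking in `<prefix>/lib` is my assumption about where conda puts the library on Linux.

```python
import sys
from pathlib import Path


def libstdcxx_versions(lib_dir):
    """Return sorted version tuples of libstdc++.so.X.Y.Z files in lib_dir."""
    versions = []
    for path in Path(lib_dir).glob("libstdc++.so.*"):
        parts = path.name[len("libstdc++.so."):].split(".")
        # Skip symlink-style names such as libstdc++.so.6 that lack a full version.
        if len(parts) == 3 and all(part.isdigit() for part in parts):
            versions.append(tuple(int(part) for part in parts))
    return sorted(versions)


# Inspect the environment this interpreter belongs to.
print(libstdcxx_versions(Path(sys.prefix) / "lib"))
```

If the output contains `(6, 0, 29)`, the environment matches the situation described in the post above.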
-
I need help. Everything worked fine until today; I think something got updated when starting ComfyUI, and generation now runs on the CPU (very slowly). I'm on Windows, installed according to this guide: #476 (comment)
-
Hi Simon, I just wanted to circle back on this one to see if you'd had time
to upstream the code you mentioned. Thanks much!
On Mon, Dec 16, 2024 at 4:01 PM Simon Lui wrote:
You need to use ONEAPI_DEVICE_SELECTOR environment variable. I believe
you need ONEAPI_DEVICE_SELECTOR=gpu:0 in your specific scenario. See the
full details for usage here:
https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#oneapi_device_selector
I actually coded something for this to include for ComfyUI similar to
--cuda-device for CUDA that would make using this easier for something
else but I forgot about this until this point which should make this
easier. I will clean it up and upstream it when I have time soon.
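Until something like that is upstreamed, the variable can be set from a small launcher script. The sketch below builds an environment with `ONEAPI_DEVICE_SELECTOR` set. `env_with_device_selector` is a made-up helper, and the value `level_zero:0` is my example following the backend:device form described in the linked document, so substitute whatever selector fits your machine.

```python
import os


def env_with_device_selector(selector, base_env=None):
    """Return a copy of the environment with ONEAPI_DEVICE_SELECTOR set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ONEAPI_DEVICE_SELECTOR"] = selector
    return env


env = env_with_device_selector("level_zero:0")
print(env["ONEAPI_DEVICE_SELECTOR"])

# To launch ComfyUI with it, uncomment and run from the ComfyUI directory:
# import subprocess, sys
# subprocess.run([sys.executable, "main.py"], env=env, check=True)
```

Passing a copied environment to the child process keeps the setting out of your shell, so other programs are not affected.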
-
So I got Hunyuan to run and complete today, and this is the workflow I ran: videos_00001.mp4
I'm running the nightly build of PyTorch.
-
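Not sure if this helps, but regarding the 4GB error: I'm using an SD1.5 model that hasn't been updated in a long time, and generating at around 5121440 triggers the error. I am using , without modifying model_management.py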
-
Prompt executed in 602.83 seconds videos_00001.mp4 |
-
Running GGUF Q4_0, peak memory usage dropped by 25-30% to approximately 30 GB, but generation time went up slightly. videos_00002.mp4
-
I have tested SDXL T2I on the B580 and the A770.
The performance of the B580 is quite impressive: it is almost as fast as an RTX 3090 (or an RTX 4070, I guess).
-
More tests on B580.
FLUX1 Dev.
I could run Flux on B580 with
It is faster than the A770 and the RTX 4060 Ti. However, I couldn't use GGUF quants.
WAN2.1
I could run I2V 480P, and I2V 720P with fewer frames. Based on the ComfyUI example I2V 480P workflow,
Yes, fp8_e4m3fn was slower. Based on the ComfyUI example I2V 720P workflow, I had to change the frame count to 41.
I couldn't use kijai's WAN Wrapper because its code is based on CUDA. I am attaching the results. (I converted them from WEBP to MP4, so there might be some loss.)
I2V_480P.mp4
I2V_720P_44frames.mp4
-
ComfyUI now supports Intel Arc Graphics. (#409)
Since the installation tutorial for Intel Arc Graphics is quite long, I'll write it here first.
Intel Extension for PyTorch is currently only available for Linux, so you will need to have a Linux or WSL environment.
Arch Linux (with paru) is used here as the example operating system.
Install Python and PIP:
Install Intel Compute Runtime and Intel oneAPI Base Kit:
Install ComfyUI:
Install Dependencies (via venv):
python -m venv venv
source venv/bin/activate
pip install torch==2.0.1a0 torchvision==0.15.2a0 intel-extension-for-pytorch==2.0.120+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl-aitools/
pip install -r requirements.txt
For the second start and beyond, venv needs to be reactivated:
source venv/bin/activate
Set oneAPI vars:
source /opt/intel/oneapi/setvars.sh
Running ComfyUI (via python):
Running ComfyUI (via ipexrun):