Flux - torchao inference not working #10470

### Describe the bug

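1. Flux with torchao `int8wo` quantization does not work: without CPU offload, the script appears to hang after the pipeline finishes loading.
2. `enable_sequential_cpu_offload()` fails with `AttributeError: 'AffineQuantizedTensor' object has no attribute 'layout_tensor'`.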

![image](https://github.com/user-attachments/assets/fea33615-1d54-4f35-89d7-61917bdcf62c)


### Reproduction

Example taken from the merged PR https://github.com/huggingface/diffusers/pull/10009:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/Flux.1-Dev"
dtype = torch.bfloat16

quantization_config = TorchAoConfig("int8wo")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=dtype,
)
pipe = FluxPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=dtype,
)
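# pipe.to("cuda")

# Uncommented for the "with cpu offload" run in the logs below
# pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()  # decode the VAE in tiles to reduce memory use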

prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
image.save("output.png")
```
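Note that `pipe.to("cuda")` is commented out in the script above, so in the run without offload the components may still be on the CPU, where inference can be slow enough to look like a hang. This is an assumption, not something the logs confirm. A minimal sketch for checking where each component lives, assuming `pipe` is the `FluxPipeline` built above; `component_devices` is a hypothetical helper, not a diffusers API, and it relies only on the pipeline's `components` dict and each model's `device` attribute:

```python
def component_devices(pipe):
    """Map each pipeline component name to the device it reports.

    Hypothetical helper: it only reads `pipe.components` and each component's
    `device` attribute, so components without one (tokenizers, schedulers,
    or None entries) are skipped.
    """
    devices = {}
    for name, component in pipe.components.items():
        device = getattr(component, "device", None)
        if device is not None:
            devices[name] = str(device)
    return devices


if __name__ == "__main__":
    # Duck-typed stand-ins so the sketch runs without downloading Flux;
    # with the real script, call component_devices(pipe) instead.
    class _FakeModel:
        device = "cpu"

    class _FakePipe:
        components = {"transformer": _FakeModel(), "tokenizer": object(), "scheduler": None}

    print(component_devices(_FakePipe()))
```

With the real pipeline, calling `print(component_devices(pipe))` just before `pipe(prompt, ...)` would show whether the transformer and text encoders are still reporting `cpu`.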

### Logs

Without CPU offload, the run gets stuck here:

```shell
(venv) C:\ai1\diffuser_t2i>python FLUX_torchao.py
Fetching 3 files: 100%|█████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
Loading pipeline components...:  29%|████████▊                      | 2/7 [00:00<00:00,  5.36it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:00<00:00,  6.86it/s]
Loading pipeline components...: 100%|███████████████████████████████| 7/7 [00:02<00:00,  2.38it/s]
```

With CPU offload:

```shell
(venv) C:\ai1\diffuser_t2i>python FLUX_torchao.py
Fetching 3 files: 100%|█████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:00<00:00,  6.98it/s]
Loading pipeline components...:  29%|████████▊                      | 2/7 [00:00<00:01,  2.62it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████| 7/7 [00:01<00:00,  4.31it/s]
Traceback (most recent call last):
  File "C:\ai1\diffuser_t2i\FLUX_torchao.py", line 21, in <module>
    pipe.enable_sequential_cpu_offload()
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1179, in enable_sequential_cpu_offload
    cpu_offload(model, device, offload_buffers=offload_buffers)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\big_modeling.py", line 205, in cpu_offload
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 518, in attach_align_device_hook
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 518, in attach_align_device_hook
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 518, in attach_align_device_hook
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 509, in attach_align_device_hook
    add_hook_to_module(module, hook, append=True)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 161, in add_hook_to_module
    module = hook.init_hook(module)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 308, in init_hook
    set_module_tensor_to_device(module, name, "meta")
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\utils\modeling.py", line 355, in set_module_tensor_to_device
    new_value.layout_tensor,
AttributeError: 'AffineQuantizedTensor' object has no attribute 'layout_tensor'

```
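The traceback ends inside accelerate's `set_module_tensor_to_device`, which reads `new_value.layout_tensor` on the torchao `AffineQuantizedTensor` weights. One possible explanation (an assumption, not confirmed in this report) is a version mismatch between the installed accelerate and torchao 0.7.0, where the inner tensor may be exposed under a different attribute name such as `tensor_impl`. A minimal, duck-typed sketch for checking which attribute the quantized weights actually expose; `inspect_quantized_params` is a hypothetical helper, and the attribute names it probes are assumptions:

```python
def inspect_quantized_params(model, attrs=("layout_tensor", "tensor_impl"), limit=1):
    """Report which of `attrs` the first `limit` AffineQuantizedTensor weights expose.

    Hypothetical helper: it only uses `model.named_parameters()` and matches the
    parameter class by name, so it does not import torchao itself. The probed
    attribute names are assumptions about torchao's internals.
    """
    report = {}
    for name, param in model.named_parameters():
        if type(param).__name__ == "AffineQuantizedTensor":
            # hasattr() returns False when the attribute lookup raises AttributeError,
            # which is exactly the error seen in the traceback.
            report[name] = {attr: hasattr(param, attr) for attr in attrs}
            if len(report) >= limit:
                break
    return report
```

With the quantized transformer from the reproduction, `print(inspect_quantized_params(transformer))` prints one entry per inspected weight. If `layout_tensor` comes back `False`, the accelerate code path in the traceback cannot work with these weights as installed.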

### System Info

Windows 11

```
(venv) C:\ai1\diffuser_t2i>python --version
Python 3.10.11

(venv) C:\ai1\diffuser_t2i>echo %CUDA_PATH%
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6

(venv) C:\ai1\diffuser_t2i>pip list
Package            Version
------------------ ------------
accelerate         1.2.0.dev0
aiofiles           23.2.1
annotated-types    0.7.0
anyio              4.7.0
bitsandbytes       0.45.0
certifi            2024.12.14
charset-normalizer 3.4.1
click              8.1.8
colorama           0.4.6
diffusers          0.33.0.dev0
einops             0.8.0
exceptiongroup     1.2.2
fastapi            0.115.6
ffmpy              0.5.0
filelock           3.16.1
fsspec             2024.12.0
gguf               0.13.0
gradio             5.9.1
gradio_client      1.5.2
h11                0.14.0
httpcore           1.0.7
httpx              0.28.1
huggingface-hub    0.25.2
idna               3.10
imageio            2.36.1
imageio-ffmpeg     0.5.1
importlib_metadata 8.5.0
Jinja2             3.1.5
markdown-it-py     3.0.0
MarkupSafe         2.1.5
mdurl              0.1.2
mpmath             1.3.0
networkx           3.4.2
ninja              1.11.1.3
numpy              2.2.1
opencv-python      4.10.0.84
orjson             3.10.13
packaging          24.2
pandas             2.2.3
pillow             11.1.0
pip                23.0.1
protobuf           5.29.2
psutil             6.1.1
pydantic           2.10.4
pydantic_core      2.27.2
pydub              0.25.1
Pygments           2.18.0
python-dateutil    2.9.0.post0
python-multipart   0.0.20
pytz               2024.2
PyYAML             6.0.2
regex              2024.11.6
requests           2.32.3
rich               13.9.4
ruff               0.8.6
safehttpx          0.1.6
safetensors        0.5.0
semantic-version   2.10.0
sentencepiece      0.2.0
setuptools         65.5.0
shellingham        1.5.4
six                1.17.0
sniffio            1.3.1
starlette          0.41.3
sympy              1.13.1
tokenizers         0.21.0
tomlkit            0.13.2
torch              2.5.1+cu124
torchao            0.7.0
torchvision        0.20.1+cu124
tqdm               4.67.1
transformers       4.47.1
typer              0.15.1
typing_extensions  4.12.2
tzdata             2024.2
urllib3            2.3.0
uvicorn            0.34.0
websockets         14.1
wheel              0.45.1
zipp               3.21.0
```
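Only a handful of the packages above are directly involved in this issue (torch, torchao, accelerate, diffusers, transformers). A minimal sketch using the standard library's `importlib.metadata` to print just those versions, which may be easier to compare across machines than the full `pip list`; the package subset is an assumption and can be extended:

```python
from importlib.metadata import PackageNotFoundError, version

# Packages most relevant to this issue (an assumed subset of the pip list above).
RELEVANT_PACKAGES = ("torch", "torchao", "accelerate", "diffusers", "transformers")


def relevant_versions(packages=RELEVANT_PACKAGES):
    """Return {package: installed version string, or None if not installed}."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = version(pkg)
        except PackageNotFoundError:
            versions[pkg] = None
    return versions


if __name__ == "__main__":
    for pkg, ver in relevant_versions().items():
        print(f"{pkg:<14} {ver or 'not installed'}")
```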

### Who can help?

_No response_
