
[Mixtral / Awq] Add mixtral fused modules for Awq #28240

Merged

Conversation

@younesbelkada (Contributor) commented Dec 25, 2023

What does this PR do?

Adds Mixtral + AWQ fused modules for blazing fast text generation!

from transformers import MixtralForCausalLM, AwqConfig, AutoTokenizer

model_path = "casperhansen/mixtral-instruct-awq"

# Enable AWQ fused modules; fuse_max_seq_len sets the maximum sequence length the fused kernels support.
quantization_config = AwqConfig(
    do_fuse=True,
    fuse_max_seq_len=1024,
)

model = MixtralForCausalLM.from_pretrained(model_path, quantization_config=quantization_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# The Mixtral tokenizer has no pad token by default; reuse the EOS token for batched padding.
tokenizer.pad_token = tokenizer.eos_token

inputs = ["Here are the top 10 useful Hindi phrases for your upcoming trip to India:\n1. ", "Hello my name is"]

inputs = tokenizer(inputs, return_tensors="pt", padding=True).to(0)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

I introduced the same changes in modeling_utils as #28239 for a small issue where modules_to_not_convert was not handled correctly for fused modules.
Users need autoawq>=0.1.8 to use this feature.
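
For illustration only (this is not the actual check performed by the library), a minimal sketch of a version guard for that requirement, assuming autoawq is installed under the distribution name "autoawq":

# Illustrative sketch only: guard the Mixtral fused-module path on the installed autoawq version.
import importlib.metadata
from packaging import version

awq_version = version.parse(importlib.metadata.version("autoawq"))
if awq_version < version.parse("0.1.8"):
    raise ImportError(f"Mixtral fused modules require autoawq>=0.1.8, found {awq_version}")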

cc @casper-hansen

@@ -328,6 +335,8 @@ def _fuse_awq_attention_layers(model, module, modules_to_fuse, current_module_na
previous_device,
modules_to_fuse["max_seq_len"],
use_alibi=modules_to_fuse["use_alibi"],
# The default value in autoawq is set to 10000.0
rope_theta=modules_to_fuse.get("rope_theta", 10000.0),
Contributor Author

This specifically addresses: casper-hansen/AutoAWQ#251 (comment)

Collaborator

Good to have the option to configure. As a general note, matching the default of another library is brittle - it can be changed without us knowing.

Contributor Author

Yes correct, let's keep that in mind, cc @casper-hansen for visibility
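
For illustration, a minimal sketch of the alternative discussed above - reading rope_theta from the Transformers config rather than mirroring autoawq's internal default. This is not the merged implementation; it reuses the Mixtral checkpoint from the PR description and keeps 10000.0 only as a last-resort fallback:

from transformers import AutoConfig

# Sketch only: source rope_theta from the model config so a silent change to
# autoawq's default cannot affect the fused kernels.
config = AutoConfig.from_pretrained("casperhansen/mixtral-instruct-awq")
modules_to_fuse = {"max_seq_len": 1024, "use_alibi": False}
modules_to_fuse["rope_theta"] = getattr(config, "rope_theta", 10000.0)
print(modules_to_fuse["rope_theta"])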

@amyeroberts (Collaborator) left a comment

Thanks for adding this!

Just some questions and comments about the model-specific elements of this PR.


# In case a user passes a `AwqConfig` with `do_fuse=True` for models that have
# a `modules_to_not_convert` attribute we need to manually set that attribute into the
# passed `quantization_config`
elif (
Collaborator

It's not obvious how this change relates to Mixtral here - either from the AWQ fuse mapping or the test. If it's addressing a general bug, we should have a test to cover it.

Contributor Author

It is something that was addressed recently in #28239 - it covers a bug where AWQ does not handle fused modules + modules_to_not_convert properly. I think having the Mixtral and LLaVA tests (the LLaVA test is already there) should be sufficient, as they cover most of the use cases of modules_to_not_convert + fused modules. What do you think?
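
To make the behaviour concrete, here is a hedged sketch of the intent using plain dicts rather than the actual modeling_utils code: when the user passes an AwqConfig with do_fuse=True, the modules_to_not_convert recorded in the checkpoint's quantization config - for Mixtral, typically the MoE gate layers - has to be carried over so the fusing step skips those modules:

def merge_modules_to_not_convert(user_cfg: dict, checkpoint_cfg: dict) -> dict:
    # Sketch only: copy the checkpoint's exclusions into the user-provided config when fusing.
    merged = dict(user_cfg)
    if merged.get("do_fuse") and merged.get("modules_to_not_convert") is None:
        merged["modules_to_not_convert"] = checkpoint_cfg.get("modules_to_not_convert")
    return merged

print(merge_modules_to_not_convert(
    {"do_fuse": True, "fuse_max_seq_len": 1024, "modules_to_not_convert": None},
    {"quant_method": "awq", "modules_to_not_convert": ["gate"]},
))
# -> {'do_fuse': True, 'fuse_max_seq_len': 1024, 'modules_to_not_convert': ['gate']}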

Comment on lines +391 to +394
def test_generation_mixtral_fused(self):
    """
    Text generation test for Mixtral + AWQ + fused
    """
Collaborator

Do we need a model-specific test here? We don't want to have to add tests for every model we cover. It would be better to have tests which cover different functional properties, e.g. A, B, C. Then if any model uses A & C, we know it works.

Contributor Author

This test should be generalizable to all Mixtral models, as the main thing to verify is the interaction between modules_to_not_convert and fused modules for Mixtral!
I can also add a smaller test with a tiny model in addition to this one - if we know that the tiny model is correctly loaded, then other models should load correctly as well - wdyt? I would say this test is also good to have in general, as the underlying things it checks are (see the sketch after this list):

1- Correct conversion of Mixtral to Mixtral fused modules (with modules_to_not_convert being properly set)
2- Generation correctness for Mixtral + fused modules
3- Batched generation correctness for Mixtral + fused modules
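
A rough, hedged sketch of what those three checks could look like (this is not the test added in the PR; it assumes a CUDA GPU, autoawq>=0.1.8, and the checkpoint from the PR description, and it only checks that generation runs rather than asserting an exact expected output):

import unittest
from transformers import AutoTokenizer, AwqConfig, MixtralForCausalLM

class MixtralAwqFusedSketch(unittest.TestCase):
    model_id = "casperhansen/mixtral-instruct-awq"

    def test_fused_conversion_and_batched_generation(self):
        # Item 1: loading with do_fuse=True exercises the conversion to fused modules;
        # mishandling modules_to_not_convert would already surface here.
        quant_config = AwqConfig(do_fuse=True, fuse_max_seq_len=512)
        model = MixtralForCausalLM.from_pretrained(
            self.model_id, quantization_config=quant_config, device_map="auto"
        )
        tokenizer = AutoTokenizer.from_pretrained(self.model_id)
        tokenizer.pad_token = tokenizer.eos_token

        # Items 2 and 3: greedy generation on a batch of prompts runs end to end
        # and returns one decoded sequence per prompt.
        prompts = ["Hello my name is", "The capital of France is"]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
        decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        self.assertEqual(len(decoded), len(prompts))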

Collaborator

Yes please - let's add a more general test for a tiny model to make sure the code works generally: we don't want to overfit to specifics of mixtral but also want to make sure mixtral works.

Contributor Author

Perfect, will do!

Contributor Author

Done!

@younesbelkada (Contributor Author)

Thanks for your review @amyeroberts! I left a few comments and open questions, let me know wdyt! 🙏

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts (Collaborator) left a comment

Thanks for the explanations and iterating on this!

I would like to see a general test for a tiny model to be added. Happy for you to merge once that's committed :)


@younesbelkada (Contributor Author)

Thanks @amyeroberts for all your reviews! I just added the more general test with a tiny model! I will merge the PR and address potential comments in a follow-up PR! 🙏
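
For context, a minimal sketch of what such a general tiny-model smoke test could look like; the checkpoint name below is a placeholder, not the identifier used in the actual test suite, and the assertions are illustrative only:

import unittest
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

TINY_AWQ_MODEL_ID = "<placeholder-tiny-awq-checkpoint>"  # hypothetical; the real suite defines its own tiny model

class TinyAwqFusedSmokeSketch(unittest.TestCase):
    def test_fused_load_and_generate(self):
        # Fusing on a tiny model verifies the generic code path without needing Mixtral-sized weights.
        quant_config = AwqConfig(do_fuse=True, fuse_max_seq_len=128)
        model = AutoModelForCausalLM.from_pretrained(
            TINY_AWQ_MODEL_ID, quantization_config=quant_config, device_map="auto"
        )
        tokenizer = AutoTokenizer.from_pretrained(TINY_AWQ_MODEL_ID)
        inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=5)
        self.assertEqual(outputs.shape[0], 1)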

@younesbelkada younesbelkada merged commit 266c67b into huggingface:main Jan 12, 2024
@younesbelkada younesbelkada deleted the add-mixtral-fused-modules branch January 12, 2024 13:29