[Mixtral / Awq] Add mixtral fused modules for Awq #28240
Conversation
@@ -328,6 +335,8 @@ def _fuse_awq_attention_layers(model, module, modules_to_fuse, current_module_na
            previous_device,
            modules_to_fuse["max_seq_len"],
            use_alibi=modules_to_fuse["use_alibi"],
+           # The default value in autoawq is set to 10000.0
+           rope_theta=modules_to_fuse.get("rope_theta", 10000.0),
This specifically addresses: casper-hansen/AutoAWQ#251 (comment)
Good to have the option to configure. As a general note, matching the default of another library is brittle - it can be changed without us knowing.
Yes correct, let's keep that in mind, cc @casper-hansen for visibility
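A minimal sketch (not the actual transformers code) of the idea discussed here: source `rope_theta` from the model config when building the fuse mapping, so the hard-coded 10000.0 only acts as a last-resort fallback rather than a silent dependency on autoawq's internal default. `build_fuse_mapping` is a hypothetical helper name.

```python
def build_fuse_mapping(model, modules_to_fuse):
    """Fill `rope_theta` from the model config when the architecture defines it."""
    # Mixtral/Llama-style configs expose `rope_theta`; only fall back to 10000.0
    # (the value autoawq currently assumes) when the attribute is missing.
    modules_to_fuse.setdefault(
        "rope_theta", getattr(model.config, "rope_theta", 10000.0)
    )
    return modules_to_fuse
```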
Thanks for adding this!
Just some questions and comments about the model-specific elements of this PR.
# In case a user passes a `AwqConfig` with `do_fuse=True` for models that have
# a `modules_to_not_convert` attribute we need to manually set that attribute into the
# passed `quantization_config`
elif (
It's not obvious how this change relates to Mixtral here - either from the AWQ fuse mapping or the test. If it's addressing a general bug, we should have a test to cover it.
It is something that was addressed recently in #28239 - it covers a bug where AWQ + fused modules does not properly handle fused modules + `modules_to_not_convert`. I think having the Mixtral and Llava tests (the Llava test is already there) should be sufficient, as they cover most of the use cases of `modules_to_not_convert` + fused modules. What do you think?
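For readers following along, here is an illustrative sketch of the behaviour described above (not the exact modeling_utils code): when the user passes an `AwqConfig` with `do_fuse=True` but the checkpoint's own quantization config already lists `modules_to_not_convert`, that list is carried over so conversion skips those modules (e.g. the MoE gate in Mixtral). `merge_modules_to_not_convert` is a hypothetical helper name.

```python
def merge_modules_to_not_convert(user_config, checkpoint_config):
    """Copy modules_to_not_convert from the checkpoint config onto the user config."""
    if (
        getattr(user_config, "do_fuse", False)
        and getattr(user_config, "modules_to_not_convert", None) is None
        and getattr(checkpoint_config, "modules_to_not_convert", None) is not None
    ):
        user_config.modules_to_not_convert = checkpoint_config.modules_to_not_convert
    return user_config
```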
def test_generation_mixtral_fused(self):
    """
    Text generation test for Mixtral + AWQ + fused
    """
Do we need a model-specific test here? We don't want to have to add tests for every model we cover. It would be better to have tests which cover different functional properties, e.g. A, B, C. Then if any model uses A & C we know it works.
This test should be generalizable to all Mixtral models, as the main thing to make sure works is the interaction between `modules_to_not_convert` and fused modules for Mixtral!
I can also add a smaller test with a tiny model in addition to this one - if we know that the tiny model is correctly loaded, then other models should be correctly loaded as well - wdyt? I would say this test is also good to have in general, as the underlying things it checks are (see the sketch after this list):
1- Correct conversion of Mixtral to Mixtral fused modules (with `modules_to_not_convert` being properly set)
2- Generation correctness for Mixtral + fused modules
3- Batched generation correctness for Mixtral + fused modules
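A rough sketch of the kind of test covering points 1-3 above; the checkpoint id is a placeholder (not the repo used in the real test suite) and the module path assumes the standard Mixtral layout.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig


def test_generation_mixtral_fused_sketch():
    model_id = "some-org/mixtral-awq-checkpoint"  # placeholder, not a real repo id
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=AwqConfig(do_fuse=True, fuse_max_seq_len=1024),
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token

    # 1) the MoE gate (listed in modules_to_not_convert) should stay a plain Linear
    gate = model.model.layers[0].block_sparse_moe.gate
    assert gate.__class__.__name__ == "Linear"

    # 2) single-prompt generation with fused modules produces text
    inputs = tokenizer("Hello my name is", return_tensors="pt").to(model.device)
    assert tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0])

    # 3) batched generation (padding + fused attention) also works
    batch = tokenizer(
        ["Hello my name is", "The capital of France is"],
        padding=True,
        return_tensors="pt",
    ).to(model.device)
    assert model.generate(**batch, max_new_tokens=10).shape[0] == 2
```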
Yes please - let's add a more general test for a tiny model to make sure the code works generally: we don't want to overfit to specifics of mixtral but also want to make sure mixtral works.
Perfect, will do!
Done!
Co-authored-by: amyeroberts <[email protected]>
Thanks for your review @amyeroberts! I left a few comments and open questions, let me know wdyt! 🙏
Thanks for the explanations and iterating on this!
I would like to see a general test for a tiny model added. Happy for you to merge once that's committed :)
Thanks @amyeroberts for all your reviews! I just added the more general test with a tiny model! I will merge the PR and address any remaining comments in a follow-up PR! 🙏
What does this PR do?
Adds Mixtral + AWQ fused modules for blazing fast text generation!
I introduced the same changes in modeling_utils as #28239 for a small issue where `modules_to_not_convert` was not being handled correctly for fused modules. Users need `autoawq>=0.1.8` to use this feature.
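A usage sketch of loading a Mixtral AWQ checkpoint with fused modules enabled (requires autoawq>=0.1.8 as noted above). The checkpoint id is an assumed example of an AWQ-quantized Mixtral repo; substitute any compatible checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

model_id = "TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ"  # assumed checkpoint name
quantization_config = AwqConfig(do_fuse=True, fuse_max_seq_len=1024)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)

inputs = tokenizer("Hello my name is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```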
cc @casper-hansen