
Int8 pipeline parallelism #1482

Open
psinger opened this issue Jan 22, 2025 · 2 comments
Labels
High Risk (Risk of bugs in transformers and other libraries), medium priority (will be worked on after all high priority issues)

Comments

psinger commented Jan 22, 2025

I am trying to work with CUDA streams for pipeline parallelism, i.e., executing different parts of a model at the same time on different GPUs.
With int4, float16, and bfloat16, everything seems to work as expected.

However, with int8 something appears to be blocking, and the GPUs execute sequentially.
Since int4 works, I am wondering if anyone knows whether there is some blocking operation in the int8 path.

Thanks!
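A minimal sketch of the setup being described, assuming PyTorch (the `pipeline_forward` helper and stage layout are illustrative, not code from this issue): each stage runs on its own device's CUDA stream so stages can overlap, falling back to plain sequential execution on CPU.

```python
import torch
import torch.nn as nn


def pipeline_forward(stages, x):
    """Push activations through model stages placed on different devices.

    Each CUDA device gets its own stream so work on different GPUs can be
    enqueued concurrently; on CPU we simply run sequentially. This is only
    a sketch: a real pipeline would also interleave microbatches and use
    cross-stream events for correctness.
    """
    streams = {}
    for stage in stages:
        dev = next(stage.parameters()).device
        if dev.type == "cuda" and dev not in streams:
            streams[dev] = torch.cuda.Stream(device=dev)

    out = x
    for stage in stages:
        dev = next(stage.parameters()).device
        out = out.to(dev)
        if dev.type == "cuda":
            # Enqueue this stage's kernels on the device's dedicated stream.
            with torch.cuda.stream(streams[dev]):
                out = stage(out)
        else:
            out = stage(out)
    return out
```

If any op inside a stage forces a host-device synchronization (as suspected for int8 here), the streams stop overlapping and execution serializes regardless of this scheduling.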

@TimDettmers (Collaborator)

For int4, are you also using bitsandbytes code, or is this only for int8? There are some operations in bitsandbytes that force the CUDA device before C calls, because this previously introduced bugs. It might be that this is causing your problem.

This behavior was changed in 0.45. Can you check your bitsandbytes version and see if the problem persists with the newer version?
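A quick way to check whether an installed version predates the 0.45 behavior change (the `version_tuple` / `needs_upgrade` helpers are illustrative, not part of bitsandbytes):

```python
def version_tuple(v: str) -> tuple:
    """Parse a 'major.minor.patch' version string into a comparable tuple,
    ignoring non-numeric suffixes like '.dev0'."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(installed: str, minimum: str = "0.45.0") -> bool:
    """True if the installed bitsandbytes predates the 0.45 change."""
    return version_tuple(installed) < version_tuple(minimum)


# To check locally (assumes bitsandbytes is installed):
# import bitsandbytes
# print(needs_upgrade(bitsandbytes.__version__))
```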

@TimDettmers added the High Risk (Risk of bugs in transformers and other libraries) and medium priority (will be worked on after all high priority issues) labels on Feb 28, 2025
@matthewdouglas (Member)

One further thing to note is that int8 has a host-device synchronization that is forced when decomposing the problem into separate int8 and fp16 matmuls. Using threshold=0.0 should avoid that, and will be faster in general, at the potential cost of accuracy.
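When loading an int8 model through Hugging Face transformers, that threshold is exposed as `llm_int8_threshold` on `BitsAndBytesConfig`; a sketch (the model id is a placeholder):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# llm_int8_threshold=0.0 disables the mixed int8/fp16 outlier
# decomposition, avoiding the forced host-device synchronization
# described above, at a potential cost in accuracy.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=0.0,
)

model = AutoModelForCausalLM.from_pretrained(
    "some/model-id",  # placeholder: substitute your model
    quantization_config=quant_config,
)
```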
