[Bug]: Using pytest --forked when CUDA is not generally fork-safe
#3557
Comments
Yeah, +1 to fixing this issue. @sangstar are you familiar with how --forked works? When does it actually fork a new process for running a test?
For a forked process to use CUDA, its parent process must not have invoked CUDA before, which can be a bit tricky to manage, as this can occur during module imports and no test setups in
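To make the constraint concrete, here is a minimal sketch of a guard that could run in the parent process before any test is forked. It assumes the tests use PyTorch; the helper names `cuda_initialized_in_this_process` and `assert_safe_to_fork` are hypothetical and not part of vLLM or pytest. It looks `torch` up in `sys.modules` instead of importing it, so the check itself has no side effects.

```python
import sys


def cuda_initialized_in_this_process() -> bool:
    """Return True if PyTorch has already initialized CUDA in this process."""
    # If torch was never imported here, it cannot have initialized CUDA.
    torch = sys.modules.get("torch")
    if torch is None:
        return False
    # torch.cuda.is_initialized() reports whether PyTorch's CUDA state
    # has been set up in the current process.
    return torch.cuda.is_initialized()


def assert_safe_to_fork() -> None:
    """Fail loudly instead of letting a forked child hit a cryptic CUDA error."""
    if cuda_initialized_in_this_process():
        raise RuntimeError(
            "CUDA was initialized in the parent process; "
            "forked children cannot safely re-initialize it."
        )


if __name__ == "__main__":
    assert_safe_to_fork()
    print("No CUDA context in this process; forking should be safe.")
```

Calling something like this at the end of conftest.py setup would at least turn a late, confusing failure in a child into an early, readable one in the parent.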
Hmm so I guess there are some ideas
So basically 1 is the only viable solution? I also wonder how other ML projects handle these issues. Do you happen to know?
Cleanup fixtures would be a good option to explore. Forking avoids the need for cleanup fixtures, but it causes cryptic errors any time a CUDA context is initialized before forking and a test that uses CUDA is then invoked. Without forking and with a cleanup fixture, each unit test can initialize a CUDA context and then clean it up before the next test.
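A rough sketch of what such a cleanup fixture might look like, assuming PyTorch-based tests. The name `cleanup_after_test` is hypothetical. Note that `torch.cuda.empty_cache()` only releases cached GPU memory; it does not tear down the CUDA context itself, so this is a partial cleanup at best.

```python
import gc
import sys


def cleanup_after_test():
    """Generator-style fixture body: run the test, then free GPU memory."""
    yield  # the test body runs here
    # Drop unreachable Python objects first so their tensors can be freed.
    gc.collect()
    # Only touch torch if a test actually imported it and initialized CUDA.
    torch = sys.modules.get("torch")
    if torch is not None and torch.cuda.is_initialized():
        # Return cached blocks held by PyTorch's allocator to the driver.
        torch.cuda.empty_cache()


# In conftest.py this could be registered for every test with:
#
#     import pytest
#     cleanup = pytest.fixture(autouse=True)(cleanup_after_test)
```

With the fixture registered as autouse, tests run in a single process without --forked, and each one starts after the previous test's cached memory has been released.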
It is in progress @sangstar, and the result seems pretty promising (all model tests pass except 5 of them).
Your current environment
The environment vLLM uses to CI test a PR.
🐛 Describe the bug
Multiple tests for my pull request are failing due to pytest --forked trying to fork new child processes during testing, trying to re-initialize the CUDA context. I'm not sure how other PRs are passing these tests, but I don't know why pytest --forked is explicitly being used when CUDA isn't fork-safe.

This is fine as long as conftest.py doesn't ever initialize CUDA before forking, but this may be difficult to maintain and may be a cause of some sneaky bugs, like from #3487