vllm development does not work for tensor-parallel > 1 #2619
NVIDIA device info:
lroberts@GPU77B9:~/llm_quantization$ nvidia-smi
Fri Jan 26 22:08:29 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:07:00.0 Off | 0 |
| N/A 34C P0 74W / 400W| 62027MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:0A:00.0 Off | 0 |
| N/A 31C P0 65W / 400W| 3MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:47:00.0 Off | 0 |
| N/A 32C P0 64W / 400W| 3MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:4D:00.0 Off | 0 |
| N/A 35C P0 68W / 400W| 3MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-80GB On | 00000000:87:00.0 Off | 0 |
| N/A 36C P0 68W / 400W| 3MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM4-80GB On | 00000000:8D:00.0 Off | 0 |
| N/A 33C P0 69W / 400W| 3MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM4-80GB On | 00000000:C7:00.0 Off | 0 |
| N/A 32C P0 66W / 400W| 3MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM4-80GB On | 00000000:CA:00.0 Off | 0 |
| N/A 35C P0 67W / 400W| 3MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 3727132 C python3.10 62014MiB |
+---------------------------------------------------------------------------------------+
I also hit this issue.
Looks like this is fixed in Ray 2.9 ray-project/ray#41913 (comment). Try upgrading Ray? We will make sure to lower bound the Ray version as well.
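For anyone else hitting this, the upgrade itself is just the usual pip bump; the lower bound here is an assumption based on the linked Ray fix, not an official pin:
pip install --upgrade "ray>=2.9"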
I have tried
That seems to be a different issue; please open another ticket and I can try reproducing it. Update: I did try to reproduce it with the latest main branch:
As a workaround, I think you could pass the flag
I believe #2642 might fix "resource already mapped"; please try with the latest main. Sorry about the back and forth.
@simon-mo This works for me. Some stdout:
It also works for tensor-parallel 8 (all the GPUs on this machine). Specs of the A100 machine I'm using:
Python env:
lroberts@GPU77B9:~/update-vllm-env/vllm-source/vllm$ python -c "import vllm, ray, torch, pydantic; print(vllm.__version__); print(ray.__version__); print(torch.__version__); print(pydantic.__version__)"
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.1.0) or chardet (5.2.0) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
0.2.7
2.9.1
2.1.2+cu121
2.6.0
lroberts@GPU77B9:~/update-vllm-env/vllm-source/vllm$ git log -n 2
commit ea8489fce266d69f2fbe314c1385956b1a342e12 (HEAD -> main, origin/main, origin/HEAD)
Author: Rasmus Larsen <[email protected]>
Date: Mon Jan 29 19:52:31 2024 +0100
ROCm: Allow setting compilation target (#2581)
commit 1b20639a43e811f4469e3cfa543cf280d0d76265
Author: Hanzhi Zhou <[email protected]>
Date: Tue Jan 30 02:46:29 2024 +0800
No repeated IPC open (#2642)

I don't think the in-place error (#2620) is resolved, though. I still see that one.
I have a local dev build (on a recent main commit), and I have some local code that is a thin wrapper around the LLM class; a minimal sketch of what that looks like is below.
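The wrapper does nothing special; it is roughly equivalent to the following sketch (the class name, model, prompts, and sampling settings here are illustrative placeholders, not my exact code):

from vllm import LLM, SamplingParams

class ThinLLMWrapper:
    # Hypothetical sketch: construct a vllm.LLM with a given tensor_parallel_size
    # and expose a simple generate() helper.
    def __init__(self, model: str, tensor_parallel_size: int = 1):
        self.llm = LLM(model=model, tensor_parallel_size=tensor_parallel_size)

    def generate(self, prompts, max_tokens: int = 128):
        params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
        outputs = self.llm.generate(prompts, params)
        # Each RequestOutput holds one or more completions; take the first.
        return [out.outputs[0].text for out in outputs]

# tensor_parallel_size=2 is the failing case; tensor_parallel_size=1 works.
wrapper = ThinLLMWrapper("facebook/opt-125m", tensor_parallel_size=2)
print(wrapper.generate(["Hello, my name is"]))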
If I run this with tensor-parallel == 2, I get the following error; however, tensor-parallel == 1 works fine, with this response:
The error in the Ray logs indicates some serialization problem.
Relevant details about the env:
It seems there is a known fix or workaround here: ray-project/ray#41913 (comment), but it seems that pydantic version 2 is necessary for the OpenAI testing:
vllm/requirements.txt, line 11 (at 3a0e1fc)
Is there a suggested workaround, or should I manually downgrade pydantic to a version lower than 2.0.0?
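For reference, the manual downgrade I have in mind would just be something like the following, though it presumably conflicts with the pydantic pin in requirements.txt mentioned above:
pip install "pydantic<2.0.0"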