
remove hardcoded device="cuda" to support more device #2503

Merged: 10 commits merged into vllm-project:main on Feb 1, 2024

Conversation

jikunshang (Contributor)

Referring to #1948: a lot of the code uses cuda as the device, especially in tensor creation, which makes it hard to add support for other devices. This PR refactors the code to leave an interface so that new devices like cpu or xpu can be added more easily.
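To make the pattern concrete, here is a hedged before/after illustration (a minimal sketch, not a verbatim excerpt from vLLM):

```python
import torch

# Before: the device is hardcoded at every tensor-creation site.
# (Commented out because it requires a CUDA build to run.)
#   mask = torch.zeros(8, dtype=torch.bool, device="cuda")

# After: the device comes from configuration, so cpu or xpu also work.
device = torch.device("cpu")  # e.g., taken from a device config object
mask = torch.zeros(8, dtype=torch.bool, device=device)
```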

@WoosukKwon (Collaborator) left a comment:

@jikunshang Thanks for submitting the PR! While the code looks good overall, I have two concerns:

  1. I believe the device should be automatically detected instead of being declared by users. This also aligns with our current design. While I don't know a good way to implement this, I feel there should be a way to do it as long as the device is supported by PyTorch.
  2. I believe device should not be an attribute of ModelConfig. Can we make a new config class like DeviceConfig? (A sketch of one possible shape follows below.)
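
For illustration, a minimal sketch of what such a DeviceConfig with automatic detection might look like (the detection logic here is an assumption for the sketch, not the code that was eventually merged):

```python
import torch

class DeviceConfig:
    """Sketch: holds the target device, auto-detected by default."""

    def __init__(self, device: str = "auto") -> None:
        if device == "auto":
            # Prefer an accelerator PyTorch can see, else fall back to CPU.
            if torch.cuda.is_available():
                device = "cuda"
            elif hasattr(torch, "xpu") and torch.xpu.is_available():
                device = "xpu"
            else:
                device = "cpu"
        self.device = torch.device(device)

# Usage: DeviceConfig().device resolves to cuda, xpu, or cpu.
```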

@@ -147,18 +148,21 @@ def _prepare_prompt(
         input_tokens = _make_tensor_with_pad(input_tokens,
                                              max_prompt_len,
                                              pad=0,
-                                             dtype=torch.long)
+                                             dtype=torch.long,
+                                             device=self.device)
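
For context, a rough sketch of what a device-aware _make_tensor_with_pad helper could look like (an illustration under assumed semantics, not the vLLM source):

```python
from typing import List

import torch

def _make_tensor_with_pad(x: List[List[int]],
                          max_len: int,
                          pad: int,
                          dtype: torch.dtype,
                          device: torch.device = torch.device("cpu")
                          ) -> torch.Tensor:
    # Right-pad each row to max_len, then build the tensor on the target device.
    padded = [row + [pad] * (max_len - len(row)) for row in x]
    return torch.tensor(padded, dtype=dtype, device=device)
```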
Collaborator:

Can we use the _set_default_torch_device context manager here to not repeat device=self.device?
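
For reference, a minimal sketch of such a context manager (a hypothetical helper; it assumes PyTorch >= 2.0, where torch.device acts as a context manager that routes factory calls to the given device):

```python
from contextlib import contextmanager

import torch

@contextmanager
def _set_default_torch_device(device: torch.device):
    # In PyTorch >= 2.0, entering a torch.device context routes factory
    # functions (torch.empty, torch.zeros, torch.tensor, ...) to `device`.
    with torch.device(device):
        yield

# Hypothetical usage inside _prepare_prompt:
#   with _set_default_torch_device(self.device):
#       input_tokens = _make_tensor_with_pad(input_tokens, max_prompt_len,
#                                            pad=0, dtype=torch.long)
```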

Collaborator:

On second thought, I found that explicitly specifying the device would be better, as we might mix CPU and accelerator tensors in some cases.

@jikunshang force-pushed the main_0118 branch 2 times, most recently from 2df9f74 to 114a846 on January 22, 2024 03:36
@jikunshang requested a review from WoosukKwon on January 23, 2024 01:44
@jikunshang force-pushed the main_0118 branch 3 times, most recently from 37ff8f3 to 88782e0 on January 26, 2024 02:20
@WoosukKwon (Collaborator):

Hi @jikunshang, could you resolve the merge conflicts? The PR looks good overall.

@jikunshang (Contributor, Author):

> Hi @jikunshang, could you resolve the merge conflicts? The PR looks good overall.

Sure. I have resolved the conflicts. All tests should be fixed now.

-        self.device = torch.device(torch.cuda.current_device())
+        self.device_config = (device_config
+                              if device_config is not None else DeviceConfig())
+        self.device = self.device_config.device
Collaborator:

BTW, I intentionally avoided using torch.cuda.set_device, since it can affect user code when using the LLM class.
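
A hedged illustration of that concern (a hypothetical snippet; the commented-out lines assume a machine with more than one CUDA device):

```python
import torch

# If the engine called a process-global setter internally, user code
# running in the same process would be silently affected:
#
#   torch.cuda.set_device(1)           # engine-side, global state
#   x = torch.empty(4, device="cuda")  # user tensor now lands on cuda:1
#
# Passing the device explicitly keeps the engine's choice local to it:
engine_device = torch.device("cpu")  # stand-in for self.device
y = torch.empty(4, device=engine_device)
```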

@WoosukKwon (Collaborator) left a comment:

@jikunshang LGTM! Thanks for submitting the PR and sorry for the delay in my second review.

While I think vLLM still has several torch.cuda calls, I believe this is a good first step towards supporting non-CUDA devices. Thanks for the great work!

@WoosukKwon WoosukKwon merged commit 96b6f47 into vllm-project:main Feb 1, 2024
15 of 17 checks passed
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
alexm-redhat pushed a commit to neuralmagic/nm-vllm that referenced this pull request Feb 13, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 20, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Feb 22, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request Mar 4, 2024