[Core] Improve DAG API to support tensor parallel DAG #41231

rkooo567 · 2023-11-17T06:54:56Z

Why are these changes needed?

Support OutputNode.
Allow calling bind on a regular actor. This is needed because:
- The actor needs to be reused.
- Currently, you can have only one DAG per actor because the actor is the starting point of the DAG. This change lets a
  task be the starting node of the DAG instead of the actor.
- This also allows more than one InputNode per actor.
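
The motivation above can be illustrated without Ray. What follows is a toy, pure-Python model, not Ray's implementation: `ToyInputNode`, `ToyMethodNode`, `bind_method`, and `Counter` are hypothetical names made up for this sketch. It only shows why binding a method of an existing actor lets one actor instance take part in more than one DAG, each with its own input node.

```python
class ToyInputNode:
    """Placeholder for the value supplied when a DAG is executed."""


class ToyMethodNode:
    """Lazily calls a bound method of an existing object (the "actor")."""

    def __init__(self, method, arg):
        self.method = method
        self.arg = arg

    def execute(self, value):
        # Resolve the upstream argument first, then call the method.
        if isinstance(self.arg, ToyInputNode):
            resolved = value
        elif isinstance(self.arg, ToyMethodNode):
            resolved = self.arg.execute(value)
        else:
            resolved = self.arg
        return self.method(resolved)


def bind_method(method, arg):
    """Hypothetical stand-in for calling bind on an actor method."""
    return ToyMethodNode(method, arg)


class Counter:
    """Plays the role of an actor that already exists."""

    def __init__(self):
        self.total = 0

    def add(self, x):
        self.total += x
        return self.total

    def double(self, x):
        return 2 * x


counter = Counter()  # created once, reused by both DAGs below

# Two independent DAGs start from methods of the same instance,
# and each DAG has its own input node.
dag_a = bind_method(counter.add, ToyInputNode())
dag_b = bind_method(counter.double, bind_method(counter.add, ToyInputNode()))

first = dag_a.execute(1)   # add(1): total becomes 1
second = dag_b.execute(2)  # add(2): total becomes 3, then double(3)
```

Because the DAG starts at a method call rather than at actor construction, the state held by `counter` carries over between the two executions.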

This PR also removes unused code.

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

python/ray/dag/output_node.py

ericl · 2023-11-30T19:00:36Z

python/ray/dag/output_node.py

+IN_CONTEXT_MANAGER = "__in_context_manager__"
+
+
+class OutputNode(DAGNode):


Is it only useful when there are multiple outputs? If so, how about calling it something like MultiOutputNode?

Actually, I thought we wanted this even for a single output (since we need to allocate the buffer ahead of time). cc @stephanie-wang: can we allocate the output buffer without this OutputNode in the single-output case?

For single-output, it's kind of a nuisance to require users to specify it. We don't really need it for compiled DAG; it's just useful to know which node is the sink. My plan for compiled DAG was to add it implicitly if the user didn't specify it.

So yes, I agree that we can just call it MultiOutputNode, since users only need to use it if there are multiple outputs.

Addressed

The final semantics are:

MultiOutputNode([output_1, output_2, ...]) -> [output_1, output_2, ...]
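
The list-in, list-out semantics stated above can be sketched with a toy model. This is a pure-Python illustration, not Ray's `MultiOutputNode`: the `ToyTaskNode` and `ToyMultiOutputNode` classes are hypothetical and run everything eagerly in one process. It shows only that the sink node returns one result per listed node, in the order given.

```python
from typing import Any, Callable, List


class ToyTaskNode:
    """Lazily applies `fn` to the DAG input when executed."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def execute(self, value: Any) -> Any:
        return self.fn(value)


class ToyMultiOutputNode:
    """Sink node: executes each child and returns results in the given order."""

    def __init__(self, outputs: List[ToyTaskNode]):
        self.outputs = list(outputs)

    def execute(self, value: Any) -> List[Any]:
        return [node.execute(value) for node in self.outputs]


output_1 = ToyTaskNode(lambda x: x + 1)
output_2 = ToyTaskNode(lambda x: x * 10)

dag = ToyMultiOutputNode([output_1, output_2])
results = dag.execute(3)  # one result per listed node, same order
```

A single-output DAG needs no such wrapper in this model, which matches the reviewers' point that users should only have to write the node when there are multiple outputs.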

rkooo567 · 2023-12-05T14:52:14Z

cc @ericl @stephanie-wang. I addressed comments!

rkooo567 · 2023-12-05T14:52:35Z

Working on CI failures...

stephanie-wang

Looks good!

python/ray/dag/tests/test_accelerator_dag.py

rkooo567 · 2023-12-07T15:47:46Z

@ericl can you take a look at doc changes?

doc/source/ray-core/ray-dag.rst

angelinalg

Just some nits.

doc/source/ray-core/ray-dag.rst

matthewdeng · 2023-12-12T04:00:17Z

python/ray/actor.py

+from ray.dag.class_node import (
+    PARENT_CLASS_NODE_KEY,
+    PREV_CLASS_METHOD_CALL_KEY,
+    ClassMethodNode,
+)


For the test failure, I'm wondering if this is causing an increase in startup time?

Hmm, this just adds 1 ms of overhead when ray.init() is called, but let me try. Nothing to lose...

Yeah, this fixes the issue... I think it is really a bug in that test, and it should be fixed, but I will just move the import here for now.
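
One reading of "move the import here" is that the import was deferred from module level into the function that needs it, so importing the enclosing module no longer pays that cost at startup. The thread does not show the final placement, so the sketch below is an assumption: the `bind` function is hypothetical, and `json` is only a stand-in for the DAG module imported in the diff above.

```python
def bind(method_name, *args):
    """Build a toy description of a DAG node for a method call."""
    # Deferred import: the module is loaded the first time bind() runs,
    # not when the file that defines bind() is imported. Later calls
    # reuse the cached module from sys.modules.
    import json  # stand-in for the real DAG module (assumption)

    return json.dumps({"method": method_name, "args": list(args)})


node = bind("forward", 1, 2)
```

The trade-off is a small one-time cost on the first call in exchange for a cheaper import of the enclosing module.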

SangBin Cho added 4 commits November 17, 2023 08:37

ip

3044247

basic working.

8c5efd8

enhancement

664b07a

working now.

8f6f8d2

rkooo567 assigned stephanie-wang and ericl Nov 28, 2023

ericl reviewed Nov 30, 2023

ericl added the @author-action-required label Nov 30, 2023

SangBin Cho added 2 commits December 5, 2023 02:50

Merge branch 'master' into dag-api

b7d2414

Address code review.

5ae2bb8

rkooo567 removed the @author-action-required label Dec 4, 2023

addressed code review and add docs

bd8836c

rkooo567 requested a review from a team as a code owner December 5, 2023 14:52

stephanie-wang approved these changes Dec 5, 2023

stephanie-wang added the @author-action-required label Dec 5, 2023

SangBin Cho added 3 commits December 6, 2023 11:12

Merge branch 'master' into dag-api

5acbbf1

Addressed code review.

156e584

added a new test by the comment

41c8af1

rkooo567 removed the @author-action-required label Dec 7, 2023

SangBin Cho added 3 commits December 7, 2023 16:59

Fixed.

46cc877

remove

20071eb

fixed

47c90bc

rkooo567 assigned angelinalg Dec 8, 2023

angelinalg reviewed Dec 8, 2023

angelinalg approved these changes Dec 8, 2023

SangBin Cho added 2 commits December 8, 2023 16:22

Merge branch 'master' into dag-api

79bc9a7

Fix test failures.

d3d0acb

SangBin Cho added 9 commits December 8, 2023 22:15

test fix

1cfd746

Merge branch 'master' into dag-api

daa8bff

lint

f1cb4c8

Merge branch 'master' into dag-api

0081d80

Merge branch 'master' into dag-api

377c130

.

c6a9ea8

Fix a issue

fd9a5fb

Merge branch 'master' into dag-api

be128fd

.

9369c9f

matthewdeng reviewed Dec 12, 2023

SangBin Cho added 2 commits December 12, 2023 16:45

try fixing again.

8ce321f

done

34a992a

rkooo567 merged commit 2839644 into ray-project:master Dec 12, 2023
