-
Notifications
You must be signed in to change notification settings - Fork 802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fuse instruction #7399
Fuse instruction #7399
Conversation
Signed-off-by: daquexian <[email protected]>
Signed-off-by: daquexian <[email protected]>
…ched_nccl stream; 2) VirtualMachineEngine::local_pending_msg_list_;
Signed-off-by: daquexian <[email protected]>
…low into refactor_release_tensor
… = false; 3) InstructionType::OnDispatch
…o fuse_instruction
intrusive::ChannelStatus status = mut_pending_instruction_list()->MoveTo(&tmp_list); | ||
intrusive::ChannelStatus status = (mut_pending_instruction_list()->*Move)(&tmp_list); | ||
*cnt = tmp_list.size(); | ||
if (*cnt == 0) { return status; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个 size_t* cnt 是什么作用呢,看起来可以是一个局部变量
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
因为44行需要返回。
@@ -42,6 +50,7 @@ class InstructionType { | |||
virtual void Compute(VirtualMachineEngine* vm, InstructionMsg* instr_msg) const { | |||
LOG(FATAL) << "UNIMPLEMENTED"; | |||
} | |||
virtual void ComputeInFuseMode(InstructionMsg* instr_msg) const { LOG(FATAL) << "UNIMPLEMENTED"; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个可不可以去掉,直接调用 Compute 呢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
应为Compute的参数类型是Instruction*,现在还需要ComputeInFuseMode,等single-client的代码移除后,InstructionMsg会和Instruction合并,到时这里就可以统一成Compute了。
oneflow/core/vm/stream_type.h
Outdated
@@ -61,6 +61,7 @@ class StreamType { | |||
int64_t this_machine_id) const = 0; | |||
|
|||
virtual bool OnSchedulerThread() const = 0; | |||
virtual bool EnableInstructionFuse() const { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个方法看起来没有用到
int cnt = kLimit; | ||
InstructionMsgList pending_instr_msgs; | ||
constexpr static int kPendingHandleWindow = 10; | ||
GetRewritedPendingInstructionsByWindowSize(kPendingHandleWindow, &pending_instr_msgs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的意思是最多融合10条指令?融合的指令数以及依靠融合指令带来的launch开销是怎么确定呢?假如有100个tensor它们经过一个concat变成一个tensor,那么在Dispath concat这个LocaCallOpKernel的之后会有100条可以fuse的指令,这种情况是不是可以把100条都融合在一起?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个10肯定影响融合的上限。但这个值的本意与指令融合没有关系。从变量名kPendingHandleWindow可以看出,它目的是为了让scheduler不要过分的只关注pending的情况,每处理完10条指令,应该尝试看看有没有其他stream上指令已完成。
kPendingHandleWindow这个值肯定可以调成100,但意义不大。因为并不是越大越好,越大就越可能导致其他stream的饥饿。
Speed stats:
|
vm 为每条指令的执行做了不少支撑工作,这个开销与kernel launch相比往往不能忽略。本pr尝试将多条指令融合在一起,在vm里一起被调度。这样,vm对多条指令的调度成本就平摊开来了。