Refine vm::TensorStorage #9851
…eritance Signed-off-by: daquexian <[email protected]>
LGTM
~OutsideVmTensorStorage() = default;

bool is_allocated_in_vm() const override { return false; }
};
Inside and Outside appear to differ only in this one bool, and the bool does not affect any behavior inside the class.
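Yes, so I changed it to pass this bool in at construction time. Otherwise, whenever we later add a new subclass with orthogonal functionality, we would have to write one copy for Inside and another for Outside.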
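To illustrate the reply above, here is a minimal sketch of the flag-at-construction approach. Only `is_allocated_in_vm()` appears in the diff; the class bodies and the `EvictableTensorStorage` subclass are hypothetical and are not the actual OneFlow code.

```cpp
#include <cassert>

// Hypothetical stand-in for vm::TensorStorage; only is_allocated_in_vm()
// comes from the diff, everything else is illustrative.
class TensorStorage {
 public:
  explicit TensorStorage(bool is_allocated_in_vm)
      : is_allocated_in_vm_(is_allocated_in_vm) {}
  virtual ~TensorStorage() = default;

  bool is_allocated_in_vm() const { return is_allocated_in_vm_; }

 private:
  bool is_allocated_in_vm_;
};

// Hypothetical orthogonal feature. With Inside/Outside subclasses this would
// have needed two variants; with the flag, a single subclass is enough.
class EvictableTensorStorage : public TensorStorage {
 public:
  using TensorStorage::TensorStorage;  // inherit the bool constructor

  void Evict() { evicted_ = true; }
  bool is_evicted() const { return evicted_; }

 private:
  bool evicted_ = false;
};
```

Constructing `EvictableTensorStorage(true)` or `EvictableTensorStorage(false)` covers both cases, so each new feature costs one class rather than two.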
void EagerBlobObject::set_last_used_stream(Symbol<::oneflow::Stream> last_used_stream) {
  tensor_storage_->set_last_used_stream(last_used_stream);
}
These all look like they were simply moved from the .h file into the .cpp file.
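Yes, because eager_blob_object.h only forward-declares TensorStorage and does not include tensor_storage.h (to reduce recompilation), so any method that dereferences TensorStorage has to be defined in the .cpp file.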
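The pattern described above can be sketched in a single translation unit. The three marked sections stand in for the header, the storage header, and the .cpp file; the `int` stream id and the member layout are simplifications assumed for illustration, not the real OneFlow declarations.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// --- sketch of what eager_blob_object.h could contain ---
class TensorStorage;  // forward declaration only, no #include of the definition

class EagerBlobObject {
 public:
  explicit EagerBlobObject(std::shared_ptr<TensorStorage> storage);
  // Declared here but defined in the .cpp part: the bodies dereference
  // TensorStorage, which is an incomplete type at this point.
  void set_last_used_stream(int stream_id);
  int last_used_stream() const;

 private:
  std::shared_ptr<TensorStorage> tensor_storage_;  // fine with an incomplete type
};

// --- sketch of tensor_storage.h (hypothetical, simplified) ---
class TensorStorage {
 public:
  void set_last_used_stream(int stream_id) { last_used_stream_ = stream_id; }
  int last_used_stream() const { return last_used_stream_; }

 private:
  int last_used_stream_ = -1;
};

// --- sketch of eager_blob_object.cpp, which includes both headers ---
EagerBlobObject::EagerBlobObject(std::shared_ptr<TensorStorage> storage)
    : tensor_storage_(std::move(storage)) {}

void EagerBlobObject::set_last_used_stream(int stream_id) {
  tensor_storage_->set_last_used_stream(stream_id);
}

int EagerBlobObject::last_used_stream() const {
  return tensor_storage_->last_used_stream();
}
```

With this split, editing the storage header only forces a rebuild of files that include it directly, not of every file that includes the blob-object header.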
@@ -14,15 +14,15 @@ See the License for the specific language governing permissions and
limitations under the License.
*/
#include "oneflow/core/framework/tensor_storage.h"
#include "oneflow/core/eager/eager_blob_object.h"
#include "oneflow/core/eager/tensor_storage.h"
#include "oneflow/core/eager/local_dep_object.h"
#include "oneflow/core/framework/shut_down_util.h"

namespace oneflow {
namespace one {

TensorStorage::TensorStorage(const std::shared_ptr<const ParallelDesc>& parallel_desc)
So there are two TensorStorage classes, one in vm and one in one?
Yes, they are essentially the same thing. one::TensorStorage lives on the main thread and is held by Tensor; vm::TensorStorage lives on the vm thread and is held by EagerBlobObject.
Merge queue setting changed
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9851/
CI failed when running job: cuda-speed-test. PR label automerge has been removed
This reverts commit c3a7e8c.
Part of the changes extracted from DTR:
1. Move vm::TensorStorage into dedicated files, oneflow/core/eager/tensor_storage.*
2. Replace inheritance (InsideVmTensorStorage/OutsideVmTensorStorage) with composition (TensorStorage + true/false) to reduce the depth of the inheritance hierarchy
---------
Signed-off-by: daquexian <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <[email protected]>
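The previous PR #9851 was reverted in full because it caused a problem; this PR resubmits it with the problem fixed. The fix is commit 0279c6e: on master, storage_delete_hooks_ already ran during destruction, and the previous PR moved it into the Release function, which caused the problem. This commit moves it back. Local testing shows no problems.
---------
Signed-off-by: daquexian <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <[email protected]>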
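The distinction between running hooks in `Release` and running them in the destructor can be sketched as follows. Only `storage_delete_hooks_` and `Release` are named in the commit message; `RegisterStorageDeleteHook`, `set_blob`, `has_blob`, and the buffer type are assumptions made for this toy example, which does not reproduce the actual failure.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy storage: hooks fire when the object is destroyed, not when its
// buffer is released.
class TensorStorage {
 public:
  ~TensorStorage() {
    for (const auto& hook : storage_delete_hooks_) { hook(); }
  }

  void set_blob(std::unique_ptr<char[]> blob) { blob_ = std::move(blob); }
  bool has_blob() const { return blob_ != nullptr; }

  // Frees the buffer only; the storage object itself stays alive, so the
  // delete hooks are deliberately left untouched here.
  void Release() { blob_.reset(); }

  void RegisterStorageDeleteHook(std::function<void()> hook) {
    storage_delete_hooks_.push_back(std::move(hook));
  }

 private:
  std::unique_ptr<char[]> blob_;
  std::vector<std::function<void()>> storage_delete_hooks_;
};
```

In this sketch, calling `Release()` leaves the registered hooks pending, and they run exactly once when the storage object goes out of scope.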
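The core logic is:
1. Use different devices to distinguish tensors that support rematerialization from those that do not
2. remat::Allocator implements the logic that selects the lowest-cost tensor and evicts it (the optimizations to memory layout and eviction strategy live here)
3. OpCallInstructionUtil::Compute implements the logic that recomputes a tensor that is needed but has already been evicted

Everything else is peripheral changes.

Usage:
```python
import oneflow as flow

x1 = flow.ones(3).to('cuda+remat')  # move to a device that supports rematerialization
x2 = flow.ones(3).to('cuda')  # move to a device that does not support rematerialization
x3 = x1 + x2  # error: the devices differ
# -----
model = ResNet50()
model.to('cuda+remat')
data, label = dataloader()
data, label = data.to('cuda+remat'), label.to('cuda+remat')
loss = model(data)  # if GPU memory fills up along the way, some tensors are dropped automatically
loss.backward()  # if a dropped tensor is needed later, it is recomputed automatically
```
----
Some of the general-purpose changes were already merged in prerequisite PRs:
* #9698
* #9791
* #9850
* #9851
---------
Signed-off-by: daquexian <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <[email protected]>
Co-authored-by: Peihong Liu <[email protected]>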
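The evict-then-recompute idea in points 2 and 3 can be illustrated with a toy memory pool. This is a sketch under stated assumptions, not the remat::Allocator implementation: `ToyRematPool`, its methods, and the caller-supplied `cost` value are all hypothetical, tensor names are assumed unique, and recomputation is modeled as a counter rather than a replayed op.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

struct ToyTensor {
  std::size_t bytes;
  double cost;    // hypothetical eviction cost; lower means cheaper to drop
  bool resident;  // false once the tensor has been evicted
};

class ToyRematPool {
 public:
  explicit ToyRematPool(std::size_t capacity) : capacity_(capacity) {}

  // Stores a new tensor, evicting the cheapest resident tensors if needed.
  bool Allocate(const std::string& name, std::size_t bytes, double cost) {
    if (bytes > capacity_ || !MakeRoom(bytes)) { return false; }
    tensors_[name] = ToyTensor{bytes, cost, true};
    used_ += bytes;
    return true;
  }

  // Called before a tensor is used: brings it back if it was evicted.
  void Touch(const std::string& name) {
    ToyTensor& t = tensors_.at(name);
    if (t.resident) { return; }
    MakeRoom(t.bytes);  // may evict other tensors, never `t` itself
    t.resident = true;
    used_ += t.bytes;
    ++recompute_count_;  // stands in for re-running the producing op
  }

  bool IsResident(const std::string& name) const { return tensors_.at(name).resident; }
  int recompute_count() const { return recompute_count_; }

 private:
  // Evicts the lowest-cost resident tensor until `bytes` fits.
  bool MakeRoom(std::size_t bytes) {
    while (used_ + bytes > capacity_) {
      ToyTensor* victim = nullptr;
      for (auto& kv : tensors_) {
        ToyTensor& t = kv.second;
        if (t.resident && (victim == nullptr || t.cost < victim->cost)) { victim = &t; }
      }
      if (victim == nullptr) { return false; }
      victim->resident = false;
      used_ -= victim->bytes;
    }
    return true;
  }

  std::size_t capacity_;
  std::size_t used_ = 0;
  int recompute_count_ = 0;
  std::unordered_map<std::string, ToyTensor> tensors_;
};
```

A linear scan for the cheapest victim keeps the sketch short; a real allocator would also have to account for memory layout, which the commit message names as one of the things optimized in remat::Allocator.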
Part of the changes extracted from DTR