Refine vm::TensorStorage #9851

Merged: 5 commits merged into master on Feb 11, 2023
Conversation

daquexian (Contributor):

A subset of changes extracted from the DTR work:

  1. Move vm::TensorStorage into a dedicated file, oneflow/core/eager/tensor_storage.*
  2. Replace inheritance (InsideVmTensorStorage/OutsideVmTensorStorage) with composition (a single TensorStorage plus a true/false flag), reducing the depth of the inheritance hierarchy; see the sketch after this list.
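
Roughly, the composition-based class could look like this (a minimal sketch; the member name and the exact constructor signature are assumptions for illustration, not the code in this PR):

```cpp
// Hypothetical sketch: a single vm::TensorStorage that records at construction
// time whether its buffer is allocated inside the VM, replacing the
// InsideVmTensorStorage/OutsideVmTensorStorage subclass pair.
namespace oneflow {
namespace vm {

class TensorStorage {
 public:
  explicit TensorStorage(bool is_allocated_in_vm)
      : is_allocated_in_vm_(is_allocated_in_vm) {}
  virtual ~TensorStorage() = default;

  // The same query the old subclasses answered via an override,
  // now answered from a plain data member.
  bool is_allocated_in_vm() const { return is_allocated_in_vm_; }

 private:
  bool is_allocated_in_vm_;
};

}  // namespace vm
}  // namespace oneflow
```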

@strint (Contributor) left a comment:

LGTM

~OutsideVmTensorStorage() = default;

bool is_allocated_in_vm() const override { return false; }
};
Reviewer (Contributor):

Inside and Outside seem to differ only in this single bool, and the bool does not affect any behavior inside the class.

@daquexian (Contributor, Author) replied on Feb 10, 2023:

Yes, which is why the bool is now passed in at construction time. Otherwise, every time a new, functionally orthogonal subclass is added later, it would have to be written twice, once for Inside and once for Outside; the sketch below makes the avoided duplication concrete.
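
A small hedged illustration (the name RematTensorStorage is made up here; the base class follows the sketch above, not the actual OneFlow sources):

```cpp
// Assumed base, as in the earlier sketch.
class TensorStorage {
 public:
  explicit TensorStorage(bool is_allocated_in_vm)
      : is_allocated_in_vm_(is_allocated_in_vm) {}
  virtual ~TensorStorage() = default;

 private:
  bool is_allocated_in_vm_;
};

// Old design: each new, functionally orthogonal feature would need two
// subclasses, e.g. RematInsideVmTensorStorage and RematOutsideVmTensorStorage.
// New design: a single subclass, with the flag simply forwarded at construction.
class RematTensorStorage : public TensorStorage {
 public:
  explicit RematTensorStorage(bool is_allocated_in_vm)
      : TensorStorage(is_allocated_in_vm) {}
};
```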


void EagerBlobObject::set_last_used_stream(Symbol<::oneflow::Stream> last_used_stream) {
  tensor_storage_->set_last_used_stream(last_used_stream);
}
Reviewer (Contributor):

These all look like they were simply moved from the .h file into the .cpp file.

@daquexian (Contributor, Author) replied:

Yes. eager_blob_object.h now only forward-declares TensorStorage and does not include tensor_storage.h (to reduce recompilation), so these definitions have to live in the .cpp file. A simplified sketch of the pattern is below.
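
A minimal sketch of that forward-declaration pattern, with the file layout collapsed into one listing and types simplified (an int stands in for Symbol<::oneflow::Stream>; this is not the exact OneFlow code):

```cpp
#include <memory>

// "eager_blob_object.h" (simplified): TensorStorage is only forward-declared,
// so translation units that include this header are not recompiled when
// TensorStorage itself changes.
class TensorStorage;  // forward declaration instead of an #include

class EagerBlobObject {
 public:
  void set_last_used_stream(int stream_id);  // declared here, defined below
 private:
  std::shared_ptr<TensorStorage> tensor_storage_;
};

// "eager_blob_object.cpp" (simplified): the full definition of TensorStorage
// is visible here, so member functions that dereference it live in this file.
class TensorStorage {
 public:
  void set_last_used_stream(int stream_id) { last_used_stream_id_ = stream_id; }
 private:
  int last_used_stream_id_ = -1;
};

void EagerBlobObject::set_last_used_stream(int stream_id) {
  tensor_storage_->set_last_used_stream(stream_id);
}
```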

@@ -14,15 +14,15 @@ See the License for the specific language governing permissions and
limitations under the License.
*/
#include "oneflow/core/framework/tensor_storage.h"
#include "oneflow/core/eager/eager_blob_object.h"
#include "oneflow/core/eager/tensor_storage.h"
#include "oneflow/core/eager/local_dep_object.h"
#include "oneflow/core/framework/shut_down_util.h"

namespace oneflow {
namespace one {

TensorStorage::TensorStorage(const std::shared_ptr<const ParallelDesc>& parallel_desc)
Reviewer (Contributor):

So there are two TensorStorage classes, one in vm and one in one?

@daquexian (Contributor, Author) replied:

Yes. They are essentially the same thing: one::TensorStorage lives on the main thread and is held by Tensor, while vm::TensorStorage lives on the VM thread and is held by EagerBlobObject. The sketch below illustrates that ownership.
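
A purely conceptual sketch of that ownership (all class bodies are placeholders, not the real OneFlow definitions):

```cpp
#include <memory>

namespace vm {
// Lives on the VM thread; owns the underlying buffer (placeholder body).
class TensorStorage {};

// VM-side holder of the storage.
class EagerBlobObject {
 private:
  std::shared_ptr<TensorStorage> tensor_storage_;
};
}  // namespace vm

namespace one {
// Lives on the main thread; a view of the same underlying storage (placeholder body).
class TensorStorage {};

// User-facing holder of the storage on the main thread.
class Tensor {
 private:
  std::shared_ptr<TensorStorage> tensor_storage_;
};
}  // namespace one
```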

daquexian enabled auto-merge on February 10, 2023, 10:21
auto-merge was automatically disabled on February 10, 2023, 11:47 (merge queue setting changed)

@github-actions:
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@github-actions:
Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.2ms (= 14118.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.2ms (= 14317.8ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.01 (= 143.2ms / 141.2ms)

OneFlow resnet50 time: 81.2ms (= 8115.3ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 89.3ms (= 8928.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 89.3ms / 81.2ms)

OneFlow resnet50 time: 50.9ms (= 10184.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.3ms (= 11054.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.09 (= 55.3ms / 50.9ms)

OneFlow resnet50 time: 33.4ms (= 6683.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 43.6ms (= 8721.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.30 (= 43.6ms / 33.4ms)

OneFlow resnet50 time: 25.5ms (= 5090.4ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 35.6ms (= 7116.7ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.40 (= 35.6ms / 25.5ms)

OneFlow swin dataloader time: 0.235s (= 46.953s / 200, num_workers=1)
PyTorch swin dataloader time: 0.147s (= 29.483s / 200, num_workers=1)
Relative speed: 0.628 (= 0.147s / 0.235s)

OneFlow swin dataloader time: 0.063s (= 12.554s / 200, num_workers=4)
PyTorch swin dataloader time: 0.044s (= 8.890s / 200, num_workers=4)
Relative speed: 0.708 (= 0.044s / 0.063s)

OneFlow swin dataloader time: 0.037s (= 7.327s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.560s / 200, num_workers=8)
Relative speed: 0.622 (= 0.023s / 0.037s)

❌ OneFlow resnet50 time: 152.7ms (= 15266.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 164.2ms (= 16419.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.08 (= 164.2ms / 152.7ms)

OneFlow resnet50 time: 92.2ms (= 9216.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.0ms (= 10301.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 103.0ms / 92.2ms)

OneFlow resnet50 time: 59.7ms (= 11946.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.4ms (= 15680.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 78.4ms / 59.7ms)

OneFlow resnet50 time: 42.5ms (= 8506.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.7ms (= 15338.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.80 (= 76.7ms / 42.5ms)

OneFlow resnet50 time: 37.0ms (= 7401.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.4ms (= 14687.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.98 (= 73.4ms / 37.0ms)

@github-actions:
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9851/

@github-actions:
CI failed when running job: cuda-speed-test. PR label automerge has been removed

@github-actions:
Speed stats:
GPU Name: NVIDIA GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.4ms (= 14141.1ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 144.1ms (= 14411.3ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.02 (= 144.1ms / 141.4ms)

OneFlow resnet50 time: 81.6ms (= 8156.8ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.6ms (= 8463.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.04 (= 84.6ms / 81.6ms)

OneFlow resnet50 time: 51.1ms (= 10214.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 67.6ms (= 13519.1ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.32 (= 67.6ms / 51.1ms)

OneFlow resnet50 time: 34.2ms (= 6843.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.7ms (= 9549.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.40 (= 47.7ms / 34.2ms)

OneFlow resnet50 time: 25.9ms (= 5184.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.4ms (= 7674.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.48 (= 38.4ms / 25.9ms)

OneFlow swin dataloader time: 0.238s (= 47.676s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 29.957s / 200, num_workers=1)
Relative speed: 0.628 (= 0.150s / 0.238s)

OneFlow swin dataloader time: 0.064s (= 12.771s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.429s / 200, num_workers=4)
Relative speed: 0.660 (= 0.042s / 0.064s)

OneFlow swin dataloader time: 0.036s (= 7.272s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.366s / 200, num_workers=8)
Relative speed: 0.600 (= 0.022s / 0.036s)

❌ OneFlow resnet50 time: 163.7ms (= 16370.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 181.5ms (= 18153.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.11 (= 181.5ms / 163.7ms)

OneFlow resnet50 time: 103.3ms (= 10329.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 114.4ms (= 11435.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.11 (= 114.4ms / 103.3ms)

OneFlow resnet50 time: 71.0ms (= 14203.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.2ms (= 17641.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 88.2ms / 71.0ms)

OneFlow resnet50 time: 56.7ms (= 11346.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 82.6ms (= 16525.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.46 (= 82.6ms / 56.7ms)

OneFlow resnet50 time: 50.2ms (= 10046.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.9ms (= 14377.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.43 (= 71.9ms / 50.2ms)

mergify bot merged commit c3a7e8c into master on Feb 11, 2023
mergify bot deleted the refine_tensor_storage branch on February 11, 2023, 09:00
daquexian added a commit that referenced this pull request Feb 24, 2023
daquexian added a commit that referenced this pull request Mar 8, 2023
A subset of changes extracted from the DTR work:
1. Move vm::TensorStorage into a dedicated file, oneflow/core/eager/tensor_storage.*
2. Replace inheritance (InsideVmTensorStorage/OutsideVmTensorStorage) with composition (a single TensorStorage plus a true/false flag), reducing the depth of the inheritance hierarchy.

---------

Signed-off-by: daquexian <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <[email protected]>
mergify bot added a commit that referenced this pull request Mar 9, 2023
The previous PR #9851 was reverted entirely because it caused a problem; it is now resubmitted with the problem fixed. The fix is commit 0279c6e: on master, storage_delete_hooks_ was already executed during destruction, but the previous PR moved it into the Release function, which caused the problem, so this commit moves it back. Local testing shows no remaining issues.
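
A minimal sketch of the arrangement described here, assuming storage_delete_hooks_ is simply a list of callbacks (the class is reduced to the relevant members, and the registration method name is an assumption):

```cpp
#include <functional>
#include <vector>

// Reduced sketch: delete hooks registered on the storage run when the storage
// is destroyed (as on master), not inside Release().
class TensorStorage {
 public:
  void RegisterStorageDeleteHook(std::function<void()> hook) {
    storage_delete_hooks_.emplace_back(std::move(hook));
  }

  void Release() {
    // free the buffer here; the delete hooks are intentionally NOT run in this function
  }

  ~TensorStorage() {
    for (const auto& hook : storage_delete_hooks_) { hook(); }
  }

 private:
  std::vector<std::function<void()>> storage_delete_hooks_;
};
```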

---------

Signed-off-by: daquexian <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <[email protected]>
mergify bot added a commit that referenced this pull request Apr 3, 2023
The core logic is:
1. Use different devices to distinguish tensors that do and do not support recomputation.
2. In remat::Allocator, implement the logic that selects the lowest-cost tensor and evicts it (this is where the memory-layout and eviction-strategy optimizations live); a rough sketch follows this list.
3. In OpCallInstructionUtil::Compute, implement the logic that recomputes a tensor that is needed again but has already been evicted.

Everything else is peripheral changes.
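
A rough sketch of the evict-lowest-cost idea in step 2 (the Piece type, cost field, and EvictUntil function are placeholders, not the actual remat::Allocator interface):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Placeholder for a rematerializable buffer tracked by the allocator.
struct Piece {
  double cost = 0.0;      // e.g. recompute time weighted against memory and staleness
  size_t size = 0;        // bytes freed if this piece is evicted
  bool evictable = true;  // pinned pieces cannot be evicted
};

// Evict the lowest-cost pieces until at least `needed` bytes are freed.
// Returns the number of bytes actually freed.
size_t EvictUntil(std::vector<Piece*>& pieces, size_t needed) {
  size_t freed = 0;
  while (freed < needed) {
    Piece* best = nullptr;
    double best_cost = std::numeric_limits<double>::max();
    for (Piece* p : pieces) {
      if (p->evictable && p->cost < best_cost) {
        best = p;
        best_cost = p->cost;
      }
    }
    if (best == nullptr) { break; }  // nothing left to evict
    freed += best->size;
    best->evictable = false;  // stand-in for actually evicting and recording it
  }
  return freed;
}
```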

Usage:

```python
x1 = flow.ones(3).to('cuda+remat')  # move to a device that supports recomputation
x2 = flow.ones(3).to('cuda')        # move to a device that does not support recomputation
x3 = x1 + x2  # error: the devices differ

# -----

model = ResNet50()
model.to('cuda+remat')
data, label = dataloader()
data, label = data.to('cuda+remat'), label.to('cuda+remat')
loss = model(data)   # if GPU memory fills up along the way, some tensors are evicted automatically
loss.backward()      # if an evicted tensor is needed again, it is recomputed automatically

```

----

Some of the general-purpose changes have already been merged in earlier PRs:
* #9698
* #9791
* #9850
* #9851

---------

Signed-off-by: daquexian <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <[email protected]>
Co-authored-by: Peihong Liu <[email protected]>