refine offload test #9974
Conversation
layer_list.append(nn.Linear(768, 4096))
# Big enough to see the memory change
layer_list.append(nn.Linear(4096, 4096))
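A rough back-of-the-envelope for the two layer sizes above (a sketch only; float32 parameters and included bias terms are assumed) shows why the larger layer is needed to observe a memory change:

```python
# Estimate the parameter memory of an nn.Linear(in_f, out_f) layer.
# Assumption: float32 weights (4 bytes each) plus a bias vector.
def linear_param_bytes(in_f, out_f, dtype_bytes=4):
    return (in_f * out_f + out_f) * dtype_bytes

small = linear_param_bytes(768, 4096)    # ~12.6 MB
big = linear_param_bytes(4096, 4096)     # ~67 MB
```

At roughly 67 MB, the 4096 x 4096 layer comfortably exceeds the allocator's block granularity (about 20 MB, per the discussion below), so offloading it produces a visible drop in CUDA memory, whereas the ~12.6 MB layer may not.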
The large and small buffers don't look very different? Do you mean that nn.Linear(768, 4096) cannot be offloaded?
If the tensor is too small, we found that the CUDA memory does not change between offload and load.
Oh, got it. What we tested before were all 1024 x 1024 x 1024 tensors.
> If the tensor is too small, the CUDA memory does not change between offload and load.

This is related to the BinAllocator implementation: a Block is not released unless it is entirely free. A Block contains one or more Pieces, and a Piece is at least 512 bytes.
So if the memory being freed is not enough to produce a fully free Block, the CachingAllocator cannot reclaim any cache.
As for how large a Block can be, @chengtbf can help explain.
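The block-release rule described above can be sketched as a toy model (illustrative only; the `Block`/`Piece` names follow the comment, and the sizes are simplified, not the real allocator's layout):

```python
# Toy model of the caching-allocator behavior described in the comment:
# a Block holds one or more Pieces, a Piece is at least 512 bytes, and a
# Block can be returned to the device only when every Piece in it is free.
class Piece:
    MIN_SIZE = 512  # a piece is at least 512 bytes

    def __init__(self, size):
        self.size = max(size, Piece.MIN_SIZE)
        self.free = False  # newly allocated pieces start out in use

class Block:
    def __init__(self, pieces):
        self.pieces = pieces

    def releasable(self):
        # The block is released only when it is entirely free.
        return all(p.free for p in self.pieces)

def reclaimable_bytes(blocks):
    # Only fully free blocks contribute to the memory the allocator
    # can actually hand back, which is why freeing a small tensor may
    # show no change in reported CUDA memory.
    return sum(sum(p.size for p in b.pieces)
               for b in blocks if b.releasable())
```

In this model, freeing one piece of a block while another stays in use reclaims nothing, mirroring why a small offloaded tensor may not move the CUDA memory counters.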
Around 20 MB.
Speed stats:
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9974/
Related issue: #9971