
Add a device parameter to RemoteModule #44254


Closed
wants to merge 1 commit into from

wayi1 (Contributor) commented Sep 5, 2020

Summary:
Add a device parameter to RemoteModule so that the module can be placed on any
device, not just the CPU.

Original PR issue: RemoteModule enhancements #40550

Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule

Differential Revision: D23483803

facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D23483803

wayi1 self-assigned this Sep 5, 2020
dr-ci bot commented Sep 6, 2020

💊 CI failures summary and remediations

As of commit 3e64802 (more details on the Dr. CI page):


  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/2)

Step: "Run tests"

Sep 18 07:34:42 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future
Sep 18 07:34:42 At: 
Sep 18 07:34:42   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 18 07:34:42   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 18 07:34:42  
Sep 18 07:34:42 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 18 07:34:42  
Sep 18 07:34:42 At: 
Sep 18 07:34:42   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 18 07:34:42   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 18 07:34:42  
Sep 18 07:34:42 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 18 07:34:42  
Sep 18 07:34:42 At: 
Sep 18 07:34:42   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 18 07:34:42   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 18 07:34:42  
Sep 18 07:34:42 ok (1.638s) 
Sep 18 07:34:43   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:576] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 18 07:34:43 [W tensorpipe_agent.cpp:576] RPC agent for worker0 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Sep 18 07:34:44 ok (1.648s) 
Sep 18 07:34:45   test_return_local_rrefs (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:576] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test (2/2)

Step: "Run tests"

Sep 18 07:34:16 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future
Sep 18 07:34:16 At: 
Sep 18 07:34:16   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 18 07:34:16   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 18 07:34:16  
Sep 18 07:34:16 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 18 07:34:16  
Sep 18 07:34:16 At: 
Sep 18 07:34:16   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 18 07:34:16   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 18 07:34:16  
Sep 18 07:34:16 [E request_callback_no_python.cpp:618] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Sep 18 07:34:16  
Sep 18 07:34:16 At: 
Sep 18 07:34:16   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Sep 18 07:34:16   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Sep 18 07:34:16  
Sep 18 07:34:16 [W tensorpipe_agent.cpp:576] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 18 07:34:16 [W tensorpipe_agent.cpp:576] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Sep 18 07:34:16 ok (1.536s) 
Sep 18 07:34:17   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:576] RPC agent for worker3 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown) 
Sep 18 07:34:17 [W tensorpipe_agent.cpp:576] RPC agent for worker0 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown) 

This comment was automatically generated by Dr. CI and has been revised 36 times.

codecov bot commented Sep 6, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@ed862d3).
The diff coverage is n/a.


@@            Coverage Diff            @@
##             master   #44254   +/-   ##
=========================================
  Coverage          ?   67.95%           
=========================================
  Files             ?      384           
  Lines             ?    49597           
  Branches          ?        0           
=========================================
  Hits              ?    33702           
  Misses            ?    15895           
  Partials          ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ed862d3...3049dd6.

wayi1 pushed a commit that referenced this pull request Sep 7, 2020
Pull Request resolved: #44254

Add a device parameter to RemoteModule, so it can be placed on any device
and not just CPU.

Original PR issue: RemoteModule enhancements #40550

Differential Revision: [D23483803](https://our.internmc.facebook.com/intern/diff/D23483803/)

[ghstack-poisoned]

wayi1 requested a review from mrshenli September 8, 2020 22:36
wayi1 requested a review from rohan-varma as a code owner September 18, 2020 05:14
Summary:
Pull Request resolved: #44254

Add a device parameter to RemoteModule, so it can be placed on any device
and not just CPU.

Original PR issue: RemoteModule enhancements #40550

Test Plan: buck test test/distributed/rpc:process_group_agent -- RemoteModule

Reviewed By: pritamdamania87

Differential Revision: D23483803

fbshipit-source-id: d0459f94906b1c4df1fee3cda981c5249c78b842
facebook-github-bot (Contributor) commented:

This pull request has been merged in c68cc78.

4 participants