reload!() doesn't publish DRAKE_VIEWER_ADD_ROBOT message on Linux #19

Closed
tkoolen opened this issue Dec 13, 2016 · 19 comments

@tkoolen
Contributor

tkoolen commented Dec 13, 2016

I think this may have to do with the maximum UDP packet size.

This is with the Valkyrie model. Everything works correctly on OSX. Also, on Linux, if I just do

lcm = PyLCM.LCM()
msg = DrakeVisualizer.drakevis[:lcmt_viewer_load_robot]()
PyLCM.publish(lcm, "DRAKE_VIEWER_ADD_ROBOT", msg)

bot-spy shows that this message is indeed published, but the message that DrakeVisualizer.reload!() tries to publish never gets picked up by bot-spy or drake-visualizer. If I change reload! so that it adds only, e.g., the first 60 of the 81 links to msg.link, everything works fine as well.

@tkoolen
Contributor Author

tkoolen commented Dec 13, 2016

The limit seems to be 66 links for Valkyrie. With 66 links, sizeof(msg[:encode]()) is 264544 bytes; with 65, it is 261910 bytes. Both are well over the IPv4 maximum packet length of 65,535 bytes. Nothing seems to be wrong with link 66 specifically: if I add only that link, the message publishes fine.
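A quick check of the numbers above in Python, with the two sizes copied from this comment and 65,535 bytes taken as the IPv4 total-length limit. It also shows that the 65/66-link cutoff falls on either side of 256 KiB (262,144 bytes); whether that lines up with any socket buffer on this machine is an assumption, not something measured in this thread.

```python
# Encoded lcmt_viewer_load_robot sizes for Valkyrie, as reported above (bytes).
SIZES = {65: 261_910, 66: 264_544}

IPV4_MAX_PACKET = 65_535  # maximum IPv4 total length, header included
KIB_256 = 256 * 1024      # 262,144 bytes


def describe(n_links: int) -> str:
    """Summarize one reported message size relative to the two limits above."""
    size = SIZES[n_links]
    ratio = size / IPV4_MAX_PACKET
    side = "above" if size > KIB_256 else "below"
    return f"{n_links} links: {size} bytes, {ratio:.1f}x the IPv4 max, {side} 256 KiB"


for n in sorted(SIZES):
    print(describe(n))
```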

@rdeits
Owner

rdeits commented Dec 13, 2016

Crap. I've probably never seen this because all of my computers have their UDP packet sizes turned up for various DRC-related reasons. The issue is presumably that we're pushing the mesh data through in the load_robot message (whereas Drake just sends the filenames). I really really don't want to have to serialize every mesh to disk in order to display it.

A few options:

  1. Write meshes to disk (or to a ramdisk or something?) and then send filenames like Drake does. I'm not fond of this.
  2. Force users to tweak their MTU settings (if they're using openhumanoids, then they probably already have). This is an obnoxious burden on users, though.
  3. Create an add_link message and publish 81 of those instead of one big load_robot message. This should work fine, but requires more changes to drake-visualizer. It will also fail if one link has a really really really big mesh (as in, one that's ~60 times bigger than Val's meshes).
  4. Switch to a more serious messaging layer like ZeroMQ which knows how to break up large messages. This would require significant changes to drake-visualizer, but would also let us confirm that there is a visualizer listening to what we're sending (and presumably automatically open one if there isn't).
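Option 3 can be sketched in a few lines. This is a hypothetical helper (nothing like it is shown to exist in DrakeVisualizer.jl or drake-visualizer): given each link's encoded size, it greedily groups links into messages that each stay under a byte budget. Publishing one add_link message per link is the special case where every group holds a single link, and the error branch corresponds to the caveat about one very large mesh.

```python
from typing import List, Sequence


def batch_links(link_sizes: Sequence[int], budget: int) -> List[List[int]]:
    """Greedily group link indices so each group's total encoded size is <= budget.

    link_sizes[i] is the (hypothetical) encoded size in bytes of link i's
    payload. Raises ValueError if a single link exceeds the budget on its own.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(link_sizes):
        if size > budget:
            raise ValueError(f"link {i} is {size} bytes, over the {budget}-byte budget")
        if current and current_size + size > budget:
            # Adding this link would overflow the current message: start a new one.
            batches.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        batches.append(current)
    return batches


print(batch_links([100, 200, 300, 400], budget=500))  # [[0, 1], [2], [3]]
```

Greedy packing keeps link order intact, so a receiver could simply append links as messages arrive; it does not try to minimize the number of messages.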

@patmarion

patmarion commented Dec 13, 2016

There is one more possibility. Director has a "mesh manager" which listens on LCM for mesh data. This was used by the affordance server in Director. So you could send meshes to the mesh manager, and then, when you load a robot, set the filename field to something customized like director_mesh://<mesh id> and teach drake-visualizer how to resolve that. I think this would definitely be more complicated than just implementing option 3, but it could be an alternative if you didn't want to modify the load_robot protocol.
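A hypothetical sketch of the resolving side of this idea, in Python. The director_mesh:// prefix comes from the comment above; the in-memory mesh_cache dict standing in for the mesh manager, the function name, and the mesh id used in the example are all assumptions. Returning None for an unknown id models the case where the load_robot message arrives before the corresponding mesh data.

```python
from typing import Dict, Optional

MESH_SCHEME = "director_mesh://"


def resolve_mesh(filename: str, mesh_cache: Dict[str, bytes]) -> Optional[bytes]:
    """Return mesh bytes for a load_robot filename field.

    director_mesh://<mesh id> is looked up in mesh_cache (a hypothetical
    stand-in for the mesh manager); anything else is treated as a path on disk.
    """
    if filename.startswith(MESH_SCHEME):
        mesh_id = filename[len(MESH_SCHEME):]
        # None means this mesh has not been received (yet).
        return mesh_cache.get(mesh_id)
    with open(filename, "rb") as f:
        return f.read()


cache = {"val_pelvis": b"...mesh bytes..."}
print(resolve_mesh("director_mesh://val_pelvis", cache))
```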

@rdeits
Owner

rdeits commented Dec 13, 2016

Interesting; I hadn't thought of that, thanks. I think that would work fine for Val, but not as well for my other use case, which is visualizing soft robots. For the soft robots, I generate an entirely new mesh at every time step (that's also why I don't want to write the meshes to disk). It seems like registering new meshes with the manager every time I want to draw them would add extra complexity and state (especially if the load_robot message happened to be delivered before the corresponding load_mesh message).

@tkoolen
Contributor Author

tkoolen commented Dec 13, 2016

I'd be happy with 3 above. Happier still with 4, but I know that's a lot of work.

For now, how does 2 work? I tried sudo ip link set mtu 9000 dev lo (and sudo ip link set mtu 9000 dev eno1) to no avail. Just trying to make absolutely sure that message size is indeed the problem.
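One way to confirm that an ip link set mtu change actually took effect is to read the interface's MTU back from sysfs; Linux exposes it as /sys/class/net/<dev>/mtu. A minimal Python sketch; the sysfs_root parameter is only there so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_mtu(dev: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the current MTU of network interface `dev`, as reported by Linux sysfs."""
    return int((Path(sysfs_root) / dev / "mtu").read_text().strip())
```

If the command above succeeded, read_mtu("lo") should return 9000 afterwards.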

@rdeits
Owner

rdeits commented Dec 13, 2016

@tkoolen does running the setup_loopback_multicast.sh script fix the issue?

@rdeits
Owner

rdeits commented Dec 13, 2016

I notice that my Linux desktop has MTU 1500 on the eth0 interface, but 65536 on the loopback interface.

Full output of ifconfig:

➜ ifconfig
eth0      Link encap:Ethernet  HWaddr <redacted>
          inet addr:<redacted>  
          inet6 addr: <redacted>
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:18

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MULTICAST  MTU:65536  Metric:1
          RX packets:641854 errors:0 dropped:0 overruns:0 frame:0
          TX packets:641854 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:133835514 (133.8 MB)  TX bytes:133835514 (133.8 MB)

virbr0    Link encap:Ethernet  HWaddr ba:<redacted>
          inet addr: <redacted>
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
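To pull just the MTUs out of output like the above, a small Python parser is enough. It assumes the older net-tools layout shown here (interface name at column 0, "MTU:<n>" on the flags line); newer ifconfig releases print a different layout, which this sketch does not handle.

```python
import re
from typing import Dict, Optional


def mtus_from_ifconfig(output: str) -> Dict[str, int]:
    """Map interface name -> MTU from old-style net-tools `ifconfig` output."""
    mtus: Dict[str, int] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        if line and not line[0].isspace():
            current = line.split()[0]  # e.g. "eth0", "lo", "virbr0"
        match = re.search(r"\bMTU:(\d+)", line)
        if match and current is not None:
            mtus[current] = int(match.group(1))
    return mtus


SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr <redacted>
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          UP LOOPBACK RUNNING MULTICAST  MTU:65536  Metric:1
"""

print(mtus_from_ifconfig(SAMPLE))  # {'eth0': 1500, 'lo': 65536}
```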

@tkoolen
Contributor Author

tkoolen commented Dec 13, 2016

No luck with the script, and no luck with MTU 65536. I thought 9000 was the maximum, but I guess that's only for Ethernet devices?

@patmarion

patmarion commented Dec 13, 2016

Did you try running these commands:

sudo sysctl -w net.core.rmem_max=2097152 
sudo sysctl -w net.core.rmem_default=2097152 
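The current values can be read back before or after running these: sysctl keys map to files under /proc/sys with dots replaced by slashes, so net.core.rmem_max lives at /proc/sys/net/core/rmem_max. A minimal Python sketch (helper names made up for illustration) that also compares each value with the 264,544-byte Valkyrie message reported earlier in this thread; that comparison is a rough sanity check, not a guarantee that LCM will receive the message.

```python
from pathlib import Path
from typing import Dict, Tuple

# Largest load_robot message reported earlier in this thread (66 Valkyrie links).
MESSAGE_BYTES = 264_544


def sysctl_path(key: str, proc_root: str = "/proc/sys") -> Path:
    """Map a sysctl key like net.core.rmem_max to its file under /proc/sys."""
    return Path(proc_root, *key.split("."))


def read_sysctl_int(key: str, proc_root: str = "/proc/sys") -> int:
    """Read an integer-valued sysctl from its /proc/sys file."""
    return int(sysctl_path(key, proc_root).read_text().split()[0])


def check_rmem(proc_root: str = "/proc/sys") -> Dict[str, Tuple[int, bool]]:
    """For each receive-buffer sysctl, return (value, value >= MESSAGE_BYTES)."""
    result: Dict[str, Tuple[int, bool]] = {}
    for key in ("net.core.rmem_max", "net.core.rmem_default"):
        value = read_sysctl_int(key, proc_root)
        result[key] = (value, value >= MESSAGE_BYTES)
    return result
```

On a Linux machine, check_rmem() with no arguments reads the live values.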

@rdeits
Owner

rdeits commented Dec 15, 2016

From discussion with @patmarion I'm leaning towards 2 being the solution for now, but I've started looking into what 4 would require. ZeroMQ + Msgpack is a pretty nice combination.

@tkoolen
Contributor Author

tkoolen commented Dec 15, 2016

> sudo sysctl -w net.core.rmem_max=2097152
> sudo sysctl -w net.core.rmem_default=2097152

Thanks, that made the LCM message show up in bot-spy, but drake-visualizer still doesn't seem to be receiving it.

@tkoolen
Contributor Author

tkoolen commented Dec 15, 2016

@rdeits investigated the above. It turns out this is because I'm on Ubuntu 16.04, for which the drake-visualizer binary is out of date, so drake-visualizer doesn't listen on the LCM channel where DrakeVisualizer.jl publishes the lcmt_viewer_load_robot message.

@patmarion

I'll make you guys an updated binary. I have to do it manually for ubuntu-16. @rdeits and I are discussing ways to automate.

@patmarion

FYI, to avoid having to run those sudo commands again (settings made with sysctl -w do not survive a reboot), edit /etc/sysctl.conf and add these lines:

net.core.rmem_max=2097152
net.core.rmem_default=2097152

See https://lcm-proj.github.io/multicast_setup.html
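A small Python sketch for checking that a sysctl.conf-style file contains these two settings with at least the values above. It handles only simple key=value lines plus blank lines and # or ; comments; the function names and the "at least this value" rule are assumptions for illustration.

```python
from typing import Dict

# Values recommended in the comment above.
REQUIRED = {"net.core.rmem_max": 2097152, "net.core.rmem_default": 2097152}


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse key = value lines, skipping blank lines and # or ; comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(text: str) -> Dict[str, int]:
    """Return the required keys that are absent or set below the required value."""
    found = parse_sysctl_conf(text)
    return {k: v for k, v in REQUIRED.items() if k not in found or int(found[k]) < v}
```

An empty dict from missing_settings(open("/etc/sysctl.conf").read()) means both lines are present.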

@tkoolen
Contributor Author

tkoolen commented Dec 15, 2016

Thanks, Pat!

@rdeits
Owner

rdeits commented Dec 20, 2016

@tkoolen did Pat's workaround fix the issue? If so, I think we should add it to the readme and then close this. I think the ZMQ approach might be the right thing eventually, but with the workaround LCM can limp along for a little while longer.

@tkoolen
Contributor Author

tkoolen commented Dec 20, 2016

It did, yeah. I was waiting to test the updated 16.04 build before closing this issue, but forgot to do so before leaving for home. I trust that it has been fixed though. Yeah, if you could add it to the readme, that would be great, and feel free to close after that.

@rdeits rdeits closed this as completed in c41b522 Dec 21, 2016
@rdeits
Owner

rdeits commented Dec 21, 2016

Ok, done.
