-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reload nameserver information on lookup failure #41582
Reload nameserver information on lookup failure #41582
Conversation
I wonder, are there performance implications about this? E.g. does a bunch of failing queries now take much longer? |
@alexcrichton yes, I believe that that is true. This is why I first proposed in #41570 that we might instead want a |
src/libstd/sys_common/net.rs
Outdated
// The lookup failure could be caused by using a stale /etc/resolv.conf. | ||
// See https://github.com/rust-lang/rust/issues/41570. | ||
// We therefore force a reload of the nameserver information. | ||
c::res_init(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is in sys_common
so it is compiled on all platforms, but I don't see a res_init
on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what the if cfg!(unix)
above was intended for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, of course, it's still compiled. I need #[cfg(..)] {}
instead, right?
@alexcrichton - are you looking at this PR? Friendly ping to keep this on your radar. |
@alexcrichton I'd say pretty negligible: #![feature(test)]
extern crate libc;
extern crate test;
fn main() {}
#[cfg(test)]
mod tests {
use super::*;
use test::Bencher;
use std::net::ToSocketAddrs;
#[bench]
fn bench_plain(b: &mut Bencher) {
let addr = ("google.com", 80);
b.iter(|| addr.to_socket_addrs().map(|a| a.count()).unwrap_or(0));
}
#[bench]
fn bench_reinit(b: &mut Bencher) {
let addr = ("google.com", 80);
b.iter(|| {
addr.to_socket_addrs().map(|a| a.count()).unwrap_or(0);
unsafe { libc::res_init() };
});
}
} $ cargo +nightly bench
Finished release [optimized] target(s) in 0.0 secs
Running target/release/deps/res_init_bench-94665f8bc2ebf0bc
running 3 tests
test tests::bench_plain ... bench: 4,419,043 ns/iter (+/- 331,978)
test tests::bench_reinit ... bench: 4,420,291 ns/iter (+/- 297,022)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured
$ sudo netctl stop-all
$ cargo +nightly bench
Finished release [optimized] target(s) in 0.0 secs
Running target/release/deps/res_init_bench-94665f8bc2ebf0bc
running 3 tests
test tests::bench_plain ... bench: 267,152 ns/iter (+/- 18,776)
test tests::bench_reinit ... bench: 267,891 ns/iter (+/- 9,855)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured |
@jonhoo awesome thanks for testing! |
65fcd0d
to
1309652
Compare
@alexcrichton I believe that with rust-lang/libc#585 merged, 1309652 should now work fine. What's the process for building with a more recent libc? |
@alexcrichton not sure if this is the right way to do it, but I bumped the submodule revision for |
@bors: r+ Looks good! Let's see what CI says |
📌 Commit db36fc8 has been approved by |
…excrichton Reload nameserver information on lookup failure As discussed in #41570, UNIX systems often cache the contents of `/etc/resolv.conf`, which can cause lookup failures to persist even after a network connection becomes available. This patch modifies lookup_host to force a reload of the nameserver entries following a lookup failure. This is in line with what many C programs already do (see #41570 for details). On systems with nscd, this should not be necessary, but not all systems run nscd. Fixes #41570. Depends on rust-lang/libc#585. r? @alexcrichton
💔 Test failed - status-travis |
Looks legitimate (Edit: this is on macOS, eg: https://travis-ci.org/rust-lang/rust/jobs/228616549):
|
Problem on libc side. Using |
@kennytm yeah, that's what I was worried about. @alexcrichton, looks like we can't rely on std to pull in |
Ah yeah sorry to be clear I think adding linkage directives to libstd is ok for now, I think we just want to avoid modifying libc for now due to the impact it'll have. |
@alexcrichton how would I go about doing that? |
I think the best way would likely be to modify libc-shim's build script in this repo to print out |
Ah, okay -- gave that a shot in 912da9b. |
Looks pretty good to me, although on second though I think this may actually be best in |
912da9b
to
43acba5
Compare
Done in 43acba5fa62ac0018aa8dd498d5709f07a68bd43 |
@bors: r+ Thanks! |
macOS failed again with the same error, I think, though not sure:
|
That's so strange... The log even says that it's linking with |
Oh -lresolv is correct but both of these tests are performing manual linking, so the libs linked just need to be updated (they're run-make tests I believe) |
As discussed in rust-lang#41570, UNIX systems often cache the contents of /etc/resolv.conf, which can cause lookup failures to persist even after a network connection becomes available. This patch modifies lookup_host to force a reload of the nameserver entries following a lookup failure. This is in line with what many C programs already do (see rust-lang#41570 for details). On systems with nscd, this should not be necessary, but not all systems run nscd. Introduces an std linkage dependency on libresolv on macOS/iOS (which also makes it necessary to update run-make/tools.mk). Fixes rust-lang#41570. Depends on rust-lang/libc#585.
43acba5
to
68ae617
Compare
Updated |
@bors: r+ |
📌 Commit 68ae617 has been approved by |
I'll take that as you agreeing that 68ae617#diff-bff523d3aff3a86f367f9d199a559b71R75 is the right change to make :p |
Indeed! |
Hmm, seems like @bors is taking a while to pick this up -- @alexcrichton ? |
You are currently 4th in the queue: https://buildbot2.rust-lang.org/homu/queue/rust. It can take a while, especially now--we're actively trying to land a patch that fixes nightly, so that has higher priority so whenever we retry it, it jumps to the top. |
Ah, thanks! Didn't know there was a way of looking at the queue. Never mind me :) |
…-fail, r=alexcrichton Reload nameserver information on lookup failure As discussed in rust-lang#41570, UNIX systems often cache the contents of `/etc/resolv.conf`, which can cause lookup failures to persist even after a network connection becomes available. This patch modifies lookup_host to force a reload of the nameserver entries following a lookup failure. This is in line with what many C programs already do (see rust-lang#41570 for details). On systems with nscd, this should not be necessary, but not all systems run nscd. Fixes rust-lang#41570. Depends on rust-lang/libc#585. r? @alexcrichton
⌛ Testing commit 68ae617 with merge 42a4f37... |
Is bors drunk? This PR was just merged in a rollup, so why is bors trying to test it? |
It is worth noting that if https://sourceware.org/bugzilla/show_bug.cgi?id=984 ever lands, we might want to revert this change. |
// The lookup failure could be caused by using a stale /etc/resolv.conf. | ||
// See https://github.com/rust-lang/rust/issues/41570. | ||
// We therefore force a reload of the nameserver information. | ||
c::res_init(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this still result in surprising behaviour if e.g. the contents of /etc/resolv.conf change without the old resolver becoming unusable?
For instance, if I change my DNS resolver without making the old resolver unreachable, I'll never hit this error and any running rust applications will continue to use the old resolver...indefinitely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Though if the resolution happens successfully, what is the problem? It's also quite hard to get around that particular case. We could always call res_init
, but that seems a little wasteful. The real solution to this is to fix libc (most libcs do not have this problem — glibc is the major exception). Applications that want to be robust against this could always call libc::res_init
directly though of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though if the resolution happens successfully, what is the problem?
Playing devil's advocate, "successful" doesn't imply "correct".
We could always call res_init, but that seems a little wasteful.
How wasteful? Perhaps this is worth measuring.
The real solution to this is to fix libc (most libcs do not have this problem — glibc is the major exception).
What do you mean? What would "fixing" libc look like? What do other libcs do in contrast to glibc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though if the resolution happens successfully, what is the problem?
Playing devil's advocate, "successful" doesn't imply "correct".
True, though that sounds like a very weird setup indeed. One in which you can connect using the resolution information from the old server, but you need to instead connect to the server provided by a new resolver?
We could always call res_init, but that seems a little wasteful.
How wasteful? Perhaps this is worth measuring.
I did some benchmarks above (#41582 (comment)), and it's not terrible (especially because it doesn't require a syscall), but if we can avoid doing something...
The real solution to this is to fix libc (most libcs do not have this problem — glibc is the major exception).
What do you mean? What would "fixing" libc look like? What do other libcs do in contrast to glibc?
No other libcs have this issue. Some of them don't cache /etc/resolv.conf
, some integrate with NSS or similar services, which know when the cache should be flushed. I haven't looked into it too carefully. It is unclear what the "right" solution is given that glibc wants to be both fast (i.e., don't do a file read on every connect), and not rely on other services (like NSS).
The glibc man page for
Is it safe to call |
That's a good point, though through some digging it appears as though
This suggests that we could (and probably should) use |
@jonhoo should we file an issue somewhere for this? I'm not sure I would call it "unsound" so much as "maybe unsound under certain very specific conditions and versions of glibc" :p |
Well, https://sourceware.org/bugzilla/show_bug.cgi?id=984 is now marked as |
Existing versions of glibc that don't contain the fix will be around for years though. Probably we can never revert this workaround? :( |
As discussed in #41570, UNIX systems often cache the contents of
/etc/resolv.conf
, which can cause lookup failures to persist even after a network connection becomes available. This patch modifies lookup_host to force a reload of the nameserver entries following a lookup failure. This is in line with what many C programs already do (see #41570 for details). On systems with nscd, this should not be necessary, but not all systems run nscd.Fixes #41570.
Depends on rust-lang/libc#585.
r? @alexcrichton