-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fd_set implementation is inefficient #283
Comments
Thanks for the thorough analysis! I'm curious what led you to notice this; do you have a use case with many In WASI, there are more open file descriptors than a typical POSIX program would expect there to be. Beyond stdin, stdout, and stderr, WASI passes in preopened directory fds, and in the future, it will likely pass in additional file descriptors. A program which uses only 1024 file descriptors on typical POSIX platforms could use more on WASI, so the current |
i noticed this issue while debugging a program that was mishandling for background, i've implemented a WASI runtime, so i'm a bit familiar with the expectations/requirements of the WASI libc runtime. i know that you need to pass in open directory handles, including the cwd/ but let's assume such a pathological setup is reasonable. i'll point out again that the behavior you describe is not POSIX compliant -- it specifically says what cloudlibc is doing is undefined behavior, not implementation-defined behavior. that means compilers are allowed to turn this into an abort or memory corruption or anything else. anyone writing to this API is asking for trouble, and it's their fault when it happens. kind of the point of WASI libc is to enable POSIX compliant applications with minimal to no modification. POSIX already provides APIs (e.g. poll()) specifically to handle arbitrary fd limits. those would work right now. i grok that maybe WASI libc probably wants to be a little nicer to folks out of the box and allow them to use POSIX APIs they're used to without having to learn something new (ignoring the fact that WASI/WASM are very new :p). that is still reasonably achievable without bloating things so much. a bitmask scales by 8, so a bitmask implementation with |
Could you clarify this? I'm not seeing from your description how cloudlibc has undefined behavior here. Bumping Admittedly, I don't know of specific applications that do this. But I also don't perceive In an application that only opens 16 or fewer file descriptors, a |
i quoted it in my original report/description:
i grok that, from cloudlibc's pov, it's trying to make an API guarantee, but from POSIX's pov, this is clearly undefined behavior, and compilers are free to optimize it as such. i'm not aware of any currently doing so, but when you tell people "X behavior is classified as undefined behavior by the standards authors", that puts it in a category of "do not touch it".
such an application is clearly not POSIX compliant, and is thus WASI-libc-specific by definition, and hoping that the runtime does not break on it in the future. further, such an application that assumes the range [0,2] is allocated and [3, infinity) is free is buggy. POSIX guarantees that 0/1/2 are allocated, but that it is. It, nor any other runtime, makes any guarantees about any other fd. if it did, then the WASI runtime would be in a bit of a pickle because it wouldn't be allowed to pass in pre-opened paths at all. in the *NIX world, it is trivial to leak open fds, accidentally or on purpose, when running other programs. it's why Linux invented O_CLOEXEC, why the historical BSD/POSIX daemon interface has a such an application should be using
|
WASI does not intend to define itself as a strictly POSIX platform. WASI follows POSIX in many areas, and will likely always do so, as this is very useful, but there's nothing wrong with WASI saying that some things that are UB in POSIX have defined behavior in WASI. And when we say this, compilers shouldn't assume that they're UB. And as one of the maintainers of the compiler used with wasi-libc, I believe this is realistic.
I don't think we know that the rest of the *NIX world has lived without it. The point of the example I brought up is that it's something an application could theoretically do and be ok on POSIX, which wouldn't work on WASI. Beyond that example though, wasi-libc's If there are real-world use cases where the extra |
cloudlibc seems to have made a design choice that POSIX does not require, and what most *NIX implementations don't support. this has led to cloudlibc using a lot more memory supporting scenarios with
select()
that is better served bypoll()
. imo this should be revisited by basically deleting the cloudlibc logic entirely.first, let's start with what POSIX says:
let's look at what cloudlibc does:
FD_SETSIZE
set to1024
int __fds[FD_SETSIZE];
-- an array of integers (fds)__fds
is just "the next used fd"__fds
is the fd to check__fds
array takes exactly__nfds
checks (good)sizeof(fd_set)
is 4100 bytes (bad)FD_SETSIZE
(fine)select()
, not just[0, FD_SETSIZE)
(good)let's look at what musl (and basically every other *NIX library out there) does:
FD_SETSIZE
set to 1024unsigned long fds_bits[FD_SETSIZE / 8 / sizeof(long)];
-- a bitmask of fdsfds_bits
is the fd to checkfds_bits
takes longer, but most architectures have "bit is set" insns to optimize this. an unoptimized implementation would take at least 128 tests (FD_SETSIZE / 8
) (meh)sizeof(fd_set)
is 128 bytes (good)FD_SETSIZE
(fine)[0, FD_SETSIZE)
can be monitored withselect()
(fine)so cloudlibc uses 4k of memory instead of 128bytes in order to be slightly faster and to support fds larger than
FD_SETSIZE
. but no POSIX-compliant project would ever use an fd >=FD_SETSIZE
, which means they'd be making assumptions that the underlying code is cloudlibc.WASI API isn't relevant because it doesn't define fd_set, and this is purely about the
select()
API which is entirely part of WASI libc.The text was updated successfully, but these errors were encountered: