Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readdir seeking #7

Closed
AndrewScheidecker opened this issue Jun 26, 2019 · 10 comments
Closed

readdir seeking #7

AndrewScheidecker opened this issue Jun 26, 2019 · 10 comments
Labels
question Further information is requested

Comments

@AndrewScheidecker
Copy link

POSIX API:

int            closedir(DIR *);
DIR           *opendir(const char *);
DIR           *fdopendir(int fd);
struct dirent *readdir(DIR *);
void           rewinddir(DIR *);
void           seekdir(DIR *, long int);
long int       telldir(DIR *);

Note that the locations returned by telldir are invalidated by rewinddir. It's also unspecified whether changes to the directory since opendir or rewinddir were called will be reflected.

Windows API:

HANDLE FindFirstFileW(const wchar_t*, WIN32_FIND_DATAW*);
BOOL FindNextFileW(HANDLE, WIN32_FIND_DATAW*);
BOOL FindClose(HANDLE);

Windows Vista+ also has GetFileInformationByHandleEx that works with an open file handle instead of a search path, and can be called with FileIdBothDirectoryInfo to read dirent info into a buffer.

Neither Windows API supports anything like seekdir. There's a way to reset the internal pointer in GetFileInformationByHandleEx to the beginning, and you can always call FindClose+FindFirstFileW again to implement rewinddir.

WASI's API:

typedef struct __wasi_dirent_t
{
	__wasi_dircookie_t d_next;
	__wasi_inode_t d_ino;
	uint32_t d_namlen;
	__wasi_filetype_t d_type;
} __wasi_dirent_t;

__wasi_errno_t __wasi_fd_readdir(__wasi_fd_t fd,
                                 void* buffer,
                                 size_t buffer_bytes,
                                 __wasi_dircookie_t cookie);

To map this to the stateful POSIX or Win32 APIs, a WASI implementation needs to mirror the host API's positional state, and determine whether a seek is implied by the cookie argument.

On Windows, that may imply restarting the enumeration from the beginning, and enumerating flies until it reaches whatever file it associates with cookie. Restarting the enumeration could lead to some WASI applications that run in O(numDirEnts^2) time on Windows directories despite running in O(numDirEnts) time on other directories, so it's not practical IMO.

Alternatively, the WASI implementation may buffer the previously read dirents, and just move the buffer pointer back to the direct with the specified cookie. I think this is undesirable because there's already buffering at the host API level, and likely also in the WASI application itself (e.g. in the WASI libc).

Finally, WASI could remove seeking from the current API, so it just streams dirents sequentially, and has an argument or an additional entry point to restart from the beginning. That wouldn't require WASI implementations to do any extra buffering on Windows, but it does mean that WASI applications that need to seek must buffer the dirents themselves.

@dumblob
Copy link

dumblob commented Jun 27, 2019

Beware, that file-like (or other including DB-like etc.) APIs might in the end not be part of WASI namespace - see WebAssembly/WASI#56 (comment) and WebAssembly/WASI#38 (comment) .

@tschneidereit
Copy link
Member

APIs might in the end not be part of WASI namespace

They would still be part of the WASI namespace. The idea behind the discussions you reference is that no APIs would be required. In spec terms, everything would be "normative optional": it's not required to exist, but if it exists it has to follow the specified behavior. That has no bearing on which namespace an API would be in, however.

@AndrewScheidecker
Copy link
Author

Alternatively, the WASI implementation may buffer the previously read dirents, and just move the buffer pointer back to the direct with the specified cookie. I think this is undesirable because there's already buffering at the host API level, and likely also in the WASI application itself (e.g. in the WASI libc).

Another issue with this approach is that there's no way for the application to free any dirents the WASI implementation is buffering short of closing the FD.

@marmistrz
Copy link
Contributor

On Windows, is the order of elements returned by FindNextFileW guaranteed to be stable, i.e. will the function return the same stream of directories for two consecutive invocations? I guess this is what's meant by the MSDN doc

The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system

and this stackoverflow post

@sunfishcode sunfishcode transferred this issue from WebAssembly/WASI May 20, 2020
@sunfishcode sunfishcode added the question Further information is requested label Oct 15, 2020
@sunfishcode
Copy link
Member

To add to this, on the POSIX side there are several issues as well.

POSIX unfortunately requires seekdir and telldir to use the C long type to represent directory offsets. That's only 32-bit on wasm32, so it isn't enough to store a WASI directory cookie, which has to be 64-bit. I have a this for this which adds a layer of indirection here: WebAssembly/wasi-libc#221, however this is ugly.

The behavior of seekdir/telldir are tricky in the presence of concurrent directory modifications.

The cookie values potentially leak implementation details about the underlying filesystem.

It also happens that seekdir and telldir aren't required in POSIX; they're optional. So in all, the alternative above of "WASI could remove seeking from the current API" is sounding very tempting.

Does anyone know of places where seekdir and telldir are used by applications in practice?

@sunfishcode
Copy link
Member

Here are some more links describing how seekdir/telldir are rarely used, tricky to use correctly, and burdensome to implement:

Also, seekdir/telldir aren't defined in Android's libc, leading some users to do "evil" things: https://github.com/tytso/xfstests-bld/blob/7ca4e52401a5e3376ee285cbb973f84559c12572/android-compat/telldir.c

@sunfishcode
Copy link
Member

I've gotten more informal feedback from people saying they aren't aware of seekdir/telldir being needed, and no one responding with places they're needed. So I'm picturing the following plan:

  • FIrst, remove seekdir/telldir from just WASI libc
  • If there are no problems, remove the dir_cookie argument to fd_readdir and change it's semantics:
    • Make it remember its position in the stream between calls, similar to Linux's getdents64.
    • Add a fd_rewinddir function to reset the stream to the beginning.

@linclark
Copy link
Member

linclark commented Apr 2, 2021

[bookkeeping] Added to the Moving wasi-filesystem to Phase 3 project.

sunfishcode added a commit to sunfishcode/wasi-filesystem that referenced this issue Nov 22, 2021
This makes a number of changes, to make use of interface-types features such
as `expected`, variant types, and resources. The change to use resources in
particular means that filesystem functions are now methods of the `descriptor`
resource. Since this means renaming everything, take this opportunity to
introduce a new naming conventions, with `_at` being used for functions that
take dirfd+path arguments.

This also eliminates the `rights` concept what was present in earlier versions
of WASI, has has discussed in WebAssembly#31. This required adding new flags to `open_at`,
so while here, this also adds basic `chmod`-like support, as discussed in WebAssembly#33.

And, this removes support for readdir seeking (seekdir/telldir), as discussed
in WebAssembly#7.

And it adds a fifo file type and a more general socket type, as discussed in
sunfishcode added a commit to sunfishcode/wasi-filesystem that referenced this issue Nov 22, 2021
This makes a number of changes, to make use of interface-types features such
as `expected`, variant types, and resources. The change to use resources in
particular means that filesystem functions are now methods of the `descriptor`
resource. Since this means renaming everything, take this opportunity to
introduce a new naming conventions, with `_at` being used for functions that
take dirfd+path arguments.

This also eliminates the `rights` concept what was present in earlier versions
of WASI, has has discussed in WebAssembly#31. This required adding new flags to `open_at`,
so while here, this also adds basic `chmod`-like support, as discussed in WebAssembly#33.

And, this removes support for readdir seeking (seekdir/telldir), as discussed
in WebAssembly#7.

And it adds a fifo file type and a more general socket type, as discussed in
sunfishcode added a commit to sunfishcode/wasi-filesystem that referenced this issue Nov 22, 2021
This makes a number of changes, to make use of interface-types features such
as `expected`, variant types, and resources. The change to use resources in
particular means that filesystem functions are now methods of the `descriptor`
resource. Since this means renaming everything, take this opportunity to
introduce a new naming conventions, with `_at` being used for functions that
take dirfd+path arguments.

This also eliminates the `rights` concept what was present in earlier versions
of WASI, has has discussed in WebAssembly#31. This required adding new flags to `open_at`,
so while here, this also adds basic `chmod`-like support, as discussed in WebAssembly#33.

And, this removes support for readdir seeking (seekdir/telldir), as discussed
in WebAssembly#7.

And it adds a fifo file type and a more general socket type, as discussed in
sunfishcode added a commit to sunfishcode/wasi-filesystem that referenced this issue Nov 22, 2021
This makes a number of changes, to make use of interface-types features such
as expected, variant types, and resources. The change to use resources in
particular means that filesystem functions are now methods of the descriptor
resource. Since this means renaming everything, take this opportunity to
introduce a new naming conventions, with _at being used for functions that
take dirfd+path arguments.

This also eliminates the rights concept what was present in earlier versions
of WASI, has has discussed in WebAssembly#31. This required adding new flags to open_at,
so while here, this also adds basic chmod-like support, as discussed in WebAssembly#33.

And, this removes support for readdir seeking (seekdir/telldir), as discussed
in WebAssembly#7.

And it adds a fifo file type and a more general socket type, as discussed in WebAssembly#4.
sunfishcode added a commit that referenced this issue Dec 15, 2021
This makes a number of changes, to make use of interface-types features such
as expected, variant types, and resources. The change to use resources in
particular means that filesystem functions are now methods of the descriptor
resource. Since this means renaming everything, take this opportunity to
introduce a new naming conventions, with _at being used for functions that
take dirfd+path arguments.

This also eliminates the rights concept what was present in earlier versions
of WASI, has has discussed in #31. This required adding new flags to open_at,
so while here, this also adds basic chmod-like support, as discussed in #33.

And, this removes support for readdir seeking (seekdir/telldir), as discussed
in #7.

And it adds a fifo file type and a more general socket type, as discussed in #4.
@sunfishcode
Copy link
Member

As an update here:

  • I'm now planning to leave seekdir/telldir in wasi-libc and leave fd_readdir in preview1 as-is
  • The preview2 wasi-filesystem API readdir doesn't have cookies exposed, and just supports linear iteration. The preview1 polyfill we're building works by doing a linear search, which isn't great if users do it a lot, but the assumption here is that users aren't doing this a lot.

@sunfishcode
Copy link
Member

The preview2 wasi-filesystem API now has a readdir function that returns a new dir-entry-stream and does not have a way to rewind an existing stream. If anyone has a use case for directory seeking, please file a new issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

6 participants