Rough sketch for a syscall API for argv/argc and exit status #179

Closed
sunfishcode opened this issue Oct 16, 2018 · 9 comments
@sunfishcode
Member

The following is a rough sketch for a syscall API. The idea is to support a traditional C-style "main" function which in some environments is called from the wasm start function. The wasm start function has no arguments or return values, so we instead use imported functions to communicate this information.

I'm hoping this proposal will eventually help lead to something useful on its own (a standard way to invoke command-line-style wasm programs), but I'm also hoping it encourages meta-level conversations about how to design and document proposals. "main" is just the beginning ;-).

argv and argc

(func (import "@std_main/" "argv_data_len") (result i32))
(func (import "@std_main/" "store_argv_data") (param $data i32))
(func (import "@std_main/" "argc") (result i32))
(func (import "@std_main/" "store_argv_pointers")
      (param $data i32) (param $pointers i32))

Argv strings are NUL-terminated C-style strings and are concatenated into a single buffer, the argv data. argv_data_len returns the unsigned length of the argv data, including the NULs. store_argv_data stores the argv data into the default linear memory at $data.

argc returns the unsigned number of NUL-terminated strings encoded in the argv data. store_argv_pointers populates an argv-style array (with length argc) of 32-bit indices at $pointers in the default linear memory, giving the starts of the NUL-terminated strings in the argv data.

store_argv_data and store_argv_pointers trap if any byte to be written would be out of bounds.
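To make the trapping rule concrete, here is a rough host-side sketch in C. Everything in it is an assumption made for illustration: the `host_instance` struct, the `host_` function names, and the use of a `false` return value to stand in for a trap are not part of the proposal. It only shows one way a runner might bounds-check before writing into linear memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side view of one instance: its linear memory plus the
   argv data the host wants to hand over. None of these names come from the
   proposal. */
typedef struct {
    uint8_t *memory;         /* default linear memory */
    uint32_t memory_size;    /* size of linear memory in bytes */
    const char *argv_data;   /* NUL-terminated strings, concatenated */
    uint32_t argv_data_len;  /* total bytes, including the NULs */
    uint32_t argc;           /* number of strings in argv_data */
} host_instance;

/* True if [offset, offset + len) lies inside linear memory. The sum is done
   in 64 bits so it cannot wrap around. */
static bool in_bounds(const host_instance *inst, uint32_t offset, uint64_t len) {
    return (uint64_t)offset + len <= inst->memory_size;
}

/* Sketch of store_argv_data: returns false where a real host would trap. */
static bool host_store_argv_data(host_instance *inst, uint32_t data) {
    if (!in_bounds(inst, data, inst->argv_data_len))
        return false;
    memcpy(inst->memory + data, inst->argv_data, inst->argv_data_len);
    return true;
}

/* Sketch of store_argv_pointers: writes argc little-endian 32-bit indices,
   each giving the start of one string relative to linear memory. */
static bool host_store_argv_pointers(host_instance *inst, uint32_t data,
                                     uint32_t pointers) {
    if (!in_bounds(inst, pointers, (uint64_t)inst->argc * 4))
        return false;
    uint32_t offset = 0;
    for (uint32_t i = 0; i < inst->argc; i++) {
        /* Assumes argc is consistent with the argv data. */
        assert(offset < inst->argv_data_len);
        uint32_t index = data + offset;
        uint8_t *slot = inst->memory + pointers + 4 * i;
        slot[0] = (uint8_t)(index & 0xff);
        slot[1] = (uint8_t)((index >> 8) & 0xff);
        slot[2] = (uint8_t)((index >> 16) & 0xff);
        slot[3] = (uint8_t)((index >> 24) & 0xff);
        offset += (uint32_t)strlen(inst->argv_data + offset) + 1;
    }
    return true;
}
```

The check happens before any byte is written, so a failing call leaves linear memory untouched. Whether a real host should guarantee that is not something the proposal specifies.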

The rough startup sequence would be: call argv_data_len to obtain the argv data length, malloc a buffer of that size, call store_argv_data to write the argv data into it, call argc to obtain the number of arguments, malloc a buffer large enough for that many pointers (plus one for C, which needs a trailing NULL pointer), call store_argv_pointers to populate it with the argv pointers (and store NULL in the last element for C).
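The startup sequence above could be sketched in C roughly as follows. The four `std_main_*` functions are native stand-ins invented for this example so that it runs outside a wasm engine; in a real build they would be the imported functions, and the pointer arguments would be 32-bit offsets into linear memory. `build_argv` and the mock argument strings are likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the four imports (assumed names), backed by fixed data. */
static const char mock_argv_data[] = "prog\0-v\0file.txt";

static uint32_t std_main_argv_data_len(void) {
    return (uint32_t)sizeof mock_argv_data; /* includes every NUL */
}

static void std_main_store_argv_data(char *data) {
    memcpy(data, mock_argv_data, sizeof mock_argv_data);
}

static uint32_t std_main_argc(void) {
    return 3;
}

static void std_main_store_argv_pointers(char *data, char **pointers) {
    for (uint32_t i = 0; i < std_main_argc(); i++) {
        pointers[i] = data;
        data += strlen(data) + 1; /* skip past this string and its NUL */
    }
}

/* Follows the startup sequence step by step. Returns a NULL-terminated argv
   and writes the argument count to *argc_out, or returns NULL if an
   allocation fails. */
static char **build_argv(int *argc_out) {
    assert(argc_out != NULL);

    uint32_t len = std_main_argv_data_len();
    char *data = malloc(len ? len : 1);
    if (!data)
        return NULL;
    std_main_store_argv_data(data);

    uint32_t n = std_main_argc();
    char **argv = malloc(((size_t)n + 1) * sizeof *argv); /* +1 for NULL */
    if (!argv) {
        free(data);
        return NULL;
    }
    std_main_store_argv_pointers(data, argv);
    argv[n] = NULL; /* C requires argv[argc] to be a null pointer */

    *argc_out = (int)n;
    return argv;
}
```

A C runtime's startup code would then call something like `main(argc, argv)` with the result. Both buffers stay alive for the life of the program, which matches how argv normally behaves.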

Strings are not required to be valid UTF-8.

Exit Status

Each wasm instance has an associated exit status variable, which is either an integer or one of a predefined set of exit messages. The value is returned to the host when the start function exits. The initial value when the start function is called is the message “success”.

(func (import "@std_main/" "set_exit_status_i32") (param $val i32))
(func (import "@std_main/" "set_exit_status_message") (param $index i32))

set_exit_status_i32 sets the exit status variable to an integer value.

set_exit_status_message sets the exit status variable to one of a set of predefined exit messages:

| index | message |
| --- | --- |
| 0 | success |
| 1 | general error |
| 2 | command-line usage error |
| 3 | memory allocation error |
| N in [4, 16) | error code N (not yet defined) |
| N in [16, 128) | exit with value N-16 |
| N >= 128 | error code N (not yet defined) |

Integer values and exit messages are both mapped to the host environment in host-specific and potentially lossy ways, with the following constraints:

  • an i32 value of 0 is mapped as if it were the message “success”
  • an i32 value x such that 0 < x < 112 is mapped to the message “exit with value x”, with “x” replaced by the value
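Read together with the message table, these two constraints amount to a small mapping from i32 values to message indices. The following C sketch is one reading of that mapping, not part of the proposal; the function name, the macro names, and the use of -1 to mean "left to the host" are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define EXIT_VALUE_LIMIT 112 /* i32 values below this have a fixed mapping */
#define EXIT_VALUE_BASE 16   /* message index N means "exit with value N-16" */

/* The "exit with value" range of the table ends at index 128. */
static_assert(EXIT_VALUE_BASE + EXIT_VALUE_LIMIT == 128,
              "value range must line up with the message table");

/* Returns the message index that an i32 exit status is constrained to map
   to, or -1 when the constraints leave the mapping up to the host. */
static int32_t exit_message_index_for_i32(int32_t value) {
    if (value == 0)
        return 0; /* same as the message "success" */
    if (value > 0 && value < EXIT_VALUE_LIMIT)
        return value + EXIT_VALUE_BASE; /* "exit with value <value>" */
    return -1; /* host-specific and potentially lossy */
}
```

For example, an i32 status of 1 lands on index 17 ("exit with value 1"), which is distinct from message index 1 ("general error"). That distinction is what lets a program like grep report 1 without it being interpreted as a generic failure.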

Rationale for the magic number 112: Unix-like environments reserve exit values N >= 128 for signal termination values, and command-line shells use 127 and 126 to indicate "command not found" and "command is not executable", respectively. In theory, if shells need more values, they'll continue to count down from there. Also, <sysexits.h>, a header available on many systems, defines some error codes starting at 64 and counting up, currently up to 78. 112 is a somewhat arbitrary point in between these two, allowing both to grow.

Rationale for having a message API in addition to an i32 API: Some programs use exit codes to communicate specific information. For example, while 1 is commonly used to indicate an error, in grep, 1 means that no matches were found, while 2 indicates an error. The idea here is to provide an i32 API for use by portable C code that imposes minimal interpretation on the meaning, and also to provide a wasm-specific message API that programs can opt into using to evoke a specific interpretation.

@rianhunter

  • I'm not 100% sure but I don't think this is the intended purpose of the wasm start function. Using it this way would mean that main() would get called when the main module is instantiated.
  • Where does the argv data ultimately come from? Unless I'm missing something, it seems better for the module to ask the module from which it's importing where the argv data is, instead of setting it.

@sunfishcode
Member Author

> I'm not 100% sure but I don't think this is the intended purpose of the wasm start function. Using it this way would mean that main() would get called when the main module is instantiated.

That's a good point. I've tweaked the wording a little, but let me also explain here: Calling main from the wasm start function isn't the only way to do it, but for purely command-line-oriented programs, I think it is a natural way to do it. The important aspect here is that we don't want to rely on the entrypoint (whatever it is) having arguments or return values.

> Where does the argv data ultimately come from? Unless I'm missing something, it seems better for the module to ask the module from which it's importing where the argv data is, instead of setting it.

The argv data comes from whatever entity satisfies these imports. In a command-line-oriented wasm runner, these might be provided by the runner itself. In a browser, there might be a polyfill module providing a command-line-environment to run in. Other environments could decide how to resolve these symbols in whatever way makes sense in their contexts.

As for pointing the module to where the argv data is rather than copying it in, the difficulty is that Wasm programs by design can't refer to data they haven't been explicitly given access to.

@rianhunter

> > Where does the argv data ultimately come from? Unless I'm missing something, it seems better for the module to ask the module from which it's importing where the argv data is, instead of setting it.

> The argv data comes from whatever entity satisfies these imports. In a command-line-oriented wasm runner, these might be provided by the runner itself. In a browser, there might be a polyfill module providing a command-line-environment to run in. Other environments could decide how to resolve these symbols in whatever way makes sense in their contexts.

> As for pointing the module to where the argv data is rather than copying it in, the difficulty is that Wasm programs by design can't refer to data they haven't been explicitly given access to.

I see, so the module runner is responsible for filling the argv data when the module calls store_argv_data. The naming was a bit confusing for me, since by "store" I thought that meant it was sending (storing) the pointer to the module runner.

@bjfish

bjfish commented Dec 1, 2018

It would be nice if there were some convention for calling a main() or init(), but doesn't calling it from start have this limitation? WebAssembly/design#1160

@rianhunter

@bjfish the function being named "main" is the convention, isn't it? whatever is running the wasm can just assume that is the main entry point by convention.

@bjfish

bjfish commented Dec 1, 2018

@rianhunter Yes, I think a "main" function being the entry point would be a good convention for a wasm runtime. It'd be nice to have an option to override this via configuration as well.

@lachlansneff

Would that pass in cli arguments as parameters like emscripten does? Or would imported functions be used to get the cli arguments?

@bjfish

bjfish commented Dec 3, 2018

@lachlansneff I think the way emscripten does this is by generating glue code (shims) which call its own wasm syscall/runtime API. In the short term, I think imported functions could be used directly, but hopefully future glue code would adopt the new reference syscall API.

@sunfishcode
Member Author

This design for command-line arguments is now subsumed by the WASI proposal; see the __wasi_args_* functions in the API.

@sunfishcode sunfishcode transferred this issue from WebAssembly/wasi-libc-old Mar 3, 2020