Writing valid UTF-8 to the Windows terminal using stdout().write() can incorrectly return an error #83258
Comments
I see two distinct possibilities to resolve this:
Hm, so this is ultimately a bug in the buffer
@ChrisDenton The logic that errors if incomplete UTF-8 is passed in at index 0 is in the std lib at sys/windows/stdio.rs; that is where the error message comes from. As far as I can see, the stdio buffer implementation (in stable it is in linewritershim.rs, apparently currently being revamped) cannot know of the requirement that only complete UTF-8 may be written to the underlying Windows terminal, so I think that requirement should be encapsulated in sys/windows/stdio.rs by accepting and buffering incomplete UTF-8 when necessary.
I mean, a generic buffer can attempt to avoid small writes (say, fewer than 16 bytes) by copying those leftover bytes to the front of the buffer and waiting for either another But I see what you mean. Introducing a small buffer to windows/stdio would fix the problem.
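To make the "accept and buffer incomplete UTF-8" idea concrete, here is a minimal sketch of how a byte slice could be split into its valid UTF-8 prefix and a trailing incomplete sequence. The helper name `split_incomplete_tail` is hypothetical and not taken from sys/windows/stdio.rs. Unlike the real console code, this sketch rejects the whole slice if it contains a definitely invalid byte anywhere.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: splits `buf` into the longest valid UTF-8 prefix and
/// a tail that is the start of a not-yet-complete UTF-8 sequence.
/// Returns an error if `buf` contains bytes that can never become valid UTF-8.
fn split_incomplete_tail(buf: &[u8]) -> Result<(&str, &[u8]), Utf8Error> {
    match str::from_utf8(buf) {
        // Everything is complete, nothing needs to be held back.
        Ok(s) => Ok((s, &buf[buf.len()..])),
        // `error_len() == None` means the input ended in the middle of a
        // sequence, so the tail may be completed by a later write.
        Err(e) if e.error_len().is_none() => {
            let (valid, tail) = buf.split_at(e.valid_up_to());
            let valid = str::from_utf8(valid).expect("prefix was already validated");
            Ok((valid, tail))
        }
        // A definitely invalid byte: no later write can repair it.
        Err(e) => Err(e),
    }
}
```

With a helper like this, a write of `b"ab\xC3"` could pass `"ab"` on to the console and keep `\xC3` until the next write arrives.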
…utf8, r=m-ou-se

Allow writing of incomplete UTF-8 sequences to the Windows console via stdout/stderr

# Problem

Writes of just an incomplete UTF-8 byte sequence (e.g. `b"\xC3"` or `b"\xF0\x9F"`) to stdout/stderr with a Windows console attached fail with `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` even though further writes could complete the codepoint.

This is currently a rare occurrence since the [linewritershim](https://github.com/rust-lang/rust/blob/2c56ea38b045624dc8b42ec948fc169eaff1206a/library/std/src/io/buffered/linewritershim.rs) implementation flushes complete lines immediately and buffers up to 1024 bytes for incomplete lines. It can still happen as described in rust-lang#83258.

The problem will become more pronounced once the developer can switch stdout/stderr from line-buffered to block-buffered or immediate when the changes in the "Switchable buffering for Stdout" pull request (rust-lang#78515) get merged.

# Patch description

If there is at least one valid UTF-8 codepoint, all valid UTF-8 is passed through to the extracted `write_valid_utf8_to_console()` fn. The new code only comes into play if `write()` is being passed a short byte slice comprising an incomplete UTF-8 codepoint. In this case up to three bytes are buffered in the `IncompleteUtf8` struct associated with `Stdout` / `Stderr`. The bytes are accepted one at a time. As soon as an error can be detected, `io::ErrorKind::InvalidData, "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"` is returned. Once a complete UTF-8 codepoint is received, it is passed to `write_valid_utf8_to_console()` and the buffer length is set to zero.

Calling `flush()` will neither error nor write anything if an incomplete codepoint is present in the buffer.

# Tests

Currently there are no Windows-specific tests for console writing code at all.

Writing (regression) tests for this problem is a bit challenging since unit tests and UI tests don't run in a console, and suddenly popping up another console window might be surprising to developers running the testsuite and might not work at all in CI builds. To just test the new functionality in unit tests, the code would need to be refactored. Some guidance on how to proceed would be appreciated.

# Public API changes

* `std::str::verifications::utf8_char_width()` would be exposed as `std::str::utf8_char_width()` behind the "str_internals" feature gate.

# Related issues

* Fixes rust-lang#83258.
* PR rust-lang#78515 will exacerbate the problem.

# Open questions

* Add tests?
* Squash into one commit with better commit message?
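The byte-at-a-time buffering from the patch description can be sketched roughly as follows. This is a simplified illustration, not the code from the pull request: only the struct name `IncompleteUtf8` and the error message come from the description above. The `push` method, the four-byte array, and the use of `str::from_utf8` in place of `utf8_char_width()` are assumptions made for this example.

```rust
use std::io;
use std::str;

/// Simplified stand-in for the `IncompleteUtf8` struct named in the patch
/// description. Holds the bytes of one partially received code point.
#[derive(Default)]
struct IncompleteUtf8 {
    // Up to three buffered bytes plus room for the byte that completes them.
    bytes: [u8; 4],
    len: usize,
}

impl IncompleteUtf8 {
    /// Accepts a single byte (hypothetical method name).
    /// - `Ok(None)`: the sequence is still incomplete, the byte was buffered.
    /// - `Ok(Some(s))`: the code point is complete; the buffer is emptied.
    /// - `Err(_)`: the bytes can never form valid UTF-8; the buffer is emptied.
    fn push(&mut self, byte: u8) -> io::Result<Option<String>> {
        self.bytes[self.len] = byte;
        self.len += 1;
        match str::from_utf8(&self.bytes[..self.len]) {
            Ok(s) => {
                let complete = s.to_owned();
                self.len = 0;
                Ok(Some(complete))
            }
            // The input ended mid-sequence: keep waiting for more bytes.
            Err(e) if e.error_len().is_none() => Ok(None),
            // An error is detectable as soon as an impossible byte arrives.
            Err(_) => {
                self.len = 0;
                Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "Windows stdio in console mode does not support writing non-UTF-8 byte sequences",
                ))
            }
        }
    }
}
```

In this sketch a completed code point is returned to the caller, who would then hand it to something like `write_valid_utf8_to_console()`. A `flush()` on top of it would simply leave any buffered bytes in place.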
I tried this code:
I expected to see this happen: Program does not panic.
Instead, this happened: Program panics with the error `Windows stdio in console mode does not support writing non-UTF-8 byte sequences`.
Meta
`rustc --version --verbose`:
Backtrace
Update
To clarify: This problem occurs with all `Write` trait implementations that don't override the default `Write::write_all()` implementation, e.g. with this simple DelegatingWrite:
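The original snippet is not preserved in this copy of the issue. As a stand-in, a wrapper of this kind might look like the sketch below. It is a hypothetical reconstruction, not the issue author's code, and it is generic over the inner writer so it can be tried without a Windows console.

```rust
use std::io::{self, Write};

/// Hypothetical reconstruction of a delegating writer: it forwards `write()`
/// and `flush()` to the inner writer and overrides nothing else.
struct DelegatingWrite<W: Write> {
    inner: W,
}

impl<W: Write> Write for DelegatingWrite<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // May accept fewer bytes than offered; the caller must retry the rest.
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }

    // `write_all()` is deliberately not overridden, so the default
    // implementation is used: it calls `write()` in a loop and resubmits
    // whatever part of the buffer has not been accepted yet.
}
```

Wrapping `std::io::stdout()` in such a type and calling `write_all()` sends every byte through the wrapper's `write()`, which is the code path the update refers to.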