Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error writing valid UTF-8 to a terminal on Windows 10 with ColorChoice::Never #40

Closed
Count-Count opened this issue Mar 17, 2021 · 6 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@Count-Count
Copy link

Writing certain valid UTF-8 text to a Windows 10 terminal with ColorChoice::Never fails with the error "Windows stdio in console mode does not support writing non-UTF-8 byte sequences". Superficial debugging let me to assume that this is somehow related to replacement character substitution in write_lossy_utf8(). I could reproduce this on two different Windows 10 PCs.

Unfortunately I could not shrink the test file further because there is line buffering layer in between which interferes with easy testcase minimization.

Testcode:

use std::io::Write;
use std::str::from_utf8;

use termcolor::{ColorChoice, StandardStream};

fn main() {
    let test_str = include_str!("test.txt").to_owned();
    from_utf8(&test_str.as_bytes()).unwrap(); // unnecessarily confirm that this is indeed UTF-8
    let mut stdout = StandardStream::stdout(ColorChoice::Never);

    // following line panics with "Windows stdio in console mode does not support writing non-UTF-8 byte sequences"
    write!(&mut stdout, "{}", test_str).unwrap();
}
@BurntSushi
Copy link
Owner

Thanks for the bug report! Unfortunately, it's unlikely I'll be able to debug this any time soon, so I'd love it if someone else could.

Looking at the code, I'm not sure I see how this could go wrong. Writing the replacement codepoint is itself valid UTF-8, so there should be no problems there. And if you're writing valid UTF-8 in the first place, then the replacement codepoint wouldn't be written at all.

So at least just from inspection, I don't see any obvious causes. I'm either wrong, or the problem is lurking elsewhere.

@BurntSushi BurntSushi added help wanted Extra attention is needed question Further information is requested labels Mar 17, 2021
@Count-Count
Copy link
Author

Count-Count commented Mar 17, 2021

The problem is that this write() call returns in the middle of a multi-byte codepoint for whatever reason. That already causes a replacement character to be inserted incorrectly on the next call. Also the Ok(_) value of w.write(b"\xEF\xBF\xBD")?; is not checked so it is possible that only part of those three bytes are written and still Ok(1) is returned...

I confirmed that the replacement char insertion happens in the testcase with valid UTF-8 stepping through the code with a debugger.

I can dig deeper tomorrow, kind of getting late here.

@BurntSushi
Copy link
Owner

I see, thanks. I'm not sure what to do about a write call that terminates in the middle of a codepoint.

It looks like you're write about Ok(1) being wrong though.

Thanks for digging. If there's a simple patch, I could probably get that merged and released quickly.

@Count-Count
Copy link
Author

Count-Count commented Mar 18, 2021

I found out why the stdio line buffering implementation returns only a partial write which ends in the middle of a UTF-8 codepoint in this testcase. Issue with simpler test string is here: Rust issue 83258.

If it is decided that this is a bug in the std lib and that 1) only complete UTF-8 may be passed to individual write() calls if a Windows terminal is attached and 2) partial writes must always end on codepoint boundaries then no changes for termcolor are needed.

In the vain of "Be liberal in what you accept" that might not be such a good idea though. The alternative would be stateful handling of incomplete codepoints.

@BurntSushi
Copy link
Owner

Ah yeah if this is a bug in std, then it should be fixed there, if possible. So I'll close this in favor of rust-lang/rust#83258.

@BurntSushi
Copy link
Owner

@Count-Count Thanks for digging into this!

Count-Count added a commit to Count-Count/termcolor that referenced this issue Mar 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants