
io::stdin().read_to_end() drops a byte on certain Unicode input (Windows only) #142847

@QuaeroEtTego

Description

The method io::stdin().read_to_end() appears to drop a byte when reading certain Unicode input on Windows. The issue occurs when the destination buffer is created with Vec::new(), and only for certain input strings.

use std::io::{self, Read, Write};
use std::str;

fn main() -> io::Result<()> {
    let mut stdout = io::stdout();

    write!(stdout, "Enter content : ")?;
    stdout.flush()?;

    // let mut buffer = Vec::with_capacity(1024);  // Reads correctly
    let mut buffer = Vec::new();  // Fails to read correctly
    
    // Paste the provided input, press Enter, then Ctrl+Z and Enter (or Ctrl+D on Linux)
    io::stdin().read_to_end(&mut buffer)?;

    println!("\nBytes read : {:?}", buffer);

    match str::from_utf8(&buffer) {
        Ok(s) => println!("UTF-8 - OK: {}", s),
        Err(e) => println!("UTF-8 - ERROR: {}", e),
    }

    Ok(())
}

With the input:

8216]:есть какое нибудь бюджетное

The resulting read contains the sequence of bytes:

... 208, 208, 181 ... 0
Bytes read : [56, 50, 49, 54, 93, 58, 208, 181, 209, 129, 209, 130, 209, 140, 32, 208, 186, 208, 176, 208, 186, 208, 190, 208, 181, 32, 208, 189, 208, 184, 208, 177, 209, 131, 208, 180, 209, 140, 32, 208, 177, 209, 142, 208, 180, 208, 208, 181, 209, 130, 208, 189, 208, 190, 208, 181, 0]

instead of the expected:

... 208, 182, 208, 181 ...
Bytes read : [56, 50, 49, 54, 93, 58, 208, 181, 209, 129, 209, 130, 209, 140, 32, 208, 186, 208, 176, 208, 186, 208, 190, 208, 181, 32, 208, 189, 208, 184, 208, 177, 209, 131, 208, 180, 209, 140, 32, 208, 177, 209, 142, 208, 180, 208, 182, 208, 181, 209, 130, 208, 189, 208, 190, 208, 181, 10]

The expected bytes are what io::stdin().read_line() returns on Windows, and what io::stdin().read_to_end() returns on Linux.
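To pin down which byte goes missing, the two byte sequences above can be compared directly. The following is a minimal sketch, not part of the original reproduction: `first_mismatch`, `EXPECTED`, and `ACTUAL` are names introduced here for illustration, and `ACTUAL` is copied from the failing "Bytes read" output. It reports the first differing index (46), which falls on the second byte of 'ж' (208, 182), and the offset up to which the received bytes are still valid UTF-8 (45).

```rust
use std::str;

// Input from the report, followed by the newline the console delivers.
const EXPECTED: &str = "8216]:есть какое нибудь бюджетное\n";

// "Bytes read" from the failing run on Windows, copied from the report.
const ACTUAL: &[u8] = &[
    56, 50, 49, 54, 93, 58, 208, 181, 209, 129, 209, 130, 209, 140, 32, 208,
    186, 208, 176, 208, 186, 208, 190, 208, 181, 32, 208, 189, 208, 184, 208,
    177, 209, 131, 208, 180, 209, 140, 32, 208, 177, 209, 142, 208, 180, 208,
    208, 181, 209, 130, 208, 189, 208, 190, 208, 181, 0,
];

/// Hypothetical helper: index of the first position where `a` and `b` differ,
/// or `None` if no differing position exists within the shorter slice.
fn first_mismatch(a: &[u8], b: &[u8]) -> Option<usize> {
    a.iter().zip(b.iter()).position(|(x, y)| x != y)
}

fn main() {
    let expected = EXPECTED.as_bytes();

    if let Some(idx) = first_mismatch(expected, ACTUAL) {
        println!(
            "first mismatch at byte {}: expected {}, got {}",
            idx, expected[idx], ACTUAL[idx]
        );

        // Find the character of the expected input that contains byte `idx`.
        if let Some((start, ch)) = EXPECTED
            .char_indices()
            .take_while(|(i, _)| *i <= idx)
            .last()
        {
            println!("affected character: {:?} starting at byte {}", ch, start);
        }
    }

    // The received bytes stop being valid UTF-8 where the byte was dropped.
    if let Err(e) = str::from_utf8(ACTUAL) {
        println!("valid UTF-8 up to byte {}", e.valid_up_to());
    }

    println!(
        "lengths: expected {}, actual {}",
        expected.len(),
        ACTUAL.len()
    );
}
```

The comparison only describes the symptom; it does not say where in the read path the byte is lost.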

Symptoms

  • The issue occurs only with specific Unicode inputs (such as the example above).
  • Removing a character (the first one for example) from the input causes the issue to disappear.
  • When the byte loss occurs, the buffer filled by read_to_end() ends with a null byte (0) where the newline (10) is expected.
  • The issue only occurs when using:
let mut buffer = Vec::new();
std::io::stdin().read_to_end(&mut buffer)?;
  • The issue is NOT observed when using:
let mut buffer = Vec::with_capacity(1024);
std::io::stdin().read_to_end(&mut buffer)?;

Notes

  • The behavior only appears when using read_to_end() with Vec::new().
  • The behavior is not present when using a preallocated buffer (Vec::with_capacity(1024)), nor when using read_line().
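Until the underlying problem is fixed, the two observations above suggest possible workarounds. The sketch below wraps them in two helpers whose names (`read_all_prealloc`, `read_all_by_line`) are hypothetical and introduced only for illustration. It assumes that the behavior seen with Vec::with_capacity(1024) and read_line() on the example input carries over to other inputs, which this report does not establish.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical helper: read everything from `reader` into a buffer that is
/// preallocated with `capacity` bytes, as in the variant reported to work.
fn read_all_prealloc<R: Read>(mut reader: R, capacity: usize) -> io::Result<Vec<u8>> {
    let mut buffer = Vec::with_capacity(capacity);
    reader.read_to_end(&mut buffer)?;
    Ok(buffer)
}

/// Hypothetical helper: read everything from `reader` line by line.
/// Note that read_line() returns an error if the input is not valid UTF-8.
fn read_all_by_line<R: BufRead>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut text = String::new();
    // read_line() appends to `text` and returns Ok(0) at end of input.
    while reader.read_line(&mut text)? != 0 {}
    Ok(text.into_bytes())
}

fn main() -> io::Result<()> {
    // 1024 matches the capacity used in the report; it is not a verified bound.
    // Swap in read_all_by_line(io::stdin().lock()) to try the other variant.
    let buffer = read_all_prealloc(io::stdin().lock(), 1024)?;
    println!("Bytes read : {:?}", buffer);
    Ok(())
}
```

Both helpers accept any reader, so they can be exercised with std::io::Cursor on platforms where the console behavior cannot be reproduced.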

Meta

rustc --version --verbose:

rustc 1.87.0 (17067e9ac 2025-05-09)
binary: rustc
commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
commit-date: 2025-05-09
host: x86_64-pc-windows-msvc
release: 1.87.0
LLVM version: 20.1.1

Labels

  • A-Unicode (Area: Unicode)
  • A-io (Area: `std::io`, `std::fs`, `std::net` and `std::path`)
  • C-bug (Category: This is a bug.)
  • O-windows (Operating system: Windows)
  • T-libs (Relevant to the library team, which will review and decide on the PR/issue.)
