Closed
Currently, the Debug of Chars prints the underlying bytes, rather than the chars:
#![allow(unused)]
fn main() {
    let s = String::from(" é 😀 ");
    let c = s.chars();
    dbg!("Debug of Chars: ", &c);
    dbg!("Debug of each char: ");
    for x in c {
        dbg!(x);
    }
}
Returns:
[src/main.rs:5] "Debug of Chars: " = "Debug of Chars: "
[src/main.rs:5] &c = Chars {
    iter: Iter(
        [
            32,
            195,
            169,
            32,
            240,
            159,
            152,
            128,
            32,
        ],
    ),
}
[src/main.rs:6] "Debug of each char: " = "Debug of each char: "
[src/main.rs:8] x = ' '
[src/main.rs:8] x = 'é'
[src/main.rs:8] x = ' '
[src/main.rs:8] x = '😀'
[src/main.rs:8] x = ' '
As I was trying to work out what chars was (whether it was Unicode code points or bytes or something else), the first output was very confusing - is there a reason we don't print something like the second case?
Would you take a PR to change this?
I couldn't find any previous discussion on this - #49283 was the closest I could find.
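In the meantime, the contents of a Chars can be inspected without relying on its Debug output. The following is a minimal sketch, not a proposed change to std: `peek_chars` is a hypothetical helper name, while `Chars::clone`, `Chars::as_str` and `str::as_bytes` are existing stable methods.

```rust
use std::str::Chars;

// Hypothetical helper (not part of std): collect a clone of the iterator
// so the original `Chars` is left untouched.
fn peek_chars(c: &Chars<'_>) -> Vec<char> {
    c.clone().collect()
}

fn main() {
    let s = String::from(" é 😀 ");
    let c = s.chars();

    // The Unicode scalar values the iterator will yield.
    println!("{:?}", peek_chars(&c));
    // The not-yet-consumed part of the string, via `Chars::as_str`.
    println!("{:?}", c.as_str());
    // The raw UTF-8 bytes, which is what the derived Debug shows.
    println!("{:?}", s.as_bytes());
}
```

Cloning a Chars is cheap because it only copies the underlying slice iterator, not the string data.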
ExpHP commented on Jul 25, 2019
The first one is UTF-8 bytes. You see this because the Debug impl for Chars is auto-generated.
Strictly speaking, since the bytes inside of Chars should always be valid UTF-8, this could have a custom Debug impl that makes it pretend to contain a string by formatting the member as str::from_utf8(self.iter.as_slice()).unwrap().
The reason it doesn't display as individual chars is because it doesn't have individual chars; determining their boundaries is the entire point of the Chars iterator. I suppose this same argument could be used against the call to str::from_utf8, which needs to scan the whole string to validate it. (But then the solution seems to be to use str::from_utf8_unchecked, which seems awfully heavy-handed for a Debug impl. And perhaps it doesn't even matter, because the cost of most io::Write impls probably outweighs the cost of this validation.)
I guess that, questionable concerns of efficiency aside, my main concern is simply that showing a list of individual chars would be... dishonest, I guess.
max-sixty commented on Jul 25, 2019
Do str & String contain more data than Chars? I had thought they both contained the underlying bytes and then these were decoded as needed - including as part of the Display & Debug implementations.
ExpHP commented on Jul 25, 2019
No, they contain the same data. They're all just UTF-8 bytes. And the Display implementation of str simply writes the bytes contained in the str directly to the io::Write instance.
Suppose we are writing to STDOUT. On UNIX, the io::Write for Stdout impl writes these bytes directly to the underlying file descriptor with no processing:
rust/src/libstd/sys/unix/stdio.rs, lines 27 to 30 in eedf6ce
rust/src/libstd/sys/unix/fd.rs, lines 109 to 116 in eedf6ce
I would imagine this is because the console on any UNIX platform almost certainly uses UTF-8. [1] It is your terminal application that is then responsible for decoding these bytes and producing glyphs. Considering that UTF-8 dominates much of the web space as well, it's quite possible that even on the playground, these bytes are ultimately sent over the wire to your PC with minimal processing, where your browser is responsible for decoding and displaying them.
On Windows, io::Write for Stdout transcodes the UTF-8 into the UTF-16 format expected by the Windows APIs:
rust/src/libstd/sys/windows/stdio.rs, lines 76 to 84 in eedf6ce
and then Windows does whatever it does with those UTF-16 code units. (Quite likely, it hands them directly to the console, which is then responsible for decoding and displaying them.)
Footnotes
[1] (I think in actuality UNIX accepts arbitrary strings of bytes, and then the portions of these strings which are valid UTF-8 are rendered appropriately by the console. I don't know; doesn't really matter.)
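The two behaviours described in the comment above can be observed from user code. This is only an illustration under stated assumptions, not the std implementation of Stdout: a `Vec<u8>` stands in for the file descriptor, `display_bytes` and `utf16_units` are hypothetical helper names, and `str::encode_utf16` is used to show what transcoding to UTF-16 produces.

```rust
use std::io::{self, Write};

// Hypothetical helper: format `s` with Display into an in-memory io::Write
// sink and return exactly the bytes that were written.
fn display_bytes(s: &str) -> io::Result<Vec<u8>> {
    let mut sink: Vec<u8> = Vec::new();
    write!(sink, "{}", s)?;
    Ok(sink)
}

// Hypothetical helper: the UTF-16 code units for the same text.
fn utf16_units(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}

fn main() -> io::Result<()> {
    let s = " é 😀 ";

    // Display passes the UTF-8 bytes through unchanged.
    let written = display_bytes(s)?;
    println!("{} bytes written, identical to as_bytes(): {}",
             written.len(), written == s.as_bytes());

    // UTF-16 needs a surrogate pair for the emoji, so the unit count differs.
    println!("{:x?}", utf16_units(s));
    Ok(())
}
```

The sink never sees chars in either case; it only receives encoded code units, and decoding is left to whatever consumes them.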
max-sixty commented on Jul 25, 2019
OK, so given that - is there still an objection to displaying the unicode characters for Chars but not String?
Rollup merge of rust-lang#63000 - max-sixty:chars-display, r=alexcric…