-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Use escape_default() for strings in LitKind::token(). #50391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/libsyntax/attr.rs
Outdated
@@ -1230,7 +1230,7 @@ impl LitKind { | |||
LitKind::Str(string, ast::StrStyle::Cooked) => { | |||
let mut escaped = String::new(); | |||
for ch in string.as_str().chars() { | |||
escaped.extend(ch.escape_unicode()); | |||
escaped.extend(ch.escape_default()); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use str::escape_default
nowadays.
let escaped = string.escape_default();
(There is also a fn escape_default(s: &str) -> String
in src/libsyntax/parse.rs
but that should probably be removed in favor of the std one.)
However, I think only \r
, \
, and "
really needs to be escaped. All other characters can legally appear inside a string. This may provide even more speed up since we don't need the UTF-8 encode/decode step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this function is not the hot one. str_lit(), which parses the string produced here, is the hot function. So str::escape_default()
might be more concise, but it's unlikely to make things faster.
Also, in practice these strings are all rustdoc comments, and escaped chars are rare. So it also doesn't matter much exactly which characters we escape.
r=me with @kennytm's suggestion |
e8cf316
to
bb18f41
Compare
@eddyb: I switched to |
src/libsyntax/attr.rs
Outdated
for ch in string.as_str().chars() { | ||
escaped.extend(ch.escape_unicode()); | ||
} | ||
let mut escaped = string.as_str().escape_default(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mut
is no longer needed.
This avoids converting every char to \u{...} form, which bloats the resulting strings unnecessarily. It also provides consistency with the existing escape_default() calls in LitKind::token() used for raw string literals, char literals, and raw byte char literals. There are two benefits from this change. - Compilation is faster. Most of the rustc-perf benchmarks see a non-trivial speedup, particularly for incremental rebuilds, with the best speedup over 13%, and multiple others over 10%. - Generated rlibs are smaller. An extreme example is libfutures.rlib, which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.
str::escape_default() can be used instead.
bb18f41
to
7a56360
Compare
I removed the unnecessary |
@bors r=eddyb |
📌 Commit 7a56360 has been approved by |
Use escape_default() for strings in LitKind::token(). This avoids converting every char to \u{...} form, which bloats the resulting strings unnecessarily. It also provides consistency with the existing escape_default() calls in LitKind::token() used for raw string literals, char literals, and raw byte char literals. There are two benefits from this change. - Compilation is faster. Most of the rustc-perf benchmarks see a non-trivial speedup, particularly for incremental rebuilds, with the best speedup over 13%, and multiple others over 10%. - Generated rlibs are smaller. An extreme example is libfutures.rlib, which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction. r? @jseyfried <details><summary>Here are full numbers for all the rustc-perf runs where the improvement was > 1%.</summary> ``` regex-check avg: -11.1% min: -13.4% max: -5.5% futures-check avg: -7.6% min: -11.4% max: -3.5% futures-opt avg: -6.3% min: -10.3% max: -2.3% futures avg: -6.6% min: -10.3% max: -2.8% regex-opt avg: -4.7% min: -10.2% max: -0.4% regex avg: -5.3% min: -10.2% max: -1.2% hyper-check avg: -4.8% min: -6.6% max: -2.7% encoding-check avg: -4.1% min: -5.5% max: -2.5% issue-46449-check avg: -4.7% min: -5.2% max: -4.1% clap-rs-check avg: -2.9% min: -5.2% max: -1.1% hyper avg: -3.0% min: -5.1% max: -0.8% parser-check avg: -4.2% min: -4.9% max: -3.2% hyper-opt avg: -2.6% min: -4.9% max: -0.3% encoding-opt avg: -2.3% min: -4.6% max: -0.5% encoding avg: -2.5% min: -4.4% max: -0.6% issue-46449 avg: -2.3% min: -4.4% max: -1.8% issue-46449-opt avg: -1.7% min: -4.3% max: -0.9% clap-rs-opt avg: -1.6% min: -4.2% max: -0.2% serde-check avg: -1.4% min: -4.1% max: -0.2% clap-rs avg: -1.6% min: -3.9% max: -0.7% unify-linearly-check avg: -3.2% min: -3.7% max: -2.7% serde avg: -1.1% min: -3.5% max: -0.1% regression-31157-check avg: -2.6% min: -3.4% max: -1.6% helloworld-check avg: -2.5% min: -3.4% max: -0.6% serde-opt avg: -1.3% min: -3.3% max: -0.5% tokio-webpush-simple-check avg: -2.4% min: -3.2% max: -1.8% piston-image-check avg: -1.7% min: -3.2% max: -0.9% deeply-nested-opt avg: -1.5% min: -3.0% max: -0.6% deeply-nested-check avg: -1.9% min: -2.9% max: -0.4% deeply-nested avg: -1.9% min: -2.9% max: -1.2% syn-check avg: -1.8% min: -2.8% max: -0.6% coercions avg: -0.5% min: -2.8% max: 0.4% syn-opt avg: -0.9% min: -2.4% max: -0.1% syn avg: -1.1% min: -2.2% max: -0.3% parser-opt avg: -1.9% min: -2.1% max: -1.6% parser avg: -1.9% min: -2.1% max: -1.6% style-servo-check avg: -1.3% min: -2.0% max: -0.8% regression-31157-opt avg: -0.8% min: -2.0% max: 0.0% piston-image avg: -0.7% min: -1.8% max: -0.2% piston-image-opt avg: -0.6% min: -1.8% max: -0.0% regression-31157 avg: -1.0% min: -1.7% max: -0.3% html5ever-opt avg: -0.6% min: -1.5% max: -0.1% unify-linearly-opt avg: -1.3% min: -1.5% max: -1.1% unify-linearly avg: -1.3% min: -1.4% max: -1.2% tokio-webpush-simple-opt avg: -0.4% min: -1.2% max: -0.0% helloworld-opt avg: -1.0% min: -1.1% max: -0.6% helloworld avg: -1.0% min: -1.1% max: -0.7% inflate-opt avg: -0.3% min: -1.1% max: 0.1% html5ever-check avg: -0.6% min: -1.0% max: -0.3% inflate-check avg: -0.3% min: -1.0% max: -0.1% ``` </details>
☀️ Test successful - status-appveyor, status-travis |
This avoids converting every char to \u{...} form, which bloats the
resulting strings unnecessarily. It also provides consistency with the
existing escape_default() calls in LitKind::token() used for raw
string literals, char literals, and raw byte char literals.
There are two benefits from this change.
Compilation is faster. Most of the rustc-perf benchmarks see a
non-trivial speedup, particularly for incremental rebuilds, with the
best speedup over 13%, and multiple others over 10%.
Generated rlibs are smaller. An extreme example is libfutures.rlib,
which shrinks from 2073306 bytes to 1765927 bytes, a 15% reduction.
r? @jseyfried
Here are full numbers for all the rustc-perf runs where the improvement was > 1%.