Skip to content

Underscore suffixes allowed in literals #41723

Closed
@qnighy

Description

@qnighy
Contributor

I found these pathlogical examples pass the lexer and the parser:

fn main() {
    println!("{}", "Foo"_);
    println!("{}", 10._);
}

The reason is that

  1. scan_optional_raw_name eats _, but reports no suffix.
  2. scan_number distinguishes between decimal points and field/method syntax by is_xid_start, which in fact excludes _.

According to the reference, it seems these examples shouldn't be allowed.

Tested on stable (rustc 1.17.0 (56124baa9 2017-04-24)) and nightly (rustc 1.19.0-nightly (6a5fc9eec 2017-05-02)) on the playground.

Activity

qnighy

qnighy commented on May 3, 2017

@qnighy
ContributorAuthor

According to the reference, decimal points are optionally followed by another "decimal literal". Since _ isn't a decimal literal, it shouldn't be allowed.

Here the parser is recognizing _ in 10._ as a suffix rather than a decimal literal followed by the point. If not, 10._f64 would be lexed as 10._+f64, but actually the parser complains _f64 is not a valid suffix.

In addition, the reference says f32 and f64 are the only valid suffixes. So _ shouldn't be allowed as a suffix either.

added
A-grammarArea: The grammar of Rust
A-parserArea: The lexing & parsing of Rust source code to an AST
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on May 3, 2017
nikomatsakis

nikomatsakis commented on May 11, 2017

@nikomatsakis
Contributor

It seems to me that _ should be considered part of the number. So I would expect 10._f64 to work, indeed, just as 10.2_f64 or 2_f64 works.

@arielb1 points out that "a"_ is also accepted, which seems like a bug.

Basically, I do not expect a suffix to begin with _, I think, but I do expect _ to be allowed in numeric literals anywhere (including the end).

removed
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on May 11, 2017
nikomatsakis

nikomatsakis commented on May 11, 2017

@nikomatsakis
Contributor

Shifting nomination to T-lang for now.

joshtriplett

joshtriplett commented on May 11, 2017

@joshtriplett
Member

The language team discussed this in today's meeting. We also considered cases like 42.method_on_numbers(), or 42._method_on_numbers(), both of which ought to be parsed as methods. The conclusion we came to was that it's ambiguous (for our mental lexers if nothing else) to allow a trailing underscore on a floating-point literal immediately following a . with no subsequent digits. Parsing _[a-z] in a way that puts the _ and the [a-z] in separate tokens feels incredibly confusing. It's not unreasonable to have to write 42.0f64 or 42.0_f64 instead if you want a suffix.

(By contrast, 42_f64 or 42_u64 seems fine, and unambiguous, though whether you should write that might be a good question for the style team.)

So, the recommendation from the language team would be to always parse 42._ident as a field or method, and never include the _ as part of the floating-point literal.

One question that didn't come up in the language team meeting: does anything rely on the current (apparently inconsistent) lexing behavior here? Would fixing this ambiguity break anything?

qnighy

qnighy commented on May 12, 2017

@qnighy
ContributorAuthor

I'll try to write a patch for this issue.

nikomatsakis

nikomatsakis commented on May 12, 2017

@nikomatsakis
Contributor

@joshtriplett

One question that didn't come up in the language team meeting: does anything rely on the current (apparently inconsistent) lexing behavior here? Would fixing this ambiguity break anything?

I would think not, at least for numbers, given that it results in an error. Not sure about ""_, which... apparently parses but does not error? (When do we ever support suffixes on strings, besides #?)

added a commit that references this issue on May 13, 2017

Rollup merge of rust-lang#41946 - qnighy:disallow-dot-underscore-in-f…

added 2 commits that reference this issue on May 13, 2017

Rollup merge of rust-lang#41946 - qnighy:disallow-dot-underscore-in-f…

Rollup merge of rust-lang#41946 - qnighy:disallow-dot-underscore-in-f…

qnighy

qnighy commented on May 15, 2017

@qnighy
ContributorAuthor

@joshtriplett

So, the recommendation from the language team would be to always parse 42._ident as a field or method, and never include the _ as part of the floating-point literal.

Do you mean by "never include" also forbidding literals like 4_000.000_000 ?

joshtriplett

joshtriplett commented on May 16, 2017

@joshtriplett
Member

@qnighy No, that's fine. I meant specifically that 42._ident or similar should never lex as a floating-point number 42._ with suffix ident; an underscore immediately after the dot should not lex as part of the literal.

qnighy

qnighy commented on May 16, 2017

@qnighy
ContributorAuthor

@joshtriplett I see. Then #41946 and #41990 will close this issue.

added a commit that references this issue on Jun 6, 2017

Auto merge of #41990 - qnighy:disallow-underscore-suffix-for-string-l…

qnighy

qnighy commented on Jun 6, 2017

@qnighy
ContributorAuthor

Closing since #41946 and #41990 are merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-grammarArea: The grammar of RustA-parserArea: The lexing & parsing of Rust source code to an ASTT-langRelevant to the language team

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @qnighy@nikomatsakis@joshtriplett@nagisa@arielb1

        Issue actions

          Underscore suffixes allowed in literals · Issue #41723 · rust-lang/rust