Confusing description of IsStructurallyValidLanguageTag() operation #425

Closed
iamstolis opened this issue Apr 5, 2020 · 7 comments
Labels: c: locale (Component: locale identifiers), editorial (Involves an editorial fix), s: in progress (Status: the issue has an active proposal)

@iamstolis

The IsStructurallyValidLanguageTag() operation states that it

> verifies that the locale argument represents a well-formed "Unicode BCP 47 locale identifier" as specified in Unicode Technical Standard 35 section 3.2,

It also says that

> The abstract operation returns true if locale can be generated from the EBNF grammar in section 3.2 of the Unicode Technical Standard 35, starting with unicode_locale_id, and does not contain duplicate variant or singleton subtags (other than as a private use subtag). It returns false otherwise.

These requirements are inconsistent. The referenced grammar does not describe a "Unicode BCP 47 locale identifier"; it describes a "Unicode CLDR locale identifier". That is, it also accepts backward compatibility syntax (the root subtag, underscores as separators, and tags starting with a script subtag) that is not allowed in a "Unicode BCP 47 locale identifier"; see Unicode Technical Standard 35 section 3.3. Please fix or improve the description of this operation so that it is clear what the operation is supposed to do.
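To make the distinction concrete, here is a minimal Python sketch of a BCP 47-flavored structural check. It assumes a simplified reading of the unicode_locale_id grammar: the function name `is_structurally_valid_language_tag` is hypothetical, it is not the algorithm in the ECMA-402 spec, and it does not validate the internal structure of `-u-` and `-t-` extensions. It rejects the CLDR-only syntax named above (underscore separators, `root`, a leading script subtag) as well as the duplicate variant and singleton subtags mentioned in the quoted spec text, while allowing repeats inside a private use sequence.

```python
import re

# Simplified subtag patterns (an assumption, not the full UTS #35 grammar),
# matched against lowercased input because locale identifiers are case-insensitive.
LANGUAGE = re.compile(r"[a-z]{2,3}|[a-z]{5,8}")
SCRIPT = re.compile(r"[a-z]{4}")
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")
SINGLETON = re.compile(r"[0-9a-wyz]")  # one alphanumeric character other than "x"
EXT_SUBTAG = re.compile(r"[a-z0-9]{2,8}")
PU_SUBTAG = re.compile(r"[a-z0-9]{1,8}")


def is_structurally_valid_language_tag(tag: str) -> bool:
    # Only ASCII can match the grammar; checking first also stops str.lower()
    # from mapping non-ASCII letters (e.g. the Kelvin sign) onto ASCII ones.
    if not tag.isascii():
        return False
    lower = tag.lower()

    # CLDR-only backward compatibility syntax excluded by section 3.3. The
    # subtag patterns below would reject these too; the check is explicit
    # here only to mirror the issue text.
    if "_" in lower or lower == "root":
        return False
    subtags = lower.split("-")

    # The first subtag must be a language subtag, so a leading script subtag
    # (e.g. "Latn-US", also CLDR-only) is rejected here.
    if not LANGUAGE.fullmatch(subtags[0]):
        return False
    i = 1
    if i < len(subtags) and SCRIPT.fullmatch(subtags[i]):
        i += 1
    if i < len(subtags) and REGION.fullmatch(subtags[i]):
        i += 1

    # Variant subtags: duplicates are not allowed.
    seen_variants = set()
    while i < len(subtags) and VARIANT.fullmatch(subtags[i]):
        if subtags[i] in seen_variants:
            return False
        seen_variants.add(subtags[i])
        i += 1

    # Extensions: each singleton may appear only once and needs at least one
    # following subtag. Simplification: -u- and -t- contents are not validated.
    seen_singletons = set()
    while i < len(subtags) and SINGLETON.fullmatch(subtags[i]):
        if subtags[i] in seen_singletons:
            return False
        seen_singletons.add(subtags[i])
        i += 1
        start = i
        while i < len(subtags) and EXT_SUBTAG.fullmatch(subtags[i]):
            i += 1
        if i == start:
            return False

    # Private use: everything after "x" is exempt from the duplicate checks.
    if i < len(subtags) and subtags[i] == "x":
        i += 1
        start = i
        while i < len(subtags) and PU_SUBTAG.fullmatch(subtags[i]):
            i += 1
        if i == start:
            return False

    # Every subtag must have been consumed.
    return i == len(subtags)


print(is_structurally_valid_language_tag("en-US"))         # True
print(is_structurally_valid_language_tag("en_US"))         # False: underscore separator
print(is_structurally_valid_language_tag("Latn-US"))       # False: starts with a script subtag
print(is_structurally_valid_language_tag("de-1996-1996"))  # False: duplicate variant
print(is_structurally_valid_language_tag("en-x-u-u"))      # True: private use may repeat
```

A checker that follows the grammar alone, without the section 3.3 restrictions, is the kind of implementation that would accept `en_US` or `root`.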

sffc added the c: locale (Component: locale identifiers) and s: discuss (Status: TG2 must discuss to move forward) labels on Apr 6, 2020
@sffc
Contributor

sffc commented Apr 6, 2020

@anba @jswalden @FrankYFTang

@zbraniecki
Member

It should refer to the Unicode BCP 47 locale identifier, not the CLDR one.

@littledan
Member

I don't really see the inconsistency. The grammar describes Unicode locale identifiers in general, and Unicode BCP 47 locale identifiers are a subset of this, as described in section 3.3. But the grammar is in section 3.2; there's no other grammar we could refer to. Should we link directly to section 3.3 in addition for clarity?

ryzokuken added a commit to ryzokuken/ecma402 that referenced this issue May 2, 2020
Add a note for IsStructurallyValidLanguageTag clarifying the exact
variant of locale identifier that's expected, and point to a resource
that clarifies the differences between the two.

Fixes: tc39#425
@iamstolis
Author

> I don't really see the inconsistency.

I find it hard not to see the inconsistency when one sentence says that the operation verifies that the argument "represents a well-formed 'Unicode BCP 47 locale identifier'", and the following sentence says the equivalent of: "The operation returns true if the argument is a Unicode CLDR locale identifier (and returns false otherwise)."

The paragraph about returning true/false based on the unicode_locale_id grammar is incorrect and should be removed or reworded. The main motivation behind this issue is that we had implemented this operation incorrectly: we followed this paragraph, i.e., the grammar only, and therefore accepted the CLDR form rather than the BCP 47 form, until @anba pointed out the intended meaning of IsStructurallyValidLanguageTag().

> But the grammar is in section 3.2; there's no other grammar we could refer to. Should we link directly to section 3.3 in addition for clarity?

I see it in the opposite way: the term "Unicode BCP 47 locale identifier" is defined in section 3.3 (and not mentioned at all in section 3.2). So, section 3.3 is the more important section to point to. Of course, the grammar in section 3.2 is crucial for the understanding of the definition. Hence, linking section 3.2 in addition (for clarity) is a very good idea.

@jswalden
Collaborator

jswalden commented May 3, 2020

FWIW #429 would probably also resolve this by incidentally enumerating exact validity criteria and linking to both sections 3.2 and 3.3.

@ryzokuken
Member

@iamstolis @jswalden I added a note in #431 that should ideally clear up the confusion. Could you please take a quick look at that PR?

sffc added this to the ES 2021 milestone on Jun 5, 2020
sffc added the editorial (Involves an editorial fix) and s: in progress (Status: the issue has an active proposal) labels and removed the s: discuss (Status: TG2 must discuss to move forward) label on Jun 5, 2020
sffc modified the milestones: ES 2021, ES 2022 on Mar 22, 2021
sffc modified the milestones: ES 2022, ES 2023 on Jun 1, 2022
@ben-allen
Contributor

In the time between this issue being opened and now, the IsStructurallyValidLanguageTag abstract operation (AO) has been largely rewritten. The new version does not appear to have this inconsistency; see dff9842.


7 participants