Various Unicode BCP 47 locale identifiers issues #330
@anba In the big picture, all the issues you mentioned seem reasonable to me. I suggest you create a PR based on what you stated above, and we can review the wording of the changes together.
While I can provide changes for most of the other parts, these are questions we should work through in a discussion next Thursday; I don't have an immediate answer for them. I'll open a PR with the other parts, as I already did with the https parts (see #331).
cc @FrankYFTang @zbraniecki to follow up and reference the public spec once it's published
@FrankYFTang Have you followed up with Mark about this in the CLDR spec? |
The duplicated-variants restriction, and whether it ought to apply to …: either duplicates should be allowed in both productions (with canonicalization removing all but one of each duplicate variant), or they should be allowed in neither. I don't remember why … If the reason for the restriction is sensible and good, I think we ought to apply it everywhere. But if it is questionable in any way, being slightly more liberal about allowing harmlessly duplicated variants (while removing the duplicates during canonicalization) seems like the right approach.
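The two policies above can be made concrete with a small sketch. It assumes the variant subtags have already been split out of the tag (for either the main language id or a tlang), and the function names are hypothetical, not taken from ECMA-402 or any engine. The strict policy rejects a list with repeats; the liberal one accepts it and keeps one copy of each variant during canonicalization.

```python
def has_duplicate_variants(variants: list[str]) -> bool:
    """Strict policy (hypothetical helper): report whether any variant repeats.

    Subtags compare case-insensitively, so "emodeng" and "EMODENG"
    count as duplicates.
    """
    lowered = [v.lower() for v in variants]
    return len(set(lowered)) != len(lowered)


def dedupe_variants(variants: list[str]) -> list[str]:
    """Liberal policy (hypothetical helper): accept duplicates, but keep only
    the first occurrence of each variant, lowercased, when canonicalizing."""
    seen: set[str] = set()
    result: list[str] = []
    for v in variants:
        key = v.lower()
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Whichever policy is chosen would be applied the same way in both productions.
print(has_duplicate_variants(["emodeng", "EMODENG"]))  # True
print(dedupe_variants(["emodeng", "EMODENG"]))  # ['emodeng']
```

This sketch preserves the input order of the surviving variants; it makes no claim about any further reordering a real canonicalization step might apply.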
The duplicate variant restriction may come from BCP 47, §2.2.5, item 5:
Hmm, okay. That seems pretty clear and direct about invalidity. I can't think of a serious case for not applying that to …
BCP 47, § 2.2.9 is probably a better reference point, because it also contains the other restrictions present in 6.2.2 IsStructurallyValidLanguageTag:
and BCP 47, § 2.2.9:
(The second bullet point isn't present in ECMA-402, because it'd require shipping an up-to-date language tag registry.)
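For illustration, the registry-free uniqueness conditions from BCP 47, § 2.2.9 (no duplicate variant subtags, no duplicate singleton subtags) can be checked roughly as follows. This is a minimal sketch under stated assumptions: the tag is already well-formed, private-use subtags after "x" are skipped, and the helper names are hypothetical rather than the ECMA-402 abstract operation.

```python
def _is_variant(subtag: str) -> bool:
    # BCP 47 variant: 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
    return subtag.isalnum() and (
        5 <= len(subtag) <= 8 or (len(subtag) == 4 and subtag[0].isdigit())
    )


def has_duplicate_variant_or_singleton(tag: str) -> bool:
    """Hypothetical helper: check only the two uniqueness conditions.

    Assumes `tag` is already well-formed; no registry lookup is done.
    """
    subtags = tag.lower().split("-")
    seen_variants: set[str] = set()
    seen_singletons: set[str] = set()
    in_language_id = True  # variants only occur before the first singleton
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            if subtag == "x":
                break  # private-use subtags are not checked
            in_language_id = False
            if subtag in seen_singletons:
                return True
            seen_singletons.add(subtag)
        elif in_language_id and _is_variant(subtag):
            if subtag in seen_variants:
                return True
            seen_variants.add(subtag)
    return False
```

Note that variants inside a transformed-extension tlang, as in "en-t-en-emodeng-emodeng", are not examined by this sketch; whether they should be is the open question in this thread.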
We discussed this today and concluded that the duplicate-variant concern does not need to be resolved immediately, and that it's okay if a published ECMA-402 edition ends up lagging the "living standard" spec. I'll look into creating a PR to additionally forbid duplicate variants in …
@ben-allen to evaluate which, if any, of the items in the OP still need to be addressed. |
All of the above appear to be resolved by the following commits:
and
I believe this one should be closed.
Closed because all but one bullet point has been addressed in PRs from 2019 and 2021. The remaining bullet point, on
6.2 Language Tags
6.2.1 Unicode Locale Extension Sequences
… unicode_locale_extensions … from https://unicode.org/reports/tr35/#Unicode_locale_identifier.
6.2.2 IsStructurallyValidLanguageTag
IsStructurallyValidLanguageTag will need to refer to the EBNF grammar. Ref: http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#Unicode_language_identifier
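As a rough illustration of what referring to the EBNF grammar involves, the regular expression below is a hand transcription of the UTS #35 productions for unicode_language_id and unicode_locale_extensions. Several simplifications are assumptions of this sketch, not spec text: only "-" is accepted as a separator, "root" and other backwards-compatibility forms are not handled, and "-u-" is the only extension recognized. The name is_well_formed is hypothetical.

```python
import re

# Subtag productions transcribed (simplified) from the UTS #35 EBNF.
LANGUAGE = r"(?:[a-z]{2,3}|[a-z]{5,8})"
SCRIPT = r"[a-z]{4}"
REGION = r"(?:[a-z]{2}|[0-9]{3})"
VARIANT = r"(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})"

# unicode_locale_extensions = "u" ((sep keyword)+ | (sep attribute)+ (sep keyword)*)
ATTRIBUTE = r"[a-z0-9]{3,8}"
KEY = r"[a-z0-9][a-z]"
TYPE = r"[a-z0-9]{3,8}(?:-[a-z0-9]{3,8})*"
KEYWORD = rf"{KEY}(?:-{TYPE})?"
UNICODE_EXT = rf"u(?:(?:-{KEYWORD})+|(?:-{ATTRIBUTE})+(?:-{KEYWORD})*)"

LANGUAGE_ID = rf"{LANGUAGE}(?:-{SCRIPT})?(?:-{REGION})?(?:-{VARIANT})*"
LOCALE_ID = re.compile(rf"{LANGUAGE_ID}(?:-{UNICODE_EXT})?", re.IGNORECASE)


def is_well_formed(tag: str) -> bool:
    """True if `tag` can be generated by the simplified grammar above."""
    return LOCALE_ID.fullmatch(tag) is not None
```

The grammar alone accepts "en-emodeng-emodeng", which is why duplicate variants have to be ruled out by a separate restriction rather than by the EBNF.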
… tlang … For example, is "en-t-en-emodeng-emodeng" valid or not?
6.2.3 CanonicalizeLanguageTag
… tlang extension? For example, should "en-t-en-us" be case-regularised to "en-t-en-US"?
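To make the casing question concrete, here is a sketch of one possible answer: applying the unicode_language_id casing conventions (language lowercase, script titlecase, region uppercase, variants lowercase) to the tlang only. It illustrates the option being asked about and is not a statement of what ECMA-402 or UTS #35 requires. The helper names are hypothetical, and the sketch assumes the tlang ends at the first tkey (letter plus digit, such as "h0") or the next singleton.

```python
def _is_tkey(subtag: str) -> bool:
    # tkey = alpha digit, e.g. "h0", "k0", "m0"
    return len(subtag) == 2 and subtag[0].isalpha() and subtag[1].isdigit()


def _case_subtag(subtag: str, position: int) -> str:
    """Language lowercase, script titlecase, region uppercase, else lowercase."""
    if position == 0:
        return subtag.lower()
    if position == 1 and len(subtag) == 4 and subtag.isalpha():
        return subtag.title()
    if (len(subtag) == 2 and subtag.isalpha()) or (len(subtag) == 3 and subtag.isdigit()):
        return subtag.upper()
    return subtag.lower()


def regularize_tlang_case(tag: str) -> str:
    """Hypothetical helper: re-case only the tlang of a "-t-" extension."""
    subtags = tag.split("-")
    start = None
    for i, subtag in enumerate(subtags[1:], start=1):
        if subtag.lower() == "x":
            break  # stop at private use
        if subtag.lower() == "t":
            start = i + 1
            break
    if start is None:
        return tag
    position = 0
    i = start
    while i < len(subtags) and len(subtags[i]) != 1 and not _is_tkey(subtags[i]):
        subtags[i] = _case_subtag(subtags[i], position)
        position += 1
        i += 1
    return "-".join(subtags)


print(regularize_tlang_case("en-t-en-us"))  # en-t-en-US
```

Everything outside the tlang, including the main language id, is left exactly as written, so the sketch isolates the one decision the question is about.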