Skip to content

Commit

Permalink
Add support for @ character in key names found in interpolations
Browse files Browse the repository at this point in the history
  • Loading branch information
odelalleau committed Dec 7, 2020
1 parent af3870d commit d959a13
Show file tree
Hide file tree
Showing 4 changed files with 8 additions and 2 deletions.
3 changes: 3 additions & 0 deletions omegaconf/grammar/OmegaConfGrammarLexer.g4
Original file line number Diff line number Diff line change
Expand Up @@ -76,5 +76,8 @@ INTER_CLOSE: '}' -> popMode;

DOT: '.';
INTER_ID: ID -> type(ID);
// A special type of `ID` that contains the "@" character. It has to be a different

This comment has been minimized.

Copy link
@omry

omry Dec 7, 2020

Owner

the comment seems in conflict with the test below, that uses @ in resolver name.
can this be simplified?
I don't think @ in keys should be different than other legal character in the interpolation key.

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 7, 2020

Author Collaborator

Assuming you are talking about this test:

("at_in_resolver", "${y@z:}", GrammarParseError),

then the GrammarParseError indicates it should fail with this exception.

This is the simplest I can think of, other alternatives I can see:

  • Just add @ in alllowed characters in all IDs in the grammar, but add a check in the visitor or in register_resolver() to raise an exception if we try to use it in a resolver name (problem: the grammar would still accept it)
  • Make @ a special lexer token (let's call it AT) and move the logic of concatenating it with IDs to the parser:
configKey: interpolation | (ID | AT)+ | LIST_INDEX;

Problem: seems more complex to me.

  • Just allow @ in resolver names: that would be the simplest, but I don't think that's the plan

Can you think of another approach?

Minor: just a note that this is the "wrong" commit, I replaced it with 3af2f85 afterwards, the only line that's different is the definition of ID_WITH_AT and it behaves the same (it's just a simplification).

This comment has been minimized.

Copy link
@omry

omry Dec 7, 2020

Owner

then the GrammarParseError indicates it should fail with this exception.

Yes, missed that.

I think we can have a complete definition of both ID types.

INTER_ID: (CHAR|'_') (CHAR|DIGIT|'_')*;
ID: (CHAR|'_') (CHAR|DIGIT|'_'|AT)*;

Another thing that seems missing:
I am expecting forward slash as a legal character in interpolated keys (foo/bar@zoo should be valid).
At a glance it looks like this is no longer supported. There are also other supported characters that you seems to have dropped (see the regex where I made my initial fix).

This should work:

foo/bar@zoo: 10
bar: ${foo/bar@zoo} # also 10

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 7, 2020

Author Collaborator

I think we can have a complete definition of both ID types.

INTER_ID: (CHAR|'_') (CHAR|DIGIT|'_')*;
ID: (CHAR|'_') (CHAR|DIGIT|'_'|AT)*;

This seems almost equivalent to what I have right now, except that:

  • Names are different
  • You are not allowing @ as the first character

The second point is easy to change. Regarding the names, remember that we also have an ID lexer token in VALUE_MODE, which is used to parse dictionary keys. And we can't re-use the same name (even if it's in a different mode), so we have to make a choice regarding which one gets to be named ID (the one with @ or the one without).

One possibility could be to use 3 different names that would be more explicit regarding their usage:

mode VALUE_MODE;

DICTKEY: (CHAR|'_') (CHAR|DIGIT|'_')*;

mode INTERPOLATION_MODE;

RESOLVER_OR_NODE: DICTKEY;
NODE: DICTKEY+ (DICTKEY | '@')*

Happy to go with this or whatever naming you like best (IMO though the one you suggested above isn't great, since it's not clear to me from its name how INTER_ID differs from ID: they are both IDs used in interpolations).

Another thing that seems missing:
I am expecting forward slash as a legal character in interpolated keys (foo/bar@zoo should be valid).
At a glance it looks like this is no longer supported. There are also other supported characters that you seems to have dropped (see the regex where I made my initial fix).

That's correct. My understanding of that regex was that it included characters that were required not only for node interpolations, but also resolver interpolations, and in particular that "fancy" characters like /, ?, * were not supposed to be part of key names, but rather found in unquoted strings as in this test:

supported_chars = "%_-abc123."
(which was missing a few characters btw, I fixed it in this PR). This interpretation was reinforced by the lack of any test for these characters in key names, and Hydra's override grammar (that only allows the ID type as keys).

So which characters exactly do we want to support in key names? All those from that regex, or a subset of them?

This comment has been minimized.

Copy link
@omry

omry Dec 8, 2020

Owner

This seems almost equivalent to what I have right now, except that:

  • Names are different

I prefer to call it by what it is, not what it look like.

  • You are not allowing @ as the first character

accidental omission.

The second point is easy to change. Regarding the names, remember that we also have an ID lexer token in VALUE_MODE, which is used to parse dictionary keys. And we can't re-use the same name (even if it's in a different mode), so we have to make a choice regarding which one gets to be named ID (the one with @ or the one without).

Yes, that's true.
There is a limitation in OmegaConf right now where dict keys can't be numbers, but the grammar can probably support more flexible keys (and allow the DictConfig to potentially fail there):
Will this work? playing with it for a moment now it seems to be working.

dictKeyValuePair: primitive COLON element;

I got the grammar to recognize:

a={
  [email protected]: string
  10.0: float
  1: int
}

We can change Hydra to line up with this it works.

Back to the topic if interpolation keys:
Basically, they should allow us to refer to specific nodes (foo.bar).
There is a special case in Hydra default list interpolation which requires nodes to have the keys like db/engine@backup.

At minimum, we should allow for those two characters, but I wonder if we should relax it to support anything except special tokens (like ., :, {, }, [, ] etc)

To clarify, there is no requirement to change which names are legal for custom resolvers. I think the current ID definition makes sense there.

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 8, 2020

Author Collaborator

I prefer to call it by what it is, not what it look like.

Ok, so what do you think about my other proposition? (DICTKEY / RESOLVER_OR_NODE / NODE, which tried to do that exactly = give a name that says what it is). Otherwise what names do you want? (right now, based on your previous message, it would be ??? / INTER_ID / ID)

Edit: actually I don't like DICTKEY since this token in VALUE_MODE is also used in unquoted strings. So I'd rather stick with ID in VALUE_MODE.

There is a limitation in OmegaConf right now where dict keys can't be numbers, but the grammar can probably support more flexible keys (and allow the DictConfig to potentially fail there):
Will this work? playing with it for a moment now it seems to be working.

dictKeyValuePair: primitive COLON element;

Yes it can work, actually initially I had dictionary keys that could be anything but you preferred to stick to basic strings only. This has little to do with DictConfig though, since these dictionaries are arguments to resolvers (maybe they can end up being converted to DictConfig if a resolver returns them, but I've never tried that).

The main downside is that to get this to work I would need to bring back this weird code I wrote to be able to sort dictionary keys (required to make sure that the resolver cache works properly with dictionaries). For instance if you try to call sorted(a) on the example dictionary you wrote, you'd get a TypeError: '<' not supported between instances of 'float' and 'str'. It's doable though, just giving you a heads up that there's this extra added complexity. Also if we do it, any reason not to allow interpolations too as dictionary keys? (edit: scratch that, actually primitive contains interpolations)

At minimum, we should allow for those two characters, but I wonder if we should relax it to support anything except special tokens (like ., :, {, }, [, ] etc)

Ok, I'll come up shortly with a proposition to be more flexible here.

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 8, 2020

Author Collaborator

Proposition in 200fef2 (it includes my current preference in terms of naming for lexer tokens). Some tests are failing / missing, once we agree on the grammar I'll replace this commit with another one with proper tests as well as code (like I said earlier, I need to put back some code to deal with different types as dictionary keys).

This comment has been minimized.

Copy link
@omry

omry Dec 8, 2020

Owner

Yes it can work, actually initially I had dictionary keys that could be anything but you preferred to stick to basic strings only. This has little to do with DictConfig though, since these dictionaries are arguments to resolvers (maybe they can end up being converted to DictConfig if a resolver returns them, but I've never tried that).

The first motivation to add dictionaries to the grammar came from the override grammar in Hydra (See doc about dicts)

"Dictionaries are merged, not assigned."

Looking at 200fef2 now.

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 8, 2020

Author Collaborator

The first motivation to add dictionaries to the grammar came from the override grammar in Hydra (See doc about dicts)
"Dictionaries are merged, not assigned."

Ok but that's one area where Hydra and OmegaConf differ: in OmegaConf dictionaries parsed through the grammar are those provided to resolvers as arguments (not that it's a big deal, just making sure I understand what you're saying)

This comment has been minimized.

Copy link
@omry

omry Dec 8, 2020

Owner

You are correct, but it's certainly possible that OmegaConf will catch up on that functionality now that it has a way to parse complex strings in the command line :).

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 9, 2020

Author Collaborator

initially I had dictionary keys that could be anything

I looked back on it as I was curious to see how I avoided the problem mentioned in 200fef2#r44939891

It turns out my memory was wrong, in my earlier version dictionary keys could be ID or interpolation. Now that I see it, I remember that my motivation for allowing interpolations was that even if keys were expected to only be IDs, we may want these IDs to be computed by an interpolation. And, as a side effect, it allowed using keys of different types if we really wanted to (since interpolations can return anything).

I'm just mentioning this to correct my previous statement. It also means there might be other unforeseen issues with allowing any primitive as key (since I didn't make it work before after all).

This comment has been minimized.

Copy link
@omry

omry Dec 10, 2020

Owner

There is an issue I just closed as a dup which wanted to add numbers as supported keys in DictConfig. this seems related to extending the dict in the grammar to support more types.
Let's try to make it work, if we hit a landmine we can limit the change to support the exact functionality needed by Hydra (@, /).

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 10, 2020

Author Collaborator

Let's try to make it work, if we hit a landmine we can limit the change to support the exact functionality needed by Hydra (@, /).

Alright, just did a quick check and I think it will work by having two version of primitive, one with COLON and one without. It's a bit annoying because I have to copy/paste the definition: there is no equivalent of the lexer's fragments in the parser to avoid repetitions, and if I try something like primitive_with_colon: (primitive_without_colon | COLON)+ then I run again into "Attempting Full Context" warnings (which I'd rather avoid if possible).
Anyway, I'll put up a working version tomorrow so you can check it out and see whether or not you like it. A backup solution could be to only allow IDs and quoted strings (since anyway in DictConfig you probably want these keys to be strings, even if they are numbers).

This comment has been minimized.

Copy link
@omry

omry Dec 10, 2020

Owner

A backup solution could be to only allow IDs and quoted strings (since anyway in DictConfig you probably want these keys to be strings, even if they are numbers).

Actually not: once we have support for number keys I would like to be able to use numbers:

{1: one, 3.14: pi}

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 10, 2020

Author Collaborator

Ok, done in 4bbc308

I actually found a way to make it work without copying the definition of primitive, but in the end it made things more complex and less flexible. So although this is a bit verbose, at least it's easy to understand, and if we ever want to change what we allow in dictionary keys this should be easy as well.

A few notes:

  • I wish I didn't need to make the changes I made in _utils.py but I couldn't find a better way. Let me know if you do.
  • I renamed listValue and dictValue to listContainer and dictContainer in the grammar because I introduced dictKey, and having both dictKey and dictValue would have created wrong expectations. The other option would have been to find another name than dictKey, but it sounded like the best name to me. Happy to change names though if you prefer otherwise.

Just tried it in Hydra, at least all existing tests pass so nothing should be badly broken. I'll put up a PR on Hydra once this change is approved.

This comment has been minimized.

Copy link
@omry

omry Dec 10, 2020

Owner

Great, I review this (and other PRs in OmegaConf) in a few days.

This comment has been minimized.

Copy link
@omry

omry Dec 10, 2020

Owner

Oh, and to be more clear:
What I meant by "trying this in Hydra" was to enable dict keys that can be any primitive there as well and see that nothing blows up.

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 10, 2020

Author Collaborator

What I meant by "trying this in Hydra" was to enable dict keys that can be any primitive there as well and see that nothing blows up.

Yes that's exactly what I did (except that I didn't allow ":", applying to Hydra's override grammar the same kind of grammar changes that I implemented in 4bbc308 to make it work in OmegaConf)

All current tests passed in Hydra, but I still need to turn it into a proper PR with additional tests.

This comment has been minimized.

Copy link
@omry

omry Dec 13, 2020

Owner

Ok, sounds good.
Can you prioritize the Hydra diff?
It will be easier to see the changes to the grammar in that context (and will also allow me to play with it sooner).

This comment has been minimized.

Copy link
@odelalleau

odelalleau Dec 14, 2020

Author Collaborator

Can you prioritize the Hydra diff?

Ok, here it is: facebookresearch/hydra#1208

// lexer token because resolver names must not contain this character.
ID_WITH_AT: (CHAR|'_'|'@') (CHAR|DIGIT|'_'|'@')*;
LIST_INDEX: INT_UNSIGNED;
INTER_WS: WS -> skip;
2 changes: 1 addition & 1 deletion omegaconf/grammar/OmegaConfGrammarParser.g4
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ sequence: element (COMMA element)*;
interpolation: interpolationNode | interpolationResolver;
interpolationNode: INTER_OPEN DOT* configKey (DOT configKey)* INTER_CLOSE;
interpolationResolver: INTER_OPEN (interpolation | ID) COLON sequence? BRACE_CLOSE;
configKey: interpolation | ID | LIST_INDEX;
configKey: interpolation | ID | ID_WITH_AT | LIST_INDEX;

// Primitive types.

Expand Down
2 changes: 1 addition & 1 deletion omegaconf/grammar_visitor.py
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ def defaultResult(self) -> List[Any]:
def visitConfigKey(self, ctx: OmegaConfGrammarParser.ConfigKeyContext) -> str:
from ._utils import _get_value

# interpolation | ID | LIST_INDEX
# interpolation | ID | ID_WITH_AT | LIST_INDEX
assert ctx.getChildCount() == 1
child = ctx.getChild(0)
if isinstance(child, OmegaConfGrammarParser.InterpolationContext):
Expand Down
3 changes: 3 additions & 0 deletions tests/test_interpolation.py
Original file line number Diff line number Diff line change
Expand Up @@ -655,6 +655,7 @@ def _maybe_create(definition: str) -> Any:
("None", {"True": 1}, ...), # used to test keys with null-like names
("0", 42, ...), # used to test keys with int names
("1", {"2": 1337}, ...), # used to test dot-path with int keys
("x@y", 123, ...),
# Special keywords.
("null", "${test:null}", None),
("true", "${test:TrUe}", True),
Expand Down Expand Up @@ -706,6 +707,7 @@ def _maybe_create(definition: str) -> Any:
("null_like_key_quoted_1", "${'None'.'True'}", GrammarParseError),
("null_like_key_quoted_2", "${'None.True'}", GrammarParseError),
("dotpath_bad_type", "${prim_dict.${float}}", GrammarParseError),
("at_in_key", "${x@y}", 123),
# Resolver interpolations.
("no_args", "${test:}", []),
("space_in_args", "${test:a, b c}", ["a", "b c"]),
Expand All @@ -716,6 +718,7 @@ def _maybe_create(definition: str) -> Any:
("dict_list_as_key", "${test:{[0]: 1}}", GrammarParseError),
("missing_resolver", "${MiSsInG_ReSoLvEr:0}", UnsupportedInterpolationType),
("non_str_resolver", "${${bool}:}", GrammarParseError),
("at_in_resolver", "${y@z:}", GrammarParseError),
# Env resolver (limited: more tests in `test_env_values_are_typed()`).
("env_int", "${env:OMEGACONF_TEST_ENV_INT}", 123),
("env_missing_str", "${env:OMEGACONF_TEST_MISSING,miss}", "miss"),
Expand Down

0 comments on commit d959a13

Please sign in to comment.