BUG: TokenizerGroup doesn't behave like a PreTrainedTokenizer #2713

saattrupdan · 2024-02-01T13:49:33Z

LLM.llm_engine.tokenizer is now an instance of the new TokenizerGroup class, which doesn't behave like a tokenizer.

For instance, the LLM.set_tokenizer method sets the tokenizer attribute as a PreTrainedTokenizer, not a TokenizerGroup.

Also, tools like lm-format-enforcer assume that the tokenizer attribute is indeed a tokenizer, so they now raise AttributeErrors.

It seems like LLM.llm_engine.tokenizer should either revert to being a PreTrainedTokenizer, or at least expose properties and methods that call the corresponding properties and methods of the underlying PreTrainedTokenizer.
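
To illustrate the second option, here is a minimal sketch of a wrapper that forwards unknown attribute lookups to the tokenizer it holds. Both `FakeTokenizer` and `DelegatingTokenizerGroup` are hypothetical names made up for this example; this is not vLLM's actual TokenizerGroup and not necessarily how a fix would be implemented.

```python
class FakeTokenizer:
    """Stand-in for a PreTrainedTokenizer (hypothetical, for illustration only)."""

    eos_token_id = 2

    def encode(self, text):
        # Toy encoding: one "token" per character.
        return [ord(c) for c in text]


class DelegatingTokenizerGroup:
    """Hypothetical wrapper; NOT vLLM's real TokenizerGroup."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so attributes
        # defined on the wrapper itself always take precedence.
        if name == "tokenizer":
            # Avoid infinite recursion if the wrapper is accessed before
            # __init__ has run (e.g. during copying or unpickling).
            raise AttributeError(name)
        return getattr(self.tokenizer, name)


group = DelegatingTokenizerGroup(FakeTokenizer())
print(group.eos_token_id)   # 2
print(group.encode("hi"))   # [104, 105]
```

With this kind of delegation, code that expects a plain tokenizer keeps working for attribute and method access, although `isinstance` checks against the tokenizer class would still fail.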

zhuohan123 · 2024-02-04T22:43:58Z

Should be fixed by #2741.

zhuohan123 closed this as completed Feb 4, 2024
