
ERROR occurs when running "tokenizer._tokenizer.model.clear_cache()" #1738

nixonjin opened this issue Feb 21, 2025 · 1 comment

nixonjin commented Feb 21, 2025

I ran into an OOM problem when using BertTokenizer, as reported in #1539.

I then tried tokenizer._tokenizer.model.clear_cache() (and tokenizer._tokenizer.model._clear_cache()) to clear the cache.

However, I got an error: AttributeError: 'tokenizers.models.WordPiece' object has no attribute 'clear_cache'. Could anyone tell me how to fix it?

In the source code, it seems that clear_cache is only implemented for the BPE and Unigram models, not for WordPiece. Is that the reason? If so, could anyone give me some advice on how to work around this?
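
For reference, a minimal sketch of the call I'm making, with a guard so the AttributeError is avoided when the backing model doesn't expose clear_cache(). The checkpoint name is just an example:

```python
# Minimal sketch (not a fix for the underlying OOM): only call clear_cache()
# when the backing tokenizers model actually exposes it.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint
model = tokenizer._tokenizer.model  # tokenizers.models.WordPiece for BERT

if hasattr(model, "clear_cache"):
    model.clear_cache()  # implemented for BPE and Unigram models
else:
    print(f"{type(model).__name__} has no clear_cache(); nothing to clear")
```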

Environment:
  • Linux, CPU only
  • tokenizers==0.21.0
  • transformers==4.49.0

MeetThePatel commented

Would you be able to provide more context regarding the block of code that is OOMing?

For BPE:

  • During training: you are just performing merges in a deterministic fashion (source).
  • During tokenization: you are applying your learned merge rules, and the results can be saved to a cache for tokens you have already "built" (source); see the sketch below.
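
To make the caching point concrete, here is an illustrative sketch (not the library's actual implementation, just the idea) of a word-level cache like the one clear_cache() resets; each distinct word adds an entry, so it can grow large over a diverse corpus:

```python
# Illustrative sketch (not the library's code): a BPE word cache that grows with
# every distinct word tokenized; this is the kind of structure clear_cache() resets.
cache = {}

def bpe_tokenize(word, merges):
    if word in cache:                 # reuse tokens already "built" for this word
        return cache[word]
    pieces = list(word)
    changed = True
    while changed:                    # keep applying the learned merge rules
        changed = False
        for a, b in merges:
            i = 0
            while i < len(pieces) - 1:
                if pieces[i] == a and pieces[i + 1] == b:
                    pieces[i:i + 2] = [a + b]
                    changed = True
                else:
                    i += 1
    cache[word] = pieces              # unbounded growth across many distinct words
    return pieces

print(bpe_tokenize("low", [("l", "o"), ("lo", "w")]))  # ['low']
```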

For WordPiece:

  • During training: you are doing BPE, which doesn't use a cache (source).
  • During tokenization: you don't need a cache, as you are just matching greedily, searching for the longest substring that is in the vocab (source); see the sketch below.
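
And a simplified sketch of that greedy longest-match scheme (again illustrative only, not the library's implementation); each word is resolved by vocabulary lookups alone, so there is nothing to cache:

```python
# Illustrative sketch: WordPiece-style greedy longest-match tokenization of a
# single word. Every piece is found by direct vocab lookups, so there is no
# per-token cache to clear.
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Greedily look for the longest substring ("##" prefix for non-initial pieces).
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:             # no piece matches: the whole word becomes [UNK]
            return [unk]
        tokens.append(match)
        start = end
    return tokens

print(wordpiece_tokenize("unaffable", {"un", "##aff", "##able"}))
# ['un', '##aff', '##able']
```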

This leads me to believe that either:

  1. There is a problem with the surrounding code.
  2. The vocab you are trying to load is too large for your machine. This seems less likely, since BertTokenizer's vocab is only about 30k entries.
