Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore some extra chars in no-combining search #2929

Merged
merged 3 commits into from
Jan 5, 2024
Merged

Conversation

abdnh
Copy link
Collaborator

@abdnh abdnh commented Jan 4, 2024

Closes #2926

I just took the translation list in WordPress page and kept only letters that don't change after NFKD normalization.

.nfkd()
.filter(|c| !is_combining_mark(*c))
.collect::<String>()
.into()
.collect::<String>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of phf is nice! I think the step below could be done more efficiently though, by using .map() here to transform the unwanted characters, instead of modifying the string below. You should be able to look up each character instead of having to iterate over the phf, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(might be more efficient to do this in a 'for char in ... { output.push(char) or output.push_str(repl) }' than using flat_map)

@dae
Copy link
Member

dae commented Jan 5, 2024

Thanks Abdo!

@dae dae merged commit 646ba41 into ankitects:main Jan 5, 2024
@abdnh abdnh deleted the nc branch January 5, 2024 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Our no-combining search does not ignore all diacritics
2 participants