Vectorize fuzzy_row_match #489

Merged
merged 13 commits into from
Mar 5, 2025

Conversation

Scienfitz
Collaborator

Here is the result of a test measuring the speedup:
image

  • for the most realistic cases (left_df much larger than right_df), the speedup approaches 4x from above
  • for less relevant cases (left_df and right_df comparable in size, or both very small), the speedup can be as high as 40x
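
For readers unfamiliar with the idea, the kind of change that produces such speedups can be sketched as follows. This is a minimal illustration only, not the code merged in this PR: the function names `fuzzy_match_loop` and `fuzzy_match_vectorized`, the `num_cols`/`cat_cols` arguments, and the matching rule (exact agreement on categorical columns, smallest summed absolute distance on numerical columns) are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def fuzzy_match_loop(left_df, right_df, num_cols, cat_cols):
    """Hypothetical row-by-row baseline: one Python iteration per right_df row."""
    matches = []
    for _, row in right_df.iterrows():
        # Only left rows whose categorical entries agree exactly are candidates.
        mask = np.ones(len(left_df), dtype=bool)
        for col in cat_cols:
            mask &= (left_df[col] == row[col]).to_numpy()
        # Among the candidates, pick the smallest summed absolute distance.
        dist = np.zeros(len(left_df))
        for col in num_cols:
            dist += np.abs(left_df[col].to_numpy(dtype=float) - row[col])
        dist[~mask] = np.inf
        matches.append(left_df.index[int(np.argmin(dist))])
    return pd.Index(matches)


def fuzzy_match_vectorized(left_df, right_df, num_cols, cat_cols):
    """Hypothetical vectorized variant: all row pairs compared via broadcasting."""
    left_num = left_df[num_cols].to_numpy(dtype=float)
    right_num = right_df[num_cols].to_numpy(dtype=float)
    # dist[i, j] = summed absolute distance between right row i and left row j.
    dist = np.abs(right_num[:, None, :] - left_num[None, :, :]).sum(axis=2)

    left_cat = left_df[cat_cols].to_numpy()
    right_cat = right_df[cat_cols].to_numpy()
    # Pairs that disagree on any categorical column can never match.
    mismatch = (right_cat[:, None, :] != left_cat[None, :, :]).any(axis=2)
    dist[mismatch] = np.inf

    # For every right row, take the label of the closest admissible left row.
    return left_df.index[dist.argmin(axis=1)]


left = pd.DataFrame(
    {"x": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0], "c": ["a", "a", "a", "b", "b", "b"]}
)
right = pd.DataFrame({"x": [0.9, 1.8, 0.2], "c": ["b", "a", "b"]})

print(list(fuzzy_match_vectorized(left, right, ["x"], ["c"])))  # [4, 2, 3]
```

Note that the broadcast version builds intermediate arrays of size len(right_df) × len(left_df) × number of columns, so it trades memory for speed. Neither sketch handles the case where a right row has no admissible left row at all (both would silently return the first left row).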

@Scienfitz Scienfitz added the enhancement Expand / change existing functionality label Feb 17, 2025
@Scienfitz Scienfitz self-assigned this Feb 17, 2025
@Scienfitz Scienfitz force-pushed the feature/vectorize_fuzzy_match branch from a171b33 to beadcb4 Compare February 18, 2025 14:54
@Scienfitz Scienfitz force-pushed the feature/vectorize_fuzzy_match branch from 20d2538 to 092ce3d Compare February 26, 2025 15:41
Collaborator

@AdrianSosic AdrianSosic left a comment

Hi @Scienfitz. Thx, this is great, finally some improvement to one of our legacy code parts 👍🏼 Overall, the logic is solid and the PR is in good shape. There are a couple of issues to be fixed, though.

@Scienfitz Scienfitz force-pushed the feature/vectorize_fuzzy_match branch from 979233d to 36d34d4 Compare February 28, 2025 14:00
@CLAassistant

CLAassistant commented Feb 28, 2025

CLA assistant check
All committers have signed the CLA.

@Scienfitz Scienfitz force-pushed the feature/vectorize_fuzzy_match branch from 4ce705a to 198ac87 Compare February 28, 2025 15:45
Collaborator

@AdrianSosic AdrianSosic left a comment

Thx again, I think this looks very good now.

@Scienfitz Scienfitz force-pushed the feature/vectorize_fuzzy_match branch 4 times, most recently from 5360b49 to f73980f Compare March 5, 2025 08:52
@Scienfitz Scienfitz force-pushed the feature/vectorize_fuzzy_match branch from f73980f to 1d0d922 Compare March 5, 2025 09:14
@Scienfitz Scienfitz merged commit 6281868 into main Mar 5, 2025
10 of 12 checks passed
Labels
enhancement Expand / change existing functionality
3 participants