Fix indexing bug when using parallelism to build CPU hierarchy in HNSW #620

Merged 7 commits on Jan 30, 2025

Conversation

@divyegala (Member) commented Jan 27, 2025

hnswlib uses an internal indexing system that atomically assigns an ID to each point in the order the points are added to the index. When points are added in parallel, the insertion order is not deterministic, so a point's internal ID may differ from its "label" (for us, the label is just the index of the row in the dataset).

The bug was that I was using the label itself to write out the CPU hierarchy, when I should have been using hnswlib's internal ID for the point associated with that label.
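To illustrate the mismatch, here is a minimal sketch in plain Python. It does not use hnswlib or the code in this PR: `ToyIndex`, `write_levels`, and the per-point "level" values are hypothetical stand-ins, and the fixed insertion order simply imitates one possible outcome of a parallel build.

```python
from typing import Dict, List


class ToyIndex:
    """Hypothetical stand-in: hands out internal IDs in insertion order."""

    def __init__(self) -> None:
        self.label_to_internal: Dict[int, int] = {}
        self.internal_to_label: List[int] = []

    def add_point(self, label: int) -> int:
        # The internal ID is simply the next free slot, regardless of the label.
        internal_id = len(self.internal_to_label)
        self.internal_to_label.append(label)
        self.label_to_internal[label] = internal_id
        return internal_id


def write_levels(index: ToyIndex, levels_by_label: Dict[int, int],
                 use_internal_id: bool) -> List[int]:
    """Write each point's level into a flat array laid out by internal ID."""
    out = [-1] * len(levels_by_label)
    for label, level in levels_by_label.items():
        # Buggy variant: slot = label. Fixed variant: slot = internal ID.
        slot = index.label_to_internal[label] if use_internal_id else label
        out[slot] = level
    return out


# Assumed per-point data, keyed by label (dataset row index).
levels_by_label = {0: 0, 1: 2, 2: 1, 3: 0}

# Imitate a parallel build: points arrive in a non-sequential order.
index = ToyIndex()
for label in [2, 0, 3, 1]:
    index.add_point(label)

buggy = write_levels(index, levels_by_label, use_internal_id=False)
fixed = write_levels(index, levels_by_label, use_internal_id=True)

# What a reader addressing slots by internal ID expects to find.
expected = [levels_by_label[label] for label in index.internal_to_label]
print("buggy:", buggy)
print("fixed:", fixed)
print("expected:", expected)
```

With a sequential build the two variants coincide, because every label equals its internal ID; they diverge only when the insertion order differs from the row order.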

@divyegala
Member Author

Build times before the bug was fixed:
[image]

Build times with this PR:
[image]

@divyegala added the bug and non-breaking labels on Jan 27, 2025
@navneet1v

@divyegala does a similar problem exist when a CAGRA index is converted to a Faiss-enabled HNSW index?

@divyegala
Member Author

@navneet1v no, this bug is not present in FAISS

@divyegala
Member Author

/merge

@rapids-bot merged commit 7609d18 into rapidsai:branch-25.02 on Jan 30, 2025
61 checks passed
Labels: bug, cpp, non-breaking, Python
3 participants