Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multigpu kmeans fit function #348

Merged
merged 23 commits into from
Oct 2, 2024

Conversation

benfred
Copy link
Member

@benfred benfred commented Sep 25, 2024

Changes to support using kmeans clustering inside of cuml, so we can transition cuml off of the RAFT kmeans code

  • Add a multigpu kmeans fit function
  • Adds instantiations for kmeans on int64_t indicies, which unfortunately also requires int64_t indices for the PW distance functions
  • Add support for double precision kmeans

This adds a multigpu kmeans fit function, so that we can use the kmeans clustering
code inside of cuml.

This also adds instantiations for kmeans on int64_t indicies, which unfortunately
also requires int64_t indices for the PW distance functions.
@benfred benfred added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Sep 25, 2024
@benfred benfred self-assigned this Sep 25, 2024
@benfred benfred marked this pull request as ready for review September 26, 2024 04:19
@benfred benfred requested review from a team as code owners September 26, 2024 04:19
@@ -344,6 +350,32 @@ add_library(
src/distance/detail/pairwise_matrix/dispatch_russel_rao_float_float_float_int.cu
src/distance/detail/pairwise_matrix/dispatch_russel_rao_half_float_float_int.cu
src/distance/detail/pairwise_matrix/dispatch_russel_rao_double_double_double_int.cu
src/distance/detail/pairwise_matrix/dispatch_canberra_double_double_double_int64_t.cu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH man this is so sad to see :-( I think we definitely should update all of our APIs to be int64_t (and maybe even instantiate the int32_t but have them all go through the same template instantiations in the end conditionally).

@@ -0,0 +1,199 @@
/*
* Copyright (c) 2022-2024, NVIDIA CORPORATION.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's going to take too much to get this into 24.10, I'm okay forgoing this just for this release so long as cuML mnmg (Dask) tests are running properly on multiple gpus. We should create an issue to come back tot his, though , and reference the issue in the code if you can.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw - the dask kmeans in cuml test all are working for me with this change

ben@ben-Precision-7920-Tower:~/code/cuml$ pytest python/cuml/cuml/tests/dask/test_dask_kmeans.py 
========================================================================= test session starts =========================================================================
platform linux -- Python 3.12.6, pytest-7.4.4, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/ben/code/cuml/python/cuml
configfile: pyproject.toml
plugins: benchmark-4.0.0, xdist-3.6.1, hypothesis-6.112.1, cov-5.0.0, cases-3.8.5
collected 491 items                                                                                                                                                   

python/cuml/cuml/tests/dask/test_dask_kmeans.py .ss.ssssssssssssssssssssssssssssssssssssssssssssssssss.ss.ssssssssssssssssssssssssssssssssssssssssssssssssss.ss [ 22%]
.ssssssssssssssssssssssssssssssssssssssssssssssssss.ss.sssssssssssssssssssssssssssssssssssssssssssssssssss.....ssssssss.ssssssss.ssssssssssssssssssssssssssssss [ 54%]
ssssssssssssssssssssssssssssssss.ssssssss.ssssssss.ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.ss.ssssssssssssssssssssssssssssssssssssssssss [ 87%]
ssssssss.ss.ssssssssssssssssssssssssssssssssssssssssssssssssss                                                                                                  [100%]

================================================================== 22 passed, 469 skipped in 39.48s ===================================================================

I will create an issue for the test here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dask kmeans tests work locally on my workstation - but seem to be failing in CI on the cuml PR =(
https://github.com/rapidsai/cuml/actions/runs/11100367838/job/30840143557?pr=6085

looking into this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reasonably sure that the last commit here fixes - re-running CI on the cuml pr to test this out

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider eventually exposing a rudimentary Dask API in cuVS for kmeans (and ann/knn when that's ready) so that we can test this stuff right inside cuVS. I'll create an issue for this.

@@ -341,6 +347,32 @@ add_library(
src/distance/detail/pairwise_matrix/dispatch_russel_rao_float_float_float_int.cu
src/distance/detail/pairwise_matrix/dispatch_russel_rao_half_float_float_int.cu
src/distance/detail/pairwise_matrix/dispatch_russel_rao_double_double_double_int.cu
src/distance/detail/pairwise_matrix/dispatch_canberra_double_double_double_int64_t.cu
src/distance/detail/pairwise_matrix/dispatch_canberra_float_float_float_int64_t.cu
src/distance/detail/pairwise_matrix/dispatch_correlation_double_double_double_int64_t.cu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question- do we need all of these to support int64_t or just the euclidean? Kmeans is inherently coupled to euclidean (and cosine, though that's not an option yet in our kmeans).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed out a change to limit to just l2expanded distance w/ int64_t indices here 63c11ce - lmk what you think

@cjnolet
Copy link
Member

cjnolet commented Oct 2, 2024

/merge

@rapids-bot rapids-bot bot merged commit 2fe2e88 into rapidsai:branch-24.10 Oct 2, 2024
54 checks passed
@benfred benfred deleted the cluster_cuml branch October 4, 2024 06:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cpp improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

2 participants