Add multigpu kmeans fit function #348
Conversation
This adds a multi-GPU kmeans fit function so that we can use the kmeans clustering code inside of cuML. It also adds instantiations for kmeans on int64_t indices, which unfortunately also requires int64_t indices for the pairwise distance functions.
+ fix error in header
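For context, a rough CPU-only sketch of the pattern such a multi-GPU fit follows (the names `Partials`, `local_step`, and `reduce_and_update` are illustrative, not the API added by this PR): each rank labels its shard of rows against the current centroids using squared L2 distances, accumulates per-cluster sums and counts, and the partials are reduced across ranks into new centroids. This is also why int64_t indices matter once the global row count can exceed what int32_t holds.

```cpp
// Illustrative sketch only; hypothetical names, not the cuVS API in this PR.
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct Partials {
  std::vector<double> sums;          // n_clusters * n_cols per-cluster coordinate sums
  std::vector<std::int64_t> counts;  // rows assigned to each cluster
};

// One "rank": label its local shard of X against the current centroids and
// accumulate partial sums/counts. On GPUs this is the pairwise-distance +
// argmin step; int64_t indices allow more than 2^31 total rows.
Partials local_step(const std::vector<double>& X, std::int64_t n_rows,
                    std::int64_t n_cols, const std::vector<double>& centroids,
                    std::int64_t k) {
  Partials p{std::vector<double>(static_cast<std::size_t>(k * n_cols), 0.0),
             std::vector<std::int64_t>(static_cast<std::size_t>(k), 0)};
  for (std::int64_t i = 0; i < n_rows; ++i) {
    std::int64_t best = 0;
    double best_d = std::numeric_limits<double>::max();
    for (std::int64_t c = 0; c < k; ++c) {
      double d = 0.0;
      for (std::int64_t j = 0; j < n_cols; ++j) {
        double diff = X[i * n_cols + j] - centroids[c * n_cols + j];
        d += diff * diff;  // squared L2 -- the metric k-means actually needs
      }
      if (d < best_d) { best_d = d; best = c; }
    }
    for (std::int64_t j = 0; j < n_cols; ++j) p.sums[best * n_cols + j] += X[i * n_cols + j];
    ++p.counts[best];
  }
  return p;
}

// The "all-reduce": merge partials from every rank (NCCL/RAFT comms on real
// GPUs) and recompute centroids for the next iteration.
void reduce_and_update(const std::vector<Partials>& parts, std::int64_t k,
                       std::int64_t n_cols, std::vector<double>& centroids) {
  std::vector<double> sums(static_cast<std::size_t>(k * n_cols), 0.0);
  std::vector<std::int64_t> counts(static_cast<std::size_t>(k), 0);
  for (const auto& p : parts) {
    for (std::int64_t c = 0; c < k; ++c) counts[c] += p.counts[c];
    for (std::size_t i = 0; i < sums.size(); ++i) sums[i] += p.sums[i];
  }
  for (std::int64_t c = 0; c < k; ++c)
    if (counts[c] > 0)
      for (std::int64_t j = 0; j < n_cols; ++j)
        centroids[c * n_cols + j] = sums[c * n_cols + j] / counts[c];
}
```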
cpp/CMakeLists.txt
@@ -344,6 +350,32 @@ add_library(
src/distance/detail/pairwise_matrix/dispatch_russel_rao_float_float_float_int.cu
src/distance/detail/pairwise_matrix/dispatch_russel_rao_half_float_float_int.cu
src/distance/detail/pairwise_matrix/dispatch_russel_rao_double_double_double_int.cu
src/distance/detail/pairwise_matrix/dispatch_canberra_double_double_double_int64_t.cu
OH man this is so sad to see :-( I think we definitely should update all of our APIs to be int64_t (and maybe even instantiate the int32_t but have them all go through the same template instantiations in the end conditionally).
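A hedged sketch of what that conditional routing could look like (the function names here are illustrative, not the actual cuVS dispatch code): keep an int32_t overload for existing callers, but only compile the int64_t template instantiation and have the 32-bit entry point widen its indices and forward to it.

```cpp
#include <cstdint>

// The real implementation would dispatch to the CUDA pairwise-distance
// kernels; here it is only declared, and would be instantiated elsewhere
// for IdxT = int64_t only.
template <typename DataT, typename IdxT>
void pairwise_distance_impl(const DataT* x, const DataT* y, DataT* out,
                            IdxT m, IdxT n, IdxT k);

// 64-bit API: the single instantiation everything funnels into.
template <typename DataT>
void pairwise_distance(const DataT* x, const DataT* y, DataT* out,
                       std::int64_t m, std::int64_t n, std::int64_t k) {
  pairwise_distance_impl<DataT, std::int64_t>(x, y, out, m, n, k);
}

// 32-bit API kept for compatibility: no extra kernel instantiations are
// compiled, the indices are simply widened and forwarded to the int64_t path.
template <typename DataT>
void pairwise_distance(const DataT* x, const DataT* y, DataT* out,
                       std::int32_t m, std::int32_t n, std::int32_t k) {
  pairwise_distance<DataT>(x, y, out, static_cast<std::int64_t>(m),
                           static_cast<std::int64_t>(n),
                           static_cast<std::int64_t>(k));
}
```

The trade-off of this shape is that int32_t callers add nothing to build time or binary size, at the cost of every call going through 64-bit index arithmetic.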
@@ -0,0 +1,199 @@
/*
 * Copyright (c) 2022-2024, NVIDIA CORPORATION.
If it's going to take too much to get this into 24.10, I'm okay forgoing this just for this release so long as the cuML MNMG (Dask) tests are running properly on multiple GPUs. We should create an issue to come back to this, though, and reference the issue in the code if you can.
fwiw - the dask kmeans tests in cuml are all working for me with this change:
ben@ben-Precision-7920-Tower:~/code/cuml$ pytest python/cuml/cuml/tests/dask/test_dask_kmeans.py
========================================================================= test session starts =========================================================================
platform linux -- Python 3.12.6, pytest-7.4.4, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/ben/code/cuml/python/cuml
configfile: pyproject.toml
plugins: benchmark-4.0.0, xdist-3.6.1, hypothesis-6.112.1, cov-5.0.0, cases-3.8.5
collected 491 items
python/cuml/cuml/tests/dask/test_dask_kmeans.py .ss.ssssssssssssssssssssssssssssssssssssssssssssssssss.ss.ssssssssssssssssssssssssssssssssssssssssssssssssss.ss [ 22%]
.ssssssssssssssssssssssssssssssssssssssssssssssssss.ss.sssssssssssssssssssssssssssssssssssssssssssssssssss.....ssssssss.ssssssss.ssssssssssssssssssssssssssssss [ 54%]
ssssssssssssssssssssssssssssssss.ssssssss.ssssssss.ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.ss.ssssssssssssssssssssssssssssssssssssssssss [ 87%]
ssssssss.ss.ssssssssssssssssssssssssssssssssssssssssssssssssss [100%]
================================================================== 22 passed, 469 skipped in 39.48s ===================================================================
I will create an issue for the test here
the dask kmeans tests work locally on my workstation - but seem to be failing in CI on the cuml PR =(
https://github.com/rapidsai/cuml/actions/runs/11100367838/job/30840143557?pr=6085
looking into this
I am reasonably sure that the last commit here fixes this - re-running CI on the cuml PR to test it out.
dask kmeans tests are passing cuml CI now: https://github.com/rapidsai/cuml/actions/runs/11114409824/job/30882703741?pr=6085
I think we should consider eventually exposing a rudimentary Dask API in cuVS for kmeans (and ann/knn when that's ready) so that we can test this stuff right inside cuVS. I'll create an issue for this.
cpp/CMakeLists.txt
@@ -341,6 +347,32 @@ add_library(
src/distance/detail/pairwise_matrix/dispatch_russel_rao_float_float_float_int.cu
src/distance/detail/pairwise_matrix/dispatch_russel_rao_half_float_float_int.cu
src/distance/detail/pairwise_matrix/dispatch_russel_rao_double_double_double_int.cu
src/distance/detail/pairwise_matrix/dispatch_canberra_double_double_double_int64_t.cu
src/distance/detail/pairwise_matrix/dispatch_canberra_float_float_float_int64_t.cu
src/distance/detail/pairwise_matrix/dispatch_correlation_double_double_double_int64_t.cu
Question - do we need all of these to support int64_t, or just euclidean? Kmeans is inherently coupled to euclidean (and cosine, though that's not an option yet in our kmeans).
pushed out a change to limit this to just the l2expanded distance w/ int64_t indices here: 63c11ce - lmk what you think
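For reference, a minimal sketch of the restriction being described (the enum and function names are illustrative assumptions, not what commit 63c11ce actually contains): only the expanded-L2 path accepts int64_t indices, and any other metric is rejected at runtime rather than instantiated.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical metric enum; in practice this mirrors the library's own
// distance-type enumeration.
enum class DistanceType { L2Expanded, L2Unexpanded, CosineExpanded /* ... */ };

template <typename DataT>
void pairwise_distance_int64(DistanceType metric, const DataT* x, const DataT* y,
                             DataT* out, std::int64_t m, std::int64_t n,
                             std::int64_t k) {
  if (metric != DistanceType::L2Expanded) {
    throw std::runtime_error(
      "pairwise distances with int64_t indices are only instantiated for L2Expanded");
  }
  // ... dispatch to the L2Expanded kernel built with int64_t indices ...
  (void)x; (void)y; (void)out; (void)m; (void)n; (void)k;  // sketch only, no kernel here
}
```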
/merge
Changes to support using kmeans clustering inside of cuml, so we can transition cuml off of the RAFT kmeans code
double precision kmeans