Add multigpu kmeans fit function #348

benfred · 2024-09-25T02:33:44Z

Changes to support using kmeans clustering inside of cuml, so we can transition cuml off of the RAFT kmeans code

Add a multigpu kmeans fit function
Adds instantiations for kmeans on int64_t indices, which unfortunately also requires int64_t indices for the PW distance functions
Add support for double precision kmeans

This adds a multigpu kmeans fit function, so that we can use the kmeans clustering code inside of cuml. This also adds instantiations for kmeans on int64_t indices, which unfortunately also requires int64_t indices for the PW distance functions.
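As a rough illustration of what a multi-GPU fit iteration involves, the sketch below assumes the usual data-parallel formulation: each rank assigns its own shard of rows to the nearest centroid, accumulates per-cluster sums and counts, and the partial results are summed across ranks before the centroids are updated. It is plain host-side C++, not the cuVS implementation. `Shard`, `Partial`, `local_step`, and `fit_iteration` are hypothetical names, and the loop over shards stands in for an allreduce across GPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical types for illustration only; not part of cuVS.
struct Shard {
  std::vector<double> x;  // row-major, n_rows x n_cols, owned by one rank
  std::int64_t n_rows;
};

struct Partial {
  std::vector<double> sums;          // n_clusters x n_cols
  std::vector<std::int64_t> counts;  // n_clusters
};

// Index of the nearest centroid (squared L2) for a single row.
std::int64_t nearest(const double* row, const std::vector<double>& centroids,
                     std::int64_t n_clusters, std::int64_t n_cols)
{
  std::int64_t best = 0;
  double best_d     = std::numeric_limits<double>::max();
  for (std::int64_t c = 0; c < n_clusters; ++c) {
    double d = 0.0;
    for (std::int64_t j = 0; j < n_cols; ++j) {
      const double diff = row[j] - centroids[c * n_cols + j];
      d += diff * diff;
    }
    if (d < best_d) {
      best_d = d;
      best   = c;
    }
  }
  return best;
}

// Work each rank does on its own shard: per-cluster sums and counts.
Partial local_step(const Shard& s, const std::vector<double>& centroids,
                   std::int64_t n_clusters, std::int64_t n_cols)
{
  assert(static_cast<std::int64_t>(s.x.size()) == s.n_rows * n_cols);
  Partial p{std::vector<double>(n_clusters * n_cols, 0.0),
            std::vector<std::int64_t>(n_clusters, 0)};
  for (std::int64_t i = 0; i < s.n_rows; ++i) {
    const double* row    = s.x.data() + i * n_cols;
    const std::int64_t c = nearest(row, centroids, n_clusters, n_cols);
    for (std::int64_t j = 0; j < n_cols; ++j) { p.sums[c * n_cols + j] += row[j]; }
    ++p.counts[c];
  }
  return p;
}

// One fit iteration. The loop over shards stands in for an allreduce(sum).
std::vector<double> fit_iteration(const std::vector<Shard>& shards,
                                  std::vector<double> centroids,
                                  std::int64_t n_clusters,
                                  std::int64_t n_cols)
{
  Partial total{std::vector<double>(n_clusters * n_cols, 0.0),
                std::vector<std::int64_t>(n_clusters, 0)};
  for (const Shard& s : shards) {
    const Partial p = local_step(s, centroids, n_clusters, n_cols);
    for (std::size_t k = 0; k < total.sums.size(); ++k) { total.sums[k] += p.sums[k]; }
    for (std::size_t k = 0; k < total.counts.size(); ++k) { total.counts[k] += p.counts[k]; }
  }
  for (std::int64_t c = 0; c < n_clusters; ++c) {
    if (total.counts[c] == 0) { continue; }  // leave empty clusters where they are
    for (std::int64_t j = 0; j < n_cols; ++j) {
      centroids[c * n_cols + j] = total.sums[c * n_cols + j] / total.counts[c];
    }
  }
  return centroids;
}
```

With one-dimensional shards {0, 10} and {1, 11} and starting centroids {0, 10}, one iteration moves the centroids to 0.5 and 10.5, the same result a single pass over all four points would give.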

cpp/include/cuvs/cluster/kmeans_mg.hpp

cpp/src/cluster/kmeans_fit_double.cu

cjnolet · 2024-09-26T16:14:08Z

cpp/CMakeLists.txt

@@ -344,6 +350,32 @@ add_library(
  src/distance/detail/pairwise_matrix/dispatch_russel_rao_float_float_float_int.cu
  src/distance/detail/pairwise_matrix/dispatch_russel_rao_half_float_float_int.cu
  src/distance/detail/pairwise_matrix/dispatch_russel_rao_double_double_double_int.cu
+  src/distance/detail/pairwise_matrix/dispatch_canberra_double_double_double_int64_t.cu


Oh man, this is so sad to see :-( I think we should definitely update all of our APIs to use int64_t (and maybe still instantiate the int32_t versions, but have them conditionally go through the same template instantiations in the end).
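One possible reading of that suggestion, sketched under assumptions: keep both index widths in the public signature, but widen to int64_t at the boundary so only one implementation is compiled per value type. `row_sums` and `detail::row_sums_impl` are hypothetical stand-ins, not cuVS functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

namespace detail {
// The single implementation, written once against 64-bit indices.
template <typename T>
std::vector<T> row_sums_impl(const T* data, std::int64_t n_rows, std::int64_t n_cols)
{
  std::vector<T> out(static_cast<std::size_t>(n_rows), T{0});
  for (std::int64_t i = 0; i < n_rows; ++i) {
    for (std::int64_t j = 0; j < n_cols; ++j) {
      out[static_cast<std::size_t>(i)] += data[i * n_cols + j];
    }
  }
  return out;
}
}  // namespace detail

// Thin public entry point: accepts int32_t or int64_t indices and forwards
// both to the same int64_t implementation.
template <typename T, typename IdxT>
std::vector<T> row_sums(const T* data, IdxT n_rows, IdxT n_cols)
{
  static_assert(std::is_same<IdxT, std::int32_t>::value ||
                  std::is_same<IdxT, std::int64_t>::value,
                "IdxT must be int32_t or int64_t");
  assert(n_rows >= 0 && n_cols >= 0);
  return detail::row_sums_impl<T>(
    data, static_cast<std::int64_t>(n_rows), static_cast<std::int64_t>(n_cols));
}
```

Widening the scalar sizes is cheap. The harder case, not shown here, is an API whose output arrays are themselves typed on the index type, since those would need a conversion step or a temporary buffer.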

cpp/include/cuvs/cluster/kmeans.hpp

cpp/src/distance/detail/pairwise_matrix/dispatch_canberra_double_double_double_int64_t.cu

cjnolet · 2024-09-27T16:28:05Z

cpp/test/cluster/kmeans_mg.cu

@@ -0,0 +1,199 @@
+/*
+ * Copyright (c) 2022-2024, NVIDIA CORPORATION.


If it's going to take too much to get this into 24.10, I'm okay forgoing this just for this release, so long as the cuML MNMG (Dask) tests are running properly on multiple GPUs. We should create an issue to come back to this, though, and reference the issue in the code if you can.

fwiw, the dask kmeans tests in cuml are all working for me with this change:

ben@ben-Precision-7920-Tower:~/code/cuml$ pytest python/cuml/cuml/tests/dask/test_dask_kmeans.py
========================================================================= test session starts =========================================================================
platform linux -- Python 3.12.6, pytest-7.4.4, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/ben/code/cuml/python/cuml
configfile: pyproject.toml
plugins: benchmark-4.0.0, xdist-3.6.1, hypothesis-6.112.1, cov-5.0.0, cases-3.8.5
collected 491 items

python/cuml/cuml/tests/dask/test_dask_kmeans.py .ss.ssssssssssssssssssssssssssssssssssssssssssssssssss.ss.ssssssssssssssssssssssssssssssssssssssssssssssssss.ss [ 22%]
.ssssssssssssssssssssssssssssssssssssssssssssssssss.ss.sssssssssssssssssssssssssssssssssssssssssssssssssss.....ssssssss.ssssssss.ssssssssssssssssssssssssssssss [ 54%]
ssssssssssssssssssssssssssssssss.ssssssss.ssssssss.ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.ss.ssssssssssssssssssssssssssssssssssssssssss [ 87%]
ssssssss.ss.ssssssssssssssssssssssssssssssssssssssssssssssssss [100%]

================================================================== 22 passed, 469 skipped in 39.48s ===================================================================

I will create an issue for the test here

the dask kmeans tests work locally on my workstation - but seem to be failing in CI on the cuml PR =(
https://github.com/rapidsai/cuml/actions/runs/11100367838/job/30840143557?pr=6085

looking into this

I am reasonably sure that the last commit here fixes this. Re-running CI on the cuml PR to test it out.

dask kmeans tests passing cuml ci now https://github.com/rapidsai/cuml/actions/runs/11114409824/job/30882703741?pr=6085

I think we should consider eventually exposing a rudimentary Dask API in cuVS for kmeans (and ann/knn when that's ready) so that we can test this stuff right inside cuVS. I'll create an issue for this.

cjnolet · 2024-10-01T18:27:05Z

cpp/CMakeLists.txt

@@ -341,6 +347,32 @@ add_library(
  src/distance/detail/pairwise_matrix/dispatch_russel_rao_float_float_float_int.cu
  src/distance/detail/pairwise_matrix/dispatch_russel_rao_half_float_float_int.cu
  src/distance/detail/pairwise_matrix/dispatch_russel_rao_double_double_double_int.cu
+  src/distance/detail/pairwise_matrix/dispatch_canberra_double_double_double_int64_t.cu
+  src/distance/detail/pairwise_matrix/dispatch_canberra_float_float_float_int64_t.cu
+  src/distance/detail/pairwise_matrix/dispatch_correlation_double_double_double_int64_t.cu


Question: do we need all of these to support int64_t, or just the euclidean? Kmeans is inherently coupled to euclidean (and cosine, though that's not an option yet in our kmeans).

pushed out a change to limit to just l2expanded distance w/ int64_t indices here 63c11ce - lmk what you think
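To make the trade-off concrete, here is a minimal sketch of instantiating a distance routine for 64-bit indices only where it is needed. It assumes "expanded" L2 means the form ||x||^2 + ||y||^2 - 2 x.y. The function `l2_expanded` and its signature are hypothetical, and the code is plain host-side C++ rather than the cuVS dispatch machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Squared L2 distances between the rows of x (m x k) and y (n x k), both
// row-major, computed in expanded form: ||x||^2 + ||y||^2 - 2 * dot(x, y).
template <typename T, typename IdxT>
std::vector<T> l2_expanded(const std::vector<T>& x, const std::vector<T>& y,
                           IdxT m, IdxT n, IdxT k)
{
  assert(static_cast<IdxT>(x.size()) == m * k);
  assert(static_cast<IdxT>(y.size()) == n * k);
  std::vector<T> out(static_cast<std::size_t>(m * n));
  for (IdxT i = 0; i < m; ++i) {
    for (IdxT j = 0; j < n; ++j) {
      T xn = 0, yn = 0, dot = 0;
      for (IdxT l = 0; l < k; ++l) {
        const T a = x[i * k + l];
        const T b = y[j * k + l];
        xn += a * a;
        yn += b * b;
        dot += a * b;
      }
      out[i * n + j] = xn + yn - 2 * dot;
    }
  }
  return out;
}

// Explicit instantiations. Each one is compiled whether or not it is used, so
// every extra (metric, value type, index type) combination costs build time
// and binary size. Here only this one metric gets the int64_t variants.
template std::vector<float> l2_expanded<float, int>(
  const std::vector<float>&, const std::vector<float>&, int, int, int);
template std::vector<float> l2_expanded<float, std::int64_t>(
  const std::vector<float>&, const std::vector<float>&,
  std::int64_t, std::int64_t, std::int64_t);
template std::vector<double> l2_expanded<double, std::int64_t>(
  const std::vector<double>&, const std::vector<double>&,
  std::int64_t, std::int64_t, std::int64_t);
```

For rows (0, 0) and (3, 4) against the single row (3, 0), this yields squared distances 9 and 16. With a 32-bit `IdxT`, the flat offset `i * n + j` overflows once `m * n` exceeds 2^31 - 1, which is one general reason to want 64-bit indices on large inputs.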

cjnolet · 2024-10-02T15:00:36Z

/merge

Add multigpu kmeans fit function

1fbdda1

This adds a multigpu kmeans fit function, so that we can use the kmeans clustering code inside of cuml. This also adds instantiations for kmeans on int64_t indices, which unfortunately also requires int64_t indices for the PW distance functions.

benfred added the improvement and non-breaking labels Sep 25, 2024

benfred self-assigned this Sep 25, 2024

github-actions bot added the cpp and CMake labels Sep 25, 2024

benfred added 3 commits September 24, 2024 19:39

Merge branch 'branch-24.10' into cluster_cuml

cc3e341

Add unittest

7c880ea

+ fix error in header

Merge branch 'cluster_cuml' of https://github.com/benfred/cuvs into cluster_cuml

dfbb94c

benfred marked this pull request as ready for review September 26, 2024 04:19

benfred requested review from a team as code owners September 26, 2024 04:19

Merge branch 'branch-24.10' into cluster_cuml

a59b6fd

benfred mentioned this pull request Sep 26, 2024

Migrate to use cuVS for vector search rapidsai/cuml#6085

Merged

cjnolet reviewed Sep 26, 2024

cpp/include/cuvs/cluster/kmeans_mg.hpp (outdated, resolved)

cjnolet reviewed Sep 26, 2024

cpp/src/cluster/kmeans_fit_double.cu (outdated, resolved)

cjnolet reviewed Sep 26, 2024

benfred added 2 commits September 27, 2024 00:36

.

b9797e7

Merge branch 'cluster_cuml' of https://github.com/benfred/cuvs into cluster_cuml

2488f78

cjnolet reviewed Sep 27, 2024

cpp/include/cuvs/cluster/kmeans.hpp (outdated, resolved)

cjnolet reviewed Sep 27, 2024

cpp/include/cuvs/cluster/kmeans.hpp (outdated, resolved)

cjnolet reviewed Sep 27, 2024

cpp/src/distance/detail/pairwise_matrix/dispatch_canberra_double_double_double_int64_t.cu (outdated, resolved)

cjnolet reviewed Sep 27, 2024

benfred added 7 commits September 28, 2024 09:30

Merge remote-tracking branch 'origin/branch-24.10' into cluster_cuml

dac4477

updates from code review

d78dc8c

.

0b48c5a

.

e9d06aa

add header

d8157ba

docstring

38f9ca1

fix copyright date

2820cb6

benfred added 4 commits September 29, 2024 11:38

fix

7c4e97f

add missing docstrings

4c8dace

mnmg fixes

5a05ee4

fix copyright dates

e6635a8

cjnolet reviewed Oct 1, 2024

benfred added 5 commits October 1, 2024 14:03

only instantiate int64_t distance functions for euclidean distance

63c11ce

fixes

559c3a1

add l2sqrtexpanded distance to kmeans

2df8bd1

Merge branch 'branch-24.10' into cluster_cuml

f04e3d2

Merge branch 'branch-24.10' into cluster_cuml

7727a43

cjnolet approved these changes Oct 2, 2024

rapids-bot bot merged commit 2fe2e88 into rapidsai:branch-24.10 Oct 2, 2024
54 checks passed

benfred deleted the cluster_cuml branch October 4, 2024 06:21
