
cross_val_score and GPU (not memory leakage case) #11298

Closed
satyrmipt opened this issue Mar 2, 2025 · 5 comments

Comments

@satyrmipt

satyrmipt commented Mar 2, 2025

I'm trying to use sklearn's cross_val_score with XGBClassifier while both the model and the data are on GPU, and Python asks me to move something to CPU explicitly. What exactly do I have to move to CPU (model, x_all, y_all, weights, ...), and will my cross-validation procedure still run on GPU after I do it?

Selected lines of the code describing the model-selection part:

    import cupy
    import numpy as np
    import sklearn
    from sklearn.metrics import balanced_accuracy_score, make_scorer
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from xgboost import XGBClassifier

    GPU_IND = 1
    sklearn.set_config(enable_metadata_routing=True)
    x_all = cdf[x_col_list].iloc[train_start:train_end].values
    y_all = np.array(y_true_list[train_start:train_end])
    # move x, y to GPU
    if GPU_IND == 1:
        x_all = cupy.array(x_all)
        y_all = cupy.array(y_all)
    # define sample weights somehow
    bin_all_weights = [i for i in range(len(y_all))]
    # define the k-fold cross-validation strategy; not sure how to move it to GPU, or whether I need to
    cv_split_gen = StratifiedKFold(n_splits=6, shuffle=True, random_state=42)
    # lame grid search as nested for loops:
    for n_estimators in [100, 200]:
        for max_depth in [1, 2, 3, 5]:
            for bad_deal_factor in range(11):  # [1, 2, 3, 10]:
                # shifted sample weights
                shifted_all_bin_weights = [
                    w * (bad_deal_factor if true_class == 1 else 1)
                    for w, true_class in zip(bin_all_weights, y_all)
                ]
                if GPU_IND == 1:
                    # move shifted weights to GPU
                    shifted_all_bin_weights = cupy.array(shifted_all_bin_weights)
                # crossval_params defined after we've moved shifted weights to GPU
                crossval_params = {'sample_weight': shifted_all_bin_weights, 'adjusted': True}
                # model with device='cuda' and metadata routing instruction
                model = XGBClassifier(
                    n_estimators=n_estimators, max_depth=max_depth,
                    random_state=314, nthread=-1, validate_parameters=True,
                    device='cuda' if GPU_IND == 1 else 'cpu',
                ).set_fit_request(sample_weight='sample_weight')
                # cross-val scorer with metadata routing instruction
                bal_acc_adjusted = make_scorer(balanced_accuracy_score).set_score_request(
                    sample_weight='sample_weight', adjusted='adjusted')
                # at this point model, x, y, and weights have been moved to GPU:
                cv_metrics_list = cross_val_score(model, x_all, y_all, cv=cv_split_gen,
                                                  params=crossval_params, scoring=bal_acc_adjusted)

The error on the cv_metrics_list = cross_val_score(...) row:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-12-329d82232dde> in <cell line: 0>()
        159                           bal_acc_adjusted=make_scorer(balanced_accuracy_score).set_score_request(sample_weight='sample_weight', adjusted='adjusted')
    --> 160                           cv_metrics_list=cross_val_score(model, x_all, y_all, cv=cv_split_gen, params=crossval_params, scoring=bal_acc_adjusted)
        161                           mean_cv_metric=cv_metrics_list.mean()
        162

    6 frames
    /usr/local/lib/python3.11/dist-packages/sklearn/utils/_array_api.py in _asarray_with_order(array, dtype, order, copy, xp, device)
        837             array = numpy.array(array, order=order, dtype=dtype)
        838         else:
    --> 839             array = numpy.asarray(array, order=order, dtype=dtype)
        840
        841         # At this point array is a NumPy ndarray. We convert it to an array

    cupy/_core/core.pyx in cupy._core.core._ndarray_base.__array__()

    TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
@trivialfis
Member

trivialfis commented Mar 3, 2025

The error comes from sklearn and cupy.

Longer explanation: sklearn doesn't support GPU-based data like cupy very well at the moment. It handles inputs through the array API with numpy, but cupy requires users to be explicit when copying data to numpy. As a result, when you give sklearn a cupy array, an error is raised the moment sklearn tries to treat that array as numpy.

@trivialfis
Member

cc @dantegd

@satyrmipt
Author

satyrmipt commented Mar 4, 2025

@trivialfis thank you, I've got the underlying reason for the error and was looking for a natural way to use sklearn wrappers to combine an XGB model on GPU with useful CPU-only sklearn methods like cross_validate and grid search. My idea was to make a pipeline with custom transformers (from sklearn.preprocessing import FunctionTransformer), like

ppl=make_pipeline(move_data_to_GPU, gpu_model, move_data_back_to_CPU)

But I didn't find a way to make a custom transformer for targets, and I'm not sure I can use a transformer in a pipeline after the estimator. So I decided to code the cross-validation myself, and even found a ready-to-copy example in the documentation. Still don't know how to do XGBClassifier on GPU + grid search, though.
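For reference, that manual loop can be sketched roughly like this. It is shown with numpy data and a LogisticRegression stand-in for the GPU model so it runs anywhere; the relevant point is that StratifiedKFold only yields CPU index arrays, which you could equally use to slice cupy arrays on-device:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = rng.integers(0, 2, size=120)

# StratifiedKFold never touches the data values, only the labels for
# stratification; with cupy arrays you would index x_all[train_idx] on-device.
cv = StratifiedKFold(n_splits=6, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in cv.split(X, y):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(balanced_accuracy_score(y[test_idx], model.predict(X[test_idx])))

print(np.mean(scores))
```

Swapping the stand-in for an XGBClassifier with device='cuda' and cupy slices is the GPU version of the same loop; grid search is then just the nested for loops from the original snippet wrapped around it.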

@trivialfis
Member

For now, you can put the data on the CPU and let XGBoost make the necessary copies. Yes, there will be warnings from XGBoost during calls to the predict methods; you can silence XGBoost with the verbosity parameter once you know everything else is fine.

@trivialfis
Member

I'm closing this as it relates to Sklearn's ability to handle GPU data. Feel free to reopen if you have further questions.
