When using sklearn.metrics.get_scorer("roc_auc") on a DaskXGBClassifier, scikit-learn attempts to call predict_proba on what it believes is a regressor, triggering the error:
ValueError: DaskXGBClassifier should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
Manually creating a scorer with make_scorer(roc_auc_score) works fine, as do other built-in scorers that only need predict (e.g. "f1"). The bug does not occur for the non-Dask XGBClassifier.
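The two scorers do not ask the estimator for the same thing, which may be why only one of them fails. With default arguments, make_scorer(roc_auc_score) calls predict and scores hard labels, while the built-in "roc_auc" scorer requests continuous scores (decision_function or predict_proba). A minimal sketch of that difference, using scikit-learn's LogisticRegression as a stand-in estimator (an assumption for illustration; it is not the Dask model from this report):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer, make_scorer, roc_auc_score

# Stand-in estimator (assumption for illustration; not the Dask model from this report).
X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# make_scorer with default arguments calls estimator.predict,
# so the AUC is computed on hard class labels.
label_auc = make_scorer(roc_auc_score)(clf, X, y)
manual_label_auc = roc_auc_score(y, clf.predict(X))

# The built-in "roc_auc" scorer asks for continuous scores instead.
proba_auc = get_scorer("roc_auc")(clf, X, y)
manual_proba_auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])

print(label_auc, manual_label_auc)
print(proba_auc, manual_proba_auc)
```

Computing roc_auc_score directly on the positive-class column of predict_proba might also serve as a stopgap for the Dask estimator, though that has not been verified here.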
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[4], line 17
15 auc_score = make_scorer(roc_auc_score)(obj, X, y) # works fine
16 f1 = get_scorer("f1")(obj, X, y) # works fine
---> 17 print(get_scorer("roc_auc")(obj, X, y)) # raises ValueError
File /usr/local/lib/python3.12/site-packages/sklearn/metrics/_scorer.py:288, in _BaseScorer.__call__(self, estimator, X, y_true, sample_weight, **kwargs)
285 if sample_weight is not None:
286 _kwargs["sample_weight"] = sample_weight
--> 288 return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)
File /usr/local/lib/python3.12/site-packages/sklearn/metrics/_scorer.py:380, in _Scorer._score(self, method_caller, estimator, X, y_true, **kwargs)
378 pos_label = None if is_regressor(estimator) else self._get_pos_label()
379 response_method = _check_response_method(estimator, self._response_method)
--> 380 y_pred = method_caller(
381 estimator,
382 _get_response_method_name(response_method),
383 X,
384 pos_label=pos_label,
385 )
387 scoring_kwargs = {**self._kwargs, **kwargs}
388 return self._sign * self._score_func(y_true, y_pred, **scoring_kwargs)
File /usr/local/lib/python3.12/site-packages/sklearn/metrics/_scorer.py:90, in _cached_call(cache, estimator, response_method, *args, **kwargs)
87 if cache is not None and response_method in cache:
88 return cache[response_method]
---> 90 result, _ = _get_response_values(
91 estimator, *args, response_method=response_method, **kwargs
92 )
94 if cache is not None:
95 cache[response_method] = result
File /usr/local/lib/python3.12/site-packages/sklearn/utils/_response.py:235, in _get_response_values(estimator, X, response_method, pos_label, return_response_method_used)
233 else: # estimator is a regressor
234 if response_method != "predict":
--> 235 raise ValueError(
236 f"{estimator.__class__.__name__} should either be a classifier to be "
237 f"used with response_method={response_method} or the response_method "
238 "should be 'predict'. Got a regressor with response_method="
239 f"{response_method} instead."
240 )
241 prediction_method = estimator.predict
242 y_pred, pos_label = prediction_method(X), None
ValueError: DaskXGBClassifier should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.
but the vanilla XGBClassifier is not affected:
import pandas as pd
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import get_scorer, make_scorer, roc_auc_score

X, y = make_classification()
X = pd.DataFrame(X, columns=[f"var{i}" for i in range(X.shape[1])])
y = pd.Series(y)
obj = xgb.XGBClassifier().fit(X, y)
auc_score = make_scorer(roc_auc_score)(obj, X, y)  # works fine
f1 = get_scorer("f1")(obj, X, y)  # works fine
other_auc_score = get_scorer("roc_auc")(obj, X, y)  # works fine
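The traceback ends in the branch commented "estimator is a regressor", which appears to be reached only when scikit-learn does not recognize the estimator as a classifier. A quick way to check how a given estimator is categorized is sketched below. The describe_estimator helper is hypothetical, and the two scikit-learn estimators are stand-ins; the same call could be made on the fitted Dask and non-Dask XGBoost objects to compare them.

```python
from sklearn.base import is_classifier, is_regressor
from sklearn.linear_model import LinearRegression, LogisticRegression


def describe_estimator(estimator):
    """Hypothetical helper: report how scikit-learn categorizes an estimator."""
    return {
        "class": type(estimator).__name__,
        "is_classifier": is_classifier(estimator),
        "is_regressor": is_regressor(estimator),
        "has_predict_proba": hasattr(estimator, "predict_proba"),
    }


# Stand-in estimators (assumption for illustration).
report_clf = describe_estimator(LogisticRegression())
report_reg = describe_estimator(LinearRegression())
print(report_clf)
print(report_reg)
```

If is_classifier comes back False for an estimator that does expose predict_proba, that mismatch would be consistent with the error message in this report.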
Environment
dask==2024.8.0
xgboost==2.1.4
scikit-learn==1.6.1
system: python:3.12-slim-bullseye docker container on Mac M3
I'm using dask 2024.11.2, and with that version it just runs into an internal error (traceback below). I will look into it once I set up a different environment with an older dask version.
File "/home/jiamingy/.anaconda/envs/xgboost_dev/lib/python3.12/site-packages/sklearn/utils/validation.py", line 1055, in check_array
array = _asarray_with_order(array, order=order, dtype=dtype, xp=xp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jiamingy/.anaconda/envs/xgboost_dev/lib/python3.12/site-packages/sklearn/utils/_array_api.py", line 832, in _asarray_with_order
array = numpy.asarray(array, order=order, dtype=dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jiamingy/.anaconda/envs/xgboost_dev/lib/python3.12/site-packages/dask/array/core.py", line 1746, in __array__
x = self.compute()
^^^^^^^^^^^^^^
File "/home/jiamingy/.anaconda/envs/xgboost_dev/lib/python3.12/site-packages/dask/base.py", line 372, in compute
(result,) = compute(self, traverse=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jiamingy/.anaconda/envs/xgboost_dev/lib/python3.12/site-packages/dask/base.py", line 660, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jiamingy/.anaconda/envs/xgboost_dev/lib/python3.12/site-packages/distributed/client.py", line 2427, in _gather
raise exception.with_traceback(traceback)
distributed.client.FutureCancelledError: ('getitem-3437663e5548237bd31527e3c352eca6', 0) cancelled for reason: unknown.