AutoTabPFN giving an error. "OOF ..." #164

ShakunBaichoo · 2025-01-31T16:39:05Z (edited by LennartPurucker)

Hi!
While running AutoTabPFN, I am getting the following error:

ValueError                                Traceback (most recent call last)
Cell In[75], line 3
      1 from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier
      2 clf = AutoTabPFNClassifier(max_time=30) # runs for 30 seconds
----> 3 clf.fit(X_train, y_train)
      4 autotabpfn_score = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
      5 print(f"AutoTabPFN ROC AUC: {autotabpfn_score}")

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/sklearn_interface.py:112, in AutoTabPFNClassifier.fit(self, X, y, categorical_feature_indices)
     97 task_type = (
     98     TaskType.MULTICLASS if len(unique_labels(y)) > 2 else TaskType.BINARY
     99 )
    100 self.predictor_ = AutoPostHocEnsemblePredictor(
    101     preset=self.preset,
    102     task_type=task_type,
   (...)
    109     **self.phe_init_args_,
    110 )
--> 112 self.predictor_.fit(
    113     X,
    114     y,
    115     categorical_feature_indices=self.categorical_feature_indices,
    116 )
    118 # -- Sklearn required values
    119 self.classes_ = self.predictor_._label_encoder.classes_

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/pfn_phe.py:333, in AutoPostHocEnsemblePredictor.fit(self, X, y, categorical_feature_indices)
    316 self._estimators, model_family_per_estimator = self._collect_base_models(
    317     categorical_feature_indices=categorical_feature_indices,
    318 )
    320 self._ens_model = self._ens_model(
    321     estimators=self._estimators,
    322     seed=self.ges_random_state,
   (...)
    330     model_family_per_estimator=model_family_per_estimator,
    331 )
--> 333 self._ens_model.fit(X, y)
    335 return self

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble.py:234, in GreedyWeightedEnsemble.fit(self, X, y)
    233 def fit(self, X, y):
--> 234     weights = self.get_weights(X, y)
    236     final_weights = []
    237     base_models = []

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble.py:173, in GreedyWeightedEnsemble.get_weights(self, X, y)
    172 def get_weights(self, X, y):
--> 173     oof_proba = self.get_oof_per_estimator(X, y)
    174     self.model_family_per_estimator = (
    175         self.model_family_per_estimator
    176         if self.model_family_per_estimator is not None
    177         else ["X"] * len(self._estimators)
    178     )
    179     self._model_family_per_estimator = self.model_family_per_estimator[
    180         : len(self._estimators)
    181     ]

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/abstract_validation_utils.py:477, in AbstractValidationUtils.get_oof_per_estimator(self, X, y, return_loss_per_estimator, impute_dropped_instances, _extra_processing)
    468     holdout_index_hits[holdout_index_hits == 0] = np.nan
    469     if not all(
    470         np.isclose(
    471             oof.sum(axis=1),
   (...)
    475         for oof in oof_proba_list
    476     ):
--> 477         raise ValueError(
    478             "OOF predictions are not consistent over repeats for holdout! Something went wrong.",
    479         )
    480 elif not all(
    481     np.isclose(
    482         oof[~np.isnan(oof).any(axis=1)].sum(axis=1),
   (...)
    485     for oof in oof_proba_list
    486 ):
    487     for i, oof in enumerate(oof_proba_list):

ValueError: OOF predictions are not consistent over repeats for holdout! Something went wrong.

Kindly advise.

Thanks,
Shakuntala


LeoGrin · 2025-02-05T13:09:03Z

Hey @ShakunBaichoo !
Thanks for reporting :) Would you be open to sharing the input data, or to creating an example that reproduces the error?

ShakunBaichoo · 2025-02-05T16:04:29Z

I was using data from MIMIC-III.

noahho · 2025-02-05T16:07:15Z

@LennartPurucker Do you have an idea what is going on?

LennartPurucker · 2025-02-05T16:16:27Z

@ShakunBaichoo How many classes does the input data have, and what is the number of samples per class?

This error should only occur in a rare edge case related to the number of classes when splitting the data.
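To answer the question about classes and samples per class, a quick count of the training labels is enough. The sketch below is a minimal helper, assuming `y` is any 1-D array-like of labels (for example `y_train` from the snippet in the first post). The `min_count` threshold is a hypothetical cut-off chosen for illustration (think of it as a guessed number of folds or repeats); it is not a value taken from AutoTabPFN.

```python
import numpy as np


def class_count_report(y, min_count=5):
    """Count samples per class and list the classes below min_count.

    min_count is an illustrative, hypothetical threshold; it is not
    AutoTabPFN's internal setting.
    """
    classes, counts = np.unique(np.asarray(y), return_counts=True)
    # Map each label to its number of samples, in sorted label order.
    per_class = dict(zip(classes.tolist(), counts.tolist()))
    # Very small classes are the kind that can end up missing from a split.
    rare = [label for label, n in per_class.items() if n < min_count]
    return per_class, rare
```

For example, `class_count_report([0] * 95 + [1] * 3 + [2] * 7)` returns `{0: 95, 1: 3, 2: 7}` and `[1]`; `len(per_class)` gives the number of classes.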

LennartPurucker · 2025-02-05T16:19:00Z

As a workaround, you could pass phe_init_args=dict(n_repeats=1) to AutoTabPFNClassifier.
However, this likely increases overfitting.
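If you only want to fall back to the single-repeat workaround when this specific error occurs, a caller-side wrapper is one option. This is a minimal sketch, not a feature of tabpfn_extensions: `fit_with_fallback` and the `make_model` factory are hypothetical names, and the only library details used are the `max_time` and `phe_init_args` arguments mentioned in this thread. The error is matched on its message text, which is an assumption that could break if the message changes.

```python
def fit_with_fallback(make_model, X, y):
    """Fit make_model(); on the OOF-consistency ValueError, refit with n_repeats=1.

    make_model is a hypothetical factory that forwards keyword arguments to the
    estimator's constructor. Returns (fitted_model, used_fallback).
    """
    model = make_model()
    try:
        model.fit(X, y)
        return model, False
    except ValueError as err:
        # Only handle the error from this issue; re-raise anything else.
        if "OOF predictions are not consistent" not in str(err):
            raise
    # Workaround from the thread: a single repeat (likely more overfitting).
    model = make_model(phe_init_args=dict(n_repeats=1))
    model.fit(X, y)
    return model, True


# Usage with the classifier from the issue (not executed here):
# from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier
# clf, used_fallback = fit_with_fallback(
#     lambda **kw: AutoTabPFNClassifier(max_time=30, **kw), X_train, y_train
# )
```

Returning `used_fallback` makes it visible when the less robust single-repeat setting was used, so that result can be treated with extra caution.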

noahho · 2025-02-05T16:27:37Z

@LennartPurucker Is there a way to catch and handle this error, or something similar?

LennartPurucker · 2025-02-05T16:41:16Z

I don't know what causes this here yet. As the error message says, this should never happen.

I know the function is thoroughly tested in the repo the code originated from, and since I added it to the TabPFN code base, it has seen only minor changes. So I am unsure where this bug comes from. For reference, here is the newest version of the splitting code. Maybe you can check what happens when you run this code on your data.
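For context on what the failing check guards against: judging from the traceback, `get_oof_per_estimator` counts how often each row lands in a holdout split (`holdout_index_hits`), sets never-held-out rows to NaN, and then compares the row sums of the out-of-fold (OOF) probabilities against an expected value (the exact comparison is elided in the traceback). The sketch below is a simplified, hypothetical re-creation of that idea, not the tabpfn_extensions code. It assumes repeated stratified holdout via scikit-learn's `StratifiedShuffleSplit`, uses `DummyClassifier` as a stand-in base model, and assumes the expected row sum is 1.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedShuffleSplit


def repeated_holdout_oof(X, y, n_repeats=3, holdout_fraction=0.33, seed=0):
    """Average holdout probabilities over repeats and check the row sums.

    Illustrative only: DummyClassifier stands in for a real base model, and
    the expected row sum of 1.0 is an assumption.
    """
    X, y = np.asarray(X), np.asarray(y)
    classes = np.unique(y)
    oof = np.zeros((len(y), len(classes)))
    hits = np.zeros(len(y))  # how often each row was in a holdout split
    splitter = StratifiedShuffleSplit(
        n_splits=n_repeats, test_size=holdout_fraction, random_state=seed
    )
    for train_idx, test_idx in splitter.split(X, y):
        model = DummyClassifier(strategy="prior").fit(X[train_idx], y[train_idx])
        # Map this model's probability columns onto the global class order,
        # so a class missing from the training split gets probability 0.
        cols = np.searchsorted(classes, model.classes_)
        oof[np.ix_(test_idx, cols)] += model.predict_proba(X[test_idx])
        hits[test_idx] += 1
    hits[hits == 0] = np.nan  # rows never held out have no OOF prediction
    oof = oof / hits[:, None]  # average over the repeats that hit each row
    held_out = ~np.isnan(oof).any(axis=1)
    consistent = bool(np.allclose(oof[held_out].sum(axis=1), 1.0))
    return oof, hits, consistent
```

Called as `repeated_holdout_oof(X_train, y_train)`, the `consistent` flag plays the role of the check that raised here. If a variant of this loop ever returns `False`, printing `oof[held_out].sum(axis=1)` together with the per-class counts of the affected rows is a reasonable place to start looking.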

LennartPurucker · 2025-02-06T17:34:51Z

@ShakunBaichoo I just pushed a fix to AutoTabPFN. If you get the latest code and run it again, this might be fixed now!
