AutoTabPFN giving an error. "OOF ..." #164

Open
ShakunBaichoo opened this issue Jan 31, 2025 · 8 comments

@ShakunBaichoo

ShakunBaichoo commented Jan 31, 2025

Hi!
While running AutoTabPFN I am getting the following error:

ValueError                                Traceback (most recent call last)
Cell In[75], line 3
      1 from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier
      2 clf = AutoTabPFNClassifier(max_time=30) # runs for 30 seconds
----> 3 clf.fit(X_train, y_train)
      4 autotabpfn_score = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
      5 print(f"AutoTabPFN ROC AUC: {autotabpfn_score}")

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/sklearn_interface.py:112, in AutoTabPFNClassifier.fit(self, X, y, categorical_feature_indices)
     97 task_type = (
     98     TaskType.MULTICLASS if len(unique_labels(y)) > 2 else TaskType.BINARY
     99 )
    100 self.predictor_ = AutoPostHocEnsemblePredictor(
    101     preset=self.preset,
    102     task_type=task_type,
   (...)
    109     **self.phe_init_args_,
    110 )
--> 112 self.predictor_.fit(
    113     X,
    114     y,
    115     categorical_feature_indices=self.categorical_feature_indices,
    116 )
    118 # -- Sklearn required values
    119 self.classes_ = self.predictor_._label_encoder.classes_

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/pfn_phe.py:333, in AutoPostHocEnsemblePredictor.fit(self, X, y, categorical_feature_indices)
    316 self._estimators, model_family_per_estimator = self._collect_base_models(
    317     categorical_feature_indices=categorical_feature_indices,
    318 )
    320 self._ens_model = self._ens_model(
    321     estimators=self._estimators,
    322     seed=self.ges_random_state,
   (...)
    330     model_family_per_estimator=model_family_per_estimator,
    331 )
--> 333 self._ens_model.fit(X, y)
    335 return self

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble.py:234, in GreedyWeightedEnsemble.fit(self, X, y)
    233 def fit(self, X, y):
--> 234     weights = self.get_weights(X, y)
    236     final_weights = []
    237     base_models = []

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble.py:173, in GreedyWeightedEnsemble.get_weights(self, X, y)
    172 def get_weights(self, X, y):
--> 173     oof_proba = self.get_oof_per_estimator(X, y)
    174     self.model_family_per_estimator = (
    175         self.model_family_per_estimator
    176         if self.model_family_per_estimator is not None
    177         else ["X"] * len(self._estimators)
    178     )
    179     self._model_family_per_estimator = self.model_family_per_estimator[
    180         : len(self._estimators)
    181     ]

File ~/tabpfn-extensions-main/src/tabpfn_extensions/post_hoc_ensembles/abstract_validation_utils.py:477, in AbstractValidationUtils.get_oof_per_estimator(self, X, y, return_loss_per_estimator, impute_dropped_instances, _extra_processing)
    468     holdout_index_hits[holdout_index_hits == 0] = np.nan
    469     if not all(
    470         np.isclose(
    471             oof.sum(axis=1),
   (...)
    475         for oof in oof_proba_list
    476     ):
--> 477         raise ValueError(
    478             "OOF predictions are not consistent over repeats for holdout! Something went wrong.",
    479         )
    480 elif not all(
    481     np.isclose(
    482         oof[~np.isnan(oof).any(axis=1)].sum(axis=1),
   (...)
    485     for oof in oof_proba_list
    486 ):
    487     for i, oof in enumerate(oof_proba_list):

ValueError: OOF predictions are not consistent over repeats for holdout! Something went wrong.

Kindly advise.

Thanks,
Shakuntala
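
The check that fails lives in get_oof_per_estimator. Judging from the visible part of the traceback, out-of-fold (OOF) probabilities are collected over several holdout repeats, the number of times each sample was held out is counted (holdout_index_hits), and samples that were never held out are set to NaN before the probability rows are checked. What follows is a simplified, hypothetical NumPy sketch of that idea, not the library's implementation: the holdout fraction, the averaging over repeats, the random stand-in model and the sum-to-one criterion are all assumptions, since part of the original condition is elided in the traceback.

```python
import numpy as np


def repeated_holdout_oof(n_samples, n_classes, n_repeats=3, holdout_frac=0.33, seed=0):
    """Hypothetical repeated-holdout OOF aggregation (not the tabpfn_extensions code)."""
    rng = np.random.default_rng(seed)
    oof_sum = np.zeros((n_samples, n_classes))
    hits = np.zeros(n_samples)
    n_holdout = int(round(holdout_frac * n_samples))
    for _ in range(n_repeats):
        # Assumed split: a random subset of rows is held out in each repeat.
        holdout_idx = rng.permutation(n_samples)[:n_holdout]
        # Stand-in "model": random class probabilities whose rows sum to 1.
        proba = rng.dirichlet(np.ones(n_classes), size=n_holdout)
        oof_sum[holdout_idx] += proba
        hits[holdout_idx] += 1
    # Samples never held out have no prediction; mark them as NaN,
    # mirroring `holdout_index_hits[holdout_index_hits == 0] = np.nan`.
    hits[hits == 0] = np.nan
    # Average the predictions over the repeats in which each sample was held out.
    return oof_sum / hits[:, None], hits


def rows_are_consistent(oof):
    """Assumed criterion: every sample with a prediction has probabilities summing to 1."""
    seen = ~np.isnan(oof).any(axis=1)
    return bool(np.isclose(oof[seen].sum(axis=1), 1.0).all())


oof, hits = repeated_holdout_oof(n_samples=100, n_classes=3)
print(rows_are_consistent(oof))  # True

# Hypothetical failure: a repeat is counted as a hit but its prediction is lost.
broken = oof.copy()
row = np.flatnonzero(~np.isnan(broken).any(axis=1))[0]
broken[row] *= 0.5
print(rows_are_consistent(broken))  # False
```

Under these assumptions the averaged rows can only stop summing to 1 if the bookkeeping of predictions and hit counts gets out of sync, which fits the error message's claim that this "should never happen".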

@LeoGrin
Collaborator

LeoGrin commented Feb 5, 2025

Hey @ShakunBaichoo!
Thanks for reporting :) Would you be open to sharing the input data, or to creating an example that reproduces the error?

@ShakunBaichoo
Author

I was using data from MIMIC-III.

@noahho
Collaborator

noahho commented Feb 5, 2025

@LennartPurucker Do you have an idea what is going on?

@LennartPurucker
Collaborator

@ShakunBaichoo How many classes does the input data have, and what is the number of samples per class?

This error should only occur in a rare edge case related to the number of classes when splitting the data.
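
A quick way to answer the question about the number of classes and samples per class, and to see one plausible way a very small class could interact with splitting, is sketched here. It assumes y is a 1-D array of labels; class_report and prior_proba are hypothetical helpers, and the prior-based stand-in model is only an illustration, not how TabPFN or the other ensemble members behave.

```python
import numpy as np


def class_report(y, min_count=2):
    """Return per-class counts and the classes with fewer than `min_count` samples."""
    classes, counts = np.unique(y, return_counts=True)
    rare = classes[counts < min_count].tolist()
    return dict(zip(classes.tolist(), counts.tolist())), rare


def prior_proba(y_train, n_rows):
    """Hypothetical stand-in model: predicts the class frequencies of its training labels.

    Like scikit-learn classifiers, it only knows the classes it was trained on,
    so its output has one column per class present in `y_train`.
    """
    classes, counts = np.unique(y_train, return_counts=True)
    return classes, np.tile(counts / counts.sum(), (n_rows, 1))


y = np.array([0] * 10 + [1] * 5 + [2])
counts, rare = class_report(y)
print(counts, rare)  # {0: 10, 1: 5, 2: 1} [2]

# If the only class-2 sample lands in the holdout part, the model trains on two classes.
train_mask = y != 2
seen, proba = prior_proba(y[train_mask], n_rows=1)
print(seen, proba.shape)  # [0 1] (1, 2)
```

A two-column output cannot be written directly into an OOF array with three class columns without re-mapping the columns, which is one way a rare class could produce inconsistent OOF predictions. The thread does not confirm that this is what happened with the MIMIC-III data.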

@LennartPurucker
Collaborator

As a workaround, you could pass phe_init_args=dict(n_repeats=1) to AutoTabPFNClassifier, but this likely increases overfitting.
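
If the single-repeat workaround should only be used when the error actually occurs, the fit can be retried. The sketch assumes a factory that builds the estimator for a given number of repeats; with AutoTabPFN this would presumably be something like `lambda r: AutoTabPFNClassifier(max_time=30, phe_init_args=dict(n_repeats=r))`, using only the arguments mentioned in this thread, and it assumes fit returns the estimator as the scikit-learn convention requires. fit_with_fallback and FlakyModel are hypothetical names, and matching on the error message text is a fragile assumption.

```python
def fit_with_fallback(make_model, X, y, n_repeats):
    """Fit make_model(n_repeats); on the OOF consistency error, refit with one repeat.

    Hypothetical helper. Returns the fitted model and the number of repeats used.
    """
    try:
        return make_model(n_repeats).fit(X, y), n_repeats
    except ValueError as err:
        if "OOF predictions are not consistent" not in str(err):
            raise  # unrelated error: do not hide it
        # Workaround from this thread; likely more prone to overfitting.
        return make_model(1).fit(X, y), 1


class FlakyModel:
    """Hypothetical stand-in that fails like the traceback above whenever n_repeats > 1."""

    def __init__(self, n_repeats):
        self.n_repeats = n_repeats

    def fit(self, X, y):
        if self.n_repeats > 1:
            raise ValueError(
                "OOF predictions are not consistent over repeats for holdout! "
                "Something went wrong."
            )
        return self


model, used = fit_with_fallback(FlakyModel, X=[[0.0]], y=[0], n_repeats=3)
print(used)  # 1
```

Logging when the fallback triggers would make it visible that a possibly more overfitted ensemble is being used.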

@noahho
Collaborator

noahho commented Feb 5, 2025

@LennartPurucker is there a way to catch and handle this error, or related ones?

@LennartPurucker
Collaborator

I don't know yet what causes this here. As the error message says, this should never happen.

The function is thoroughly tested in the repository the code originated from, and it has seen only minor changes since I added it to the TabPFN code base, so I am unsure where this bug comes from. For reference, here is the newest version of the splitting code. Maybe you can check what happens if you use this code for your data.
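
The linked splitting code is not reproduced in this thread. One way to check what a splitter does with a given dataset is to inspect every split it produces and report the repeats whose training part is missing a class. The sketch assumes the splits are available as (train_indices, holdout_indices) pairs; find_incomplete_splits and the toy splits are hypothetical.

```python
import numpy as np


def find_incomplete_splits(y, splits):
    """Report splits whose training part lacks at least one class of y.

    `splits` is any iterable of (train_idx, holdout_idx) index arrays, e.g. the
    output of a repeated holdout or cross-validation splitter.
    """
    y = np.asarray(y)
    all_classes = set(np.unique(y).tolist())
    problems = []
    for i, (train_idx, _holdout_idx) in enumerate(splits):
        missing = all_classes - set(np.unique(y[train_idx]).tolist())
        if missing:
            problems.append((i, sorted(missing)))
    return problems


y = np.array([0, 0, 0, 1, 1, 2])
splits = [
    (np.array([0, 1, 3, 5]), np.array([2, 4])),  # all classes present in training
    (np.array([0, 1, 2, 3]), np.array([4, 5])),  # class 2 only in the holdout
]
print(find_incomplete_splits(y, splits))  # [(1, [2])]
```

With scikit-learn, for example, `list(StratifiedShuffleSplit(n_splits=5, test_size=0.33, random_state=0).split(X, y))` yields pairs in this format; whether the TabPFN splitting code behaves like that splitter is not established here.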

@LennartPurucker
Collaborator

@ShakunBaichoo I just pushed a fix to AutoTabPFN. If you get the latest code and run it again, this might be fixed now!
