[Long-term] Investigate and improve cross-platform prediction consistency #223

Open
noahho opened this issue Mar 1, 2025 · 0 comments

Issue Description

When running TabPFN consistency tests across different platforms (e.g., macOS vs Linux, x86 vs ARM), we've observed significant differences in model predictions.

Current Observations:

  1. Despite using , regression predictions on diabetes dataset still show differences:

    • On macOS (ARM):
    • On Linux CI:
    • Difference: ~2.34 (~1.6% relative difference)
  2. Classification predictions seem more stable but still show small variations
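
One way to quantify gaps like the one above is to report both the largest absolute and the largest relative difference between two sets of predictions. The following is a minimal sketch; `prediction_gap` is a hypothetical helper, and the numbers are placeholders, not the values observed in this issue.

```python
def prediction_gap(reference, candidate):
    """Return (max absolute difference, max relative difference)."""
    if len(reference) != len(candidate):
        raise ValueError("prediction lists must have the same length")
    max_abs = 0.0
    max_rel = 0.0
    for ref, cand in zip(reference, candidate):
        abs_diff = abs(ref - cand)
        max_abs = max(max_abs, abs_diff)
        # Skip the relative term when the reference is zero to avoid dividing by zero.
        if ref != 0:
            max_rel = max(max_rel, abs_diff / abs(ref))
    return max_abs, max_rel


# Placeholder predictions from two hypothetical platforms.
platform_a = [100.0, 200.0, 50.0]
platform_b = [101.0, 199.0, 50.0]
max_abs, max_rel = prediction_gap(platform_a, platform_b)
print(f"max abs diff: {max_abs}, max rel diff: {max_rel:.2%}")
```

Reporting both numbers helps because a fixed absolute gap can be negligible for large regression targets and large for small ones.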

Impact:

  • Makes it difficult to have reproducible research/benchmarks across platforms
  • Requires platform-specific consistency tests (as implemented in PR #217, "Add model consistency tests")
  • Could affect production deployments across different infrastructures

Potential Causes:

  • Different CPU architectures (x86 vs. ARM)
  • Different BLAS/LAPACK implementations
  • OS-specific optimizations
  • Compiler-specific floating-point optimizations
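
A common mechanism behind several of these causes is that floating-point addition is not associative, so any library or compiler that reorders a reduction can change the result. The sketch below illustrates this with plain Python floats; it does not reproduce the TabPFN discrepancy itself, and `sequential_sum` is a hypothetical helper.

```python
def sequential_sum(values):
    """Add values strictly left to right, one rounding step per addition."""
    total = 0.0
    for value in values:
        total += value
    return total


# Same three numbers, two groupings.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left == grouped_right)  # False

# Same four numbers, two orderings: the small terms are absorbed
# when they are added to the large one first.
forward = sequential_sum([1e16, 1.0, -1e16, 1.0])
reordered = sequential_sum([1e16, -1e16, 1.0, 1.0])
print(forward, reordered)  # 1.0 2.0
```

Different BLAS builds, vector widths, and thread counts can each pick a different summation order, which is one plausible source of small platform-dependent drift.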

Suggested Solutions to Investigate:

  1. More aggressive precision control beyond sklearn's 16-decimal option
  2. Implement a deterministic mode that trades some performance for better consistency
  3. Platform detection with environment-specific reference values
  4. Custom normalization/scaling approaches that are more robust to platform differences
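
As an illustration of the first idea, predictions could be rounded to a fixed number of significant digits before they are compared. This is only a sketch under that assumption: `round_significant` is a hypothetical helper, not part of TabPFN or scikit-learn, and the inputs are placeholder values.

```python
import math


def round_significant(value, digits):
    """Round value to the given number of significant digits."""
    if value == 0 or not math.isfinite(value):
        return value
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


# Two placeholder predictions that differ only in low-order digits.
a = round_significant(151.234567891, 6)
b = round_significant(151.234567893, 6)
print(a == b)  # True

# Rounding does not help when values straddle a rounding boundary.
c = round_significant(1.23449999, 4)
d = round_significant(1.23450001, 4)
print(c == d)  # False
```

The second case shows the limit of this approach: rounding reduces the number of mismatches but cannot guarantee bit-identical results, so a tolerance-based comparison may still be needed.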

Related PR:

PR #217 worked around this by making consistency tests platform-specific, but we should investigate a more fundamental solution.
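
A platform-specific check of the kind described here might look roughly like the sketch below. It is not the code from PR #217: the key format, the `REFERENCE_MEANS` table, its values, and the tolerance are all hypothetical placeholders.

```python
import math
import platform


def platform_key():
    """Build a key such as 'darwin-arm64' or 'linux-x86_64' for the current machine."""
    return f"{platform.system()}-{platform.machine()}".lower()


# Hypothetical reference values per platform; real values would come from a trusted run.
REFERENCE_MEANS = {
    "darwin-arm64": 100.0,
    "linux-x86_64": 101.0,
}


def check_prediction(mean_prediction, key=None, references=REFERENCE_MEANS, rel_tol=1e-3):
    """Return True/False against the platform's reference, or None if none exists."""
    key = key or platform_key()
    if key not in references:
        return None  # Unknown platform: the caller decides whether to skip or fail.
    return math.isclose(mean_prediction, references[key], rel_tol=rel_tol)


print(platform_key())
print(check_prediction(100.05, key="darwin-arm64"))  # True
print(check_prediction(101.0, key="darwin-arm64"))   # False
```

Returning `None` for an unknown platform keeps the check from failing on machines that have no recorded reference, at the cost of silently reducing coverage there.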

Priority:

Medium: this does not break functionality, but it affects reproducibility.

noahho changed the title from "Investigate and improve cross-platform prediction consistency" to "[Long-term] Investigate and improve cross-platform prediction consistency" on Mar 1, 2025