When running TabPFN consistency tests across different platforms (e.g., macOS vs Linux, x86 vs ARM), we've observed significant differences in model predictions.
Current Observations:
Despite using , regression predictions on diabetes dataset still show differences:
Classification predictions seem more stable but still show small variations
Impact:
Could affect production deployments across different infrastructures
Potential Causes:
Different CPU architectures (x86 vs. ARM)
Different BLAS/LAPACK implementations
OS-specific optimizations
Compiler-specific floating-point optimizations
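Most of the causes listed above come down to floating-point addition not being associative: a BLAS build or compiler that accumulates in a different order can return a slightly different result from the same inputs. The following is a minimal, stdlib-only illustration of that effect. It is not TabPFN code, and the helper name `left_to_right_sum` is made up for this example.

```python
import math


def left_to_right_sum(values):
    """Accumulate floats strictly in the given order, like a naive reduction loop."""
    total = 0.0
    for value in values:
        total += value
    return total


# The same three numbers, summed in two different orders.
order_a = [1e16, 1.0, -1e16]  # 1.0 is absorbed by 1e16 before the cancellation
order_b = [1e16, -1e16, 1.0]  # the large terms cancel first, so 1.0 survives

sum_a = left_to_right_sum(order_a)
sum_b = left_to_right_sum(order_b)
exact = math.fsum(order_a)  # correctly rounded sum, independent of order

print(f"order A: {sum_a!r}, order B: {sum_b!r}, fsum: {exact!r}")

# Regrouping alone changes the last bits even for small values.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left == grouped_right)
```

In a model forward pass the same kind of reordering happens inside matrix multiplications and reductions, so small per-operation differences can accumulate across layers.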
Suggested Solutions to Investigate:
More aggressive precision control beyond sklearn's 16-decimal option
Implementation of deterministic mode that sacrifices some performance for better consistency
Platform detection with environment-specific reference values
Custom normalization/scaling approaches that are more robust to platform differences
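As a starting point for the platform detection idea, here is a rough sketch of how a consistency check might select reference values by platform and compare them within a tolerance instead of requiring exact equality. Everything in it is an assumption made for illustration: the helper names (`platform_key`, `matches_reference`, `check_predictions`), the tolerance defaults, and the placeholder numbers are hypothetical and are not taken from the TabPFN test suite.

```python
import math
import platform


def platform_key():
    """Return an OS/architecture key such as 'Linux-x86_64' or 'Darwin-arm64'."""
    return f"{platform.system()}-{platform.machine()}"


def matches_reference(predictions, reference, rel_tol=1e-4, abs_tol=1e-6):
    """Check element-wise closeness; the tolerances are illustrative, not tuned."""
    if len(predictions) != len(reference):
        return False
    return all(
        math.isclose(pred, ref, rel_tol=rel_tol, abs_tol=abs_tol)
        for pred, ref in zip(predictions, reference)
    )


def check_predictions(predictions, references_by_platform, default_reference):
    """Compare against the reference for this platform, else fall back to a default."""
    reference = references_by_platform.get(platform_key(), default_reference)
    return matches_reference(predictions, reference)


# Placeholder numbers only; real reference values would come from recorded test runs.
references = {
    "Linux-x86_64": [152.31, 98.77, 201.05],
    "Darwin-arm64": [152.32, 98.77, 201.04],
}
default = [152.31, 98.77, 201.05]

current_reference = references.get(platform_key(), default)
print(platform_key(), check_predictions(current_reference, references, default))
```

One trade-off of this approach is that a tolerance loose enough to absorb platform noise may also hide small genuine regressions, so per-platform references and a shared tolerance would probably need to be tuned together.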
Related PR:
PR #217 worked around this by making consistency tests platform-specific, but we should investigate a more fundamental solution.
Priority:
Medium - This is not breaking functionality but affects reproducibility
noahho changed the title from "Investigate and improve cross-platform prediction consistency" to "[Long-term] Investigate and improve cross-platform prediction consistency" on Mar 1, 2025