[Long-term] Investigate and improve cross-platform prediction consistency #223

noahho · 2025-03-01T17:04:57Z

Issue Description

When running TabPFN consistency tests across different platforms (e.g., macOS vs Linux, x86 vs ARM), we've observed significant differences in model predictions.

Current Observations:

Despite using , regression predictions on diabetes dataset still show differences:
- On macOS (ARM):
- On Linux CI:
- Difference: ~2.34 (about 1.6% relative)
Classification predictions seem more stable but still show small variations
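
To make the size of such a gap concrete, here is a minimal sketch of a relative-difference check using only the Python standard library. The function names and the two sample values are hypothetical: the values are placeholders chosen only so that their gap matches the reported ~2.34, and they are not the actual predictions from either platform.

```python
import math


def relative_difference(a: float, b: float) -> float:
    """Absolute difference scaled by the larger magnitude (0.0 if both are zero)."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale


def predictions_match(xs, ys, rel_tol: float) -> bool:
    """True if every pair of predictions agrees within rel_tol."""
    return len(xs) == len(ys) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(xs, ys)
    )


# Placeholder values (NOT from the issue): gap of 2.34 between two predictions.
pred_a, pred_b = 146.0, 148.34
gap = relative_difference(pred_a, pred_b)  # roughly 0.016, i.e. about 1.6%

strict_ok = predictions_match([pred_a], [pred_b], rel_tol=1e-3)  # fails
loose_ok = predictions_match([pred_a], [pred_b], rel_tol=2e-2)   # passes
```

A gap of this size fails any tolerance tight enough to catch real regressions, which is presumably why a shared cross-platform tolerance was not a workable fix here.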

Impact:

- Makes it difficult to reproduce research results and benchmarks across platforms
- Requires platform-specific consistency tests (as implemented in PR #217, "Add model consistency tests")
- Could affect production deployments that span different infrastructures

Potential Causes:

- Different CPU architectures (x86 vs. ARM)
- Different BLAS/LAPACK implementations
- OS-specific optimizations
- Compiler-specific floating-point optimizations
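
One mechanism that could sit behind several of these causes is that floating-point addition is not associative, so a library or compiler that accumulates in a different order can return a slightly different result. The sketch below only illustrates that general effect in plain Python; it does not show that this is what actually happens inside TabPFN, and the helper name `sum_in_order` is hypothetical.

```python
import math


def sum_in_order(values):
    """Accumulate left to right, as a naive reduction loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, two groupings, two different results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
grouping_changes_result = left != right

# Same multiset of values, two orders: the small term is lost in one of them.
forward = sum_in_order([1e16, 1.0, -1e16])
reordered = sum_in_order([1e16, -1e16, 1.0])

# math.fsum tracks the rounding error and gives the correctly rounded sum.
exact = math.fsum([1e16, 1.0, -1e16])
```

In a deep model, many such small discrepancies can compound across layers, which is one plausible reason the final predictions drift apart.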

Related PR:

PR #217 worked around this by making consistency tests platform-specific, but we should investigate a more fundamental solution.
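
For illustration, a platform-specific test could select its reference values with a key built from the OS and CPU architecture. This is a minimal sketch, not the implementation in PR #217: the function names, the key format, and the reference table are all hypothetical.

```python
import platform


def platform_key(system=None, machine=None) -> str:
    """Build a key such as 'darwin-arm64' or 'linux-x86_64' (hypothetical format)."""
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    return f"{system}-{machine}".lower()


def lookup_reference(references: dict, key: str):
    """Return the stored reference for this platform, or None if none exists."""
    return references.get(key)


# Hypothetical table: the stored values are placeholders, not real predictions.
REFERENCES = {
    "darwin-arm64": [0.0],
    "linux-x86_64": [0.0],
}

current = lookup_reference(REFERENCES, platform_key())
if current is None:
    # A real test would likely skip here rather than fail on an unknown platform.
    pass
```

The drawback, as noted above, is that this only records the divergence per platform; it does not remove it.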

Priority:

Medium: this does not break functionality, but it affects reproducibility.

noahho changed the title ~~Investigate and improve cross-platform prediction consistency~~ [Long-term] Investigate and improve cross-platform prediction consistency Mar 1, 2025

noahho mentioned this issue Mar 1, 2025

Fix type checking issues across multiple files #224

Merged

