
0.4.1

@MaiziXiao released this on 04 Mar 10:35
Commit: 5547fd1

OpenCompass v0.4.1 Release Notes

The OpenCompass team is thrilled to announce the release of OpenCompass v0.4.1! This version adds new math and reasoning benchmarks, new evaluators, documentation improvements, and bug fixes. Let’s dive into the highlights and changes!


🌟 Highlights

Omni-Math Support: OpenCompass now supports the Omni-Math dataset, expanding its coverage of mathematical reasoning evaluation. (#1837)
OlympiadBench Benchmark: Added support for the OlympiadBench benchmark, enabling evaluation on Olympiad-level math and physics problems. (#1841)
HLE Dataset: Introduced the Humanity's Last Exam (HLE) dataset, providing a challenging new benchmark for AI models. (#1902)


🚀 New Features

🔧 Math Verification: Added a model post-processor for math verification, improving the accuracy of answer checking in math evaluations. (#1881)
🔧 Dataset Repeat & G-Pass Compute: Added support for dataset repetition and G-Pass computation for each evaluator, making evaluations more flexible. (#1886)
🔧 General Math & LLM Judge Evaluator: Added a general math evaluator and an LLM judge evaluator to cover broader evaluation scenarios. (#1892)
🔧 AIME-24 Evaluation: Added AIME-24 evaluation support for the DeepSeek-R1 series, strengthening competition-level benchmarking. (#1888)
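G-Pass@k measures how consistently a model solves a question across repeated samples, which is why it pairs naturally with dataset repetition. The sketch below assumes the commonly cited definition: with n generations per question, of which c are judged correct, G-Pass@k_τ is the probability that at least ⌈τ·k⌉ of k generations drawn without replacement are correct, averaged over all questions. It is an illustration only, not OpenCompass's implementation, and the function names `g_pass_at_k` and `dataset_g_pass_at_k` are hypothetical.

```python
from math import ceil, comb


def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """G-Pass@k_tau for a single question (illustrative sketch only).

    n:   generations sampled for the question (e.g. produced by dataset repeat)
    c:   how many of those n generations were judged correct
    k:   generations drawn without replacement
    tau: fraction of the k draws that must be correct, 0 < tau <= 1
    """
    if not (0 < k <= n and 0 <= c <= n and 0 < tau <= 1):
        raise ValueError("require 0 < k <= n, 0 <= c <= n, 0 < tau <= 1")
    threshold = ceil(tau * k)  # minimum number of correct draws required
    # Hypergeometric tail: ways to draw j correct and k - j incorrect samples.
    hits = sum(
        comb(c, j) * comb(n - c, k - j)
        for j in range(threshold, min(c, k) + 1)
    )
    return hits / comb(n, k)


def dataset_g_pass_at_k(results: list[tuple[int, int]], k: int, tau: float) -> float:
    """Average G-Pass@k_tau over questions; results holds (n, c) pairs."""
    scores = [g_pass_at_k(n, c, k, tau) for n, c in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # One question with 4 generations, 2 of them correct, drawing k = 2.
    print(g_pass_at_k(4, 2, 2, 1.0))  # both draws correct: 0.16666666666666666
    print(g_pass_at_k(4, 2, 2, 0.5))  # at least one correct: 0.8333333333333334
```

When τ is small enough that ⌈τ·k⌉ = 1, the value reduces to the familiar unbiased Pass@k estimate, 1 − C(n−c, k)/C(n, k); τ = 1 instead demands that every drawn generation be correct, which is what makes the metric sensitive to unstable reasoning.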


📖 Documentation

📝 Broken Links Fix: Fixed broken links in README.md to ensure a smoother onboarding experience. (#1852)
📝 Supported Datasets List: Added a list of supported datasets on the HTML page for better accessibility. (#1850)


🐛 Bug Fixes

🔧 Compatibility Issue: Fixed a compatibility issue to ensure smoother integration with existing workflows. (#1904)
🔧 Hard Configs Fix: Updated hard configs with General GPassK for improved evaluation consistency. (#1906)


⚙ Enhancements and Refactors

Daily Test Updates: Updated daily test scores and scheduler for better CI/CD pipeline reliability. (#1854, #1898)
Benchmark Updates: Updated BigCodeBench, LCBench, and LiveMathBench configurations for enhanced evaluation accuracy. (#1857, #1862, #1826)
OpenAI Model Update: Updated OpenAI models and BigCodeBench configurations for better performance. (#1879)


🎉 Welcome New Contributors

🎊 A warm welcome to our new contributors:

  1. @sudanl for their first contribution in #1841
  2. @Pablohn26 for their first contribution in #1852
  3. @Zhudongsheng75 for their first contribution in #1857

Full Changelog: 0.4.0...0.4.1

Thank you for using OpenCompass! We’re excited to see how you leverage these new features and improvements. Stay tuned for more updates! 🚀