Pure reinforcement learning (RL), without supervised fine-tuning (SFT), to post-train base models for social reasoning. A lightweight replication of DeepSeek-R1-Zero using the Social IQa dataset.

DolbyUUU/DeepEnlighten
