Machine Learning Engineer | NLP & LLM
Economist | Empirical & Behavioral
PhD | Decision Science & Managerial Economics
Pinned
- Logic-RL-Lite (Public)
  Lightweight replication study of DeepSeek-R1-Zero. Interesting findings include "No Aha Moment", "Longer CoT ≠ Accuracy", and "Language Mixing in Instruct Models".
Python 3
- DeepEnlighten (Public)
  Pure RL without SFT to post-train base models for social reasoning capabilities. Lightweight replication of DeepSeek-R1-Zero with the Social IQa dataset.
Python 1