End-to-end training of DeepSeek V3 #942

Open
EugenHotaj opened this issue Mar 7, 2025 · 2 comments

@EugenHotaj

Any plans to have a real training script for DSV3?

Right now run.py only runs a forward pass on dummy data with DSV2, so it's unclear how much of DSV3 is supported and whether it actually works.

I was able to run the DSV3 forward pass on 32 H200s, but had to lower config.max_seq_len quite a bit; otherwise it OOMed while setting up symmetric memory. Would love to be able to train DSV3!

@lessw2020
Contributor

Hi @EugenHotaj - yes, a full DS v3 training script is coming.

The current PRs are part of an iterative process... more is coming soon!

@EugenHotaj
Author

@lessw2020 amazing, looking forward to it!

3 participants