
Fix issues with Llama HF->NeoX conversion #1345

Open · aurelion-source wants to merge 2 commits into main
Conversation

aurelion-source (Contributor) commented Mar 10, 2025

Resolves #1337 and #1342

  • Following #1315 (the fix for GQA issue #1314), the GQA code no longer splits heads based on num_q_heads. This PR updates tools/ckpts/convert_hf_llama_to_neox.py to concatenate the tp-partitioned q, k, and v weights directly, without per-head splitting (see the first sketch after the commit list below).
  • The current RMSNorm implementation incorrectly adds epsilon to the RMS instead of the variance. The fix is the same as in #1342 (RMSNorm epsilon implementation), but also keeps compatibility with partial RMSNorm (see the second sketch below).

Commits:
- Concatenates q, k, and v weights directly instead of splitting by heads first for GQA
- Fixes RMSNorm implementation by adding epsilon to the variance instead of adding it directly to RMS
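
To make the concatenation change concrete, here is a minimal sketch of the idea, not the PR's actual code: each HF projection is chunked along its output dimension per tensor-parallel rank, and each rank's q, k, and v slices are concatenated directly, with no per-head splitting. The helper name, shapes, and the per-rank [q; k; v] layout are assumptions for illustration.

```python
# Illustrative sketch only (hypothetical helper, assumed per-rank [q; k; v] layout),
# not the code in tools/ckpts/convert_hf_llama_to_neox.py.
import torch

def fuse_qkv_per_tp_rank(q, k, v, num_tp_ranks):
    # q: [num_q_heads * head_dim, hidden]
    # k, v: [num_kv_heads * head_dim, hidden]
    q_parts = torch.chunk(q, num_tp_ranks, dim=0)
    k_parts = torch.chunk(k, num_tp_ranks, dim=0)
    v_parts = torch.chunk(v, num_tp_ranks, dim=0)
    # Concatenate each rank's slices directly instead of interleaving per head.
    return [torch.cat([qp, kp, vp], dim=0)
            for qp, kp, vp in zip(q_parts, k_parts, v_parts)]
```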
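
And a minimal sketch of the epsilon placement being corrected (illustrative only; the actual norm in the gpt-neox codebase also supports partial RMSNorm): epsilon belongs inside the square root, added to the variance, as in the HF Llama RMSNorm.

```python
# Illustrative sketch of the corrected epsilon placement (not the repo's implementation).
import torch

def rms_norm(x, weight, eps=1e-5):
    variance = x.pow(2).mean(-1, keepdim=True)
    # Correct: add eps to the variance before the square root,
    # NOT to the RMS afterwards (i.e. not x / (variance.sqrt() + eps)).
    return weight * x * torch.rsqrt(variance + eps)
```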
@aflah02 (Contributor) commented Mar 10, 2025

Hi @aurelion-source
I think there is a similar issue in the reverse direction (NeoX to HF), caused by the same changes: when I convert a GQA model to HF, it generates gibberish until I modify the conversion file. The RMSNorm implementation also differs, which causes discrepancies between the Llama HF class and NeoX.
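
For reference, the reverse direction would need the inverse of the concatenation sketched above; a rough sketch under the same assumed per-rank [q; k; v] layout (hypothetical helper, not the conversion script's code):

```python
# Rough sketch of the NeoX->HF direction under the same assumed layout (hypothetical helper).
import torch

def split_fused_qkv(fused_parts, num_q_heads, num_kv_heads, head_dim):
    tp = len(fused_parts)
    q_rows = num_q_heads * head_dim // tp    # q rows per rank
    kv_rows = num_kv_heads * head_dim // tp  # k (and v) rows per rank
    q_parts, k_parts, v_parts = [], [], []
    for part in fused_parts:
        q_parts.append(part[:q_rows])
        k_parts.append(part[q_rows:q_rows + kv_rows])
        v_parts.append(part[q_rows + kv_rows:])
    return (torch.cat(q_parts, dim=0),
            torch.cat(k_parts, dim=0),
            torch.cat(v_parts, dim=0))
```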

Successfully merging this pull request may close these issues.

convert_hf_llama_to_neox.py appears incompatible with LLAMA-3.1-70B