
Fix issues with Llama HF->NeoX conversion #1345

Open · aurelion-source wants to merge 2 commits into main
Conversation

aurelion-source (Contributor) commented Mar 10, 2025

Resolves #1337 and #1342

  • Following #1315 (the fix for GQA issue #1314), the GQA code no longer splits heads based on num_q_heads. This PR updates tools/ckpts/convert_hf_llama_to_neox.py to concatenate the tp-partitioned q, k, and v weights directly, without per-head splitting (see the first sketch after the commit list below).
  • The current RMSNorm implementation incorrectly adds epsilon to the RMS instead of the variance. The fix is the same as in #1342 (RMSNorm epsilon implementation), but also keeps compatibility with partial RMSNorm (see the second sketch below).

Commits:
- Concatenates q, k, and v weights directly instead of splitting by heads first for GQA
- Fixes RMSNorm implementation by adding epsilon to the variance instead of adding it directly to RMS
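
To make the concatenation change concrete, here is a minimal sketch of the idea, not the PR's actual code: each HF projection is chunked along its output dimension per tensor-parallel rank, and each rank's q, k, and v slices are concatenated directly, with no per-head splitting. The helper name, shapes, and the per-rank [q; k; v] layout are assumptions for illustration.

```python
# Illustrative sketch only (hypothetical helper, assumed per-rank [q; k; v] layout),
# not the code in tools/ckpts/convert_hf_llama_to_neox.py.
import torch

def fuse_qkv_per_tp_rank(q, k, v, num_tp_ranks):
    # q: [num_q_heads * head_dim, hidden]
    # k, v: [num_kv_heads * head_dim, hidden]
    q_parts = torch.chunk(q, num_tp_ranks, dim=0)
    k_parts = torch.chunk(k, num_tp_ranks, dim=0)
    v_parts = torch.chunk(v, num_tp_ranks, dim=0)
    # Concatenate each rank's slices directly instead of interleaving per head.
    return [torch.cat([qp, kp, vp], dim=0)
            for qp, kp, vp in zip(q_parts, k_parts, v_parts)]
```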
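
And a minimal sketch of the epsilon placement being corrected (illustrative only; the actual norm in the gpt-neox codebase also supports partial RMSNorm): epsilon belongs inside the square root, added to the variance, as in the HF Llama RMSNorm.

```python
# Illustrative sketch of the corrected epsilon placement (not the repo's implementation).
import torch

def rms_norm(x, weight, eps=1e-5):
    variance = x.pow(2).mean(-1, keepdim=True)
    # Correct: add eps to the variance before the square root,
    # NOT to the RMS afterwards (i.e. not x / (variance.sqrt() + eps)).
    return weight * x * torch.rsqrt(variance + eps)
```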
@aflah02 (Contributor) commented Mar 10, 2025

Hi @aurelion-source
I think there is a similar issue in the reverse direction (NeoX to HF), caused by the same changes: when I convert a GQA model to HF, it generates gibberish until I modify the conversion file. The RMSNorm implementation also differs, which causes discrepancies between the Llama HF class and NeoX.
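
For reference, the reverse direction would need the inverse of the concatenation sketched above; a rough sketch under the same assumed per-rank [q; k; v] layout (hypothetical helper, not the conversion script's code):

```python
# Rough sketch of the NeoX->HF direction under the same assumed layout (hypothetical helper).
import torch

def split_fused_qkv(fused_parts, num_q_heads, num_kv_heads, head_dim):
    tp = len(fused_parts)
    q_rows = num_q_heads * head_dim // tp    # q rows per rank
    kv_rows = num_kv_heads * head_dim // tp  # k (and v) rows per rank
    q_parts, k_parts, v_parts = [], [], []
    for part in fused_parts:
        q_parts.append(part[:q_rows])
        k_parts.append(part[q_rows:q_rows + kv_rows])
        v_parts.append(part[q_rows + kv_rows:])
    return (torch.cat(q_parts, dim=0),
            torch.cat(k_parts, dim=0),
            torch.cat(v_parts, dim=0))
```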

Successfully merging this pull request may close these issues.

convert_hf_llama_to_neox.py appears incompatible with LLAMA-3.1-70B