
Attention shape error when fine-tuning internlm2_5_7b_chat on a custom dataset #996

Open
Stardust-y opened this issue Feb 26, 2025 · 1 comment

Comments

@Stardust-y

[rank0]: Traceback (most recent call last):
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/train.py", line 360, in
[rank0]: main()
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/tools/train.py", line 356, in main
[rank0]: runner.train()
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1200, in train
[rank0]: model = self.train_loop.run() # type: ignore
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/runner/loops.py", line 273, in run
[rank0]: self.runner.call_hook('before_train')
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1271, in call_hook
[rank0]: getattr(hook, fn_name)(self, **kwargs)
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/engine/hooks/evaluate_chat_hook.py", line 234, in before_train
[rank0]: self._generate_samples(runner, max_new_tokens=50)
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/engine/hooks/evaluate_chat_hook.py", line 223, in _generate_samples
[rank0]: self._eval_language(runner, model, device, max_new_tokens,
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/xtuner/engine/hooks/evaluate_chat_hook.py", line 181, in _eval_language
[rank0]: generation_output = model.generate(
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/peft/peft_model.py", line 1491, in generate
[rank0]: outputs = self.base_model.generate(*args, **kwargs)
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2223, in generate
[rank0]: result = self._sample(
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/transformers/generation/utils.py", line 3214, in _sample
[rank0]: outputs = model_forward(**model_inputs, return_dict=True)
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/xmyu/.cache/huggingface/modules/transformers_modules/internlm2_5-7b-chat/modeling_internlm2.py", line 1215, in forward
[rank0]: outputs = self.model(
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/xmyu/.cache/huggingface/modules/transformers_modules/internlm2_5-7b-chat/modeling_internlm2.py", line 1010, in forward
[rank0]: layer_outputs = decoder_layer(
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/xmyu/.cache/huggingface/modules/transformers_modules/internlm2_5-7b-chat/modeling_internlm2.py", line 744, in forward
[rank0]: hidden_states, self_attn_weights, present_key_value = self.attention(
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/xmyu/anaconda3/envs/xtuner-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/xmyu/.cache/huggingface/modules/transformers_modules/internlm2_5-7b-chat/modeling_internlm2.py", line 343, in forward
[rank0]: attn_weights = attn_weights + causal_mask
[rank0]: RuntimeError: The size of tensor a (41) must match the size of tensor b (40) at non-singleton dimension 3
[rank0]:[W227 03:06:08.848877297 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())

Printing the key tensors shows that after the first sampling round with batch_size=32, seq_length becomes 1 as generation continues, which triggers the attention error above. Could this be a version mismatch? torch=2.5.1, transformers=4.49.0
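The RuntimeError looks like an off-by-one between the attention-score matrix and the causal mask during incremental decoding: at the first decode step the KV cache holds the 40 prompt tokens plus the newly appended token (key length 41), while the mask was built for only the 40 prompt positions. A minimal sketch, assuming those shapes from the log and using NumPy in place of torch, reproduces the broadcast failure:

```python
import numpy as np

# Shapes copied from the log, laid out as [bsz, n_heads, q_len, kv_len].
# During the first decode step the KV cache holds the 40 prompt tokens
# plus the token just appended, so the score matrix has key length 41,
# but the causal mask covers only the 40 prompt positions.
attn_weights = np.zeros((1, 32, 1, 41))
causal_mask = np.zeros((1, 1, 1, 40))

try:
    _ = attn_weights + causal_mask  # same op as modeling_internlm2.py line 343
except ValueError as err:
    print("broadcast failure:", err)
```

In torch the same addition raises the RuntimeError quoted in the traceback. Since the masking code lives in the remote `modeling_internlm2.py` rather than in transformers itself, an incompatibility between transformers 4.49's cache/mask handling and the older remote modeling code is a plausible explanation, though this is a guess, not a confirmed diagnosis.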

@Stardust-y
Author

02/27 03:06:07 - mmengine - INFO - before_train in EvaluateChatHook.
hidden s, am torch.Size([1, 40, 4096]) torch.Size([1, 1, 40, 40])
shape torch.Size([1, 1, 40, 40]) torch.Size([1, 32, 40, 40]) torch.Size([1, 1, 40, 40])
[... the same two lines repeat identically for each of the 32 decoder layers during prefill ...]
hidden s, am torch.Size([1, 1, 4096]) torch.Size([1, 1, 1, 40])
shape torch.Size([1, 1, 1, 40]) torch.Size([1, 32, 1, 41]) torch.Size([1, 1, 1, 40])
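The last log line is where the shapes diverge: the score tensor's key dimension is 41 while the mask's is 40. The invariant a decode-step mask must satisfy is that its key dimension equals past_len + q_len, not past_len. A hypothetical helper (the name and layout are mine, not from the model code) makes the expected size explicit:

```python
def decode_step_mask(past_len: int, q_len: int = 1) -> list[list[float]]:
    """Causal mask for one incremental decoding step.

    The new query token may attend to every cached key plus itself,
    so the key dimension must be past_len + q_len (41 in the log),
    not past_len (40). 0.0 means "attend"; with q_len == 1 nothing
    is actually masked, so a too-short mask is the only failure mode.
    """
    kv_len = past_len + q_len
    return [[0.0] * kv_len for _ in range(q_len)]

mask = decode_step_mask(40)
assert len(mask[0]) == 41  # matches attn_weights' key dim in the log
```

By this invariant, the mask shape torch.Size([1, 1, 1, 40]) in the log is one position short of the cache length, which is consistent with the traceback's failure at `attn_weights + causal_mask`.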
