Add Qwen model #182
Conversation
Hi @Sanster, thank you for this. Did you test whether quantizing a model works and that inference runs? The problem I ran into in my old PR was that there was some issue with the modeling code that prevented me from quantizing.
I tried quantizing a model, but the model outputs are weird and the eval does not work for this model. At this time, I don't think we can merge this pull request until we can verify that it works after quantizing.
@casper-hansen, I tested @Sanster's work. When I use "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n{query}<|im_end|>\n<|im_start|>assistant\n{answer}<|im_end|>" to replace the corresponding part of the quantization script, it works well.
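Roughly, plugging that ChatML template into AutoAWQ's quantization script as calibration text could look like the sketch below; the model path, sample query/answer pairs, and quant_config values here are placeholders and assumptions, not part of this PR.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
# Placeholder paths and config (assumptions, adjust to your setup)
model_path = "Qwen/Qwen-7B-Chat"
quant_path = "qwen-7b-chat-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
# ChatML template from the comment above
template = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n{query}<|im_end|>\n"
    "<|im_start|>assistant\n{answer}<|im_end|>"
)
# Hypothetical calibration pairs; a real run would use a larger dataset
samples = [
    {"query": "What is the capital of France?", "answer": "Paris."},
]
calib_data = [template.format(**s) for s in samples]
# Load the unquantized model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Quantize using the ChatML-formatted calibration text
model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
# Save the quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)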
This seems to work for me now with the right prompt template. Thanks for the PR! (Note: eval is not working currently, but the response of Qwen looks good.)
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer, TextStreamer
quant_path = "qwen-7b-chat-awq"
# Load model
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
# Convert prompt to tokens
prompt_template = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
prompt = "You're standing on the surface of the Earth. "\
"You walk one mile south, one mile west and one mile north. "\
"You end up exactly where you started. Where are you?"
tokens = tokenizer(
prompt_template.format(prompt=prompt),
return_tensors='pt'
).input_ids.cuda()
# Generate output
generation_output = model.generate(
tokens,
streamer=streamer,
max_new_tokens=512,
eos_token_id=151645
)
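(151645 is the id of Qwen's <|im_end|> token, so generation stops at the end of the assistant turn.)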
Modify according to this PR: #78