Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

no response when send too big request #1174

Open
Sherlock-Holo opened this issue Mar 5, 2025 · 0 comments
Open

no response when send too big request #1174

Sherlock-Holo opened this issue Mar 5, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@Sherlock-Holo
Copy link

Describe the bug

RUST_LOG=trace mistralrs-server --port 20005 --isq Q4K --truncate-sequence plain -m /data/ai/huggingface/DeepSeek-R1-Distill-Qwen-14B

use this to start a mistrals-server, then send a big chat completion request

http POST http://gpu:20005/v1/chat/completions < /tmp/big-data.json

big-data.json

Expect

mistrals-server truncate the request, then reply the response

Happened

print this log, eat 100% CPU and 10.716G VRAM, but doesn't reply any response

mistral.log

Latest commit or version

b73e2e9

build with

export CUDA_NVCC_FLAGS=-fPIE
cargo b -r --features='cuda cudnn'

GPU is Tesla T4

@Sherlock-Holo Sherlock-Holo added the bug Something isn't working label Mar 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant