no response when send too big request #1174

Sherlock-Holo · 2025-03-05T03:23:01Z

Describe the bug

RUST_LOG=trace mistralrs-server --port 20005 --isq Q4K --truncate-sequence plain -m /data/ai/huggingface/DeepSeek-R1-Distill-Qwen-14B

use this to start a mistrals-server, then send a big chat completion request

http POST http://gpu:20005/v1/chat/completions < /tmp/big-data.json

mistrals-server truncate the request, then reply the response

print this log, eat 100% CPU and 10.716G VRAM, but doesn't reply any response

build with

export CUDA_NVCC_FLAGS=-fPIE
cargo b -r --features='cuda cudnn'

GPU is Tesla T4

The text was updated successfully, but these errors were encountered:

Sherlock-Holo added the bug Something isn't working label Mar 5, 2025