bugfix: Respect n_predict=-2 in server (#12264) #12323

Open · wants to merge 3 commits into master
Conversation

ishaangandhi

This pull request fixes issue #12264: Eval bug: server API endpoint not respecting n_predict with -2 (until context filled).

Previously, if you set n_predict to -2, the server ignored its special meaning ("generate until the context is filled") and immediately stopped producing tokens with finish reason length.
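For reference, n_predict is meant to follow these conventions. Below is a minimal Python sketch of the intended semantics; the helper name, signature, and the exact context accounting are illustrative only, not the actual server internals:

```python
def effective_predict_limit(n_predict: int, n_ctx: int, n_prompt: int) -> int | None:
    """Hypothetical helper: map n_predict to a hard cap on generated tokens."""
    if n_predict == -1:
        # -1: no cap; generation ends only on EOS or a stop condition
        return None
    if n_predict == -2:
        # -2: generate until the context window is filled,
        # i.e. cap at the tokens remaining after the prompt
        return n_ctx - n_prompt
    # n >= 0: an explicit token budget
    return n_predict
```

The request below reproduces the bug; before this change it returned immediately: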

curl --location 'http://localhost:8080/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer no-key' \
    --data '{
    "messages": [
        {
            "role": "user",
            "content": "Write a minimum 5,000 word essay (30+ pages) on the history of the United States, starting with the American Revolution."
        }
    ],
    "n_predict": -2
    }'

After the change, we get this (correct) output:

{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nAlright, I'm supposed to write a minimum 5,000-word essay on the history of the United States,...His leadership was a symbol of unity and cooperation, and he was also seen as a symbol of unity that changed the way the colonies behaved.\n\n"
      }
    }
  ],
  "created": 1741646088,
  "model": "gpt-3.5-turbo",
  "system_fingerprint": "b4869-2c9f833d",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 4096,
    "prompt_tokens": 34,
    "total_tokens": 4130
  },
  "id": "chatcmpl-LFedbjPLZLJyrIP4mc9ax0uuXLMaX4a9",
  "timings": {
    "prompt_n": 32,
    "prompt_ms": 165.535,
    "prompt_per_token_ms": 5.17296875,
    "prompt_per_second": 193.31259250309603,
    "predicted_n": 4096,
    "predicted_ms": 79655.512,
    "predicted_per_token_ms": 19.447146484375,
    "predicted_per_second": 51.42142580164446
  }
}

The request now stops with finish reason length, but only after producing 4096 completion tokens.

@ngxson (Collaborator) commented Mar 11, 2025:

Could you add a small test case for it? See server/tests/test_completion.py

@ishaangandhi (Author) replied:
@ngxson Done!
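For illustration, such a test might look roughly like the following. This is a sketch that posts to a locally running server with plain requests; the actual test in server/tests/test_completion.py uses the repository's own fixtures, so everything here (test name, host, prompt) is an assumption:

```python
import requests

def test_n_predict_until_context_filled():
    # Assumes a server listening on localhost:8080, as in the curl example above.
    res = requests.post(
        "http://localhost:8080/v1/chat/completions",
        headers={"Authorization": "Bearer no-key"},
        json={
            "messages": [{"role": "user", "content": "Write a very long essay."}],
            "n_predict": -2,
        },
    )
    assert res.status_code == 200
    body = res.json()
    # With n_predict = -2, the server should stop for length only after
    # actually filling the remaining context, not immediately.
    assert body["choices"][0]["finish_reason"] == "length"
    assert body["usage"]["completion_tokens"] > 0
```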

github-actions bot added the python (python script changes) label on Mar 11, 2025.