bugfix: Respect n_predict=-2 in server (#12264) #12323

Open · wants to merge 3 commits into master
Conversation

ishaangandhi

This pull request fixes issue #12264: Eval bug: server API endpoint not respecting n_predict with -2 (until context filled).

Previously, if you set n_predict to -2, the server ignored its special meaning ("generate until the context is filled") and immediately stopped producing tokens with finish reason length.
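For reference, n_predict is meant to follow these conventions. Below is a minimal Python sketch of the intended semantics; the helper name, signature, and the exact context accounting are illustrative only, not the actual server internals:

```python
def effective_predict_limit(n_predict: int, n_ctx: int, n_prompt: int) -> int | None:
    """Hypothetical helper: map n_predict to a hard cap on generated tokens."""
    if n_predict == -1:
        # -1: no cap; generation ends only on EOS or a stop condition
        return None
    if n_predict == -2:
        # -2: generate until the context window is filled,
        # i.e. cap at the tokens remaining after the prompt
        return n_ctx - n_prompt
    # n >= 0: an explicit token budget
    return n_predict
```

The request below reproduces the bug; before this change it returned immediately: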

curl --location 'http://localhost:8080/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer no-key' \
    --data '{
    "messages": [
        {
            "role": "user",
            "content": "Write a minimum 5,000 word essay (30+ pages) on the history of the United States, starting with the American Revolution."
        }
    ],
    "n_predict": -2
    }'

After the change, we get this (correct) output:

{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\nAlright, I'm supposed to write a minimum 5,000-word essay on the history of the United States,...His leadership was a symbol of unity and cooperation, and he was also seen as a symbol of unity that changed the way the colonies behaved.\n\n"
      }
    }
  ],
  "created": 1741646088,
  "model": "gpt-3.5-turbo",
  "system_fingerprint": "b4869-2c9f833d",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 4096,
    "prompt_tokens": 34,
    "total_tokens": 4130
  },
  "id": "chatcmpl-LFedbjPLZLJyrIP4mc9ax0uuXLMaX4a9",
  "timings": {
    "prompt_n": 32,
    "prompt_ms": 165.535,
    "prompt_per_token_ms": 5.17296875,
    "prompt_per_second": 193.31259250309603,
    "predicted_n": 4096,
    "predicted_ms": 79655.512,
    "predicted_per_token_ms": 19.447146484375,
    "predicted_per_second": 51.42142580164446
  }
}

The request now stops with finish reason length, but only after producing 4096 completion tokens.

@ngxson (Collaborator) commented Mar 11, 2025:

Could you add a small test case for it? See server/tests/test_completion.py

@ishaangandhi (Author) replied:
@ngxson Done!
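For illustration, such a test might look roughly like the following. This is a sketch that posts to a locally running server with plain requests; the actual test in server/tests/test_completion.py uses the repository's own fixtures, so everything here (test name, host, prompt) is an assumption:

```python
import requests

def test_n_predict_until_context_filled():
    # Assumes a server listening on localhost:8080, as in the curl example above.
    res = requests.post(
        "http://localhost:8080/v1/chat/completions",
        headers={"Authorization": "Bearer no-key"},
        json={
            "messages": [{"role": "user", "content": "Write a very long essay."}],
            "n_predict": -2,
        },
    )
    assert res.status_code == 200
    body = res.json()
    # With n_predict = -2, the server should stop for length only after
    # actually filling the remaining context, not immediately.
    assert body["choices"][0]["finish_reason"] == "length"
    assert body["usage"]["completion_tokens"] > 0
```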

github-actions bot added the python (python script changes) label on Mar 11, 2025.