SYNC Server LLM is a gRPC-based server that performs document retrieval and summarization. It leverages Qdrant for vector search and OpenAI models to generate summaries of retrieved content based on user-provided keywords.
```bash
git clone --recurse-submodules https://github.com/NCTU-SYNC/sync-server-llm.git
cd sync-server-llm
uv sync --no-dev --frozen
uv run gen-protos
```
Before running the server, you need to:
1. Configure the server settings in `configs/config.toml`.
2. Create a `.env` file with the following environment variables:

| Variable | Description |
| --- | --- |
| `OPENAI_API_KEY` | Your OpenAI API key |
| `QDRANT_HOST` | The Qdrant host address |
| `QDRANT_PORT` | The Qdrant REST API port |
| `QDRANT_COLLECTION` | The Qdrant collection name |
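For reference, a `.env` file built from the table above might look like the following sketch. All values are placeholders; substitute your own API key and the address, port, and collection name of your Qdrant instance:

```env
# All values below are placeholders -- substitute your own.
OPENAI_API_KEY=sk-...
# Address and REST port of your Qdrant instance (6333 is Qdrant's default).
QDRANT_HOST=localhost
QDRANT_PORT=6333
# Name of the Qdrant collection to search.
QDRANT_COLLECTION=documents
```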
You can run SYNC Server LLM using one of the following methods:
**Run locally with uv**

```bash
uv run scripts/serve.py --config configs/config.toml
```
Notes:
- Make sure the Qdrant server is set up and running before you start the server
**Run with Docker**

1. Build the Docker image:

   ```bash
   docker build -t sync/backend-llm .
   ```

2. Run the container:

   ```bash
   docker run -p 50051:50051 \
     --env-file .env \
     -v $(pwd)/path/to/configs:/app/configs/config.toml \
     -v $(pwd)/path/to/hf_cache:/tmp/llama_index \
     sync/backend-llm
   ```
Notes:
- For Windows users, add `--gpus=all` to use GPU capabilities (requires Docker with GPU support)
- We strongly recommend mounting the `hf_cache` directory to avoid re-downloading Hugging Face models on container restart
- Make sure the Qdrant server is set up and running before you start the container
**Run with Docker Compose**

A `docker-compose.yaml` file is included in the repository to simplify deployment of both the server and the Qdrant database.
1. Build the services:

   ```bash
   docker-compose build
   ```

2. Start the services:

   ```bash
   docker-compose up -d
   ```
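For orientation, a compose file for this setup might look roughly like the sketch below. This is an illustrative assumption, not the repository's actual file; the service names and host paths are placeholders, so treat the included `docker-compose.yaml` as authoritative.

```yaml
# Illustrative sketch only -- the repository's docker-compose.yaml is authoritative.
services:
  qdrant:
    image: qdrant/qdrant              # official Qdrant image
    ports:
      - "6333:6333"                   # Qdrant's default REST API port
  server:
    build: .
    env_file: .env
    ports:
      - "50051:50051"                 # gRPC port exposed by the server
    volumes:
      - ./configs:/app/configs/config.toml   # placeholder host path for the config
      - ./hf_cache:/tmp/llama_index          # keeps Hugging Face models across restarts
    depends_on:
      - qdrant
```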
To test the server, you can use the provided client example:
```bash
uv run scripts/client.py
```
Refer to the protobuf files in the `protos/` directory for the features provided by the server.
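If you want to call the server from your own code rather than the provided script, a minimal Python gRPC client could look like the sketch below. The module, service, request, and field names (`summarizer_pb2`, `summarizer_pb2_grpc`, `SummarizerStub`, `SummarizeRequest`, `keywords`) are hypothetical placeholders; substitute the names generated by `uv run gen-protos` from the actual definitions in `protos/`.

```python
import grpc

# Hypothetical generated modules -- substitute the modules produced by
# `uv run gen-protos` from the definitions in protos/.
import summarizer_pb2
import summarizer_pb2_grpc


def main():
    # Connect to the server on the port it exposes by default.
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = summarizer_pb2_grpc.SummarizerStub(channel)
        # Field names here are illustrative; check the .proto files for the
        # real request message and its fields.
        request = summarizer_pb2.SummarizeRequest(keywords=["example", "topic"])
        response = stub.Summarize(request)
        print(response)


if __name__ == "__main__":
    main()
```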