build: build llama.cpp + ggml-qnn in pure command line mode on x86-64 Windows #12215

Draft · wants to merge 3 commits into master
Conversation

@zhouwg (Contributor) commented on Mar 6, 2025

This PR introduces a verified and concise approach for building llama.cpp for target x86-64 Windows and target aarch64 Windows (WoA, Windows on ARM) in pure command line mode on a host x86-64 Windows machine. The purpose of this PR is to keep the workflow simple, so the large and complex Visual Studio 2022 IDE is not required.

  1. Official approach to building llama.cpp on x86-64 Windows

As the official documentation says: Please develop/build on Windows for ARM according to the llama.cpp build instructions, section "Building for Windows (x86, x64 and arm64) with MSVC or clang as compilers", with clang as the C/C++ compiler (MSVC is no longer supported for llama.cpp on Windows for ARM because of the arm CPU Q4_0 optimization inline code).
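
For reference, the clang-based command-line build described there boils down to roughly the following (the preset name comes from llama.cpp's CMakePresets.json; exact flags may differ between releases, so treat this as a sketch rather than the authoritative recipe):

  :: from a Windows command prompt with a clang/LLVM toolchain and CMake on PATH
  cmake --preset arm64-windows-llvm-release -D GGML_OPENMP=OFF
  cmake --build build-arm64-windows-llvm-release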

There are some problems with this approach, and it does not make much sense from the point of view of a Linux programmer:

  • a simple thing on Linux, such as configuring CMake from the command line or a script, is not easy on Windows (at least for someone who knows nothing about Windows programming)
  • lots of mysterious compile errors from MS's compiler and toolchain for the same llama.cpp-derived project that can be easily built for Linux and Android
  • a very big IDE has to be installed and set up on Windows, and it is not clear how this huge IDE works
    Screenshot from 2025-03-03 09-23-27
  2. Building llama.cpp for x86-64 Windows through Cygwin on x86-64 Windows (the patch in this PR is required to make gcc 15 happy)
  • download and install Cygwin from https://www.cygwin.com/ (make, git, cmake, gcc, g++ must be selected); a scripted install alternative is sketched after the command list below
  • follow the steps below in a Cygwin command-line prompt to verify this approach
  git clone https://github.com/kantv-ai/ggml-qnn
  cd ggml-qnn
  git checkout build_fix
  ./scripts/build-run-windows.sh help
  ./scripts/build-run-windows.sh build_x86    # build and run both verified
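
If a scripted Cygwin install is preferred, the Cygwin installer can also be driven from the command line; this is only a sketch based on the installer's documented options, and the package names may need adjusting:

  # unattended install of the packages needed for this build (run from a Windows prompt where setup-x86_64.exe resides)
  setup-x86_64.exe --quiet-mode --packages make,git,cmake,gcc-core,gcc-g++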

Fig-1: Cygwin on x86-64 Windows 10
Screenshot from 2025-03-07 15-53-04

Screenshot from 2025-03-05 21-39-38

Fig-2: build llama.cpp for x86-64 Windows through Cygwin on x86-64 Windows without the VS IDE
We can see that the inference performance is good (llama-cli.exe was generated by gcc 15).
https://github.com/user-attachments/assets/9e392059-883e-404c-aa99-9383ff444871
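
For a quick sanity check of the freshly built binary from the Cygwin prompt, something along these lines can be used (the model path and prompt are placeholders, not part of this PR):

  # run the generated executable from the build output directory against a local GGUF model
  ./llama-cli.exe -m /path/to/model.gguf -p "introduce yourself" -n 128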

  3. Building llama.cpp for x86-64 Windows through llvm-mingw on host x86-64 Windows
  • download the customized/dedicated toolchain llvm-mingw-20250305-ggml-ucrt-x86_64.zip from https://github.com/kantv-ai/toolchain and unzip it to C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\

open a Windows command line prompt

set PATH=C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\bin;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\Git\cmd;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\CMake\bin;%PATH%;
git clone https://github.com/kantv-ai/ggml-qnn
cd ggml-qnn
git checkout build_fix
cmake --preset x64-windows-llvm-release -D GGML_OPENMP=OFF -DCMAKE_CXX_FLAGS=-D_WIN32_WINNT=0x602
cmake --build build-x64-windows-llvm-release

or

set PATH=C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\bin;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\Git\cmd;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\CMake\bin;%PATH%;
git clone https://github.com/kantv-ai/ggml-qnn
cd ggml-qnn
git checkout build_fix
scripts\build-x86-windows.bat

We can see that the inference performance of this llama-cli.exe is worse than that of the Cygwin build (clang 20.1 vs gcc 15.0).
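
A rough way to quantify that difference is to run the same benchmark with the binary produced by each toolchain, each from its own prompt (llama-bench is built alongside llama-cli; the model path is a placeholder):

  # from the directory containing the benchmark binary, same GGUF model for both runs
  ./llama-bench.exe -m /path/to/model.gguf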

  4. Building llama.cpp + ggml-qnn for target aarch64 Windows (WoA, Windows on ARM) through llvm-mingw on host x86-64 Windows

open a Windows command line prompt

set PATH=C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\bin;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\Git\cmd;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\CMake\bin;%PATH%;
git clone https://github.com/kantv-ai/ggml-qnn
cd ggml-qnn
git checkout build_fix
cmake --preset arm64-windows-llvm-release -D GGML_OPENMP=OFF -DGGML_QNN=ON -DCMAKE_CXX_FLAGS=-D_WIN32_WINNT=0x602 -DGGML_QNN_SDK_PATH="C:\\qairt\\2.32.0.250228"
cmake --build build-arm64-windows-llvm-release

or

set PATH=C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\bin;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\Git\cmd;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\CMake\bin;%PATH%;
git clone https://github.com/kantv-ai/ggml-qnn
cd ggml-qnn
git checkout build_fix
scripts\build-woa-windows.bat

Open a Cygwin command-line prompt to check the build result, or verify the built binaries on a real WoA (Windows on ARM) device equipped with a Snapdragon desktop SoC.

$ cd build-arm64-windows-llvm-release/bin/
$ file llama-cli.exe
llama-cli.exe: PE32+ executable (console) Aarch64, for MS Windows, 13 sections
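
To actually run the cross-built binaries on a WoA device, copy the build output to the device together with the QNN runtime libraries from the Qualcomm SDK; the following is only a sketch, and the SDK lib subdirectory name, model path and prompt are assumptions that depend on the local setup:

  :: on the WoA device, from the directory containing llama-cli.exe
  copy "C:\qairt\2.32.0.250228\lib\aarch64-windows-msvc\*.dll" .
  llama-cli.exe -m C:\models\some-model.gguf -p "introduce yourself" -n 64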

  5. Acknowledgement

  6. Conclusion
  • this concise approach works fine and is effective for building llama.cpp for target x86-64 Windows on host x86-64 Windows (in other words, a native compile/build)

  • this concise approach works fine and is effective for building llama.cpp + ggml-qnn for target WoA (Windows on ARM) on host x86-64 Windows (in other words, a cross compile/build)

  • we can clearly see the difference in inference performance between gcc 15 and clang 20 (which is why the efforts on gcc-mingw for WoA (Windows on ARM) by some excellent MS compiler/toolchain engineers are very important); the toolchain versions can be confirmed with the quick check sketched after this list

  • this PR makes the workflow easier and simpler, and everything can be reproduced easily with a simple script

  • it provides a customized/dedicated toolchain for building llama.cpp for target WoA (Windows on ARM) on x86-64 Windows; all of this is less than 1 GB, and the complex and huge (about 19 GB) VS2022 IDE is not required for this build task

  • I found a minor/potential bug (might be a typo; it can be fixed by manually renaming the identifier to anything that is not a keyword, and I suspect it only surfaces because gcc 15 in Cygwin is stricter) in Qualcomm's latest (2.31.0.250130) QNN SDK on Windows (@linehill):
    Screenshot from 2025-03-07 16-25-51
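
A quick way to confirm which compiler versions produced the two builds (as referenced in the conclusion above) is to query each toolchain directly:

  # from the Cygwin prompt
  gcc --version
  # from the prompt where the llvm-mingw toolchain is on PATH
  clang --version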

  7. Issue report

Issue reports are greatly welcome, and I will provide appropriate feedback and update the toolchain accordingly.

@zhouwg marked this pull request as draft on March 6, 2025 04:08
@zhouwg changed the title from "[DRAFT]build: build llama.cpp with cygwin on Windows" to "build: build llama.cpp + ggml-qnn in pure command line mode on x86-64 Windows" on Mar 9, 2025
zhouwg added commits to kantv-ai/ggml-qnn that referenced this pull request on Mar 10, 2025