build: build llama.cpp + ggml-qnn in pure command line mode on x86-64 Windows #12215

Draft · wants to merge 3 commits into master
Conversation

@zhouwg (Contributor) commented on Mar 6, 2025

This PR introduces a verified and concise approach for building llama.cpp for target x86-64 Windows and target aarch64 Windows (WoA, Windows on ARM) in pure command line mode on a host x86-64 Windows machine. The purpose of this PR is to keep the workflow simple, so the large and complex Visual Studio 2022 IDE is not required.

  1. Official approach to building llama.cpp on x86-64 Windows

As the official documentation says: Please develop/build on Windows for ARM according to the llama.cpp build instructions, section "Building for Windows (x86, x64 and arm64) with MSVC or clang as compilers", with clang as the C/C++ compiler (MSVC is no longer supported for llama.cpp on Windows for ARM because of the arm CPU Q4_0 optimization inline code).
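
For reference, the clang-based command-line build described there boils down to roughly the following (the preset name comes from llama.cpp's CMakePresets.json; exact flags may differ between releases, so treat this as a sketch rather than the authoritative recipe):

  :: from a Windows command prompt with a clang/LLVM toolchain and CMake on PATH
  cmake --preset arm64-windows-llvm-release -D GGML_OPENMP=OFF
  cmake --build build-arm64-windows-llvm-release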

There are some problems with this approach, and it does not make much sense from the point of view of a Linux programmer:

  • a simple thing on Linux, such as configuring CMake from the command line or a script, is not easy on Windows (at least for someone who knows nothing about Windows programming)
  • lots of mysterious compile errors from MS's compiler and toolchain for the same llama.cpp-derived project that can be easily built for Linux and Android
  • a very big IDE has to be installed and set up on Windows, and it is not clear how this huge IDE works
    Screenshot from 2025-03-03 09-23-27
  2. Building llama.cpp for x86-64 Windows through Cygwin on x86-64 Windows (the patch in this PR is required to make gcc 15 happy)
  • download and install Cygwin from https://www.cygwin.com/ (make, git, cmake, gcc, g++ must be selected); a scripted install alternative is sketched after the command list below
  • follow the steps below in a Cygwin command-line prompt to verify this approach
  git clone https://github.com/kantv-ai/ggml-qnn
  cd ggml-qnn
  git checkout build_fix
  ./scripts/build-run-windows.sh help
  ./scripts/build-run-windows.sh build_x86    # build and run both verified
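
If a scripted Cygwin install is preferred, the Cygwin installer can also be driven from the command line; this is only a sketch based on the installer's documented options, and the package names may need adjusting:

  # unattended install of the packages needed for this build (run from a Windows prompt where setup-x86_64.exe resides)
  setup-x86_64.exe --quiet-mode --packages make,git,cmake,gcc-core,gcc-g++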

Fig-1: Cygwin on x86-64 Windows 10
Screenshot from 2025-03-07 15-53-04

Screenshot from 2025-03-05 21-39-38

Fig-2: build llama.cpp for x86-64 Windows through Cygwin on x86-64 Windows without the VS IDE
We can see that the inference performance is good (llama-cli.exe was generated by gcc 15).
https://github.com/user-attachments/assets/9e392059-883e-404c-aa99-9383ff444871
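
For a quick sanity check of the freshly built binary from the Cygwin prompt, something along these lines can be used (the model path and prompt are placeholders, not part of this PR):

  # run the generated executable from the build output directory against a local GGUF model
  ./llama-cli.exe -m /path/to/model.gguf -p "introduce yourself" -n 128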

  3. Building llama.cpp for x86-64 Windows through llvm-mingw on host x86-64 Windows
  • download the customized/dedicated toolchain llvm-mingw-20250305-ggml-ucrt-x86_64.zip from https://github.com/kantv-ai/toolchain and unzip it to C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\

open a Windows command line prompt

set PATH=C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\bin;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\Git\cmd;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\CMake\bin;%PATH%;
git clone https://github.com/kantv-ai/ggml-qnn
cd ggml-qnn
git checkout build_fix
cmake --preset x64-windows-llvm-release -D GGML_OPENMP=OFF -DCMAKE_CXX_FLAGS=-D_WIN32_WINNT=0x602
cmake --build build-x64-windows-llvm-release

or

set PATH=C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\bin;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\Git\cmd;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\CMake\bin;%PATH%;
git clone https://github.com/kantv-ai/ggml-qnn
cd ggml-qnn
git checkout build_fix
scripts\build-x86-windows.bat

We can see that the inference performance of this llama-cli.exe is worse than that of the Cygwin build (clang 20.1 vs gcc 15.0).
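
A rough way to quantify that difference is to run the same benchmark with the binary produced by each toolchain, each from its own prompt (llama-bench is built alongside llama-cli; the model path is a placeholder):

  # from the directory containing the benchmark binary, same GGUF model for both runs
  ./llama-bench.exe -m /path/to/model.gguf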

  4. Building llama.cpp + ggml-qnn for target aarch64 Windows (WoA, Windows on ARM) through llvm-mingw on host x86-64 Windows

open a Windows command line prompt

set PATH=C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\bin;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\Git\cmd;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\CMake\bin;%PATH%;
git clone https://github.com/kantv-ai/ggml-qnn
cd ggml-qnn
git checkout build_fix
cmake --preset arm64-windows-llvm-release -D GGML_OPENMP=OFF -DGGML_QNN=ON -DCMAKE_CXX_FLAGS=-D_WIN32_WINNT=0x602 -DGGML_QNN_SDK_PATH="C:\\qairt\\2.32.0.250228"
cmake --build build-arm64-windows-llvm-release

or

set PATH=C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\bin;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\Git\cmd;C:\Program Files\llvm-mingw-20250305-ggml-ucrt-x86_64\CMake\bin;%PATH%;
git clone https://github.com/kantv-ai/ggml-qnn
cd ggml-qnn
git checkout build_fix
scripts\build-woa-windows.bat

Open a Cygwin command-line prompt to check the build result, or verify the built binaries on a real WoA (Windows on ARM) device equipped with a Snapdragon desktop SoC.

$ cd build-arm64-windows-llvm-release/bin/
$ file llama-cli.exe
llama-cli.exe: PE32+ executable (console) Aarch64, for MS Windows, 13 sections
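
To actually run the cross-built binaries on a WoA device, copy the build output to the device together with the QNN runtime libraries from the Qualcomm SDK; the following is only a sketch, and the SDK lib subdirectory name, model path and prompt are assumptions that depend on the local setup:

  :: on the WoA device, from the directory containing llama-cli.exe
  copy "C:\qairt\2.32.0.250228\lib\aarch64-windows-msvc\*.dll" .
  llama-cli.exe -m C:\models\some-model.gguf -p "introduce yourself" -n 64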

  5. Acknowledgement

  6. Conclusion
  • this concise approach works fine and is effective for building llama.cpp for target x86-64 Windows on host x86-64 Windows (in other words, a native compile/build)

  • this concise approach works fine and is effective for building llama.cpp + ggml-qnn for target WoA (Windows on ARM) on host x86-64 Windows (in other words, a cross compile/build)

  • we can clearly see the difference in inference performance between gcc 15 and clang 20 (which is why the efforts on gcc-mingw for WoA (Windows on ARM) by some excellent MS compiler/toolchain engineers are very important); the toolchain versions can be confirmed with the quick check sketched after this list

  • this PR makes the workflow easier and simpler, and everything can be reproduced easily with a simple script

  • it provides a customized/dedicated toolchain for building llama.cpp for target WoA (Windows on ARM) on x86-64 Windows; all of this is less than 1 GB, and the complex and huge (about 19 GB) VS2022 IDE is not required for this build task

  • I found a minor/potential bug (might be a typo; it can be fixed by manually renaming the identifier to anything that is not a keyword, and I suspect it only surfaces because gcc 15 in Cygwin is stricter) in Qualcomm's latest (2.31.0.250130) QNN SDK on Windows (@linehill):
    Screenshot from 2025-03-07 16-25-51
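
A quick way to confirm which compiler versions produced the two builds (as referenced in the conclusion above) is to query each toolchain directly:

  # from the Cygwin prompt
  gcc --version
  # from the prompt where the llvm-mingw toolchain is on PATH
  clang --version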

  7. Issue report

Issue reports are greatly welcome, and I will provide appropriate feedback and update the toolchain accordingly.

@zhouwg marked this pull request as draft on March 6, 2025 04:08
@zhouwg changed the title from "[DRAFT]build: build llama.cpp with cygwin on Windows" to "build: build llama.cpp + ggml-qnn in pure command line mode on x86-64 Windows" on Mar 9, 2025
zhouwg added commits to kantv-ai/ggml-qnn that referenced this pull request on Mar 10, 2025