build: build llama.cpp + ggml-qnn in pure command line mode on x86-64 Windows #12215
this PR introduces a verified and concise approach to building llama.cpp for target x86-64 Windows and for target aarch64 Windows (WoA, Windows on ARM) in pure command line mode on a host x86-64 Windows machine. the purpose of this PR is to keep the workflow simple: the complex and huge VisualStudio2022 IDE is not required.
as the official doc mentions:

> Please develop/build on Windows for ARM according to the llama.cpp build instructions, section "Building for Windows (x86, x64 and arm64) with MSVC or clang as compilers", with clang as the c/c++ compiler (MSVC is no longer supported for llama.cpp on Windows for ARM because of the arm CPU Q4_0 optimization inline code).
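for reference, that official approach looks roughly like the following (a sketch based on the upstream build docs; the preset name comes from llama.cpp's CMakePresets.json and it assumes VS2022 with its clang components is already installed):

```shell
# official approach: needs the VS2022 toolchain with clang installed
cmake --preset arm64-windows-llvm-release -D GGML_OPENMP=OFF
cmake --build build-arm64-windows-llvm-release
```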
there are some problems with this approach, and it doesn't make much sense from the point of view of a Linux programmer:
Fig-1: cygwin on x86-64 Windows10

Fig-2: build llama.cpp for x86-64 Windows through cygwin on x86-64 Windows without VS IDE
we can see that the inference performance is good (llama-cli.exe was generated by gcc 15).
https://github.com/user-attachments/assets/9e392059-883e-404c-aa99-9383ff444871
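the cygwin-based native build can be reproduced roughly as follows (a minimal sketch, assuming git, gcc, cmake and make/ninja were installed through cygwin's setup program; flags are illustrative):

```shell
# inside a cygwin shell on x86-64 Windows
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j8
# build/bin/llama-cli.exe is produced by cygwin's gcc
```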
open a Windows command line prompt or a cygwin command line prompt. we can see that the inference performance of this llama-cli.exe is worse than that of the cygwin build (clang 20.1 vs. gcc 15.0).
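the clang-based native build from a plain Windows command line prompt looks roughly like this (a sketch, shown in shell syntax; it assumes a standalone LLVM/clang toolchain and ninja are already on %PATH%):

```shell
# from a Windows command line prompt with clang and ninja on %PATH%
cmake -B build-clang -G Ninja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release
cmake --build build-clang
```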
open a Windows command line prompt or a cygwin command line prompt to verify the build result, or verify the built result on a real Snapdragon desktop SoC equipped WoA (Windows on ARM) device.
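verifying the built result might look like the following (a sketch; the model file and prompt are placeholders, and the output path can differ depending on the CMake generator):

```shell
# quick smoke test of the freshly built binary on the target device
./build/bin/llama-cli.exe -m ./models/model-q4_0.gguf -p "hello, how are you?" -n 64
```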
5. acknowledgement
- this concise approach works fine and effectively for building llama.cpp for target x86-64 Windows on host x86-64 Windows (in other words, a native compile/build)
- this concise approach works fine and effectively for building llama.cpp + ggml-qnn for target WoA (Windows on ARM) on host x86-64 Windows (in other words, a cross compile/build)
- we can clearly see the difference in inference performance between gcc 15 and clang 20 (that's why the efforts on gcc-mingw for WoA (Windows on ARM) from some excellent MS compiler/toolchain engineers are very important)
- this PR makes the workflow easier and simpler, and everything can be reproduced easily with a simple script.
- it provides a customized/dedicated toolchain to build llama.cpp for target WoA (Windows on ARM) on x86-64 Windows, all of which is less than 1 GB. the complex and huge (about 19 GB) VS2022 IDE is not required for this build task.
I found a minor/potential bug (it might be a typo, and can be fixed manually by renaming the identifier to anything that is not a keyword; I personally think this surfaces because gcc 15 in cygwin is stricter) in Qualcomm's latest (2.31.0.250130) QNN SDK on Windows (@linehill):

issue reports are greatly welcome, and I will provide appropriate feedback and update the toolchain accordingly.