Skip to content

Commit e270c6e

Browse files
authored
Update README.md: add mention of -f unroll loops option for gcc
1 parent def12a2 commit e270c6e

File tree

1 file changed

+2
-0
lines changed

1 file changed

+2
-0
lines changed

README.md

+2
Original file line numberDiff line numberDiff line change
@@ -144,6 +144,8 @@ The fastest throughput I saw so far on my MacBook Air (M1) so far is with `make
144144

145145
You can also experiment with replacing `gcc` with `clang`.
146146

147+
If compiling with gcc, try experimenting with `-funroll-all-loops`, see PR [#183](https://github.com/karpathy/llama2.c/pull/183)
148+
147149
### OpenMP
148150
Big improvements can also be achieved by compiling with OpenMP, which "activates" the `#pragma omp parallel for` inside the matmul and attention, allowing the work in the loops to be split up over multiple processors.
149151
You'll need to install the OpenMP library and the clang compiler first (e.g. `apt install clang libomp-dev` on ubuntu). I was not able to get improvements from OpenMP on my MacBook, though. Then you can compile with `make runomp`, which does:

0 commit comments

Comments
 (0)