Commit 3cdc72a

Speed up inference ~4x for 7B model
Problem:
- inference for the 7B model is slow.
Solution:
- unroll the inner loop in matmul to perform 4 operations in parallel with SIMD.
Result (with float16):
- before: 16 tok/s
- after: 71 tok/s
1 parent f565089 commit 3cdc72a

File tree

1 file changed

+6
-2
lines changed

run.c

@@ -193,8 +193,12 @@ void matmul(float* xout, float* x, float* w, int n, int d) {
     #pragma omp parallel for
     for (int i = 0; i < d; i++) {
         float val = 0.0f;
-        for (int j = 0; j < n; j++) {
-            val += w[i * n + j] * x[j];
+        const int i_n = i * n;
+        for (int j = 0; j < n; j+=4) {
+            val += w[i_n + j] * x[j];
+            val += w[i_n + j + 1] * x[j + 1];
+            val += w[i_n + j + 2] * x[j + 2];
+            val += w[i_n + j + 3] * x[j + 3];
         }
         xout[i] = val;
     }
