Extend benchmark script suite to CPU profiling with perf and samply
This is useful for digging into exactly why C Zopfli performs
differently from Rust Zopfli on a case-by-case basis.
AlexTMjugador committed Feb 26, 2025
1 parent 05e2ddd commit cff69f5
Showing 8 changed files with 74 additions and 10 deletions.
12 changes: 12 additions & 0 deletions .cargo/config.toml
@@ -0,0 +1,12 @@
# Do a best effort at producing builds with frame pointers, which
# are useful for low-overhead, cross-platform, and accurate enough
# stack unwinding by profilers such as perf.
#
# This does not affect downstream Zopfli users.
#
# Related reads:
# - <https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html>
# - <https://fedoraproject.org/wiki/Changes/fno-omit-frame-pointer>
# - <https://pagure.io/fedora-rust/rust2rpm/pull-request/237>
[build]
rustflags = ["-C", "force-frame-pointers=true"]
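
Note on the `[build]` setting above: Cargo reads rustc flags from only one source at a time, and the `RUSTFLAGS` environment variable takes precedence over `build.rustflags`, so a developer who already exports `RUSTFLAGS` would silently lose the frame-pointer flag. The sketch below is one way to carry the flag over in that situation. The `with_frame_pointers` helper is hypothetical and not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): print the given rustc flag
# string with the frame-pointer flag appended, unless it is already present.
with_frame_pointers() {
    case "${1:-}" in
        *force-frame-pointers*)
            # The caller already made a choice; leave it untouched.
            printf '%s\n' "$1" ;;
        *)
            # Append the flag, separated by a space when other flags exist.
            printf '%s\n' "${1:+$1 }-C force-frame-pointers=true" ;;
    esac
}

# Possible usage (not executed here):
#   RUSTFLAGS="$(with_frame_pointers "${RUSTFLAGS:-}")" cargo build --release
```

Leaving an existing `force-frame-pointers` setting alone is a deliberate choice in this sketch, so that an explicit opt-out is respected.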
2 changes: 2 additions & 0 deletions Cargo.toml
@@ -51,6 +51,8 @@ std = [
nightly = ["crc32fast?/nightly"]

[profile.release]
# Generate full debug information for release builds anyway for ease of profiling.
# This does not affect packages that depend on the Zopfli library package
debug = true

# docs.rs uses a nightly toolchain, so it can leverage unstable rustdoc features.
1 change: 0 additions & 1 deletion benchmark-builds/data

This file was deleted.

3 changes: 2 additions & 1 deletion benchmark-builds/.gitignore → test/perf/.gitignore
@@ -2,4 +2,5 @@ data/*.gz
data_google
data_rust
zopfli
-benchmark*.json
+benchmark*.json
+profiles
4 changes: 2 additions & 2 deletions benchmark-builds/README.md → test/perf/README.md
@@ -1,6 +1,6 @@
-## What kind of benchmarks?
+## Performance testing and benchmarking

-The purpose of this, is to bench the current rust version vs. the [original version](https://github.com/google/zopfli/blob/master/README).
+The purpose of this folder is to ease comparing the performance of the current rust version vs. the [original version](https://github.com/google/zopfli/blob/master/README).

The measurement is done using [hyperfine](https://github.com/sharkdp/hyperfine).

8 changes: 4 additions & 4 deletions benchmark-builds/bench.sh → test/perf/bench.sh
@@ -7,18 +7,18 @@ set -o pipefail

# Clean output from previous runs
rm -f benchmark-builds*.json
-rm -f data/*.gz
+rm -f ../data/*.gz
rm -rf data_google data_rust

# Duplicate the input data for the bench execution,
# to avoid any side effect due to either having to
# 'delete the output' or 'overwrite the output' during the bench.
-cp -r data data_google
-cp -r data data_rust
+cp -r ../data data_google
+cp -r ../data data_rust

# Bench all input cases individually,
# and store respective results.
-for input in $(ls data | grep -v '\.gz$'); do
+for input in $(ls ../data | grep -v '\.gz$'); do
     printf "Benching with ${input}...\n"
     command_name="Google zopfli with '${input}'"
     command="zopfli/zopfli data_google/${input}"
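
The input loop above word-splits the output of `ls`, which would misbehave for file names that contain whitespace. A glob-based listing is one possible alternative. This is only a sketch: the `list_inputs` helper is hypothetical, and unlike the `ls` pipeline it skips anything that is not a regular file.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): print the base name of every
# regular file in directory $1 that does not end in ".gz", one per line.
list_inputs() {
    for entry in "$1"/*; do
        # Skips directories, and also the literal pattern left behind
        # when the glob matches nothing.
        [ -f "$entry" ] || continue
        case "$entry" in
            *.gz) continue ;;
        esac
        # Strip the directory prefix, as `ls` would.
        printf '%s\n' "${entry##*/}"
    done
}

# Possible usage (not executed here):
#   list_inputs ../data | while IFS= read -r input; do
#       printf 'Benching with %s...\n' "$input"
#   done
```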
5 changes: 3 additions & 2 deletions benchmark-builds/prepare.sh → test/perf/prepare.sh
@@ -9,7 +9,8 @@ then
fi

# Build Google/zopfli
-cd zopfli && make && cd -
+CFLAGS='-fno-omit-frame-pointer' CXXFLAGS='-fno-omit-frame-pointer' \
+  make -C zopfli

# Build zopfli-rs
-cd .. && cargo build --release && cd -
+cargo build --release
49 changes: 49 additions & 0 deletions test/perf/profile.sh
@@ -0,0 +1,49 @@
#!/bin/sh -eu

# Helper script to record profiles of Zopfli's CPU usage on a Linux computer with `perf`,
# and display them with `samply`. The profiles are not recorded directly with `samply`
# because doing so does not simplify the script much anyway, `perf`'s output format is
# more interoperable with other visualization frontends, and `perf record` exposes
# much more native features and knobs that are useful.

mkdir -p test/perf/profiles

if [ -d /sys/devices/cpu_core ]; then
  # For hybrid Intel CPUs (Alder Lake+, with performance and efficiency cores),
  # only record cycles for the performance cores Zopfli will be scheduled to,
  # as Samply does not have proper support for combining both efficiency and
  # performance events
  readonly CPU_CYCLES_EVENT='cpu_core/cycles/'
else
  readonly CPU_CYCLES_EVENT='cpu-cycles'
fi

case "$1" in
  'rust')
    readonly BINARY_PATH='target/release/zopfli'
    readonly PROFILE_FILE_PREFIX='rust'
    cargo build --release;;
  'c')
    # Run the prepare.sh script first to get the Zopfli sources
    readonly BINARY_PATH='zopfli/zopfli'
    readonly PROFILE_FILE_PREFIX='c'
    make -C zopfli;;
  *)
    echo 'Invalid Zopfli flavor specified, expected either "rust" or "c"' >&2
    exit 1;;
esac
shift

PERF_DATA_FILE="test/perf/profiles/${PROFILE_FILE_PREFIX}_zopfli_$(date +%s).perf.data"
readonly PERF_DATA_FILE

(set -x; perf record \
  --call-graph=fp -F 5000 \
  --event="$CPU_CYCLES_EVENT" \
  -o "$PERF_DATA_FILE" -- "$BINARY_PATH" "$@")

PROCESSED_PROFILE="${TMPDIR:-/tmp}/profile-$(date +%s).bin"
readonly PROCESSED_PROFILE
trap 'rm -f "$PROCESSED_PROFILE" || true' EXIT INT TERM

(set -x; samply import -o "$PROCESSED_PROFILE" "$PERF_DATA_FILE")
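
The hybrid-CPU check near the top of `profile.sh` can be exercised in isolation. A minimal sketch, assuming the detection is wrapped in a function: the `select_cycles_event` name and its optional directory argument are hypothetical, added here only so the logic can be tried against a scratch directory instead of the real `/sys/devices`.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this commit) around the detection logic.
# Usage: select_cycles_event [SYSFS_DEVICES_DIR]
# Prints the perf event name to sample. SYSFS_DEVICES_DIR defaults to
# /sys/devices and exists as a parameter purely for testing.
select_cycles_event() {
    if [ -d "${1:-/sys/devices}/cpu_core" ]; then
        # Hybrid CPU: restrict sampling to the performance-core PMU.
        printf '%s\n' 'cpu_core/cycles/'
    else
        # Non-hybrid CPU: the generic cycles event is enough.
        printf '%s\n' 'cpu-cycles'
    fi
}

# Possible usage (not executed here):
#   CPU_CYCLES_EVENT="$(select_cycles_event)"
#   readonly CPU_CYCLES_EVENT
```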
