[RFC][BPF] Support Jump Table #133856
@aspsk As we discussed in LSFMMBPF, here is the implementation for llvm jump table support. Please take a look and try libbpf/kernel implementations. Let me know if you hit any issues.
Don't bother. x86 is doing it to save a byte in encoding. This technique doesn't apply to bpf isa.
let isIndirectBranch = 1 in {
  def JX : JMP_IND<BPF_JA, "gotox", [(brind i64:$dst)]>;
}
nice to see how it should be done, I just had hardcoded it in my test branch: aspsk@98773c6
Thanks @yonghong-song! I will test this, match with the verification part, and post my results in this PR
@@ -65,10 +65,11 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
  setOperationAction(ISD::BR_CC, MVT::i64, Custom);
  setOperationAction(ISD::BR_JT, MVT::Other, Expand);
  setOperationAction(ISD::BRIND, MVT::Other, Expand);
So, this does remove restriction to not produce indirect jumps?
Is there a way to control if we want to generate indirect jumps "in general" vs., say, "only for large switches"? (Or even only for a particular switch?)
> So, this does remove restriction to not produce indirect jumps?
Yes, we do not want to expand 'brind'; rather, we will do pattern matching with 'brind'.
> Is there a way to control if we want to generate indirect jumps "in general" vs., say, "only for large switches"? (Or even only for a particular switch?)
Good point. Let me do some experiments with a flag for this. I am not sure whether I could do 'only for a particular switch', but I will do some investigation. Hopefully I can find a solution for that.
I added an option to control the minimum number of cases a switch statement must have before a jump table is used. The default is 4 cases, but you can change it with an additional clang option. For example, to require at least 6 cases:
clang ... -mllvm -bpf-min-jump-table-entries=6
I checked other targets; none of them offers control for a specific switch. So I think we do not need it for now.
Awesome, thanks!
@yonghong-song could you please elaborate on this? How exactly should those be classified per table?
The below is an example for test_tc_tunnel.bpf.o with
The above .rodata is what you really care about. You can also see that all .rodata relocations occur in the decap and .text sections.
You then need to go through sections 'decap' and '.text' for their .rodata relocations.
It corresponds to insn 7 (0x38/8 = 7).
In the above, 'r3 = 0x80' means the relocation starts at offset 0x80 in the .rodata section. You need to scan ALL such relocations in the .text and decap sections and sort them by start offset. After that, you can calculate the size of each relocated object. Once you have the sizes (for the .rodata section), you need to check whether a particular relocation is for gotox or for something else. So you need to scan backwards. For example,
You find a gotox insn with target r2, then you need to go back and find 'r2 = *(u64 *)(r2 + 0x0)', then 'r2 += r3', and then 'r2 = 0x140 ll'. The above code pattern is generated by llvm and should generally hold for the jump table implementation. You can then be certain that the table for this particular gotox is at offset 0x140 of the .rodata section. The size of the table has already been calculated by the previous mechanism of scanning all .rodata relocations in the .text and decap sections.
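The size calculation described in this comment can be sketched as follows. This is only an illustration under a simplifying assumption: every object in .rodata is taken to run until the next relocation start, and the last one until the end of the section. The function name `table_sizes`, the third offset (0x1a0), and the section size (0x200) are made up for the example; only 0x80 and 0x140 come from the comment.

```python
def table_sizes(reloc_starts, rodata_size):
    """Map each .rodata relocation start offset to the byte size of its object.

    Assumption: an object extends to the next relocation start, and the
    last object extends to the end of the section.
    """
    starts = sorted(set(reloc_starts))
    ends = starts[1:] + [rodata_size]
    return {start: end - start for start, end in zip(starts, ends)}


# 0x80 and 0x140 are the offsets quoted in the comment; 0x1a0 and the
# section size 0x200 are hypothetical values chosen for illustration.
sizes = table_sizes([0x140, 0x80, 0x1a0], rodata_size=0x200)

# Jump table entries are 8 bytes each (.quad), so size // 8 gives the count.
entries_at_0x140 = sizes[0x140] // 8
```

If .rodata also holds data that is never referenced through a relocation, this estimate can be too large, which is one reason an explicit size section is more robust.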
I am looking into how to automate this properly (I have a really hacky PoC test working with this version of llvm and my custom test). It looks simpler with explicit jump tables (when I take the address of a label and store it in an array), because then I can just push values to a custom section. Will post updates here.
I found an llvm option
This way, you just need to scan related code section. As long as it
There is one test failure, shown below:
The reason should be due to my unconditional enabling
Thanks @yonghong-song, that size/offset section is really useful! This looks sufficient for me to continue with a PoC.
Unfortunately, I do, this is required for verification. For indirect jumps to work, two things should be verified:
The So, in order to construct a verifiable program, libbpf should:
(Haven't checked yet for real, but this looks to be enough for "custom", e.g., user-defined, jump tables to work. Just declare it as
You are right. Verification does need to connect the jump table map and the gotox insn.
Backtracking certainly works. But maybe there is an alternative that avoids backtracking.
Your user-defined jump table may work. But it would be great if we could simply allow the common switch statements as they are, for the sake of code cleanliness and developer productivity.
Right, this is exactly what I meant by "backtrack". Looks like for
Yes, libbpf does not need to do verifier work. The range analysis should be done in the verifier.
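For reference, the backward scan ("backtrack") discussed in this thread can be sketched as below. This is not libbpf code: it works on disassembly text in the form quoted earlier ('r2 = 0x140 ll', 'r2 += r3', 'r2 = *(u64 *)(r2 + 0x0)', 'gotox r2'), the function name `find_table_offset` is hypothetical, and the sketch deliberately ignores any other writes to the register between the matched instructions.

```python
import re


def find_table_offset(insns, gotox_idx):
    """Walk backwards from a gotox and return the .rodata offset of its
    jump table, or None if the expected pattern is not found."""
    m = re.fullmatch(r"gotox (r\d+)", insns[gotox_idx])
    if m is None:
        return None
    reg = re.escape(m.group(1))
    # Patterns listed in the order they are met when scanning backwards.
    patterns = [
        rf"{reg} = \*\(u64 \*\)\({reg} \+ (?:0x)?0\)",  # load the table entry
        rf"{reg} \+= r\d+",                              # add the scaled index
        rf"{reg} = (0x[0-9a-f]+) ll",                    # table base in .rodata
    ]
    want = 0
    for text in reversed(insns[:gotox_idx]):
        m = re.fullmatch(patterns[want], text)
        if m is None:
            continue  # simplification: unrelated instructions are skipped
        want += 1
        if want == len(patterns):
            return int(m.group(1), 16)
    return None


# Instruction sequence in the shape described in the thread.
insns = [
    "r3 <<= 0x3",
    "r2 = 0x140 ll",
    "r2 += r3",
    "r2 = *(u64 *)(r2 + 0x0)",
    "gotox r2",
]
offset = find_table_offset(insns, 4)
```

A real implementation would operate on decoded instructions rather than text and would have to stop at basic-block boundaries and at any instruction that clobbers the register.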
Hi @yonghong-song! I was trying different switch variants; the simple ones work like magic, so we're definitely going in the right direction. One simple case fails for me though. Namely, in the example below LLVM generates an unreachable instruction. Could you take a look please? An example source program is
Then the object file looks like
Now, the jump table is
And the check
makes sure that And this makes the instruction
unreachable.
I suspect it won't be easy to avoid this on the llvm side. Probably better to teach the verifier to ignore those.
Ok, thanks, will do this for now
Update. I have a patch for kernel + libbpf which uses this LLVM and which passes all my new selftests + all (but one) standard bpf selftests which are compiled to use So far only one selftest fails (
✅ With the latest revision this PR passed the C/C++ code formatter.
Thanks for the update. When trying your above example
I found a problem and just added another commit to fix it. The issue is due to the llvm machine-sink pass. The implementation is similar to X86 (X86InstrInfo::getJumpTableIndex()). See the top commit (commit 4) for more details.
Thanks @yonghong-song! I will test your latest changes over this weekend. (The
Do we need to modify the ASMParser also?
llvm-project/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
Lines 228 to 233 in f2e62cf
  static bool isValidIdAtStart(StringRef Name) {
    return StringSwitch<bool>(Name.lower())
        .Case("if", true)
        .Case("call", true)
        .Case("callx", true)
        .Case("goto", true)
Right, need to add gotox as well. Will fix. Thanks!
Here's the kernel side which works with this LLVM: https://lore.kernel.org/bpf/[email protected]/ The following selftests contain indirect jumps (and pass):
A new selftest
Thanks @aspsk I will also take a look at the kernel patch. Also, the current patch has some conflicts with the latest 'main' branch. I will rebase and repost the new llvm patch after doing some testing.
NOTE: We probably need cpu v5 or other flags to enable this feature. We can add it later when necessary.

This patch adds jump table support. A new insn 'gotox <reg>' is added to allow goto through a register. The register represents the address in the current section. The following is a concrete example using the bpf selftest progs/user_ringbuf_success.c.

Compilation command line to generate .s file:
=============================================

  clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian \
    -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include \
    -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf \
    -I/home/yhs/work/bpf-next/tools/include/uapi \
    -I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -std=gnu11 \
    -fno-strict-aliasing -Wno-compare-distinct-pointer-types \
    -idirafter /home/yhs/work/llvm-project/llvm/build.21/Release/lib/clang/21/include \
    -idirafter /usr/local/include -idirafter /usr/include \
    -DENABLE_ATOMICS_TESTS -O2 -S progs/user_ringbuf_success.c \
    -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/user_ringbuf_success.bpf.o.s \
    --target=bpf -mcpu=v3

The related assembly:

  read_protocol_msg:
    ...
    r3 <<= 3
    r1 = .LJTI1_0 ll
    r1 += r3
    r1 = *(u64 *)(r1 + 0)
    gotox r1
  LBB1_4:
    r1 = *(u64 *)(r0 + 8)
    goto LBB1_5
  LBB1_7:
    r1 = *(u64 *)(r0 + 8)
    goto LBB1_8
  LBB1_9:
    w1 = *(u32 *)(r0 + 8)
    r1 <<= 32
    r1 s>>= 32
    r2 = kern_mutated ll
    r3 = *(u64 *)(r2 + 0)
    r3 *= r1
    *(u64 *)(r2 + 0) = r3
    goto LBB1_11
  LBB1_6:
    w1 = *(u32 *)(r0 + 8)
    r1 <<= 32
    r1 s>>= 32
  LBB1_5:
    ...
    .section .rodata,"a",@progbits
    .p2align 3, 0x0
  .LJTI1_0:
    .quad LBB1_4
    .quad LBB1_6
    .quad LBB1_7
    .quad LBB1_9
    ...
  publish_next_kern_msg:
    ...
    r6 <<= 3
    r1 = .LJTI6_0 ll
    r1 += r6
    r1 = *(u64 *)(r1 + 0)
    gotox r1
  LBB6_3:
    ...
  LBB6_5:
    ...
  LBB6_6:
    ...
  LBB6_4:
    ...
    .section .rodata,"a",@progbits
    .p2align 3, 0x0
  .LJTI6_0:
    .quad LBB6_3
    .quad LBB6_4
    .quad LBB6_5
    .quad LBB6_6

You can see in the above that .LJTI1_0 and .LJTI6_0 are the jump tables, and that they are referenced by the insns which load the proper jump target for the gotox insn.

Now let us look at sections in .o file
=======================================

For example,

  [ 6] .rodata                    PROGBITS      0000000000000000 000740 0000d6 00   A  0   0  8
  [ 7] .rel.rodata                REL           0000000000000000 003860 000080 10   I 39   6  8
  [ 8] .llvm_jump_table_sizes     LLVM_JT_SIZES 0000000000000000 000816 000010 00      0   0  1
  [ 9] .rel.llvm_jump_table_sizes REL           0000000000000000 0038e0 000010 10   I 39   8  8
  ...
  [14] .llvm_jump_table_sizes     LLVM_JT_SIZES 0000000000000000 000958 000010 00      0   0  1
  [15] .rel.llvm_jump_table_sizes REL           0000000000000000 003970 000010 10   I 39  14  8

With llvm-readelf dump section 8 and 14:

  $ llvm-readelf -x 8 user_ringbuf_success.bpf.o
  Hex dump of section '.llvm_jump_table_sizes':
  0x00000000 00000000 00000000 04000000 00000000 ................
  $ llvm-readelf -x 14 user_ringbuf_success.bpf.o
  Hex dump of section '.llvm_jump_table_sizes':
  0x00000000 20000000 00000000 04000000 00000000  ...............

You can see there are two jump tables:

  jump table 1: offset 0, size 4 (4 labels)
  jump table 2: offset 0x20, size 4 (4 labels)

Check sections 9 and 15 to find the corresponding relocation sections:

  Relocation section '.rel.llvm_jump_table_sizes' at offset 0x38e0 contains 1 entries:
      Offset             Info             Type           Symbol's Value   Symbol's Name
  0000000000000000  0000000a00000002 R_BPF_64_ABS64     0000000000000000 .rodata

  Relocation section '.rel.llvm_jump_table_sizes' at offset 0x3970 contains 1 entries:
      Offset             Info             Type           Symbol's Value   Symbol's Name
  0000000000000000  0000000a00000002 R_BPF_64_ABS64     0000000000000000 .rodata

This confirms that the relocation is against '.rodata'.

Dump .rodata section:

  0x00000000 a8000000 00000000 10010000 00000000 ................
  0x00000010 b8000000 00000000 c8000000 00000000 ................
  0x00000020 28040000 00000000 00050000 00000000 (...............
  0x00000030 70040000 00000000 b8040000 00000000 p...............
  0x00000040 44726169 6e207265 7475726e 65643a20 Drain returned:

So we can get two jump tables:

  .rodata offset 0, # of labels 4:
  0x00000000 a8000000 00000000 10010000 00000000 ................
  0x00000010 b8000000 00000000 c8000000 00000000 ................
  .rodata offset 0x20, # of labels 4:
  0x00000020 28040000 00000000 00050000 00000000 (...............
  0x00000030 70040000 00000000 b8040000 00000000 p...............

This way, you just need to scan the related code section. As long as it matches one of the jump tables (.rodata relocation, offset also matching), you do not need to care about gotox at all in libbpf.

An option -bpf-min-jump-table-entries is implemented to control the minimum number of entries required to use a jump table on BPF. The default value is 4, but it can be changed with the following clang option

  clang ... -mllvm -bpf-min-jump-table-entries=6

where the number of jump table cases needs to be >= 6 in order to use a jump table.
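As a rough cross-check of the dumps above, the sketch below decodes the two '.llvm_jump_table_sizes' payloads and the first 0x40 bytes of '.rodata' using the bytes exactly as printed. It assumes each size entry is a pair of little-endian 64-bit values (table offset in .rodata, number of entries), which matches the 16-byte sections shown, and that each table slot is a 64-bit byte offset into a code section, so dividing by 8 gives an instruction index. The helper names are hypothetical, and the values are the pre-relocation contents of the object file.

```python
import struct


def parse_jt_sizes(section_bytes):
    """Split a .llvm_jump_table_sizes payload into (rodata_offset, n_entries)
    pairs, assuming two little-endian u64 values per entry."""
    return list(struct.iter_unpack("<QQ", section_bytes))


def table_targets(rodata, offset, n_entries):
    """Read n_entries 64-bit code byte offsets starting at `offset` and
    convert them to instruction indices (BPF insns are 8 bytes each)."""
    raw = struct.unpack_from(f"<{n_entries}Q", rodata, offset)
    return [byte_off // 8 for byte_off in raw]


# Bytes copied from the llvm-readelf hex dumps of sections 8 and 14.
sec8 = bytes.fromhex("00000000 00000000 04000000 00000000")
sec14 = bytes.fromhex("20000000 00000000 04000000 00000000")

# First 0x40 bytes of .rodata from the dump (the string data is omitted).
rodata = bytes.fromhex(
    "a8000000 00000000 10010000 00000000"
    "b8000000 00000000 c8000000 00000000"
    "28040000 00000000 00050000 00000000"
    "70040000 00000000 b8040000 00000000"
)

tables = parse_jt_sizes(sec8) + parse_jt_sizes(sec14)
targets = [table_targets(rodata, off, n) for off, n in tables]
```

With these inputs `tables` holds the two (offset, count) pairs listed above, and each inner list of `targets` gives the instruction indices of one table's labels within its own code section.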
Rebased on top of the current main branch. No functional change compared to the previous version (from more than a month ago).