Segmentation fault for "small" and "medium" self-tests on AMD 7950X #16
Comments
Thanks for the bug report! I recall Ken also had issues with a newer AMD CPU, but we were not able to fully debug the problem. It looks like you are using the AVX512 build mode with GCC 11.4. Do the "teensy", "tiny", "large" and "huge" self-tests work as expected? Would you mind running Mlucas with GDB so we can see exactly where it is seg faulting? If you have GDB installed, for the "small" self-test just run: In addition, could you try building it with Clang so we can rule out any compiler differences? If you have Clang installed, just remove the existing |
How do I start the 5 self-tests?
|
Thanks for the additional information! That is very helpful.
To start the self-tests, just pass each self-test value to the -s option:
./Mlucas -s teensy
./Mlucas -s tiny
./Mlucas -s small
./Mlucas -s medium
./Mlucas -s large
./Mlucas -s huge
If you want to run them in GDB instead, just prefix each command with |
I did run all 6, teensy/tiny/large/huge do complete (huge took 15min to complete with around 100% CPU).
|
Thanks for testing them all. Yes, the larger self-tests do take progressively longer, but 15 minutes is actually quite fast for the huge self-test when run single-threaded. It looks like both the small and medium self-tests are seg faulting in the same inline assembly. @ldesnogu - Since you are our resident inline assembly expert, do you have any insights as to why this is seg faulting on AMD CPUs? |
I alas have no access to an AMD machine. |
|
I'm afraid you didn't scroll far enough. The line where the segfault occurs would start with '=>'. For instance:
And when typing that comment, I noticed the '=>' disappeared from the quote... |
I found no "=>", but learned how to query the (big) offset:
|
That's indeed much easier :-) Now I'd like to see register contents: 'i r' |
|
The address is definitely not aligned then. But according to Intel documentation this should cause no issue. Can you please try to dump memory around 0x555d354b77f8? You can play with the 'x/100x address' command. For instance |
Hmm I might have looked at the wrong place about the alignment enforcement: it looks like movaps needs aligned data. |
And in the source code:
We now need to understand why this ends up being unaligned. And the comment or code is buggy: the code uses rbx and the comment mentions rcx. |
(I'm likely misusing this to dump my thoughts, but as we don't have other means of communication...) I tried on an Intel AVX-512 machine and I could reach the offending code, but not from the same point. Dumping the pointers add1/add2/add3, I can confirm the access on my machine is aligned.
diff --git a/src/radix1024_main_carry_loop.h b/src/radix1024_main_carry_loop.h
index 761bd6e..81a0f73 100755
--- a/src/radix1024_main_carry_loop.h
+++ b/src/radix1024_main_carry_loop.h
@@ -266,6 +266,7 @@ normally be getting dispatched to [radix] separate blocks of the A-array, we nee
add1 = &wt1[col +ii]; /* Don't use add0 here, to avoid need to reload main-array address */
add2 = &wt1[co2-1-ii];
add3 = &wt1[co3-1-ii];
+ printf("%p %p %p\n", add1, add2, add3);
// Since use wt1-array in the wtsinit macro, need to fiddle this here:
co2 = co3; // For all data but the first set in each j-block, co2=co3. Thus, after the first block of data is done |
|
@Hermann-SW could you please try the patch above with the printf? |
core dump broke last line of output, so I added fflush():
|
I wonder if the last changes fixed the issue. I doubt it, but it's worth giving it a try. |
I was asked to run "./Mlucas -s tiny" here:
#15 (comment)
I just wanted to create this issue to report that small and medium dump core.
and