-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PMP NA4-R test may stuck in an infinite loop on a compliant implementation #577
Comments
PMP granularity is unfortunately an invisible configuration; this is not an
issue confined to this particular test, and we probably need to look at all
the tests.
Running an NA4 test on a model that doesn't support NA4 may be broken. I
guess I'm not terribly surprised...
But it won't get into an infinite loop. The trap handler will exit the test
if the trap signature is overrun (which an infinite trap loop will do).
As long as Sail gives the same answers, that's not optimal, but not fatal
(especially in this case, since it really shouldn't be run at all)
This particular failure is a "hurts if you do that, don't do that" kind of
situation.
A change to the RVTEST_CASE macro conditions would fix that. Models should
be defining a PMP_GRANULARITY variable in their riscv-config YAML file,
and we can write tests that use that fact (e.g. by aligning regions to
those boundaries. I'm pretty sure we haven't done that yet.
It's really hard to test PMPs, given the range of options it has,
(e.g. each entry could have readonly addresses, could not support NA4, or
NAPOT, or TOR, in any combination, or could be locked at reset.)
DV can write tests that work for a particular config, but we don't have the
resources to do this (well) for all possible configurations,
so we concentrate on the most likely (and we have a hard requirement for at
least 6 configurable entries)
Having a higher granularity than NA4 probably isn't too unusual, and we
should look to see how we can configure tests for that.
But that's independent from this test, which works just fine for this. NA4
coverage is nil, but it doesn't support NA4, so that's acceptable.
…On Tue, Dec 10, 2024 at 7:07 AM marcfedorow ***@***.***> wrote:
https://github.com/riscv-non-isa/riscv-arch-test/blob/cd94912fed2aab75d7d5f115b441da0813fdce8d/riscv-test-suite/rv64i_m/pmp64/src/pmp64-NA4-R-priority-level-2.S#L158
Priv spec says the following:
All PMP CSR fields are WARL
Let's assume that DUT has PMP granularity of, e.g., 12 bits (region size
of 0x1000).
Let's assume that if we write 0xYYYYYXXX (e.g. 0x20000212, correcponding
to the address 0x80000828) to PMPADDRx, 0xYYYYY000 will be written (e.g.
0x20000000, corresponding to the address 0x80000000).
Let's assume that if we write 0x91 to the PMPCFGx (NA4 R L), actual value
of 0x99 will be written (NAPOT R L) and ox3ff will be OR'ed to the PMPADDRx
(e.g. now PMP#x now protects the region [0x80000000..0x80000fff]).
Under the following assumptions I believe the above code will lead
execution to the following:
hart 0: 0x0000000080000578 (0x3a022073) csrs pmpcfg0, tp
hart 0: exception trap_instruction_access_fault, epc 0x000000008000057c
hart 0: tval 0x000000008000057c
hart 0: exception trap_instruction_access_fault, epc 0x0000000080000940
hart 0: tval 0x0000000080000940
hart 0: exception trap_instruction_access_fault, epc 0x0000000080000940
hart 0: tval 0x0000000080000940
hart 0: exception trap_instruction_access_fault, epc 0x0000000080000940
In other words, the following will occure:
1. csrs protects the whole region instead of just 4 bytes.
2. While this is the same region that is being execuded right now, the
very next instruction causes instruction access fault.
3. The same is correct for the very first instruction of the trap
handler, so go to step 2.
While I suppose the suggested architecture to be compliant, the test will
never finish being stuck in the infinite loop.
—
Reply to this email directly, view it on GitHub
<#577>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHPXVJXQV4Z7XYZNH72OJT32E37RTAVCNFSM6AAAAABTLMEXI2VHI2DSMVQWIX3LMV43ASLTON2WKOZSG4ZTAMZZGE4DOOI>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Unfortunately at least some tests that don't contain "NA4" in their name are broken too, so it is not possible to filter-out NA4 test knowing that they require NA4 support.
That is not true and I've shown that in my example: while the trap handler itself is in the protected area, not a single instruction of it shall be execuded. Executing the first instruction of the trap handler causes exception itself, leading the execution to be infinitely stuck at that point. So to be clear: this is fatal. The trap handler will not exit this test, because no instructions of trap handling are actually executed, because the trap handler itself is in the protected memory region. Because of framework being ugly, such an issue is fatal not just for the test, but for the whole testing. It is still fatal for the test itself, so this is still an issue of the test, not the framework (at least untill framework picks this test for execution). |
The PMP trap handler is in Mmode, and (except for locked entries) immune to
PMP effects.
…On Tue, Dec 10, 2024 at 8:09 PM marcfedorow ***@***.***> wrote:
Running an NA4 test on a model that doesn't support NA4 may be broken.
Unfortunately at least some tests that don't contain "NA4" in their name
are broken too, so it is not possible to filter-out NA4 test knowing that
they require NA4 support.
But it won't get into an infinite loop. The trap handler will exit the test
That is not true and I've shown that in my example: while the trap handler
itself is in the protected area, not a single instruction of it shall be
execuded. Executing the first instruction of the trap handler causes
exception itself, leading the execution to be infinitely stuck at that
point.
Trap handler will not be able to do anything with the signature so the
test will never stop (at least untill being interrupted due to external
cause such as ^C).
This leads to not comparing all the following signatures (even if they are
present) but it's a design flaw of the framework, not a problem of this
test.
So to be clear: this is fatal. The trap handler will *not* exit this
test, because no instructions of trap handling are actually executed,
because the trap handler itself is in the protected memory region.
SAIL results will not match DUT results because either DUT will stuck in
the infinite loop preventing results for this and the following test on DUT
and *any* results on SAIL to appear, or (if the timeouts are implemented)
riscof_XXX.py will kill these tests and riscof itself will fail because of
not having signature file(s) for this test, and that itself will abort the
task so no following signatures are compared.
Because of framework being ugly, such an issue is fatal not just for the
test, but for the whole testing. It is still fatal for the test itself, so
this is still an issue of the test, not the framework (at least untill
framework picks this test for execution).
—
Reply to this email directly, view it on GitHub
<#577 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHPXVJRGJHJWQIBDS6UVKH32E63HRAVCNFSM6AAAAABTLMEXI2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMZTGU4DKNZWGA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Unlucky enough, there is a locked region, so trap handler is not immune to PMP effect even being execuded in M-mode riscv-arch-test/riscv-test-suite/rv64i_m/pmp64/src/pmp64-NA4-R-priority-level-2.S Line 105 in cd94912
|
Then we need to fix that one so it isn't executed if NA4 isn't supported.
…On Tue, Dec 10, 2024 at 9:05 PM marcfedorow ***@***.***> wrote:
The PMP trap handler is in Mmode, and (except for locked entries) immune
to PMP effects.
Unlucky enough, there is a locked region, so trap handler is not immune to
PMP effect even being execuded in M-mode
https://github.com/riscv-non-isa/riscv-arch-test/blob/cd94912fed2aab75d7d5f115b441da0813fdce8d/riscv-test-suite/rv64i_m/pmp64/src/pmp64-NA4-R-priority-level-2.S#L105
—
Reply to this email directly, view it on GitHub
<#577 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHPXVJTZHBIN5BKM7XEORIL2E7BY5AVCNFSM6AAAAABTLMEXI2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMZTGY2DKNRQGA>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
OK, I am now convinced that there is a real bug. This will never even start
executing the test, it will set up the PMP entries, and immediately trap as
soon as the CMP entry is turned on.
I think this triggers a timeout (if I recall correctly), and no signature
is saved, but that does need fixing.
On Tue, Dec 10, 2024 at 9:09 PM Allen Baum ***@***.***>
wrote:
… Then we need to fix that one so it isn't executed if NA4 isn't supported.
On Tue, Dec 10, 2024 at 9:05 PM marcfedorow ***@***.***>
wrote:
> The PMP trap handler is in Mmode, and (except for locked entries) immune
> to PMP effects.
>
> Unlucky enough, there is a locked region, so trap handler is not immune
> to PMP effect even being execuded in M-mode
>
> https://github.com/riscv-non-isa/riscv-arch-test/blob/cd94912fed2aab75d7d5f115b441da0813fdce8d/riscv-test-suite/rv64i_m/pmp64/src/pmp64-NA4-R-priority-level-2.S#L105
>
> —
> Reply to this email directly, view it on GitHub
> <#577 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AHPXVJTZHBIN5BKM7XEORIL2E7BY5AVCNFSM6AAAAABTLMEXI2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMZTGY2DKNRQGA>
> .
> You are receiving this because you commented.Message ID:
> ***@***.***>
>
|
I hope this will not be fixed via allowing trap handler to overflow the signature. |
No; this won't be an easy fix, and ensuring there aren't any c.unimp won't
matter in the grand scheme of things.
…On Tue, Dec 10, 2024 at 9:56 PM marcfedorow ***@***.***> wrote:
Then we need to fix that one so it isn't executed if NA4 isn't supported.
I hope this will not be fixed via allowing trap handler to overflow the
signature.
—
Reply to this email directly, view it on GitHub
<#577 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHPXVJU2AT6IB6YCDVAZQ3T2E7HYLAVCNFSM6AAAAABTLMEXI2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMZTG4YDINZZGI>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Does it cause timeout or not purely relies on riscof_XXX.py implementation (i.e. is outside of ATG/riscof/etc control). Adequate implementation of riscof_XXX.py may or may not provide per-test time-outs. If it does not, the whole testing might be killed via external time-out with providing no results at all. If it does trigger a time-out though, riscof will abort execution on the first test that doesn't have signature file(s) to compare. This will fail the whole task, not just one test's signatures' comparison. This behavior should be fixed too, but that's a major issue of the framework, not of the test. |
Please file an issue on that.
…On Tue, Dec 10, 2024 at 10:02 PM marcfedorow ***@***.***> wrote:
I think this triggers a timeout (if I recall correctly), and no signature
is saved, but that does need fixing.
Does it cause timeout or not purely relies on riscof_XXX.py implementation
(i.e. is outside of ATG/riscof/etc control). Adequate implementation of
riscof_XXX.py may or may not provide per-test time-outs. If it does not,
the whole testing might be killed via external time-out with providing no
results at all.
If it does trigger a time-out though, riscof will abort execution on the
first test that doesn't have signature file(s) to compare. This will fail
the whole task, not just one test's signatures' comparison. This behavior
should be fixed too, but that's a major issue of the framework, not of the
test.
—
Reply to this email directly, view it on GitHub
<#577 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHPXVJS567PAQSCIGXJE6P32E7IO7AVCNFSM6AAAAABTLMEXI2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKMZTG4YTEOBUG4>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Edit: changed granularity to simplify an example
riscv-arch-test/riscv-test-suite/rv64i_m/pmp64/src/pmp64-NA4-R-priority-level-2.S
Line 158 in cd94912
Priv spec says the following:
Let's assume that DUT has PMP granularity of, e.g., 14 bits of address (region size of 0x4000).
Let's assume that if we write 0xYYYYYXXX (e.g. 0x20000212, correcponding to the address 0x80000828) to PMPADDRx, 0xYYYYY000 will be written (e.g. 0x20000000, corresponding to the address 0x80000000).
Let's assume that if we write 0x91 to the PMPCFGx (NA4 R L), actual value of 0x99 will be written (NAPOT R L) and 0x7ff will be OR'ed to the PMPADDRx (e.g. now PMP#x now protects the region [0x80000000..0x80003fff]).
Under the following assumptions I believe the above code will lead execution to the following:
In other words, the following will occure:
csrs
protects the whole region instead of just 4 bytes.While I suppose the suggested architecture to be compliant, the test will never finish being stuck in the infinite loop.
The text was updated successfully, but these errors were encountered: