Decompiler: PCode representation produced by TypeOp::printRaw is ambiguous #4951

Open
LukeSerne opened this issue Feb 1, 2023 · 0 comments · May be fixed by #5063
Labels: Feature: Decompiler, Status: Triage (Information is being gathered)

LukeSerne (Contributor) commented Feb 1, 2023

Describe the bug
The output produced by the printRaw function of various TypeOp subclasses is ambiguous. For example, any of INT_LESS, INT_SLESS or FLOAT_LESS could have produced the line u0x10000012:1(0x800fb41c:61) = r3(0x800fb40c:19) < #0x0. This can be observed when using the decomp_dbg binary.

To Reproduce
Steps to reproduce the behavior:

  1. Compile decomp_dbg:
    1. Go to SLEIGHHOME/Ghidra/Features/Decompiler/src/decompile/cpp
    2. Run make decomp_dbg
  2. Extract example.xml from example.zip and save it somewhere
  3. Start the decomp_dbg program
  4. restore <path_to_example_xml>
  5. load function main
  6. print raw
  7. See that three different comparisons are all printed with the same < symbol, so it is impossible to tell from this output alone which comparison is INT_LESS, which is INT_SLESS and which is FLOAT_LESS.

Expected behavior
I expected to quickly see the difference between INT_LESS, INT_SLESS and FLOAT_LESS, perhaps using <, s< and f< respectively.

Observed behavior
The output of print raw is shown below, with all unrelated PCode operations removed. The first < is FLOAT_LESS, the second (which sets CF) is INT_LESS, and the third (which sets SF) is INT_SLESS. While it might be possible to infer that the first < is FLOAT_LESS based on the names of the inputs, it is much harder (impossible?) to differentiate between the latter two.

0
Basic Block 0 0x00101139-0x001011ca
...
0x00101177:2f:    u0x00018a80:1(0x00101177:2f) = XMM0_Da(free) < XMM1_Da(free)
...
0x00101189:44:    CF(0x00101189:44) = EDX(free) < EAX(free)
...
0x00101189:47:    SF(0x00101189:47) = u0x00029800:4(free) < #0x0:4
...

Attachments
The source code of the example program is attached, as well as the xml obtained from compiling it with gcc, opening it in Ghidra and clicking "Debug Function Decompilation". These files are zipped into example.zip.

Environment:

  • Ghidra Version: 10.2.3
  • Ghidra Origin: locally built

Additional context
This ambiguity occurs several times. It seems that the issue of ambiguous printRaw output was detected previously, which caused INT_RIGHT and INT_SRIGHT to be represented by different symbols (>> and s>> respectively). As such, I think that a similar solution could be implemented for the remaining ambiguities. For example: INT_LESS could use <, INT_SLESS could use s< and FLOAT_LESS could use f<.

To find all ambiguities, I went through all classes defined in typeop.hh and described their printRaw representation. The resulting table is shown below. This table shows that there are 10 ambiguous symbols: (unary)-, ==, !=, <, <=, +, (binary)-, *, / and %. These ambiguities always come from PCode operations that differ only in whether the operation is signed or unsigned, or whether it operates on integers or floating-point numbers.

CPUI Constant      TypeOp Class Name        TypeOp::printRaw Output
-----------------  -----------------------  -----------------------
N/A                TypeOpFunc               <out> = <name>(<in0>,<in1>,...)
N/A                TypeOpUnary              <out> = <name> <in0>
N/A                TypeOpBinary             <out> = <in0> <name> <in1>
-----------------  -----------------------  -----------------------
COPY               TypeOpCopy               <out> = <in0>
LOAD               TypeOpLoad               <out> = *(<in0_space_name>,<in1>)
STORE              TypeOpStore              *(<in0_space_name>,<in1>) = <in2>
BRANCH             TypeOpBranch             goto <in0>
CBRANCH            TypeOpCbranch            goto <in0> if (<in1> == 0)
                                            goto <in0> if (<in1> != 0)
BRANCHIND          TypeOpBranchind          switch <in0>
CALL               TypeOpCall               <out> = call <in0>
                                            <out> = call <in0>(<in1>,<in2>,...)
                                            call <in0>
                                            call <in0>(<in1>,<in2>,...)
CALLIND            TypeOpCallind            <out> = callind <in0>
                                            <out> = callind <in0>(<in1>,<in2>,...)
                                            callind <in0>
                                            callind <in0>(<in1>,<in2>,...)
CALLOTHER          TypeOpCallother          <out> = syscall <opname_in0>
                                            <out> = syscall <opname_in0>(<in1>,<in2>,...)
                                            syscall <opname_in0>
                                            syscall <opname_in0>(<in1>,<in2>,...)
RETURN             TypeOpReturn             return
                                            return(<in0>)
                                            return(<in0>) <in1>,<in2>,...
SEGMENTOP          TypeOpSegment            <out> = segmentop(<in0_space_name>,<in1>,<in2>)
                                            segmentop(<in0_space_name>,<in1>,<in2>)
CPOOLREF           TypeOpCpoolref           <out> = cpoolref_<token>(<in0>,<in2>,<in3>,...)
                                            <out> = cpoolref_<token>(<in0>)
                                            cpoolref_<token>(<in0>,<in2>,<in3>,...)
                                            cpoolref_<token>(<in0>)
NEW                TypeOpNew                <out> = new(<in0>)
                                            <out> = new(<in0>,<in1>,...)
                                            new(<in0>)
                                            new(<in0>,<in1>,...)
MULTIEQUAL         TypeOpMulti              <out> = <in0> NAME
                                            <out> = <in0> NAME <in1> NAME <in2> ...
INDIRECT           TypeOpIndirect           <out> = <in0> [] <in1>
                                            <out> = [create] <in1>
CAST               TypeOpCast               <out> = (cast) <in0>
PTRADD             TypeOpPtradd             <out> = <in0> + <in1>(*<in2>)
PTRSUB             TypeOpPtrsub             <out> = <in0> -> <in1>
INT_SRIGHT         TypeOpIntSright          <out> = <in0> s>> <in1>
INT_2COMP          TypeOpInt2Comp           TypeOpUnary (-)
FLOAT_NEG          TypeOpFloatNeg           TypeOpUnary (-)
INT_NEGATE         TypeOpIntNegate          TypeOpUnary (~)
BOOL_NEGATE        TypeOpBoolNegate         TypeOpUnary (!)
INT_EQUAL          TypeOpEqual              TypeOpBinary (==)
FLOAT_EQUAL        TypeOpFloatEqual         TypeOpBinary (==)
INT_NOTEQUAL       TypeOpNotEqual           TypeOpBinary (!=)
FLOAT_NOTEQUAL     TypeOpFloatNotEqual      TypeOpBinary (!=)
INT_SLESS          TypeOpIntSless           TypeOpBinary (<)
INT_LESS           TypeOpIntLess            TypeOpBinary (<)
FLOAT_LESS         TypeOpFloatLess          TypeOpBinary (<)
INT_SLESSEQUAL     TypeOpIntSlessEqual      TypeOpBinary (<=)
INT_LESSEQUAL      TypeOpIntLessEqual       TypeOpBinary (<=)
FLOAT_LESSEQUAL    TypeOpFloatLessEqual     TypeOpBinary (<=)
INT_ADD            TypeOpIntAdd             TypeOpBinary (+)
FLOAT_ADD          TypeOpFloatAdd           TypeOpBinary (+)
INT_SUB            TypeOpIntSub             TypeOpBinary (-)
FLOAT_SUB          TypeOpFloatSub           TypeOpBinary (-)
INT_XOR            TypeOpIntXor             TypeOpBinary (^)
INT_AND            TypeOpIntAnd             TypeOpBinary (&)
INT_OR             TypeOpIntOr              TypeOpBinary (|)
INT_LEFT           TypeOpIntLeft            TypeOpBinary (<<)
INT_RIGHT          TypeOpIntRight           TypeOpBinary (>>)
INT_MULT           TypeOpIntMult            TypeOpBinary (*)
FLOAT_MULT         TypeOpFloatMult          TypeOpBinary (*)
INT_SDIV           TypeOpIntSdiv            TypeOpBinary (/)
INT_DIV            TypeOpIntDiv             TypeOpBinary (/)
FLOAT_DIV          TypeOpFloatDiv           TypeOpBinary (/)
INT_REM            TypeOpIntRem             TypeOpBinary (%)
INT_SREM           TypeOpIntSrem            TypeOpBinary (%)
BOOL_XOR           TypeOpBoolXor            TypeOpBinary (^^)
BOOL_AND           TypeOpBoolAnd            TypeOpBinary (&&)
BOOL_OR            TypeOpBoolOr             TypeOpBinary (||)
INT_ZEXT           TypeOpIntZext            TypeOpFunc (ZEXT<insize><outsize>)
INT_SEXT           TypeOpIntSext            TypeOpFunc (SEXT<insize><outsize>)
INT_CARRY          TypeOpIntCarry           TypeOpFunc (CARRY<insize>)
INT_SCARRY         TypeOpIntScarry          TypeOpFunc (SCARRY<insize>)
INT_SBORROW        TypeOpIntSborrow         TypeOpFunc (SBORROW<insize>)
FLOAT_NAN          TypeOpFloatNan           TypeOpFunc (NAN)
FLOAT_ABS          TypeOpFloatAbs           TypeOpFunc (ABS)
FLOAT_SQRT         TypeOpFloatSqrt          TypeOpFunc (SQRT)
FLOAT_INT2FLOAT    TypeOpFloatInt2Float     TypeOpFunc (INT2FLOAT)
FLOAT_FLOAT2FLOAT  TypeOpFloatFloat2Float   TypeOpFunc (FLOAT2FLOAT)
FLOAT_TRUNC        TypeOpFloatTrunc         TypeOpFunc (TRUNC)
FLOAT_CEIL         TypeOpFloatCeil          TypeOpFunc (CEIL)
FLOAT_FLOOR        TypeOpFloatFloor         TypeOpFunc (FLOOR)
FLOAT_ROUND        TypeOpFloatRound         TypeOpFunc (ROUND)
PIECE              TypeOpPiece              TypeOpFunc (CONCAT<in0_size><in1_size>)
SUBPIECE           TypeOpSubpiece           TypeOpFunc (SUB<in0_size><in1_size>)
INSERT             TypeOpInsert             TypeOpFunc (INSERT)
EXTRACT            TypeOpExtract            TypeOpFunc (EXTRACT)
POPCOUNT           TypeOpPopcount           TypeOpFunc (POPCOUNT)
COUNTLEADINGZEROS  TypeOpCountLeadingZeros  TypeOpFunc (COUNTLEADINGZEROS)