Decompiler: PCode representation produced by `TypeOp::printRaw` is ambiguous #4951

LukeSerne · 2023-02-01T15:24:05Z

Describe the bug
The output produced by the `printRaw` function of various `TypeOp` subclasses is ambiguous. For example, any of INT_LESS, INT_SLESS or FLOAT_LESS could have produced the line `u0x10000012:1(0x800fb41c:61) = r3(0x800fb40c:19) < #0x0`. This can be observed when using the `decomp_dbg` binary.

To Reproduce
Steps to reproduce the behavior:

1. Compile `decomp_dbg`:
   1. Go to `SLEIGHHOME/Ghidra/Features/Decompiler/src/decompile/cpp`
   2. Run `make decomp_dbg`
2. Extract `example.xml` from `example.zip` and save it somewhere
3. Start the `decomp_dbg` program
4. `restore <path_to_example_xml>`
5. `load function main`
6. `print raw`
7. See that the same `<` symbol is printed for three different operations, and that it is impossible to tell from this output alone which comparison is INT_LESS, INT_SLESS or FLOAT_LESS.

Expected behavior
I expected to quickly see the difference between INT_LESS, INT_SLESS and FLOAT_LESS, perhaps using <, s< and f< respectively.

Observed behavior
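The output of `print raw` is shown below, with all unrelated PCode operations removed. The first `<` is FLOAT_LESS, the second is INT_LESS (the unsigned comparison that sets the carry flag CF), and the third is INT_SLESS (the signed comparison against zero that sets the sign flag SF). While it might be possible to infer that the first `<` is FLOAT_LESS based on the names of the inputs, it is much harder (impossible?) to differentiate between the latter two from the operator alone.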

0
Basic Block 0 0x00101139-0x001011ca
...
0x00101177:2f:    u0x00018a80:1(0x00101177:2f) = XMM0_Da(free) < XMM1_Da(free)
...
0x00101189:44:    CF(0x00101189:44) = EDX(free) < EAX(free)
...
0x00101189:47:    SF(0x00101189:47) = u0x00029800:4(free) < #0x0:4
...

Attachments
The source code of the example program is attached, as well as the xml obtained from compiling it with gcc, opening it in Ghidra and clicking "Debug Function Decompilation". These files are zipped into example.zip.

Environment:

Ghidra Version: 10.2.3
Ghidra Origin: locally built

Additional context
This ambiguity occurs several times. It seems that the issue of ambiguous printRaw output was detected previously, which caused INT_RIGHT and INT_SRIGHT to be represented by different symbols (>> and s>> respectively). As such, I think that a similar solution could be implemented for the remaining ambiguities. For example: INT_LESS could use <, INT_SLESS could use s< and FLOAT_LESS could use f<.
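The proposed naming scheme can be sketched as a small lookup. This is only an illustration under stated assumptions: the `Op` enum and the `rawSymbol` and `printBinary` helpers are hypothetical and are not part of Ghidra's decompiler sources, and the real `printRaw` methods write to an output stream rather than returning strings.

```cpp
#include <string>

// Hypothetical stand-in for the few opcodes discussed here.
// This is NOT Ghidra's OpCode enum; the names are illustrative only.
enum class Op { IntLess, IntSless, FloatLess, IntRight, IntSright };

// Return an unambiguous raw symbol for each opcode, following the
// existing ">>" / "s>>" precedent: no prefix for unsigned integer
// operations, "s" for signed ones, "f" for floating-point ones.
std::string rawSymbol(Op op) {
    switch (op) {
        case Op::IntLess:   return "<";
        case Op::IntSless:  return "s<";
        case Op::FloatLess: return "f<";
        case Op::IntRight:  return ">>";
        case Op::IntSright: return "s>>";
    }
    return "?";  // not reached for the enumerators listed above
}

// Format a binary operation in the "<out> = <in0> <name> <in1>" shape.
std::string printBinary(const std::string &out, const std::string &in0,
                        Op op, const std::string &in1) {
    return out + " = " + in0 + " " + rawSymbol(op) + " " + in1;
}
```

With this scheme, `printBinary("out", "a", Op::IntSless, "b")` yields `out = a s< b`, while the unsigned variant keeps the plain `<`, so the three comparisons can no longer be confused in the raw output.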

To find all ambiguities, I went through all classes defined in typeop.hh and described their printRaw representation. The resulting table is shown below. This table shows that there are 10 ambiguous symbols: (unary)-, ==, !=, <, <=, +, (binary)-, *, / and %. These ambiguities always come from PCode operations that differ only in whether the operation is signed or unsigned, or in whether it operates on integers or floating point numbers.


| CPUI Constant | TypeOp Class Name | `TypeOp::printRaw` Output |
| --- | --- | --- |
| N/A | TypeOpFunc | `<out> = <name>(<in0>,<in1>,...)` |
| N/A | TypeOpUnary | `<out> = <name> <in0>` |
| N/A | TypeOpBinary | `<out> = <in0> <name> <in1>` |
| --- | --- | --- |
| COPY | TypeOpCopy | `<out> = <in0>` |
| LOAD | TypeOpLoad | `<out> = *(<in0_space_name>,<in1>)` |
| STORE | TypeOpStore | `*(<in0_space_name>,<in1>) = <in2>` |
| BRANCH | TypeOpBranch | `goto <in0>` |
| CBRANCH | TypeOpCbranch | `goto <in0> if (<in1> == 0)` `goto <in0> if (<in1> != 0)` |
| BRANCHIND | TypeOpBranchind | `switch <in0>` |
| CALL | TypeOpCall | `<out> = call <in0>` `<out> = call <in0>(<in1>,<in2>,...)` `call <in0>` `call <in0>(<in1>,<in2>,...)` |
| CALLIND | TypeOpCallind | `<out> = callind <in0>` `<out> = callind <in0>(<in1>,<in2>,...)` `callind <in0>` `callind <in0>(<in1>,<in2>,...)` |
| CALLOTHER | TypeOpCallother | `<out> = syscall <opname_in0>` `<out> = syscall <opname_in0>(<in1>,<in2>,...)` `syscall <opname_in0>` `syscall <opname_in0>(<in1>,<in2>,...)` |
| RETURN | TypeOpReturn | `return` `return(<in0>)` `return(<in0>) <in1>,<in2>,...` |
| SEGMENTOP | TypeOpSegment | `<out> = segmentop(<in0_space_name>,<in1>,<in2>)` `segmentop(<in0_space_name>,<in1>,<in2>)` |
| CPOOLREF | TypeOpCpoolref | `<out> = cpoolref_<token>(<in0>,<in2>,<in3>,...)` `<out> = cpoolref_<token>(<in0>)` `cpoolref_<token>(<in0>,<in2>,<in3>,...)` `cpoolref_<token>(<in0>)` |
| NEW | TypeOpNew | `<out> = new(<in0>)` `<out> = new(<in0>,<in1>,...)` `new(<in0>)` `new(<in0>,<in1>,...)` |
| MULTIEQUAL | TypeOpMulti | `<out> = <in0> NAME` `<out> = <in0> NAME <in1> NAME <in2> ...` |
| INDIRECT | TypeOpIndirect | `<out> = <in0> [] <in1>` `<out> = [create] <in1>` |
| CAST | TypeOpCast | `<out> = (cast) <in0>` |
| PTRADD | TypeOpPtradd | `<out> = <in0> + <in1>(*<in2>)` |
| PTRSUB | TypeOpPtrsub | `<out> = <in0> -> <in1>` |
| INT_SRIGHT | TypeOpIntSright | `<out> = <in0> s>> <in1>` |
| INT_2COMP | TypeOpInt2Comp | `TypeOpUnary (-)` |
| FLOAT_NEG | TypeOpFloatNeg | `TypeOpUnary (-)` |
| INT_NEGATE | TypeOpIntNegate | `TypeOpUnary (~)` |
| BOOL_NEGATE | TypeOpBoolNegate | `TypeOpUnary (!)` |
| INT_EQUAL | TypeOpEqual | `TypeOpBinary (==)` |
| FLOAT_EQUAL | TypeOpFloatEqual | `TypeOpBinary (==)` |
| INT_NOTEQUAL | TypeOpNotEqual | `TypeOpBinary (!=)` |
| FLOAT_NOTEQUAL | TypeOpFloatNotEqual | `TypeOpBinary (!=)` |
| INT_SLESS | TypeOpIntSless | `TypeOpBinary (<)` |
| INT_LESS | TypeOpIntLess | `TypeOpBinary (<)` |
| FLOAT_LESS | TypeOpFloatLess | `TypeOpBinary (<)` |
| INT_SLESSEQUAL | TypeOpIntSlessEqual | `TypeOpBinary (<=)` |
| INT_LESSEQUAL | TypeOpIntLessEqual | `TypeOpBinary (<=)` |
| FLOAT_LESSEQUAL | TypeOpFloatLessEqual | `TypeOpBinary (<=)` |
| INT_ADD | TypeOpIntAdd | `TypeOpBinary (+)` |
| FLOAT_ADD | TypeOpFloatAdd | `TypeOpBinary (+)` |
| INT_SUB | TypeOpIntSub | `TypeOpBinary (-)` |
| FLOAT_SUB | TypeOpFloatSub | `TypeOpBinary (-)` |
| INT_XOR | TypeOpIntXor | `TypeOpBinary (^)` |
| INT_AND | TypeOpIntAnd | `TypeOpBinary (&)` |
| INT_OR | TypeOpIntOr | `TypeOpBinary (\|)` |
| INT_LEFT | TypeOpIntLeft | `TypeOpBinary (<<)` |
| INT_RIGHT | TypeOpIntRight | `TypeOpBinary (>>)` |
| INT_MULT | TypeOpIntMult | `TypeOpBinary (*)` |
| FLOAT_MULT | TypeOpFloatMult | `TypeOpBinary (*)` |
| INT_SDIV | TypeOpIntSdiv | `TypeOpBinary (/)` |
| INT_DIV | TypeOpIntDiv | `TypeOpBinary (/)` |
| FLOAT_DIV | TypeOpFloatDiv | `TypeOpBinary (/)` |
| INT_REM | TypeOpIntRem | `TypeOpBinary (%)` |
| INT_SREM | TypeOpIntSrem | `TypeOpBinary (%)` |
| BOOL_XOR | TypeOpBoolXor | `TypeOpBinary (^^)` |
| BOOL_AND | TypeOpBoolAnd | `TypeOpBinary (&&)` |
| BOOL_OR | TypeOpBoolOr | `TypeOpBinary (\|\|)` |
| INT_ZEXT | TypeOpIntZext | `TypeOpFunc (ZEXT<insize><outsize>)` |
| INT_SEXT | TypeOpIntSext | `TypeOpFunc (SEXT<insize><outsize>)` |
| INT_CARRY | TypeOpIntCarry | `TypeOpFunc (CARRY<insize>)` |
| INT_SCARRY | TypeOpIntScarry | `TypeOpFunc (SCARRY<insize>)` |
| INT_SBORROW | TypeOpIntSborrow | `TypeOpFunc (SBORROW<insize>)` |
| FLOAT_NAN | TypeOpFloatNan | `TypeOpFunc (NAN)` |
| FLOAT_ABS | TypeOpFloatAbs | `TypeOpFunc (ABS)` |
| FLOAT_SQRT | TypeOpFloatSqrt | `TypeOpFunc (SQRT)` |
| FLOAT_INT2FLOAT | TypeOpFloatInt2Float | `TypeOpFunc (INT2FLOAT)` |
| FLOAT_FLOAT2FLOAT | TypeOpFloatFloat2Float | `TypeOpFunc (FLOAT2FLOAT)` |
| FLOAT_TRUNC | TypeOpFloatTrunc | `TypeOpFunc (TRUNC)` |
| FLOAT_CEIL | TypeOpFloatCeil | `TypeOpFunc (CEIL)` |
| FLOAT_FLOOR | TypeOpFloatFloor | `TypeOpFunc (FLOOR)` |
| FLOAT_ROUND | TypeOpFloatRound | `TypeOpFunc (ROUND)` |
| PIECE | TypeOpPiece | `TypeOpFunc (CONCAT<in0_size><in1_size>)` |
| SUBPIECE | TypeOpSubpiece | `TypeOpFunc (SUB<in0_size><in1_size>)` |
| INSERT | TypeOpInsert | `TypeOpFunc (INSERT)` |
| EXTRACT | TypeOpExtract | `TypeOpFunc (EXTRACT)` |
| POPCOUNT | TypeOpPopcount | `TypeOpFunc (POPCOUNT)` |
| COUNTLEADINGZEROS | TypeOpCountLeadingZeros | `TypeOpFunc (COUNTLEADINGZEROS)` |
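
The count of 10 ambiguous symbols can be cross-checked mechanically. The sketch below transcribes the `TypeOpUnary` and `TypeOpBinary` rows of the table (plus INT_SRIGHT) into a list and groups the opcodes by printed symbol. The function names `rawSymbols` and `ambiguousSymbols` are hypothetical helpers written for this illustration, and the `unary -` / `binary -` keys are an assumption made here to keep the two uses of `-` apart; none of this is Ghidra code.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Row = std::pair<std::string, std::string>;  // (opcode, printed symbol)

// Rows transcribed from the table; unary and binary minus get separate keys.
std::vector<Row> rawSymbols() {
    return {
        {"INT_2COMP", "unary -"}, {"FLOAT_NEG", "unary -"},
        {"INT_NEGATE", "~"}, {"BOOL_NEGATE", "!"},
        {"INT_EQUAL", "=="}, {"FLOAT_EQUAL", "=="},
        {"INT_NOTEQUAL", "!="}, {"FLOAT_NOTEQUAL", "!="},
        {"INT_SLESS", "<"}, {"INT_LESS", "<"}, {"FLOAT_LESS", "<"},
        {"INT_SLESSEQUAL", "<="}, {"INT_LESSEQUAL", "<="},
        {"FLOAT_LESSEQUAL", "<="},
        {"INT_ADD", "+"}, {"FLOAT_ADD", "+"},
        {"INT_SUB", "binary -"}, {"FLOAT_SUB", "binary -"},
        {"INT_XOR", "^"}, {"INT_AND", "&"}, {"INT_OR", "|"},
        {"INT_LEFT", "<<"}, {"INT_RIGHT", ">>"}, {"INT_SRIGHT", "s>>"},
        {"INT_MULT", "*"}, {"FLOAT_MULT", "*"},
        {"INT_SDIV", "/"}, {"INT_DIV", "/"}, {"FLOAT_DIV", "/"},
        {"INT_REM", "%"}, {"INT_SREM", "%"},
        {"BOOL_XOR", "^^"}, {"BOOL_AND", "&&"}, {"BOOL_OR", "||"},
    };
}

// Group opcodes by symbol and keep only symbols shared by two or more.
std::map<std::string, std::vector<std::string>>
ambiguousSymbols(const std::vector<Row> &rows) {
    std::map<std::string, std::vector<std::string>> bySymbol;
    for (const Row &row : rows)
        bySymbol[row.second].push_back(row.first);
    for (auto it = bySymbol.begin(); it != bySymbol.end();) {
        if (it->second.size() < 2)
            it = bySymbol.erase(it);  // symbol maps to a single opcode
        else
            ++it;
    }
    return bySymbol;
}
```

Calling `ambiguousSymbols(rawSymbols())` leaves one entry per shared symbol, and the entry for `<` lists INT_SLESS, INT_LESS and FLOAT_LESS.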


ryanmkurtz assigned caheckman Feb 1, 2023

ryanmkurtz added the Feature: Decompiler and Status: Triage labels Feb 1, 2023

LukeSerne linked a pull request Mar 5, 2023 that will close this issue

Decompiler: Add printRaw implementation for ambiguous TypeOps #5063

Open

LukeSerne mentioned this issue Dec 23, 2024

Export p-code and intermediate steps in "Debug Function Decompilation" #7315

Open
