Describe the bug
The output produced by the printRaw function of various TypeOp subclasses is ambiguous. For example, any of INT_LESS, INT_SLESS, or FLOAT_LESS could have produced the line u0x10000012:1(0x800fb41c:61) = r3(0x800fb40c:19) < #0x0. This can be observed when using the decomp_dbg binary.
To Reproduce
Steps to reproduce the behavior:
1. Compile decomp_dbg:
   - Go to SLEIGHHOME/Ghidra/Features/Decompiler/src/decompile/cpp
   - Run make decomp_dbg
2. Extract example.xml from example.zip and save it somewhere
3. Start the decomp_dbg program
4. restore <path_to_example_xml>
5. load function main
6. print raw
7. See that the same line is printed 3 times, and that it is impossible to tell from this output alone which comparison is INT_LESS, which is INT_SLESS, and which is FLOAT_LESS.
Expected behavior
I expected to quickly see the difference between INT_LESS, INT_SLESS and FLOAT_LESS, perhaps using <, s< and f< respectively.
Observed behavior
The output of print raw, with all unrelated PCode operations removed. The first < is FLOAT_LESS, the second is INT_SLESS, and the third is INT_LESS. While it might be possible to infer that the first < is FLOAT_LESS based on the name of the inputs, it is much harder (impossible?) to differentiate between the latter two.
Attachments
The source code of the example program is attached, as well as the xml obtained from compiling it with gcc, opening it in Ghidra and clicking "Debug Function Decompilation". These files are zipped into example.zip.
Environment:
Ghidra Version: 10.2.3
Ghidra Origin: locally built
Additional context
This ambiguity occurs several times. It seems that the issue of ambiguous printRaw output was detected previously, which caused INT_RIGHT and INT_SRIGHT to be represented by different symbols (>> and s>> respectively). As such, I think that a similar solution could be implemented for the remaining ambiguities. For example: INT_LESS could use <, INT_SLESS could use s< and FLOAT_LESS could use f<.
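The proposal above can be sketched as follows. This is a minimal, standalone illustration, not the decompiler's code: the `LessOp` enum and the functions `rawSymbol` and `printRawLine` are hypothetical names introduced here, and the output format simply mirrors the `<out> = <in0> <name> <in1>` shape that the issue attributes to TypeOpBinary.

```cpp
#include <string>

// Hypothetical stand-in for the three opcodes discussed in this issue.
// This is NOT the decompiler's own opcode enum.
enum class LessOp { IntLess, IntSLess, FloatLess };

// Proposed raw-print symbol for each comparison, following the existing
// ">>" / "s>>" convention used for INT_RIGHT / INT_SRIGHT.
std::string rawSymbol(LessOp op) {
    switch (op) {
        case LessOp::IntLess:   return "<";
        case LessOp::IntSLess:  return "s<";
        case LessOp::FloatLess: return "f<";
    }
    return "?";  // unreachable for valid enum values
}

// Build a line in the "<out> = <in0> <name> <in1>" shape.
std::string printRawLine(const std::string& out, const std::string& in0,
                         LessOp op, const std::string& in1) {
    return out + " = " + in0 + " " + rawSymbol(op) + " " + in1;
}
```

With this mapping, the three comparisons in the example would print with `<`, `s<` and `f<`, so each line identifies its opcode without needing any other context.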
To find all ambiguities, I went through all classes defined in typeop.hh and described their printRaw representation. The resulting table is shown below. This table shows that there are 10 ambiguous symbols: (unary)-, ==, !=, <, <=, +, (binary)-, *, / and %. These ambiguities always come from PCode operations that differ only in whether the operation is signed or unsigned, or in whether it operates on integers or floating point numbers.
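The manual survey described here amounts to grouping operations by their printed symbol and keeping the symbols that are shared. A rough sketch of that check, assuming a hand-written list of (operation, symbol) pairs: the names `OpSymbol`, `findAmbiguousSymbols` and `sampleTable` are hypothetical, and the sample rows cover only the five operations named explicitly in this issue, not the full table.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// (operation name, symbol printed by printRaw); hypothetical helper type.
using OpSymbol = std::pair<std::string, std::string>;

// Group operations by symbol and keep only symbols used by 2+ operations.
std::map<std::string, std::vector<std::string>>
findAmbiguousSymbols(const std::vector<OpSymbol>& table) {
    std::map<std::string, std::vector<std::string>> bySymbol;
    for (const auto& entry : table) {
        bySymbol[entry.second].push_back(entry.first);
    }
    for (auto it = bySymbol.begin(); it != bySymbol.end();) {
        if (it->second.size() < 2) {
            it = bySymbol.erase(it);  // symbol is unique, so not ambiguous
        } else {
            ++it;
        }
    }
    return bySymbol;
}

// Sample rows taken from the operations mentioned in this issue only.
std::vector<OpSymbol> sampleTable() {
    return {
        {"INT_LESS", "<"},   {"INT_SLESS", "<"},    {"FLOAT_LESS", "<"},
        {"INT_RIGHT", ">>"}, {"INT_SRIGHT", "s>>"},
    };
}
```

On the sample rows, only `<` is reported, with three operations behind it; `>>` and `s>>` are already distinct and drop out.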
Click here for the full table
| CPUI Constant | TypeOp Class Name | TypeOp::printRaw Output |
| --- | --- | --- |
| N/A | TypeOpFunc | `<out> = <name>(<in0>,<in1>,...)` |
| N/A | TypeOpUnary | `<out> = <name> <in0>` |
| N/A | TypeOpBinary | `<out> = <in0> <name> <in1>` |
| --- | --- | --- |
| COPY | TypeOpCopy | `<out> = <in0>` |
| LOAD | TypeOpLoad | `<out> = *(<in0_space_name>,<in1>)` |
| STORE | TypeOpStore | `*(<in0_space_name>,<in1>) = <in2>` |
| BRANCH | TypeOpBranch | `goto <in0>` |
| CBRANCH | TypeOpCbranch | `goto <in0> if (<in1> == 0)` or `goto <in0> if (<in1> != 0)` |
TypeOp::printRaw Output
<out> = <name>(<in0>,<in1>,...)
<out> = <name> <in0>
<out> = <in0> <name> <in1>
<out> = <in0>
<out> = *(<in0_space_name>,<in1>)
*(<in0_space_name>,<in1>) = <in2>
goto <in0>
goto <in0> if (<in1> == 0)
goto <in0> if (<in1> != 0)
switch <in0>
<out> = call <in0>
<out> = call <in0>(<in1>,<in2>,...)
call <in0>
call <in0>(<in1>,<in2>,...)
<out> = callind <in0>
<out> = callind <in0>(<in1>,<in2>,...)
callind <in0>
callind <in0>(<in1>,<in2>,...)
<out> = syscall <opname_in0>
<out> = syscall <opname_in0>(<in1>,<in2>,...)
syscall <opname_in0>
syscall <opname_in0>(<in1>,<in2>,...)
return
return(<in0>)
return(<in0>) <in1>,<in2>,...
<out> = segmentop(<in0_space_name>,<in1>,<in2>)
segmentop(<in0_space_name>,<in1>,<in2>)
<out> = cpoolref_<token>(<in0>,<in2>,<in3>,...)
<out> = cpoolref_<token>(<in0>)
cpoolref_<token>(<in0>,<in2>,<in3>,...)
cpoolref_<token>(<in0>)
<out> = new(<in0>)
<out> = new(<in0>,<in1>,...)
new(<in0>)
new(<in0>,<in1>,...)
<out> = <in0> NAME
<out> = <in0> NAME <in1> NAME <in2> ...
<out> = <in0> [] <in1>
<out> = [create] <in1>
<out> = (cast) <in0>
<out> = <in0> + <in1>(*<in2>)
<out> = <in0> -> <in1>
<out> = <in0> s>> <in1>
TypeOpUnary (-)
TypeOpUnary (-)
TypeOpUnary (~)
TypeOpUnary (!)
TypeOpBinary (==)
TypeOpBinary (==)
TypeOpBinary (!=)
TypeOpBinary (!=)
TypeOpBinary (<)
TypeOpBinary (<)
TypeOpBinary (<)
TypeOpBinary (<=)
TypeOpBinary (<=)
TypeOpBinary (<=)
TypeOpBinary (+)
TypeOpBinary (+)
TypeOpBinary (-)
TypeOpBinary (-)
TypeOpBinary (^)
TypeOpBinary (&)
TypeOpBinary (|)
TypeOpBinary (<<)
TypeOpBinary (>>)
TypeOpBinary (*)
TypeOpBinary (*)
TypeOpBinary (/)
TypeOpBinary (/)
TypeOpBinary (/)
TypeOpBinary (%)
TypeOpBinary (%)
TypeOpBinary (^^)
TypeOpBinary (&&)
TypeOpBinary (||)
TypeOpFunc (ZEXT<insize><outsize>)
TypeOpFunc (SEXT<insize><outsize>)
TypeOpFunc (CARRY<insize>)
TypeOpFunc (SCARRY<insize>)
TypeOpFunc (SBORROW<insize>)
TypeOpFunc (NAN)
TypeOpFunc (ABS)
TypeOpFunc (SQRT)
TypeOpFunc (INT2FLOAT)
TypeOpFunc (FLOAT2FLOAT)
TypeOpFunc (TRUNC)
TypeOpFunc (CEIL)
TypeOpFunc (FLOOR)
TypeOpFunc (ROUND)
TypeOpFunc (CONCAT<in0_size><in1_size>)
TypeOpFunc (SUB<in0_size><in1_size>)
TypeOpFunc (INSERT)
TypeOpFunc (EXTRACT)
TypeOpFunc (POPCOUNT)
TypeOpFunc (COUNTLEADINGZEROS)