In emulation mode, u8-u64 should be represented by just int #2173

certik · 2023-07-17T19:58:27Z

Currently we created our own class UnsignedInteger and we provide a CPython incompatible behavior of some operators such as ~. Rather, if we need a CPython incompatible behavior, we should introduce some special functions, such as from lpython import uneg. The intrinsic operators should work just like in CPython.

Here is an example:

print(u16(0xff), 0xff)
print(~u16(0xff), ~0xff)

This currently prints:

255 255
65280 -256

So the ~ is behaving differently for u16 and Python ints. It should not. The design of u16 and i16 is that they are strict value restricted subset of Python ints and should behave exactly identical on this subset, and LPython should give compile time or runtime error messages if any operation is performed outside of this subset.

Since in CPython ~255 becomes -256, then the same behavior must happen on ~u16(255). In order to provide the other operation, we can make uneg(255) be 65280. Then the operation can be correctly implemented in emulation mode, and all unsigned integers can be just integers, which will speedup operation and remove many bugs that we currently have, when we are passing UnsignedInteger somewhere where a regular integer is expected.

I think the current behavior was introduced by trying to emulate C, but Python is not C. We need to be compatible with Python, not C. If C behavior is desired, it must be done using functions like uneg, not using intrinsic integer operators (which should rather behave like CPython does).

The text was updated successfully, but these errors were encountered:

certik · 2023-07-17T21:49:47Z

The "-" is two's complement, so

-(1) = -1
-(2) = -2
-(-1) = 1
-(-2) = 2

The "~" is one's complement, so:

~(1) = -2
~(2) = -3
~(-2) = 1
~(-3) = 2

And in Python it also does:

~(0) = -1
~(-1) = 0

The relation between "-" and "~" is:

~x = -x - 1
-x = ~x + 1

Using these definitions, we have ~0xff = ~255 = -255-1 = -256. Currently in LPython we have (by a mistake) implemented a wraparound and represented ~u16(0xff) = -256 as 65280, which is a mistake. The correct behavior is to give a runtime (or compile time if possible) exception that -256 is out of range for u16.

We have to provide other functions to implement the wraparound if that is needed.

certik · 2023-07-17T21:57:14Z

As a consequence, since ~x = -x-1, this operation is by definition not supported for unsigned integers, because ~ applied to a non-negative number is negative, outside of range of unsigned. So to fix this:

make LPython give a nice error message for ~ applied to unsigned, saying the result is out of range and providing suggestions how to obtain this operation (by properly casting to signed first, and then applying ~, or by using bitnot)
investigate if we need uneg intrinsic function, or if the workaround via casting to signed is good enough. Answer: yes, we need that, and it will be called bitnot(u), implemented for u8, u16, u32, u64 (I would not implement it for signed integers s, since there it is equivalent to ~s) and represented by our current UnsignedIntegerBitNot node, which implements the bit not operation and wraps around.
Implement bitnot(u)

certik · 2023-07-17T22:01:01Z

Example: this

u: u16 = u16(255)
u = ~u

will not compile. Must be rewritten to:

u: u16 = u16(255)
u = u16(~i16(u)+2**16)

This is not super easy to type, so we could provide casting from signed to unsigned with wrap around, such as:

u: u16 = u16(255)
u = u16_wrap(~i16(u))

Note that in full emulation mode, u16 and i16 is a no-op, but u16_wrap correctly wraps it using 2**16 (a simple implementation). So effectively it becomes:

u = 255
u = u16_wrap(~u)

So u is just a CPython int, and ~ is just a CPython ~ and of course u16_wrap is our own function implementing the wrapping. Everything is simple, compatible with CPython and we don't need the UnsignedInteger in CPython mode that is causing troubles.

@Shaikh-Ubaid, do you want to implement it?

certik · 2023-07-17T22:07:24Z

The error message for:

u: u16 = u16(255)
u = ~u

Could be:

The result of the bit not ~ operation is negative, thus out of range for u16.
u = ~u
    ^^ help: use ~i32(u) for signed result or bitnot(u) for unsigned result

It should offer ~u8(u) -> ~i16(u), ~u16(u) -> ~i32(u), ~u32(u) -> ~i64(u), ~u64(u) -> ~i64(u); The first three cases always succeed, in the last case the i64(u) cast can sometimes fail at runtime/compile time if it can't represent the large u64 value.

Shaikh-Ubaid · 2023-07-18T01:47:29Z

It seems using ~ on unsigned integers, LPython would be able to print an error, but I think CPython would run (and produce a negative value). Is it fine that LPython fails and CPython runs? Should not CPython also support checks for runtime overflow for unsigned integers?

Shaikh-Ubaid · 2023-07-18T01:54:17Z

What should be the behaviour for casting a negative integer to unsigned integer in LPython as well as CPython. For example, what should be the output of the following in CPython and LPython.

print(u16(-5))

Shaikh-Ubaid · 2023-07-18T02:21:21Z

Also, do we need to support runtime checks for binary operations on integers as well as unsigned integers? Is it a part of this issue?

Shaikh-Ubaid · 2023-07-18T15:51:38Z

With respect to this #2173 (comment), similar to LPython, should CPython catch and report overflow errors?

Shaikh-Ubaid · 2023-07-19T19:17:10Z

Here #2173 (comment) we discussed about u16_wrap(~u) for CPython. We currently have bitnot_u16(u) in place of it. Please share if it is fine.

Shaikh-Ubaid · 2023-07-19T19:17:42Z

I think with merging of #2189 we would complete this issue.

certik · 2023-07-19T19:21:19Z

Perfect! Thank you @Shaikh-Ubaid.

certik assigned Shaikh-Ubaid Jul 17, 2023

certik mentioned this issue Jul 17, 2023

Fix unsigned integer bitnot #2061

Merged

This was referenced Jul 19, 2023

Unsigned ints no wrap #2176

Merged

Unsigned integers no wrap II #2189

Merged

Shaikh-Ubaid closed this as completed in #2189 Jul 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

In emulation mode, u8-u64 should be represented by just int #2173

In emulation mode, u8-u64 should be represented by just int #2173

certik commented Jul 17, 2023 •

edited

Loading

certik commented Jul 17, 2023 •

edited

Loading

certik commented Jul 17, 2023 •

edited by Shaikh-Ubaid

Loading

certik commented Jul 17, 2023 •

edited

Loading

certik commented Jul 17, 2023 •

edited

Loading

Shaikh-Ubaid commented Jul 18, 2023

Shaikh-Ubaid commented Jul 18, 2023

Shaikh-Ubaid commented Jul 18, 2023

Shaikh-Ubaid commented Jul 18, 2023

Shaikh-Ubaid commented Jul 19, 2023

Shaikh-Ubaid commented Jul 19, 2023

certik commented Jul 19, 2023

In emulation mode, u8-u64 should be represented by just int #2173

In emulation mode, u8-u64 should be represented by just int #2173

Comments

certik commented Jul 17, 2023 • edited Loading

certik commented Jul 17, 2023 • edited Loading

certik commented Jul 17, 2023 • edited by Shaikh-Ubaid Loading

certik commented Jul 17, 2023 • edited Loading

certik commented Jul 17, 2023 • edited Loading

Shaikh-Ubaid commented Jul 18, 2023

Shaikh-Ubaid commented Jul 18, 2023

Shaikh-Ubaid commented Jul 18, 2023

Shaikh-Ubaid commented Jul 18, 2023

Shaikh-Ubaid commented Jul 19, 2023

Shaikh-Ubaid commented Jul 19, 2023

certik commented Jul 19, 2023

certik commented Jul 17, 2023 •

edited

Loading

certik commented Jul 17, 2023 •

edited

Loading

certik commented Jul 17, 2023 •

edited by Shaikh-Ubaid

Loading

certik commented Jul 17, 2023 •

edited

Loading

certik commented Jul 17, 2023 •

edited

Loading