Skip to content

In emulation mode, u8-u64 should be represented by just int #2173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
certik opened this issue Jul 17, 2023 · 11 comments · Fixed by #2189
Closed

In emulation mode, u8-u64 should be represented by just int #2173

certik opened this issue Jul 17, 2023 · 11 comments · Fixed by #2189
Assignees

Comments

@certik
Copy link
Contributor

certik commented Jul 17, 2023

Currently we created our own class UnsignedInteger and we provide a CPython incompatible behavior of some operators such as ~. Rather, if we need a CPython incompatible behavior, we should introduce some special functions, such as from lpython import uneg. The intrinsic operators should work just like in CPython.

Here is an example:

print(u16(0xff), 0xff)
print(~u16(0xff), ~0xff)

This currently prints:

255 255
65280 -256

So the ~ is behaving differently for u16 and Python ints. It should not. The design of u16 and i16 is that they are strict value restricted subset of Python ints and should behave exactly identical on this subset, and LPython should give compile time or runtime error messages if any operation is performed outside of this subset.

Since in CPython ~255 becomes -256, then the same behavior must happen on ~u16(255). In order to provide the other operation, we can make uneg(255) be 65280. Then the operation can be correctly implemented in emulation mode, and all unsigned integers can be just integers, which will speedup operation and remove many bugs that we currently have, when we are passing UnsignedInteger somewhere where a regular integer is expected.

I think the current behavior was introduced by trying to emulate C, but Python is not C. We need to be compatible with Python, not C. If C behavior is desired, it must be done using functions like uneg, not using intrinsic integer operators (which should rather behave like CPython does).

@certik
Copy link
Contributor Author

certik commented Jul 17, 2023

The "-" is two's complement, so

-(1) = -1
-(2) = -2
-(-1) = 1
-(-2) = 2

The "~" is one's complement, so:

~(1) = -2
~(2) = -3
~(-2) = 1
~(-3) = 2

And in Python it also does:

~(0) = -1
~(-1) = 0

The relation between "-" and "~" is:

~x = -x - 1
-x = ~x + 1

Using these definitions, we have ~0xff = ~255 = -255-1 = -256. Currently in LPython we have (by a mistake) implemented a wraparound and represented ~u16(0xff) = -256 as 65280, which is a mistake. The correct behavior is to give a runtime (or compile time if possible) exception that -256 is out of range for u16.

We have to provide other functions to implement the wraparound if that is needed.

@certik
Copy link
Contributor Author

certik commented Jul 17, 2023

As a consequence, since ~x = -x-1, this operation is by definition not supported for unsigned integers, because ~ applied to a non-negative number is negative, outside of range of unsigned. So to fix this:

  • make LPython give a nice error message for ~ applied to unsigned, saying the result is out of range and providing suggestions how to obtain this operation (by properly casting to signed first, and then applying ~, or by using bitnot)
  • investigate if we need uneg intrinsic function, or if the workaround via casting to signed is good enough. Answer: yes, we need that, and it will be called bitnot(u), implemented for u8, u16, u32, u64 (I would not implement it for signed integers s, since there it is equivalent to ~s) and represented by our current UnsignedIntegerBitNot node, which implements the bit not operation and wraps around.
  • Implement bitnot(u)

Sorry, something went wrong.

@certik
Copy link
Contributor Author

certik commented Jul 17, 2023

Example: this

u: u16 = u16(255)
u = ~u

will not compile. Must be rewritten to:

u: u16 = u16(255)
u = u16(~i16(u)+2**16)

This is not super easy to type, so we could provide casting from signed to unsigned with wrap around, such as:

u: u16 = u16(255)
u = u16_wrap(~i16(u))

Note that in full emulation mode, u16 and i16 is a no-op, but u16_wrap correctly wraps it using 2**16 (a simple implementation). So effectively it becomes:

u = 255
u = u16_wrap(~u)

So u is just a CPython int, and ~ is just a CPython ~ and of course u16_wrap is our own function implementing the wrapping. Everything is simple, compatible with CPython and we don't need the UnsignedInteger in CPython mode that is causing troubles.

@Shaikh-Ubaid, do you want to implement it?

@certik
Copy link
Contributor Author

certik commented Jul 17, 2023

The error message for:

u: u16 = u16(255)
u = ~u

Could be:

The result of the bit not ~ operation is negative, thus out of range for u16.
u = ~u
    ^^ help: use ~i32(u) for signed result or bitnot(u) for unsigned result

It should offer ~u8(u) -> ~i16(u), ~u16(u) -> ~i32(u), ~u32(u) -> ~i64(u), ~u64(u) -> ~i64(u); The first three cases always succeed, in the last case the i64(u) cast can sometimes fail at runtime/compile time if it can't represent the large u64 value.

@Shaikh-Ubaid
Copy link
Collaborator

It seems using ~ on unsigned integers, LPython would be able to print an error, but I think CPython would run (and produce a negative value). Is it fine that LPython fails and CPython runs? Should not CPython also support checks for runtime overflow for unsigned integers?

@Shaikh-Ubaid
Copy link
Collaborator

What should be the behaviour for casting a negative integer to unsigned integer in LPython as well as CPython. For example, what should be the output of the following in CPython and LPython.

print(u16(-5))

@Shaikh-Ubaid
Copy link
Collaborator

Also, do we need to support runtime checks for binary operations on integers as well as unsigned integers? Is it a part of this issue?

@Shaikh-Ubaid
Copy link
Collaborator

With respect to this #2173 (comment), similar to LPython, should CPython catch and report overflow errors?

This was referenced Jul 19, 2023
@Shaikh-Ubaid
Copy link
Collaborator

Here #2173 (comment) we discussed about u16_wrap(~u) for CPython. We currently have bitnot_u16(u) in place of it. Please share if it is fine.

@Shaikh-Ubaid
Copy link
Collaborator

I think with merging of #2189 we would complete this issue.

@certik
Copy link
Contributor Author

certik commented Jul 19, 2023

Perfect! Thank you @Shaikh-Ubaid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants