Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert some typeof tags to small integers #49556

Merged
merged 3 commits into from
May 8, 2023
Merged

convert some typeof tags to small integers #49556

merged 3 commits into from
May 8, 2023

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Apr 28, 2023

This allows us to avoid needing relocations for some builtin objects (esp. Strings) when loading the system image, saving many objects from being faulted into memory. Takes some of the same code from #48432, but in a different direction so it is applicable to GC, codegen, and pkgimages too.

Closes #48432

@vtjnash vtjnash requested a review from gbaraldi April 28, 2023 15:59
@vtjnash vtjnash force-pushed the jn/small-typeof branch from 4717f6d to 1224a17 Compare May 2, 2023 16:11
@vtjnash vtjnash added the needs nanosoldier run This PR should have benchmarks run on it label May 2, 2023
@vtjnash
Copy link
Member Author

vtjnash commented May 2, 2023

@nanosoldier runbenchmarks("scalar", vs=":master")

@vtjnash
Copy link
Member Author

vtjnash commented May 2, 2023

@nanosoldier runbenchmarks(!"scalar", vs=":master")

@@ -989,7 +1048,7 @@ STATIC_INLINE jl_value_t *jl_svecset(
#else
STATIC_INLINE jl_value_t *jl_svecref(void *t JL_PROPAGATES_ROOT, size_t i) JL_NOTSAFEPOINT
{
assert(jl_typeis(t,jl_simplevector_type));
assert(jl_typetagis(t,jl_simplevector_tag << 4));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the << 4 into the macro?

@@ -1055,13 +1114,13 @@ STATIC_INLINE jl_value_t *jl_array_ptr_set(
STATIC_INLINE uint8_t jl_array_uint8_ref(void *a, size_t i) JL_NOTSAFEPOINT
{
assert(i < jl_array_len(a));
assert(jl_typeis(a, jl_array_uint8_type));
assert(jl_typetagis(a, jl_array_uint8_type));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(jl_typetagis(a, jl_array_uint8_type));
assert(jl_typeis(a, jl_array_uint8_type));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jl_typetagis is always faster than jl_typeis, which is kept as a legacy alias, if you can guarantee that the type is not sometimes represented as a tag.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@vtjnash
Copy link
Member Author

vtjnash commented May 3, 2023

@nanosoldier runbenchmarks("union", vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@vtjnash
Copy link
Member Author

vtjnash commented May 3, 2023

Fascinating. I didn’t expect this to improve performance by that much, but seems consistent in the two runs.

@gbaraldi
Copy link
Member

gbaraldi commented May 3, 2023

I wonder if removing the load makes it codegen a bit better.

@vtjnash
Copy link
Member Author

vtjnash commented May 3, 2023

Okay, it turns out our codegen for this is just pretty bad:

    %22 = (%19 === Main.nothing)::Bool
;  @ REPL[1]:4 within `perf_countnothing`
  %94 = and i8 %tindex_phi3, 127, !dbg !67
  %95 = icmp eq i8 %94, 1, !dbg !67
  %96 = select i1 %95, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 4815131216 to {}*) to {} addrspace(10)*), {} addrspace(10)* null, !dbg !67
  %97 = icmp eq i8 %94, 2, !dbg !67
  %98 = select i1 %97, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 4815126496 to {}*) to {} addrspace(10)*), {} addrspace(10)* %96, !dbg !67
  %99 = icmp eq {} addrspace(10)* %98, addrspacecast ({}* inttoptr (i64 4815131216 to {}*) to {} addrspace(10)*), !dbg !67
  %100 = zext i1 %99 to i8, !dbg !67

But this PR, as a side-effect, simplified that slightly, which let LLVM mostly optimize it away in this case:

;  @ REPL[1]:4 within `perf_countnothing`
  %94 = and i8 %tindex_phi3, 127, !dbg !67
  %95 = icmp eq i8 %94, 1, !dbg !67
  %96 = select i1 %95, i64 4847026336, i64 0, !dbg !67
  %97 = icmp eq i8 %94, 2, !dbg !67
  %98 = select i1 %97, i64 4847021616, i64 %96, !dbg !67
  %99 = icmp eq i64 %98, 4847026336, !dbg !67
  %100 = zext i1 %99 to i8, !dbg !67

Vs. if we replace that with x isa Nothing, which emits the optimal code directly:

  %94 = and i8 %tindex_phi3, 127, !dbg !67
  %95 = icmp eq i8 %94, 1, !dbg !67
  %96 = zext i1 %95 to i8, !dbg !67
function perf_countnothing(X::AbstractArray); n = 0; @inbounds for x in X; n += x === nothing; end; n; end
@code_llvm raw=true optimize=false debuginfo=:source perf_countnothing(Vector{Union{Float64, Nothing}}(undef,0))

@vtjnash
Copy link
Member Author

vtjnash commented May 3, 2023

The fix for that conflicts slightly with our change here, because we need to edit some of the lines that need to be moved, so I can do that as a followup after merging this, if it is ready to go, or before if this ends up needing to be rebased anyways.

@vtjnash vtjnash removed the needs nanosoldier run This PR should have benchmarks run on it label May 3, 2023
@gbaraldi
Copy link
Member

gbaraldi commented May 3, 2023

Is isa Nothing special cased? Or is codegen for singleton types for === worse than it should be in general?

@vtjnash
Copy link
Member Author

vtjnash commented May 3, 2023

isa is special cased, but === was not falling back to it correctly

vtjnash added 3 commits May 6, 2023 00:39
The value here is two-fold. One, we speed up sysimg load times slightly,
since we can entirely skip relocations for these values and avoid
faulting in some writable pages. This should let us (later) move more
content into the ConstData section, such as String objects. Secondly,
some type-manipulation native code becomes simpler, since it is based on
small consecutive constants instead of requiring extra pointer loads to
check for pointer identity. This is similar to the existing small-union
optimization, but for type tags, and only for specific ones.

This makes use of the fact that the zero page (about 4096 byes) cannot
be allocated in the usual C memory model, which lets us define up to 255
of these special types, after taking into account the 4 gc bits. As a
current implementation limitation for performance, these cannot be
parametric types.

Note there are probably more places in Base that can be updated to use
`jl_typetagof` instead of `jl_typeof`, where we know if it is a type or tag.
For example, this optimize most of builtins.c file, except for the
`jl_object_id` function.
String is simpler and smaller (since we do not yet have a Buffer{UInt8}
type, which might be almost as simple) making it easier to manipulate.
Combined with the previous changes, this lets us eliminate many more
relocations from the image (previously every Array->data), which is
quite a few megabytes of array header objects simply gone.

MASTER:
sysimg size breakdown:
     sys data: 79207516
  isbits data: 58603452
      symbols:   463610
    tags list:  1078555
   reloc list:  4491653
    gvar list:   118848
    fptr list:   214000
.text          11109176
.data         144184704

BEFORE (moving existing Strings):
sysimg size breakdown:
     sys data: 75146604
  isbits data: 62679532
      symbols:   463411
    tags list:  1015525
   reloc list:  4481370
    gvar list:   118688
    fptr list:   213952
.text          11129192
.data         144126152

AFTER (making CodeInfo into Strings):
     sys data: 71348124
  isbits data: 63789244
      symbols:   463411
    tags list:   933984
   reloc list:  4398377
    gvar list:   118688
    fptr list:   213904
.text          11132968
.data         141272824
@vtjnash vtjnash force-pushed the jn/small-typeof branch from 1224a17 to 4be81cd Compare May 6, 2023 04:41
@vtjnash vtjnash added the merge me PR is reviewed. Merge when all tests are passing label May 8, 2023
@vtjnash vtjnash merged commit 8bcea42 into master May 8, 2023
@vtjnash vtjnash deleted the jn/small-typeof branch May 8, 2023 17:13
@oscardssmith oscardssmith added types and dispatch Types, subtyping and method dispatch and removed merge me PR is reviewed. Merge when all tests are passing labels May 8, 2023
NHDaly added a commit that referenced this pull request May 10, 2023
- Remove JL_NORETURN
- Account for changes in #49556
d-netto pushed a commit that referenced this pull request May 19, 2023
- Remove JL_NORETURN
- Account for changes in #49556
@kpamnany kpamnany mentioned this pull request May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants