Compile packages independently, link using LTO

After #285, I'd like to go one step further: compile packages entirely separately and do cross-package optimization with ThinLTO (or, optionally, full LTO if desired). The main benefit is that compilation should be a _lot_ faster, both with a cold cache (by parallelizing codegen) and after small changes to the source code (by reusing the cached output of most packages). We should be able to get close to the speed of the `go` toolchain; TinyGo is currently a lot slower.

Packages are currently compiled as follows:

 1. LLVM IR for packages is generated in parallel (and cached in `~/.cache/tinygo`).
 2. This IR is then merged into one huge LLVM module containing the IR of all packages.
 3. Some generic LLVM optimizations and TinyGo-specific transformation passes are applied to this combined IR.
 4. The IR is then written to a temporary location, either as bitcode (for ThinLTO) or as an object file (for non-ThinLTO builds).
 5. The linker (usually lld) is invoked to link everything together to generate an executable. In the ThinLTO case, lld creates an object file internally and caches it.

What I'd like to see:

 1. LLVM IR for packages is generated in parallel (and cached in `~/.cache/tinygo`), as before. TinyGo-specific optimizations need to be done in this phase.
 2. The linker (lld) is then used to link all bitcode files together, using ThinLTO.

This means there is no phase in which all IR is combined into one big module, which avoids the serial step that currently takes up most of the compile time.
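To make the caching argument concrete, here is a minimal sketch of per-package compilation with a content-addressed cache. Everything in it is hypothetical (`Package`, `Cache`, `cacheKey`, `buildAll`, and the string used as stand-in bitcode); it is not TinyGo's actual build code. It assumes that a package's cache key covers its own source plus the keys of its imports, so a change only invalidates that package and the packages that depend on it.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"sync"
)

// Package is a hypothetical stand-in for a Go package to compile.
type Package struct {
	Path    string
	Source  string   // stand-in for the contents of the package's source files
	Imports []string // import paths of direct dependencies
}

// Cache maps cache keys to (pretend) bitcode. Hypothetical, in-memory only.
type Cache struct {
	mu      sync.Mutex
	entries map[string]string
}

func NewCache() *Cache { return &Cache{entries: make(map[string]string)} }

// cacheKey hashes a package's source together with the keys of its imports.
func cacheKey(p Package, depKeys map[string]string) string {
	h := sha256.New()
	h.Write([]byte(p.Path + "\x00" + p.Source + "\x00"))
	imports := append([]string(nil), p.Imports...)
	sort.Strings(imports)
	for _, imp := range imports {
		h.Write([]byte(imp + "=" + depKeys[imp] + "\x00"))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// buildAll takes packages in dependency order, compiles every cache miss in
// parallel, and returns the bitcode to hand to the linker along with the
// number of packages that actually had to be compiled.
func buildAll(pkgs []Package, cache *Cache) (bitcode []string, compiled int) {
	keys := make(map[string]string, len(pkgs))
	for _, p := range pkgs {
		keys[p.Path] = cacheKey(p, keys)
	}
	bitcode = make([]string, len(pkgs))
	var wg sync.WaitGroup
	for i, p := range pkgs {
		key := keys[p.Path]
		cache.mu.Lock()
		bc, ok := cache.entries[key]
		cache.mu.Unlock()
		if ok {
			bitcode[i] = bc // cache hit: reuse the earlier result
			continue
		}
		compiled++
		wg.Add(1)
		go func(i int, p Package, key string) {
			defer wg.Done()
			bc := "bitcode(" + p.Path + ")" // stand-in for per-package codegen
			cache.mu.Lock()
			cache.entries[key] = bc
			cache.mu.Unlock()
			bitcode[i] = bc // each goroutine writes its own slot
		}(i, p, key)
	}
	wg.Wait()
	return bitcode, compiled
}

func main() {
	pkgs := []Package{
		{Path: "runtime", Source: "v1"},
		{Path: "machine", Source: "v1", Imports: []string{"runtime"}},
		{Path: "main", Source: "v1", Imports: []string{"machine", "runtime"}},
	}
	cache := NewCache()
	_, n := buildAll(pkgs, cache)
	fmt.Println("cold cache, packages compiled:", n)

	pkgs[2].Source = "v2" // small change in the main package only
	bitcode, n := buildAll(pkgs, cache)
	fmt.Println("after small change, packages compiled:", n)
	fmt.Println("inputs for the linker:", bitcode)
}
```

In this model the only serial work left is computing the keys, which is cheap. The expensive cross-package optimization is left to the linker's ThinLTO step.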

This is no small task. We currently rely heavily on merging all packages together to perform some (required) optimization passes. These will need to be changed in some way to work well with LTO, by modifying them or replacing them with something else:

 - [x] We don't support ThinLTO yet for some targets (see #2867, #2865 for example).
 - [x] Some targets need the `AddGlobalsBitmap` pass to be able to scan global variables in the GC mark phase. It should be possible to convert this to simply scanning the `.data`/`.bss` sections everywhere (see #2867, #2869 for example).
 - [ ] WebAssembly uses the `MakeGCStackSlots` pass. We need to make this pass run per package. In the future, the [WebAssembly GC](https://github.com/WebAssembly/gc) proposal would be an alternative.
 - [x] Reflect information is currently processed for the whole program in `LowerReflect`. I've been working on a replacement in #2640 but it's going to cost something. In return, the compiler itself becomes easier to understand and new reflect features are easier to add.
 - [ ] Interface method calls are lowered to direct calls in `LowerInterfaces`. We probably need to switch to [vtable style interfaces](https://research.swtch.com/interfaces). The optimizations that we currently do might be replaced by LLVM support for [whole program devirtualization for C++](https://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html).
 - [ ] Interrupt handlers are currently combined in `LowerInterrupts`. This is done late so that unused interrupts can be optimized away. I'm not sure how to do this efficiently other than at this stage.
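
To illustrate why the interface item matters for separate compilation, here is a rough model of the two dispatch strategies. It is a sketch only: all names (`taggedValue`, `vtableValue`, `shapeVtable`, and so on) are hypothetical, and picturing the whole-program lowering as a switch over type codes is an assumption made for illustration, not a description of what `LowerInterfaces` emits. The model also uses Go's own `any` to carry the payload, purely for brevity.

```go
package main

import "fmt"

type rect struct{ w, h int }
type square struct{ side int }

func rectArea(data any) int   { r := data.(rect); return r.w * r.h }
func squareArea(data any) int { s := data.(square); return s.side * s.side }

// Strategy 1: whole-program lowering (assumed shape). The dispatcher has to
// list every concrete type in the program, so it can only be generated once
// all packages are visible at the same time.
type typeCode int

const (
	codeRect typeCode = iota
	codeSquare
)

type taggedValue struct {
	code typeCode
	data any
}

func areaDirect(v taggedValue) int {
	switch v.code {
	case codeRect:
		return rectArea(v.data) // direct call
	case codeSquare:
		return squareArea(v.data) // direct call
	}
	panic("unknown type code")
}

// Strategy 2: vtable-style interfaces. Each concrete type carries its own
// method table, so a package can emit the table for its types without
// knowing about any other package.
type shapeVtable struct {
	area func(data any) int
}

var (
	rectVtable   = &shapeVtable{area: rectArea}
	squareVtable = &shapeVtable{area: squareArea}
)

type vtableValue struct {
	vt   *shapeVtable
	data any
}

func areaVirtual(v vtableValue) int {
	return v.vt.area(v.data) // indirect call through the table
}

func main() {
	fmt.Println(areaDirect(taggedValue{codeRect, rect{2, 3}}))
	fmt.Println(areaVirtual(vtableValue{squareVtable, square{4}}))
}
```

The indirect call in the second strategy is what whole-program devirtualization at link time would try to turn back into a direct call when only one target is possible.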

Of course, the resulting binaries should remain small. It's hard to avoid a slight increase, but hopefully the benefits of a simpler compiler and (_much_) faster compile times outweigh the downsides.

Compile packages independently, link using LTO #2870
