From 4233695feaa203816f2fea8cffede89d851a1c5c Mon Sep 17 00:00:00 2001
From: Manuel Drehwald <git@manuel.drehwald.info>
Date: Mon, 2 Jun 2025 12:06:27 -0700
Subject: [PATCH] initial instructions for gpu offload

---
 src/SUMMARY.md              |  2 ++
 src/offload/installation.md | 71 +++++++++++++++++++++++++++++++++++++
 src/offload/internals.md    |  9 +++++
 3 files changed, 82 insertions(+)
 create mode 100644 src/offload/installation.md
 create mode 100644 src/offload/internals.md

diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 50a3f44ad..7f2f32c62 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -101,6 +101,8 @@
 	- [The `rustdoc` test suite](./rustdoc-internals/rustdoc-test-suite.md)
 	- [The `rustdoc-gui` test suite](./rustdoc-internals/rustdoc-gui-test-suite.md)
 	- [The `rustdoc-json` test suite](./rustdoc-internals/rustdoc-json-test-suite.md)
+- [GPU offload internals](./offload/internals.md)
+    - [Installation](./offload/installation.md)
 - [Autodiff internals](./autodiff/internals.md)
     - [Installation](./autodiff/installation.md)
     - [How to debug](./autodiff/debugging.md)
diff --git a/src/offload/installation.md b/src/offload/installation.md
new file mode 100644
index 000000000..2536af09a
--- /dev/null
+++ b/src/offload/installation.md
@@ -0,0 +1,71 @@
+# Installation
+
+In the future, `std::offload` should become available in nightly builds for users. For now, everyone still needs to build rustc from source.
+
+## Build instructions
+
+First you need to clone and configure the Rust repository:
+```bash
+git clone --depth=1 git@github.com:rust-lang/rust.git
+cd rust
+./configure --enable-llvm-link-shared --release-channel=nightly --enable-llvm-assertions --enable-offload --enable-enzyme --enable-clang --enable-lld --enable-option-checking --enable-ninja --disable-docs
+```
+
+Afterwards you can build rustc using:
+```bash
+./x.py build --stage 1 library
+```
+
+Afterwards, `rustup toolchain link` will allow you to use it through cargo:
+```bash
+rustup toolchain link offload build/host/stage1
+rustup toolchain install nightly # enables -Z unstable-options
+```
+
+
+
+## Build instructions for LLVM itself
+```bash
+git clone --depth=1 git@github.com:llvm/llvm-project.git
+cd llvm-project
+mkdir build
+cd build
+cmake -G Ninja ../llvm -DLLVM_TARGETS_TO_BUILD="host,AMDGPU,NVPTX" -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_ENABLE_RUNTIMES="offload,openmp" -DLLVM_ENABLE_PLUGINS=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=.
+ninja
+ninja install
+```
+This gives you a working LLVM build.
+
+
+## Testing
+Run the offload codegen tests with:
+```bash
+./x.py test --stage 1 tests/codegen/gpu_offload
+```
+
+## Usage
+It is important to use a clang compiler built on the same LLVM as rustc. Just calling clang without the full path will likely use your system clang, which will probably be incompatible.
+```
+/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/stage1/bin/rustc --edition=2024 --crate-type cdylib src/main.rs --emit=llvm-ir  -O -C lto=fat -Cpanic=abort -Zoffload=Enable
+/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/llvm/bin/clang++ -fopenmp --offload-arch=native -g  -O3 main.ll -o main -save-temps
+LIBOMPTARGET_INFO=-1  ./main
+```
+The first step will generate a `main.ll` file, which has enough instructions to cause the offload runtime to move data to and from a GPU.
+The second step will use clang as the compilation driver to compile our IR file down to a working binary. Only a very small Rust subset will work out of the box here, unless
+you use features like build-std, which are not covered by this guide. Look at the codegen test to get a feeling for how to write a working example.
+In the last step you can run your binary. If all went well, you will see a data transfer being reported:
+```
+omptarget device 0 info: Entering OpenMP data region with being_mapper at unknown:0:0 with 1 arguments:
+omptarget device 0 info: tofrom(unknown)[1024]
+omptarget device 0 info: Creating new map entry with HstPtrBase=0x00007fffffff9540, HstPtrBegin=0x00007fffffff9540, TgtAllocBegin=0x0000155547200000, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=1, HoldRefCount=0, Name=unknown
+omptarget device 0 info: Copying data from host to device, HstPtr=0x00007fffffff9540, TgtPtr=0x0000155547200000, Size=1024, Name=unknown
+omptarget device 0 info: OpenMP Host-Device pointer mappings after block at unknown:0:0:
+omptarget device 0 info: Host Ptr           Target Ptr         Size (B) DynRefCount HoldRefCount Declaration
+omptarget device 0 info: 0x00007fffffff9540 0x0000155547200000 1024     1           0            unknown at unknown:0:0
+// some other output
+omptarget device 0 info: Exiting OpenMP data region with end_mapper at unknown:0:0 with 1 arguments:
+omptarget device 0 info: tofrom(unknown)[1024]
+omptarget device 0 info: Mapping exists with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=0 (decremented, delayed deletion), HoldRefCount=0
+omptarget device 0 info: Copying data from device to host, TgtPtr=0x0000155547200000, HstPtr=0x00007fffffff9540, Size=1024, Name=unknown
+omptarget device 0 info: Removing map entry with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, Name=unknown
+```
diff --git a/src/offload/internals.md b/src/offload/internals.md
new file mode 100644
index 000000000..28857a6e7
--- /dev/null
+++ b/src/offload/internals.md
@@ -0,0 +1,9 @@
+# std::offload
+
+This module is under active development. Once upstreamed, it should allow Rust developers to run Rust code on GPUs.
+We aim to develop a `rusty` GPU programming interface, which is safe, convenient, and sufficiently fast by default.
+This includes automatic and efficient data movement to and from the GPU. We will (later)
+also offer more advanced, possibly unsafe, interfaces which allow a higher degree of control.
+
+The implementation is based on LLVM's "offload" project, which is already used by OpenMP to run Fortran or C++ code on GPUs.
+While the project is under development, users will need to call other compilers like clang to finish the compilation process.