====================
Clang nvlink Wrapper
====================

.. contents::
   :local:

.. _clang-nvlink-wrapper:

Introduction
============

This tool works as a wrapper around the NVIDIA ``nvlink`` linker. The purpose
of this wrapper is to provide an interface similar to the ``ld.lld`` linker
while still relying on NVIDIA's proprietary linker to produce the final output.

``nvlink`` has a number of known quirks that make it difficult to use in a
unified offloading setting. For example, it does not accept ``.o`` files;
object files must instead be named ``.cubin``. Static archives do not work, so
passing a ``.a`` file will produce a linker error. ``nvlink`` also does not
support link-time optimization and ignores many standard linker arguments. This
tool works around these issues.

Usage
=====

This tool accepts the following options. Any arguments that are not exclusive
to the linker wrapper are forwarded to ``nvlink``.

.. code-block:: console

  OVERVIEW: A utility that wraps around the NVIDIA 'nvlink' linker.
  This enables static linking and LTO handling for NVPTX targets.

  USAGE: clang-nvlink-wrapper [options] <options to be passed to nvlink>

  OPTIONS:
    --arch <value>       Specify the 'sm_' name of the target architecture.
    --cuda-path=<dir>    Set the system CUDA path
    --dry-run            Print generated commands without running.
    --feature <value>    Specify the '+ptx' feature to use for LTO.
    -g                   Specify that this was a debug compile.
    -help-hidden         Display all available options
    -help                Display available options (--help-hidden for more)
    -L <dir>             Add <dir> to the library search path
    -l <libname>         Search for library <libname>
    -mllvm <arg>         Arguments passed to LLVM, including Clang invocations,
                         for which the '-mllvm' prefix is preserved. Use '-mllvm
                         --help' for a list of options.
    -o <path>            Path to file to write output
    --plugin-opt=jobs=<value>
                         Number of LTO codegen partitions
    --plugin-opt=lto-partitions=<value>
                         Number of LTO codegen partitions
    --plugin-opt=O<O0, O1, O2, or O3>
                         Optimization level for LTO
    --plugin-opt=thinlto<value>
                         Enable the thin-lto backend
    --plugin-opt=<value> Arguments passed to LLVM, including Clang invocations,
                         for which the '-mllvm' prefix is preserved. Use '-mllvm
                         --help' for a list of options.
    --save-temps         Save intermediate results
    --version            Display the version number and exit
    -v                   Print verbose information
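
As a rough illustration of the options above, the wrapper can also be invoked
by hand. The following is a sketch only: ``main.o``, ``libgpu.a`` (found via
``-L. -lgpu``), and the ``sm_70`` architecture are hypothetical placeholders
that are assumed to exist, and ``--dry-run`` is used so that the generated
commands are printed rather than executed.

.. code-block:: console

  clang-nvlink-wrapper --arch sm_70 --dry-run -v -L. -lgpu main.o -o a.out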

Example
=======

This tool is intended to be invoked when targeting the NVPTX toolchain directly
as a cross-compilation target. This makes it possible to create standalone GPU
executables with normal linking semantics, as in a standard compilation.

.. code-block:: console

  clang --target=nvptx64-nvidia-cuda -march=native -flto=full input.c
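
The same flow can be split into separate compile and link steps, which is where
the static archive handling described in the introduction becomes relevant.
This is a sketch under stated assumptions: ``helper.c``, ``main.c``,
``libgpu.a``, and the ``sm_70`` architecture are hypothetical placeholders, and
``llvm-ar`` is assumed to be available on the ``PATH``.

.. code-block:: console

  clang --target=nvptx64-nvidia-cuda -march=sm_70 -flto=full -c helper.c -o helper.o
  llvm-ar rcs libgpu.a helper.o
  clang --target=nvptx64-nvidia-cuda -march=sm_70 -flto=full main.c -L. -lgpu -o main

An explicit ``-march=sm_70`` is used here instead of ``-march=native`` so that
the commands do not depend on a GPU being present on the build machine.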