# MSCCL-EXECUTOR-NCCL

Microsoft Collective Communication Library Executor on NCCL (MSCCL-EXECUTOR-NCCL) is an inter-accelerator communication framework that is built on top of [NCCL](https://github.com/nvidia/nccl) and uses its building blocks to execute custom-written collective communication algorithms.

## Introduction

MSCCL-EXECUTOR-NCCL is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, all-to-all, as well as any send/receive-based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, and NVSwitch, as well as networking using InfiniBand Verbs or TCP/IP sockets. MSCCL-EXECUTOR-NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications. To achieve this, MSCCL provides multiple capabilities:

- Programmability: Interconnects among accelerators have different latencies and bandwidths, so a generic collective communication algorithm does not necessarily perform well for all topologies and buffer sizes. MSCCL-EXECUTOR-NCCL allows a user to write a hyper-optimized collective communication algorithm for a given topology and buffer size. This is possible through two main components: the [MSCCL toolkit](https://github.com/microsoft/msccl-tools) and [MSCCL-EXECUTOR-NCCL](https://github.com/Azure/msccl-executor-nccl) (this repo). The MSCCL toolkit contains a high-level DSL (MSCCLang) and a compiler that generates an IR for the MSCCL runtime (this repo) to run on the backend. MSCCL automatically falls back to a generic NCCL algorithm when no custom algorithm is available. Please refer to the [MSCCL toolkit](https://github.com/microsoft/msccl-tools) for more information.
- Profiling: MSCCL-EXECUTOR-NCCL supports a profiling tool, [NPKit](https://github.com/microsoft/npkit), which provides a detailed timeline for each primitive send and receive operation to help identify the bottlenecks in a given collective communication algorithm.
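
As a sketch of the toolkit-to-runtime workflow (the script name, XML file, and application below are illustrative placeholders, not files shipped with this repo, and the assumption is that the runtime picks up compiled algorithm IRs via the `MSCCL_XML_FILES` environment variable):

```sh
# Illustrative sketch only; names and paths are placeholders.
# 1) Compile an MSCCLang program (written against msccl-tools) into an XML IR.
$ python3 my_allreduce_algo.py > my_allreduce.xml   # hypothetical MSCCLang program
# 2) Point the runtime at the IR when launching the application.
$ MSCCL_XML_FILES=$(pwd)/my_allreduce.xml mpirun -np 8 ./my_app
```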

## Build

To build the library:

```sh
$ git clone https://github.com/microsoft/msccl.git --recurse-submodules
$ cd msccl/executor/msccl-executor-nccl
$ make -j src.build
```

If CUDA is not installed in the default /usr/local/cuda path, you can define the CUDA path with:

```sh
$ make src.build CUDA_HOME=<path to cuda install>
```

MSCCL-EXECUTOR-NCCL will be compiled and installed in `build/` unless `BUILDDIR` is set.

By default, MSCCL-EXECUTOR-NCCL is compiled for all supported architectures. To accelerate the compilation and reduce the binary size, consider redefining `NVCC_GENCODE` (defined in `makefiles/common.mk`) to only include the architecture of the target platform:

```sh
$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_80,code=sm_80"
```

## Install

To install MSCCL-EXECUTOR-NCCL on the system, create a package, then install it as root.

Debian/Ubuntu:
```sh
$ # Install tools to create debian packages
$ sudo apt install build-essential devscripts debhelper fakeroot
$ # Build MSCCL-EXECUTOR-NCCL deb package
$ make pkg.debian.build
$ ls build/pkg/deb/
```

RedHat/CentOS:
```sh
$ # Install tools to create rpm packages
$ sudo yum install rpm-build rpmdevtools
$ # Build MSCCL-EXECUTOR-NCCL rpm package
$ make pkg.redhat.build
$ ls build/pkg/rpm/
```

OS-agnostic tarball:
```sh
$ make pkg.txz.build
$ ls build/pkg/txz/
```

## Tests

Tests for MSCCL-EXECUTOR-NCCL are maintained separately at https://github.com/Azure/msccl-tests-nccl.

```sh
$ git clone https://github.com/Azure/msccl-tests-nccl.git
$ cd msccl-tests-nccl
$ make
$ ./build/all_reduce_perf -b 8 -e 256M -f 2 -g <ngpus>
```

For more information on NCCL usage, please refer to the [NCCL documentation](https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/index.html).
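
Because MSCCL-EXECUTOR-NCCL exposes the standard NCCL API, existing NCCL code can link against it unchanged. Below is a minimal single-process all-reduce sketch using the standard NCCL calls (it assumes a machine with CUDA and at least two GPUs, and omits error checking for brevity; it is not a program shipped with this repo):

```c
#include <stdio.h>
#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
  const int ndev = 2;            /* adjust to the number of local GPUs */
  const size_t count = 1 << 20;  /* elements per rank */
  ncclComm_t comms[2];
  cudaStream_t streams[2];
  float *sendbuf[2], *recvbuf[2];

  /* Allocate device buffers and a stream per GPU. */
  for (int i = 0; i < ndev; ++i) {
    cudaSetDevice(i);
    cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
    cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  /* One communicator per GPU, all within a single process
     (NULL device list means devices 0..ndev-1). */
  ncclCommInitAll(comms, ndev, NULL);

  /* Group the per-GPU calls so they form one collective. */
  ncclGroupStart();
  for (int i = 0; i < ndev; ++i)
    ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                  comms[i], streams[i]);
  ncclGroupEnd();

  /* Wait for completion and clean up. */
  for (int i = 0; i < ndev; ++i) {
    cudaSetDevice(i);
    cudaStreamSynchronize(streams[i]);
    ncclCommDestroy(comms[i]);
  }
  printf("all-reduce done\n");
  return 0;
}
```

Compile with something like `nvcc app.c -lnccl`; if a custom MSCCL algorithm is loaded, the same `ncclAllReduce` call will dispatch to it, otherwise the generic NCCL algorithm runs.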

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit [CLA](https://cla.opensource.microsoft.com).

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

## Copyright

All source code and accompanying documentation is copyright (c) 2015-2022, NVIDIA CORPORATION. All rights reserved.

All modifications are copyright (c) 2022-2023, Microsoft Corporation. All rights reserved.