Nix build using flake #162

Merged
merged 11 commits into from
Mar 29, 2022

league
Collaborator

@league league commented Mar 9, 2022

This is a start on specifying a “flake” for building bifrost with Nix. A flake is a format that pins all dependencies to support reproducible builds. This initial version includes a GitHub workflow that builds against different releases of the nixpkgs tree (containing compilers, Python dependencies, etc.) and different versions of Python, then runs the tests on all those configurations. (This builds on the autoconf branch.) More background: nixos.org, Flakes on Nix wiki, Flakes tutorial from tweag.io

It doesn't yet support GPU builds, though I've gotten it mostly to work. It's a little tricky to integrate because libcuda.so.1 must come from the host platform so that it agrees with the kernel version and GPU hardware. If the host is itself NixOS it's manageable, but when Nix is used on top of another platform (e.g. Ubuntu) we can only build against stubs and then sub in the real libcuda later. For similar reasons, auto-detecting the right GPU architecture during the build seems to be problematic. As long as the GPU architecture is an input to the build, it gets hashed into the package signature, but we can't ask what the GPU architecture is “from the inside.” It's the same story as builtins.currentSystem for the overall architecture/OS tag. Some hints about libcuda on Nix
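
To make the "hashed into the package signature" point concrete, here is a rough Python sketch. It is not Nix's actual hashing scheme; `build_signature` and the input names are made up for illustration. It only shows why the GPU architecture has to be an explicit input: the signature is computed from the declared inputs alone, so anything detected "from the inside" could never influence it.

```python
import hashlib
import json


def build_signature(inputs):
    """Return a short hex digest over a canonical encoding of the build inputs.

    Hypothetical helper: Nix derives store hashes differently, but the idea
    (same inputs give the same hash, different inputs give a different one)
    is the same.
    """
    canonical = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:16]


# Two builds that differ only in the declared GPU architectures.
base = {"python": "3.9", "cuda": True, "gpu_archs": ["70", "75"]}
other = dict(base, gpu_archs=["80"])

sig_base = build_signature(base)
sig_other = build_signature(other)

# The two configurations get distinct signatures, so they are distinct packages.
print(sig_base, sig_other)
```

Since the hardware on the build host is not one of the declared inputs, probing it during the build would produce different outputs under the same signature, which is exactly what a reproducible build has to avoid.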

All this should be solvable (and hopefully useful). I'd like to continue to work on it and tweak it here... so marking this as a draft.

@codecov-commenter

codecov-commenter commented Mar 9, 2022

Codecov Report

Merging #162 (ced5117) into master (657e705) will not change coverage.
The diff coverage is n/a.

❗ Current head ced5117 differs from pull request most recent head 38bfbec. Consider uploading reports for the commit 38bfbec to get more accurate results

@@           Coverage Diff           @@
##           master     #162   +/-   ##
=======================================
  Coverage   58.54%   58.54%           
=======================================
  Files          67       67           
  Lines        5727     5727           
=======================================
  Hits         3353     3353           
  Misses       2374     2374           


@jaycedowell
Member

With regard to auto-detecting the GPU arch., would something like building for all available archs. be a workable solution? It would take forever to compile but it should be generic.
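
As a rough illustration of why building every architecture gets slow, here is a small sketch that expands a list of architectures into nvcc `-gencode` options. How bifrost's configure script actually passes architectures to nvcc isn't shown in this thread, so `gencode_flags` and the "all archs" list below are assumptions for illustration only.

```python
def gencode_flags(gpu_archs):
    """Expand architecture numbers like "70" into nvcc -gencode options.

    Hypothetical helper: one -gencode pair is emitted per architecture, and
    nvcc generates device code once for each of them.
    """
    flags = []
    for arch in gpu_archs:
        flags.extend(["-gencode", f"arch=compute_{arch},code=sm_{arch}"])
    return flags


# Illustrative list only, not taken from the bifrost build.
all_archs = ["60", "61", "70", "75", "80", "86"]

print(" ".join(gencode_flags(all_archs)))
```

Each extra entry adds another device-code generation pass for every CUDA source file, so compile time grows roughly with the length of the list.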

@league
Collaborator Author

league commented Mar 11, 2022

I'd like to find a way for telemetry to quietly disable itself if it turns out that $HOME doesn't exist or is not writable. It prevents me from ever doing import bifrost in a nix build environment.

bifrost-doc>   File "/nix/store/mdvjrgc0m3lhcz75hvcpz9lgnis69wbv-python3-3.9.6-env/lib/python3.9/site-packages/bifrost/telemetry.py", line 52, in <module>
bifrost-doc>     os.mkdir(os.path.join(os.path.expanduser('~'), '.bifrost'))
bifrost-doc> FileNotFoundError: [Errno 2] No such file or directory: '/homeless-shelter/.bifrost'

I was toying with generating the documentation from nix (and maybe restoring the auto-update of gh-pages as a workflow action). To generate the python API docs, it needs a python from which it can import bifrost... which can be arranged but the telemetry bombs in that environment.

I guess that whole preamble could be wrapped in a try/except, and on FileNotFoundError set TELEMETRY_ACTIVE = False and _INSTALL_KEY to a fresh uuid. It would still fail if somebody called enable or disable, because you can't unlink or write to _ACTIVE_KEY, but that's probably okay… at least you should be able to import telemetry.
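
A minimal sketch of that fallback, assuming the preamble only needs a state directory and an install key. `load_telemetry_state` and the `install.key` filename are hypothetical and not taken from the real telemetry.py, and the real module may also honor a user's opt-out, which this sketch ignores.

```python
import os
import uuid


def load_telemetry_state(home=None):
    """Return (active, install_key) without raising on an unusable home.

    Hypothetical stand-in for the telemetry preamble: if the state directory
    cannot be created or written, telemetry is reported inactive and a fresh,
    unsaved key is returned.
    """
    if home is None:
        home = os.path.expanduser("~")
    state_dir = os.path.join(home, ".bifrost")
    key_path = os.path.join(state_dir, "install.key")  # assumed filename
    try:
        if not os.path.isdir(state_dir):
            os.mkdir(state_dir)  # FileNotFoundError when home is missing
        if os.path.exists(key_path):
            with open(key_path) as fh:
                install_key = fh.read().strip()
        else:
            install_key = str(uuid.uuid4())
            with open(key_path, "w") as fh:
                fh.write(install_key)
    except OSError:
        # Covers FileNotFoundError (no $HOME) and PermissionError (read-only).
        return False, str(uuid.uuid4())
    return True, install_key


if __name__ == "__main__":
    # In a nix build sandbox, $HOME points at the nonexistent /homeless-shelter.
    print(load_telemetry_state("/homeless-shelter"))
```

Catching OSError rather than only FileNotFoundError also handles the "not writable" case mentioned above, since PermissionError is a subclass of OSError.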

@jaycedowell
Member

Now that #157 has been merged we should probably close this PR and open a new one that targets master.

@league
Collaborator Author

league commented Mar 23, 2022

Yep. I may just rebase/squash onto new master... the timeline is pretty confusing with occasional merge commits from autoconf... but ultimately it should just add 3 files.

@league league changed the base branch from autoconf to master March 23, 2022 22:01
@league league marked this pull request as ready for review March 26, 2022 00:57
@jaycedowell
Member

I think I will try out the nix thing once the new server gets set up.

@league
Collaborator Author

league commented Mar 28, 2022

I may merge the nix stuff soon, but I feel like it deserves a small section of the README or manual... maybe before next release. I'll add a CHANGELOG entry as a placeholder. Next time you're on qblocks, you might try a quick nix setup like this:

# Install nix
  wget https://nixos.org/nix/install
  sh install
# Log out and back in, or source the script given at the end of the install to set up current shell
# Optional: install our binary cache, maybe another useful tool or so
  nix-env -i cachix ripgrep
  cachix use bifrost
# Some basic nix config: (unfree for nvidia)
  mkdir -p ~/.config/{nix,nixpkgs}
  echo "experimental-features = nix-command flakes" > ~/.config/nix/nix.conf
  echo "{ allowUnfree = true; }" > ~/.config/nixpkgs/config.nix

Even before a git clone, you should be able to do stuff like the following. (If it needs to configure and build it will, but if you hit exactly the configuration that's in cachix, it will just download. Cachix is usually populated by the CI.)

  • Run ctypesgen tool (after merge, won't need /nix-flake branch specifier)
    $ nix run github:ledatelescope/bifrost/nix-flake#ctypesgen-py3 --
    ERROR: No header files specified
    
  • Load a python with basic (CPU-only) bifrost
    $ nix run github:ledatelescope/bifrost/nix-flake#python3-bifrost  
    Python 3.9.6 (default, Jun 28 2021, 08:57:49) 
    [GCC 10.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import bifrost
    >>> bifrost.version.__version__
    '0.10.0'
    
  • Same, but python 3.8:
    $ nix run github:ledatelescope/bifrost/nix-flake#python38-bifrost
    Python 3.8.12 (default, Aug 30 2021, 16:42:10) 
    [GCC 10.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import bifrost
    
  • Check on its config (note the double dash needed to separate the nix run options from the python options).
    $ nix run github:ledatelescope/bifrost/nix-flake#python3-bifrost -- -m bifrost.version --config
    bifrost 0.10.0
    Copyright (c) 2016-2020, The Bifrost Authors. All rights reserved.
    Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
    License: BSD 3-Clause
    
    Configuration:
     Memory alignment: 4096 B
     OpenMP support: yes
     NUMA support no
     Hardware locality support: no
     Mellanox messaging accelerator (VMA) support: no
     Logging directory: /dev/shm/bifrost
     Debugging: no
     CUDA support: no
    
  • The debug-enabled version is pre-configured in the flake, just named python3-bifrost-debug:
    $ nix run github:ledatelescope/bifrost/nix-flake#python3-bifrost-debug -- -m bifrost.version --config
    bifrost 0.10.0
    [etc...]
     Logging directory: /dev/shm/bifrost
     Debugging: yes
     CUDA support: no
    
  • CUDA versions are available too, but they have to download the pinned CUDA toolkit from NVIDIA (they won't use the one from the underlying system), and that can take a surprising amount of time. This one probably won't be pre-built in the cache because CUDA versions aren't built by CI.
    # Make sure we can find libcuda.so.1:
    $ export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:
    $ nix run github:ledatelescope/bifrost/nix-flake#python3-bifrost-cuda11 -- -m bifrost.version --config
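
Before kicking off the long CUDA build, it may help to confirm that libcuda.so.1 is actually reachable through LD_LIBRARY_PATH. The sketch below is a hypothetical helper, not part of bifrost or the flake. It only checks the directories listed in LD_LIBRARY_PATH and does not consult the system loader cache.

```python
import os


def find_in_library_path(name, library_path=None):
    """Return the first path to `name` under the LD_LIBRARY_PATH directories.

    Returns None when no listed directory contains the file. Empty entries,
    which the dynamic loader treats as the current directory, are skipped
    here for simplicity.
    """
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in library_path.split(":"):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Prints the resolved path, or None if the host driver library is not found.
    print(find_in_library_path("libcuda.so.1"))
```

If this prints None, the CUDA-enabled python will most likely fail to load the driver library at import time, so it's worth fixing the path first.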
    

league added 9 commits March 28, 2022 12:45

It provides an overlay for nixpkgs that adds ctypesgen and a
configurable bifrost package, that can be overridden with various
versions of python3 and cudatoolkit (or without them), and can enable
debugging or not.  The github workflow uses nix to build a few
variations, run non-GPU tests, and update documentation in gh-pages
branch. Builds are cached by cachix; upon merge that part may need
some keys loaded onto the ledatelescope repo. Maybe needs a blurb in
the README before next release.

That technique pulled in cudatoolkit too early, even when not building
with it (and so it failed on darwin).
Just going with ["70" "75"] in default CUDA builds for now. Always
possible to override the `gpuArchs` argument.
overlay → overlays.default
devShell.ARCH → devShells.ARCH.default
Better for consistency with non-nix build.
@league league merged commit 7790fdb into lwa-project:master Mar 29, 2022
@league league deleted the nix-flake branch March 29, 2022 18:02

3 participants