
[BUG] No matching distribution found for triton==1.0.0 #1710

Closed
stas00 opened this issue Jan 19, 2022 · 12 comments

@stas00
Collaborator

stas00 commented Jan 19, 2022

Describe the bug
Pre-building DeepSpeed from source fails because pip cannot find a distribution for the triton==1.0.0 requirement, even though that version is available on PyPI.

To Reproduce
Steps to reproduce the behavior:

$ git clone https://github.com/microsoft/DeepSpeed
$ cd DeepSpeed
$ DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
[...]
ERROR: Could not find a version that satisfies the requirement triton==1.0.0 (from deepspeed) (from versions: 0.1, 0.1.1, 0.1.2, 0.1.3, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.3.0)
ERROR: No matching distribution found for triton==1.0.0

Once I manually do:

pip install triton==1.0.0

it then works.

I'm not sure why the manual build doesn't take care of resolving the requirements on its own.

@jeffra

@stas00 stas00 added the bug Something isn't working label Jan 19, 2022
@jeffra jeffra self-assigned this Jan 26, 2022
@aphedges
Contributor

@stas00, I actually came across this problem myself earlier this month, but for a different package. I can describe it in full if you want, but I think my case had a slightly different cause.

Looking at triton, I can see that the most recent version to have a source distribution is 0.3.0, which is also the most recent version that pip found for you. From what I can tell, pip does not appear to show versions that are unavailable for a specific platform. For example:

# On x86_64 macOS 11:
$ pip index versions triton
triton (0.3.0)
Available versions: 0.3.0, 0.2.3, 0.2.2, 0.2.1, 0.2.0, 0.1.3, 0.1.2, 0.1.1, 0.1
# On x86_64 CentOS 7:
$ pip index versions triton
triton (1.1.1)
Available versions: 1.1.1, 1.1.0, 1.0.0, 0.4.2, 0.4.1, 0.3.0, 0.2.3, 0.2.2, 0.2.1, 0.2.0, 0.1.3, 0.1.2, 0.1.1, 0.1
  INSTALLED: 1.0.0
  LATEST:    1.1.1

Versions of Triton past 0.3.0 only have manylinux2014 x86_64 wheels on PyPI. I'm guessing you're experiencing these errors on a different platform than that?
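(A quick way to double-check the wheels-only situation yourself, just a sketch and not something from this thread: restrict pip to source distributions. If triton 1.0.0 ships only manylinux wheels, this fails with the same "could not find a version" error; the destination directory is arbitrary.)

# Sketch: consider source distributions only; expected to fail for triton==1.0.0
# if only manylinux2014 x86_64 wheels exist on PyPI.
$ pip download triton==1.0.0 --no-deps --no-binary :all: -d /tmp/triton-sdist-check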

@stas00
Collaborator Author

stas00 commented Feb 1, 2022

pip finally added index - yay!

I think it's something else here, since I do see 1.0.0 on JeanZay:

$ pip index versions triton
WARNING: pip index is currently an experimental command. It may be removed/changed in a future release without prior warning.
triton (1.1.1)
Available versions: 1.1.1, 1.1.0, 1.0.0, 0.4.2, 0.4.1, 0.3.0, 0.2.3, 0.2.2, 0.2.1, 0.2.0, 0.1.3, 0.1.2, 0.1.1, 0.1
  INSTALLED: 1.0.0
  LATEST:    1.1.1

but during manual build for some reason it sees only some of those:

$ DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
[...]
ERROR: Could not find a version that satisfies the requirement triton==1.0.0 (from deepspeed) (from versions: 0.1, 0.1.1, 0.1.2, 0.1.3, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.3.0)

and I can install:

pip install triton==1.0.0

So it looks like when pip install is run, it invokes setup.py and somehow doesn't see the later versions. As you say, it's probably some sort of wheel filter, like the one visible in the output you shared.

@jeffra
Collaborator

jeffra commented Feb 1, 2022

Super interesting, thanks @aphedges and @stas00 for context here.

@stas00, from a Big Science perspective I think this latest PR #1727 should resolve the issue for you. We were also running into this strange triton install issue on one of our nodes internally. Because of your issue and ours, we reverted Sparse Attention (which requires triton) to being something you add manually at install time. Since Big Science isn't even using Sparse Attention, you shouldn't need to install triton anyway.

@stas00
Collaborator Author

stas00 commented Feb 1, 2022

I figured it out. It's a bug in pip's --global-option handling; see pypa/pip#4118, which they aren't going to fix - it was reported in 2016!


How I discovered it:

  1. I changed the build command to use -vv:
DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" -vv

which gave a much more detailed log that included dozens of lines like this:

  Skipping link: No binaries permitted for triton: https://files.pythonhosted.org/packages/ab/28/dbc3f95650b6c97d36e70a4e678796ea342ccdd98cf9adf4f840fd932b82/triton-1.1.2.dev20220106-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=185afa64d0165655fe76c73a948899fe994b937dfc2feb26dcc22408c669e133 (from https://pypi.org/simple/triton/)
[...]
  2. I then searched for 'Skipping link: No binaries permitted' and found the bug report from 2016.

So, in detail: when --global-option is used, pip refuses binary wheels for dependencies, which is why it only sees the triton versions that have a source distribution (0.3.0 and older).
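Concretely, the sequence that works around it (per the earlier comments) is to pre-install the wheel-only dependency before the --global-option build:

# Workaround sketch based on the steps above: satisfy the wheel-only dependency
# first, then run the pre-build, which disallows binary wheels for dependencies.
$ pip install triton==1.0.0
$ DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check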

We should probably document that this dependency needs to be installed manually?

$ grep -Ir global-option .
./DeepSpeedExamples/Megatron-LM/docker/Dockerfile:pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
./DeepSpeedExamples/MoQ/huggingface-transformers/docker/transformers-pytorch-gpu/Dockerfile:    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
./DeepSpeedExamples/MoQ/huggingface-transformers/docker/transformers-gpu/Dockerfile:    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
./DeepSpeedExamples/inference/huggingface/transformers/docker/transformers-pytorch-gpu/Dockerfile:    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
./DeepSpeedExamples/inference/huggingface/transformers/docker/transformers-gpu/Dockerfile:    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
./DeepSpeedExamples/inference/huggingface/transformers/docs/source/main_classes/trainer.rst:    --global-option="build_ext" --global-option="-j8" --no-cache -v \

But as you replied while I was writing this, perhaps it's no longer an issue.

In which case we can safely close this Issue.

@aphedges
Contributor

aphedges commented Feb 1, 2022

@stas00, that's interesting! I was not aware of the --global-option parameter, but I can easily see how --global-option="-j8" can speed installation up. Building has always been slow for me (~15 minutes per install), which makes trying out different versions annoying. What --global-option="build_ext" does is less apparent, but I can see from setup.py that it's part of distutils.

I've never actually compiled DeepSpeed with either of these options. I wonder if the lack of --global-option="build_ext" has caused any problems for me.

@stas00
Collaborator Author

stas00 commented Feb 1, 2022

It's the equivalent of make -j8, so yes, I specifically use it to speed things up. You can actually pass just -j and it'll use all your cores.

I wish apex had something of the sort; it takes forever to build. But it appears that setup.py needs to be instrumented to support it, so it doesn't just work out of the box.

--global-option="build_ext" enables the setup.py build_ext command, which is what makes pip pre-build DeepSpeed.

So what you're doing is:

python setup.py build_ext -j8

but getting pip to do it for you as part of its install.

You can just pip install deepspeed and it'll build the CUDA kernels the first time they are needed (JIT), which can sometimes cause problems; manual pre-building is the easiest way to deal with that. I use the latter most of the time out of habit now.

One of the main problems with the JIT build is that it installs the CUDA kernels into the same location under ~/.cache/torch_extensions/, so if you have multiple CUDA envs this can be a big problem, as they will all try to overwrite / reuse each other's builds.
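(One way around that clash, just a sketch and not something discussed further in this thread: point each env at its own cache via PyTorch's TORCH_EXTENSIONS_DIR environment variable; the path below is made up.)

# Sketch: give each CUDA/PyTorch env its own JIT-build cache so the envs
# don't overwrite / reuse each other's compiled kernels; path is an example.
$ export TORCH_EXTENSIONS_DIR=~/.cache/torch_extensions/pt110-cu113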

@aphedges
Contributor

aphedges commented Feb 1, 2022

I forgot it was from make! I knew -j was from somewhere, but I did not remember where.

I was unaware that it's essentially calling python setup.py build_ext, which is now deprecated to call directly. I wonder if --global-option="build_ext" is considered deprecated as well. I'm very far from a distutils/setuptools expert, though.

I always compile with DS_BUILD_OPS=1, and it seems to completely avoid JIT usage. I don't think I've had anything stored under ~/.cache/torch_extensions/ when compiling this way. I guess that if all the ops are compiled, the JIT never needs to be activated.
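(For concreteness, the command shape I mean is roughly the following; a sketch, with any other DS_BUILD_* toggles omitted.)

# Sketch: DS_BUILD_OPS=1 tells DeepSpeed's setup to compile all ops up front,
# so nothing needs to be JIT-built into ~/.cache/torch_extensions/ later.
$ DS_BUILD_OPS=1 pip install deepspeed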

@stas00
Collaborator Author

stas00 commented Feb 1, 2022

> I was unaware that it's essentially calling python setup.py build_ext, which is now deprecated to call directly. I wonder if --global-option="build_ext" is considered deprecated as well. I'm very far from a distutils/setuptools expert, though.

If it is, then what is the modern replacement that still allows parallel builds?

Ninja isn't great here since it doesn't parallelize small project builds, but it's what is used for the JIT build of PyTorch CUDA extensions. E.g. see my attempts to speed up Megatron-LM, which has just 4 CUDA kernels with barely 2 files each: pytorch/pytorch#68923.

> I always compile with DS_BUILD_OPS=1, and it seems to completely avoid JIT usage. I don't think I've had anything stored under ~/.cache/torch_extensions/ when compiling this way. I guess that if all the ops are compiled, the JIT never needs to be activated.

That's right. That's actually the main reason I prebuild, as I have many CUDA envs with different PyTorch builds. Now I remember why!

@aphedges
Contributor

aphedges commented Feb 2, 2022

I don't know what to do instead. From a very short search, the best I can find is https://discuss.python.org/t/the-difference-between-python-setup-py-build-ext-i-and-pip-install-e/3716/12 and pypa/setuptools#3025. At least the deprecation period should be long enough for a replacement to be made.

I was unaware of those details of Ninja. I haven't written any CUDA code myself, so I've mostly managed to avoid learning the details of CUDA build systems. Good to know about its performance, though.

@stas00
Collaborator Author

stas00 commented Feb 2, 2022

BTW, I run into this problem everywhere I try to pre-build, so it's not JeanZay-specific. It clearly has to do with a limitation imposed by the setup.py tooling.

@jeffra, #1727 didn't make any difference; with master the same failure happens.

Update: it looks like I had an old DeepSpeed checkout; after updating it, the error no longer appeared.

@stas00
Collaborator Author

stas00 commented Feb 3, 2022

@aphedges, I am actually not at all sure build_ext is officially deprecated. This just seems to be one person's opinion in one thread, and I can't find any official support for that statement. If you look at the latest Python 3.10 docs, they promote using build_ext:

  1. https://docs.python.org/3/distutils/configfile.html
  2. https://docs.python.org/3/distutils/setupscript.html

I did find this discussion https://news.ycombinator.com/item?id=26159509 where the key phrase is:

> The entire distutils package is deprecated, to be removed in Python 3.12.

But it'll take years before DeepSpeed will be able to drop support for Python 3.6, so why worry about Python 3.12? Even if Python 3.12 removes distutils, they will make it available as a 3rd-party package to support projects that rely on those tools and can't cut off older Python versions.

And given that they provide zilch docs on how to replace build_ext, we can safely ignore this for quite some time.

@stas00
Collaborator Author

stas00 commented Feb 3, 2022

Since #1727 dealt with this issue I'm closing this.

Thank you Alex and Jeff!
