Docker error #1424
This seems related to this (which I am unable to reproduce): Can you give me more info on your system so I can see if I can reproduce the error? Without a way to reproduce it I will never be able to make progress on it. |
Hello Klueska, what I can say is: when I run the cuda 10 image it works, but when I run the cuda 11 image I get the error. [nvidia-smi output omitted] Hope it could be helpful for you |
Interesting. So it works with a cuda 10 image, but not with a cuda 11 image. |
Hello. I am having the exact same issue as reported by @stephcar75020. Here is my system setup: root@OMV: Server: Docker Engine - Community [docker version and nvidia-smi output omitted] |
The fact that it works with a 10.0 image, but breaks with an 11.0 image, suggests that it has something to do with the compat libraries. One thing that may be causing this is the following: when this bug is triggered, if the compat libraries fail to be detected, then running nvidia-smi inside the container fails. Can you install the RC at the following link to see if it fixes this issue for you? Link to package: Command to install:
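As a side note, one way to see whether an image ships the forward-compat libraries at all (a minimal sketch; the /usr/local/cuda*/compat location is an assumption about where the compat packages install, not something confirmed in this thread):
docker run --rm nvidia/cuda:11.0-base sh -c 'ls -d /usr/local/cuda*/compat 2>/dev/null || echo "no compat dir"'   # assumption: compat libs, if present, live under /usr/local/cuda*/compat
docker run --rm nvidia/cuda:10.0-base sh -c 'ls -d /usr/local/cuda*/compat 2>/dev/null || echo "no compat dir"'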
|
So I was able to reproduce this with the following setting in /etc/nvidia-container-runtime/config.toml:
ldconfig = "/sbin/ldconfig"
By default this should be the following on debian systems:
ldconfig = "@/sbin/ldconfig"
The first will attempt to run /sbin/ldconfig from inside the container, while the second will attempt to run /sbin/ldconfig from the host file system. The second is preferable, because you never know exactly what will be installed on every container you run.
It's unclear exactly why the first one is erroring out on cuda:11.0-base (because the contents of /sbin/ldconfig are identical on both cuda:10.0-base and cuda:11.0-base).
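To check which variant is active (a minimal sketch; the grep is just one way to do it):
grep ldconfig /etc/nvidia-container-runtime/config.toml
# on debian systems this should print: ldconfig = "@/sbin/ldconfig"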
That said, can you double check what your settings for this in /etc/nvidia-container-runtime/config.toml are? |
Hello, I tried your suggestion with the modified settings in /etc/nvidia-container-runtime/config.toml:
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
Then running docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi,
I get this message:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
Running docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi gives the same message:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
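In case it helps, a check like the following might show whether libnvidia-ml was injected into the container at all (the library locations searched are a guess):
docker run --rm --gpus all nvidia/cuda:11.0-base sh -c 'ldconfig -p | grep -i libnvidia-ml || find /usr/lib* -name "libnvidia-ml*" 2>/dev/null | grep . || echo "libnvidia-ml not found"'   # falls back to a filesystem search if the lib is not in the linker cache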
If I go back to the previous setting in /etc/nvidia-container-runtime/config.toml:
root@NAS:~# cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "/sbin/ldconfig"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
then cuda:10.0-base runs again and cuda:11.0-base still fails with the error.
Hope it will be helpful.
Btw, do you have any idea about the release date of the long-lived NVIDIA driver compatible with the 5.9 kernel?
On Tue, Dec 8, 2020 at 13:02, Kevin Klues wrote:
… So I was able to reproduce this with the following setting in
/etc/nvidia-container-runtime/config.toml
ldconfig = "/sbin/ldconfig"
By default this should be the following on debian systems:
ldconfig = "@/sbin/ldconfig"
The first will attempt to run /sbin/ldconfig from inside the container,
while the second will attempt to run /sbin/ldconfig from the host file
system. The second is preferable, because you never know exactly what will
be installed on every container you run.
It's unclear exactly why the first one is erroring out on cuda:11.0-base (because
the contents of /sbin/ldconfig are identical on both cuda:10.0-base and cuda:11.0-base),
i.e.:
$ docker run -it nvidia/cuda:10.0-base
***@***.***:/# cat /sbin/ldconfig
#!/bin/sh
if test $# = 0 \
&& test x"$LDCONFIG_NOTRIGGER" = x \
&& test x"$DPKG_MAINTSCRIPT_PACKAGE" != x \
&& dpkg-trigger --check-supported 2>/dev/null
then
if dpkg-trigger --no-await ldconfig; then
if test x"$LDCONFIG_TRIGGER_DEBUG" != x; then
echo "ldconfig: wrapper deferring update (trigger activated)"
fi
exit 0
fi
fi
exec /sbin/ldconfig.real "$@"
$ docker run -it nvidia/cuda:11.0-base
***@***.***:/# cat /sbin/ldconfig
#!/bin/sh
if test $# = 0 \
&& test x"$LDCONFIG_NOTRIGGER" = x \
&& test x"$DPKG_MAINTSCRIPT_PACKAGE" != x \
&& dpkg-trigger --check-supported 2>/dev/null
then
if dpkg-trigger --no-await ldconfig; then
if test x"$LDCONFIG_TRIGGER_DEBUG" != x; then
echo "ldconfig: wrapper deferring update (trigger activated)"
fi
exit 0
fi
fi
exec /sbin/ldconfig.real "$@"
That said, can you double check what your settings for this in
/etc/nvidia-container-runtime/config.toml are?
|
The newest version of Specifically this change in The latest release packages for the full
|
This should have been resolved. If not, please open a new issue against https://github.com/NVIDIA/nvidia-container-toolkit |
Hello, following the installation as described, I'm experiencing this error.
If someone has any clue, I would appreciate it!
1. Issue or feature description
2. Steps to reproduce the issue
Information to attach (a combined sketch for gathering it follows this list):
- nvidia-container-cli -k -d /dev/tty info
- uname -a
- Any relevant kernel output lines from dmesg
- Driver information from nvidia-smi -a
- docker version
- dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
- nvidia-container-cli -V
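A minimal sketch for collecting all of the above into one file (the log file name is just an example):
{
  nvidia-container-cli -k -d /dev/tty info
  uname -a
  dmesg | grep -i nvidia        # relevant kernel lines only
  nvidia-smi -a
  docker version
  dpkg -l '*nvidia*' 2>/dev/null || rpm -qa '*nvidia*'
  nvidia-container-cli -V
} > nvidia-docker-debug.log 2>&1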