GPU crash after Ctrl-C on Linux #349
Comments
Does it happen when NOT overclocked? We need to narrow it down, as you mentioned.
I get the same result with Ctrl-C on an overclocked 1060 3GB (Ubuntu 16).
Was away, sorry.
Is it possible to add some command inside the program to shut down the process gracefully? I am also experiencing the problem on Windows 7 with a 1080 Ti.
@fhlfibh Those are high overclocks for 1060s. I experience similar lockups when I push the clocks. Sometimes it will appear to be working great and a couple of hours later it'll freeze. I'm not sure the -200 GPU clock delta does anything; it doesn't seem to affect power or performance on my 1060s. I overclock the memory transfer by a modest 404. I value reliability over a few percent of extra performance.
@dennis97519 yeah it would be nice, or simply have binary 'watch' for
@dennis97519 @bmatthewshea What exactly would you expect on a "graceful" exit?
@EoD No "Graphics driver crashed and recovered" message when exiting.
@dennis97519 Well, this sounds more like a driver issue than an issue with ethminer. I am not sure how ethminer can do a "graceful" exit when the driver is actually the one crashing.
@EoD The driver isn't crashing arbitrarily, obviously. It's crashing when exiting ethminer. Hence it was reported here. I have had this happen when not overclocked at all, so that doesn't seem to matter. It doesn't happen all the time.
@bmatthewshea I am unable to reproduce the crash at all and have never experienced it, hence I assume this crash is Nvidia-only. That would indicate even more strongly that it is a driver problem. Mitigating driver issues in a userland program (like ethminer) might be possible, but our "fix" is then just a workaround for a driver bug and never a proper fix. Did any of you try reporting this upstream to Nvidia?
@EoD Understood & no, I have not reported the driver crash upstream, as it has never happened with anything but ethminer. As I said, it's hit and miss at that & not a huge issue. I -do- use the machine w/ the nv 1060 occasionally while it's mining (web browser etc., but nothing GPU intensive), whereas others who don't see it may be solely mining at all times. Maybe that is a factor. Maybe not.
The demands on drivers are of course higher when they are under heavy load, and especially when they are switching from low to high load and back. Hence I recommend reporting it upstream.
So, let's see: The CTRL-C crashing the NVIDIA drivers must be an NVIDIA driver problem when it NEVER occurs using Claymore and it ALWAYS occurs when I try to close ethminer on a system with more than two GPU cards installed. It seems to be a strange stance for @EoD to take when there seems to be at least one fix (threads #305 and #331) that appears to be able to correct the CTRL-C problem. Is @EoD testing this on more than one system? More than one video card? I think that what user
@inprosys Windows or Linux? Ah, never mind... Windows only it seems.
@inprosys yes, of course. Two completely different systems with two different kinds of cards (both AMD, but different generations and different drivers), and on both systems under both Linux and Windows. The issue never happened in any configuration. I am not against #331; the idea is good in general. I just wanted to point out that we are working around a driver issue. As I already tried to say above, working around a driver issue might be only a temporary fix, not a permanent one.
Yes, @jean-m-cyr, it is Windows in my case. OK, @EoD, I can see how, when you have only tested with AMD and cannot recreate the problem, it seems as if the CTRL-C problem can be dismissed as being strictly an NVIDIA driver problem. But it could also be improper program cleanup (loose ends, loose threads, memory leaks, hanging semaphores, etc.) before termination that AMD happens to ignore. (One man's bug is another man's feature.) All I'm asking is "Is ethminer doing proper cleanup before exiting?" Should the NVIDIA device driver be immune to all possible program abuses? Maybe. But, given what a pain-in-the-ass it is to recover GPU cards disappearing off the PCIe bus, it would be great to avoid this problem if all it took was careful exit procedures -- that's not a work-around -- it's good programming standards. Just for the record, I really appreciate all the time and effort that people are contributing to support this project.
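For anyone wondering what "proper cleanup before exiting" could look like in practice, here is a minimal C++ sketch. It is not ethminer's code and not necessarily what #331 does; `FakeDevice`, `mine_until_stopped`, `on_sigint`, and `g_stop` are hypothetical names made up for illustration. The idea is that the Ctrl-C handler only sets a flag, each worker thread finishes its current unit of work, releases its device, and only then does the process return.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <csignal>
#include <thread>
#include <vector>

// Stop flag set by the SIGINT handler and polled by the workers.
// Assumption: std::atomic<bool> is lock-free here, which is what makes
// it safe to write from a signal handler.
static std::atomic<bool> g_stop{false};

// The handler does nothing except record the request to stop.
extern "C" void on_sigint(int) { g_stop.store(true); }

// Hypothetical stand-in for per-GPU state; a real miner would hold
// device buffers and contexts here.
struct FakeDevice {
    bool released = false;
    void do_work() { std::this_thread::sleep_for(std::chrono::milliseconds(1)); }
    void release() { released = true; }  // real code would sync and free GPU resources
};

// Runs one worker per device until Ctrl-C is seen, then lets every worker
// release its device before returning. Returns the number of devices released.
int mine_until_stopped(std::vector<FakeDevice>& devices) {
    std::signal(SIGINT, on_sigint);

    std::vector<std::thread> workers;
    for (auto& dev : devices) {
        workers.emplace_back([&dev] {
            while (!g_stop.load()) dev.do_work();  // finish the current unit of work
            dev.release();                         // clean up in the thread that owns the device
        });
    }
    for (auto& w : workers) w.join();  // do not exit while a worker still holds a device

    int released = 0;
    for (auto& dev : devices) {
        assert(dev.released);  // every device must be released before we return
        ++released;
    }
    return released;
}
```

Calling `mine_until_stopped(devices)` from `main` and returning normally afterwards means the process never exits while a worker is mid-operation. Whether that is enough to keep the NVIDIA driver happy is exactly the open question in this thread.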
Maybe just good luck so far, but I haven't been noticing it as much (or at all?) on the last few dev builds and the 13 release.
I think this issue can be closed.
Should be fixed, or at least improved, by #331.
Hi,
I have a 1060 3GB with the latest ethminer from git.
The GPU is overclocked: -200 core offset, +950 RAM offset, power limit 70W, and 65C average temperature.
In general the miner seems to work OK, but when I stop it with Ctrl-C, sometimes the whole GPU falls off the bus.
According to http://docs.nvidia.com/deploy/xid-errors/index.html Xid 62 is an internal microcontroller halt.
I wonder if it's a hardware problem, or the driver's, or the miner's, or something else?
It seems related to overclocking, but I'm not sure.
Regards,
Alex