This repository was archived by the owner on Apr 24, 2022. It is now read-only.

Two ethminer V12 systems crashed on 11/12 at 01:01 (version V9 ran w/o error) #378

Closed
inprosys opened this issue Nov 12, 2017 · 6 comments


@inprosys

Last night, on 11/12 (Sunday morning) at 01:01, two separate systems running ethminer V12 crashed at the same time with the following error: CUDA error in func 'ethash-cuda-miner::search' at line 346:: unknown error. Both systems run Windows 7 with Nvidia cards (GTX 1080 Ti and GTX 1080). A third system (also Windows 7), running ethminer V9 with two AMD cards (570, 580), continued to run without any problems.

At first I thought it might be due to a switch to a new DAG epoch, but I believe the switch from DAG #150 (last block 4529999) to #151 (first block 4530000) occurred on 11/10 -- is that correct?
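
For reference, the block numbers quoted above can be checked with a short sketch. It assumes Ethash's epoch length of 30000 blocks; the function names are made up for illustration. It only confirms which epoch a block belongs to, not the date on which the switch happened.

```python
EPOCH_LENGTH = 30000  # Ethash regenerates the DAG every 30000 blocks


def dag_epoch(block_number: int) -> int:
    """Return the DAG epoch that a given block number belongs to."""
    return block_number // EPOCH_LENGTH


def first_block_of_epoch(epoch: int) -> int:
    """Return the first block number of the given DAG epoch."""
    return epoch * EPOCH_LENGTH


print(dag_epoch(4529999))         # 150
print(dag_epoch(4530000))         # 151
print(first_block_of_epoch(151))  # 4530000
```

So block 4530000 is indeed the first block of epoch 151, which matches the numbers in the question.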

So, what happened at 01:01 on the morning of 11/12 that killed two of my ethminer V12 NVIDIA systems? Any ideas would be appreciated.

@derubm

derubm commented Nov 12, 2017

Windows update? (Just a wild guess.) Check your system log.

@bmatthewshea
Contributor

bmatthewshea commented Nov 12, 2017

Did you notice an NVIDIA driver crash? Are you overclocking? I seem to recall that error from the past, when I had overclocked too much and the Windows display driver crashed (a tray notification does pop up). Anything mining on the device or monitoring it suddenly shows nothing (ethminer crashes or freezes, GPU temp shows 0, Afterburner goes blank, etc.) because the card(s) 'disappear'. For me this requires a reboot to get them back.

Also, not a bad idea to make a script to monitor the output and restart if needed.

I have had it die under Linux occasionally and use a cron job script.
In Windows you could probably just use Task Scheduler plus a batch script, have it run every 5 minutes or so, and simply test for -any- output from the ethminer process (and/or watch for 'error' strings, or whatever). Kill and restart it (or even reboot if needed) via the same or another batch/cmd/ps1 file.
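
The check described above could look roughly like the following sketch. It is written in Python rather than batch, and everything in it is an assumption for illustration: the function names, the 300-second idle limit (matching "every 5 mins or so"), and the error markers, which are simply taken from the error message quoted at the top of this issue. It assumes the miner's output is redirected to a log file that is replaced on every restart; otherwise an old error line would trigger a restart on every check.

```python
import os
import time

# Strings to watch for; taken from the error reported in this issue.
ERROR_MARKERS = ("CUDA error", "unknown error")


def read_tail(log_path, max_bytes=65536):
    """Return up to the last max_bytes of the log as text ('' if missing)."""
    try:
        with open(log_path, "rb") as f:
            f.seek(0, os.SEEK_END)
            size = f.tell()
            f.seek(max(0, size - max_bytes))
            return f.read().decode("utf-8", errors="replace")
    except FileNotFoundError:
        return ""


def should_restart(log_path, max_idle_seconds=300, now=None):
    """True if the log is missing, has gone quiet, or shows an error marker."""
    now = time.time() if now is None else now
    try:
        idle = now - os.path.getmtime(log_path)
    except FileNotFoundError:
        return True  # no log at all: treat the miner as not running
    if idle > max_idle_seconds:
        return True  # no new output for too long
    tail = read_tail(log_path)
    return any(marker in tail for marker in ERROR_MARKERS)
```

A scheduled job would call `should_restart("miner.log")` every few minutes and, when it returns `True`, kill and relaunch the miner with whatever command line is normally used.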

I also redirect all ethminer (or whatever miner) output to a 'log' file. On a restart, the script copies the last 100 lines to a different text file, preserving the error, and then restarts the process with a new log file.
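
A minimal sketch of that "keep the last 100 lines, then start a new log" step, in Python. The function name and file names are hypothetical, and it assumes the miner has already been stopped so nothing else holds the log open.

```python
from collections import deque


def preserve_tail(log_path, keep_path, n=100):
    """Copy the last n lines of log_path to keep_path, then empty log_path.

    Returns the number of lines that were preserved.
    """
    with open(log_path, "r", encoding="utf-8", errors="replace") as src:
        tail = deque(src, maxlen=n)  # a bounded deque keeps only the final n lines
    with open(keep_path, "w", encoding="utf-8") as dst:
        dst.writelines(tail)
    # Truncate the log so the restarted miner begins with an empty file.
    open(log_path, "w").close()
    return len(tail)
```

For example, `preserve_tail("miner.log", "last_crash.txt")` would be called right before relaunching the miner. Using a timestamped name for the second file would avoid overwriting an earlier crash record.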

@inprosys
Author

Thanks, derubm, for the idea to look at the system log. I don't allow Microsoft automatic updating. However, there was a "security audit logon" event logged around that time. A "security audit" event can occur for all sorts of reasons, e.g., just accessing a shared disk from another computer. I do not know why that would cause a CUDA error on two machines at the same time.

@inprosys
Author

Thanks, bmatthewshea, for the restart idea via Task Scheduler. I already log all output using "2>>logfile.txt"; the ">>" operator appends output to the log file instead of overwriting it. I clear the log file every few days after collecting statistics, provided there are no unusual events or errors.
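
For anyone launching the miner from a script instead of a command prompt, the same append behaviour as "2>>logfile.txt" can be sketched in Python as below. The function name is hypothetical and the command is whatever is normally run; opening the file in append mode is what keeps earlier output intact.

```python
import subprocess


def run_appending_stderr(cmd, log_path):
    """Run cmd and append its stderr to log_path, like `2>>log_path`.

    Returns the exit code of the process.
    """
    # "ab" opens for appending in binary mode; existing content is never truncated.
    with open(log_path, "ab") as log:
        return subprocess.run(cmd, stderr=log).returncode
```

Calling it twice with the same `log_path` leaves the output of both runs in the file, one after the other.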

@dhjw

dhjw commented Nov 15, 2017

Reduce overclock a little bit on whichever card goes offline.

@DeadManWalkingTO
Contributor

I think this issue can be closed.
