Command to quit miner on error #97

reb0rn21 · 2017-06-30T20:53:00Z

If the mining card is overclocked to the limit, the miner just stops with a "cuda error". It would be nice to have a command that simply quits the miner on error, so we can use a restart script.

DLS-bau · 2017-06-30T22:56:08Z

A much safer solution would be to have your restart script monitor GPU usage and restart the miner when the usage of any GPU drops below 50% for several seconds. This is easy to do on NVIDIA cards.
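A minimal bash sketch of that idea, assuming NVIDIA cards with nvidia-smi on the PATH and a restart command that relaunches the miner and then returns (for example a script that respawns it inside tmux or screen). The environment knobs (THRESHOLD, SAMPLE_INTERVAL, MAX_LOW_SAMPLES, STARTUP_DELAY) and the helpers gpus_look_unhealthy and watch_gpus are hypothetical. It samples every GPU rather than only GPU 0, treats empty or non-numeric output as a failure, and only acts after several consecutive bad samples so that a short dip does not trigger a restart.

```shell
#!/usr/bin/env bash
# Utilization watchdog sketch. Assumptions: NVIDIA cards with nvidia-smi on
# PATH, and a restart command (passed as arguments) that relaunches the miner
# and returns. All names below are hypothetical.

THRESHOLD="${THRESHOLD:-50}"              # percent
SAMPLE_INTERVAL="${SAMPLE_INTERVAL:-5}"   # seconds between samples
MAX_LOW_SAMPLES="${MAX_LOW_SAMPLES:-3}"   # consecutive bad samples (~15 s)
STARTUP_DELAY="${STARTUP_DELAY:-60}"      # give the miner time to ramp up

# Reads one utilization value per line on stdin and succeeds if the list
# looks unhealthy: a value below $1, a non-numeric value, or no values at all
# (the last two usually mean nvidia-smi itself failed).
gpus_look_unhealthy() {
  local threshold=$1 value count=0
  while read -r value || [[ -n $value ]]; do
    value=${value//[[:space:]]/}
    count=$((count + 1))
    [[ $value =~ ^[0-9]+$ ]] || return 0
    (( 10#$value < threshold )) && return 0
  done
  (( count == 0 ))
}

watch_gpus() {
  local low=0
  while true; do
    # nvidia-smi prints one line per GPU, e.g. "97"
    if nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits 2>/dev/null \
        | gpus_look_unhealthy "$THRESHOLD"; then
      low=$((low + 1))
    else
      low=0
    fi
    if (( low >= MAX_LOW_SAMPLES )); then
      echo "$(date '+%F %T') low GPU utilization, running: $*" >&2
      "$@"
      low=0
      sleep "$STARTUP_DELAY"
    fi
    sleep "$SAMPLE_INTERVAL"
  done
}

# Start watching only when a restart command is given.
if (( $# > 0 )); then
  watch_gpus "$@"
fi
```

Usage would be something like ./gpu-util-watchdog.sh /home/user/start_miner.sh (file names hypothetical). A hanging nvidia-smi would still block this loop, which is one of the failure modes raised later in the thread.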

rizwansarwar · 2017-07-01T07:25:05Z

@DLS-bau Do you have a working example of a script? Maybe you can share it with the rest of us. Thanks!

arensirb · 2017-07-01T18:12:27Z

I think there should be an exit command regardless of whether an error occurred. For example, pressing Q would shut the miner down properly instead of just killing the process.
Killing the process actually screws up my system more than when it crashes.
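Until the miner has a quit key, a somewhat gentler alternative to killing it outright is to ask it to exit with SIGTERM and fall back to SIGKILL only after a grace period. This is a sketch that assumes the miner shuts down cleanly on SIGTERM (not verified here); the file name, GRACE_SECONDS and stop_gracefully are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: stop a process politely (SIGTERM) and escalate to SIGKILL only if it
# is still running after a grace period. Names are hypothetical.
# Usage: ./stop-miner.sh "$(pgrep -o -x ethminer)"

GRACE_SECONDS="${GRACE_SECONDS:-15}"

# Returns 0 if the process exited after SIGTERM, 1 if it had to be killed,
# 2 if it could not be signalled at all.
stop_gracefully() {
  local pid=$1 waited=0
  kill -TERM "$pid" 2>/dev/null || return 2   # no such process, or no permission
  while kill -0 "$pid" 2>/dev/null; do
    if (( waited >= GRACE_SECONDS )); then
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

if (( $# > 0 )); then
  stop_gracefully "$1"
fi
```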

reb0rn21 · 2017-07-05T01:23:42Z

Yeah, a command-line option like -k that quits the miner on any error, so a simple batch loop can restart it. I use this approach with most miners.

murathai · 2017-07-12T11:05:03Z

reb0rn21, can you share a sample .bat file for a restart loop?

reb0rn21 · 2017-07-12T15:07:31Z

:loop
ethminer.exe
goto loop

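Just disable Windows Error Reporting; that's what I do. Otherwise the crash dialog keeps ethminer.exe from exiting, and the loop never gets to restart it.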

Malapha · 2017-07-18T08:04:08Z

See [Issue 72], which has some solutions using batch, PowerShell, or PHP.

AndreaLanfranchi · 2017-08-16T15:28:15Z

My very basic (but effective) solution is to monitor the miner with a bash watchdog script.
You have to redirect ethminer output (stdout & stderr) to a log file and then run this script.

#!/bin/bash
#
# minerwd.sh
# Author: Andrea Lanfranchi
#
# Monitors the ethminer output log for errors.
# If one is found in the last 10 lines, the mining rig is rebooted.
#
# Prerequisites:
# apt-get install inotify-tools mailutils   (mailutils provides the mail command)
#

# Wake up every time the log file is modified
while inotifywait -e modify ~/miner.log > /dev/null 2>&1 ; do

  # Look for errors in the last 10 lines of the log file
  # Feel free to extend the grep pattern or add more conditions
  if tail -n10 ~/miner.log | grep -qi "cuda error\|error cuda"; then

    # Send mail
    echo "Miner requires restart due to error" | mail -s "Miner WatchDog Restart" prospector@localhost

    # Reboot the mining rig in 2 minutes
    sudo /sbin/shutdown -f -r +2

    # Abandon WatchDog
    exit

  fi
done
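The watchdog above assumes ethminer's output is already going to ~/miner.log. A minimal launcher sketch that appends both stdout and stderr to that file; the file name run-logged.sh, the LOG_FILE variable and run_logged are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical launcher for the log watchdog: run the given command with
# stdout and stderr appended to the file the watchdog watches.
# Usage: ./run-logged.sh ethminer <your usual options> &

LOG_FILE="${LOG_FILE:-$HOME/miner.log}"

run_logged() {
  # ">>" appends, so earlier output survives a miner restart
  "$@" >> "$LOG_FILE" 2>&1
}

if (( $# > 0 )); then
  run_logged "$@"
fi
```

Start the miner this way, then run minerwd.sh. Because the log is only ever appended to, it grows over time, so a long-running rig would also need some form of log rotation or periodic truncation.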

seymores · 2017-09-04T06:59:25Z

Here's something I am using for my nvidia cards.
Feel free to modify it to your needs.

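#!/bin/sh

PREP_GPUS="/home/linus/set_overclocking.sh"
MINER_SCRIPT="/home/linus/start_miner.sh"

# Utilization of GPU 0 in percent (only the first card is checked)
gpu0_utilization=$(nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader,nounits | tr -d '[:space:]')

# Treat missing or non-numeric output (e.g. nvidia-smi failed) as "down"
case $gpu0_utilization in
  ''|*[!0-9]*) gpu0_utilization=0 ;;
esac

if [ "$gpu0_utilization" -lt 50 ]
then
  echo "[alert] GPU seems to be down, restarting."
  $PREP_GPUS
  $MINER_SCRIPT
  echo "Done restarting miner script, going to sleep now"
else
  echo "[info] All normal"
fi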

piotr-dobrogost · 2017-09-04T20:43:56Z

I'm using this with nvidia cards and tmux:

#!/bin/bash

file=/tmp/ethminer-restarts.log
POWER_THRESHOLD=50
PROBE_DELAY=30
STARTUP_DELAY=60
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # no color

while true
do
    sleep $PROBE_DELAY
    power_draw=$(nvidia-smi --id=0 --query-gpu=power.draw --format=csv,noheader,nounits)
    if (( $(echo "$power_draw < $POWER_THRESHOLD" | bc -l) ))
    then
      echo -ne " $RED$(date +'%H:%M') ✘$NC " | tee -a $file
      tmux respawn-pane -k -t ethminer:0.0
      sleep $STARTUP_DELAY
    else
      echo -ne "$(date +'%M') ${GREEN}✔$NC "
    fi
done
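The script above assumes a tmux session named ethminer whose window 0, pane 0 runs the miner (ethminer:0.0, which relies on tmux's default base-index of 0). A sketch of how that session might be created under those assumptions; the file name and start_session are hypothetical. Because respawn-pane -k without a command re-runs the command the pane was created with, the miner command is passed to new-session itself rather than typed into a shell. Setting remain-on-exit keeps the pane around if the miner exits on its own, so the respawn target still exists.

```shell
#!/usr/bin/env bash
# Sketch of the session setup the tmux watchdog expects.
# Usage: ./start-ethminer-session.sh 'ethminer <your usual options>'

SESSION=ethminer

start_session() {
  local miner_cmd=$1
  if tmux has-session -t "$SESSION" 2>/dev/null; then
    echo "tmux session '$SESSION' already exists" >&2
    return 0
  fi
  # respawn-pane -k without a command later re-runs exactly this command
  tmux new-session -d -s "$SESSION" "$miner_cmd" || return
  # Keep the pane if the miner exits, so respawn-pane still has a target
  tmux set-window-option -t "$SESSION" remain-on-exit on
}

if (( $# > 0 )); then
  start_session "$1"
fi
```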

ddobreff · 2017-09-07T10:10:38Z

This method doesn't work every time. If a GPU fails, nvidia-smi runs in a loop without producing any output. I am currently working on a better way to implement a watchdog function.

hcbraun · 2017-09-07T11:40:23Z

@ddobreff When nvidia-smi stops working, the driver logs an Xid error. You can check with:
journalctl _TRANSPORT=kernel | grep NVRM
So far I have not found a reliable way to recover from those failures. I just trigger a reboot on them (https://jjacky.com/journal-triggerd/).
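A sketch of turning that journalctl check into a periodic reboot trigger, assuming a systemd journal and root privileges (it would typically run from cron or a systemd timer, whereas journal-triggerd reacts to journal entries as they arrive). Matching on "NVRM: Xid" is an assumption about the driver's message format; grepping for NVRM alone, as above, is broader. journalctl -k limits the search to kernel messages from the current boot, which avoids rebooting again on errors logged before the previous reboot. The file name, LOOKBACK and the helpers are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of an Xid check, run periodically as root.
# Usage: sudo ./xid-watchdog.sh run

LOOKBACK="${LOOKBACK:--10min}"   # how far back to search the journal

# Succeeds if the kernel log lines on stdin contain an NVRM Xid report.
has_xid_error() {
  grep -q 'NVRM: Xid'
}

check_and_reboot() {
  # -k: kernel messages (like _TRANSPORT=kernel), current boot only
  if journalctl -k --since="$LOOKBACK" --no-pager | has_xid_error; then
    logger -t xid-watchdog "NVRM Xid error found, rebooting"
    systemctl reboot
  fi
}

if [[ ${1:-} == run ]]; then
  check_and_reboot
fi
```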

ddobreff · 2017-09-07T11:52:35Z

We shouldn't be using this approach at all; it can cause other difficulties. For example, I forgot that I had stopped the miner, and the system rebooted while I was compiling. A better approach is to have the miner itself tell the watchdog when something is wrong.

piotr-dobrogost · 2017-09-07T13:07:58Z

@ddobreff wrote: "This method doesn't work every time. If a GPU fails, nvidia-smi runs in a loop without producing any output."

True. I haven't tried it, but I think checking the exit code of nvidia-smi should catch this. Another case to account for is nvidia-smi hanging (I think I've seen this happen).
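A sketch of the exit-code idea that also guards against a hanging nvidia-smi using coreutils timeout. NVIDIA_SMI, SMI_TIMEOUT, query_utilization and the return codes 2/3/4 are hypothetical conventions for this example. One caveat: if nvidia-smi is stuck inside the driver, even SIGKILL may not take effect right away, so some hangs may only be resolvable by rebooting.

```shell
#!/usr/bin/env bash
# Sketch of a defensive nvidia-smi query.
# Usage: ./query-util.sh 0   (prints GPU 0 utilization or exits non-zero)

NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"
SMI_TIMEOUT="${SMI_TIMEOUT:-10}"   # seconds before nvidia-smi counts as hung

# Prints the utilization (percent) of GPU $1.
# Returns 0 on success, 2 if nvidia-smi failed, 3 if it hung,
# 4 if the output was not a number.
query_utilization() {
  local gpu=$1 out status
  # -k 5: send SIGKILL if nvidia-smi ignores SIGTERM for 5 more seconds
  out=$(timeout -k 5 "$SMI_TIMEOUT" "$NVIDIA_SMI" -i "$gpu" \
        --query-gpu=utilization.gpu --format=csv,noheader,nounits)
  status=$?
  if (( status == 124 || status == 137 )); then
    return 3   # timed out (124), or killed after the grace period (128+9)
  elif (( status != 0 )); then
    return 2
  fi
  out=${out//[[:space:]]/}
  [[ $out =~ ^[0-9]+$ ]] || return 4
  printf '%s\n' "$out"
}

if (( $# > 0 )); then
  query_utilization "$1"
fi
```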

DeadManWalkingTO · 2018-02-27T23:15:30Z

Since #757 (which added an --exit parameter so the miner exits whenever an error occurs), you can use a watchdog.

Try ETHminerWatchDogDmW for Windows 7/8/10 (32/64-bit) and Linux (any distribution, version, or architecture) (#735).

Please try it and give feedback.
Thank you!
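With --exit available, the Windows batch loop shown earlier in the thread has a direct Linux counterpart. A minimal bash sketch, assuming ethminer is on the PATH; the file name keep-mining.sh, RESTART_DELAY, MAX_RUNS and keep_running are hypothetical, and the pool options are passed through unchanged.

```shell
#!/usr/bin/env bash
# Linux counterpart of the batch restart loop, meant for ethminer --exit so
# that any error ends the process and triggers a restart.
# Usage: ./keep-mining.sh ethminer --exit <your usual options>

RESTART_DELAY="${RESTART_DELAY:-10}"   # seconds to wait between runs
MAX_RUNS="${MAX_RUNS:-0}"              # 0 means restart forever

keep_running() {
  local runs=0 status
  while true; do
    "$@"
    status=$?
    runs=$((runs + 1))
    echo "$(date '+%F %T') miner exited with status $status (run $runs)" >&2
    if (( MAX_RUNS > 0 && runs >= MAX_RUNS )); then
      return "$status"
    fi
    # A short pause avoids a tight restart loop if the miner fails instantly
    sleep "$RESTART_DELAY"
  done
}

if (( $# > 0 )); then
  keep_running "$@"
fi
```

MAX_RUNS is mainly useful for testing; leaving it at 0 restarts the miner indefinitely, like the batch loop.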

chfast added enhancement up-for-grabs labels Jul 3, 2017

AndreaLanfranchi mentioned this issue Sep 3, 2017

Please add a WATCHDOG function so miner can alert when error occurs. #274

Closed

piotr-dobrogost mentioned this issue Sep 8, 2017

Stop console at error instead continue print 0.00 Mh/s #299

Closed

MariusVanDerWijden mentioned this issue Feb 16, 2018

added --exit parameter to exit whenever an error occurred #757

Merged

This was referenced Feb 17, 2018

Simple Script WatchDog #735

Closed

Issues that can be closed (cleanup) #764

Closed

MariusVanDerWijden closed this as completed Mar 1, 2018

lesjokolat mentioned this issue Oct 14, 2018

All Shares rejected after sometime #1639

Closed
