This repository was archived by the owner on Apr 24, 2022. It is now read-only.

Command to quit miner on error #97

Closed
reb0rn21 opened this issue Jun 30, 2017 · 15 comments

Comments

@reb0rn21

If a mining card is overclocked to the limit, the miner just stops with some "cuda error". It would be nice to have an option that simply quits the miner on error, so we can use a restart script.

@DLS-bau

DLS-bau commented Jun 30, 2017

A much safer solution would be to have your restart script monitor GPU usage and restart the miner when any GPU's usage drops below 50% for several seconds. This is easily done on NVIDIA.
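The "below 50% for several seconds" rule can be sketched as a small shell helper. This is only an illustration, not a script anyone in the thread posted: the function name `sustained_low` and its argument order are made up here, and it assumes the readings are plain integers such as those printed by `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`.

```shell
#!/bin/sh
# Hypothetical helper (not part of ethminer or nvidia-smi).
# usage: sustained_low THRESHOLD COUNT SAMPLE...
# Succeeds only if the last COUNT samples are all numeric and below THRESHOLD.
sustained_low() {
  threshold=$1
  count=$2
  shift 2
  # Not enough samples collected yet: make no decision.
  [ "$#" -ge "$count" ] || return 1
  # Keep only the most recent COUNT samples.
  while [ "$#" -gt "$count" ]; do shift; done
  for s in "$@"; do
    case $s in
      ''|*[!0-9]*) return 1 ;;  # empty or non-numeric reading is not "low"
    esac
    [ "$s" -lt "$threshold" ] || return 1
  done
  return 0
}

# Possible use (assumes one reading every 5 seconds, 3 low readings in a row):
#   set --
#   while sleep 5; do
#     set -- "$@" "$(nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader,nounits)"
#     sustained_low 50 3 "$@" && break   # restart the miner after the loop
#   done
```

Requiring several consecutive low readings avoids restarting on a single dip. A failed probe yields an empty reading, which this helper deliberately does not count as "low", so probe failures need their own handling.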

@rizwansarwar

@DLS-bau Do you have a working example of such a script? Maybe you could share it with the rest of us. Thanks.

@arensirb

arensirb commented Jul 1, 2017

I think there should be an exit command regardless of whether there is an error. Press Q, for example, and the miner shuts down properly instead of the process having to be killed.
Killing the process actually screws up my system more than a crash does.

@reb0rn21
Author

reb0rn21 commented Jul 5, 2017

Yeah, an option such as -k that quits the miner on any error, so a simple batch loop can restart it. I use that approach for most miners.

@murathai

reb0rn21, can you share a sample .bat file for the batch restart loop?

@reb0rn21
Author

:loop
ethminer.exe
goto loop

Just disable Windows error reporting; that's what I do.
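On Linux the same restart loop could look roughly like the sketch below. It only helps if the miner process actually exits when it hits an error. The function name `restart_loop`, the run limit and the delay are assumptions made for illustration, not ethminer features.

```shell
#!/bin/sh
# Hypothetical helper: run a command again each time it exits.
# usage: restart_loop MAX_RUNS DELAY_SECONDS COMMAND [ARGS...]
restart_loop() {
  max=$1
  delay=$2
  shift 2
  runs=0
  while [ "$runs" -lt "$max" ]; do
    "$@"                      # blocks until the miner process exits
    rc=$?
    runs=$((runs + 1))
    echo "$(date +'%F %T') miner exited with status $rc (run $runs of $max)" >&2
    sleep "$delay"            # give the GPU and driver a moment to settle
  done
  return 1                    # gave up after MAX_RUNS runs
}

# Possible use (the start script path is a placeholder):
#   restart_loop 100 10 /home/linus/start_miner.sh
```

Capping the number of runs, unlike the endless `goto loop`, keeps a card that fails immediately on every start from cycling forever without anyone noticing.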

@Malapha

Malapha commented Jul 18, 2017

See [Issue 72]. There are some solutions using batch, PowerShell or PHP available.

@AndreaLanfranchi
Collaborator

AndreaLanfranchi commented Aug 16, 2017

My very basic (but effective) solution is to monitor the miner with a bash watchdog script.
You have to redirect ethminer output (stdout & stderr) to a log file and then run this script.

#!/bin/bash
#
# minerwd.sh
# Author: Andrea Lanfranchi
#
# Monitors ethminer output log in search of errors.
# If any is found in last 10 rows then mining rig is restarted
#
# Prerequisites
# apt-get install inotify-tools
#

while inotifywait -e modify ~/miner.log > /dev/null 2>&1 ; do

  # Lookup last 10 rows of log file in search of errors
  # Feel free to integrate grep pattern or create more conditions
  if tail -n10 ~/miner.log | grep -io "cuda error\|error cuda"; then
  
	# Send mail
	echo "Miner requires restart due to error" | mail -s "Miner WatchDog Restart" prospector@localhost
    
	# Restart mining rig
	sudo /sbin/shutdown -f -r +2
	
	# Abandon WatchDog
	exit
	
  fi
done 

@seymores

seymores commented Sep 4, 2017

Here's something I am using for my nvidia cards.
Feel free to modify it to your needs.

#!/bin/sh

PREP_GPUS="/home/linus/set_overclocking.sh"
MINER_SCRIPT="/home/linus/start_miner.sh"

gpu0_utilization=$(nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader,nounits)

# An empty reading (nvidia-smi printed nothing) is treated as 0 so the restart branch runs
if [ "${gpu0_utilization:-0}" -lt 50 ]
then
  echo "[alert] GPU seems to be down, restarting."
  $PREP_GPUS
  $MINER_SCRIPT
  echo "Done restarting miner script, going to sleep now"
else
  echo "[info] All normal"
fi

@piotr-dobrogost

I'm using this with nvidia cards and tmux:

#!/bin/bash

file=/tmp/ethminer-restarts.log
POWER_THRESHOLD=50
PROBE_DELAY=30
STARTUP_DELAY=60
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # no color

while true
do
    sleep $PROBE_DELAY
    power_draw=$(nvidia-smi --id=0 --query-gpu=power.draw --format=csv,noheader,nounits)
    if (( $(echo "$power_draw < $POWER_THRESHOLD" | bc -l) ))
    then
      echo -ne " $RED$(date +'%H:%M')$NC " | tee -a $file
      tmux respawn-pane -k -t ethminer:0.0
      sleep $STARTUP_DELAY
    else
      echo -ne "${GREEN}$(date +'%M')$NC "
    fi
done

@ddobreff
Collaborator

ddobreff commented Sep 7, 2017

This method doesn't work every time. If a GPU fails, nvidia-smi is executed in a loop without producing any output. I am currently working on a better way to implement a watchdog function.

@hcbraun

hcbraun commented Sep 7, 2017

@ddobreff When nvidia-smi stops working, the driver will log an Xid error. You can check with:
journalctl _TRANSPORT=kernel | grep NVRM
So far I have not found a reliable way to recover from those failures. I just trigger a reboot on them (https://jjacky.com/journal-triggerd/).
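To act on those log entries from a script, the Xid number can be pulled out of each line. The sketch below assumes the driver's usual message layout, `NVRM: Xid (PCI:...): <number>, ...`; the function name `xid_codes` is made up here, and the exact wording after the number varies between driver versions.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the numeric
# Xid code of every NVRM Xid line. Other NVRM lines are ignored.
xid_codes() {
  sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\).*/\1/p'
}

# Possible use: list the distinct Xid codes seen in the kernel log
#   journalctl _TRANSPORT=kernel | xid_codes | sort -u
```

A watchdog could then decide per code whether restarting the miner is worth trying or whether only a reboot makes sense.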

@ddobreff
Collaborator

ddobreff commented Sep 7, 2017

We shouldn't be using this function at all; it may cause other difficulties. For example, I forgot that I had stopped the miner, and the system rebooted while I was compiling. A better approach is to have the miner itself instruct the watchdog.

@piotr-dobrogost

piotr-dobrogost commented Sep 7, 2017

> This method doesn't work everytime. If GPU fails nvidia-smi is executed in a loop without output.

True. I haven't tried it, but I think checking the exit code of nvidia-smi should catch this. Another case to account for is nvidia-smi hanging (I think I've seen such cases).
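Both ideas, checking the exit code and guarding against a hang, can be sketched with `timeout` from GNU coreutils, which exits with status 124 when the time limit is hit. This is an untested-on-hardware illustration: the function name `probe` and its return codes are assumptions, and a process stuck in uninterruptible kernel sleep may not respond to the signal `timeout` sends.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command with a time limit.
# usage: probe TIMEOUT_SECONDS COMMAND [ARGS...]
# Prints the reading and returns 0 only if the command finished in time,
# exited with status 0 and printed a plain number.
# Returns 2 on timeout and 1 on any other failure.
probe() {
  limit=$1
  shift
  reading=$(timeout "$limit" "$@" 2>/dev/null)
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "probe timed out after ${limit}s" >&2
    return 2
  fi
  if [ "$rc" -ne 0 ]; then
    echo "probe exited with status $rc" >&2
    return 1
  fi
  case $reading in
    ''|*[!0-9.]*) echo "probe printed no usable reading" >&2; return 1 ;;
  esac
  printf '%s\n' "$reading"
}

# Possible use inside a watchdog loop (single GPU, 10 second limit):
#   power_draw=$(probe 10 nvidia-smi --id=0 --query-gpu=power.draw --format=csv,noheader,nounits) \
#     || tmux respawn-pane -k -t ethminer:0.0
```

Treating an empty or non-numeric reading as a failure also covers the case where nvidia-smi returns without output.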

@DeadManWalkingTO
Contributor

After #757 (added --exit parameter to exit whenever an error occurred) you can use a watchdog.

Try ETHminerWatchDogDmW Windows7/8/10 [32/64] & Linux (Any Dist/Any Ver/Any Arch) (#735).

Please check it and give feedback.
Thank you!
