Trainer register etcd #3053
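Just a comment:
Without unit tests, how can we be sure this is correct?
There is currently no good way to unit test the etcd client; we can only rely on integration tests or e2e tests. If you know a good way to test it, PRs are welcome~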
}

ctx, cancel := context.WithTimeout(context.Background(), timeout)
_, err = c.Put(ctx, DefaultTrainerPath+"/"+clientUUID.String(), trainerIP, clientv3.WithLease(resp.ID))
How about writing the pass status into etcd, as follows?
type TrainerStatus struct {
trainerIP string
pass_num int
pass_status string
}
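If the struct above were stored as an etcd value, it would need to be serialized first. A minimal sketch, assuming JSON as the encoding (the comment does not name one): the field names are exported and tagged here because `encoding/json` skips unexported fields, and `encodeStatus` / `decodeStatus` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TrainerStatus mirrors the proposed struct. Exported names and JSON tags
// are an assumption made so that encoding/json can see the fields.
type TrainerStatus struct {
	TrainerIP  string `json:"trainer_ip"`
	PassNum    int    `json:"pass_num"`
	PassStatus string `json:"pass_status"`
}

// encodeStatus (hypothetical helper) returns the string that would be
// passed as the value of an etcd Put.
func encodeStatus(s TrainerStatus) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeStatus (hypothetical helper) parses a value read back from etcd.
func decodeStatus(v string) (TrainerStatus, error) {
	var s TrainerStatus
	err := json.Unmarshal([]byte(v), &s)
	return s, err
}

func main() {
	v, err := encodeStatus(TrainerStatus{
		TrainerIP:  "10.0.0.1",
		PassNum:    1,
		PassStatus: "Finished",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```

The encoded string could then replace `trainerIP` as the value in the `c.Put` call from the diff, keeping the same lease option.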
For the master to check the pass status:
1. master checks the trainers' pass status:
  1.1. if all trainers have finished a pass, goto 2
  1.2. else goto 1
2. master moves the tasks from the Done Queue to the Todo Queue.
For the trainer to check the pass status:
1. fetch a task from master
  1.1. if NoTaskFound, update status to Finished and goto 2
  1.2. else train with the task
2. if pass_num == finished_pass_num, finish training; else goto 1.
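The master side of the steps above could be sketched roughly as follows. This is an in-memory illustration only, with no etcd access: `allFinished`, `Queues`, `startNextPass`, and the `"Training"` status string are hypothetical names, and the statuses are assumed to have already been read from etcd.

```go
package main

import "fmt"

// statusFinished is the status a trainer writes when it gets NoTaskFound.
const statusFinished = "Finished"

// TrainerStatus mirrors the struct proposed in the comment.
type TrainerStatus struct {
	TrainerIP  string
	PassNum    int
	PassStatus string
}

// allFinished is master step 1: it reports whether every trainer has
// marked the given pass as Finished. An empty list counts as not finished.
func allFinished(statuses []TrainerStatus, pass int) bool {
	if len(statuses) == 0 {
		return false
	}
	for _, s := range statuses {
		if s.PassNum != pass || s.PassStatus != statusFinished {
			return false
		}
	}
	return true
}

// Queues is a stand-in for the master's task queues.
type Queues struct {
	Todo []string
	Done []string
}

// startNextPass is master step 2: move every Done task back to Todo.
func (q *Queues) startNextPass() {
	q.Todo = append(q.Todo, q.Done...)
	q.Done = nil
}

func main() {
	q := &Queues{Done: []string{"task-0", "task-1"}}
	statuses := []TrainerStatus{
		{TrainerIP: "10.0.0.1", PassNum: 0, PassStatus: statusFinished},
		{TrainerIP: "10.0.0.2", PassNum: 0, PassStatus: "Training"},
	}
	fmt.Println("all finished:", allFinished(statuses, 0))

	// The second trainer finishes; the master can now recycle the tasks.
	statuses[1].PassStatus = statusFinished
	if allFinished(statuses, 0) {
		q.startNextPass()
	}
	fmt.Println("todo:", len(q.Todo), "done:", len(q.Done))
}
```

In a real master, the "else goto 1" branch would more likely be driven by an etcd watch on the trainer keys than by polling, but that choice is outside what the comment specifies.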
@typhoonzero @gongweibao Now that we have glide, we can use https://godoc.org/github.com/coreos/etcd/embed. PRs are welcome!
Maybe we don't need this PR yet due to the latest change in #2948? If so, maybe we can close it and reopen it once needed :)
Closing, reopen when needed.
Fix #3051