feat: add backoff and jitter to retry #643
Conversation
cluster.go (Outdated)

```go
if !isWaitingForRetry {
	shouldRetry := c.retryHandler.WaitUntilNextRetry(
		ctx, attempts, resp.Error(),
	)
	if !shouldRetry {
		continue
	}
	isWaitingForRetry = true
}
```
Why do you put the waiting here instead of at lines 679 to 681 (commit fc348f5)?

```go
if len(retries.m) != 0 {
	goto retry
}
```
If we don't sleep here, the code becomes quite complicated, because at line 679 we would have to distinguish between a retry and an ASK/MOVED redirection: if it is a retry, we should wait; if not, we should retry immediately.
Even though there is randomness, I think the waiting time for each goroutine converges to roughly the same value. If you want exactly one wait, I will change it.
Thanks for catching this detail I overlooked.
I think a better way to handle this is: if any MOVED/ASK redirection happens, we don't back off before retrying. Therefore, how about we add a new counter field to the `retries` struct? Then we can do this:

```go
if len(retries.m) != 0 {
	if retries.redirects != 0 {
		retries.redirects = 0
		goto retry
	}
	if c.retryHandler.WaitUntilNextRetry(...) {
		goto retry
	}
}
```
We support configurable retries, so we need to pass the error to `RetryDelay`; because of this, I use a channel to gather the errors.
We can't use a plain int for the counter, because it is not thread-safe. I prefer an atomic in this case, because the type itself guarantees thread safety. Even though `doretry` and `doretrycache` use a mutex internally, relying on that when loading the int value feels too implicit to me to count as thread-safe.
Hi @proost,
I think a `retryErrCh := make(chan error, len(multi))` is overkill. Let me find a better way for you later.

> but it is implicit to me to be thread-safe when load int value.

It is actually thread-safe, because we read the field after a `wg.Wait()`.
An atomic is OK too, but please just use `uint32`, which is smaller and has no alignment issue (64-bit atomic operations require 8-byte alignment, which is easy to violate on 32-bit platforms).
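For illustration, here is a minimal, self-contained sketch of the pattern under discussion (not the actual rueidis code): several result goroutines bump a shared `atomic.Uint32` redirect counter, and the coordinating goroutine reads it only after `wg.Wait()`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var redirects atomic.Uint32 // safe to update from many goroutines
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			redirects.Add(1) // e.g. on a MOVED/ASK response
		}()
	}
	wg.Wait()
	fmt.Println(redirects.Load()) // always 8; the load is explicitly safe
}
```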
Hi @proost,
I think a better alternative to `retryErrCh` is calling `RetryDelay` directly in the `doresultfn` function and storing its maximum into the `retries` struct. It could be something like this:
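The original suggested snippet was an embedded image that did not survive extraction; what follows is a minimal sketch of the idea as described above. The `recordRetry` helper, the `slot` key, and the field layout are illustrative assumptions, not the actual rueidis code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// retries mirrors the shape discussed above: a mutex-guarded map of
// commands to retry, plus the maximum suggested delay seen so far.
type retries struct {
	mu         sync.Mutex
	m          map[int]error // stand-in for the per-command retry map
	RetryDelay time.Duration
}

// recordRetry is a hypothetical stand-in for the logic inside doresultfn:
// register a failed command and keep the maximum suggested delay.
func (r *retries) recordRetry(slot int, err error, delay time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.m[slot] = err
	if delay > r.RetryDelay {
		r.RetryDelay = delay
	}
}

func main() {
	r := &retries{m: make(map[int]error)}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.recordRetry(i, errors.New("timeout"), time.Duration(i+1)*time.Millisecond)
		}(i)
	}
	wg.Wait() // reading RetryDelay after wg.Wait() is safe, as noted above
	if len(r.m) != 0 {
		fmt.Println("sleeping", r.RetryDelay, "before retrying")
		time.Sleep(r.RetryDelay) // sleep once, using the maximum delay
	}
}
```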
> It is actually thread-safe, because we read the field after a `wg.Wait()`.

Oh, I missed the waiting via `sync.WaitGroup`. Changed to use `retries.RetryDelay` in f04b433.
cluster.go (Outdated)

```go
for retryableErr := range retryErrCh {
	shouldRetry := c.retryHandler.WaitUntilNextRetry(ctx, attempts, retryableErr)
	if shouldRetry {
		attempts++
		goto retry
	}
}

if retries.Redirects.Load() > 0 {
	retries.Redirects.Store(0) // reset
	goto retry
}
```
I think `if retries.Redirects.Load() > 0` should have a higher priority than `for retryableErr := range retryErrCh`. If there is a redirect response, we should not delay the retry.
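In other words, a sketch of the suggested reordering (mirroring the diff above, not a verbatim patch):

```go
// Check redirects first: a MOVED/ASK response should retry immediately,
// without consulting the backoff handler.
if retries.Redirects.Load() > 0 {
	retries.Redirects.Store(0) // reset before the immediate retry
	goto retry
}
// Only errors without a redirect go through the backoff delay.
for retryableErr := range retryErrCh {
	if c.retryHandler.WaitUntilNextRetry(ctx, attempts, retryableErr) {
		attempts++
		goto retry
	}
}
```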
I think this PR looks very good now. We can merge it. Thank you @proost for this great work! Really impressive.
One more thing I want to optimize is the position of the newly added `retryHandler` field: putting it after a `bool` can cause some wasted space due to padding. Could you rearrange the field in a follow-up PR? Also, can `retryHandler` be just a struct instead of a pointer to a struct?
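To illustrate the padding concern (the struct and field names below are made up for the example, not the actual rueidis layout):

```go
package main

import (
	"fmt"
	"unsafe"
)

// A pointer field sandwiched between bools forces the compiler to pad
// each bool out to the pointer's 8-byte alignment on 64-bit platforms.
type padded struct {
	closed  bool // 1 byte + 7 bytes padding
	handler *int // 8 bytes (stand-in for the retryHandler pointer)
	muted   bool // 1 byte + 7 bytes tail padding
}

// Grouping the pointer first lets the bools share one padded word.
type packed struct {
	handler *int
	closed  bool
	muted   bool
}

func main() {
	fmt.Println(unsafe.Sizeof(padded{})) // 24 on amd64
	fmt.Println(unsafe.Sizeof(packed{})) // 16 on amd64
}
```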
@rueian
close: #633
Adding a retry policy. The default is exponential backoff with jitter.
Writing clean logic for the retry handler is not easy; if you have an idea, please leave a review.
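For reference, a minimal sketch of an exponential-backoff-with-full-jitter policy like the one described; the constants and the function name are assumptions for the example, not the values used in this PR:

```go
package main

import (
	"math/rand"
	"time"
)

// Assumed example constants, not the PR's defaults.
const (
	baseDelay = 250 * time.Millisecond
	maxDelay  = 10 * time.Second
)

// retryDelay returns a jittered delay for the given attempt number.
func retryDelay(attempts int) time.Duration {
	// Exponential backoff: base * 2^attempts, capped at maxDelay.
	d := baseDelay << uint(attempts)
	if d <= 0 || d > maxDelay { // d <= 0 guards against shift overflow
		d = maxDelay
	}
	// Full jitter: pick a uniform random delay in [0, d).
	return time.Duration(rand.Int63n(int64(d)))
}

func main() {
	for attempt := 0; attempt < 5; attempt++ {
		time.Sleep(retryDelay(attempt))
		// ... issue the request here and break on success ...
	}
}
```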