
feat: add backoff and jitter to retry #643

Merged · 11 commits · Oct 13, 2024

Conversation

@proost (Contributor) commented Oct 3, 2024:

Closes #633.

This adds a retry policy. The default is exponential backoff with jitter.

Keeping the retry-handler logic clean is not easy; if you have an idea, please point it out in review.
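For context, a minimal sketch of exponential backoff with full jitter (the RetryDelay helper and the constants here are hypothetical, not necessarily the defaults this PR implements):

	package main

	import (
		"fmt"
		"math/rand"
		"time"
	)

	const (
		baseDelay = 100 * time.Millisecond
		maxDelay  = 10 * time.Second
	)

	// RetryDelay returns a random delay drawn uniformly from
	// [0, min(baseDelay*2^attempt, maxDelay)].
	func RetryDelay(attempt int) time.Duration {
		backoff := baseDelay << attempt // baseDelay * 2^attempt
		if backoff <= 0 || backoff > maxDelay {
			backoff = maxDelay // cap the window; also guards against shift overflow
		}
		return time.Duration(rand.Int63n(int64(backoff) + 1))
	}

	func main() {
		for attempt := 0; attempt < 5; attempt++ {
			fmt.Printf("attempt %d: sleep %v\n", attempt, RetryDelay(attempt))
		}
	}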

@proost requested a review from rueian on October 5, 2024 08:49
@proost requested a review from rueian on October 6, 2024 14:22
cluster.go (outdated diff) — review comment on lines 619 to 630:

if !isWaitingForRetry {
	shouldRetry := c.retryHandler.WaitUntilNextRetry(
		ctx, attempts, resp.Error(),
	)

	if !shouldRetry {
		continue
	}

	isWaitingForRetry = true
}
rueian (Collaborator) commented:

Why do you put the waiting here instead of at rueidis/cluster.go, lines 679 to 681 (commit fc348f5):

if len(retries.m) != 0 {
goto retry
}

proost (Contributor, Author) replied:

If we don't sleep here, the code gets quite complicated, because at line 679 we would have to distinguish plain retries from ASK/MOVED redirections: a retry should wait, while a redirection should be retried immediately.

proost (Contributor, Author) added:

Even with the randomness, I think the waiting times of the goroutines converge to roughly the same value. If you want the wait to happen exactly once, I will change it.

rueian (Collaborator) replied:

Thanks for catching this detail I overlooked.

I think a better way to handle this is: if any MOVED/ASK redirection happens, we don't back off before retrying.

So how about adding a new counter field to the retries struct? Then we could do this:

	if len(retries.m) != 0 {
		if retries.redirects != 0 {
			retries.redirects = 0
			goto retry
		}
		if c.retryHandler.WaitUntilNextRetry(...) {
			goto retry
		}
	}

rueian (Collaborator) added:

We can increment the counter after the lock:
[screenshot of the suggested change]
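The screenshot is not preserved here; a self-contained sketch of the pattern being suggested, with illustrative names rather than the actual rueidis internals, bumping the counter while the same lock that guards the retry map is held:

	package locksketch

	import "sync"

	type retries struct {
		mu        sync.Mutex
		m         map[string][]string // connection address -> commands queued for retry
		redirects int                 // MOVED/ASK redirections seen in this round
	}

	// requeue records a command for the next retry round; the redirect counter
	// is incremented under the same lock that already protects the map, so a
	// plain int is enough here.
	func (r *retries) requeue(addr, cmd string) {
		r.mu.Lock()
		defer r.mu.Unlock()
		r.m[addr] = append(r.m[addr], cmd)
		r.redirects++
	}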

proost (Contributor, Author) replied:

2df785e

We support making retries configurable, so the error needs to be passed to RetryDelay; because of that I use a channel to gather the errors.

We can't use a plain int because it is not thread-safe. I prefer an atomic here, since the type itself guarantees thread safety. Even though doretry and doretrycache use a mutex internally, it is too implicit for me that loading the int value is thread-safe.

rueian (Collaborator) replied:

Hi @proost,

I think a retryErrCh := make(chan error, len(multi)) is overkill. Let me find a better way for you later.

> it is too implicit for me that loading the int value is thread-safe

It is actually thread-safe, because we read the field after a wg.Wait().
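A minimal, self-contained illustration of that point with made-up names: wg.Wait() returns only after every wg.Done(), so a plain field written by the workers can be read safely once Wait returns.

	package main

	import (
		"fmt"
		"sync"
	)

	func main() {
		var (
			wg        sync.WaitGroup
			mu        sync.Mutex
			redirects int
		)
		for i := 0; i < 4; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				mu.Lock()
				redirects++ // writers still need the lock among themselves
				mu.Unlock()
			}()
		}
		wg.Wait()              // every Done happens before Wait returns
		fmt.Println(redirects) // safe to read without the lock; always prints 4
	}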

rueian (Collaborator) added:

Atomics are fine too, but please just use uint32, which is smaller and has no alignment issues.
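A sketch of the atomic variant, matching the Redirects.Load()/Store(0) calls that appear in the later diff; the surrounding type and method names are assumptions:

	package atomicsketch

	import "sync/atomic"

	type retries struct {
		Redirects atomic.Uint32 // incremented whenever a MOVED/ASK redirection is requeued
	}

	// recordRedirect is safe to call concurrently from the result goroutines.
	func (r *retries) recordRedirect() {
		r.Redirects.Add(1)
	}

	// takeRedirects reports whether any redirection happened and resets the
	// counter for the next retry round.
	func (r *retries) takeRedirects() bool {
		if r.Redirects.Load() > 0 {
			r.Redirects.Store(0)
			return true
		}
		return false
	}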

rueian (Collaborator) added:

Hi @proost,

I think a better alternative to retryErrCh is calling RetryDelay directly in the doresultfn function and storing its maximum in the retries struct. It could look something like this:

[screenshot of the suggested change]
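The screenshot is not preserved; a self-contained sketch of the alternative described, with illustrative names: each result callback computes its own delay and only the maximum is kept, so the retry loop sleeps at most once instead of draining a channel of errors.

	package delaysketch

	import (
		"sync"
		"time"
	)

	type retries struct {
		mu         sync.Mutex
		RetryDelay time.Duration // largest delay requested by any failed command
	}

	// storeDelay keeps the maximum of all requested delays; intended to be
	// called from doresultfn-style callbacks that run concurrently.
	func (r *retries) storeDelay(d time.Duration) {
		r.mu.Lock()
		if d > r.RetryDelay {
			r.RetryDelay = d
		}
		r.mu.Unlock()
	}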

proost (Contributor, Author) replied:

> It is actually thread-safe, because we read the field after a wg.Wait().

Oh, I missed the wait on the sync.WaitGroup.

Changed to use retries.RetryDelay: f04b433

@proost requested a review from rueian on October 7, 2024 14:00
cluster.go (outdated diff) — review comment on lines 703 to 714:
for retryableErr := range retryErrCh {
	shouldRetry := c.retryHandler.WaitUntilNextRetry(ctx, attempts, retryableErr)
	if shouldRetry {
		attempts++
		goto retry
	}
}

if retries.Redirects.Load() > 0 {
	retries.Redirects.Store(0) // reset
	goto retry
}
rueian (Collaborator) commented:

I think the if retries.Redirects.Load() > 0 check should have higher priority than the for retryableErr := range retryErrCh loop.

If there is a redirect response, we should not delay the retry.
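A sketch of the reordering being asked for, rearranging the outdated snippet quoted above (not the final merged code):

	// A MOVED/ASK redirection means the retry should happen immediately,
	// so check the redirect counter before applying any backoff delay.
	if retries.Redirects.Load() > 0 {
		retries.Redirects.Store(0) // reset for the next round
		goto retry
	}

	// Only non-redirect retryable errors go through the backoff handler.
	for retryableErr := range retryErrCh {
		if c.retryHandler.WaitUntilNextRetry(ctx, attempts, retryableErr) {
			attempts++
			goto retry
		}
	}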


@proost requested a review from rueian on October 12, 2024 09:57
@rueian (Collaborator) left a review comment:

I think this PR looks very good now. We can merge it. Thank you @proost for this great work! Really impressive.

One more thing I would like to optimize is the position of the newly added retryHandler field:
[screenshot of the struct fields]

Putting it right after a bool can waste some space to padding; could you rearrange the fields in a follow-up PR? Also, can retryHandler be a plain struct instead of a pointer to a struct?
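A self-contained illustration of the padding concern with made-up field names (not the actual client struct): interleaving a bool with pointer-sized fields forces padding, while grouping the larger fields first lets the bools share one word.

	package main

	import (
		"fmt"
		"unsafe"
	)

	type handler struct{ attempts int } // stand-in for a retryHandler value

	type wasteful struct {
		closing      bool     // 1 byte + 7 bytes of padding to align the pointer
		retryHandler *handler // 8 bytes
		noCache      bool     // 1 byte + 7 bytes of trailing padding
	}

	type compact struct {
		retryHandler *handler // 8 bytes
		closing      bool     // the two bools now share one 8-byte word
		noCache      bool
	}

	func main() {
		fmt.Println(unsafe.Sizeof(wasteful{}), unsafe.Sizeof(compact{})) // 24 16 on 64-bit
	}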

@proost (Contributor, Author) commented Oct 13, 2024:

@rueian Sure, I will change it in a follow-up PR.

@rueian merged commit 113c567 into redis:main on Oct 13, 2024
27 checks passed
@proost deleted the feat-adding-backoff-and-jitter-to-retry branch on October 14, 2024 02:53
Merging this pull request may close: Feature: retry backoffs (#633)