Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consumer: redistributeRDY fails when connection remains active #179

Closed
meAmidos opened this issue May 20, 2016 · 3 comments · Fixed by #208
Closed

consumer: redistributeRDY fails when connection remains active #179

meAmidos opened this issue May 20, 2016 · 3 comments · Fixed by #208
Labels

Comments

@meAmidos
Copy link

While experimenting with NSQ I've encountered a problem that looks like a flaw in the consumer code.

My setup is:

  • There are two producers on different hosts.
  • Each producer periodically (interval = 3 secs) posts to the same topic/channel to it's local nsqd.
  • There is one consumer for this topic/channel. It discovers both nsqd nodes through nsqlookupd.
  • The consumer uses the default configuration, and, thus, has MaxInFlight = 1.

It occurs, that with this setup the consumer is able to read only from one nsqd of the two. The second nsqd never gets a chance to send anything because consumer constantly keeps the correspondent RDY = 0 for the second nsqd connection.

I found that for such cases there is a special treatment in the code - every once in a while the consumer tries to randomly redistribute RDY among connected nsqd nodes (method redistributeRDY). However, in practice it never works because of this condition in the code:

availableMaxInFlight := int64(maxInFlight) - atomic.LoadInt64(&r.totalRdyCount)
...
for len(possibleConns) > 0 && availableMaxInFlight > 0 {
<redistribution logic>
}

Debugging shows that availableMaxInFlight always equal 0 at this moment, and actual redistribution never happens.

In fact, even if I hack a little, and force this redistribution to happen it does not work either, because there is a similar condition in method updateRDY. When there is a lack of available RDY, this method sets up a timer to retry in 5 seconds, and this attempt to retry always fails, and the timer is rescheduled again and again.

So, the questions are:

  • Is it a bug or a feature?
  • How this redistribution is supposed to work in such cases where active producers never give other producers a chance to receive even a minimal share of RDY/MaxInFlight?
  • Am I supposed to be preventive when creating a consumer, and set up the value of MaxInFlight in such a way that it will most likely be bigger than a potential number of producing nsqd nodes? Does not seem as very viable solution in the long term.
@mreiferson
Copy link
Member

@meAmidos thanks for the detailed report!

I just refreshed my memory of this code and I think you might have found a bug. Yes, go-nsq strictly enforces max-in-flight, so we do expect the consumer to perform "redistribution". However, the redistribution logic currently assumes that one of the connections will (eventually) be idle. In your case it never is, since there is a steady incoming stream of messages from each nsqd. This is the problematic code.

@mreiferson mreiferson added the bug label May 20, 2016
@judwhite
Copy link
Contributor

This looks similar to an issue I attempted to address in the C# client, discussion here. The solution isn't optimal and requires a flag to be set because it will exceed max-in-flight as it's floating RDY's around looking for another nsqd with the current implementation.

@mreiferson mreiferson changed the title consumer: question about redistributeRDY consumer: redistributeRDY fails when connection remains active Nov 19, 2016
@mreiferson
Copy link
Member

see #208

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants