Using Aws::SQS::QueuePoller? #92

Closed · sschepens opened this issue Apr 13, 2015 · 9 comments
@sschepens (Contributor)

Instead of manually sleeping :delay seconds for every queue that has no messages, why not use Aws::SQS::QueuePoller, which is part of the aws-sdk? http://docs.aws.amazon.com/sdkforruby/api/Aws/SQS/QueuePoller.html
I think it's a great alternative and could save us from maintaining some code, such as queue pausing.
I'm trying to port the current code to it, but I don't yet understand whether there could be problems if, for example, Fetcher's fetch method blocked polling and dispatching messages until Aws::SQS::QueuePoller ended for some reason or an exception was raised.
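
For reference, a minimal sketch of what QueuePoller usage could look like (the queue URL and the processing inside the block are placeholders):

```ruby
require 'aws-sdk'

# Hypothetical queue URL, for illustration only.
queue_url = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue'

poller = Aws::SQS::QueuePoller.new(queue_url)

# Long-polls the queue and yields each message; by default the
# poller deletes a message once the block returns without raising.
poller.poll(wait_time_seconds: 20) do |msg|
  puts msg.body # stand-in for real message processing
end
```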

Maybe the creators/maintainers could look into this.

Cheers!
Sebastian

@phstc (Collaborator) commented Apr 13, 2015

Hey @sschepens, could you have a look at #30?

I will try to document / sum this up in the wiki.

@sschepens (Author)

Thanks for the link. One question I have: is load balancing between queues actually necessary? If so, it could be handled somewhere else: keep one (or more?) open poller for every queue, and either load balance at dispatch time or spawn more workers for the queues with higher priority.

@sschepens (Author)

I realize that making a lot of SQS requests is not a big monetary cost, but that doesn't account for networking: making requests all the time consumes much more bandwidth and may impact application performance.
Also, long polling gets you the message as soon as one is available; with the current delay system we just sleep a flat number of seconds and retry later, increasing the time to receive a message.
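
To illustrate the difference (a sketch; `queue_url` and `delay` are assumed to be defined elsewhere):

```ruby
sqs = Aws::SQS::Client.new

# Current delay system: a short poll returns immediately when the
# queue is empty, and we sleep a flat :delay before retrying, so a
# message arriving right after the call waits up to `delay` seconds.
resp = sqs.receive_message(queue_url: queue_url)
sleep(delay) if resp.messages.empty?

# Long polling: the request itself blocks server-side for up to 20
# seconds and returns as soon as a message becomes available.
resp = sqs.receive_message(queue_url: queue_url, wait_time_seconds: 20)
```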

@phstc (Collaborator) commented Apr 28, 2015

Not sure if I got your point. But basically the motivation was to have a single Shoryuken process able to consume messages from N queues. If you use a fetcher per queue (because of the long polling), you need to balance the processors across fetchers etc. During Shoryuken's conception I did some experiments and chose the simpler implementation, as the other didn't bring much benefit compared with its complexity.

Imagine you have 10 processors, you long poll 5 queues, and you receive 10 messages from each. How can you process those 50 messages?
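
For contrast, the single-fetcher approach boils down to something like this (illustrative Ruby only, not Shoryuken's actual internals; `sqs`, `dispatch`, and `delay` are assumed):

```ruby
# Illustrative only, not Shoryuken's actual code.
queue_urls = %w[
  https://sqs.us-east-1.amazonaws.com/123456789012/queue_a
  https://sqs.us-east-1.amazonaws.com/123456789012/queue_b
]

loop do
  queue_url = queue_urls.rotate!.first # round-robin across queues
  messages = sqs.receive_message(
    queue_url: queue_url,
    max_number_of_messages: 10 # capped by the number of free processors
  ).messages

  messages.each { |msg| dispatch(msg) } # hand off to an idle processor
  sleep(delay) if messages.empty?       # simplified stand-in for queue pausing
end
```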

@sschepens (Author)

Yes, I realized that. That's why I went for the approach of having a separate fetcher for each queue and a configurable pool of processors per queue.
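
Something along these lines (a hypothetical sketch, not the actual port; the queue URLs, pool sizes, and `process` are placeholders):

```ruby
require 'aws-sdk'

# Hypothetical sketch: a fixed pool of long-polling processor
# threads per queue, sized independently for each queue.
POOL_SIZES = {
  'https://sqs.us-east-1.amazonaws.com/123456789012/queue_a' => 25,
  'https://sqs.us-east-1.amazonaws.com/123456789012/queue_b' => 10,
}

threads = POOL_SIZES.flat_map do |queue_url, size|
  Array.new(size) do
    Thread.new do
      poller = Aws::SQS::QueuePoller.new(queue_url)
      # Each thread long-polls independently and processes inline.
      poller.poll(wait_time_seconds: 20) { |msg| process(msg) }
    end
  end
end

threads.each(&:join)
```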

@phstc (Collaborator) commented Apr 29, 2015

If you have 10 queues with 25 processors each, and messages only in one queue, won't you have 225 processors waiting and only 25 working hard?

Or if you have 10 queues, 25 processors each, and all queues full of messages, will your container handle 250 processors up & running?

That's why Shoryuken load balances the processors across queues.

And the approach you want can be achieved by running multiple Shoryuken processes, one queue per process. But I believe that would be a waste of resources in most cases.

@phstc (Collaborator) commented Apr 29, 2015

BTW I'm not saying that your approach is wrong or less efficient. I'm just saying that both have pros & cons.

@sschepens (Author)

Processors with no work should be suspended until messages arrive, which leaves room for other threads to run, so it shouldn't be much of a problem.
Fetchers are suspended while doing IO, so long polling should cause them to be suspended very often (for up to the configurable wait time per request) when messages are not available.
Yes, having more threads and more actors ends up consuming more memory, but that shouldn't amount to much, as they are plain Ruby objects and don't store any data.
Thread scheduling is optimized at the kernel/language level and should be smarter than manual load balancing; after all, the kernel or the language knows when a thread can be suspended and when it should be resumed.
Processors most definitely do IO, so they should be scheduled well and allow each other to run, causing little interference between processors of different queues. A tiny sketch of the suspension point follows.
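
(To make the suspension point concrete: threads parked on a blocking pop cost memory but no CPU until work arrives. `process` here is a stand-in.)

```ruby
def process(msg)
  # stand-in for real message processing
end

work = Queue.new # Ruby's built-in thread-safe queue

# Each worker blocks in `pop` and stays suspended by the scheduler,
# consuming memory but no CPU, until a message is pushed.
workers = Array.new(250) do
  Thread.new { loop { process(work.pop) } }
end

work << 'a message' # wakes exactly one suspended worker
```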

The only thing I don't like, though there seems to be no way around it, is that a Celluloid actor runs on its own thread instead of being scheduled on a pool of threads. concurrent-ruby has actors too, and they are scheduled on a thread pool, but doing IO in an actor will block threads, which is a bummer. Something like Erlang would be great, where an actor doing IO gets suspended as if it were a thread of its own, leaving the OS thread free to run another actor.

No approach is better than the other; it's just a matter of trade-offs.

@phstc (Collaborator) commented Sep 2, 2016

@sschepens thanks for the update.

No approach is better than the other; it's just a matter of trade-offs.

I couldn't agree more ^

Probably this #236 will give more room for different strategies.

phstc closed this as completed Sep 2, 2016