Abort release builds if already re-queued #864
There seems to be a race of some kind regarding the triggering of downstream jobs and the execution of jobs already in the queue. What I've observed is that if a build is in the queue but is waiting for a single dependency to finish building, when the dependency finishes, the build starts executing immediately. The dependency's build, however, triggers another build of that same job that is immediately re-added to the queue.

One theory is that there is some delay between when a new build is "triggered" and when that build is actually enqueued. So the sequence may be:

1. A is building, while B is waiting for A
2. A finishes building and requests a build of downstream project B
3. B is dequeued and begins execution
4. A's build request for B is processed and B is enqueued again

An alternative strategy to avoid this strange behavior might be to enact some sort of quiet period AFTER a build triggers new downstream builds but before the build is actually regarded as 'complete'.
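To make the suspected ordering concrete, here is a small Python sketch that replays the four steps for two jobs, A and B. It is only a toy model of the theory, not Jenkins code: the `simulate` function, the `pending_triggers` list standing in for the assumed trigger-to-enqueue delay, and the `abort_if_requeued` flag (my reading of the mitigation named in the PR title) are all hypothetical.

```python
from collections import deque


def simulate(abort_if_requeued=False):
    """Replay the suspected event ordering and return the completed builds."""
    queue = deque(["B"])     # Step 1: B is queued, blocked on the running build of A.
    pending_triggers = []    # Requests made but not yet enqueued (the assumed delay).
    completed = []

    # Step 2: A finishes and requests a build of downstream project B.
    completed.append("A")
    pending_triggers.append("B")

    # Step 3: B is no longer blocked, so it is dequeued and begins execution.
    running = queue.popleft()

    # Step 4: A's request is finally processed, so B is enqueued again.
    queue.extend(pending_triggers)
    pending_triggers.clear()

    # Hypothetical mitigation: the running build aborts if its job is queued again.
    if not (abort_if_requeued and running in queue):
        completed.append(running)

    # Whatever is still queued runs to completion afterwards.
    while queue:
        completed.append(queue.popleft())
    return completed


print(simulate())                        # B is built twice
print(simulate(abort_if_requeued=True))  # the first B aborts, B is built once
```

In this model the quiet-period alternative would amount to processing `pending_triggers` before A is marked complete, so that step 4 happens before step 3.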
I have an example of this on the staging farm. If we look through build logs, we may be able to find more occurrences of this double-build pattern. The two builds happen less than 1:30 apart. This rapid re-build might be something we could search for using the Jenkins script console to find out how often this is happening, and how effective this mitigation is.
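As a rough illustration of that search, the sketch below flags jobs whose consecutive builds started within 90 seconds of each other. It assumes the build history has already been exported as `(job_name, start_time_in_seconds)` pairs; that record shape, the `rapid_rebuilds` name, and the job names in the usage example are hypothetical, and a real query in the script console would need to read these values from Jenkins itself.

```python
def rapid_rebuilds(builds, window_s=90):
    """Return {job: smallest gap in seconds} for jobs rebuilt within window_s."""
    starts_by_job = {}
    for job, started in builds:
        starts_by_job.setdefault(job, []).append(started)

    hits = {}
    for job, starts in starts_by_job.items():
        starts.sort()
        # Compare each build only with the next build of the same job.
        gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
        close = [gap for gap in gaps if gap < window_s]
        if close:
            hits[job] = min(close)
    return hits


# Hypothetical records: job_a was rebuilt 75 s after its first build.
history = [("job_a", 0), ("job_b", 10), ("job_a", 75), ("job_b", 4000)]
print(rapid_rebuilds(history))
```

Running the same check on builds from before and after the mitigation would give a simple measure of how effective it is.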
I've seen this double-building pattern a lot with Kinetic releases of the relatively long-running and highly interlinked jsk packages. There's a ticket, #475, with more information and what we believe to be the upstream Jenkins bug, which is still unresolved a decade later. This workaround sounds like as good a solution as we can get. It might be worth putting a comment in the snippet that references the ticket and links to the issues, so that if the bug is ever resolved, people will know what the snippet works around and when it can be removed.
This is actually happening far more often than I originally thought. I just deployed brand-new jobs for RHEL on build.ros2.org, which yielded a great opportunity to see how often we're re-building successful jobs. I wrote the following Groovy to count the number of jobs that have a single success, and the number of jobs that have more than one success:
The initial build isn't done yet, but there are enough builds to yield significant statistics:
So that's
Done in 7a28b17, thanks @tfoote

Another observation: this seems to happen most often with bottleneck packages when there is free executor space. This is supported by the abundance of ament packages in the list above. I also just observed this behavior on a majority of the message packages, which brought the excess up to ~24%.
Excellent write up of a solid investigation. Thanks!