kubeadm stuck updating etcd static pod #2718
It seems that changing the etcd manifest has no effect. Problem with kubelet maybe? Not seeing any errors from journalctl.
Our upgrade CI is green / passing.
That's probably because it was rolled back.
No, I looked before it was rolled back. I also made a manual change to the file and observed the results. The pod is restarted, but there are no relevant errors in the etcd container. Kind of seems weird.
Here are the logs from my journalctl after making a single change to the manifest. There appear to be a lot of logs for just a single change. My growing suspicion is that the error is hidden somewhere in here, and that kubelet may be the culprit.

The "Nameserver limits exceeded" messages are not an issue, although I should try to work out how to avoid them. I think they are related to the fact that I have listed both IPv4 and IPv6 nameservers.

Wondering about the remaining messages though. Are these an indication of a problem, or is this just normal when restarting etcd? i.e. the errors might come from trying to contact the etcd instance that was just shut down.
But later after retrying several times it thinks it is OK:
So maybe just a red herring.
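For anyone following along, this is roughly how I have been pulling the relevant kubelet log lines (a sketch, assuming the kubelet runs as the systemd unit named kubelet, as on a standard kubeadm install):

```sh
# Show kubelet logs from the last few minutes, keeping only lines that
# mention etcd; the unit name "kubelet" is the kubeadm default.
journalctl -u kubelet --since "5 minutes ago" --no-pager | grep -i etcd
```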
If I make a change to another manifest, e.g. the kube-apiserver one, the change seems to go through fine.

Wondering if the problem is that the manifest change gets written to the etcd daemon that is then shut down, and it is shut down before it can sync its changes with the rest of the cluster. So kubelet ends up retrieving the old manifest data when trying to restart the pod. Or something crazy like that.
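One way to check whether the kubelet ever picked up a manifest edit (a sketch; the pod name etcd-kube-master assumes the control-plane node is named kube-master, per the uname output below):

```sh
# The kubelet annotates a static pod's mirror pod with a hash of the
# manifest it actually read. If this value does not change after
# editing /etc/kubernetes/manifests/etcd.yaml, the kubelet never saw
# the edit.
kubectl -n kube-system get pod etcd-kube-master \
  -o jsonpath='{.metadata.annotations.kubernetes\.io/config\.hash}'
```

As far as I understand, kubeadm's upgrade waits for this hash to change, which would explain why it appears stuck.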
Sorry, but I don't see a kubeadm bug here. We normally don't provide support in this issue tracker. Try the #kubeadm or #etcd channels, or other support channels like Stack Overflow.
Hello, @brianmay 🤖 👋 You seem to have troubles using Kubernetes and kubeadm. Please see: |
Of course, if there is a confirmed reproducible bug let's reopen. |
Finally found the problem. I had an editor backup file of the etcd manifest left behind in the static pod manifest directory.

While this is not a bug, I would actually suggest that kubeadm should check for the existence of typical editor backup files that could cause problems for the upgrade, and warn the operator if anything is found. I am also a bit surprised that Kubernetes will read such files; I would normally have assumed it would limit the files it processes to those matching *.yaml.

This is the Stack Overflow answer that helped me: https://stackoverflow.com/a/56326068/5766144
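For anyone hitting the same thing, a quick pre-upgrade check (a sketch, assuming the default kubeadm manifest directory /etc/kubernetes/manifests):

```sh
# List anything in the static pod manifest directory that is not a
# *.yaml file; editor backup files such as etcd.yaml~ or etcd.yaml.bak
# would show up here.
find /etc/kubernetes/manifests -maxdepth 1 -type f ! -name '*.yaml'
```

Anything this prints is worth moving out of the directory before running kubeadm upgrade.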
I think this is related to this report / PR for the kubelet:
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version):
kubeadm version: &version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.8", GitCommit:"a12b886b1da059e0190c54d09c5eab5219dd7acf", GitTreeState:"clean", BuildDate:"2022-06-16T05:56:32Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}
Note: Similar results with 1.24.2
Environment:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Kernel (e.g. uname -a):
Linux kube-master 5.10.0-15-amd64 #1 SMP Debian 5.10.120-1 (2022-06-09) x86_64 GNU/Linux

What happened?
Tried to upgrade to 1.24.2; it failed to restart etcd. No obvious errors.
Tried to upgrade to 1.23.8, similar issues.
What you expected to happen?
etcd should upgrade.
How to reproduce it (as minimally and precisely as possible)?
Upgrade kubernetes from 1.23.6 to anything.
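Given the resolution earlier in the thread, a more targeted reproduction would presumably be (a sketch; etcd.yaml~ stands in for whatever backup file the editor leaves behind):

```sh
# Leave a stale copy of the etcd manifest behind, as an editor backup would:
sudo cp /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/manifests/etcd.yaml~
# The upgrade then stalls waiting for the etcd static pod to be updated:
sudo kubeadm upgrade apply v1.23.8
```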
Anything else we need to know?
As far as I can tell, the pod is not encountering any errors, and there doesn't actually appear to be anything going wrong.