Bug Report
I quite frequently find the operatorhubio-catalog pod in CrashLoopBackOff with no useful information for debugging how it ended up in that state. It would be nice to get more diagnostic output explaining why the pod winds up in that state.
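For context, the only diagnostics I know how to collect today are the previous container's log and the namespace events, e.g. (the pod name is from my cluster and will differ):

$ kubectl -n olm logs operatorhubio-catalog-t66gt --previous
$ kubectl -n olm get events --sort-by=.lastTimestamp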
What did you do?
Deployed OLM using the provided Helm template and waited for some time.
olm operatorhubio-catalog-t66gt 0/1 CrashLoopBackOff 3848 8d
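As a rough sketch of the steps (the deploy/chart path is from my checkout of the operator-lifecycle-manager repo and may differ between releases; the second command just watches the catalog pod until it starts cycling):

$ helm template deploy/chart | kubectl apply -f -
$ kubectl -n olm get pods -w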
What did you expect to see?
I would expect the pod to run healthily unless there is a real issue, and when an issue does occur I would expect the operatorhubio-catalog log to say what is wrong with the pod instead of containing only a single "starting gRPC server" message.
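Since the liveness and readiness probes (see the describe output under Additional context) exec grpc_health_probe inside the container, the same check can be run by hand to confirm the server has stopped answering; a minimal sketch, reusing the pod name from above:

$ kubectl -n olm exec operatorhubio-catalog-t66gt -- grpc_health_probe -addr=localhost:50051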
Environment
- operator-lifecycle-manager version:
- Kubernetes version information:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:36:53Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster kind: (OpenStack, bare metal, and AWS)
Possible Solution
Additional context
Output of kubectl describe pod for one of the crashing catalog pods:
Name:           operatorhubio-catalog-5jtrd
Namespace:      olm
Priority:       0
Node:           admin-kcp-primary-0/192.168.200.9
Start Time:     Fri, 25 Oct 2019 09:51:57 -0500
Labels:         olm.catalogSource=operatorhubio-catalog
Annotations:    cni.projectcalico.org/podIP: 10.42.1.2/32
                kubernetes.io/psp: default-psp
Status:         Running
IP:             10.42.1.2
IPs:            <none>
Containers:
  registry-server:
    Container ID:   docker://37233b694ec50f4963d23cd9447fd458a19cb3f36013ca53521a500e1fceba4d
    Image:          quay.io/operator-framework/upstream-community-operators:latest
    Image ID:       docker-pullable://quay.io/operator-framework/upstream-community-operators@sha256:95a59849ea594e97742264d66b80dcc2a8ac3515ff22cf64538b21101f345111
    Port:           50051/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 25 Oct 2019 09:55:03 -0500
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 25 Oct 2019 09:54:23 -0500
      Finished:     Fri, 25 Oct 2019 09:55:01 -0500
    Ready:          True
    Restart Count:  4
    Requests:
      cpu:     10m
      memory:  50Mi
    Liveness:   exec [grpc_health_probe -addr=localhost:50051] delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [grpc_health_probe -addr=localhost:50051] delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5hwc7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-5hwc7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5hwc7
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:
Events:
  Type     Reason     Age                    From                          Message
  ----     ------     ----                   ----                          -------
  Normal   Scheduled  4m1s                   default-scheduler             Successfully assigned olm/operatorhubio-catalog-5jtrd to admin-kcp-primary-0
  Normal   Started    2m54s (x2 over 3m49s)  kubelet, admin-kcp-primary-0  Started container registry-server
  Warning  Unhealthy  2m20s (x6 over 3m20s)  kubelet, admin-kcp-primary-0  Readiness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Normal   Killing    2m17s (x2 over 2m57s)  kubelet, admin-kcp-primary-0  Container registry-server failed liveness probe, will be restarted
  Warning  Unhealthy  2m17s (x6 over 3m17s)  kubelet, admin-kcp-primary-0  Liveness probe failed: timeout: failed to connect service "localhost:50051" within 1s
  Normal   Pulling    2m16s (x3 over 3m55s)  kubelet, admin-kcp-primary-0  Pulling image "quay.io/operator-framework/upstream-community-operators:latest"
  Normal   Pulled     2m15s (x3 over 3m51s)  kubelet, admin-kcp-primary-0  Successfully pulled image "quay.io/operator-framework/upstream-community-operators:latest"
  Normal   Created    2m15s (x3 over 3m50s)  kubelet, admin-kcp-primary-0  Created container registry-server
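If it helps, the last terminated state (the Reason: Error / Exit Code: 2 shown above) can also be pulled directly, roughly like this (pod name taken from the describe output above):

$ kubectl -n olm get pod operatorhubio-catalog-5jtrd -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'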