
[BUG] redis cluster create error shardings.shard.phase Unsupported value: Abnormal #8941

Open
JashBook opened this issue Feb 18, 2025 · 4 comments · May be fixed by #9010
Labels: kind/bug Something isn't working

@JashBook (Collaborator)

Describe the bug

 kbcli version
Kubernetes: v1.28.3-vke.17
KubeBlocks: 1.0.0-beta.28
kbcli: 1.0.0-beta.13

To Reproduce
Steps to reproduce the behavior:

  1. Create the cluster:
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: rcluster-ztjeza
  namespace: default
spec:
  terminationPolicy: Delete
  shardings:
    - name: shard
      shards: 3
      template:
        name: redis
        componentDef: redis-cluster-7-1.0.0-alpha.0
        serviceVersion: 7.2.4
        replicas: 2
        resources:
          limits:
            cpu: 100m
            memory: 0.5Gi
          requests:
            cpu: 100m
            memory: 0.5Gi
        volumeClaimTemplates:
          - name: data
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi
  2. See error
➜  ~ kubectl get cluster rcluster-ztjeza
NAME              CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
rcluster-ztjeza                        Delete               Creating   7m19s
➜  ~ 
➜  ~ kbcli cluster list-instances rcluster-ztjeza
NAME                          NAMESPACE   CLUSTER           COMPONENT          STATUS    ROLE     ACCESSMODE   AZ              CPU(REQUEST/LIMIT)   MEMORY(REQUEST/LIMIT)   STORAGE     NODE                      CREATED-TIME                 
rcluster-ztjeza-shard-7m7-0   default     rcluster-ztjeza   shard(shard-7m7)   Running   <none>                cn-shanghai-a   100m / 100m          512Mi / 512Mi           data:20Gi   172.16.0.13/172.16.0.13   Feb 18,2025 15:17 UTC+0800   
rcluster-ztjeza-shard-7m7-1   default     rcluster-ztjeza   shard(shard-7m7)   Running   <none>                cn-shanghai-a   100m / 100m          512Mi / 512Mi           data:20Gi   172.16.0.33/172.16.0.33   Feb 18,2025 15:18 UTC+0800   
rcluster-ztjeza-shard-mqk-0   default     rcluster-ztjeza   shard(shard-mqk)   Running   <none>                cn-shanghai-a   100m / 100m          512Mi / 512Mi           data:20Gi   172.16.0.13/172.16.0.13   Feb 18,2025 15:17 UTC+0800   
rcluster-ztjeza-shard-mqk-1   default     rcluster-ztjeza   shard(shard-mqk)   Running   <none>                cn-shanghai-a   100m / 100m          512Mi / 512Mi           data:20Gi   172.16.0.33/172.16.0.33   Feb 18,2025 15:17 UTC+0800   
rcluster-ztjeza-shard-xjx-0   default     rcluster-ztjeza   shard(shard-xjx)   Running   <none>                cn-shanghai-a   100m / 100m          512Mi / 512Mi           data:20Gi   172.16.0.13/172.16.0.13   Feb 18,2025 15:17 UTC+0800   
rcluster-ztjeza-shard-xjx-1   default     rcluster-ztjeza   shard(shard-xjx)   Running   <none>                cn-shanghai-a   100m / 100m          512Mi / 512Mi           data:20Gi   172.16.0.30/172.16.0.30   Feb 18,2025 15:17 UTC+0800  

describe cluster

kubectl describe cluster rcluster-ztjeza 
Name:         rcluster-ztjeza
Namespace:    default
Labels:       <none>
Annotations:  kubeblocks.io/crd-api-version: apps.kubeblocks.io/v1
API Version:  apps.kubeblocks.io/v1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2025-02-18T07:17:21Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generation:        1
  Resource Version:  46238772
  UID:               d0230517-bac0-4455-8ef2-8faa90e4f66e
Spec:
  Shardings:
    Name:    shard
    Shards:  3
    Template:
      Component Def:  redis-cluster-7-1.0.0-alpha.0
      Name:           redis
      Replicas:       2
      Resources:
        Limits:
          Cpu:     100m
          Memory:  0.5Gi
        Requests:
          Cpu:          100m
          Memory:       0.5Gi
      Service Version:  7.2.4
      Volume Claim Templates:
        Name:  data
        Spec:
          Access Modes:
            ReadWriteOnce
          Resources:
            Requests:
              Storage:  20Gi
  Termination Policy:   Delete
Status:
  Conditions:
    Last Transition Time:  2025-02-18T07:19:38Z
    Message:               Cluster.apps.kubeblocks.io "rcluster-ztjeza" is invalid: shardings.shard.phase: Unsupported value: "Abnormal": supported values: "Creating", "Deleting", "Updating", "Stopping", "Starting", "Running", "Stopped", "Failed"
    Reason:                ApplyResourcesFailed
    Status:                False
    Type:                  ApplyResources
    Last Transition Time:  2025-02-18T07:17:22Z
    Message:               The operator has started the provisioning of Cluster: rcluster-ztjeza
    Observed Generation:   1
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
  Observed Generation:     1
  Phase:                   Creating
  Shardings:
    Shard:
      Phase:  Creating
Events:
  Type     Reason                           Age                     From                Message
  ----     ------                           ----                    ----                -------
  Warning  Warning                          7m36s                   cluster-controller  BackupPolicy.dataprotection.kubeblocks.io "rcluster-ztjeza-shard-backup-policy" is invalid: spec: Invalid value: "object": either spec.target or spec.targets
  Normal   ApplyResourcesSucceed            7m36s                   cluster-controller  Successfully applied for resources
  Normal   PreCheckSucceed                  7m36s                   cluster-controller  The operator has started the provisioning of Cluster: rcluster-ztjeza
  Normal   ClusterComponentPhaseTransition  5m59s (x22 over 7m36s)  cluster-controller  cluster sharding shard is Creating
  Normal   ClusterComponentPhaseTransition  27s (x19 over 5m20s)    cluster-controller  cluster sharding shard is Abnormal

kubeblocks logs

2025-02-18T07:21:47.755Z	ERROR	STATUS *v1.Cluster error	{"controller": "cluster", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Cluster", "Cluster": {"name":"rcluster-ztjeza","namespace":"default"}, "namespace": "default", "name": "rcluster-ztjeza", "reconcileID": "fd0f82b4-f8f2-48de-b290-2703f395539e", "cluster": {"name":"rcluster-ztjeza","namespace":"default"}, "error": "Cluster.apps.kubeblocks.io \"rcluster-ztjeza\" is invalid: shardings.shard.phase: Unsupported value: \"Abnormal\": supported values: \"Creating\", \"Deleting\", \"Updating\", \"Stopping\", \"Starting\", \"Running\", \"Stopped\", \"Failed\""}
2025-02-18T07:21:47.763Z	INFO	Cluster.apps.kubeblocks.io "rcluster-ztjeza" is invalid: shardings.shard.phase: Unsupported value: "Abnormal": supported values: "Creating", "Deleting", "Updating", "Stopping", "Starting", "Running", "Stopped", "Failed"	{"controller": "cluster", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Cluster", "Cluster": {"name":"rcluster-ztjeza","namespace":"default"}, "namespace": "default", "name": "rcluster-ztjeza", "reconcileID": "fd0f82b4-f8f2-48de-b290-2703f395539e", "cluster": {"name":"rcluster-ztjeza","namespace":"default"}}
2025-02-18T07:21:47.763Z	ERROR	Reconciler error	{"controller": "cluster", "controllerGroup": "apps.kubeblocks.io", "controllerKind": "Cluster", "Cluster": {"name":"rcluster-ztjeza","namespace":"default"}, "namespace": "default", "name": "rcluster-ztjeza", "reconcileID": "fd0f82b4-f8f2-48de-b290-2703f395539e", "error": "Cluster.apps.kubeblocks.io \"rcluster-ztjeza\" is invalid: shardings.shard.phase: Unsupported value: \"Abnormal\": supported values: \"Creating\", \"Deleting\", \"Updating\", \"Stopping\", \"Starting\", \"Running\", \"Stopped\", \"Failed\""}
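
The status update is rejected by the Cluster CRD's enum validation on the sharding phase, which does not include "Abnormal". As a rough check (a sketch only; the grep context width is arbitrary), the phase values the installed CRD actually accepts can be read from its schema:

kubectl get crd clusters.apps.kubeblocks.io -o yaml | grep -B 2 -A 10 'enum:'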

Expected behavior
The cluster is created successfully and reaches Running, instead of staying stuck in Creating with the shardings.shard.phase: Unsupported value: "Abnormal" error.


@Y-Rookie (Collaborator) commented Feb 20, 2025

The roleProbe event failed to be sent because of an Unauthorized error, so the pod has no role, and the component & cluster statuses fail.

0/probe"}
2025-02-20T08:21:43Z	INFO	register service to server	{"service": "Streaming", "method": "POST", "uri": "/v1.0/streaming"}
2025-02-20T08:21:43Z	INFO	probe started	{"probe": "roleProbe", "config": {"instance":"rcluster-tuned-shard-4jt","action":"roleProbe","periodSeconds":1}}
2025-02-20T08:21:43Z	INFO	starting the streaming server
2025-02-20T08:21:44Z	INFO	send probe event	{"probe": "roleProbe", "probe": "roleProbe", "code": 0, "output": "secondary", "message": ""}
2025-02-20T08:26:44Z	ERROR	failed to send event	{"probe": "roleProbe", "reason": "roleProbe", "message": "{\"instance\":\"rcluster-tuned-shard-4jt\",\"probe\":\"roleProbe\",\"code\":0,\"output\":\"c2Vjb25kYXJ5\"}", "error": "failed to handle event after 30 attempts: Unauthorized", "errorVerbose": "Unauthorized\nfailed to handle event after 30 attempts\ngithub.com/apecloud/kubeblocks/pkg/kbagent/util.createOrUpdateEvent\n\t/src/pkg/kbagent/util/event.go:127\ngithub.com/apecloud/kubeblocks/pkg/kbagent/util.SendEventWithMessage.func1\n\t/src/pkg/kbagent/util/event.go:47\ngithub.com/apecloud/kubeblocks/pkg/kbagent/util.SendEventWithMessage.func2\n\t/src/pkg/kbagent/util/event.go:59\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"}
github.com/apecloud/kubeblocks/pkg/kbagent/util.SendEventWithMessage.func1
	/src/pkg/kbagent/util/event.go:49
github.com/apecloud/kubeblocks/pkg/kbagent/util.SendEventWithMessage.func2
	/src/pkg/kbagent/util/event.go:59
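
A few kubectl checks to confirm this from the cluster side (a diagnostic sketch; the pod name is taken from the report above, the ServiceAccount name is a placeholder, and the label keys are assumptions):

# do the shard pods carry a role label at all?
kubectl get pods -n default -l app.kubernetes.io/instance=rcluster-ztjeza -L kubeblocks.io/role

# which ServiceAccount do the pods run under, and can it still create the probe events?
kubectl get pod rcluster-ztjeza-shard-7m7-0 -n default -o jsonpath='{.spec.serviceAccountName}{"\n"}'
kubectl auth can-i create events -n default --as=system:serviceaccount:default:<serviceaccount-name>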

@Y-Rookie (Collaborator) commented Feb 20, 2025

(screenshots attached)

The ServiceAccount has been unexpectedly deleted. Under the latest implementation, ServiceAccounts and RoleBindings should be scoped at the engine-cmpd level and should not be associated with any specific cluster. However, the labels on the ServiceAccounts and RoleBindings are currently linked to a non-existent cluster, which may be why the ServiceAccounts are being unexpectedly deleted.
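
A sketch of how to verify this on a live cluster (object names are placeholders; the point is to compare the cluster-related labels and ownerReferences on these objects against clusters that still exist):

kubectl get serviceaccounts,rolebindings -n default --show-labels
kubectl get serviceaccount <kubeblocks-managed-sa> -n default -o jsonpath='{.metadata.labels}{"\n"}{.metadata.ownerReferences}{"\n"}'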

@cjc7373 (Contributor) commented Feb 24, 2025

I can't reproduce this problem locally. I'll look into it if such a problem happens again.

@cjc7373 (Contributor) commented Mar 5, 2025

I think maybe this is because of a stale read. Suppose the following scenario:

  1. comp1 and comp2 share one SA; comp1 owns the SA.
  2. comp1 gets deleted, so it transfers SA ownership to comp2.
  3. comp2 gets deleted. It lists its secondary resources (a stale read), so it doesn't know it now owns the SA.
  4. The k8s GC finds out comp2 is deleted and tries to delete the SA.
  5. The SA gets stuck in deletion because of the finalizer.

I think it's fine to just remove the finalizer, since the purpose of a finalizer is to clean up the resources the deleted object owns, and an SA doesn't own any other resources.

EDIT: if we remove the finalizer, there's still a chance that ownership would be assigned to an already-deleted component, causing the ServiceAccount to be recreated. In such cases, the pod will keep using the old SA token until it expires.
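
For anyone whose ServiceAccount is stuck in Terminating because of this, a manual workaround sketch (not the fix in the linked PR; the object name is a placeholder) is to clear the finalizer so garbage collection can complete:

kubectl patch serviceaccount <stuck-sa> -n default --type=merge -p '{"metadata":{"finalizers":null}}'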

@cjc7373 cjc7373 marked this as a duplicate of #8989 Mar 5, 2025
@cjc7373 cjc7373 linked a pull request Mar 6, 2025 that will close this issue