
[bitnami/redis] New installs can get stuck #8662

Closed
jtslear opened this issue Jan 13, 2022 · 3 comments
@jtslear
Contributor

jtslear commented Jan 13, 2022

Which chart:

https://github.com/bitnami/charts/tree/master/bitnami/redis

Version: 15.7.5

Introduced in: #8641

Describe the bug

New installations of Redis with Sentinel often fail to initialize during startup. #8641 introduced a retry which is doing its job, but we now run into the upper limits of the chart's default liveness probe, leading to occasional failures of the Redis deployment to initialize. As a result, the StatefulSet gets stuck bringing the first Pod online as either a primary or a replica. If the right set of conditions occurs we don't see this, but more often it turns into a nearly infinite loop because that first Pod can never come online.

If we are lucky, the Endpoints are populated at the very last second, allowing the deployment to come online, though Kubernetes still intervenes and removes that first Pod because it has hit the liveness probe limits.

I suspect this may also impact CI testing of this helm chart.

Workaround!

Bump the liveness probe's initial delay to something higher than the default, for example:

replica:
  livenessProbe:
    initialDelaySeconds: 60
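
For reference, the same override can be applied on the command line; this mirrors the reproduce command below, and the parameter path is assumed to match the values snippet above:

    helm upgrade a bitnami/redis --install \
      --set auth.password=hunter5 \
      --set sentinel.enabled=true \
      --set replica.livenessProbe.initialDelaySeconds=60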

To Reproduce

Steps to reproduce the behavior:

  1. helm upgrade a bitnami/redis --install --set auth.password=hunter5 --set sentinel.enabled=true
  2. View the Redis container logs of the first Pod and observe one of the following:
    • Connection refused, repeated until the Pod is killed
    • Connection refused, then Redis starts, immediately followed by Kubernetes killing the Pod, and the process starts over

Eventually we land in a CrashLoopBackOff.
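
For anyone reproducing this, the restarts are easy to observe with standard kubectl; the pod and container names below are assumed from the release name "a" used above, so adjust them to your install:

    # watch the first node Pod restart and eventually enter CrashLoopBackOff
    kubectl get pods -l app.kubernetes.io/instance=a -w

    # follow the Redis container logs of the first node Pod
    kubectl logs -f a-redis-node-0 -c redis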

Expected behavior

A Redis deployment should have zero Pod restarts when it is first created.

Additional context

The retry_while function is using its default parameters. Reference: https://github.com/bitnami/bitnami-docker-redis/blob/3fbfed26472877c478ef24436d3face461bf1d36/6.2/debian-10/prebuildfs/opt/bitnami/scripts/libos.sh

This means that for this one call to get_sentinel_master_info we wait up to 60 seconds (12 tries with 5 seconds between each) before the retry finally fails, after which the start script can continue initializing and produce a healthy Pod. However, with the default configuration Kubernetes kills the Pod after only 45 seconds. So while we are waiting for Redis to become ready, so that it can be added to the Endpoints object and respond to itself, we are killing the Pod unnecessarily.
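
To make the timing mismatch concrete, here is a minimal stand-in for retry_while with those defaults. This is only a sketch, not the actual Bitnami implementation, and get_sentinel_master_info is the function from the chart's start script:

    # Simplified stand-in for libos.sh retry_while with its defaults:
    # 12 attempts with a 5-second sleep between them, i.e. up to ~60s total.
    retry_while() {
        local cmd="${1:?missing command}"
        local retries="${2:-12}"
        local sleep_time="${3:-5}"
        local i
        for ((i = 1; i <= retries; i++)); do
            eval "$cmd" && return 0
            sleep "$sleep_time"
        done
        return 1
    }

    # On a fresh install there is no master yet, so this never succeeds and
    # only returns after the full ~60s window -- but with default probe
    # settings Kubernetes has already killed the Pod at ~45s.
    retry_while "get_sentinel_master_info"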


cc @ricosega and @carrodher as the author and approver of what I suspect is the root cause.

@ricosega
Contributor

ricosega commented Jan 17, 2022

@jtslear, I found the problem.
At the time I created PR #8641, this part of the code, added in PR #8563, had not yet been merged into the start-node.sh script:
Link to file blame

    # check if there is a master
    get_sentinel_master_info
    redisRetVal=$?
    if [[ $redisRetVal -ne 0 ]]; then
        # there is no master yet, master by default

So on a fresh install the retry will never end.
I am checking how to make both changes work together.
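
For illustration, a rough sketch of how the two pieces might fit together (hypothetical, not the actual chart code), so that a fresh install can still fall back to "master by default" when the retry exhausts:

    # Hypothetical combination of the #8641 retry and the #8563 fallback check;
    # retry_while and get_sentinel_master_info come from the chart scripts.
    if retry_while "get_sentinel_master_info" 12 5; then
        echo "existing master found, starting as a replica"
    else
        # there is no master yet, master by default
        echo "no master found, starting as master by default"
    fi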

@javsalgar
Contributor

Thanks for letting us know! We will check the PR.

@ricosega
Contributor

PR #8641 has been reverted so this issue can be closed.
