
Commit aaf9bf7

Kubelet-Controller Manager communication Profiles (WorkerLatencyProfiles)
1 parent 239f034 commit aaf9bf7

File tree

6 files changed: +311 −0 lines changed

_topic_maps/_topic_map.yml

Lines changed: 3 additions & 0 deletions
@@ -2030,6 +2030,9 @@ Topics:
 - Name: Enabling features using FeatureGates
   File: nodes-cluster-enabling-features
   Distros: openshift-enterprise,openshift-origin
+- Name: Improving cluster stability in high latency environments using worker latency profiles
+  File: nodes-cluster-worker-latency-profiles
+  Distros: openshift-enterprise,openshift-origin
 - Name: Remote worker nodes on the network edge
   Dir: edge
   Distros: openshift-enterprise
modules/nodes-cluster-worker-latency-profiles-about.adoc

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
// Module included in the following assemblies:
//
// * nodes/clusters/nodes-cluster-worker-latency-profiles
// * nodes/edge/nodes-edge-remote-workers. ??
// * post_installation_configuration/cluster-tasks ??

:_content-type: CONCEPT
[id="nodes-cluster-worker-latency-profiles-about_{context}"]
= Understanding worker latency profiles

Worker latency profiles are multiple sets of carefully tuned values for the `node-status-update-frequency`, `node-monitor-grace-period`, `default-not-ready-toleration-seconds`, and `default-unreachable-toleration-seconds` parameters. These parameters let you control the reaction of the cluster to latency issues without needing to determine the best values manually.

All worker latency profiles configure the following parameters:

--
* `node-status-update-frequency`. Specifies the frequency, in seconds, at which the kubelet reports its status to the Kubernetes Controller Manager Operator.
* `node-monitor-grace-period`. Specifies the amount of time, in seconds, that the Kubernetes Controller Manager Operator waits for an update from a kubelet before marking the node unhealthy and adding the `node.kubernetes.io/not-ready` or `node.kubernetes.io/unreachable` taint to the node.
* `default-not-ready-toleration-seconds`. Specifies the amount of time, in seconds, that the Kubernetes Controller Manager Operator waits after marking a node unhealthy before evicting pods from that node.
* `default-unreachable-toleration-seconds`. Specifies the amount of time, in seconds, that the Kubernetes Controller Manager Operator waits after marking a node unreachable before evicting pods from that node.
--

[IMPORTANT]
====
Manually modifying the `node-monitor-grace-period` parameter is not supported.
====

While the default configuration works in most cases, {product-title} offers two other worker latency profiles for situations where the network is experiencing higher latency than usual. The three worker latency profiles are described in the following sections:
Default worker latency profile:: With the `Default` profile, each kubelet reports its node status to the Kubernetes Controller Manager Operator (kube controller) every 10 seconds. The Kubernetes Controller Manager Operator checks the kubelet for a status every 5 seconds.
+
The Kubernetes Controller Manager Operator waits 40 seconds for a status update before considering that node unhealthy. It marks the node with the `node.kubernetes.io/not-ready` or `node.kubernetes.io/unreachable` taint and evicts the pods on that node. If a pod on that node has the `NoExecute` taint toleration, the pod is evicted after 300 seconds. If the pod has the `tolerationSeconds` parameter set, the eviction waits for the period specified by that parameter.
+
[cols="2,1,2,1"]
|===
| Profile | Component | Parameter | Value

.4+| Default
| kubelet
| `node-status-update-frequency`
| 10s

| Kubernetes Controller Manager
| `node-monitor-grace-period`
| 40s

| Kubernetes API Server
| `default-not-ready-toleration-seconds`
| 300s

| Kubernetes API Server
| `default-unreachable-toleration-seconds`
| 300s

|===

Medium worker latency profile:: Use the `MediumUpdateAverageReaction` profile if the network latency is slightly higher than usual.
+
The `MediumUpdateAverageReaction` profile reduces the frequency of kubelet status updates to every 20 seconds and extends the period that the Kubernetes Controller Manager Operator waits for those updates to 2 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the `tolerationSeconds` parameter set, the eviction waits for the period specified by that parameter.
+
The Kubernetes Controller Manager Operator waits for 2 minutes before considering a node unhealthy. After another 60 seconds, the eviction process starts.
+
[cols="2,1,2,1"]
|===
| Profile | Component | Parameter | Value

.4+| MediumUpdateAverageReaction
| kubelet
| `node-status-update-frequency`
| 20s

| Kubernetes Controller Manager
| `node-monitor-grace-period`
| 2m

| Kubernetes API Server
| `default-not-ready-toleration-seconds`
| 60s

| Kubernetes API Server
| `default-unreachable-toleration-seconds`
| 60s

|===

Low worker latency profile:: Use the `LowUpdateSlowReaction` profile if the network latency is extremely high.
+
The `LowUpdateSlowReaction` profile reduces the frequency of kubelet status updates to every minute and extends the period that the Kubernetes Controller Manager Operator waits for those updates to 5 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the `tolerationSeconds` parameter set, the eviction waits for the period specified by that parameter.
+
The Kubernetes Controller Manager Operator waits for 5 minutes before considering a node unhealthy. After another 60 seconds, the eviction process starts.
+
[cols="2,1,2,1"]
|===
| Profile | Component | Parameter | Value

.4+| LowUpdateSlowReaction
| kubelet
| `node-status-update-frequency`
| 1m

| Kubernetes Controller Manager
| `node-monitor-grace-period`
| 5m

| Kubernetes API Server
| `default-not-ready-toleration-seconds`
| 60s

| Kubernetes API Server
| `default-unreachable-toleration-seconds`
| 60s

|===
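The parameter tables above imply a worst-case reaction time for each profile: a node that stops reporting is marked unhealthy after `node-monitor-grace-period`, and its pods (without a custom `tolerationSeconds`) are evicted after the default unreachable toleration on top of that. A minimal sketch of that arithmetic, using only the values from the tables (this is illustrative code, not part of the product):

```python
# Rough worst-case reaction timelines per worker latency profile, derived
# from the parameter tables above. All values are in seconds.
PROFILES = {
    # profile: (node-status-update-frequency, node-monitor-grace-period,
    #           default-unreachable-toleration-seconds)
    "Default": (10, 40, 300),
    "MediumUpdateAverageReaction": (20, 120, 60),
    "LowUpdateSlowReaction": (60, 300, 60),
}

def worst_case_eviction_seconds(profile: str) -> int:
    """Seconds from the last successful kubelet status update until pods
    without a custom tolerationSeconds are evicted from an unreachable node:
    grace period (node marked unhealthy) + default unreachable toleration."""
    _update_frequency, grace_period, toleration = PROFILES[profile]
    return grace_period + toleration

for name in PROFILES:
    print(name, worst_case_eviction_seconds(name))
```

For example, the `Default` profile evicts after roughly 40 + 300 = 340 seconds, while `LowUpdateSlowReaction` tolerates up to about 300 + 60 = 360 seconds of silence before evicting.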
modules/nodes-cluster-worker-latency-profiles-using.adoc

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
// Module included in the following assemblies:
//
// * nodes/clusters/nodes-cluster-worker-latency-profiles
// * Need to determine if these are good locations:
// * nodes/edge/nodes-edge-remote-workers
// * post_installation_configuration/cluster-tasks

:_content-type: PROCEDURE
[id="nodes-cluster-worker-latency-profiles-using_{context}"]
= Using worker latency profiles

To implement a worker latency profile to deal with network latency, edit the `node.config` object to add the name of the profile. You can change the profile at any time as latency increases or decreases.

You must move between worker latency profiles one step at a time. For example, you cannot move directly from the `Default` profile to the `LowUpdateSlowReaction` worker latency profile. You must move from the `Default` worker latency profile to the `MediumUpdateAverageReaction` profile first, then to `LowUpdateSlowReaction`. Similarly, when returning to the default profile, you must move from the low profile to the medium profile first, then to the default.
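The one-step-at-a-time rule above can be expressed as a tiny state machine over the ordered list of profiles. A sketch (illustrative only; the cluster itself does not expose such a function):

```python
# The profiles in order of increasing latency tolerance, as described
# in this module. A change is valid only between adjacent profiles.
ORDER = ["Default", "MediumUpdateAverageReaction", "LowUpdateSlowReaction"]

def valid_transition(current: str, target: str) -> bool:
    """Return True when the profile change moves exactly one step
    up or down the ordered list."""
    return abs(ORDER.index(current) - ORDER.index(target)) == 1

print(valid_transition("Default", "MediumUpdateAverageReaction"))  # True
print(valid_transition("Default", "LowUpdateSlowReaction"))        # False
```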
[NOTE]
====
You can also configure worker latency profiles upon installing an {product-title} cluster.
====

.Procedure

To move from the default worker latency profile:

. Move to the medium worker latency profile:

.. Edit the `node.config` object:
+
[source,terminal]
----
$ oc edit nodes.config/cluster
----
.. Add `spec.workerLatencyProfile: MediumUpdateAverageReaction`:
+
.Example `node.config` object
[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-07-08T16:02:51Z"
  generation: 1
  name: cluster
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 36282574-bf9f-409e-a6cd-3032939293eb
  resourceVersion: "1865"
  uid: 0c0f7a4c-4307-4187-b591-6155695ac85b
spec:
  workerLatencyProfile: MediumUpdateAverageReaction <1>

...
----
<1> Specifies the medium worker latency profile.
+
Scheduling on each worker node is disabled as the change is being applied.
+
When all nodes return to the `Ready` condition, check the Kubernetes Controller Manager to verify that the worker latency profile was applied:
+
[source,terminal]
----
$ oc get KubeControllerManager -o yaml | grep -i workerlatency -A 5 -B 5
----
+
.Example output
[source,terminal]
----
...
    - lastTransitionTime: "2022-07-11T19:47:10Z"
      reason: ProfileUpdated
      status: "False"
      type: WorkerLatencyProfileProgressing
    - lastTransitionTime: "2022-07-11T19:47:10Z" <1>
      message: all static pod revision(s) have updated latency profile
      reason: ProfileUpdated
      status: "True"
      type: WorkerLatencyProfileComplete
    - lastTransitionTime: "2022-07-11T19:20:11Z"
      reason: AsExpected
      status: "False"
      type: WorkerLatencyProfileDegraded
    - lastTransitionTime: "2022-07-11T19:20:36Z"
      status: "False"
...
----
<1> Specifies that the profile is applied and active.
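This verification can also be scripted. A minimal sketch that checks the `WorkerLatencyProfileComplete` condition, assuming the status conditions have been parsed into a list of dictionaries (for example, from `oc get KubeControllerManager -o json`; the helper name is hypothetical):

```python
# A sketch: decide whether a worker latency profile change has completed,
# given status conditions like those in the example output above.

def profile_update_complete(conditions: list) -> bool:
    """Return True when the WorkerLatencyProfileComplete condition
    reports status "True"."""
    for cond in conditions:
        if cond.get("type") == "WorkerLatencyProfileComplete":
            return cond.get("status") == "True"
    return False

# Conditions mirroring the example output above.
conditions = [
    {"type": "WorkerLatencyProfileProgressing", "status": "False"},
    {"type": "WorkerLatencyProfileComplete", "status": "True"},
    {"type": "WorkerLatencyProfileDegraded", "status": "False"},
]
print(profile_update_complete(conditions))  # True
```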
. Optional: Move to the low worker latency profile:

.. Edit the `node.config` object:
+
[source,terminal]
----
$ oc edit nodes.config/cluster
----
.. Change the `spec.workerLatencyProfile` value to `LowUpdateSlowReaction`:
+
.Example `node.config` object
[source,yaml]
----
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    release.openshift.io/create-only: "true"
  creationTimestamp: "2022-07-08T16:02:51Z"
  generation: 1
  name: cluster
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 36282574-bf9f-409e-a6cd-3032939293eb
  resourceVersion: "1865"
  uid: 0c0f7a4c-4307-4187-b591-6155695ac85b
spec:
  workerLatencyProfile: LowUpdateSlowReaction <1>

...
----
<1> Specifies the low worker latency profile.
+
Scheduling on each worker node is disabled as the change is being applied.

To change from the low profile to the medium profile, or from the medium profile to the low profile, edit the `node.config` object and set the `spec.workerLatencyProfile` parameter to the appropriate value.
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
:_content-type: ASSEMBLY
:context: nodes-cluster-worker-latency-profiles
[id="nodes-cluster-worker-latency-profiles"]
= Improving cluster stability in high latency environments using worker latency profiles
include::_attributes/common-attributes.adoc[]

toc::[]

// The following include statements pull in the module files that comprise
// the assembly. Include any combination of concept, procedure, or reference
// modules required to cover the user story. You can also include other
// assemblies.

include::snippets/worker-latency-profile-intro.adoc[]

These worker latency profiles are three sets of parameters that are pre-defined with carefully tuned values that control the reaction of the cluster to latency issues without needing to determine the best values manually.

You can configure worker latency profiles when installing a cluster or at any time you notice increased latency in your cluster network.

include::modules/nodes-cluster-worker-latency-profiles-about.adoc[leveloffset=+1]

include::modules/nodes-cluster-worker-latency-profiles-using.adoc[leveloffset=+1]

post_installation_configuration/cluster-tasks.adoc

Lines changed: 9 additions & 0 deletions
@@ -503,6 +503,15 @@ include::modules/machineset-delete-policy.adoc[leveloffset=+2]

 include::modules/nodes-scheduler-node-selectors-cluster.adoc[leveloffset=+2]

+[id="post-worker-latency-profiles"]
+== Improving cluster stability in high latency environments using worker latency profiles
+
+include::snippets/worker-latency-profile-intro.adoc[]
+
+include::modules/nodes-cluster-worker-latency-profiles-about.adoc[leveloffset=+2]
+
+include::modules/nodes-cluster-worker-latency-profiles-using.adoc[leveloffset=+2]
+
 [id="post-install-creating-infrastructure-machinesets-production"]
 == Creating infrastructure machine sets for production environments
snippets/worker-latency-profile-intro.adoc

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
// Text snippet included in the following modules:
//
// * nodes/clusters/nodes-cluster-worker-latency-profiles
// * nodes/edge/nodes-edge-remote-workers
// * post_installation_configuration/cluster-tasks

:_content-type: SNIPPET

All nodes send heartbeats to the Kubernetes Controller Manager Operator (kube controller) in the {product-title} cluster every 10 seconds, by default. If the cluster does not receive heartbeats from a node, {product-title} responds using several default mechanisms.

For example, if the Kubernetes Controller Manager Operator loses contact with a node after a configured period:

. The node controller on the control plane updates the node health to `Unhealthy` and marks the node `Ready` condition as `Unknown`.

. In response, the scheduler stops scheduling pods to that node.

. The on-premise node controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to the node and schedules any pods on the node for eviction after five minutes, by default.

This behavior can cause problems if your network is prone to latency issues, especially if you have nodes at the network edge. In some cases, the Kubernetes Controller Manager Operator might not receive an update from a healthy node due to network latency, and would then evict pods from the node even though the node is healthy. To avoid this problem, you can use _worker latency profiles_ to adjust how long the kubelet and the Kubernetes Controller Manager Operator wait for status updates before taking action. These adjustments help to ensure that your cluster runs properly when network latency between the control plane and the worker nodes is not optimal.

These worker latency profiles are three sets of parameters that are pre-defined with carefully tuned values that let you control the reaction of the cluster to latency issues without needing to determine the best values manually.
