title | description | services | ms.topic | ms.custom | ms.date |
---|---|---|---|---|---|
Upgrade an Azure Kubernetes Service (AKS) cluster | Learn how to upgrade an Azure Kubernetes Service (AKS) cluster to get the latest features and security updates. | container-service | article | event-tier1-build-2022 | 12/17/2020 |
Part of the AKS cluster lifecycle involves performing periodic upgrades to the latest Kubernetes version. It's important to apply the latest security releases and to upgrade to get the latest features. This article shows you how to check for, configure, and apply upgrades to your AKS cluster.
For AKS clusters that use multiple node pools or Windows Server nodes, see Upgrade a node pool in AKS.
This article requires Azure CLI version 2.0.65 or later. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.
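If the CLI is already installed, you can check the version and upgrade it in place; the az upgrade command is available in Azure CLI 2.11.0 and later:

```azurecli
# Check the installed Azure CLI version
az --version

# Upgrade the Azure CLI in place (Azure CLI 2.11.0 and later)
az upgrade
```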
Warning
An AKS cluster upgrade triggers a cordon and drain of your nodes. If you have a low compute quota available, the upgrade may fail. For more information, see increase quotas.
To check which Kubernetes releases are available for your cluster, use the az aks get-upgrades command. The following example checks for available upgrades to myAKSCluster in myResourceGroup:
```azurecli
az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table
```
Note
When you upgrade a supported AKS cluster, Kubernetes minor versions can't be skipped. All upgrades must be performed sequentially by minor version number. For example, upgrades between 1.14.x -> 1.15.x or 1.15.x -> 1.16.x are allowed; however, 1.14.x -> 1.16.x is not allowed.
Skipping multiple versions can only be done when upgrading from an unsupported version back to a supported version. For example, an upgrade from an unsupported 1.10.x to a supported 1.15.x can be completed if available.
The following example output shows that the cluster can be upgraded to versions 1.19.1 and 1.19.3:
```output
Name     ResourceGroup    MasterVersion  Upgrades
-------  ---------------  -------------  --------------
default  myResourceGroup  1.18.10        1.19.1, 1.19.3
```
The following output shows that no upgrades are available:
```output
ERROR: Table output unavailable. Use the --query option to specify an appropriate query. Use --debug for more info.
```
Important
If no upgrade is available, create a new cluster with a supported version of Kubernetes and migrate your workloads from the existing cluster to the new cluster. Attempting to upgrade a cluster to a newer Kubernetes version when az aks get-upgrades shows no upgrades available is not supported.
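If you'd rather check from a script than read the table, here's a minimal sketch, assuming the command's JSON output exposes the available versions under controlPlaneProfile.upgrades (verify the path against your CLI output):

```azurecli
# List only the available control plane upgrade versions, one per line
az aks get-upgrades \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --query "controlPlaneProfile.upgrades[].kubernetesVersion" \
    --output tsv
```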
Important
Node surges require subscription quota for the requested max surge count for each upgrade operation. For example, a cluster that has 5 node pools, each with a count of 4 nodes, has a total of 20 nodes. If each node pool has a max surge value of 50%, additional compute and IP quota of 10 nodes (2 nodes * 5 pools) is required to complete the upgrade.
If you're using Azure CNI, also validate that the subnet has enough available IPs to satisfy the IP requirements of Azure CNI.
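Before a large surge upgrade, you can review your regional compute quota usage with az vm list-usage (eastus is an example region; use the region that hosts your cluster):

```azurecli
# Show vCPU quota usage for the region that hosts your cluster
az vm list-usage --location eastus --output table
```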
By default, AKS configures upgrades to surge with one extra node. A default value of one for the max surge setting enables AKS to minimize workload disruption by creating an extra node before the cordon/drain of existing applications to replace an older-versioned node. The max surge value may be customized per node pool to enable a trade-off between upgrade speed and upgrade disruption. Increasing the max surge value completes the upgrade process faster, but a large value may cause disruptions during the upgrade process.
For example, a max surge value of 100% provides the fastest possible upgrade process (doubling the node count) but also causes all nodes in the node pool to be drained simultaneously. You may wish to use a value this high for testing environments. For production node pools, we recommend a max surge setting of 33%.
AKS accepts both integer values and percentage values for max surge. An integer such as "5" indicates five extra nodes to surge. A value of "50%" indicates a surge value of half the current node count in the pool. Max surge percentage values can be a minimum of 1% and a maximum of 100%. A percentage value is rounded up to the nearest node count; for example, 33% of a 10-node pool rounds up to 4 surge nodes. If the max surge value is higher than the current node count at the time of upgrade, the current node count is used for the max surge value.
During an upgrade, the max surge value can be a minimum of 1 and a maximum value equal to the number of nodes in your node pool. You can set larger values, but the maximum number of nodes used for max surge won't be higher than the number of nodes in the pool at the time of upgrade.
Important
The max surge setting on a node pool is persistent. Subsequent Kubernetes upgrades or node version upgrades will use this setting. You may change the max surge value for your node pools at any time. For production node pools, we recommend a max surge setting of 33%.
Use the following commands to set max surge values for new or existing node pools.
```azurecli
# Set max surge for a new node pool
az aks nodepool add -n mynodepool -g MyResourceGroup --cluster-name MyManagedCluster --max-surge 33%

# Update max surge for an existing node pool
az aks nodepool update -n mynodepool -g MyResourceGroup --cluster-name MyManagedCluster --max-surge 5
```
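To verify what a node pool is currently configured to use, one option is to query the node pool object; this assumes your CLI version surfaces the setting under upgradeSettings.maxSurge:

```azurecli
# Show the current max surge setting for an existing node pool
az aks nodepool show -n mynodepool -g MyResourceGroup --cluster-name MyManagedCluster \
    --query upgradeSettings.maxSurge
```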
With a list of available versions for your AKS cluster, use the az aks upgrade command to upgrade. During the upgrade process, AKS will:
- Add a new buffer node (or as many nodes as configured in max surge) to the cluster that runs the specified Kubernetes version.
- Cordon and drain one of the old nodes to minimize disruption to running applications. (If you're using max surge, it cordons and drains as many nodes at the same time as the number of buffer nodes specified.)
- When the old node is fully drained, reimage it to receive the new version; it then becomes the buffer node for the next node to be upgraded.
- Repeat this process until all nodes in the cluster have been upgraded.
- At the end of the process, delete the last buffer node, maintaining the existing agent node count and zone balance.
[!INCLUDE alias minor version callout]
```azurecli
az aks upgrade \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --kubernetes-version KUBERNETES_VERSION
```
It takes a few minutes to upgrade the cluster, depending on how many nodes you have.
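If you want to stage the upgrade, az aks upgrade also supports upgrading only the control plane, leaving node pools at their current version so you can upgrade them separately (see Upgrade a node pool in AKS):

```azurecli
# Upgrade only the control plane; node pools keep their current version
az aks upgrade \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --kubernetes-version KUBERNETES_VERSION \
    --control-plane-only
```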
Important
Ensure that any PodDisruptionBudgets (PDBs) allow at least one pod replica to be moved at a time; otherwise, the drain/evict operation will fail.
If the drain operation fails, the upgrade operation fails by design to ensure that the applications are not disrupted. Correct what caused the operation to stop (incorrect PDBs, lack of quota, and so on) and retry the operation.
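As a sketch, for a workload with two or more replicas labeled app=myapp (a placeholder selector for your application), a PDB like the following still leaves room for one replica to be evicted at a time:

```bash
# Create a PDB that keeps at least one replica available during drains;
# with two replicas, this still allows one pod to be evicted at a time
kubectl create poddisruptionbudget myapp-pdb --selector=app=myapp --min-available=1
```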
To confirm that the upgrade was successful, use the az aks show command:
```azurecli
az aks show --resource-group myResourceGroup --name myAKSCluster --output table
```
The following example output shows that the cluster now runs 1.18.10:
```output
Name          Location    ResourceGroup    KubernetesVersion    ProvisioningState    Fqdn
------------  ----------  ---------------  -------------------  -------------------  ----------------------------------------------
myAKSCluster  eastus      myResourceGroup  1.18.10              Succeeded            myakscluster-dns-379cbbb9.hcp.eastus.azmk8s.io
```
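You can also confirm the version that each node reports:

```bash
# Each node's VERSION column should show the new Kubernetes version
kubectl get nodes
```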
When you upgrade your cluster, the following Kubernetes events may occur on each node:
- Surge – Create a surge node.
- Drain – Evict pods from the node. Each pod has a 30-minute timeout to complete the eviction.
- Update – An update of a node has succeeded or failed.
- Delete – Delete a surge node.
Use kubectl get events to show events in the default namespace while running an upgrade. For example:
```bash
kubectl get events
```
The following example output shows some of the above events listed during an upgrade.
```output
...
default 2m1s Normal Drain node/aks-nodepool1-96663640-vmss000001 Draining node: [aks-nodepool1-96663640-vmss000001]
...
default 9m22s Normal Surge node/aks-nodepool1-96663640-vmss000002 Created a surge node [aks-nodepool1-96663640-vmss000002 nodepool1] for agentpool %!s(MISSING)
...
```
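To follow events as they're emitted during the upgrade rather than polling, you can stream them:

```bash
# Stream events as they occur during the upgrade
kubectl get events --watch
```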
In addition to manually upgrading a cluster, you can set an auto-upgrade channel on your cluster. The following upgrade channels are available:
Channel | Action | Example |
---|---|---|
none | Disables auto-upgrades and keeps the cluster at its current version of Kubernetes. | Default setting if left unchanged. |
patch | Automatically upgrades the cluster to the latest supported patch version when it becomes available while keeping the minor version the same. | For example, if a cluster is running version 1.17.7 and versions 1.17.9, 1.18.4, 1.18.6, and 1.19.1 are available, your cluster is upgraded to 1.17.9. |
stable | Automatically upgrades the cluster to the latest supported patch release on minor version N-1, where N is the latest supported minor version. | For example, if a cluster is running version 1.17.7 and versions 1.17.9, 1.18.4, 1.18.6, and 1.19.1 are available, your cluster is upgraded to 1.18.6. |
rapid | Automatically upgrades the cluster to the latest supported patch release on the latest supported minor version. | In cases where the cluster is at a Kubernetes version that is at an N-2 minor version, where N is the latest supported minor version, the cluster first upgrades to the latest supported patch version on the N-1 minor version. For example, if a cluster is running version 1.17.7 and versions 1.17.9, 1.18.4, 1.18.6, and 1.19.1 are available, your cluster first upgrades to 1.18.6, then upgrades to 1.19.1. |
node-image | Automatically upgrades the node image to the latest version available. | Microsoft provides patches and new node images frequently (usually weekly), but your running nodes won't get the new images unless you do a node image upgrade. Turning on the node-image channel automatically updates your node images whenever a new version is available. |
Note
Cluster auto-upgrade only updates to GA versions of Kubernetes and will not update to preview versions.
Automatically upgrading a cluster follows the same process as manually upgrading a cluster. For more information, see Upgrade an AKS cluster.
To set the auto-upgrade channel when creating a cluster, use the auto-upgrade-channel parameter, similar to the following example.
```azurecli
az aks create --resource-group myResourceGroup --name myAKSCluster --auto-upgrade-channel stable --generate-ssh-keys
```
To set the auto-upgrade channel on an existing cluster, update the auto-upgrade-channel parameter, similar to the following example.
```azurecli
az aks update --resource-group myResourceGroup --name myAKSCluster --auto-upgrade-channel stable
```
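To turn auto-upgrade off again, set the channel back to none:

```azurecli
az aks update --resource-group myResourceGroup --name myAKSCluster --auto-upgrade-channel none
```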
If you’re using Planned Maintenance and Auto-Upgrade, your upgrade will start during your specified maintenance window. For more information on Planned Maintenance, see Use Planned Maintenance to schedule maintenance windows for your Azure Kubernetes Service (AKS) cluster (preview).
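As a sketch of pairing the two features, a weekly maintenance window can be added with the maintenanceconfiguration commands; the configuration name default and the parameters here follow the Planned Maintenance documentation, so verify them against your CLI version:

```azurecli
# Allow maintenance (including auto-upgrades) to start on Mondays at 01:00
az aks maintenanceconfiguration add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name default \
    --weekday Monday \
    --start-hour 1
```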
AKS uses best-effort zone balancing in node groups. During an upgrade surge, the zone(s) for the surge node(s) in Virtual Machine Scale Sets are unknown ahead of time, which can temporarily cause an unbalanced zone configuration during an upgrade. However, AKS deletes the surge node(s) once the upgrade completes and preserves the original zone balance. If you want to keep your zones balanced during upgrade, increase the surge to a multiple of three nodes, and Virtual Machine Scale Sets will then balance your nodes across availability zones with best-effort zone balancing.
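For example, to surge in a multiple of three nodes on an existing node pool, reuse the update command shown earlier:

```azurecli
# A max surge that is a multiple of three helps preserve zone balance
az aks nodepool update -n mynodepool -g MyResourceGroup --cluster-name MyManagedCluster --max-surge 3
```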
If you have PVCs backed by Azure LRS Disks, they'll be bound to a particular zone and may fail to recover immediately if the surge node doesn't match the zone of the PVC. This could cause downtime for your application when the upgrade operation continues to drain nodes but the PVs are bound to a zone. To handle this case and maintain high availability, configure a Pod Disruption Budget on your application, as in the example earlier, so that Kubernetes can respect your availability requirements during the upgrade's drain operation.
This article showed you how to upgrade an existing AKS cluster. To learn more about deploying and managing AKS clusters, see the set of tutorials.
[!div class="nextstepaction"] AKS tutorials