Hello Greg,
From what you’ve described, the behavior where new nodes are getting created but are missing the AKSLinuxExtension and not joining the cluster usually points to a failure during the node provisioning stage in AKS. This extension is a required part of the bootstrap process, and if it doesn’t install successfully, the node will not attach to the cluster, which is why scaling appears to fail.
To understand what’s going wrong, I’d suggest starting with a quick check on the VMSS instances to see whether the extension is missing, failed, or stuck:
az vmss list-instances --resource-group <MC_resource-group> --name <vmss-name> --query "[].{id:instanceId,extName:resources[].name,extState:resources[].provisioningState}" -o json
It would also help to validate connectivity from one of the new instances. If the node is unable to reach the required endpoints, the extension installation can fail due to network or DNS restrictions:
az vmss run-command invoke -g <MC_resource-group> -n <vmss-name> --command-id RunShellScript --instance-id <instance-id> --scripts "nc -vz <cluster-fqdn> 443"
In parallel, please check the Activity Log for the VMSS resource around the time the scale operation was attempted. Any extension-related failures (for example, VMExtensionProvisioningError) usually give a clear indication of the root cause.
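If the portal view is awkward to filter, the same information can be pulled with the Azure CLI. The following is only a sketch: the helper name list_failed_ops and the default two-hour look-back are my own illustrative choices, and it assumes you are logged in to the subscription that holds the node resource group.

```shell
# Hypothetical helper (name is illustrative): list failed Activity Log
# entries recorded against the node resource group.
# $1 = node resource group (MC_...)
# $2 = look-back window, e.g. 4h (defaults to 2h, an assumed value)
list_failed_ops() {
  az monitor activity-log list \
    --resource-group "$1" \
    --offset "${2:-2h}" \
    --status Failed \
    --query "[].{time:eventTimestamp, operation:operationName.value, message:properties.statusMessage}" \
    -o json
}
```

For example, list_failed_ops <MC_resource-group> 4h covers the last four hours; widen the window if the scale attempt was earlier.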
From similar cases, one common reason we see is a package manager lock (dpkg) during the extension installation. This can happen when background OS updates are running at the same time, which prevents the AKSLinuxExtension from completing within the expected time. This tends to occur more often on older node images. Upgrading the node image usually resolves this:
az aks nodepool upgrade --resource-group <resource-group> --cluster-name <cluster-name> --name <nodepool-name> --node-image-only
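If you would like to confirm the lock theory on an affected instance, one rough way is to look for package manager processes that are still running. This is a heuristic sketch only: the helper name check_pkg_lock is illustrative, it assumes an Ubuntu-based node image, and it only works if the instance is running and able to accept run commands.

```shell
# Hypothetical helper (name is illustrative): show apt/dpkg processes that
# are currently running on one VMSS instance.
# $1 = node resource group (MC_...), $2 = VMSS name, $3 = instance ID
check_pkg_lock() {
  az vmss run-command invoke \
    -g "$1" -n "$2" --instance-id "$3" \
    --command-id RunShellScript \
    --scripts "ps -eo pid,etime,cmd | grep -Ew 'apt|dpkg' | grep -v grep || echo 'no apt/dpkg processes running'"
}
```

A long-running apt or dpkg process in the output would be consistent with the lock scenario, although it does not prove it on its own.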
If the cluster's Kubernetes version is also on the older side, it would be a good idea to upgrade the cluster to a supported version as well.
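To see where the cluster currently stands, something along these lines should work. It is a minimal sketch; the helper name show_versions is illustrative, and the arguments are the cluster's own resource group and name (not the MC_ node resource group).

```shell
# Hypothetical helper (name is illustrative): print the cluster's current
# Kubernetes version, then the upgrades available to it.
# $1 = cluster resource group, $2 = cluster name
show_versions() {
  az aks show -g "$1" -n "$2" --query kubernetesVersion -o tsv
  az aks get-upgrades -g "$1" -n "$2" -o table
}
```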
We’ve also recently seen a platform-side issue where the extension installation was unintentionally tied to the cluster’s maintenance window. In those cases, nodes created outside the maintenance window would come up without the extension.
A fix has already been rolled out, but if your cluster was impacted earlier, it would be good to confirm whether the issue still persists.
If your cluster has a maintenance window configured and the diagnostic steps above do not reveal another cause, please try the following temporary workaround to confirm whether this is the issue you are hitting:
- Temporarily remove the maintenance window
- Create a new node pool
- Verify whether the AKSLinuxExtension gets installed
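The steps above could be scripted roughly as follows. Treat this as a sketch under assumptions: the helper names are illustrative, the configuration name must be taken from the list output, and it is worth saving the existing configuration (for example with az aks maintenanceconfiguration show) before deleting it so that it can be recreated afterwards.

```shell
# Hypothetical helpers (names are illustrative).
# $1 = cluster resource group, $2 = cluster name

# List the maintenance configurations currently set on the cluster.
list_maintenance_configs() {
  az aks maintenanceconfiguration list -g "$1" --cluster-name "$2" -o json
}

# Remove one maintenance configuration.
# $3 = configuration name as returned by the list command
remove_maintenance_config() {
  az aks maintenanceconfiguration delete -g "$1" --cluster-name "$2" -n "$3"
}

# Add a one-node test pool.
# $3 = new pool name (lowercase alphanumeric for Linux pools)
add_test_pool() {
  az aks nodepool add -g "$1" --cluster-name "$2" -n "$3" --node-count 1
}
```

Once the test pool is up, the az vmss list-instances check from earlier in this reply can be run against the new pool's VMSS to see whether the AKSLinuxExtension is present.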
Additionally, could you please check the private message and provide the necessary details?