This page documents the list of known issues in Kubermatic KubeOne along with possible workarounds and recommendations.
This list applies to KubeOne 1.7 releases. For KubeOne 1.6, please consider the v1.6 version of this document. For earlier releases, please consult the appropriate changelog.
| Status | Fixed in KubeOne 1.7.2 |
|---|---|
| Severity | Critical |
| GitHub issue | https://github.com/kubermatic/kubeone/issues/2976 |
This issue affects only OpenStack clusters. The following OpenStack users are affected by this issue:

- users who ran `kubeone apply` two or more times

The OpenStack CCM and Cinder CSI take the cluster name property, which is used when creating OpenStack Load Balancers and Volumes. The cluster name property is provided as a flag on the OpenStack CCM DaemonSet and the Cinder CSI Controller Deployment. This cluster name property is used, among other places, in the `cinder.csi.openstack.org/cluster` tag.

Due to a bug introduced in KubeOne 1.7.0, the cluster name property is unconditionally set to `kubernetes` instead of the desired cluster's name. As a result, affected resources end up with the `cinder.csi.openstack.org/cluster` tag having a wrong value.

In general, the cluster name property must be equal to the cluster name provided to KubeOne (either via the KubeOneCluster manifest (`kubeone.yaml` by default) or via the `cluster_name` Terraform variable). This is especially important if you have multiple Kubernetes clusters in the same OpenStack project.
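For reference, the cluster name is the top-level `name` field of the KubeOneCluster manifest; a minimal sketch (the name `test-1` and the Kubernetes version are illustrative):

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: test-1
versions:
  kubernetes: "1.27.5"
```

With the Terraform integration, the same value comes from the `cluster_name` Terraform variable instead.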
You might be affected only if you're using KubeOne 1.7.

To check if you're affected, run the following `kubectl` command, with `kubectl` pointing to your potentially affected cluster:

```shell
kubectl get daemonset \
  --namespace kube-system \
  openstack-cloud-controller-manager \
  --output=jsonpath='{.spec.template.spec.containers[?(@.name=="openstack-cloud-controller-manager")].env[?(@.name=="CLUSTER_NAME")].value}'
```

If the output is `kubernetes`, you're affected by this issue.

Regardless of whether you're affected or not, we strongly recommend upgrading to KubeOne 1.7.2 or newer as soon as possible!
If you’re affected by this issue, we strongly recommend taking the mitigation steps.
Please be aware that changing the cluster name might make some Octavia Load Balancers fail to reconcile. Volumes shouldn’t be affected.
First, determine your desired cluster name. The safest way is to dump the whole KubeOneCluster manifest using the `kubeone config dump` command (make sure to replace `tf.json` and `kubeone.yaml` with valid files before running the command):

```shell
kubeone config dump -t tf.json -m kubeone.yaml | grep "name:"
```
You'll get output such as:

```
  - name: default-storage-class
      hostname: test-1-cp-0
      sshUsername: ubuntu
      hostname: test-1-cp-1
      sshUsername: ubuntu
      hostname: test-1-cp-2
      sshUsername: ubuntu
  - name: test-1-pool1
name: test-1
```

Note the top-level `name` value, in this case `test-1` – this is your desired cluster name.
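Since `grep "name:"` matches every nested `name:` as well, one way to pick out just the top-level key is to match only un-indented lines; a sketch against a saved dump (the file contents below are an illustrative stand-in for real `kubeone config dump` output):

```shell
# dump.yaml stands in for real "kubeone config dump -t tf.json -m kubeone.yaml"
# output; only a few keys are shown here.
cat > dump.yaml <<'EOF'
addons:
- name: default-storage-class
controlPlane:
  hosts:
  - hostname: test-1-cp-0
    sshUsername: ubuntu
name: test-1
versions:
  kubernetes: 1.27.5
EOF
# The top-level "name:" key is the only one with no leading indentation:
grep '^name:' dump.yaml | awk '{print $2}'
```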
The next step is to patch the OpenStack CCM DaemonSet and the Cinder CSI Deployment (replace `<<REPLACE_ME>>` with your cluster's name in the following two commands):
```shell
kubectl patch --namespace kube-system daemonset openstack-cloud-controller-manager --type='strategic' --patch='
{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "openstack-cloud-controller-manager",
            "env": [
              {
                "name": "CLUSTER_NAME",
                "value": "<<REPLACE_ME>>"
              }
            ]
          }
        ]
      }
    }
  }
}'
```
```shell
kubectl patch --namespace kube-system deployment openstack-cinder-csi-controllerplugin --type='strategic' --patch='
{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "cinder-csi-plugin",
            "env": [
              {
                "name": "CLUSTER_NAME",
                "value": "<<REPLACE_ME>>"
              }
            ]
          }
        ]
      }
    }
  }
}'
```
You should see the following output from these two commands:

```
daemonset.apps/openstack-cloud-controller-manager patched
deployment.apps/openstack-cinder-csi-controllerplugin patched
```
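If you prefer not to edit `<<REPLACE_ME>>` by hand, the same strategic merge patch can be built from a shell variable instead; a sketch for the CCM DaemonSet (the name `test-1` is illustrative, and the same approach works for the Cinder CSI Deployment):

```shell
# CLUSTER_NAME_VALUE is a placeholder; set it to your desired cluster name.
CLUSTER_NAME_VALUE="test-1"
# Build the same patch body as the command above, with the name substituted in:
PATCH=$(printf '{"spec":{"template":{"spec":{"containers":[{"name":"openstack-cloud-controller-manager","env":[{"name":"CLUSTER_NAME","value":"%s"}]}]}}}}' "$CLUSTER_NAME_VALUE")
echo "$PATCH"
# Against a live cluster you would then run:
#   kubectl patch --namespace kube-system daemonset openstack-cloud-controller-manager \
#     --type='strategic' --patch="$PATCH"
```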
At this point, you need to remediate errors and failed reconciliations that might be caused by this change. As mentioned earlier, Volumes are not affected by this change, but Octavia Load Balancers might be.
The easiest way to determine if you have Load Balancers affected by this change is to look for `SyncLoadBalancerFailed` events. You can do that using the following command:

```shell
kubectl get events --all-namespaces --field-selector reason=SyncLoadBalancerFailed
```
You might get output like this:

```
NAMESPACE   LAST SEEN   TYPE      REASON                   OBJECT            MESSAGE
default     2s          Warning   SyncLoadBalancerFailed   service/nginx-2   Error syncing load balancer: failed to ensure load balancer: the listener port 80 already exists
default     4h49m       Warning   SyncLoadBalancerFailed   service/nginx     Error syncing load balancer: failed to ensure load balancer: the listener port 80 already exists
default     3h7m        Warning   SyncLoadBalancerFailed   service/nginx     Error syncing load balancer: failed to ensure load balancer: the listener port 80 already exists
default     89m         Warning   SyncLoadBalancerFailed   service/nginx     Error syncing load balancer: failed to ensure load balancer: the listener port 80 already exists
default     22m         Warning   SyncLoadBalancerFailed   service/nginx     Error syncing load balancer: failed to ensure load balancer: the listener port 80 already exists
default     3m1s        Warning   SyncLoadBalancerFailed   service/nginx     Error syncing load balancer: failed to ensure load balancer: the listener port 80 already exists
```
Only events last seen after you made the cluster name change are relevant. Other events can be ignored, although you might want to describe those Services and ensure that you see an `EnsuredLoadBalancer` event.
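Filtering on recency can be sketched with awk over the LAST SEEN column (a rough heuristic; `events.txt` and its contents are an illustrative stand-in for the `kubectl get events` output, and you should adjust the age cutoff to when you made the change):

```shell
# events.txt stands in for the output of the kubectl get events command above.
cat > events.txt <<'EOF'
NAMESPACE   LAST SEEN   TYPE      REASON                   OBJECT            MESSAGE
default     2s          Warning   SyncLoadBalancerFailed   service/nginx-2   Error syncing load balancer: the listener port 80 already exists
default     4h49m       Warning   SyncLoadBalancerFailed   service/nginx     Error syncing load balancer: the listener port 80 already exists
EOF
# Keep the header plus events last seen seconds or minutes ago
# (i.e. whose LAST SEEN value contains no hours "h" or days "d"):
awk 'NR == 1 || $2 !~ /[hd]/' events.txt
```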
For Services that are showing `SyncLoadBalancerFailed`, you will need to take steps depending on the error message. For example, if the error message is `the listener port 80 already exists`, you can manually delete the listener, and the OpenStack CCM will create a valid one again after some time.
Note: Load Balancers that were created with the cluster name set to `kubernetes` will NOT be removed from OpenStack upon deleting the Service object in Kubernetes. The affected Load Balancers must be manually removed from OpenStack after deleting the Service object.
Note 2: If you delete listeners manually, it might take up to 5 minutes for the OpenStack CCM to recreate them. During that time, the Load Balancer will be unreachable, which can be unacceptable for some environments. The process can be sped up by manually removing all OpenStack CCM pods, which will retrigger reconciliation for all Load Balancers once the new CCM pods are up and running. The CCM pods can be deleted using the following command:

```shell
kubectl delete pod --namespace kube-system --selector="k8s-app=openstack-cloud-controller-manager"
```
| Status | Fixed in KubeOne 1.7.2 |
|---|---|
| Severity | Low |
| GitHub issue | https://github.com/kubermatic/kubeone/issues/2971 |
Users who used KubeOne 1.6 or earlier to provision a cluster running on Microsoft Azure are affected by this issue.
The AzureDisk CSI driver was updated to a newer version in KubeOne 1.7. This upgrade accidentally changed the `csi-azuredisk-node-secret-binding` ClusterRoleBinding object so that the referenced role (`roleRef`) is `csi-azuredisk-node-role` instead of `csi-azuredisk-node-secret-role`. Given that the `roleRef` field is immutable, KubeOne wasn't able to upgrade the AzureDisk CSI driver when upgrading KubeOne from 1.6 to 1.7.0 or 1.7.1.
If you're affected by this issue, it's recommended to upgrade to KubeOne 1.7.2 or newer. KubeOne 1.7.2 removes the `csi-azuredisk-node-secret-binding` ClusterRoleBinding object if the referenced role is `csi-azuredisk-node-secret-role`, allowing the upgrade process to proceed.

The issue can also be mitigated manually by removing the ClusterRoleBinding object if KubeOne is stuck trying to upgrade the AzureDisk CSI driver:

```shell
kubectl delete clusterrolebinding csi-azuredisk-node-secret-binding
```
| Status | Known Issue |
|---|---|
| Severity | Low |
| GitHub issue | N/A |
Cilium CNI is not supported on CentOS 7 because its kernel version is too old and not supported by Cilium itself. For more details, consult the official Cilium documentation.

Please consider using an operating system with a newer kernel, such as Ubuntu, Rocky Linux, or Flatcar. See the official Cilium documentation for a list of operating systems and kernel versions supported by Cilium.
| Status | Workaround available |
|---|---|
| Severity | Low |
| GitHub issue | https://github.com/cilium/cilium/issues/21801 |
This issue typically manifests as webhook calls timing out, with errors such as:

```
Internal error occurred: failed calling webhook "webhook-name": failed to call webhook: Post "https://webhook-service-name.namespace.svc:443/webhook-endpoint": context deadline exceeded
```
On a recent enough VMware hardware compatibility version (i.e. version 15 or newer, possibly also 14), CNI connectivity breaks because of hardware segmentation offload. `cilium-health status` shows working ICMP connectivity but broken TCP connectivity. `cilium-health status` may also fail completely.
As a workaround, the offloads can be disabled on the affected network interface (`ens192` in this example):

```shell
sudo ethtool -K ens192 tx-udp_tnl-segmentation off
sudo ethtool -K ens192 tx-udp_tnl-csum-segmentation off
```

These flags are related to the hardware segmentation offload done by the vSphere driver VMXNET3. We have observed this issue for both Cilium and Canal CNI running on Ubuntu 22.04.
We have two options to configure these flags for KubeOne installations: