KKP fully supports VMware vSphere as a tier 1 provider. It enables automatic provisioning of user cluster nodes and storage management by integrating vSphere CCM and vSphere CSI.
The Kubernetes vSphere driver contains bugs related to detaching volumes from offline nodes. See the Volume detach bug section for more details.
When creating worker nodes for a user cluster, the user can specify an existing image. Defaults may be set in the seed configuration at spec.datacenters.EXAMPLEDC.vsphere.templates.
qemu-img convert -f qcow2 -O vmdk Ubuntu-x86_64-GenericCloud.qcow2 Ubuntu-x86_64-GenericCloud.vmdk
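Once converted, the disk can be uploaded to a datastore with govc; a minimal sketch, assuming a datastore named exampleDatastore and a target folder kubermatic (both placeholders):
$ govc import.vmdk -ds=exampleDatastore ./Ubuntu-x86_64-GenericCloud.vmdk kubermatic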
Modifications like network, disk size, etc. must be made in the OVA template before creating a worker node from it. If user clusters have dedicated networks, each user cluster therefore needs its own custom template.
During creation of a user cluster, Kubermatic Kubernetes Platform (KKP) creates a dedicated VM folder in the root path on the Datastore (defined in the seed cluster at spec.datacenters.EXAMPLEDC.vsphere.datastore). That folder will contain all worker nodes of a user cluster.
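As an illustration, a seed datacenter sketch tying these settings together (the datacenter name and all values are placeholders; the templates map keyed by operating system is an assumption to verify against your KKP version):
spec:
  datacenters:
    EXAMPLEDC:
      vsphere:
        endpoint: https://vcenter.example.com
        datastore: exampleDatastore
        templates:
          ubuntu: ubuntu-22.04-cloudimage-template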
KKP utilises provider credentials to create and manage infrastructure on the respective cloud provider. For vSphere, permissions are needed to manage VMs, storage, networking and tags.
The vSphere provider allows splitting permissions between two sets of credentials: one set for the CCM/CSI components running in the user cluster, and one set for the infrastructure provisioning actions performed by KKP. If such a split is not desired, a single set of credentials can be provided and used for both purposes instead; providing two sets of credentials is optional.
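A hedged sketch of the split in a user cluster's vSphere cloud spec (the infraManagementUser field name is an assumption to verify against your KKP version; the user names follow the examples used below):
spec:
  cloud:
    vsphere:
      username: cust-ccm-cluster@vsphere.local        # CCM/CSI credentials
      password: <redacted>
      infraManagementUser:                            # infrastructure credentials
        username: cust-infra-user-cluster@vsphere.local
        password: <redacted>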
The vSphere user has to have the following permissions on the correct resources. Note that if a shared set of credentials is used, the roles for both use cases need to be assigned to the technical user that backs the credentials.
Note: The roles below were updated based on the vSphere storage plugin roles for the external CCM, which is available from KKP v2.18+ and vSphere v7.0.2+.
For the Cloud Controller Manager (CCM) and CSI components used to provide cloud provider and storage integration to the user cluster, a technical user (e.g. cust-ccm-cluster) is needed. The user should be assigned all roles listed below:
k8c-ccm-storage-vmfolder-propagate
$ govc role.ls k8c-ccm-storage-vmfolder-propagate
Folder.Create
Folder.Delete
VirtualMachine.Config.AddExistingDisk
VirtualMachine.Config.AddNewDisk
VirtualMachine.Config.AddRemoveDevice
VirtualMachine.Config.RemoveDisk
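If a role does not exist yet, it can be created with govc by passing the privilege list, e.g. for the role above:
$ govc role.create k8c-ccm-storage-vmfolder-propagate \
    Folder.Create Folder.Delete \
    VirtualMachine.Config.AddExistingDisk VirtualMachine.Config.AddNewDisk \
    VirtualMachine.Config.AddRemoveDevice VirtualMachine.Config.RemoveDisk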
k8c-ccm-storage-datastore-propagate
$ govc role.ls k8c-ccm-storage-datastore-propagate
Datastore.AllocateSpace
Datastore.FileManagement
k8c-ccm-storage-cns
$ govc role.ls k8c-ccm-storage-cns
Cns.Searchable
Read-only (predefined)
$ govc role.ls ReadOnly
System.Anonymous
System.Read
System.View
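The roles can then be granted to the CCM technical user with govc; a sketch, assuming placeholder inventory paths (scope each role to the appropriate object in your environment):
$ govc permissions.set -principal cust-ccm-cluster@vsphere.local \
    -role k8c-ccm-storage-vmfolder-propagate -propagate=true /DC/vm/kubermatic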
For the infrastructure provisioning actions of KKP (e.g. VMs, tags and networking) in the scope of a user cluster, the following roles have to be added to the existing user (if a single set of credentials is used), or an additional technical user (e.g. cust-infra-user-cluster) is needed that has the following roles attached:
k8c-user-vcenter
$ govc role.ls k8c-user-vcenter
Cns.Searchable
InventoryService.Tagging.AttachTag
InventoryService.Tagging.CreateCategory
InventoryService.Tagging.CreateTag
InventoryService.Tagging.DeleteCategory
InventoryService.Tagging.DeleteTag
InventoryService.Tagging.EditCategory
InventoryService.Tagging.EditTag
InventoryService.Tagging.ModifyUsedByForCategory
InventoryService.Tagging.ModifyUsedByForTag
InventoryService.Tagging.ObjectAttachable
StorageProfile.View
System.Anonymous
System.Read
System.View
VirtualMachine.Provisioning.ModifyCustSpecs
VirtualMachine.Provisioning.ReadCustSpecs
k8c-user-datacenter
$ govc role.ls k8c-user-datacenter
Datastore.AllocateSpace
Datastore.Browse
Datastore.DeleteFile
Datastore.FileManagement
InventoryService.Tagging.ObjectAttachable
System.Anonymous
System.Read
System.View
VApp.ApplicationConfig
VApp.InstanceConfig
VirtualMachine.Config.CPUCount
VirtualMachine.Config.Memory
VirtualMachine.Config.Settings
VirtualMachine.Inventory.CreateFromExisting
k8c-user-cluster-propagate
This role is needed to upload the cloud-init.iso (Ubuntu) or to define the Ignition config into Guestinfo (CoreOS).
$ govc role.ls k8c-user-cluster-propagate
AutoDeploy.Rule.Create
AutoDeploy.Rule.Delete
AutoDeploy.Rule.Edit
Folder.Create
Host.Config.Storage
Host.Config.SystemManagement
Host.Inventory.EditCluster
Host.Local.ReconfigVM
InventoryService.Tagging.ObjectAttachable
Resource.AssignVMToPool
Resource.ColdMigrate
Resource.HotMigrate
VApp.ApplicationConfig
VApp.InstanceConfig
k8c-network-attach
$ govc role.ls k8c-network-attach
InventoryService.Tagging.ObjectAttachable
Network.Assign
System.Anonymous
System.Read
System.View
k8c-user-datastore-propagate
Note: If a category ID is assigned to a user cluster, KKP claims ownership of any tags it creates and will try to delete the tags assigned to the cluster upon cluster deletion. Thus, make sure that the assigned category isn't shared with other, unrelated resources.
Note: Tags can be attached to machine deployments regardless of whether the tags were created via KKP or not. If a tag was not attached to the user cluster, the machine controller will only detach it (see the tagging sketch after the following role listing).
$ govc role.ls k8c-user-datastore-propagate
Datastore.AllocateSpace
Datastore.Browse
Datastore.FileManagement
InventoryService.Tagging.ObjectAttachable
System.Anonymous
System.Read
System.View
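As referenced in the note above, a minimal govc tagging sketch (category, tag and VM path are placeholders):
$ govc tags.category.create k8c-example-category
$ govc tags.create -c k8c-example-category k8c-example-tag
$ govc tags.attach k8c-example-tag /DC/vm/kubermatic/example-vm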
k8c-user-folder-propagate
$ govc role.ls k8c-user-folder-propagate
Folder.Create
Folder.Delete
Global.SetCustomField
InventoryService.Tagging.CreateTag
InventoryService.Tagging.DeleteTag
InventoryService.Tagging.AttachTag
InventoryService.Tagging.ObjectAttachable
System.Anonymous
System.Read
System.View
VirtualMachine.Config.AddExistingDisk
VirtualMachine.Config.AddNewDisk
VirtualMachine.Config.AddRemoveDevice
VirtualMachine.Config.AdvancedConfig
VirtualMachine.Config.Annotation
VirtualMachine.Config.CPUCount
VirtualMachine.Config.ChangeTracking
VirtualMachine.Config.DiskExtend
VirtualMachine.Config.DiskLease
VirtualMachine.Config.EditDevice
VirtualMachine.Config.HostUSBDevice
VirtualMachine.Config.ManagedBy
VirtualMachine.Config.Memory
VirtualMachine.Config.MksControl
VirtualMachine.Config.QueryFTCompatibility
VirtualMachine.Config.QueryUnownedFiles
VirtualMachine.Config.RawDevice
VirtualMachine.Config.ReloadFromPath
VirtualMachine.Config.RemoveDisk
VirtualMachine.Config.Rename
VirtualMachine.Config.ResetGuestInfo
VirtualMachine.Config.Resource
VirtualMachine.Config.Settings
VirtualMachine.Config.SwapPlacement
VirtualMachine.Config.ToggleForkParent
VirtualMachine.Config.Unlock
VirtualMachine.Config.UpgradeVirtualHardware
VirtualMachine.GuestOperations.Execute
VirtualMachine.GuestOperations.Modify
VirtualMachine.GuestOperations.ModifyAliases
VirtualMachine.GuestOperations.Query
VirtualMachine.GuestOperations.QueryAliases
VirtualMachine.Interact.AnswerQuestion
VirtualMachine.Interact.Backup
VirtualMachine.Interact.ConsoleInteract
VirtualMachine.Interact.CreateScreenshot
VirtualMachine.Interact.CreateSecondary
VirtualMachine.Interact.DefragmentAllDisks
VirtualMachine.Interact.DeviceConnection
VirtualMachine.Interact.DisableSecondary
VirtualMachine.Interact.DnD
VirtualMachine.Interact.EnableSecondary
VirtualMachine.Interact.GuestControl
VirtualMachine.Interact.MakePrimary
VirtualMachine.Interact.Pause
VirtualMachine.Interact.PowerOff
VirtualMachine.Interact.PowerOn
VirtualMachine.Interact.PutUsbScanCodes
VirtualMachine.Interact.Record
VirtualMachine.Interact.Replay
VirtualMachine.Interact.Reset
VirtualMachine.Interact.SESparseMaintenance
VirtualMachine.Interact.SetCDMedia
VirtualMachine.Interact.SetFloppyMedia
VirtualMachine.Interact.Suspend
VirtualMachine.Interact.TerminateFaultTolerantVM
VirtualMachine.Interact.ToolsInstall
VirtualMachine.Interact.TurnOffFaultTolerance
VirtualMachine.Inventory.Create
VirtualMachine.Inventory.CreateFromExisting
VirtualMachine.Inventory.Delete
VirtualMachine.Inventory.Move
VirtualMachine.Inventory.Register
VirtualMachine.Inventory.Unregister
VirtualMachine.Provisioning.Clone
VirtualMachine.Provisioning.CloneTemplate
VirtualMachine.Provisioning.CreateTemplateFromVM
VirtualMachine.Provisioning.Customize
VirtualMachine.Provisioning.DeployTemplate
VirtualMachine.Provisioning.DiskRandomAccess
VirtualMachine.Provisioning.DiskRandomRead
VirtualMachine.Provisioning.FileRandomAccess
VirtualMachine.Provisioning.GetVmFiles
VirtualMachine.Provisioning.MarkAsTemplate
VirtualMachine.Provisioning.MarkAsVM
VirtualMachine.Provisioning.ModifyCustSpecs
VirtualMachine.Provisioning.PromoteDisks
VirtualMachine.Provisioning.PutVmFiles
VirtualMachine.Provisioning.ReadCustSpecs
VirtualMachine.State.CreateSnapshot
VirtualMachine.State.RemoveSnapshot
VirtualMachine.State.RenameSnapshot
VirtualMachine.State.RevertToSnapshot
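Analogously to the CCM user, the infrastructure roles can be granted with govc; a sketch with placeholder principal and inventory path:
$ govc permissions.set -principal cust-infra-user-cluster@vsphere.local \
    -role k8c-user-folder-propagate -propagate=true /DC/vm/kubermatic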
The described permissions have been tested with vSphere 8.0.2 and might be different for other vSphere versions.
Datastores in vSphere are an abstraction for storage. A Datastore Cluster is a collection of datastores with shared resources and a shared management interface.
In KKP, Datastores are used for two purposes:
- Storing the files of the worker node VMs of vSphere user clusters.
- Generating the vSphere cloud provider storage configuration for user clusters, in particular to provide the default-datastore value, which is the default datastore for dynamic volume provisioning.
Datastore Clusters can only be used for the first purpose, as they cannot be specified directly in the vSphere cloud configuration.
There are three places where Datastores and Datastore Clusters can be configured in KKP:
- At datacenter level in the seed, as the default for user clusters in that datacenter.
- At cluster level, with the spec.cloud.vsphere.datastore and spec.cloud.vsphere.datastoreCluster fields.
- As part of the “Advanced Settings” step when creating a user cluster from the KKP dashboard.
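A minimal cluster-level sketch using the fields above (values are placeholders; set either datastore or datastoreCluster, not both):
spec:
  cloud:
    vsphere:
      datastore: exampleDatastore
      # datastoreCluster: exampleDatastoreCluster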
After a node is powered-off, the Kubernetes vSphere driver doesn’t detach disks associated with PVCs mounted on that node. This makes it impossible to reschedule pods using these PVCs until the disks are manually detached in vCenter.
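One way to spot disks that remain attached after a node power-off is to inspect the VolumeAttachment objects in the user cluster (standard Kubernetes API; the attachment name below is illustrative):
$ kubectl get volumeattachments
$ kubectl describe volumeattachment csi-0123456789abcdef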
Upstream Kubernetes has been working on the issue for a long time now and tracking it under the following tickets:
On a recent enough VMware hardware compatibility version (i.e. >= 15, possibly >= 14), CNI connectivity breaks because of hardware segmentation offload. A typical symptom is API server webhook calls timing out:
Internal error occurred: failed calling webhook "webhook-name": failed to call webhook: Post "https://webhook-service-name.namespace.svc:443/webhook-endpoint": context deadline exceeded
cilium-health status shows working ICMP connectivity, but broken TCP connectivity; cilium-health status may also fail completely. The following commands disable the offending offload features on the affected interface:
sudo ethtool -K ens192 tx-udp_tnl-segmentation off
sudo ethtool -K ens192 tx-udp_tnl-csum-segmentation off
These flags are related to the hardware segmentation offload done by the vSphere VMXNET3 network driver. We have observed this issue with both Cilium and Canal CNI running on Ubuntu 22.04.
We have two options to configure these flags for KKP installations:
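As one possible approach (a sketch; interface name and unit path are placeholders), the flags could be persisted on a node with a systemd oneshot unit:
$ sudo tee /etc/systemd/system/disable-tnl-offload.service <<'EOF'
[Unit]
Description=Disable UDP tunnel segmentation offload on ens192
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -K ens192 tx-udp_tnl-segmentation off
ExecStart=/usr/sbin/ethtool -K ens192 tx-udp_tnl-csum-segmentation off

[Install]
WantedBy=multi-user.target
EOF
$ sudo systemctl enable --now disable-tnl-offload.service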