Automatic merge from submit-queue (batch tested with PRs 41223, 40892, 41220, 41207, 41242)
Fixes#40819 and Fixes#33114
**What this PR does / why we need it**:
Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM.
Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality.
Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap.
vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes#40819Fixes#33114
**Special notes for your reviewer**:
**Release note**:
```release-note
Reverts to looking up the current VM in vSphere using the machine's UUID, either obtained via sysfs or via the `vm-uuid` parameter in the cloud configuration file.
```
Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM.
Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality.
Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap.
vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
Automatic merge from submit-queue (batch tested with PRs 41037, 40118, 40959, 41084, 41092)
Fix for detach volume when node is not present/ powered off
Fixes#33061
When a vm is reported as no longer present in cloud provider and is deleted by node controller, there are no attempts to detach respective volumes. For example, if a VM is powered off or paused, and pods are migrated to other nodes. In the case of vSphere, the VM cannot be started again because the VM still holds mount points to volumes that are now mounted to other VMs.
In order to re-join this node again, you will have to manually detach these volumes from the powered off vm before starting it.
The current fix will make sure the mount points are deleted when the VM is powered off. Since all the mount points are deleted, the VM can be powered on again.
This is a workaround proposal only. I still don't see the kubernetes issuing a detach request to the vsphere cloud provider which should be the case. (Details in original issue #33061 )
@luomiao @kerneltime @pdhamdhere @jingxu97 @saad-ali
addressed review comments
Addressed review comment for pv_reclaimpolicy.go to verify content of the volume
addressed 2nd round of review comments
addressed 3rd round of review comments from jeffvance
Automatic merge from submit-queue
Changed default scsi controller type in vSphere Cloud Provider
This PR changes default scsi controller to ```pvscsi``` in vSphere Cloud Provider. Fixes#37527
Automatic merge from submit-queue
Bad conditional in vSphereLogin function
```release-note
Fixes NotAuthenticated errors that appear in the kubelet and kube-controller-manager due to never logging in to vSphere
```
With this conditional being == instead of !=, a login would never actually be attempted by this provider, and disk attachments would fail with a NotAuthenticated error from vSphere.
This method has been unused by k8s for some time, and yet is the last
piece of the cloud provider API that encourages provider names to be
human-friendly strings (this method applies a regex to instance names).
Actually removing this deprecated method is part of a long effort to
migrate from instance names to instance IDs in at least the OpenStack
provider plugin.
Automatic merge from submit-queue (batch tested with PRs 36543, 38189, 38289, 38291, 36724)
context.Context should be the first parameter of a function in vsphere
**What this PR does / why we need it**:
Change the position of the context.Context parameter.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
golint
**Release note**:
```release-note
```
Signed-off-by: yupeng <yu.peng36@zte.com.cn>
Check for error conditions from the vSphere API and return the err if one occurs. The vSphere API does not return an err for unauthenticated users, it just returns a nil user object.
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.
This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed.
To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.
More details are explained in PR #33760
Automatic merge from submit-queue
vSphere cloud provider: re-use session for vCenter logins
This change allows for the re-use of a vCenter client session. Addresses #34491
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
Addresses #33215.
When vCenter returns error vm not found, this is now being translated to
the appropriate error 'cloudprovider.InstanceNotFound' which indicates
to Kubernetes node controller that the VM is in fact not found.
Automatic merge from submit-queue
update pkg/cloudprovider OWNERS to spread the review load
This is going to make the mungebot start assigning reviews in your cloudprovider packages.
fyi @runseb @dagnello @imkin @anguslees @dagnello
Automatic merge from submit-queue
Fix: Dynamic provisioning for vSphere
This PR does the following,
1. Fixes an error 'A specified parameter was not correct:' occurs while dynamically provisioning the volumes.
2. Adds VSAN support for dynamic provisioning.
Automatic merge from submit-queue
vSphere Cloud provider null pointer exception
This PR addresses issue #31823.
SelectByType function in govmomi will panic if deviceType is not Array,
Chan, Map, Ptr, or Slice. Also checking if vmDevices or vm are nil,
there is nothing to cleanup.
Automatic merge from submit-queue
Make a vSphere cluster the failure_zone
vSphere cloud provider returns the FailureZone as Cluster, if the VM belongs to a ResourcePool under a Cluster.
fixes: #30933
* Currently the vSphere cloud provider treats Datacenter as the failure
Zone. This doesn't necessarily work since in the current implemention
Kubernetes nodes cannot span Datacenters.
* This change introduces Clusters as the failure zone, while treating
Datacenters as Regions
* Also updated tests for Zones
SelectByType function in govmomi will panic if deviceType is not Array,
Chan, Map, Ptr, or Slice. Also checking if vmDevices or vm are nil,
there is nothing to cleanup.
* Currently the vSphere cloud provider treats Datacenter as the failure
Zone. This doesn't necessarily work since in the current implemention
Kubernetes nodes cannot span Datacenters.
* This change introduces Clusters as the failure zone, while treating
Datacenters as Regions
* Also updated tests for Zones
Automatic merge from submit-queue
Run goimport for the whole repo
While removing GOMAXPROC and running goimports, I noticed quite a lot of other files also needed a goimport format. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`
This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin