Automatic merge from submit-queue
vSphere Cloud provider null pointer exception
This PR addresses issue #31823.
The SelectByType function in govmomi will panic if deviceType is not an
Array, Chan, Map, Ptr, or Slice. We also check whether vmDevices or vm is
nil; if so, there is nothing to clean up.
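A minimal sketch of the guard, assuming a cleanup helper along these lines (the function is hypothetical; SelectByType is the real govmomi call):
```
package vsphere

import (
	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25/types"
)

// cleanUpDisks is a hypothetical reconstruction of the guarded cleanup path:
// if vm or vmDevices is nil there is nothing to clean up, and SelectByType
// is only ever given a pointer type, so its reflection-based matching
// (which panics on kinds other than Array, Chan, Map, Ptr, or Slice)
// cannot panic.
func cleanUpDisks(vm *object.VirtualMachine, vmDevices object.VirtualDeviceList) object.VirtualDeviceList {
	if vm == nil || vmDevices == nil {
		return nil // nothing to clean up
	}
	// (*types.VirtualDisk)(nil) has reflect.Kind Ptr, which is safe.
	return vmDevices.SelectByType((*types.VirtualDisk)(nil))
}
```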
Automatic merge from submit-queue
Make a vSphere cluster the failure_zone
vSphere cloud provider returns the FailureZone as the Cluster if the VM belongs to a ResourcePool under a Cluster.
fixes: #30933
* Currently the vSphere cloud provider treats the Datacenter as the failure
zone. This doesn't necessarily work, since in the current implementation
Kubernetes nodes cannot span Datacenters.
* This change introduces Clusters as the failure zone, while treating
Datacenters as regions.
* Also updated the tests for Zones.
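A toy illustration of the new mapping (the struct mirrors cloudprovider.Zone; the helper is hypothetical):
```
package main

import "fmt"

// Zone mirrors the cloudprovider.Zone struct returned by Zones().GetZone().
type Zone struct {
	FailureDomain string
	Region        string
}

// zoneForVM is a hypothetical helper showing the new mapping: the VM's
// Cluster becomes the failure domain and its Datacenter becomes the region,
// instead of the Datacenter doubling as the failure zone.
func zoneForVM(cluster, datacenter string) Zone {
	return Zone{FailureDomain: cluster, Region: datacenter}
}

func main() {
	fmt.Printf("%+v\n", zoneForVM("cluster-1", "dc-east"))
}
```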
Automatic merge from submit-queue
Typos and englishify pkg/cloudprovider + pkg/dns + pkg/kubectl
**What this PR does / why we need it**: Just fixed some typos + "englishify" in pkg/cloudprovider + pkg/dns + pkg/kubectl
**Which issue this PR fixes** : None
**Special notes for your reviewer**: It just fixes typos
**Release note**: `NONE`
Automatic merge from submit-queue
retry oauth token fetch in gce cloudprovider
Fixes https://github.com/kubernetes/kubernetes/issues/31560
The oauth client fetches a token on the initial request of that client. Let's warm the cache.
cc @goltermann @lavalamp
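A minimal sketch of warming the cache with retries, assuming a helper like this (the function and backoff are hypothetical; oauth2.TokenSource is the real interface):
```
package gce

import (
	"fmt"
	"time"

	"golang.org/x/oauth2"
)

// warmTokenCache is a hypothetical sketch of the fix: fetch a token once at
// startup, with retries, so the first real API request does not depend on a
// cold (and possibly flaky) token fetch.
func warmTokenCache(ts oauth2.TokenSource, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if _, err = ts.Token(); err == nil {
			return nil
		}
		time.Sleep(time.Second << uint(i)) // simple exponential backoff
	}
	return fmt.Errorf("failed to warm oauth token cache: %v", err)
}
```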
Automatic merge from submit-queue
Fix AWS reporting "The parameter KmsKeyId requires the parameter Encrypted to be set."
- use aws.String/Int/Bool functions
- don't set the key to empty string, use nil instead
@justinsb @kubernetes/sig-storage
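A hypothetical sketch of the pattern (the helper is illustrative; the aws and ec2 types are real aws-sdk-go):
```
package awscloud

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// newCreateVolumeInput is a hypothetical reconstruction of the fix: use the
// aws.String/aws.Bool/aws.Int64 wrappers, and leave KmsKeyId nil (omitting
// it from the request) instead of setting it to "" when no key is given.
func newCreateVolumeInput(zone string, sizeGB int64, encrypted bool, kmsKeyID string) *ec2.CreateVolumeInput {
	input := &ec2.CreateVolumeInput{
		AvailabilityZone: aws.String(zone),
		Size:             aws.Int64(sizeGB),
		Encrypted:        aws.Bool(encrypted),
	}
	if kmsKeyID != "" {
		// KmsKeyId requires Encrypted to be set, so it is only added
		// when the caller actually supplied a key.
		input.KmsKeyId = aws.String(kmsKeyID)
	}
	return input
}
```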
The problem is that attachments are now done on the master, and we are
only caching the attachment map persistently for the local instance. So
there is now a race, because the attachment map is cleared every time.
Issue #29324
Automatic merge from submit-queue
support Azure data disk volume
This is a WIP for supporting Azure data disk volumes. Tests and dynamic provisioning support will be added once #29006 is merged.
Replaces #25915. Fixes #23259.
@kubernetes/sig-storage
@colemickens @brendandburns
Automatic merge from submit-queue
fix Openstack provider to allow more than one service port for lbaas v2
This resolves bug #30477: if a service defines multiple ports for a load balancer, the plugin fails with an error that multiple ports are not supported.
@anguslees @jianhuiz
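A hypothetical outline of the fix (stub types; the real code would call gophercloud's lbaas v2 APIs):
```
package openstack

import "fmt"

// ServicePort stands in for api.ServicePort in this sketch.
type ServicePort struct {
	Name     string
	Port     int32
	NodePort int32
}

// ensureListeners is a hypothetical outline of the fix: create one lbaas v2
// listener (and pool) per declared service port, rather than rejecting any
// service that defines more than one port.
func ensureListeners(lbID string, ports []ServicePort) error {
	for _, p := range ports {
		if err := createListenerAndPool(lbID, p); err != nil {
			return fmt.Errorf("port %d: %v", p.Port, err)
		}
	}
	return nil
}

// createListenerAndPool is a placeholder for the gophercloud lbaas v2 calls.
func createListenerAndPool(lbID string, p ServicePort) error {
	fmt.Printf("lb %s: listener %s on %d -> node port %d\n", lbID, p.Name, p.Port, p.NodePort)
	return nil
}
```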
Automatic merge from submit-queue
Implements Attacher Plugin Interface for vSphere
This PR does the following:
Fixes #29028 (vSphere volume should implement the attacher interface): implements the Attacher plugin interface for vSphere.
See files:
* pkg/volume/vsphere_volume/vsphere_volume.go - Removed attach and detach calls from SetUpAt and TearDownAt.
* pkg/volume/vsphere_volume/attacher.go - Implements the Attacher & Detacher plugin interfaces for vSphere. (Ref: GCE PD & AWS attacher.go)
* pkg/cloudprovider/providers/vsphere.go - Added DiskIsAttached method.
The vSphere plugin code needs cleanup (e.g. the code for getting the vSphere instance is repeated in pkg/cloudprovider/providers/vsphere.go). I will fix this in the next PR.
Automatic merge from submit-queue
AWS: More ELB attributes via service annotations
Replaces #25015 and addresses all of @justinsb's feedback therein. This is a new PR because I was unable to reopen #25015 to amend it.
I noticed recently that there is existing (but undocumented) precedent for the AWS cloud provider to manage ELB-specific load balancer configuration based on service annotations. In particular, one can _already_ designate an ELB as "internal" or enable PROXY protocol.
This PR extends this capability to the management of ELB attributes, which includes the following items:
* Access logs:
* Enabled / disabled
* Emit interval
* S3 bucket name
* S3 bucket prefix
* Connection draining:
* Enabled / disabled
* Timeout
* Connection:
* Idle timeout
* Cross-zone load balancing:
* Enabled / disabled
Some of these are possibly more useful than others. Use cases that immediately come to mind:
* Enabling cross-zone load balancing is potentially useful for "Ubernetes Light," or anyone otherwise attempting to spread worker nodes around multiple AZs.
* Increasing idle timeout is useful for the benefit of anyone dealing with long-running requests. An example I personally care about would be git pushes to Deis' builder component.
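A sketch of how these annotations might translate into an attributes update (the helper and plumbing are hypothetical; the elb types and call are real aws-sdk-go):
```
package awscloud

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/elb"
)

// applyAttributes is a hypothetical sketch of turning annotation values into
// a single ModifyLoadBalancerAttributes call covering the items listed above.
func applyAttributes(client *elb.ELB, name string, idleTimeout int64, crossZone bool, logBucket string) error {
	attrs := &elb.LoadBalancerAttributes{
		ConnectionSettings:     &elb.ConnectionSettings{IdleTimeout: aws.Int64(idleTimeout)},
		CrossZoneLoadBalancing: &elb.CrossZoneLoadBalancing{Enabled: aws.Bool(crossZone)},
	}
	if logBucket != "" {
		attrs.AccessLog = &elb.AccessLog{
			Enabled:      aws.Bool(true),
			S3BucketName: aws.String(logBucket),
		}
	}
	_, err := client.ModifyLoadBalancerAttributes(&elb.ModifyLoadBalancerAttributesInput{
		LoadBalancerName:       aws.String(name),
		LoadBalancerAttributes: attrs,
	})
	return err
}
```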
Automatic merge from submit-queue
AWS: Support HTTP->HTTP mode for ELB
**What this PR does / why we need it**:
Right now it is not possible to create an AWS ELB that listens for HTTP and where the backend pod also listens for HTTP.
I asked @justinsb in slack and he said that this seems to be an oversight, so I'd like to use this PR as a step towards solving this.
**Special notes for your reviewer**:
I've only added a simple unit test. Are any integration tests needed? I'm not familiar with the code base.
cc @therc
This commit adds logic for allocating and associating a public IP if the `--load-balancer-ip` option is not used. It properly manages the IPs allocated by this provider, so IPs that are no longer needed/used are released again.
Additionally the provider can now also work with CloudStack projects and advanced (VPC) networks.
Lastly, the Zone interface now returns an actual zone (supplied by the cloud config), a few logical errors are fixed, and the first few tests are added.
All the functionality is extensively tested against both basic and advanced (VPC) networks.
Automatic merge from submit-queue
openstack: Autodetect LBaaS v1 vs v2
```release-note
* openstack: autodetect LBaaS v1/v2 by querying for available extensions. For most installs, this effectively changes the default from v1 to v2. Existing installs can add "lb-version = v1" to the provider config file to continue to use v1.
```
This removes the need to manually specify the version in all but unusual
cases.
For most installs this will effectively flip the default from
v1 (deprecated) to v2 so conservative existing installs may want to
manually configure "lb-version = v1" before upgrading.
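A hypothetical outline of the selection logic (the extension alias and config handling are assumptions):
```
package openstack

// detectLBVersion is a hypothetical outline of the autodetection: an
// explicit lb-version in the provider config wins; otherwise prefer v2 when
// its Neutron extension alias is present, falling back to v1.
func detectLBVersion(configured string, extensionAliases []string) string {
	if configured != "" {
		return configured
	}
	for _, alias := range extensionAliases {
		if alias == "lbaasv2" {
			return "v2"
		}
	}
	return "v1"
}
```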
Automatic merge from submit-queue
Run goimport for the whole repo
While removing GOMAXPROCS and running goimports, I noticed quite a lot of other files also needed goimports formatting. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`.
This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin
Automatic merge from submit-queue
TestLoadBalancer() should test v1, not v2
TestLoadBalancer() should test v1 and TestLoadBalancerV2() should test v2, but TestLoadBalancer() contains this code:
cfg.LoadBalancer.LBVersion = "v2"
We have a few functions that predate aws-sdk-go, but they have natural
equivalents in aws-sdk-go. Document them as deprecated, and replace
the implementation with the equivalent in aws-sdk-go to make it obvious
that they are the same.
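Illustrative shape of the change (the helper name is an assumption): keep the old helper, mark it deprecated, and forward to its aws-sdk-go equivalent:
```
package awscloud

import "github.com/aws/aws-sdk-go/aws"

// Deprecated: use aws.StringValue directly.
func orEmpty(s *string) string {
	return aws.StringValue(s)
}
```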
The requirement that ExternalID returns InstanceNotFound when the
instance is not found was incorrectly documented on InstanceID and
InstanceType. This requirement arises from the node controller, which
is the only place that checks for the InstanceNotFound error.
Automatic merge from submit-queue
Test failure message says the opposite of its intended meaning
When master is not equal to expectedMaster, the message should say that the master is unexpected:
master, err := mesosCloud.Master(clusterName)
if master != expectedMaster {
t.Fatalf("Master returns the expected value: (expected: %#v, actual: %#v", expectedMaster, master)
Automatic merge from submit-queue
Format verbs not consistent with the number of arguments
The glog.Infof format string is missing a verb for its second argument; a %s should be added because loadBalancerName is a string:
func (c *Cloud) ensureLoadBalancer(namespacedName types.NamespacedName, loadBalancerName string, ...
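A hypothetical corrected call, with one verb per argument:
```
// Hypothetical corrected call (the message text is illustrative):
glog.Infof("Ensuring load balancer %v with name %s", namespacedName, loadBalancerName)
```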
Automatic merge from submit-queue
AWS: Added experimental option to skip zone check
This pull request resolves #28380. In the vast majority of cases, it is appropriate to validate the AWS region against a known set of regions. However, there is the edge case where this is undesirable, as Kubernetes may be deployed in an AWS-like environment where the region is not one of the known regions.
By adding the optional **DisableStrictZoneCheck true** to the **[Global]** section in the aws.conf file (e.g. /etc/aws/aws.conf) one can bypass the region validation.
This allows the user to set "working-dir" in their vsphere.cfg file.
The value should be a path in the vSphere datastore in which the
provider will look for VMs.
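A hypothetical vsphere.cfg excerpt (gcfg/INI syntax; the surrounding fields and placement are assumptions):
```
[Global]
user = "admin"
password = "secret"
server = "vcenter.example.com"
datacenter = "dc-east"
datastore = "ds-1"
; path inside the datastore where the provider looks for VMs (new option)
working-dir = "kubernetes"
```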
Automatic merge from submit-queue
vSphere provider - Getting node data by ip instead of uuid
To get the UUID we need the service to be running as root. This change
allows us to run the controller-manager and API server as non-root.
Automatic merge from submit-queue
Adding SCSI controller type filter for vSphere disk attach
Hot plug of disks to a SCSI controller of type lsilogic doesn't work as expected: when a device is detached from the controller, it fails to remove the device from the /dev path, which causes subsequent attaches to the node to fail. With SCSI controller types lsilogic-sas or paravirtual this seems to work well. This patch filters the existing controllers for these types, and if it doesn't find one, it creates a new controller for disk attach.
This PR depends on https://github.com/kubernetes/kubernetes/pull/26658 (1st commit). Also targeting this for 1.3.
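A hypothetical sketch of the filter using govmomi types (the function name is an assumption):
```
package vsphere

import (
	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25/types"
)

// findHotPlugCapableController is a hypothetical sketch of the filter:
// return an existing controller whose type supports hot plug (lsilogic-sas
// or paravirtual); a nil result tells the caller to create a new controller.
func findHotPlugCapableController(devices object.VirtualDeviceList) types.BaseVirtualController {
	for _, device := range devices {
		switch c := device.(type) {
		case *types.VirtualLsiLogicSASController:
			return c
		case *types.ParaVirtualSCSIController:
			return c
		}
	}
	return nil
}
```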
Automatic merge from submit-queue
Removing name field from Member for compatibility with OpenStack Liberty
In OpenStack Mitaka, the name field for members was added as an optional field but does not exist in Liberty. Therefore the current implementation for lbaas v2 will not work in Liberty.
Automatic merge from submit-queue
AWS/GCE: Spread PetSet volume creation across zones, create GCE volumes in non-master zones
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
We hash the volume name so we don't bias to the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
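A hypothetical reconstruction of the hashing heuristic (the helper and names are illustrative, not the provider's actual code):
```
package main

import (
	"fmt"
	"hash/fnv"
	"regexp"
	"sort"
	"strconv"
)

var petSetNameRE = regexp.MustCompile(`^(.+)-(\d+)$`)

// chooseZone hashes the volume name to pick a zone without biasing the
// first few zones; if the name looks like a PetSet volume (base-<number>),
// it hashes the base name and uses the number as an offset so one PetSet's
// volumes spread across zones.
func chooseZone(zones []string, volumeName string) string {
	sort.Strings(zones) // stable ordering so the hash is meaningful
	hash := fnv.New32a()
	offset := uint32(0)
	if m := petSetNameRE.FindStringSubmatch(volumeName); m != nil {
		n, _ := strconv.Atoi(m[2])
		offset = uint32(n)
		hash.Write([]byte(m[1]))
	} else {
		hash.Write([]byte(volumeName))
	}
	return zones[(hash.Sum32()+offset)%uint32(len(zones))]
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	for _, name := range []string{"data-web-0", "data-web-1", "data-web-2"} {
		fmt.Println(name, "->", chooseZone(zones, name))
	}
}
```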
Automatic merge from submit-queue
GCE provider: Create TargetPool with 200 instances, then update with rest
Tested with 2000 nodes, this actually meets the GCE API specifications (which is nutty). The previous PR (#25178) was based on a mistaken understanding of a poorly documented set of limitations, and even poorer testing, for which I am embarrassed.
Also includes the revert of #25178 (review commits separately).
Lots of comments describing the heuristics, how it fits together and the
limitations.
In particular, we can't guarantee correct volume placement if the set of
zones changes while volumes are being allocated.
Filters can't exceed 4k, and GET requests against the GCE API are also
limited, so these break down in different ways at different cluster
counts. Fix it by introducing an advisory node-instance-prefix
configuration in the GCE provider that can hint the
EnsureLoadBalancer/UpdateLoadBalancer code (and the firewall
creation/update code). If it's not there, or wrong (a hostname that's
registered violates it), just ignore it and grab the whole project.
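A sketch of the listing side under these assumptions (the helper is hypothetical; the compute API calls are real google-api-go-client methods):
```
package gce

import (
	"fmt"

	compute "google.golang.org/api/compute/v1"
)

// listInstances is a hypothetical sketch of the idea: with a
// node-instance-prefix hint we can send one short name-prefix filter;
// without it (or if the hint proves wrong) we list the whole project.
func listInstances(svc *compute.Service, project, zone, nodeInstancePrefix string) (*compute.InstanceList, error) {
	call := svc.Instances.List(project, zone)
	if nodeInstancePrefix != "" {
		// A single prefix expression instead of thousands of hostname
		// terms, which would blow past the ~4k filter size limit.
		call = call.Filter(fmt.Sprintf("name eq %s.*", nodeInstancePrefix))
	}
	return call.Do()
}
```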
Hot attach of a disk to a SCSI controller will work only if the
controller type is lsilogic-sas or paravirtual. This patch filters
the existing controllers for these types; if it doesn't find one, it
creates a new SCSI controller.
Automatic merge from submit-queue
AWS volumes: Use /dev/xvdXX names with EC2
We are using HVM-style names, which cannot be paravirtual-style names.
See
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html
This also fixes problems introduced when moving volume mounting to KCM.
Fix #27534
Automatic merge from submit-queue
refuse to create a firewall rule with no target tag
fixes #25145
This modification in gce.firewallObject() returns an error when trying
to create or update a firewall rule if no node tag can be found. Also
adds a unit test for this modification.
We had a long-lasting bug which prevented creation of volumes in
non-master zones, because the cloudprovider in the volume label
admission controller is not initialized with the multizone setting
(issue #27656).
This implements a simple workaround: if the volume is created with the
failure-domain zone label, we look for the volume in that zone. This is
more efficient, avoids introducing a new semantic, and allows users (and
the dynamic provisioner) to create volumes in non-master zones.
Fixes #27657
Fixes #27256
- replaces probeVolume with scsiHostRescan to scan hot attached disks
- fixes substring match of UUID returned from AttachDisk
- changes DetachDisk to take volumePath argument instead of diskID
- fixes delayed failure at mount rather than attach disk
- removes cloning of virtual disk in AttachDisk
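For reference, a minimal sketch of what a scsiHostRescan along these lines does (the paths are standard sysfs; the helper body is an assumption):
```
package vsphere

import (
	"io/ioutil"
	"path/filepath"
)

// scsiHostRescan: writing "- - -" to each /sys/class/scsi_host/hostN/scan
// asks the kernel to rescan that SCSI host, so hot-attached disks show up
// under /dev.
func scsiHostRescan() {
	const scsiPath = "/sys/class/scsi_host"
	dirs, err := ioutil.ReadDir(scsiPath)
	if err != nil {
		return
	}
	for _, dir := range dirs {
		name := filepath.Join(scsiPath, dir.Name(), "scan")
		// Best effort: ignore write errors for individual hosts.
		_ = ioutil.WriteFile(name, []byte("- - -"), 0666)
	}
}
```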
Automatic merge from submit-queue
AWS: cache instances during service reload to avoid rate limiting on restart
Fixes #25610 by reducing redundant calls to DescribeInstances()
```release-note
* The AWS cloudprovider will cache results from DescribeInstances() if the set of nodes hasn't changed
```
Also move int/stringSlicesEqual from servicecontroller.go to pkg/util/slice
Automatic merge from submit-queue
AWS: support mixed plaintext/encrypted ports in ELBs via service.beta.kubernetes.io/aws-load-balancer-ssl-ports annotation
Fixes #26268
Implements the second SSL ELB annotation, per #24978
`service.beta.kubernetes.io/aws-load-balancer-ssl-ports=*` (comma-separated list of port numbers or e.g. `https`)
If not specified, all ports are secure (SSL or HTTPS).
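A hypothetical reconstruction of the annotation's matching semantics:
```
package awscloud

import (
	"strconv"
	"strings"
)

// portIsSecure sketches the semantics described above: an empty or "*"
// value marks every port secure; otherwise each comma-separated entry
// matches either a port number or a port name.
func portIsSecure(annotation, portName string, portNumber int) bool {
	if annotation == "" || annotation == "*" {
		return true
	}
	for _, entry := range strings.Split(annotation, ",") {
		entry = strings.TrimSpace(entry)
		if entry == portName {
			return true
		}
		if n, err := strconv.Atoi(entry); err == nil && n == portNumber {
			return true
		}
	}
	return false
}
```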
Automatic merge from submit-queue
LBaaS v2 Support for Openstack Cloud Provider Plugin
Resolves #19774.
This work is based on Gophercloud support for LBaaS v2 currently in review (this will have to merge first):
https://github.com/rackspace/gophercloud/pull/575
These changes include the addition of a new load balancer configuration option: **LBVersion**. If this configuration attribute is missing or anything other than "v2", the lbaas v1 implementation will be used.
Add ELB proxy protocol support via the annotation
"service.beta.kubernetes.io/aws-load-balancer-proxy-protocol". This
allows servers like Nginx and Haproxy to retrieve the real IP address of
a remote client.
Instead of N pointers, we were returning N null pointers, followed by the real
thing. It's not clear why we didn't trip on this until now, maybe there is a
new server-side check for empty subnetID strings.
Instead of just rate limits to operation polling, send all API calls
through a rate limited RoundTripper.
This isn't a perfect solution, since the QPS is obviously getting
split between different controllers, etc., but it's also spread across
different APIs, which, in practice, rate limit differently.
Fixes #26119 (hopefully)
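A sketch of the approach under these assumptions (golang.org/x/time/rate for the token bucket; the wiring is hypothetical):
```
package gce

import (
	"net/http"

	"golang.org/x/time/rate"
)

// rateLimitedRoundTripper is a hypothetical reconstruction of the approach:
// every API call blocks on a shared token bucket before going out on the
// wire, instead of rate limiting only the operation-polling paths.
type rateLimitedRoundTripper struct {
	limiter *rate.Limiter
	next    http.RoundTripper
}

func (rt *rateLimitedRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	if err := rt.limiter.Wait(req.Context()); err != nil {
		return nil, err
	}
	return rt.next.RoundTrip(req)
}

// newRateLimitedClient wires the transport into a client with a given QPS.
func newRateLimitedClient(qps float64, burst int) *http.Client {
	return &http.Client{
		Transport: &rateLimitedRoundTripper{
			limiter: rate.NewLimiter(rate.Limit(qps), burst),
			next:    http.DefaultTransport,
		},
	}
}
```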
Automatic merge from submit-queue
AWS: Move enforcement of attached AWS device limit from kubelet to scheduler
The limit on the number of EBS volumes attached to a node is now enforced by the scheduler. It can be adjusted via the `KUBE_MAX_PD_VOLS` env. variable there. Therefore we don't need the same check in the kubelet: if the system admin wants to attach more, we should allow it.
The kubelet limit is now 650 attached volumes ('ba'..'zz').
Note that the scheduler counts only *pods* assigned to a node. When a pod is deleted and a new pod is scheduled on a node, the kubelet starts (slowly) detaching the old volume and (slowly) attaching the new volume. Depending on AWS speed **it may happen that more than KUBE_MAX_PD_VOLS volumes are actually attached to a node for some time!** The kubelet will clean it up in a few seconds/minutes (both attach and detach are quite slow).
Fixes #22994
Automatic merge from submit-queue
AWS: Allow cross-region image pulling with ECR
Fixes #23298
Definitely should be in the release notes; should maybe get merged in 1.2 along with #23594 after some soaking. Documentation changes to follow.
cc @justinsb @erictune @rata @miguelfrde
This is step two. We now create long-lived, lazy ECR providers in all regions.
When first used, they will create the actual ECR providers doing the work
behind the scenes, namely talking to ECR in the region where the image lives,
rather than the one our instance is running in.
Also:
- moved the list of AWS regions out of the AWS cloudprovider and into the
credentialprovider, then exported it from there.
- improved logging
Behold, running in us-east-1:
```
aws_credentials.go:127] Creating ecrProvider for us-west-2
aws_credentials.go:63] AWS request: ecr:GetAuthorizationToken in us-west-2
aws_credentials.go:217] Adding credentials for user AWS in us-west-2
Successfully pulled image "123456789012.dkr.ecr.us-west-2.amazonaws.com/test:latest"
```
*"One small step for a pod, one giant leap for Kube-kind."*
Automatic merge from submit-queue
AWS: SSL support for ELB listeners through annotations
In the API, ports have only either TCP or UDP as their protocols, but ELB distinguishes HTTPS->HTTP[S]? from SSL->(SSL|TCP).
Per #24978, this is implemented through two separate annotations:
`service.beta.kubernetes.io/aws-load-balancer-ssl-cert=arn:aws:acm:us-east-1:123456789012:certificate/12345678-1234-1234-1234-123456789012`
`service.beta.kubernetes.io/aws-load-balancer-backend-protocol=(https|http|ssl|tcp)`
Mixing plain-text and encrypted listeners will be in a separate PR, implementing #24978's `aws-load-balancer-ssl-ports=LIST`
Automatic merge from submit-queue
vSphere Cloud Provider Implementation
This is the first PR towards implementation for vSphere cloud provider support in Kubernetes (ref. issue #23932).
Use constructor for ecrProvider
Rename package to "credentials" like golint requests
Don't wrap the lazy provider with a caching provider
Add immediate compile-time interface conformance checks for the interfaces
Added comments
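The compile-time interface conformance checks are presumably the standard Go idiom, along these lines (type and interface names follow the surrounding log, but the exact line is illustrative):
```
// Fails to compile if ecrProvider stops satisfying the interface.
var _ credentialprovider.DockerConfigProvider = (*ecrProvider)(nil)
```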
When the vSphere cloud provider object is instantiated, the VM name of the
Node where this object is being created needs to be set. This patch
also includes vSphere as part of the cloud provider package.
This patch includes implementation for the following Instance object
interfaces:
* NodeAddresses
* ExternalID
* InstanceID
Also minor refactoring in overall Instance implementation.
Automatic merge from submit-queue
AWS: Add support for ap-northeast-2 region (Seoul)
This PR does:
- Support AWS Seoul region: ap-northeast-2.
Currently, I cannot set up Kubernetes on AWS Seoul.
Error Messages:
>
> ip-10-0-0-50 core # docker logs 0697db
> I0419 07:57:44.569174 1 aws.go:466] Zone not specified in configuration file; querying AWS metadata service
> F0419 07:57:44.570380 1 controllermanager.go:279] Cloud provider could not be initialized: could not init cloud provider "aws": not a valid AWS zone (unknown region): ap-northeast-2a
Automatic merge from submit-queue
Rackspace improvements (OpenStack Cinder)
This adds PV support via Cinder on Rackspace clusters. Rackspace Cloud Block Storage is pretty much vanilla OpenStack Cinder, so there is no need for a separate Volume Plugin. Instead I refactored the Cinder/OpenStack interaction a bit (by introducing a CinderProvider Interface and moving the device path detection logic to the OpenStack part).
Right now this is limited to `AttachDisk` and `DetachDisk`. Creation and deletion of Block Storage is not in scope of this PR.
Also the `ExternalID` and `InstanceID` cloud provider methods have been implemented for Rackspace.
This is a better abstraction than passing in specific pieces of the
Service that each of the cloudproviders may or may not need. For
instance, many of the providers don't need a region, yet this is passed
in. Similarly many of the providers want a string IP for the load
balancer, but it passes in a converted net ip. Affinity is unused by
AWS. A provider change may also require adding a new parameter which has
an effect on all other cloud provider implementations.
Further, this will simplify adding provider specific load balancer
options, such as with labels or some other metadata. For example, we
could add labels for configuring the details of an AWS elastic load
balancer, such as idle timeout on connections, whether it is
internal or external, cross-zone load balancing, and so on.
Authors: @chbatey, @jsravn
The previous logic was incorrect: if we saw two untagged security groups
before seeing the first tagged security group, we would incorrectly return
an error.
Fix #23339
AWS has a soft support limit of 40 attached EBS devices. Assuming there is
just one root device, use the rest for persistent volumes.
The devices will have name /dev/xvdba - /dev/xvdcm, leaving /dev/sda - /dev/sdz
to the system.
Also, add better error handling and propagate error
"Too many EBS volumes attached to node XYZ" to a pod.
We have previously tried building a full cloudprovider in e2e for AWS;
this wasn't the best idea, because e2e runs on a different machine than
normal operations, and often doesn't even run in AWS. In turn, this
meant that the cloudprovider had to do extra work and have extra code,
which we would like to get rid of. Indeed, I got rid of some code which
tolerated not running in AWS, and this broke e2e.
There are known issues with the attached-volume state cache that just aren't
possible to fix with the current interface.
Replace it with a map of the active attach jobs (that was the original
requirement, to avoid a nasty race condition).
This costs us an extra DescribeInstance call on attach/detach, but that
seems worth it if it ends this class of bugs.
Fix #15073
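A hypothetical sketch of the replacement bookkeeping, a mutex-guarded set of in-flight attach jobs (names are assumptions):
```
package awscloud

import "sync"

// attachingDisks tracks attach jobs in flight, per device name, instead of
// keeping a persistent attachment-state cache.
type attachingDisks struct {
	mu     sync.Mutex
	active map[string]bool // device name -> attach in progress
}

// start records an attach job; a false return means the device name is
// already in use and the caller must pick another.
func (a *attachingDisks) start(device string) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.active == nil {
		a.active = make(map[string]bool)
	}
	if a.active[device] {
		return false
	}
	a.active[device] = true
	return true
}

// finish clears the job once the attach has completed or failed.
func (a *attachingDisks) finish(device string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.active, device)
}
```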