This applies a cross-request time delay when we observe
RequestLimitExceeded errors, unlike the default library behaviour which
only applies a *per-request* backoff.
Issue #12121
In the AWS API (generally) we tag things we create, and then we filter
to find them. However, creation and tagging are typically two separate
calls, so there is a chance that we will create an object but fail to
tag it.
We fix this (done here in the case of security groups, but we can do
this more generally) by retrieving the resource without a tag filter.
If the retrieved resource has the correct tags, great. If it has the
tags for another cluster, that's a problem, and we raise an error. If
it has no tags at all, we add the tags.
This only works where the resource is uniquely named (or we can
otherwise retrieve it uniquely). For security groups, the SG name comes
from the service UUID, so that's unique.
Fixes #11324
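A sketch of the tag check described above, assuming tags are available as a `map[string]string` and that the owning cluster is identified by a single tag. `clusterTagKey` and `reconcileClusterTag` are illustrative names, not the provider's code. In this sketch a resource that lacks the cluster tag is treated as untagged and adopted.

```go
package main

import "fmt"

// clusterTagKey is an assumed tag key for this sketch.
const clusterTagKey = "KubernetesCluster"

// reconcileClusterTag inspects the tags of a uniquely named resource that was
// looked up WITHOUT a tag filter. It reports whether the caller should add the
// cluster tag, or returns an error if another cluster owns the resource.
func reconcileClusterTag(tags map[string]string, clusterID string) (needsTag bool, err error) {
	owner, tagged := tags[clusterTagKey]
	switch {
	case !tagged:
		// Created but never tagged (e.g. the tagging call failed): adopt it.
		return true, nil
	case owner == clusterID:
		// Already ours: nothing to do.
		return false, nil
	default:
		// Belongs to a different cluster: refuse to touch it.
		return false, fmt.Errorf("resource is tagged for cluster %q, not %q", owner, clusterID)
	}
}

func main() {
	for _, tags := range []map[string]string{
		{clusterTagKey: "prod"},
		{},
		{clusterTagKey: "staging"},
	} {
		needsTag, err := reconcileClusterTag(tags, "prod")
		fmt.Println(needsTag, err)
	}
}
```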
Volume names now have the format <cluster-name>-dynamic-<pv-name>.
pv-name is guaranteed to be unique within a Kubernetes cluster; adding
<cluster-name> ensures we don't conflict with any other running cluster
in the cloud project (set with kube-controller-manager --cluster-name=XXX).
'kubernetes' is the default cluster name.
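The naming scheme is plain string concatenation. This sketch assumes that an empty cluster name should fall back to the default; `dynamicVolumeName` is a hypothetical helper, and in practice the cluster name comes from the kube-controller-manager flag.

```go
package main

import "fmt"

// defaultClusterName is the default cluster name described above.
const defaultClusterName = "kubernetes"

// dynamicVolumeName builds <cluster-name>-dynamic-<pv-name>. Falling back to
// the default here is an assumption of this sketch.
func dynamicVolumeName(clusterName, pvName string) string {
	if clusterName == "" {
		clusterName = defaultClusterName
	}
	return clusterName + "-dynamic-" + pvName
}

func main() {
	fmt.Println(dynamicVolumeName("", "pv-1"))     // kubernetes-dynamic-pv-1
	fmt.Println(dynamicVolumeName("prod", "pv-1")) // prod-dynamic-pv-1
}
```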
AWS doesn't support type=LoadBalancer with UDP services. For now, we
simply skip over the test with type=LoadBalancer on AWS for the UDP
service.
Fix #20911
This commit allows the AWS cloud provider plugin to work on EC2 instances
that do not have a public IP. The EC2 metadata service returns a 404 for the
'public-ipv4' endpoint for private instances, and the plugin was bubbling this
up as a fatal error.
We are (sadly) using a copy-and-paste of the GCE PD code for AWS EBS.
This code hasn't been updated in a while, and it seems that the GCE code
now has logic to make volume mounting more robust, which we should copy.
GCE disks don't have tags, so we must encode the tags into the Description field.
It's encoded as JSON, which is both human and machine readable:
description: '{"kubernetes.io/created-for/pv/name":"pv-gce-oxwts","kubernetes.io/created-for/pvc/name":"myclaim","kubernetes.io/created-for/pvc/namespace":"default"}'
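Go's `encoding/json` produces exactly this form: `json.Marshal` writes map keys in sorted order, so the Description string is stable across calls. The helper names below are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeTagsAsDescription serializes tags to JSON for a disk's Description
// field. json.Marshal sorts map keys, so the output is deterministic.
func encodeTagsAsDescription(tags map[string]string) (string, error) {
	b, err := json.Marshal(tags)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeTagsFromDescription recovers the tags from a Description string.
func decodeTagsFromDescription(desc string) (map[string]string, error) {
	tags := map[string]string{}
	if err := json.Unmarshal([]byte(desc), &tags); err != nil {
		return nil, err
	}
	return tags, nil
}

func main() {
	desc, err := encodeTagsAsDescription(map[string]string{
		"kubernetes.io/created-for/pvc/namespace": "default",
		"kubernetes.io/created-for/pv/name":       "pv-gce-oxwts",
		"kubernetes.io/created-for/pvc/name":      "myclaim",
	})
	if err != nil {
		panic(err)
	}
	// Prints the same JSON as the description example above.
	fmt.Println(desc)

	tags, err := decodeTagsFromDescription(desc)
	if err != nil {
		panic(err)
	}
	fmt.Println(tags["kubernetes.io/created-for/pvc/name"]) // myclaim
}
```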
The IP permission comparison method now checks for containment, not
equality, so the order of its parameters matters. This change fixes
`removeSecurityGroupIngress` to pass in the removal permission first, so
that it is compared against the existing permission.
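A sketch of a containment check, assuming a simplified `ipPermission` struct (hypothetical; the real code works with the AWS SDK's `IpPermission` type). Because `ipPermissionContained(want, have)` asks whether everything in `want` is present in `have`, swapping the arguments changes the answer, which is why the removal permission must be passed first.

```go
package main

import "fmt"

// ipPermission is a simplified, hypothetical stand-in for an EC2 ingress rule.
type ipPermission struct {
	Protocol         string
	FromPort, ToPort int64
	CIDRs            []string
	GroupIDs         []string
}

// containsAll reports whether every element of needles appears in haystack.
func containsAll(needles, haystack []string) bool {
	set := make(map[string]bool, len(haystack))
	for _, s := range haystack {
		set[s] = true
	}
	for _, s := range needles {
		if !set[s] {
			return false
		}
	}
	return true
}

// ipPermissionContained reports whether everything in want is also in have.
// It is NOT symmetric: the argument order matters.
func ipPermissionContained(want, have ipPermission) bool {
	if want.Protocol != have.Protocol || want.FromPort != have.FromPort || want.ToPort != have.ToPort {
		return false
	}
	return containsAll(want.CIDRs, have.CIDRs) && containsAll(want.GroupIDs, have.GroupIDs)
}

func main() {
	existing := ipPermission{Protocol: "tcp", FromPort: 80, ToPort: 80, GroupIDs: []string{"sg-a", "sg-b"}}
	removal := ipPermission{Protocol: "tcp", FromPort: 80, ToPort: 80, GroupIDs: []string{"sg-b"}}
	fmt.Println(ipPermissionContained(removal, existing)) // true: removal permission passed first
	fmt.Println(ipPermissionContained(existing, removal)) // false: arguments swapped
}
```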
Change isEqualIPPermission to consider the entire list of security group
IDs when checking whether a security group ID has already been added.
This is used, for example, when adding and removing ingress rules to the
cluster nodes from an Elastic Load Balancer. Without this, once there
are multiple load balancers, the method as it stands incorrectly returns
false even if the security group id is in the list of group ids. This
causes a few problems: dangling security groups which fill up an
account's limit since they don't get removed, and inability to recreate
load balancers in certain situations (receiving an
InvalidPermission.Duplicate from AWS when adding the same security
group).
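To illustrate why the whole list matters, here is a sketch that treats an ingress rule's source security groups as a plain `[]string` (a simplification) and only adds or revokes a load balancer's group after scanning every entry. `ensureIngress` and `revokeIngress` are hypothetical helpers. Skipping the add when the ID is already present anywhere in the list is what avoids InvalidPermission.Duplicate, and finding it anywhere in the list is what lets the removal actually happen.

```go
package main

import "fmt"

// hasGroupID scans the ENTIRE list of source security group IDs.
func hasGroupID(existing []string, id string) bool {
	for _, g := range existing {
		if g == id {
			return true
		}
	}
	return false
}

// ensureIngress returns the updated list and whether an "authorize ingress"
// call is needed. Re-adding an ID that is already present would be a duplicate.
func ensureIngress(existing []string, lbGroupID string) ([]string, bool) {
	if hasGroupID(existing, lbGroupID) {
		return existing, false
	}
	return append(existing, lbGroupID), true
}

// revokeIngress removes lbGroupID wherever it appears and reports whether a
// "revoke ingress" call is needed, so groups are not left dangling.
func revokeIngress(existing []string, lbGroupID string) ([]string, bool) {
	out := make([]string, 0, len(existing))
	found := false
	for _, g := range existing {
		if g == lbGroupID {
			found = true
			continue
		}
		out = append(out, g)
	}
	return out, found
}

func main() {
	rule := []string{"sg-elb-1", "sg-elb-2"}
	rule, added := ensureIngress(rule, "sg-elb-2")
	fmt.Println(rule, added) // [sg-elb-1 sg-elb-2] false
	rule, removed := revokeIngress(rule, "sg-elb-1")
	fmt.Println(rule, removed) // [sg-elb-2] true
}
```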
According to AWS, the ELB healthy threshold is "Number of consecutive health check successes before declaring an EC2 instance healthy." It has an unusual interaction with Kubernetes, since all nodes will enter either an unhealthy state or a healthy state together depending on the service's healthiness as a whole.
We have observed that if our service goes down for the duration of the unhealthy threshold (2 checks at 30-second intervals = 60 seconds), the ELB will stop serving traffic to all nodes in the cluster, and will then wait for the healthy threshold (currently 10 checks * 30 seconds = 300 seconds) AFTER the service is restored before adding back the cluster nodes, meaning the service remains unreachable for an extra 300 seconds.
With the new settings, the ELB will continue to time out dead nodes after 60 seconds, but will restore healthy nodes after 20 seconds (2 checks at 10-second intervals). The minimum value for healthyThreshold is 2, and the minimum value for interval is 5 seconds. I chose 10 seconds instead of the minimum somewhat arbitrarily, because I was not sure how much this value may affect the scalability of clusters in EC2, as it does put some extra load on kube-proxy.
We don't cope well if a PD is in multiple zones, but this is actually
fairly easy to detect. This is probably justified purely on the basis
that we never want to delete the wrong volume (DeleteDisk), but it also
means that we now warn on creation if a disk is in multiple zones (with
the labeling admission controller).
This also means that, with the scheduling predicate in place, many
of our volume problems "go away" in practice: you still can't create or
delete a volume when it is ambiguous, but thereafter the volume will be
labeled with its zone, which will match it only to nodes in the same
zone, and then we query for the volume in that zone when we
attach/detach it.
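A sketch of the ambiguity check, assuming the disk listings for each zone are already available as a map (a hypothetical stand-in for per-zone API lookups; `findDiskZone` is an illustrative name). A disk found in more than one zone is returned as an error rather than arbitrarily picking one, so delete, attach and detach never act on the wrong volume.

```go
package main

import (
	"fmt"
	"sort"
)

// findDiskZone returns the single zone containing diskName. disksByZone maps
// zone name -> disk names and stands in for listing disks in each zone.
// A disk present in several zones is reported as ambiguous.
func findDiskZone(diskName string, disksByZone map[string][]string) (string, error) {
	var zones []string
	for zone, disks := range disksByZone {
		for _, d := range disks {
			if d == diskName {
				zones = append(zones, zone)
				break
			}
		}
	}
	sort.Strings(zones) // map iteration order is random; keep errors deterministic
	switch len(zones) {
	case 0:
		return "", fmt.Errorf("disk %q not found in any zone", diskName)
	case 1:
		return zones[0], nil
	default:
		return "", fmt.Errorf("disk %q is ambiguous: found in zones %v", diskName, zones)
	}
}

func main() {
	disks := map[string][]string{
		"us-central1-a": {"data-1", "shared"},
		"us-central1-b": {"shared"},
	}
	for _, name := range []string{"data-1", "shared"} {
		zone, err := findDiskZone(name, disks)
		fmt.Println(name, zone, err)
	}
}
```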