This method has been unused by k8s for some time, and yet is the last
piece of the cloud provider API that encourages provider names to be
human-friendly strings (this method applies a regex to instance names).
Actually removing this deprecated method is part of a long effort to
migrate from instance names to instance IDs in at least the OpenStack
provider plugin.
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.
This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed.
To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.
More details are explained in PR #33760
Automatic merge from submit-queue
Loadbalanced client src ip preservation enters beta
Sounds like we're going to try out the proposal (https://github.com/kubernetes/kubernetes/issues/30819#issuecomment-249877334) for annotations -> fields on just one feature in 1.5 (scheduler). Or do we want to just convert to fields right now?
'names' is an array of FQDNs. 'instances' is a map indexed by canonicalized
name. Clearly these two won't always match, so when building the final
instance array to return, make sure to look up map entries by their canonicalized
name.
In the below example, "ocp-master-5pob" is clearly found as a GCE instance
but when building the final instance array it cannot be matched as the code
is looking for "ocp-master-5pob.c.ose-refarch.internal" instead. The node
is then deleted from the cluster as it cannot be found by the cloud provider.
gce.go:2519] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): initial node prefix ocp-
gce.go:2530] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): looking for instances map[ocp-master-5pob:<nil>]
gce.go:2533] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): getting zone 'europe-west1-c' (remaining 1)
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): instance name <omitted> not requested
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): instance name <omitted> not requested
gce.go:2533] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): getting zone 'europe-west1-b' (remaining 1)
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): instance name <omitted> not requested
gce.go:2576] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): found instance 'ocp-master-5pob' remaining 0
gce.go:2563] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): instance name <omitted> not requested
gce.go:2533] ### getInstancesByNames([ocp-master-5pob.c.ose-refarch.internal]): getting zone 'europe-west1-d' (remaining 0)
gce.go:2588] Failed to retrieve instance: "ocp-master-5pob.c.ose-refarch.internal"
gce.go:2624] ### getInstanceByName(ocp-master-5pob.c.ose-refarch.internal): got []: instance not found
gce.go:2626] getInstanceByName/multiple-zones: failed to get instance ocp-master-5pob.c.ose-refarch.internal; err: instance not found
nodecontroller.go:587] Deleting node (no longer present in cloud provider): ocp-master-5pob.c.ose-refarch.internal
nodecontroller.go:664] Recording Deleting Node ocp-master-5pob.c.ose-refarch.internal because it's not present according to cloud provider event message for node ocp-master-5pob.c.ose-refarch.internal
- test pd-ssd and pd-standard on GCE,
- test all four volume types on AWS
- test just the default volume type on OpenStack (right now, there is no API
to get list of them)
Contination of #1111
I tried to keep this PR down to just a simple search-n-replace to keep
things simple. I may have gone too far in some spots but its easy to
roll those back if needed.
I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.
I rolled back some of this from a previous commit because it just got
to big/messy. Will follow up with additional PRs
Signed-off-by: Doug Davis <dug@us.ibm.com>
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
Bump version of golang.org/x/oauth2
Vendor google.golang.org/cloud/
Vendor google.golang.org/api/
Vendor cloud.google.com/go/compute/
Replace google.golang.org/cloud with cloud.google.com/go/
Fixes#30069
Automatic merge from submit-queue
Do not allow creation of GCE PDs in unmanaged zones.
Such volumes then couldn't be deleted as `getDiskByNameUnknownZone` goes through managed zones only.
Fixes: #31948
@kubernetes/rh-storage
@saad-ali, PTAL.
Automatic merge from submit-queue
Typos and englishify pkg/cloudprovider + pkg/dns + pkg/kubectl
**What this PR does / why we need it**: Just fixed some typos + "englishify" in pkg/cloudprovider + pkg/dns + pkg/kubectl
**Which issue this PR fixes** : None
**Special notes for your reviewer**: It's just fixes typos
**Release note**: `NONE`
Automatic merge from submit-queue
AWS/GCE: Spread PetSet volume creation across zones, create GCE volumes in non-master zones
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
We hash the volume name so we don't bias to the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
Tested with 2000 nodes, this actually meets the GCE API specifications
(which is nutty). Previous PR (#25178) was based on a mistaken
understanding of a poorly documented set of limitations, and even
poorer testing, for which I am embarassed.
Lots of comments describing the heuristics, how it fits together and the
limitations.
In particular, we can't guarantee correct volume placement if the set of
zones is changing between allocating volumes.
Filters can't exceed 4k, and GET requests against the GCE API are also
limited, so these break down in different ways at different cluster
counts. Fix it by introducing an advisory node-instance-prefix
configuration in the GCE provider that can hint the
EnsureLoadBalancer/UpdateLoadBalancer code (and the firewall
creation/update code). If it's not there, or wrong (a hostname that's
registered violates it), just ignore it and grab the whole project.
Automatic merge from submit-queue
refuse to create a firewall rule with no target tag
fixes#25145
This modification in gce.firewallObject() will return error when trying
to create or update firewall rule if no node tag can be found. Also add
unit test for this modification.
We had a long-lasting bug which prevented creation of volumes in
non-master zones, because the cloudprovider in the volume label
admission controller is not initialized with the multizone setting
(issue #27656).
This implements a simple workaround: if the volume is created with the
failure-domain zone label, we look for the volume in that zone. This is
more efficient, avoids introducing a new semantic, and allows users (and
the dynamic provisioner) to create volumes in non-master zones.
Fixes#27657
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
We hash the volume name so we don't bias to the first few zones.
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset. In that case we hash
the base name.
Fixes#27256
Implements #25145
This modification in gce.firewallObject() will return error when trying
to create or update firewall rule if no node tag can be found. Also add
unit test for this modification.
Instead of just rate limits to operation polling, send all API calls
through a rate limited RoundTripper.
This isn't a perfect solution, since the QPS is obviously getting
split between different controllers, etc., but it's also spread across
different APIs, which, in practice, rate limit differently.
Fixes#26119 (hopefully)
This is a better abstraction than passing in specific pieces of the
Service that each of the cloudproviders may or may not need. For
instance, many of the providers don't need a region, yet this is passed
in. Similarly many of the providers want a string IP for the load
balancer, but it passes in a converted net ip. Affinity is unused by
AWS. A provider change may also require adding a new parameter which has
an effect on all other cloud provider implementations.
Further, this will simplify adding provider specific load balancer
options, such as with labels or some other metadata. For example, we
could add labels for configuring the details of an AWS elastic load
balancer, such as idle timeout on connections, whether it is
internal or external, cross-zone load balancing, and so on.
Authors: @chbatey, @jsravn
Had to move other things around too to avoid a weird api ->
cloudprovider dependency.
Also adding fixes per code reviews.
(This is a squash of the previously approved commits)
This refactors #21431 to pull a lot of the code into cloudprovider so it
can be reused by AWS.
It also changes the name of the annotation to be non-GCE specific:
service.beta.kubernetes.io/load-balancer-source-ranges
Fix#21651
for Instance.List and Routes.List which we will definitely have
more than 500 of when supporting 1000 nodes.
Add TODOs for other GCE List API calls to do similar fixes.
Add more logging to GCE's routecontroller.go when creating or deleting routes.
Follow up from #20731. I have no way of testing this.
There's an additional group of functions (Get|Delete|Reserve)GlobalStaticIP that can create an IP without the
service description, but those are not called anywhere in the Kubernetes codebase and are probably for the
Ingress project. I'm leaving those alone for now.
GCE disks don't have tags, we must encode the tags into Description field.
It's encoded as JSON, which is both human and machine readable:
description: '{"kubernetes.io/created-for/pv/name":"pv-gce-oxwts","kubernetes.io/created-for/pvc/name":"myclaim","kubernetes.io/created-for/pvc/namespace":"default"}'
We don't cope well if a PD is in multiple zones, but this is actually
fairly easy to detect. This is probably justified purely on the basis
that we never want to delete the wrong volume (DeleteDisk), but also
because this means that we now warn on creation if a disk is in multiple
zones (with the labeling admission controller).
This also means that with the scheduling predicate in place, that many
of our volume problems "go away" in practice: you still can't create or
delete a volume when it is ambiguous, but thereafter the volume will be
labeled with the zone, that will match it only to nodes with the same
zone, and then we query for the volume in that zone when we
attach/detach it.
This removes a panic I mistakenly introduced when an instance is not
found, and also restores the exact prior behaviour for
getInstanceByName, where it returns cloudprovider.InstanceNotFound when
the instance is not found.
We adapt the existing code to work across all zones in a region.
We require a feature-flag to enable Ubernetes-Lite
Reasons:
* There are some behavioural changes if users create volumes with
the same name in two zones.
* We don't want to make one API call per zone if we're not running
Ubernetes-Lite.
* Ubernetes-Lite is still experimental.
There isn't a parallel flag implemented for AWS, because at the moment
there would be no behaviour changes from this.
them, not in the steady state once they've been created. This makes it
much less likely that users will run into static IP quota issues.
Also add slightly more parallelism to the deletion of load balancers
now that I realize the static IPs can be deleted in parallel with
forwarding rules :)
Previously we'd just tear everything down and recreate it, which makes
for a pretty bad experience because it causes downtime whenever the
service controller restarts and has to make sure everything is in the
desired state.
This adds more code than I'd prefer, but makes it much cleaner and more
organized than it was before, in my opinion. I didn't bother
parallelizing anything because it's complex enough as it is, right now.
It's consistently passing the existing e2es and worked when I tested
manually, but this could definitely use additional e2e tests and/or some
serious refactoring to make real unit tests feasible. I'll follow up
with one or two e2e tests that make sense (updating an LB or killing the
controller manager, perhaps).
This will cut down on the amount of time it takes to delete an external
load balancer, which should reduce the likelihood of resource leaks when
clusters are deleted.
This code was in rough shape, so I've fixed the issues with the original
PR as well as a few other changes:
1. Clarify the error messages related to the "gce Addresses" to make it
clear we're talking about static IP addresses
2. Fix the bug in the original PR, which was a nil pointer dereference
from passing op to waitForRegionOp when the address doesn't exist.
3. Rearrange the steps of EnsureTCPLoadBalancerDeleted to be the reverse
of EnsureCreated, which mostly just seems like good practice to me.
This is also supported by the following two bugs I found :(
4. Fix an independent bug of returning too early if the target pool
doesn't exist, effectively stranding the firewall. This was likely
introduced because target pools used to be the last thing deleted,
so it was previously safe to return there.
5. Fix an independent bug of not returning an error waiting for the
target pool to be deleted failed. This was very possibly causing
target pool leaks in our e2e tests. This was similarly due to
assuming that the target pool was the last thing deleted in the
function, then having the firewall deletion stuck in after it.
This code was in rough shape, so I've fixed the issues with the original
PR as well as a few other changes:
1. Clarify the error messages related to the "gce Addresses" to make it
clear we're talking about static IP addresses
2. Fix the bug in the original PR, which was a nil pointer dereference
from passing op to waitForRegionOp when the address doesn't exist.
3. Rearrange the steps of EnsureTCPLoadBalancerDeleted to be the reverse
of EnsureCreated, which mostly just seems like good practice to me.
This is also supported by the following two bugs I found :(
4. Fix an independent bug of returning too early if the target pool
doesn't exist, effectively stranding the firewall. This was likely
introduced because target pools used to be the last thing deleted,
so it was previously safe to return there.
5. Fix an independent bug of not returning an error waiting for the
target pool to be deleted failed. This was very possibly causing
target pool leaks in our e2e tests. This was similarly due to
assuming that the target pool was the last thing deleted in the
function, then having the firewall deletion stuck in after it.
A lot of packages use StringSet, but they don't use anything else from
the util package. Moving StringSet into another package will shrink
their dependency trees significantly.
Previously the servicecontroller would do the delete, but by having the cloudprovider
take that task on, we can later remove it from the servicecontroller, and the
cloudprovider can do something more efficient.