GCE disks don't have tags, so we must encode the tags into the Description
field. They are encoded as JSON, which is both human and machine readable:
description: '{"kubernetes.io/created-for/pv/name":"pv-gce-oxwts","kubernetes.io/created-for/pvc/name":"myclaim","kubernetes.io/created-for/pvc/namespace":"default"}'
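A minimal sketch of how such a description could be produced and read back, using only the Go standard library. The helper names encodeDiskTags and decodeDiskTags are hypothetical and are not taken from the actual change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeDiskTags (hypothetical helper) serializes tags to JSON for storage
// in a disk's Description field. encoding/json writes map keys in sorted
// order, so the same tags always produce the same string.
func encodeDiskTags(tags map[string]string) (string, error) {
	if len(tags) == 0 {
		return "", nil
	}
	b, err := json.Marshal(tags)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeDiskTags (hypothetical helper) parses a Description written by
// encodeDiskTags. An empty description yields an empty tag map.
func decodeDiskTags(description string) (map[string]string, error) {
	tags := map[string]string{}
	if description == "" {
		return tags, nil
	}
	if err := json.Unmarshal([]byte(description), &tags); err != nil {
		return nil, err
	}
	return tags, nil
}

func main() {
	desc, err := encodeDiskTags(map[string]string{
		"kubernetes.io/created-for/pv/name":       "pv-gce-oxwts",
		"kubernetes.io/created-for/pvc/name":      "myclaim",
		"kubernetes.io/created-for/pvc/namespace": "default",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(desc)
}
```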
We don't cope well if a PD is in multiple zones, but this is actually
fairly easy to detect. Detecting it is probably justified purely on the
basis that we never want to delete the wrong volume (DeleteDisk), but
also because it means that we now warn on creation if a disk is in
multiple zones (with the labeling admission controller).
This also means that, with the scheduling predicate in place, many of
our volume problems "go away" in practice: you still can't create or
delete a volume when it is ambiguous, but thereafter the volume will be
labeled with its zone, which will match it only to nodes in the same
zone, and we then query for the volume in that zone when we
attach/detach it.
This removes a panic I mistakenly introduced when an instance is not
found, and also restores the exact prior behaviour for
getInstanceByName, where it returns cloudprovider.InstanceNotFound when
the instance is not found.
We adapt the existing code to work across all zones in a region.
We require a feature-flag to enable Ubernetes-Lite.
Reasons:
* There are some behavioural changes if users create volumes with
the same name in two zones.
* We don't want to make one API call per zone if we're not running
Ubernetes-Lite.
* Ubernetes-Lite is still experimental.
There isn't a parallel flag implemented for AWS, because at the moment
there would be no behaviour changes from this.
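The effect of the flag on API usage can be sketched like this. The function name zonesToQuery and its parameters are hypothetical; the real flag name is not given here.

```go
package main

import "fmt"

// zonesToQuery returns the zones a lookup should cover. With the flag off,
// only the local zone is queried, so there is exactly one API call and the
// single-zone behaviour is unchanged.
func zonesToQuery(multizone bool, localZone string, regionZones []string) []string {
	if !multizone {
		return []string{localZone}
	}
	return regionZones
}

func main() {
	region := []string{"us-central1-a", "us-central1-b", "us-central1-f"}
	fmt.Println(zonesToQuery(false, "us-central1-a", region))
	fmt.Println(zonesToQuery(true, "us-central1-a", region))
}
```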
them, not in the steady state once they've been created. This makes it
much less likely that users will run into static IP quota issues.
Also add slightly more parallelism to the deletion of load balancers
now that I realize the static IPs can be deleted in parallel with
forwarding rules :)
Previously we'd just tear everything down and recreate it, which makes
for a pretty bad experience because it causes downtime whenever the
service controller restarts and has to make sure everything is in the
desired state.
This adds more code than I'd prefer, but makes it much cleaner and more
organized than it was before, in my opinion. I didn't bother
parallelizing anything because it's complex enough as it is, right now.
It's consistently passing the existing e2es and worked when I tested
manually, but this could definitely use additional e2e tests and/or some
serious refactoring to make real unit tests feasible. I'll follow up
with one or two e2e tests that make sense (updating an LB or killing the
controller manager, perhaps).
This will cut down on the amount of time it takes to delete an external
load balancer, which should reduce the likelihood of resource leaks when
clusters are deleted.
This code was in rough shape, so I've fixed the issues with the original
PR as well as a few other changes:
1. Clarify the error messages related to the "gce Addresses" to make it
clear we're talking about static IP addresses.
2. Fix the bug in the original PR, which was a nil pointer dereference
from passing op to waitForRegionOp when the address doesn't exist.
3. Rearrange the steps of EnsureTCPLoadBalancerDeleted to be the reverse
of EnsureCreated, which mostly just seems like good practice to me.
This is also supported by the following two bugs I found :(
4. Fix an independent bug of returning too early if the target pool
doesn't exist, effectively stranding the firewall. This was likely
introduced because target pools used to be the last thing deleted,
so it was previously safe to return there.
5. Fix an independent bug of not returning an error when waiting for the
target pool to be deleted failed. This was very possibly causing
target pool leaks in our e2e tests. This was similarly due to
assuming that the target pool was the last thing deleted in the
function, then having the firewall deletion stuck in after it.
A lot of packages use StringSet, but they don't use anything else from
the util package. Moving StringSet into another package will shrink
their dependency trees significantly.
Previously the servicecontroller would do the delete, but by having the cloudprovider
take that task on, we can later remove it from the servicecontroller, and the
cloudprovider can do something more efficient.