This is intended to make it easier to define the interaction between cluster selection and scheduling preferences in the sync controller when used for workload types.
Automatic merge from submit-queue (batch tested with PRs 42042, 46139, 46126, 46258, 46312)
[Federation] Use service accounts instead of the user's credentials when accessing joined clusters' API servers.
Fixes#41267.
Release notes:
```release-note
Modifies kubefed to create and the federation controller manager to use credentials associated with a service account rather than the user's credentials.
```
Automatic merge from submit-queue
[Federation][kubefed]: Move server image definition to cmd
This enables consumers like openshift to provide a different default without editing the kubefed init logic.
cc: @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 40234, 45885, 42975)
Fed target cluster by label for sync controller
[use clusterselector w/ federated configmap deploys](667dc77444)
**What this PR does / why we need it**: adds the ability to indicate objects are sent to subsets of federated clusters ref #29887
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes
**Special notes for your reviewer**:
**Release note**:
```release-note
```
fix test error formatting
updates from comments
update gofmt
simplify tests
add to new sync controller
add tests
remove configmap changes due to rebase
updates from review
refactor tests to be based on operations
improvements from review
updates from rebase
rebase to #45374
updates from review
refactor SendToCluster for tests
fix import order
rebase to upstream
Automatic merge from submit-queue (batch tested with PRs 42895, 45940)
[Federation] Automate configuring nameserver in cluster-dns for CoreDNS provider
Addresses issue #42894#42822
**Release note**:
```
[Federation] CoreDNS server will be automatically added to nameserver resolv.conf chain When using CoreDNS as dns provider for federation during federation join.
```
cc @madhusudancs @kubernetes/sig-federation-bugs
Automatic merge from submit-queue (batch tested with PRs 45247, 45810, 45034, 45898, 45899)
[Federation] Segregate DNS related code to separate controller
**What this PR does / why we need it**:
This is the continuation of service controller re-factor work as outlined in #41253
This PR segregates DNS related code from service controller to another controller `service-dns controller` which manages the DNS records on the configured DNS provider.
`service-dns controller` monitors the federated services for the ingress annotations and create/update/delete DNS records accordingly.
`service-dns controller` can be optionally disabled and DNS record management could be done by third party components by monitoring the ingress annotations on federated services. (This would enable something like federation middleware for CoreDNS where federation api server could be used as a backend to CoreDNS eliminating the need for etcd storage.)
**Special notes for your reviewer**:
**Release note**:
```
Federation: A new controller for managing DNS records is introduced which can be optionally disabled to enable third party components to manage DNS records for federated services.
```
cc @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 45374, 44537, 45739, 44474, 45888)
[Federation] Refactor sync controller's reconcile method for maintainability
This PR refactors the sync controllers reconcile method for maintainability with the goal of eliminating the need for type-specific controller unit tests. The unit test coverage for reconcile is not complete, but I think it's a good start.
cc: @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 45860, 45119, 44525, 45625, 44403)
coredns: support IPv6 record set
Added support for AAAA record for coredns and included unit test.
Refactored common test code to reduce duplication from added test and
existing tests.
Fixed function names in comments for Google and AWS tests to match
actual test name in this area.
**What this PR does / why we need it**:
Adding IPv6 support to kubernetes, once piece at a time. :)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#44351
**Special notes for your reviewer**:
In addition to the change and unit test method, I did some minor refactoring (since the UT was a near clone of an existing test). Fixed typos in related test methods' comment lines. Please let me know if this is OK (I was thinking it was a small change, but don't know the protocol here), or if I need to break it into multiple commits.
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 45860, 45119, 44525, 45625, 44403)
[Federation] Move annotations and related parsing code as common code
This PR moves some code, which was duplicate, around as common code.
Changes the names of structures used for annotations to common names.
s/FederatedReplicaSetPreferences/ReplicaAllocationPreferences/
s/ClusterReplicaSetPreferences/PerClusterPreferences/
This can be reused in job controller and hpa controller code.
**Special notes for your reviewer**:
@kubernetes/sig-federation-misc
**Release note**:
```NONE
```
Added support for AAAA record for coredns and included unit test.
Fixed function names in comments for Google and AWS tests to match
actual test name in this area.
Automatic merge from submit-queue (batch tested with PRs 44626, 45641)
Update Google Cloud DNS provider Rrset.Get(name) method to return a list and change the `Rrset.List()` implementation to perform a paged walk
Some federated service e2e tests and a few ingress tests would become flaky after a few hundred runs. @csbell spent quite a lot of time debugging this and found out that this flakiness was due to a bug in the federated service controller deletion logic. Deletion of a federated service object triggers a logic in the controller to update the DNS records corresponding to that object. This DNS record update logic would return an error in failed runs which would in-turn cause the controller to reschedule the operation. This led to an infinite retry-failure cycle that never gave the API server a chance to garbage collect the deleted service object.
A couple of days ago we started seeing a correlation between the number of resource records in a DNS managed zone and these test failures. If you look at the test runs before and after run 2900 in the test grid - https://k8s-testgrid.appspot.com/cluster-federation#gce, you will notice that the grid became super green at 2900. That's when I deleted all the dangling DNS records from the past runs.
After some investigation yesterday, we found that `ResourceRecordSet.Get()` interface and its implementation, and `ResourceRecordSet.List()` implementation at least for Google Cloud DNS were incorrect.
This PR makes minimal set of changes (read: least invasive) in Google Cloud DNS provider implementation to fix these problems:
1. Modifies DNS provider Rrset.Get(name) interface to return multiple records and updates federated service controller.
There can be multiple DNS resource records for a given name. They can vary by type, ttl, rrdata and a number of various other parameters. It is incorrect to return a single resource record for a given name.
This change updates the Get interface to return multiple records for a given name and uses this list in the federated service controller to perform DNS operations.
2. Update Google Cloud DNS List implementation to perform a paged walk of lists to aggregate all the DNS records.
The current `List()` implementation just lists the DNS resorce records in a given managed zone once and retruns the list. It neither performs a paged walk nor does it consider the `page_token` in the returned response.
This change walks all the pages and aggregates the records in the pages and returns the aggregated list. This is potentially dangerous as it can blow up memory if there are a huge number of records in the given managed zone. But this is the best we can do without changing the provider interface too much.
Next step is to define a new paged list interface and implement it.
**Release note**:
```release-note
NONE
```
/assign @csbell
cc @justinsb @shashidharatd @quinton-hoole @kubernetes/sig-federation-pr-reviews
When we fetch the dns records by name, we get a list of records that match
the given name. As an optimization we look up to see if the new record we
want to create is already in the returned list to avoid performing any updates.
However, when the new record we want to create isn't in the returned list, it
is hard to say if the returned list contains the list of records that we want
to retain. For example, we might get a list of A records and we want to create
a CNAME record. Creating a new CNAME record without removing the A records is
a DNS misconfiguration. So to play safe we just remove all the existing records
in the list and create the new desired record.
**Note**: This is the opposite of what I said here - https://reviewable.kubernetes.io/reviews/kubernetes/kubernetes/44626#-Ki9xQOzybryHvsxNrra.
Automatic merge from submit-queue (batch tested with PRs 45556, 45561, 45256)
[Federation] Replace the indexing lister with a regular store in the replicaset controller
This is part of the refactoring work to allow the replicaset controller to use the generic sync controller.
None of the other controllers use a lister, including the deployment controller
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45382, 45384, 44781, 45333, 45543)
[Federation] Provide updater timeout to instance rather than to Update()
This PR changes the federated updater to receive its timeout at construction rather than on every call to Update(). This provides a slight decrease in coupling by removing the need for the deletion handler to be provided the timeout along with the updater.
cc: @kubernetes/sig-federation-pr-reviews @perotinus
Automatic merge from submit-queue
[Federation] Improve the logging and user feedback in 'kubefed init'
This is a follow-up to #41849, which added some status information. This PR is based off of that one, and includes its changes as well.
See #41725.
```release-note
None
```
The current `List()` implementation just lists the DNS resorce records in
a given managed zone once and retruns the list. It neither performs a paged
walk nor does it consider the `page_token` in the returned response.
This change walks all the pages and aggregates the records in the pages
and returns the aggregated list. This is potentially dangerous as it can
blow up memory if there are a huge number of records in the given
managed zone. But this is the best we can do without changing the
provider interface too much. Next step is to define a new paged list
interface and implement it.
There can be multiple DNS resource records for a given name. They can
vary by type, ttl, rrdata and a number of various other parameters. It
is incorrect to return a single resource record for a given name.
This change updates the Get interface to return multiple records for a given
name and uses this list in the federated service controller to perform
DNS operations.
Automatic merge from submit-queue
Keep UserAgentName style consistent
Keep using UserAgentName for controllers and add some logs for debugging
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40544, 44338, 45225)
[Federation]Update event type
Use EventTypeWarning instead of EventTypeNormal when get failure
Left behind after pr #40296
Automatic merge from submit-queue (batch tested with PRs 45100, 45152, 42513, 44796, 45222)
[Federation] Generate the secret name in kubefed join.
Addresses part of #42324. A follow-up PR will address annotating Federation resources.
```release-note
Remove the `--secret-name` flag from `kubefed join`, instead generating the secret name arbitrarily.
```
Automatic merge from submit-queue
Remove GetClientsetForCluster()
The newClusterClientset() has insteaded of GetClientsetForCluster(),
and GetClientsetForCluster() run wrong. Let's remove it.
Automatic merge from submit-queue
Ignore IsNotFound error
IsNotFound error is fine since that means the object is
deleted already, so we should check err and ignore err
before returning.
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)
[Federation] Convert Daemonset to use the generic sync controller
To be rebased on master when @perotinus's configmaps PR merges.
Tested integration and e2e.
Automatic merge from submit-queue (batch tested with PRs 44942, 41258)
[Federation] Use federated informer for service controller and annotations to store lb ingress
**What this PR does / why we need it**:
This is breaking up of the PR #40296 into smaller one. please refer to #41253
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Handles 2 tasks in #41253
Fixes issues in #27623, #35827
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
cc @quinton-hoole @nikhiljindal @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue
Fix ensureDnsRecords comments for federated services
I went to look at the source comments, because the documentation is not exhaustive about what kind of DNS records are created for federated services (and http://blog.kubernetes.io/2016/07/cross-cluster-services.html is wrong...).
It turns out that even the comment is not in sync with the code: two out of three records listed use `.federation`, while the author probably meant `.mydomain.com` (which has less chance of getting mixed up with `myfed`). I fixed those, as well as a few spelling and parenthesis errors. Hopefully this will help others save time and not scratch their heads.
cc @quinton-hoole
Automatic merge from submit-queue (batch tested with PRs 44645, 44639, 43510)
[Federation][kubefed]: Set apiserver to bind securely to 8443 instead of 443
On platforms like OpenShift that don't run containers as root by default, binding to ports < 1000 is not permitted. Having the apiserver bind to a high port means it can run with reduced privileges. The service will still expose the apiserver on 443, so this change shouldn't impact clients of the federation api.
cc: @kubernetes/sig-federation-pr-reviews @perotinus
On platforms like OpenShift that don't run containers as root by
default, binding to ports < 1000 is not permitted. Having the
apiserver bind to a high port means it can run with reduced
privileges. The service will still expose the apiserver on 443, so
this change shouldn't impact clients of the federation api.
Automatic merge from submit-queue
[Federation] Print out status updates while `kubefed init` is running
This is not an ideal final state–it does not address the appearance of hanging during long-running commands, for example–but it provides some level of information when the operations are successful.
See #41725.
**Release note**:
```release-note
Prints out status updates when running `kubefed init`
```
Automatic merge from submit-queue (batch tested with PRs 44469, 44566, 44467, 44526)
[Federation]Fix panic: index out of range
When the number of clusterStatusNew's Conditions is different from
the number of clusterStatusOld's Conditions, clustercontroller
will panic. We should check it before comparing.
When the number of clusterStatusNew's Conditions is different from
the number of clusterStatusOld's Conditions, clustercontroller
will panic. We should check it before comparing.
Automatic merge from submit-queue
dnsprovider: Avoid panic if fields are nil
The aws-sdk has some helper functions which should generally be used
whenever dereferencing an AWS provided pointer, in case the pointer is
nil, which would otherwise be a panic.
Issue https://github.com/kubernetes/kops/issues/2347
```release-note
dnsprovider: avoid panic if route53 fields are nil
```
The aws-sdk has some helper functions which should generally be used
whenever dereferencing an AWS provided pointer, in case the pointer is
nil, which would otherwise be a panic.
Issue https://github.com/kubernetes/kops/issues/2347
I went to look at the source comments, because the documentation is not exhaustive about what kind of DNS records are created for federated services (and http://blog.kubernetes.io/2016/07/cross-cluster-services.html is wrong...).
It turns out that even the comment is not in sync with the code: two out of three records listed use `.federation`, while the author probably meant `.mydomain.com` (which has less chance of getting mixed up with `myfed`). I fixed those, as well as a few spelling and parenthesis errors. Hopefully this will help others save time and not scratch their heads.
Automatic merge from submit-queue
[Federation][kubefed] Annotate all Federation API objects with the federation name and (if applicable) the cluster name.
Address part of #42324.
```release-note
Adds annotations to all Federation objects created by kubefed.
```
This change uses an adapter class to abstracts the interaction of the
secret controller with the secret api type. This is the first step to
creating a generic controller that can target any type for which an
adapter exists.
Automatic merge from submit-queue
[Federation] Add integration test for secrets
This PR adds an integration test for secrets that:
- performs create/read/update/delete on federation resources and validates that the changes are propagated to member clusters.
- uses an abstraction layer (fixture and adapter) to minimize the code required to support each federated type
- It should be possible to replace a test-specific adapter with a runtime adapter in the future (as per #41050)
- reuses fixture (federation api and clusters) across different resource types to minimize setup overhead
- on a fast machine, setup takes ~4s, and validating each type takes ~2s
- uses the [Subtest feature added in Go 1.7](https://blog.golang.org/subtests) to allow the test for a specific controller to be run in isolation
- ``make test-integration WHAT="federation -test.run=TestFederationCRUD/secret"``
Once this PR merges the test can be extended to target other federated types.
This PR targets #40705
cc: @kubernetes/sig-federation-pr-reviews @derekwaynecarr
Automatic merge from submit-queue
[Federation] Use cascading deletion when deleting resources in underlying clusters
The Federation control plane issues a delete command unless it wants to orphan the underlying per-cluster resource. When issuing that command, always set the orphanDependents to false.
/release-note-none
/sig-federation
Automatic merge from submit-queue (batch tested with PRs 44084, 42964)
Removing both finalizers in federation controllers in a single update
Fixes https://github.com/kubernetes/kubernetes/issues/43828
There is a bug right now where the controller fails to delete the object if one finalizer is removed and the second isnt.
This updates the code so that both the finalizers are removed in a single API call. Kept the code changes minimum to enable cherrypick in 1.6.x
cc @csbell @kubernetes/sig-federation-bugs
The Federation control plane issues a delete command unless it wants to orphan the underlying per-cluster resource. When issuing that command, always set the orphanDependents to false.
Automatic merge from submit-queue (batch tested with PRs 41297, 42638, 42666, 43039, 42567)
Delete offline restclient from clusterKubeClientMap
When federation controller manager checks cluster status, it will
delete the offline cluster from clusterSet, but do not delete the
corresponding restclient from the map clusterKubeClientMap for
the offline cluster. This patch can fix it.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
[Federation] Remove TODOs that are already implemented or are irrelevant now.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Update the description to match function
The description of NewReplicaSetController() does not match
its function, and the description of NewDeploymentController()
does not match its function. Let's update their descriptions.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
[Federation] Fix deletion logic in service controller
This is a regression from 1.5 exposed by cascading deletions. In order to apply updates, the service controller locks access to a cached service and spawns go routines without waiting for them. When updates and deletions arrive in quick succession, previous goroutines remain active and race with the deletion logic. Coupled with this, the service_helper was not re-evaluating the value of the DeletionTimestamp.
Without this patch, federation will sometimes leak resources at destruction time about half the time.
In e2e land, about 4-5 test runs cause service tests to eat up all global fwd-ing rules and in turn, every subsequent ingress test will fail until we manually clean up leaked resources. No possibility to go green in fed e2e until this is merged.
This is a regression from 1.5 exposed by cascading deltions. In order to apply updates, the service controller locks access to a cached service and spawns go routines without waiting for them. When updates and deletions arrive in quick succession, previous goroutines remain active and race with the deletion logic. Coupled with this, the service_helper was not re-evaluating the value of the DeletionTimestamp.
Without this patch, federation will sometimes leak resources at destruction time.
Automatic merge from submit-queue
Fix federated config map unit tests
Fixes#41419 and #42847 and possibly other issues in this area.
cc: @nikhiljindal @csbell @perotinus
When federation controller manager checks cluster status, it will
delete the offline cluster from clusterSet, but do not delete the
corresponding restclient from the map clusterKubeClientMap for
the offline cluster. This patch can fix it.
The unit test for the ingress controller was previously adding
a cluster twice, which resulted in a cluster being deleted and added
back. The deletion was racing the controller shutdown to close
informer channels. This change ensures that the informer clears its
map of informers when Stop() is called to prevent a double close, and
that the test no longer adds the cluster twice.
Automatic merge from submit-queue (batch tested with PRs 42642, 42899, 42922)
[Federation] Deployments unaware of ReadyReplicas
The Deployment controller was not propagating ReadyReplicas to underlying clusters causing these errors:
```
Error syncing cluster controller: Deployment.apps "federation-deployment" is invalid: status.availableReplicas: Invalid value: 5: cannot be greater than readyReplicas
```
This was caught in e2e testing and is a 1.6 regression for support that was added in #37959. Without this fix, users will be unable to scale up their deployments.
Automatic merge from submit-queue
[Federation] Kubefed Init should use the right RBAC API version clientset
**What this PR does / why we need it**:
Implements the need as described in https://github.com/kubernetes/kubernetes/issues/41263
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/41263
**Special notes for your reviewer**:
@madhusudancs @shashidharatd @marun
cc @kubernetes/sig-federation-bugs
**Release note**:
```
NONE
```
The Deployment controller was not propagating ReadyReplicas to underlying clusters causing these errors:
```
Error syncing cluster controller: Deployment.apps "federation-deployment" is invalid: status.availableReplicas: Invalid value: 5: cannot be greater than readyReplicas
```
This was caught in e2e testing and is a 1.6 regression for support that was added in #37959. Without this fix, users will be unable to scale up their deployments.
The description of NewReplicaSetController() does not match
its function, and the description of NewDeploymentController()
does not match its function. Let's update their descriptions.
Automatic merge from submit-queue
Add ProviderUid support to Federated Ingress
This PR (along with GLBC support [here](https://github.com/kubernetes/ingress/pull/278)) is a proposed fix for #39989. The Ingress controller uses a configMap reconciliation process to ensure that all underlying ingresses agree on a unique UID. This works for all of GLBC's resources except firewalls which need their own cluster-unique UID. This PR introduces a ProviderUid which is maintained and synchronized cross-cluster much like the UID. We chose to derive the ProviderUid from the cluster name (via md5 hash).
Testing here is augmented to guarantee that configMaps are adequately propagated prior to Ingress creation.
```release-note
Federated Ingress over GCE no longer requires separate firewall rules to be created for each cluster to circumvent flapping firewall health checks.
```
cc @madhusudancs @quinton-hoole