Update EnsureLoadBalancer/UpdateLoadBalancer API to use node objects.
In particular, this allows us to take the node addresses directly from
node.Status.Addresses and avoids a name -> instance lookup.
Many providers need to do some sort of node name -> IP or instanceID
lookup before they can use the list of hostnames passed to
EnsureLoadBalancer/UpdateLoadBalancer.
This change just passes the full Node object instead of simply the node
name, allowing providers to use the node's provider ID and cached
addresses without additional lookups. Using `node.Name` reproduces the
old behaviour.
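For illustration, a minimal sketch of how a cloud provider could consume the Node objects it now receives (the import path is the 1.5-era one and the helper name is an assumption, not the actual cloudprovider code):
```go
// Sketch only: read addresses straight from the Node objects passed to
// EnsureLoadBalancer/UpdateLoadBalancer instead of resolving hostnames.
package example

import "k8s.io/kubernetes/pkg/api/v1"

// nodeInternalIPs collects the cached InternalIP of each node and falls back
// to node.Name (the old hostname-based behaviour) when no address is recorded.
func nodeInternalIPs(nodes []*v1.Node) []string {
	ips := make([]string, 0, len(nodes))
	for _, node := range nodes {
		found := false
		for _, addr := range node.Status.Addresses {
			if addr.Type == v1.NodeInternalIP {
				ips = append(ips, addr.Address)
				found = true
				break
			}
		}
		if !found {
			ips = append(ips, node.Name) // old behaviour: use the node name
		}
	}
	return ips
}
```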
Automatic merge from submit-queue
Collect logs for dead kubelets too
Collect logs via journalctl if journalctl is installed, rather than only if
kubelet.service is running. The old way resulted in us losing logs any
time the kubelet was failing. This, of course, breaks on a node if
someone decided to install journalctl but not use it. But that is not
the case on any of the images used by cluster-level tests at present.
FYI @Random-Liu: I am not sure whether `which journalctl` implies that journalctl is actually used on all of the nodes we test in the node-e2e suites. This may be of consequence if we move to using `cluster/log-dump.sh` to scrape logs for node-e2e.
P0 because this is somewhat in the way of debugging https://github.com/kubernetes/kubernetes/issues/33882
@jessfraz @saad-ali This should be cherry-picked to 1.4 and 1.5 as well.
Automatic merge from submit-queue
When --grace-period=0 is provided, wait for deletion
The grace-period is automatically set to 1 unless --force is provided, and the client waits until the object is deleted.
This preserves backwards compatibility with 1.4 and earlier. It does not handle scenarios where the object is deleted and a new object is created with the same name because we don't have the initial object loaded (and that's a larger change for 1.5).
Fixes #37117 by relaxing the guarantees provided.
```release-note
When deleting an object with `--grace-period=0`, the client will begin a graceful deletion and wait until the resource is fully deleted. To force deletion, use the `--force` flag.
```
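A minimal sketch of that client-side rule (the helper is illustrative, not kubectl's actual code):
```go
// Sketch only: when --grace-period=0 is requested without --force, kubectl
// performs a graceful deletion instead; the caller then waits for the object
// to actually disappear.
package example

// effectiveGracePeriod bumps a requested grace period of 0 to 1 unless the
// user also passed --force.
func effectiveGracePeriod(gracePeriod int, force bool) int {
	if gracePeriod == 0 && !force {
		return 1
	}
	return gracePeriod
}
```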
Automatic merge from submit-queue
log-dump: Change USE_KUBECTL path to instead call out to a custom function
**What this PR does / why we need it**: The LOG_DUMP_USE_KUBECTL path is fine, once the cluster is up. However, we've had a continuous low-grade Up flake in the kops builds, so I'd like to grab logs using the aws CLI.
This makes log-dump.sh extensible, so you can do:
```
function log_dump_custom_get_instances() { ... }
export -f log_dump_custom_get_instances
go run hack/e2e.go ...
```
Automatic merge from submit-queue
Fixes Addon Manager's pruning issue for old Deployments
Fixes #37641.
Attaches the `last-applied` annotations to the existing Deployments for pruning.
The following images are built and pushed:
- gcr.io/google-containers/kube-addon-manager:v6.1
- gcr.io/google-containers/kube-addon-manager-amd64:v6.1
- gcr.io/google-containers/kube-addon-manager-arm:v6.1
- gcr.io/google-containers/kube-addon-manager-arm64:v6.1
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.1
@mikedanese
cc @saad-ali @krousey
Automatic merge from submit-queue
Fix infinite loop in federated ingress controller
Previously, the ingress controller was constantly scheduling reconciliation, even if no updates were needed. That behavior created a big mess in the logs and consumed resources.
This PR also fixes the stop function for the federated ingress controller.
cc: @nikhiljindal @madhusudancs
Automatic merge from submit-queue
Revision handling in federated deployment controller
The Deployment controller in regular Kubernetes automatically adds a revision annotation to each Deployment. This causes a bit of confusion in the controller and tests. This PR skips the revision annotation in checks. In the next K8S release we will need to have better support for deployment revisions.
Helps with #36588
cc: @nikhiljindal @madhusudancs
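A minimal sketch of a comparison that ignores the revision annotation (the DeepEqual comparison and helper names illustrate the approach; they are not the controller's exact code):
```go
// Sketch only: treat two Deployments as equivalent even when the regular
// Deployment controller has bumped the revision annotation on one of them.
package example

import (
	"reflect"

	"k8s.io/kubernetes/pkg/apis/extensions/v1beta1"
)

const revisionAnnotation = "deployment.kubernetes.io/revision"

// deploymentsEquivalent compares specs and annotations, skipping the revision
// annotation so a revision bump alone does not trigger reconciliation.
func deploymentsEquivalent(a, b *v1beta1.Deployment) bool {
	return reflect.DeepEqual(stripRevision(a.Annotations), stripRevision(b.Annotations)) &&
		reflect.DeepEqual(a.Spec, b.Spec)
}

func stripRevision(annotations map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range annotations {
		if k != revisionAnnotation {
			out[k] = v
		}
	}
	return out
}
```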
Automatic merge from submit-queue
fix typo in `kubectl proxy` command line help
**What this PR does / why we need it**: improves docs
**Which issue this PR fixes**: none
**Special notes for your reviewer**: doc only
**Release note**:
```release-note
```
(docs only) Fixed the port from 8011 to 8001 (the default) because that particular line does not specify a port, so the default will be used.
Automatic merge from submit-queue
Stop deleting underlying services when federation service is deleted
Fixes https://github.com/kubernetes/kubernetes/issues/36799
Fixing federation service controller to not delete services from underlying clusters when federated service is deleted.
None of the federation controllers should do this unless explicitly asked by the user using DeleteOptions. This is the only federation controller that does that.
cc @kubernetes/sig-cluster-federation @madhusudancs
```release-note
federation service controller: stop deleting services from underlying clusters when federated service is deleted.
```
Automatic merge from submit-queue
hold namespaces briefly before processing deletion
possible fix for #36891
In HA scenarios (either HA apiserver or HA etcd), it is possible for deletion of resources during namespace cleanup to race with creation of objects in the terminating namespace.
HA master timeline:
1. "delete namespace n" API call goes to apiserver 1, deletion timestamp is set in etcd
2. namespace controller observes namespace deletion, starts cleaning up resources, lists deployments
3. "create deployment d" API call goes to apiserver 2, gets persisted to etcd
4. apiserver 2 observes namespace deletion, stops allowing new objects to be created
5. namespace controller finishes deleting listed deployments, deletes namespace
HA etcd timeline:
1. "create deployment d" API call goes to apiserver, gets persisted to etcd
2. "delete namespace n" API call goes to apiserver, deletion timestamp is set in etcd
3. namespace controller observes namespace deletion, starts cleaning up resources, lists deployments
4. list call goes to non-leader etcd member that hasn't observed the new deployment or the deleted namespace yet
5. namespace controller finishes deleting the listed deployments, deletes namespace
In both cases, simply waiting before cleaning up the namespace (either for etcd members to observe objects created at the last second in the namespace, or for other apiservers to observe the namespace move to the terminating phase and disallow additional creations) resolves the issue.
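A minimal sketch of the holding behaviour, assuming a delaying work queue and an illustrative hold duration:
```go
// Sketch only: hold a freshly-terminating namespace for a short period before
// starting cleanup, giving other apiservers and etcd members time to observe
// the deletion. The queue usage and 5-second hold are illustrative.
package example

import (
	"time"

	"k8s.io/kubernetes/pkg/util/workqueue"
)

const namespaceDeletionHold = 5 * time.Second // assumed hold duration

func enqueueTerminatingNamespace(queue workqueue.DelayingInterface, namespace string, deletedAt time.Time) {
	if wait := namespaceDeletionHold - time.Since(deletedAt); wait > 0 {
		// Not old enough yet: requeue after the remaining hold period.
		queue.AddAfter(namespace, wait)
		return
	}
	queue.Add(namespace) // safe to start deleting the namespace's resources
}
```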
Possible other fixes:
* do a second sweep of objects before deleting the namespace
* have the namespace controller check for and clean up objects in namespaces that no longer exist
* ...?
Automatic merge from submit-queue
make drain retry forever and use a new grace period
Implemented the 1st approach according to https://github.com/kubernetes/kubernetes/issues/37460#issuecomment-263437516
1) Make drain retry forever if the error is always Too Many Requests (429), which is generated by a Pod Disruption Budget.
2) Use a new grace period per #37460
3) Update the message printed out when successfully deleting or evicting a pod.
fixes #37460
cc: @davidopp @erictune
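A minimal sketch of the retry-forever behaviour in (1), with an illustrative eviction callback and backoff:
```go
// Sketch only: keep retrying an eviction for as long as the API keeps
// answering 429 (Too Many Requests), which the eviction subresource returns
// while a PodDisruptionBudget forbids the eviction.
package example

import (
	"net/http"
	"time"

	apierrors "k8s.io/kubernetes/pkg/api/errors"
)

// isTooManyRequests reports whether err is an API StatusError with HTTP 429.
func isTooManyRequests(err error) bool {
	statusErr, ok := err.(*apierrors.StatusError)
	return ok && statusErr.Status().Code == http.StatusTooManyRequests
}

func evictWithRetry(evict func() error) error {
	for {
		err := evict()
		if err == nil {
			return nil
		}
		if !isTooManyRequests(err) {
			return err // any other error is fatal
		}
		// The PDB does not allow the eviction right now; wait and try again.
		time.Sleep(5 * time.Second)
	}
}
```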
Automatic merge from submit-queue
Skip rather than fail networking tests on single node
**What this PR does / why we need it**:
Needed for the general e2e tidying we need to do for flaky slow tests, imo pre 1.5, see #31402 and so on.
**Which issue this PR fixes**:
Don't fail multinode tests on a single-node cluster; skip them instead.
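A minimal sketch of the skip-instead-of-fail check (framework helpers shown as I understand them; treat the exact names as assumptions):
```go
// Sketch only: skip a multi-node networking spec when the cluster has fewer
// than two schedulable nodes instead of letting cross-node assertions fail.
package example

import "k8s.io/kubernetes/test/e2e/framework"

func skipUnlessMultiNode(f *framework.Framework) {
	nodes := framework.GetReadySchedulableNodesOrDie(f.ClientSet)
	if len(nodes.Items) < 2 {
		framework.Skipf("requires at least 2 schedulable nodes, found %d", len(nodes.Items))
	}
}
```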
Automatic merge from submit-queue
kubemark: add KUBEMARK_NUM_NODES and KUBEMARK_MASTER_SIZE config
A lot of test-infra scripts use these two parameters and repeatedly set NUM_NODES and MASTER_SIZE before running kubemark. When we try to use those scripts, we need to set these manually again and again.
It would come in handy if the kubemark config could take these into account and reduce duplication.
Automatic merge from submit-queue
Set Dashboard UI version to v1.5.0-beta1
There will be one more such PR coming for the 1.5 release, in one week.
Setting release note to none. Will set notes for the final version PR.
Github release info:
https://github.com/kubernetes/dashboard/releases/tag/v1.5.0-beta1
Automatic merge from submit-queue
Fix nil pointer dereference in test framework
Checking the `result.Code` prior to `err` in the if statement causes a panic if result is `nil`. It turns out the formatting of the error is already in `IssueSSHCommandWithResult`, so removing redundant code is enough to fix the issue. Logging the SSH result was also redundant, so I removed that as well.
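A minimal sketch of the nil-dereference pattern being avoided (the types and helper are illustrative, not the framework's exact code):
```go
// Sketch only: if the SSH helper returns a nil result together with an error,
// reading result.Code before checking err panics; checking err first is the fix.
package example

import "fmt"

type sshResult struct {
	Code   int
	Stderr string
}

func issueSSHCommand(runSSH func(cmd string) (*sshResult, error), cmd string) error {
	result, err := runSSH(cmd)
	// Buggy order would be: if result.Code != 0 || err != nil { ... }
	if err != nil {
		return fmt.Errorf("SSH command %q failed: %v", cmd, err)
	}
	if result.Code != 0 {
		return fmt.Errorf("SSH command %q exited with code %d: %s", cmd, result.Code, result.Stderr)
	}
	return nil
}
```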
Automatic merge from submit-queue
Node Conformance Test: Final cleanup for node conformance test.
This PR updates the node conformance test to fit recent changes.
* Remove `--manifest-path` because the test will get kubelet configuration through `/configz` now. https://github.com/kubernetes/kubernetes/pull/36919
* Add `$TEST_ARGS` so that we can override arguments inside the container.
* Fix a bug in garbage_collector_test.go which caused the framework to try to connect to docker regardless of whether the test was run. @dashpole
* Add `${REGISTRY}/node-test:${VERSION}` for convenience.
* Bump up the image version to `0.2` (the one released with v1.4 is `v0.1`).
I've run the test both with the `run_test.sh` script and directly with `docker run`. Both of them passed.
After this gets merged, I'll build and push the new test image.
@dchen1107
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Per-container inode accounting test
The test spins up two pods: one pod uses all inodes, the other pod acts normally. The test ensures the correct pressure is encountered, the inode-hog pod is evicted, and the pod acting normally is not evicted. The test also ensures conditions return to normal after the test.
Automatic merge from submit-queue
Deploy a default StorageClass instance on AWS and GCE
This needs a newer kubectl in the kube-addon-manager container. It's quite tricky to test, as I cannot push a new container image to gcr.io and I must copy the newer container manually.
cc @kubernetes/sig-storage
**Release note**:
```release-note
Kubernetes now installs a default StorageClass object when deployed on AWS, GCE and
OpenStack with kube-up.sh scripts. This StorageClass will automatically provision
a PersistentVolume in the corresponding cloud for a PersistentVolumeClaim that cannot be
satisfied by any existing matching PersistentVolume in Kubernetes.
To override this default provisioning, administrators must manually delete this default StorageClass.
```
Automatic merge from submit-queue
docs: devel: Add some notes about OWNERS process
docs: devel: point people at place for OWNERS status
All of the tracking is happening at https://github.com/kubernetes/contrib/issues/1389; point people at it.
docs: devel: describe the current state of adding approvers
Document that we are currently holding off on adding new approvers until the reviewers process is in place, and set a target deadline.
cc @calebamiles @bgrant0607 @apelisse
Automatic merge from submit-queue
Removing references to the Google CLA & adding FAQ link
Previously reviewed at: https://github.com/kubernetes/kubernetes/pull/37028
Closed the old one, as it was a branch on the main repo, making it difficult to squash.
Automatic merge from submit-queue
Use gsed on the Mac.
**What this PR does / why we need it**: Fixes node upgrades when run from a Mac.
**Which issue this PR fixes**: fixes #37474
**Special notes for your reviewer**:
Automatic merge from submit-queue
Fixes dns autoscaling test flakes
Fixes #36457 and fixes #36569.
#36457 is a flake due to the 10-minute timeout for scaling down the cluster. Changed to use `scaleDownTimeout` from [test/e2e/cluster_size_autoscaling.go](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/cluster_size_autoscaling.go), which is 15 minutes.
The failure in #36569 is because we get the number of schedulable nodes at the beginning of the test and assume it will not change unless we manually change the cluster size. But the logs below indicate that nodes may become ready after the test has begun.
```
[BeforeEach] [k8s.io] DNS horizontal autoscaling
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/dns_autoscaling.go:71
Nov 10 00:36:26.951: INFO: Condition Ready of node jenkins-e2e-minion-group-x6w1 is false instead of true. Reason: KubeletNotReady, message: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR
STEP: Replace the dns autoscaling parameters with testing parameters
Nov 10 00:36:26.961: INFO: DNS autoscaling ConfigMap updated.
STEP: Wait for kube-dns scaled to expected number
Nov 10 00:36:26.961: INFO: Waiting up to 5m0s for kube-dns reach 8 replicas
...
Expected error:
<*errors.errorString | 0xc420b17ef0>: {
s: "err waiting for DNS replicas to satisfy 8, got 9: timed out waiting for the condition",
}
err waiting for DNS replicas to satisfy 8, got 9: timed out waiting for the condition
not to have occurred
```
This fix puts the logic of counting schedulable nodes into the polling loop. By doing so, the test will have the correct expected replica count even if the number of schedulable nodes changes in between.
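A minimal sketch of that polling structure, with illustrative callbacks standing in for the test's real lookups:
```go
// Sketch only: recompute the expected replica count from the current number of
// schedulable nodes on every poll, instead of capturing it once before the loop.
package example

import (
	"time"

	"k8s.io/kubernetes/pkg/util/wait"
)

func waitForDNSReplicas(
	schedulableNodes func() (int, error),
	dnsReplicas func() (int, error),
	expectedFor func(nodes int) int,
) error {
	return wait.Poll(2*time.Second, 5*time.Minute, func() (bool, error) {
		nodes, err := schedulableNodes()
		if err != nil {
			return false, err
		}
		current, err := dnsReplicas()
		if err != nil {
			return false, err
		}
		// A node becoming ready mid-test changes the target instead of
		// invalidating a stale expectation.
		return current == expectedFor(nodes), nil
	})
}
```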
@bowei @bprashanth
---
Updates: all `ExpectNoError(err)` are changed to `Expect(err).NotTo(HaveOccurred())`
Automatic merge from submit-queue
Bumps up Addon Manager to v6.0 with full support of kubectl apply
The following images are built and pushed:
- gcr.io/google-containers/kube-addon-manager:v6.0
- gcr.io/google-containers/kube-addon-manager-amd64:v6.0
- gcr.io/google-containers/kube-addon-manager-arm:v6.0
- gcr.io/google-containers/kube-addon-manager-arm64:v6.0
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.0
The actual change made is upgrading the kubectl version from `v1.5.0-alpha.1` to `v1.5.0-beta.1`, which was released today.
@mikedanese
@saad-ali This needs to get into 1.5 because Addon Manager v6.0-alpha.1 (currently in use) does not have full support of `kubectl apply --prune`.
Automatic merge from submit-queue
Better waiting for watch event delivery in cacher
@lavalamp - I think we should do something simple for now (and merge for 1.5), and do something a bit more sophisticated right after 1.5, WDYT?
Automatic merge from submit-queue
Fix concurrent read/write to map error in kubelet
Fixes #37560.
The concurrent read/write is to the pod annotations. The call in apiserver.go reads the annotations, and config.go writes them. I moved the reads to config.go to avoid the race.
Automatic merge from submit-queue
add wrapper to provide stderr on command errors
The Go standard library doesn't include stderr in the error message, but in many cases it is present: https://golang.org/src/os/exec/exec.go#L389. This adds a wrapper to display that information. I've added it in one spot where the kops test is having trouble. If it works well, we can add it elsewhere.
@wojtek-t ptal
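A minimal sketch of such a wrapper (the function name is illustrative, not the helper added by this PR):
```go
// Sketch only: run a command and, on failure, include the captured stderr in
// the returned error, since the stdlib error alone only says e.g. "exit status 1".
package example

import (
	"bytes"
	"fmt"
	"os/exec"
)

func runWithStderr(cmd *exec.Cmd) error {
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("error running %v: %v; stderr: %q", cmd.Args, err, stderr.String())
	}
	return nil
}
```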
Automatic merge from submit-queue
Fixing the logic to select first cluster in federated ingress controller
Ref https://github.com/kubernetes/kubernetes/issues/36074.
Before this change, the ingress controller was using the cluster with clusterIndex = 0 as the first cluster to create the ingress in.
But the ordering of clusters can change, and hence the ingress controller ended up creating the ingress in multiple clusters.
This PR fixes it by using an annotation on the federated ingress. The controller now picks a cluster randomly as the first cluster and creates the ingress there. This cluster's name is stored as an annotation on the federated ingress. The controller does not create an ingress in any other cluster if this annotation is set on the federated ingress and the IP has not been propagated. Once the IP has been propagated, the controller creates the ingress in all clusters.
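A minimal sketch of that selection logic (the annotation key, the random choice, and the helper signature are assumptions, not the controller's actual code):
```go
// Sketch only: pick one cluster the first time, remember it in an annotation,
// and only create the ingress elsewhere once its IP has propagated.
package example

import (
	"math/rand"

	"k8s.io/kubernetes/pkg/apis/extensions/v1beta1"
)

const firstClusterAnnotation = "federation.kubernetes.io/first-cluster" // assumed key

func clustersToCreateIn(ingress *v1beta1.Ingress, clusters []string, ipPropagated bool) []string {
	first, ok := ingress.Annotations[firstClusterAnnotation]
	if !ok {
		// First reconciliation: pick one cluster at random and remember it.
		first = clusters[rand.Intn(len(clusters))]
		if ingress.Annotations == nil {
			ingress.Annotations = map[string]string{}
		}
		ingress.Annotations[firstClusterAnnotation] = first
	}
	if !ipPropagated {
		return []string{first} // only the first cluster until the IP is known
	}
	return clusters // IP propagated: create the ingress in all clusters
}
```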
cc @kubernetes/sig-cluster-federation @madhusudancs