Automatic merge from submit-queue (batch tested with PRs 42877, 42853)
discriminate more when parsing kube-env :(
Exactly match the key. Right now CA_KEY matches ETCD_CA_KEY and we just pick the first because fml.
I HATE BASH
more fixes for kubelet rbac enablement upgrades.
Automatic merge from submit-queue (batch tested with PRs 42877, 42853)
Remove unused functions and make logs slightly better
Zero risk cleanup, removing function that are not used anymore, and adding few more logs to help debugging problems.
cc @aveshagarwal
Automatic merge from submit-queue (batch tested with PRs 36704, 42719)
Extend timeouts in taints test to account for slow Pod deletions
Fix#42685
Before merging this we need a consensus on what to do with slow Pod deletions.
Automatic merge from submit-queue
Use Prometheus instrumentation conventions
The `System` and `Subsystem` parameters are subject to removal.
(x-ref: https://github.com/prometheus/client_golang/issues/240)
All metrics should use base units, which is seconds in the duration
case.
Counters should always end in `_total` and metrics should avoid
referring to potential label dimensions. Those should rather be
mentioned in the documentation string.
@kubernetes/sig-instrumentation
Reference docs:
https://prometheus.io/docs/practices/instrumentation/https://prometheus.io/docs/practices/naming/
**Release note**:
```
Breaking change: Renamed REST client Prometheus metrics to follow the instrumentation conventions ("request_latency_microseconds" -> "rest_client_request_latency_seconds", "request_status_codes" -> "rest_client_requests_total"). Please update your alerting pipeline if you rely on them.
```
Automatic merge from submit-queue
e2e test: Log container output on TestContainerOutput error
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
This would help us to see what wrong in https://github.com/kubernetes/kubernetes/issues/40811 - a container is running there for 3 minutes and dies and we want to see what it did for these 3 minutes.
```release-note
NONE
```
Automatic merge from submit-queue
Add new DaemonSetStatus to kubectl printer and describer
@kargakis @lukaszo @kubernetes/sig-apps-pr-reviews @kubernetes/sig-cli-pr-reviews
```release-note
Add new DaemonSet status fields to kubectl printer and describer.
```
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
Automatic merge from submit-queue
[Federation] Prevent trailing periods in kube-dns federations domains
kubefed-level fix to catch cases where FEDERATIONS_DOMAIN_MAP is not set in the environment (i.e. CI).
Addresses https://github.com/kubernetes/kubernetes/issues/42809
The `System` and `Subsystem` parameters are subject to removal.
(x-ref: https://github.com/prometheus/client_golang/issues/240)
All metrics should use base units, which is seconds in the duration
case.
Counters should always end in `_total` and metrics should avoid
referring to potential label dimensions. Those should rather be
mentioned in the documentation string.
Automatic merge from submit-queue (batch tested with PRs 42811, 42859)
Change the junit file name format to `junit_image-name_id.xml`,
With this, the junit file name will be `junit_image-name_id.xml:
```
junit_containervm_id.xml
junit_coreos-alpha_id.xml
junit_gci_id.xml
junit_ubuntu-docker10_id.xml
junit_ubuntu-docker12_id.xml
```
The test infra team will use the `image-name` inside the junit file name and replace the original `[1] [2] [3] ..` with the actual image name.
This will make it a little bit easier for debugging.
/cc @dchen1107 @krzyzacy @kubernetes/sig-node-pr-reviews
/cc @kubernetes/release-maintainers This is a minor test only change to make it easier to debug issues.
Automatic merge from submit-queue (batch tested with PRs 42811, 42859)
Validation PVs for mount options
We are going to move the validation in its own package and we will be calling validation for individual volume types as needed.
Fixes https://github.com/kubernetes/kubernetes/issues/42573
Automatic merge from submit-queue (batch tested with PRs 42024, 42780, 42808, 42640)
kubectl: respect DaemonSet strategy parameters for rollout status
It handles "after-merge" comments from #41116
cc @kargakis @janetkuo
I will add one more e2e test later. I need to handle some in company stuff.
Automatic merge from submit-queue (batch tested with PRs 42024, 42780, 42808, 42640)
Node controller test flake 39975 with delay for try function
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#39975
/cc @ncdc @gmarek @liggitt
Automatic merge from submit-queue (batch tested with PRs 42024, 42780, 42808, 42640)
Handle NPD during cluster upgrade.
Generate NPD token during upgrade.
I could not fully verify this change because of https://github.com/kubernetes/kubernetes/issues/42199. However, at least I tried upgrade master, and the corresponding environment variables are correctly generated.
```
...
ENABLE_NODE_PROBLEM_DETECTOR: 'standalone'
...
KUBELET_TOKEN: 'PKNgAaVXeL3VojND2s0KMleELjzGK0oW'
```
@maisem @dchen1107
Automatic merge from submit-queue
Remove VCenterPort from vsphere cloud provider.
**What this PR does / why we need it**:
Address a bug inside vsphere cloud provider when a port number other than 443 is specified inside the config file.
The url which is used for communicating with govmomi should not include port number.
A port number other than 443 will result in 404 error.
VCenterPort stays in VSphereConfig structure for backward compatibility.
**Which issue this PR fixes** : fixes https://github.com/kubernetes/kubernetes-anywhere/issues/338
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)
Dropped docker 1.9.x support. Changed the minimumDockerAPIVersion to
1.22
cc/ @Random-Liu @yujuhong
We talked about dropping docker 1.9.x support for a while. I just realized that we haven't really done it yet.
```release-note
Dropped the support for docker 1.9.x and the belows.
```
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)
Implement automated downgrade testing.
Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)
Create DefaultPodDeletionTimeout for e2e tests
In our e2e and e2e_node tests, we had a number of different timeouts for deletion.
Recent changes to the way deletion works (#41644, #41456) have resulted in some timeouts in e2e tests. #42661 was the most recent fix for this.
Most of these tests are not meant to test pod deletion latency, but rather just to clean up pods after a test is finished.
For this reason, we should change all these tests to use a standard, fairly high timeout for deletion.
cc @vishh @Random-Liu
Automatic merge from submit-queue
Don't wait for the final deletion of pod
The final deletion of the pod depends on kubelet and other components operating correctly. The purpose of this e2e test is verifying the clientset can handle deleteOptions correctly, so waiting for the deletionTimestamp and deletionGraceperiod get set is good enough.
In the long run, we should move this set of e2e tests to integration tests.
Fix#42724#42646
cc @marun
Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
Automatic merge from submit-queue
add debugging to the client watch test
Adds debugging information for https://github.com/kubernetes/kubernetes/issues/42724. I suspect that the watch is closing early, but I'd like proof before I consider things like retrying the list and doing another watch to observe the delete. I'm not even sure that would satisfy the test
It seems like a flaky way to build the test. Why wouldn't we delete non-gracefully?
@kubernetes/sig-api-machinery-misc @caesarxuchao
@wojtek-t saw you just hit this if you wanted to take a quick look at the debugging I added.
Automatic merge from submit-queue
Ensure a fixed godep version in hack/*-godep*.sh
No godep pinning asks for trouble when godep changes behaviour once again.
Moreover, call `hack/godep-restore.go` from `hack/update-all-staging.sh`. This was an actual bug.
Automatic merge from submit-queue (batch tested with PRs 42786, 42553)
Updated auto generated protobuf codes.
Generated by `./hack/update-generated-protobuf-dockerized.sh` in Mac.
Automatic merge from submit-queue (batch tested with PRs 42786, 42553)
Updated comments for TaintBasedEvictions.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
Automatic merge from submit-queue (batch tested with PRs 42728, 42278)
[Federation] Create integration test fixture for api
This PR factors a reusable fixture for the federation api server out of the existing integration test.
Targets #40705
cc: @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue
Don't try to run hack/verify-staging-* on dirty repository
When the repo is dirty after running all `update-*` scripts in `hack/update-all.sh`, the staging verify scripts still fail. This PR removes these from `hack/update-all.sh`. Instead give useful instructions or continue automatically with `hack/update-all-staging.sh`.
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)
Fixed potential OutOfSync of nodeInfo.
The cloned NodeInfo still share the same resource objects in cache; it may make `requestedResource` and Pods OutOfSync, for example, if the pod was deleted, the `requestedResource` is updated by Pods are not in cloned info. Found this when investigating #32531 , but seems not the root cause, as nodeInfo are readonly in predicts & priorities.
Sample codes for `&(*)`:
```
package main
import (
"fmt"
)
type Resource struct {
A int
}
type Node struct {
Res *Resource
}
func main() {
r1 := &Resource { A:10 }
n1 := &Node{Res: r1}
r2 := &(*n1.Res)
r2.A = 11
fmt.Printf("%t, %d %d\n", r1==r2, r1, r2)
}
```
Output:
```
true, &{11} &{11}
```