Automatic merge from submit-queue (batch tested with PRs 40978, 40994, 41008, 40622)
Refactored kubemark code into provider-specific and provider-independent parts [Part-2]
Applying part of the changes of PR https://github.com/kubernetes/kubernetes/pull/39033 (which refactored kubemark code completely). The changes included in this PR are:
- Added test/kubemark/skeleton/util.sh which defines a well-commented interface that any cloud-provider should implement to run kubemark.
This includes functions like creating the master machine instance along with its resources, remotely executing a given command on the master (like ssh), scp, deleting the master instance and its resources.
All these functions have to be over-ridden by each cloud provider inside the file /test/kubemark/$CLOUD_PROVIDER/util.sh
- Implemented the above mentioned interface for gce in /test/kubemark/$CLOUD_PROVIDER/util.sh
- Made start- and stop- kubemark scripts (almost) provider independent by making them source the interface based on cloud provider.
@kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Kubeadm discovery remove error passing
**What this PR does / why we need it**: In the app/discovery there is some confusion about the passing of error values created in the discovery/token, discovery/https/ and discovery/file pkgs. Since they always return `nil` , it was very confusing in discovery/flags.go why to propagate them up as if there was a chance for them to return a value other than `nil`. This change makes it much more clear what is being passed.
I noticed this as I was making a sweep through trying to add more unit tests and it was very confusing to read the code.
**Which issue this PR fixes** : fixes #https://github.com/kubernetes/kubeadm/issues/141
**Special notes for your reviewer**: /cc @luxas @pires
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
kubeadm: Remove the kubernetes.io/cluster-service label from the Deployment templates
**What this PR does / why we need it**:
As discussed on Slack, these labels have no function when not using the addon-manager, so it's best to remove them to avoid confusion.
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@thockin @mikedanese @pires @MrHohn @bowei @dmmcquay @deads2k @philips
After launching the services, local-up-cluster.sh tells the
user how to configure kubectl to access it. The instructions
for 'set-credentials' enable plain password auth, but the
services are configured to mandate client certificate auth.
As a result it is not possible to access the cluster with
the instructions printed.
The use of client certs by default was added in
commit a1b17db458
Author: Dr. Stefan Schimanski <sttts@redhat.com>
Date: Sat Nov 12 23:09:04 2016 +0100
Configure client certs in local-cluster-up.sh
and the instructions were correctly updated to refer to
client certificates.
The changed instructions were (mistakenly) reverted though
when the following commit was merged:
commit 72e0e91b5e
Author: xilabao <chenr.fnst@cn.fujitsu.com>
Date: Fri Dec 2 11:04:25 2016 +0800
change prompt for enabling RBAC on local-up-cluster
Fixes: #40192
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Automatic merge from submit-queue
Using API_HOST_IP to do apiserver health check.
In `hack/local-up-cluster.sh`, it's better to use `API_HOST_IP` to do apiserver health check.
Automatic merge from submit-queue
[Federation][kubefed] Add option to expose federation apiserver on nodeport service
**What this PR does / why we need it**:
This PR adds an option to kubefed to expose federation api server over nodeport. This can be useful to deploy federation in non-cloud environments. This PR is target to address #39271
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```
[Federation] kubefed init learned a new flag, `--api-server-service-type`, that allows service type to be specified for the federation API server.
[Federation] kubefed init also learned a new flag, `--api-server-advertise-address`, that allows specifying advertise address for federation API server in case the service type is NodePort.
```
@kubernetes/sig-federation-misc @madhusudancs
Automatic merge from submit-queue
Add SIG to test owners
**What this PR does / why we need it**:
This PR adds a `sig` column to the test owners file generation script.
A problem experienced with the current owners file is that since members are auto-assigned there are times where tests are assigned to non-active users who don't follow up to notifications to fix flakes. By assigning a SIG to each test we can hold a group we know is active responsible for taking care of flakes it's less likely that flakes will fall through the cracks.
**Special notes for your reviewer**:
* A companion PR will go into *kubernetes/contrib* adding support for mungers parsing this new column.
* Another PR in contrib will add labeling GitHub flake issues with the appropriate SIG
* Currently SIGs are not labeled, this will be added in another PR where SIG determinations can be discussed
@saad-ali @pwittrock
The clientset will throw an error for aggregated apiservers because the
clientset looks for specific versions of apis that are compiled into
the client. These will be missing from aggregated apiservers.
The discoveryclient is fully dynamic and does not rely on compiled
in apiversions.
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)
Extract util used by jsonmergepatch and SMPatch
followup https://github.com/kubernetes/kubernetes/pull/40666#discussion_r99198931
Extract some util out of the `strategicMergePatch` to make `jsonMergePatch` doesn't depend on `strategicMergePatch`.
```release-note
None
```
cc: @liggitt
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)
Rename experimental-cgroups-per-pod flag
**What this PR does / why we need it**:
1. Rename `experimental-cgroups-per-qos` to `cgroups-per-qos`
1. Update hack/local-up-cluster to match `CGROUP_DRIVER` with docker runtime if used.
**Special notes for your reviewer**:
We plan to roll this feature out in the upcoming release. Previous node e2e runs were running with this feature on by default. We will default this feature on for all e2es next week.
**Release note**:
```release-note
Rename --experiemental-cgroups-per-qos to --cgroups-per-qos
```
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)
PV E2E: provide each spec with a fresh nfs host
**What this PR does / why we need it**:
PersistentVolume e2e currently reuses an NFS host pod created at the start of the suite and accessed by each test. This is far less favorable than using a fresh volume per test. Additionally, this guards against the volume host pod or it's kubelet being disrupted, which has led to flakes.
```release-note-none
```
Automatic merge from submit-queue (batch tested with PRs 40289, 40877, 40879, 39972, 40942)
Remove the temporary fix for pre-1.0 mirror pods
The fix was introduced to fix#15960 for pre-1.0 pods. It should be safe to remove
this fix now.
Automatic merge from submit-queue (batch tested with PRs 40906, 40924, 40938, 40902, 40911)
federation: Updating deletion helper to add both finalizers in a single update
Fixes https://github.com/kubernetes/kubernetes/issues/40837
cc @mwielgus @csbell
Automatic merge from submit-queue (batch tested with PRs 40906, 40924, 40938, 40902, 40911)
print apiserver log location on apiserver error
**What this PR does / why we need it**:
Improve user experience. Attempt to direct user to logs of failing component.
**Special notes for your reviewer**:
In addition to failure, point to logs so that a user can attempt to self remedy and have more information available to debug immediately. A user may not know that the failing component has logs.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40906, 40924, 40938, 40902, 40911)
Add [Flaky] tag to persistent volumes tests
**What this PR does / why we need it**:
Persistent Volume tests continue to flake in CI.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40906, 40924, 40938, 40902, 40911)
Check whether apiversions is empty
What this PR does / why we need it:
#39719 check whether apisversions get from /api is empty
Special notes for your reviewer:
@caesarxuchao
Automatic merge from submit-queue
Add an upgrade test for secrets.
**What this PR does / why we need it**: This PR adds an upgrade test for secrets. It creates a secret and makes sure that pods can consume it before an after an upgrade.
Automatic merge from submit-queue
CRI: Handle cri in-place upgrade
Fixes https://github.com/kubernetes/kubernetes/issues/40051.
## How does this PR restart/remove legacy containers/sandboxes?
With this PR, dockershim will convert and return legacy containers and infra containers as regular containers/sandboxes. Then we can rely on the SyncPod logic to stop the legacy containers/sandboxes, and the garbage collector to remove the legacy containers/sandboxes.
To forcibly trigger restart:
* For infra containers, we manually set `hostNetwork` to opposite value to trigger a restart (See [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L389))
* For application containers, they will be restarted with the infra container.
## How does this PR avoid extra overhead when there is no legacy container/sandbox?
For the lack of some labels, listing legacy containers needs extra `docker ps`. We should not introduce constant performance regression for legacy container cleanup. So we added the `legacyCleanupFlag`:
* In `ListContainers` and `ListPodSandbox`, only do extra `ListLegacyContainers` and `ListLegacyPodSandbox` when `legacyCleanupFlag` is `NotDone`.
* When dockershim starts, it will check whether there are legacy containers/sandboxes.
* If there are none, it will mark `legacyCleanupFlag` as `Done`.
* If there are any, it will leave `legacyCleanupFlag` as `NotDone`, and start a goroutine periodically check whether legacy cleanup is done.
This makes sure that there is overhead only when there are legacy containers/sandboxes not cleaned up yet.
## Caveats
* In-place upgrade will cause kubelet to restart all running containers.
* RestartNever container will not be restarted.
* Garbage collector sometimes keep the legacy containers for a long time if there aren't too many containers on the node. In that case, dockershim will keep performing extra `docker ps` which introduces overhead.
* Manually remove all legacy containers will fix this.
* Should we garbage collect legacy containers/sandboxes in dockershim by ourselves? /cc @yujuhong
* Host port will not be reclaimed for the lack of checkpoint for legacy sandboxes. https://github.com/kubernetes/kubernetes/pull/39903 /cc @freehan
/cc @yujuhong @feiskyer @dchen1107 @kubernetes/sig-node-api-reviews
**Release note**:
```release-note
We should mention the caveats of in-place upgrade in release note.
```
Automatic merge from submit-queue
Plumb subresource through subjectaccessreview
plumb all fields for subjectaccessreview into the resulting `authorizer.AttributesRecord`
```release-note
The SubjectAccessReview API passes subresource and resource name information to the authorizer to answer authorization queries.
```
Automatic merge from submit-queue
examples: PV docs clarify Azure storage account restriction
**What this PR does / why we need it**: One line doc fix, clarifies a constraint for using `AzureDisk` volumes.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#40276
**Special notes for your reviewer**: None
**Release note**:
```release-note
NONE
```
cc: @rootfs @otaviosoares
Automatic merge from submit-queue
GroupMetaFactoryArgs documentation
**What this PR does / why we need it**:
Documentation for people writing new API-Groups.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: documentation
**Special notes for your reviewer**:
@deads2k @pmorie my thoughts from writing the service-catalog apiserver.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Tidy up the main README.
Removed the coveralls link since it hasn't been updated in a few years. Made some punctuation more consistent.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Optionally avoid evicting critical pods in kubelet
For #40573
```release-note
When feature gate "ExperimentalCriticalPodAnnotation" is set, Kubelet will avoid evicting pods in "kube-system" namespace that contains a special annotation - `scheduler.alpha.kubernetes.io/critical-pod`
This feature should be used in conjunction with the rescheduler to guarantee availability for critical system pods - https://kubernetes.io/docs/admin/rescheduler/
```
Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)
Convert hack/e2e.go to a test-infra/kubetest shim
Replaces `hack/e2e.go` for a shim that passes the args to `k8s.io/test-infra/kubetest`
Adds fejta to `hack/OWNERS`
Adds `e2e_test.go` for unit test coverage of the shim.
`Usage: go run hack/e2e.go [--get=true] [--old=1d] -- KUBETEST_ARGS`
In other words there is are `--get` and `--old` shim flags, which control how we upgrade `kubetest`, and a `--` to separate the shim args from the kubetest args, and the existing kubetest args like `--down` `--up`, etc. If only `KUBETEST_ARGS` are used then you can skip the `--` (although golang will complain about it).
Once this is ready to go I will update the kubekins-e2e image to copy this file from test-infra: https://github.com/kubernetes/test-infra/blob/master/jenkins/e2e-image/Dockerfile#L70
ref https://github.com/kubernetes/test-infra/issues/1475
Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)
Forgiveness library changes
**What this PR does / why we need it**:
Splited from #34825, contains library changes that are needed to implement forgiveness:
1. ~~make taints-tolerations matching respect timestamps, so that one toleration can just tolerate a taint for only a period of time.~~ As TaintManager is caching taints and observing taint changes, time-based checking is now outside the library (in TaintManager). see #40355.
2. make tolerations respect wildcard key.
3. add/refresh some related functions to wrap taints-tolerations operation.
**Which issue this PR fixes**:
Related issue: #1574
Related PR: #34825, #39469
~~Please note that the first 2 commits in this PR come from #39469 .~~
**Special notes for your reviewer**:
~~Since currently we have `pkg/api/helpers.go` and `pkg/api/v1/helpers.go`, there are some duplicated periods of code laying in these two files.~~
~~Ideally we should move taints-tolerations related functions into a separate package (pkg/util/taints), and make it a unified set of implementations. But I'd just suggest to do it in a follow-up PR after Forgiveness ones done, in case of feature Forgiveness getting blocked to long.~~
**Release note**:
```release-note
make tolerations respect wildcard key
```
Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)
Cleanup scheduler server with an external config class
**What this PR does / why we need it**:
Some cleanup in cmd/server so that the parts which setup scheduler configuration are stored and separately tested.
- additionally a simple unit test to check that erroneous configs return a non-nil error is included.
- it also will make sure we avoid nil panics of schedulerConfiguration is misconfigured.