Automatic merge from submit-queue (batch tested with PRs 42128, 42064, 42253, 42309, 42322)
Add storage.k8s.io/v1 API
This is combined version of reverted #40088 (first 4 commits) and #41646. The difference is that all controllers and tests use old `storage.k8s.io/v1beta1` API so in theory all tests can pass on GKE.
Release note:
```release-note
StorageClassName attribute has been added to PersistentVolume and PersistentVolumeClaim objects and should be used instead of annotation `volume.beta.kubernetes.io/storage-class`. The beta annotation is still working in this release, however it will be removed in a future release.
```
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)
Adjust parameters of GCL cluster logging load tests
This PR increases the amount of logs produced in load tests to match the number of nodes and provide the predictable load of 100 KB/sec on each node.
Also this PR reduces in half amount of time, given for ingesting logs.
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)
Take into account number of restarts in cluster logging tests
Before, in cluster logging tests, we only measured e2e number of lines delivered to the backend.
Also, befure https://github.com/kubernetes/kubernetes/pull/41795 was merged, from the k8s perspective, fluentd was always working properly, even if it's crashlooping inside.
Now we can detect whether fluentd is truly working properly, experiencing no, or almost no OOMs duing its operation.
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)
Modified kubemark startup scripts to restore master on reboot
Fixes#41735
As discussed in the issue, modified the scripts to satisfy the conditions of restoring master env, running non-idempotent operations only for the first time and persist important data like pki/auth files on a PD.
Also attached `start-kubemark-master.sh` as startup-script metadata to master instance (on GCE) so that it is called automatically on each boot.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)
Admission Controller: Add Pod Preset
Based off the proposal in https://github.com/kubernetes/community/pull/254
cc @pmorie @pwittrock
TODO:
- [ ] tests
**What this PR does / why we need it**: Implements the Pod Injection Policy admission controller
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Added new Api `PodPreset` to enable defining cross-cutting injection of Volumes and Environment into Pods.
```
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)
Ingress-glbc upgrade tests
Basically #41676 but with some fixes and added comments. @bprashanth has been away this week and it's desirable to have this in before code freeze.
export functions from pkg/api/validation
add settings API
add settings to pkg/registry
add settings api to pkg/master/master.go
add admission control plugin for pod preset
add new admission control plugin to kube-apiserver
add settings to import_known_versions.go
add settings to codegen
add validation tests
add settings to client generation
add protobufs generation for settings api
update linted packages
add settings to testapi
add settings install to clientset
add start of e2e
add pod preset plugin to config-test.sh
Signed-off-by: Jess Frazelle <acidburn@google.com>
Automatic merge from submit-queue
Extensible Userspace Proxy
This PR refactors the userspace proxy to allow for custom proxy socket implementations.
It changes the the ProxySocket interface to ensure that other packages can properly implement it (making sure all arguments are publicly exposed types, etc), and adds in a mechanism for an implementation to create an instance of the userspace proxy with a non-standard ProxySocket.
Custom ProxySockets are useful to inject additional logic into the actual proxying. For example, our idling proxier uses a custom proxy socket to hold connections and notify the cluster that idled scalable resources need to be woken up.
Also-Authored-By: Ben Bennett bbennett@redhat.com
Automatic merge from submit-queue
Add apps/v1beta1 deployments with new defaults
This pull introduces deployments under `apps/v1beta1` and fixes#23597 and #23304.
TODO:
* [x] - create new type `apps/v1beta1.Deployment`
* [x] - update kubectl (stop, scale)
* [ ] - ~~new `kubectl run` generator~~ - this will only duplicate half of generator code, I suggest replacing current to use new endpoint
* [ ] - ~~create extended tests~~ - I've added integration and cmd tests verifying new endpoints
* [ ] - ~~create `hack/test-update-storage-objects.sh`~~ - see above
This is currently blocked by https://github.com/kubernetes/kubernetes/pull/38071, due to conflicting name `v1beta1.Deployment`.
```release-note
Introduce apps/v1beta1.Deployments resource with modified defaults compared to extensions/v1beta1.Deployments.
```
@kargakis @mfojtik @kubernetes/sig-apps-misc
Automatic merge from submit-queue (batch tested with PRs 42316, 41618, 42201, 42113, 42191)
[Kubemark] Add init container in hollow node for setting inotify limit of node to 200
Fixes#41713
Along with adding the init container, I also changed the manifest to a yaml as otherwise the entire init container annotation would have to be in a single line (with escaped characters), as json doesn't allow multi-line strings.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @Random-Liu
Automatic merge from submit-queue
Extend experimental support to multiple Nvidia GPUs
Extended from #28216
```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with support for multiple Nvidia GPUs.
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```
1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```
TODO:
- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
Automatic merge from submit-queue (batch tested with PRs 41597, 42185, 42075, 42178, 41705)
auto discovery CA for extension API servers
This is what the smaller pulls were leading to. Only the last commit is unique and I expect I'll still tweak some pod definitions, but this is where I was going.
@sttts @liggitt
Automatic merge from submit-queue (batch tested with PRs 42216, 42136, 42183, 42149, 36828)
[Federation] Remove federat{ed,ion} prefixes from e2e file names since they are all now scoped under the e2e_federation package.
This is purely cosmetic.
```release-note
NONE
```
cc @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42200, 39535, 41708, 41487, 41335)
Add support for statefulset spreading to the scheduler
**What this PR does / why we need it**:
The scheduler SelectorSpread priority funtion didn't have the code to spread pods of StatefulSets. This PR adds StatefulSets to the list of controllers that SelectorSpread supports.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#41513
**Special notes for your reviewer**:
**Release note**:
```release-note
Add the support to the scheduler for spreading pods of StatefulSets.
```
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
Use chroot for containerized mounts
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
add aggregation integration test
Wires up an integration test which runs a full kube-apiserver, the wardle server, and the kube-aggregator and creates the APIservice object for the wardle server. Without services and DNS the aggregator doesn't proxy, but it does ensure we don't have an obvious panic or bring up failure.
@sttts @ncdc
Automatic merge from submit-queue (batch tested with PRs 40746, 41699, 42108, 42174, 42093)
Avoid fake node names in user info
Node usernames should follow the format `system:node:<node-name>`,
but if we don't know the node name, it's worse to put a fake one in.
In the future, we plan to have a dedicated node authorizer, which would
start rejecting requests from a user with a bogus node name like this.
The right approach is to either mint correct credentials per node, or use node bootstrapping so it requests a correct client certificate itself.
Automatic merge from submit-queue
clean up generic apiserver options
Clean up generic apiserver options before we tag any levels. This makes them more in-line with "normal" api servers running on the platform.
Also remove dead example code.
@sttts
Automatic merge from submit-queue (batch tested with PRs 41937, 41151, 42092, 40269, 42135)
[Federation] ReplicaSet e2es should let the API server to generate the names to avoid collision while running tests in parallel.
cc @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41937, 41151, 42092, 40269, 42135)
Add a unit test for idempotent applys to the TPR entries.
The test in apply_test follows the general pattern of other tests.
We load from a file in test/fixtures and mock the API server in the
function closure in the HttpClient call.
The apply operation expects a last-modified-configuration annotation.
That is written verbatim in the test/fixture file.
References #40841
**What this PR does / why we need it**:
Adds one unit test for TPR's using applies.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
References:
https://github.com/kubernetes/features/issues/95https://github.com/kubernetes/kubernetes/issues/40841#issue-204769102
**Special notes for your reviewer**:
I am not super proud of the tpr-entry name.
But I feel like we need to call the two objects differently.
The one which has Kind:ThirdPartyResource
and the one has Kind:Foo.
Is the name "ThirdPartyResource" used interchangeably for both ? I used tpr-entry for the Kind:Foo object.
Also I !assume! this is testing an idempotent apply because the last-applied-configuration annotation is the same as the object itself.
This is the state I see in the logs of kubectl if I do a proper idempotent apply of a third party resource entry.
I guess I will know more once I start playing around with apply command that change TPR objects.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 41234, 42186, 41615, 42028, 41788)
Additional upgrade e2e tests
**What this PR does / why we need it**: Add basic upgrade tests for DaemonSet and Job, and add "during upgrade" testing to ConfigMap test. Add a simple harness for testing upgrade tests.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**: continuation of #41296 @krousey please review, thanks
**Release note**: `NONE`
Automatic merge from submit-queue
Enforce Node Allocatable via cgroups
This PR enforces node allocatable across all pods using a top level cgroup as described in https://github.com/kubernetes/community/pull/348
This PR also provides an option to enforce `kubeReserved` and `systemReserved` on user specified cgroups.
This PR will by default make kubelet create top level cgroups even if `kubeReserved` and `systemReserved` is not specified and hence `Allocatable = Capacity`.
```release-note
New Kubelet flag `--enforce-node-allocatable` with a default value of `pods` is added which will make kubelet create a top level cgroup for all pods to enforce Node Allocatable. Optionally, `system-reserved` & `kube-reserved` values can also be specified separated by comma to enforce node allocatable on cgroups specified via `--system-reserved-cgroup` & `--kube-reserved-cgroup` respectively. Note the default value of the latter flags are "".
This feature requires a **Node Drain** prior to upgrade failing which pods will be restarted if possible or terminated if they have a `RestartNever` policy.
```
cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests
TODO:
- [x] Adjust effective Node Allocatable to subtract hard eviction thresholds
- [x] Add unit tests
- [x] Complete pending e2e tests
- [x] Manual testing
- [x] Get the proposal merged
@dashpole is working on adding support for evictions for enforcing Node allocatable more gracefully. That work will show up in a subsequent PR for v1.6
Automatic merge from submit-queue (batch tested with PRs 41205, 42196, 42068, 41588, 41271)
Implements an upgrade test for Job
**What this PR does / why we need it**:
This PR implements a cluster upgrade test for Job. Some functionality for Job testing has been moved from the e2e package to the framework package to facilitate code reuse between the e2e package and the upgrade package without introducing cyclic dependencies.
We need this PR to help automate the testing of cluster upgrades between versions.
**Release note**
```release-note
NONE
```
This changes the userspace proxy so that it cleans up its conntrack
settings when a service is removed (as the iptables proxy already
does). This could theoretically cause problems when a UDP service
as deleted and recreated quickly (with the same IP address). As
long as packets from the same UDP source IP and port were going to
the same destination IP and port, the the conntrack would apply and
the packets would be sent to the old destination.
This is astronomically unlikely if you did not specify the IP address
to use in the service, and even then, only happens with an "established"
UDP connection. However, in cases where a service could be "switched"
between using the iptables proxy and the userspace proxy, this case
becomes much more frequent.
Updates the dnsmasq cache/mux layer to be managed by dnsmasq-nanny.
dnsmasq-nanny manages dnsmasq based on values from the
kube-system:kube-dns configmap:
"stubDomains": {
"acme.local": ["1.2.3.4"]
},
is a map of domain to list of nameservers for the domain. This is used
to inject private DNS domains into the kube-dns namespace. In the above
example, any DNS requests for *.acme.local will be served by the
nameserver 1.2.3.4.
"upstreamNameservers": ["8.8.8.8", "8.8.4.4"]
is a list of upstreamNameservers to use, overriding the configuration
specified in /etc/resolv.conf.
The tests in apply_test follows the general pattern of other tests.
We load from a file in test/fixtures and mock the API server in the
function closure in the HttpClient call.
In PATCH request rount-tripper we check that the kubectl apply
implementation worked as expected.
References #40841
Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)
make kubectl taint command respect effect NoExecute
**What this PR does / why we need it**:
Part of feature forgiveness implementation, make kubectl taint command respect effect NoExecute.
**Which issue this PR fixes**:
Related Issue: #1574
Related PR: #39469
**Special notes for your reviewer**:
**Release note**:
```release-note
make kubectl taint command respect effect NoExecute
```
Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054)
Update flag to --check-version-skew instead of --check_version_skew
https://github.com/kubernetes/test-infra/issues/2012
Also add a `--` to send the flags to kubetest without triggering a warning.
Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054)
PV upgrade test
**What this PR does / why we need it**:
This PR adds a PV upgrade test to the new upgrade test framework. Before, this test had to be done manually. Currently the upgrade test framework only works on the GCE environment, so I plan to add support for other providers later. In order to write the test, I had to modify and refactor some volume test util libraries. I reran the impacted tests to make sure they still passed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
It's probably easier to review the two commits separately. I split it up into the refactor changes, and the upgrade test changes.
**Release note**:
NONE
cc @saad-ali @krousey
Automatic merge from submit-queue (batch tested with PRs 42044, 41694, 41927, 42050, 41987)
Use existing nginx image in the node
**What this PR does / why we need it**:
Fixes a test flake: `[k8s.io] Garbage collector should orphan pods created by rc if delete options say so`
**Which issue this PR fixes**
fixes#35771
**Special notes for your reviewer**:
**Release note**:
```release-note NONE
```
Automatic merge from submit-queue (batch tested with PRs 42044, 41694, 41927, 42050, 41987)
Add apply set-last-applied subcommand
implement part of https://github.com/kubernetes/community/pull/287, will rebase after https://github.com/kubernetes/kubernetes/pull/41699 got merged, EDIT: since bug output format has been confirmed, will update the behavior of output format soon
cc @kubernetes/sig-cli-pr-reviews @AdoHe @pwittrock
```release-note
Support kubectl apply set-last-applied command to update the applied-applied-configuration annotation
```
Automatic merge from submit-queue (batch tested with PRs 35408, 41915, 41992, 41964, 41925)
e2e/upgrade: add sysctls
Add sysctl upgrade tests.
How can these be effectively tested?
Automatic merge from submit-queue (batch tested with PRs 41954, 40528, 41875, 41165, 41877)
Updating apiserver to return 202 when resource is being deleted asynchronously via cascading deletion
As per https://github.com/kubernetes/kubernetes/issues/33196#issuecomment-278440622.
cc @kubernetes/sig-api-machinery-pr-reviews @smarterclayton @caesarxuchao @bgrant0607 @kubernetes/api-reviewers
```release-note
Updating apiserver to return http status code 202 for a delete request when the resource is not immediately deleted because of user requesting cascading deletion using DeleteOptions.OrphanDependents=false.
```
Automatic merge from submit-queue (batch tested with PRs 41701, 41818, 41897, 41119, 41562)
Optimization of on-wire information sent to scheduler extender interfaces that are stateful
The changes are to address the issue described in https://github.com/kubernetes/kubernetes/issues/40991
```release-note
Added support to minimize sending verbose node information to scheduler extender by sending only node names and expecting extenders to cache the rest of the node information
```
cc @ravigadde
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)
Guaranteed admission for Critical Pods
This is the first step in implementing node-level preemption for critical pods.
It defines the AdmissionFailureHandler interface, which allows callers, like the kubelet, to define how failed predicates are handled, and take steps to correct failures if necessary.
In the kubelet's implementation, it triggers preemption if the pod being admitted is critical, and if the only failed predicates are InsufficientResourceErrors, then it prempts (not yet implemented) other other pods to allow admission of the critical pod.
cc: @vishh
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)
Inode Eviction test Tests that Innocent pods are not evicted throughout test.
The current version of the inode eviction test only makes sure that the two abusive pods are evicted before the innocent pod. This change adds a check during the post-eviction monitoring period to ensure that the innocent pod is not evicted, even after the abusive pods.
This will likely result in a higher failure rate for this test, but since it is in the flaky test suite, this is not an issue.
This change should help give us a better perspective on eviction issues.
cc @Random-Liu @vishh @derekwaynecarr
Automatic merge from submit-queue (batch tested with PRs 40932, 41896, 41815, 41309, 41628)
Modify CronJob API to add job history limits, cleanup jobs in controller
**What this PR does / why we need it**:
As discussed in #34710: this adds two limits to `CronJobSpec`, to limit the number of finished jobs created by a CronJob to keep.
**Which issue this PR fixes**: fixes#34710
**Special notes for your reviewer**:
cc @soltysh, please have a look and let me know what you think -- I'll then add end to end testing and update the doc in a separate commit. What is the timeline to get this into 1.6?
The plan:
- [x] API changes
- [x] Changing versioned APIs
- [x] `types.go`
- [x] `defaults.go` (nothing to do)
- [x] `conversion.go` (nothing to do?)
- [x] `conversion_test.go` (nothing to do?)
- [x] Changing the internal structure
- [x] `types.go`
- [x] `validation.go`
- [x] `validation_test.go`
- [x] Edit version conversions
- [x] Edit (nothing to do?)
- [x] Run `hack/update-codegen.sh`
- [x] Generate protobuf objects
- [x] Run `hack/update-generated-protobuf.sh`
- [x] Generate json (un)marshaling code
- [x] Run `hack/update-codecgen.sh`
- [x] Update fuzzer
- [x] Actual logic
- [x] Unit tests
- [x] End to end tests
- [x] Documentation changes and API specs update in separate commit
**Release note**:
```release-note
Add configurable limits to CronJob resource to specify how many successful and failed jobs are preserved.
```
Automatic merge from submit-queue (batch tested with PRs 42106, 42094, 42069, 42098, 41852)
Pod deletion observation is flaking, increase timeout and debug more
We can afford to wait longer than 30 seconds, and we should be printing
more error and output information about the cause of the failure.
Fixes / triages #41902
Automatic merge from submit-queue (batch tested with PRs 42106, 42094, 42069, 42098, 41852)
Add ncdc to test/OWNERS
**What this PR does / why we need it**: add me to test/OWNERS
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-testing-pr-reviews @smarterclayton @bgrant0607
Automatic merge from submit-queue (batch tested with PRs 41854, 41801, 40088, 41590, 41911)
Add storage.k8s.io/v1 API
v1 API is direct copy of v1beta1 API. This v1 API gets installed and exposed in this PR, I tested that kubectl can create both v1beta1 and v1 StorageClass.
~~Rest of Kubernetes (controllers, examples,. tests, ...) still use v1beta1 API, I will update it when this PR gets merged as these changes would get lost among generated code.~~ Most parts use v1 API now, it would not compile / run tests without it.
**Release note**:
```
Kubernetes API storage.k8s.io for storage objects is now fully supported and is available as storage.k8s.io/v1. Beta version of the API storage.k8s.io/v1beta1 is still available in this release, however it will be removed in a future Kubernetes release.
Together with the API endpoint, StorageClass annotation "storageclass.beta.kubernetes.io/is-default-class" is deprecated and "storageclass.kubernetes.io/is-default-class" should be used instead to mark a default storage class. The beta annotation is still working in this release, however it won't be supported in the next one.
```
@kubernetes/sig-storage-misc
Automatic merge from submit-queue
Supports 'ensure exist' class addon in Addon-manager
Fixes#39561, fixes#37047 and fixes#36411. Depends on #40057.
This PR splits cluster addons into two categories:
- Reconcile: Addons that need to be reconciled (`kube-dns` for instance).
- EnsureExists: Addons that need to be exist but changeable (`default-storage-class`).
The behavior for the 'EnsureExists' class addon would be:
- Create it if not exist.
- Users could do any modification they want, addon-manager will not reconcile it.
- If it is deleted, addon-manager will recreate it with the given template.
- It will not be updated/clobbered during upgrade.
As Brian pointed out in [#37048/comment](https://github.com/kubernetes/kubernetes/issues/37048#issuecomment-272510835), this may not be the best solution for addon-manager. Though #39561 needs to be fixed in 1.6 and we might not have enough bandwidth to do a big surgery.
@mikedanese @thockin
cc @kubernetes/sig-cluster-lifecycle-misc
---
Tasks for this PR:
- [x] Supports 'ensure exist' class addon and switch to use new labels in addon-manager.
- [x] Updates READMEs regarding the new behavior of addon-manager.
- [x] Updated `test/e2e/addon_update.go` to match the new behavior.
- [x] Go through all current addons and apply the new labels on them regarding what they need.
- [x] Bump addon-manager and update its template files.
Automatic merge from submit-queue (batch tested with PRs 41714, 41510, 42052, 41918, 31515)
Switch scheduler to use generated listers/informers
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
I think this can wait until master is open for 1.7 pulls, given that we're close to the 1.6 freeze.
After this and #41482 go in, the only code left that references legacylisters will be federation, and 1 bit in a stateful set unit test (which I'll clean up in a follow-up).
@resouer I imagine this will conflict with your equivalence class work, so one of us will be doing some rebasing 😄
cc @wojtek-t @gmarek @timothysc @jayunit100 @smarterclayton @deads2k @liggitt @sttts @derekwaynecarr @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-scalability-pr-reviews
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
Fix bash command's execution in MakePod().
Add isPriviledged as a parameter to MakePod().
Move PD utils to pv_util.go
Ran all the tests in pd.go, persistent_volumes.go,
persistent_volumes-disruptive.go.
These changes are needed for the PV upgrade test I am working on.
Automatic merge from submit-queue (batch tested with PRs 41667, 41820, 40910, 41645, 41361)
Switch admission to use shared informers
Originally part of #40097
cc @smarterclayton @derekwaynecarr @deads2k @liggitt @sttts @gmarek @wojtek-t @timothysc @lavalamp @kubernetes/sig-scalability-pr-reviews @kubernetes/sig-api-machinery-pr-reviews
node cache don't need to get all the information about every
candidate node. For huge clusters, sending full node information
of all nodes in the cluster on the wire every time a pod is scheduled
is expensive. If the scheduler is capable of caching node information
along with its capabilities, sending node name alone is sufficient.
These changes provide that optimization in a backward compatible way
- removed the inadvertent signature change of Prioritize() function
- added defensive checks as suggested
- added annotation specific test case
- updated the comments in the scheduler types
- got rid of apiVersion thats unused
- using *v1.NodeList as suggested
- backing out pod annotation update related changes made in the
1st commit
- Adjusted the comments in types.go and v1/types.go as suggested
in the code review
Where possible, switch the scheduler to use generated listers and
informers. There are still some places where it probably makes more
sense to use one-off reflectors/informers (listing/watching just a
single node, listing/watching scheduled & unscheduled pods using a field
selector).
Automatic merge from submit-queue (batch tested with PRs 41812, 41665, 40007, 41281, 41771)
Bump golang versions to 1.7.5
**What this PR does / why we need it**: While #41636 might not make it in until 1.7, this would bump current golang versions from 1.7.4 to 1.7.5 to integrate the fixes from that patch version. This would include, among other things, a fix to ensure cross-built binaries for darwin don't have certificate validation errors (golang/go#18688)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: none
**Special notes for your reviewer**:
**Release note**:
```release-note
Upgrade golang versions to 1.7.5
```
Automatic merge from submit-queue (batch tested with PRs 41146, 41486, 41482, 41538, 41784)
Switch statefulset controller to shared informers
Originally part of #40097
I *think* the controller currently makes a deep copy of a StatefulSet before it mutates it, but I'm not 100% sure. For those who are most familiar with this code, could you please confirm?
@beeps @smarterclayton @ingvagabund @sttts @liggitt @deads2k @kubernetes/sig-apps-pr-reviews @kubernetes/sig-scalability-pr-reviews @timothysc @gmarek @wojtek-t
Automatic merge from submit-queue (batch tested with PRs 41146, 41486, 41482, 41538, 41784)
Add apply view-last-applied subcommand
reopen pr https://github.com/kubernetes/kubernetes/pull/40984, implement part of https://github.com/kubernetes/community/pull/287
for now unit test all pass, the output looks like:
```console
shiywang@dhcp-140-33 template $ ./kubectl apply view last-applied deployment nginx-deployment
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
creationTimestamp: null
name: nginx-deployment
spec:
strategy: {}
template:
metadata:
creationTimestamp: null
labels:
app: nginx
spec:
containers:
- image: nginx:1.12.10
name: nginx
ports:
- containerPort: 80
resources: {}
status: {}
```
```release-note
Support new kubectl apply view-last-applied command for viewing the last configuration file applied
```
not sure if there is any flag I should updated or the some error handling I should changed.
will generate docs when you guys think is ok.
cc @pwittrock @jessfraz @AdoHe @ymqytw
Automatic merge from submit-queue (batch tested with PRs 38957, 41819, 41851, 40667, 41373)
Move pvutil.go from e2e package to framework package
**What this PR does / why we need it**:
This PR moves pvutil.go to the e2e/framework package.
I am working on a PV upgrade test, and would like to use some of the wrapper functions in pvutil.go. However, the upgrade test is in the upgrade package, and not the e2e package, and it cannot import the e2e package because it would create a circular dependency. So pvutil.go needs to be moved out of e2e in order to break the circular dependency. This is a refactoring name change, no logic has been modified.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
**Special notes for your reviewer**:
**Release note**:
NONE
Automatic merge from submit-queue (batch tested with PRs 38957, 41819, 41851, 40667, 41373)
Change taints/tolerations to api fields
This PR changes current implementation of taints and tolerations from annotations to API fields. Taint and toleration are now part of `NodeSpec` and `PodSpec`, respectively. The annotation keys: `scheduler.alpha.kubernetes.io/tolerations` and `scheduler.alpha.kubernetes.io/taints` have been removed.
**Release note**:
Pod tolerations and node taints have moved from annotations to API fields in the PodSpec and NodeSpec, respectively. Pod tolerations and node taints that are defined in the annotations will be ignored. The annotation keys: `scheduler.alpha.kubernetes.io/tolerations` and `scheduler.alpha.kubernetes.io/taints` have been removed.
Automatic merge from submit-queue
Removes additional columns in test_owners.csv
**What this PR does / why we need it**:
fixes a huge collection of typos with the number of columns in the CSV file that probably has broken the auto assign bot
**Special notes for your reviewer**:
None
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)
Update kubectl in addon-manager to use HPA in autoscaling/v1
Addon-manager is broken since HPA objects were removed from extensions api group.
Came across the logs from [the latest addon-manager on Jenkins](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/4290/artifacts/bootstrap-e2e-master/kube-addon-manager.log):
```
INFO: == Entering periodical apply loop at 2017-02-16T17:33:37+0000 ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:38+0000. 2 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:46+0000. 1 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:53+0000. 0 tries remaining. ==
WRN: == Kubernetes addon update completed with errors at 2017-02-16T17:33:58+0000 ==
```
And notice this commit (f66679a4e9) came in two weeks ago, which removed HorizontalPodAutoscaler from extensions/v1beta1.
Addon-manager is now partially functioning that it could successfully create and update addons, but will fail to prune objects, which means upgrade tests may mostly fail.
Pushed another version of addon-manager with kubectl v1.6.0-alpha.2 ([release 2 days ago](https://github.com/kubernetes/kubernetes/releases/tag/v1.6.0-alpha.2)) for fixing, including below images:
- gcr.io/google-containers/kube-addon-manager:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.4-alpha.2
@mikedanese
cc @wojtek-t @shyamjvs
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)
Enable pod level cgroups by default
**What this PR does / why we need it**:
It enables pod level cgroups by default.
**Special notes for your reviewer**:
This is intended to be enabled by default on 2/14/2017 per the plan outlined here:
https://github.com/kubernetes/community/pull/314
**Release note**:
```release-note
Each pod has its own associated cgroup by default.
```
Automatic merge from submit-queue (batch tested with PRs 41844, 41803, 39116, 41129, 41240)
NPD: Update NPD test.
For https://github.com/kubernetes/node-problem-detector/issues/58.
Update NPD e2e test based on the new behavior.
Note that before merging this PR, we need to merge all pending PRs in npd, and release the v0.3.0-alpha.1 version of NPD.
/cc @dchen1107 @kubernetes/node-problem-detector-reviewers
Automatic merge from submit-queue (batch tested with PRs 41844, 41803, 39116, 41129, 41240)
Allow for not-ready pods in large clusters
This is to workaround issues with non-starting pods in large clusters in roughly 1/3rd of runs.
Automatic merge from submit-queue (batch tested with PRs 41844, 41803, 39116, 41129, 41240)
test: fetch updated deployment before finding new and old rss
@krousey @janetkuo ptal
Ref https://github.com/kubernetes/kubernetes/issues/41518
Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)
Debug what is hapening in large clusters
What I'm seeing in large clusters is:
```
I0219 19:34:29.994] [90m/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/secrets.go:44[0m
I0219 19:34:29.994] [90m------------------------------[0m
I0219 21:27:11.421] Dumping master and node logs to /workspace/_artifacts
I0219 21:27:11.422] Master SSH not supported for gke
```
i have no idea what is happening during those 2 hours, and would like to understand this.
Automatic merge from submit-queue
[Federation] Modify the comments in Federation E2E tests to use standard Go conventions for documentation comments
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41709, 41685, 41754, 41759, 37237)
Projected volume plugin
This is a WIP volume driver implementation as noted in the commit for https://github.com/kubernetes/kubernetes/pull/35313.
change to GetOriginalConfiguration
add bazel
refactor apply view-last-applied command
update some changes
minor change
add unit tests, update
update some codes and genreate docs
update LongDesc
Automatic merge from submit-queue (batch tested with PRs 41706, 39063, 41330, 41739, 41576)
[Kubemark] Add option to log hollow-node logs
Ref https://github.com/kubernetes/kubernetes/issues/41613
Added an option to log kubemark hollow-node logs which includes kubelet, kubeproxy and npd logs for each hollow-node.
Setting the env var `ENABLE_HOLLOW_NODE_LOGS=true` should now enable logging for tests.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @yujuhong @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 41421, 41440, 36765, 41722)
Use watch param instead of deprecated /watch/ prefix
Switches clients to use watch param instead of /watch/ prefix
```release-note
Clients now use the `?watch=true` parameter to make watch API calls, instead of the `/watch/` path prefix
```
Automatic merge from submit-queue (batch tested with PRs 41421, 41440, 36765, 41722)
ResourceQuota ability to support default limited resources
Add support for the ability to configure the quota system to identify specific resources that are limited by default. A limited resource means its consumption is denied absent a covering quota. This is in contrast to the current behavior where consumption is unlimited absent a covering quota. Intended use case is to allow operators to restrict consumption of high-cost resources by default.
Example configuration:
**admission-control-config-file.yaml**
```
apiVersion: apiserver.k8s.io/v1alpha1
kind: AdmissionConfiguration
plugins:
- name: "ResourceQuota"
configuration:
apiVersion: resourcequota.admission.k8s.io/v1alpha1
kind: Configuration
limitedResources:
- resource: pods
matchContains:
- pods
- requests.cpu
- resource: persistentvolumeclaims
matchContains:
- .storageclass.storage.k8s.io/requests.storage
```
In the above configuration, if a namespace lacked a quota for any of the following:
* cpu
* any pvc associated with particular storage class
The attempt to consume the resource is denied with a message stating the user has insufficient quota for the matching resources.
```
$ kubectl create -f pvc-gold.yaml
Error from server: error when creating "pvc-gold.yaml": insufficient quota to consume: gold.storageclass.storage.k8s.io/requests.storage
$ kubectl create quota quota --hard=gold.storageclass.storage.k8s.io/requests.storage=10Gi
$ kubectl create -f pvc-gold.yaml
... created
```
Automatic merge from submit-queue
Fix kubemark default e2e test suite's name
Seems like the suite "[Feature:performance]" doesn't trigger tests anymore. Changed it to "[Feature:Performance]" in kubemark run-e2e-tests.sh.
cc @wojtek-t @gmarek
Automatic merge from submit-queue
Adds kube-public to the whitelist to not be deleted for e2e tests
We added the `kube-public` namespace but didn't add it to a whitelist of namespaces to not delete as part of e2e cleanup.
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 39373, 41585, 41617, 41707, 39958)
[Federation][e2e] Remove ns creation in federated clusters
**What this PR does / why we need it**:
In federation e2e, framework creates a namespace for each test case. the same ns is supposed to be created in federated clusters. Due to issues in namespace controller, this was not working earlier. but now it is working.
so currently the namespace is created twice, once by namespace controller and another when we call `getRegisteredClusters`. depending on the timing of these 2 calls, some [test cases fails ](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-federation/1199#k8sio-federation-secrets-featurefederation-secret-objects-should-not-be-deleted-from-underlying-clusters-when-orphandependents-is-true). So removing the creation of namespace when `getRegisteredClusters` which is unnecessary.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixes flakes in federation e2e.
cc @madhusudancs @nikhiljindal @kubernetes/sig-federation-bugs
Automatic merge from submit-queue (batch tested with PRs 39373, 41585, 41617, 41707, 39958)
Owners file related changes for kubectl and docs contributors
- adding a command to kubectl updates the root .generated_docs file requiring root level approval: move .generated_docs under docs/
- run hack/update-generated-docs.sh so the docs are up to date
- add kubectl contributors to test/OWNERS and test/fixtures/pkg/kubectl/OWNERS so they can approve kubectl e2e test changes
```release-note
NONE
```
Automatic merge from submit-queue
Add standalone npd on GCI.
This PR added standalone NPD in GCE GCI cluster. I already verified the PR, and it should work.
/cc @dchen1107 @fabioy @andyxning @kubernetes/sig-node-misc
Automatic merge from submit-queue (batch tested with PRs 41401, 41195, 41664, 41521, 41651)
Remove default failure domains from anti-affinity feature
Removing it is necessary to make performance of this feature acceptable at some point.
With default failure domains (or in general when multiple topology keys are possible), we don't have transitivity between node belonging to a topology. And without this, it's pretty much impossible to solve this effectively.
@timothysc
Automatic merge from submit-queue (batch tested with PRs 41604, 41273, 41547)
Switch pv controller to shared informer
This is WIP because I still need to do something with bazel? and add 'get storageclasses' to the controller-manager rbac role
@jsafrane PTAL and make sure I did not break anything in the PV controller. Do we need to clone the volumes/claims we get from the shared informer before we use them? I could not find a place where we modify them but you would know for certain.
cc @ncdc because I copied what you did in your other PRs.
Automatic merge from submit-queue
Add cluster logging load test for GCL
Changed code around logging tests a little to avoid duplication. Added GCL go library, because command line tool cannot handle required number of logs effectively.
Related to https://github.com/kubernetes/kubernetes/issues/34310
@piosz
Automatic merge from submit-queue (batch tested with PRs 41548, 41221)
StatefulSet Upgrade Test
Adds StatefulSet upgrade tests and moves common functionality into the framework package. This removes the potential for cyclic dependencies while allowing for code reuse.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41606, 41475)
Modify kubemark run-e2e-tests.sh to run right command based on environment
In order to run-e2e-tests.sh in kubemark, we currently need to toggle comment b/w the last 2 lines of the script. This is inconvenient to change each time and sometimes this change gets accidentally included as part of a PR. It's even difficult for someone newly trying kubemark to figure this out in order to run tests. Changed the logic to use the right command based on whether the script is running from within a docker container or just locally.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 41517, 41494, 41163)
Deployment: filter out old RSes that are deleted or with non-zero replicas before cleanup
Fixes#36379
cc @zmerlynn @yujuhong @kargakis @kubernetes/sig-apps-bugs
Automatic merge from submit-queue (batch tested with PRs 41517, 41494, 41163)
Make scheduler_perf warn rather then fail if scheduling rate is betwe…
**What this PR does / why we need it**
Fix scheduler per bench tests so they pass even if there is a cold interval .
**Special notes for your reviewer**:
This wont effect CI, as this bench test is run manually by devs.
This is the first step in a two step change. First is to move the file.
The second step is to change the package and callers. It needs to be
two steps in order to correctly transfer commit history.
Automatic merge from submit-queue (batch tested with PRs 40505, 34664, 37036, 40726, 41595)
Allow `make test-integration` to pass on OSX
**What this PR does / why we need it**: `make test-integration` isn't passing on my OSX setup (10.11.6, go1.7, docker 1.13.1). Tests that startup an api server fail because the default `cert-dir` of `/var/run/kubernetes` isn't world-writable. Use a tempdir instead.
**Release note**:
```release-note
NONE
```
/cc @kubernetes/sig-testing-pr-reviews
Adds SIG owners to e2e tests in order to support assigning test flakes and general test failures to SIGs rather than relying solely on individuals. Ideally this will help spread the maintenance burden across more This work is part of closing https://github.com/kubernetes/contrib/issues/2389
Automatic merge from submit-queue (batch tested with PRs 41505, 41484, 41544, 41514, 41022)
[Federation] Fixes the newly-added test for replicaset deletion upon namespace deletion.
See #41278 for the original test.
**Release note**:
```release-note
NONE
```
`/var/run` is not world-writable on my OSX 10.11.x setup, so tests that
standup a secure apiserver fail with the default cert dir. Use a
tempdir instead.
Automatic merge from submit-queue (batch tested with PRs 41466, 41456, 41550, 41238, 41416)
Revert "Modify load test to not create too may services/endpoints"
Reverts kubernetes/kubernetes#40798
We seem to be ready to go for it now.
Automatic merge from submit-queue
Ingress e2e typos
**What this PR does / why we need it**: fix typos in e2e test
**Special notes for your reviewer**: none
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Remove alpha provisioning
This is the first part of https://github.com/kubernetes/features/issues/36
@kubernetes/sig-storage-misc
**Release note**:
```release-note
Alpha version of dynamic volume provisioning is removed in this release. Annotation
"volume.alpha.kubernetes.io/storage-class" does not have any special meaning. A default storage class
and DefaultStorageClass admission plugin can be used to preserve similar behavior of Kubernetes cluster,
see https://kubernetes.io/docs/user-guide/persistent-volumes/#class-1 for details.
```
Automatic merge from submit-queue
Switch serviceaccounts controller to generated shared informers
Originally part of #40097
cc @deads2k @sttts @liggitt @smarterclayton @gmarek @wojtek-t @timothysc @kubernetes/sig-scalability-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 37137, 41506, 41239, 41511, 37953)
Add field to control service account token automounting
Fixes https://github.com/kubernetes/kubernetes/issues/16779
* adds an `automountServiceAccountToken *bool` field to `ServiceAccount` and `PodSpec`
* if set in both the service account and pod, the pod wins
* if unset in both the service account and pod, we automount for backwards compatibility
```release-note
An `automountServiceAccountToken *bool` field was added to ServiceAccount and PodSpec objects. If set to `false` on a pod spec, no service account token is automounted in the pod. If set to `false` on a service account, no service account token is automounted for that service account unless explicitly overridden in the pod spec.
```
Automatic merge from submit-queue (batch tested with PRs 37137, 41506, 41239, 41511, 37953)
Fix typos in e2e
**What this PR does / why we need it**: fix typos in e2e test
**Special notes for your reviewer**: none
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 37137, 41506, 41239, 41511, 37953)
e2e test for storage class diskformat verification for vsphere cloud provider
**What this PR does / why we need it**:
This PR adds a new e2e test for vsphere cloud provider.
Test is to verify diskformat specified in storage-class is being honored while volume creation.
Steps:
1. Create StorageClass with diskformat set to valid type (supported options are `eagerzeroedthick`, `zeroedthick` and `thin`)
2. Create PVC which uses the StorageClass created in step 1.
3. Wait for PV to be provisioned.
4. Wait for PVC's status to become Bound
5. Create POD using PVC on specific node.
6. Wait for Disk to be attached to the node.
7. Get node VM's devices and find PV's Volume Disk.
8. Get Backing Info of the Volume Disk and obtain Property of `VirtualDiskFlatVer2BackingInfo` - `EagerlyScrub` and `ThinProvisioned`
9. Based on the value of `EagerlyScrub` and `ThinProvisioned`, verify if diskformat is correct.
10. Delete POD and Wait for Volume Disk to be detached from the Node.
11. Delete PVC, PV and Storage Class
**Which issue this PR fixes** *
fixes #
**Special notes for your reviewer**:
Test is executed against v1.6.0-alpha.1
Test is failing on v1.4.8
**Release Note**
```release-note
NONE
```
@kerneltime @BaluDontu @abrarshivani please review this PR
Automatic merge from submit-queue (batch tested with PRs 37137, 41506, 41239, 41511, 37953)
Bump addon-manager version to v6.4-alpha.1 in kubemark
Fixes https://github.com/kubernetes/kubernetes/issues/41493
cc @wojtek-t @liggitt
Automatic merge from submit-queue
Change node e2e cri-validation configs
Copy the configs to a new directory to test non-cri implementation. We can
remove the original directory after the dependent PRs are merged.
Automatic merge from submit-queue
Switch resourcequota controller to shared informers
Originally part of #40097
I have had some issues with this change in the past, when I updated `pkg/quota` to use the new informers while `pkg/controller/resourcequota` remained on the old informers. In this PR, both are switched to using the new informers. The issues in the past were lots of flakey test failures in the ResourceQuota e2es, where it would randomly fail to see deletions and handle replenishment. I am hoping that now that everything here is consistently using the new informers, there won't be any more of these flakes, but it's something to keep an eye out for.
I also think `pkg/controller/resourcequota` could be cleaned up. I don't think there's really any need for `replenishment_controller.go` any more since it's no longer running individual controllers per kind to replenish. It instead just uses the shared informer and adds event handlers to it. But maybe we do that in a follow up.
cc @derekwaynecarr @smarterclayton @wojtek-t @deads2k @sttts @liggitt @timothysc @kubernetes/sig-scalability-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41332, 41069, 41470, 41474)
Update test owners
@nikhiljindal I've noticed you've duplicated the `test_owners.csv` contents in c1c2a12 was that intentional. I'm removing it here, since it's failing `hack/update_owners.py`
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
apiserver: further cleanup of apiserver storage plumbing
- move kubeapiserver`s `RESTOptionsFactory` back to EtcdOptions by adding a `AddWithStorageFactoryTo`
- factor out storage backend `Config` construction from EtcdOptions
- move all `StorageFactory` related code into server/storage subpackage.
In short: remove my stomach ache about `kubeapiserver.RESTOptionsFactory`.
approved based on #40363
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
PV E2E: dynamically provisioned volume should not create in unmanaged zone
**What this PR does / why we need it**:
Adds e2e test to that attempts to provision a volume in an unmanaged zone and fails on success. This is to catch regressions of #31948.
cc @jeffvance
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
Isolate recycler behavior in PV E2E
**What this PR does / why we need it**:
Sets the default `reclaimPolicy` for PV E2E to `Retain` and isolates `Recycle` tests to their own context. The purpose of this is to future proof the PV test suite against the possible deprecation of the `Recycle` behavior. This is done by consolidating recycling test code into a single Context block that can be removed en masse without affecting the test suite.
Secondly, adds a liveliness check for the NFS server pod prior to each test to avoid maxing out timeouts if the NFS server becomes unavailable.
cc @saad-ali @jeffvance
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
Refactored kubemark code into provider-specific and provider-independent parts [Part-3]
Fixes#38967
Applying final part of the changes in PR #39033 (which refactored kubemark code completely). The changes included in this PR are:
- Removed `test/kubemark/common.sh` and moved relevant parts of its code to the right places in start-kubemark/stop-kubemark scripts.
- Added DOCKER_REGISTRY, PROJECT, KUBEMARK_IMAGE_MAKE_TARGET variables to `/test/kubemark/cloud-provider-config.sh` to make the kubemark image push location variable wrt provider.
- Removed get-real-pod-for-hollow-node.sh as it doesn't seem to do anything useful.
@kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 40297, 41285, 41211, 41243, 39735)
Secure kube-scheduler
This PR:
* Adds a bootstrap `system:kube-scheduler` clusterrole
* Adds a bootstrap clusterrolebinding to the `system:kube-scheduler` user
* Sets up a kubeconfig for kube-scheduler on GCE (following the controller-manager pattern)
* Switches kube-scheduler to running with kubeconfig against secured port (salt changes, beware)
* Removes superuser permissions from kube-scheduler in local-up-cluster.sh
* Adds detailed RBAC deny logging
```release-note
On kube-up.sh clusters on GCE, kube-scheduler now contacts the API on the secured port.
```
Automatic merge from submit-queue
delete cadvisor pod after test
tracing looks at events for pod deletion and volume teardown. SInce the cadvisor pod has more than 1 volume, this can make results harder to analyze.
This PR moves the deletion of the cadvisor pod to after the logPodCreateThroughput call, since that marks the "end" of the test.
cc: @dchen1107 @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 41216, 41362, 41275, 41277, 41412)
Fix statefulset e2e test
...by removing the liveness/readiness probes from the cockroachdb
manifests, as explained in
https://github.com/kubernetes/test-infra/issues/1740#issuecomment-279555187
@kow3ns @spxtr
Added nfs-server verification between tests; prototyped recycler test
recycler test in working state, needs clean up/comments
comments added; gets mount path from pod instead of hard code
typos
removed NFS server checker, will be handled in separate PR
catch err; inaccurate comment
gofmt; typo
Automatic merge from submit-queue (batch tested with PRs 41382, 41407, 41409, 41296, 39636)
Update to use proxy subresource consistently
Proxy subresources have been in place since 1.2.0 and improve the ability to put policy in place around proxy access.
This PR updates the last few clients to use proxy subresources rather than the root proxy
Automatic merge from submit-queue (batch tested with PRs 41382, 41407, 41409, 41296, 39636)
implement configmap upgrade test
**What this PR does / why we need it**: Add an automated ConfigMap upgrade test to the e2e upgrade test suite. The test creates a ConfigMap and verifies it can be consumed by pods before/after upgrade. See PR #39953 and #40747 for context.
**Special notes for your reviewer**:
@krousey please review for consistency with new upgrade test interface. I copied heavily from secrets test for this. I will implement similar tests for DaemonSets and Jobs next. Note that I have not run this test locally.
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 41299, 41325, 41386, 41329, 41418)
Fix resource leak in federation e2e tests and another issue
**What this PR does / why we need it**:
The cleanup after federation service e2e tests is not effective as this function cleanupServiceShardsAndProviderResources is getting called with empty string for namespace ("nsName") because the nsName variable is getting redefined.
Another issue is we are prematurely exiting the Poll in waitForServiceOrFail and the error check is incorrect.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
Fixing the 2 issues mentioned above.
**Special notes for your reviewer**:
**Release note**:
`NONE`
cc @madhusudancs @kubernetes/sig-federation-bugs
...by removing the liveness/readiness probes from the cockroachdb
manifests, as explained in
github.com/kubernetes/test-infra/issues/1740#issuecomment-279555187
Automatic merge from submit-queue (batch tested with PRs 41342, 41257)
Move two flaky e2e tests to the flaky suite.
cc @kargakis @davidopp
We should have moved these a long time ago.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)
[Federation] Add an end-to-end test verifying that deleting a federated namespace deletes child replicasets.
Verifies #38225.
Also, remove a few custom package aliases.
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)
Switch RBAC subject apiVersion to apiGroup in v1beta1
Referencing a subject from an RBAC role binding, the API group and kind of the subject is needed to fully-qualify the reference.
The version is not, and adds complexity around re-writing the reference when returning the binding from different versions of the API, and when reconciling subjects.
This PR:
* v1beta1: change the subject `apiVersion` field to `apiGroup` (to match roleRef)
* v1alpha1: convert apiVersion to apiGroup for backwards compatibility
* all versions: add defaulting for the three allowed subject kinds
* all versions: add validation to the field so we can count on the data in etcd being good until we decide to relax the apiGroup restriction
```release-note
RBAC `v1beta1` RoleBinding/ClusterRoleBinding subjects changed `apiVersion` to `apiGroup` to fully-qualify a subject. ServiceAccount subjects default to an apiGroup of `""`, User and Group subjects default to an apiGroup of `"rbac.authorization.k8s.io"`.
```
@deads2k @kubernetes/sig-auth-api-reviews @kubernetes/sig-auth-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 41357, 41178, 41280, 41184, 41278)
fix flaky host cleanup test
**What this PR does / why we need it**:
Fixes 2 flakes in the "HostCleanup tests in e2e/_kubelet.go_
Also does some very minor refactoring.
**Which issue this PR fixes**
This is an improved fix for issue [31272](https://github.com/kubernetes/kubernetes/issues/31272)
**Special notes for your reviewer**:
```release-note
NONE
```
Automatic merge from submit-queue
Fixes Hazelcast example e2e test
**What this PR does / why we need it**:
This PR fixes the Hazelcast example e2e test
**Special notes for your reviewer**:
It is related to this PR https://github.com/kubernetes/kubernetes/pull/39580
Automatic merge from submit-queue (batch tested with PRs 41274, 41241)
[Federation] Make federation namespace e2e tests parallelizable.
Because deleteAllTestNamespaces deleted all the e2e namespaces it interefered with other federation namespace tests running in parallel. This change should mitigate the problem and make the tests runnable in parallel.
cc @kubernetes/sig-federation-pr-reviews
Automatic merge from submit-queue
Add e2e test for external provisioners
this is a retry of https://github.com/kubernetes/kubernetes/pull/39545
This time around:
* take advantage of the system:persistent-volume-provisioner bootstrap cluster role to grant the external provisioner pod serviceaccount permissions
* add storageclass suffix so that the first and third test don't conflict when one creates the storageclass with the same name before the other
also tested more thoroughly myself on gce :)
@jsafrane if you would like to re-review
Automatic merge from submit-queue (batch tested with PRs 41248, 41214)
Switch hpa controller to shared informer
**What this PR does / why we need it**: switch the hpa controller to use a shared informer
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**: Only the last commit is relevant. The others are from #40759, #41114, #41148
**Release note**:
```release-note
```
cc @smarterclayton @deads2k @sttts @liggitt @DirectXMan12 @timothysc @kubernetes/sig-scalability-pr-reviews @jszczepkowski @mwielgus @piosz
Automatic merge from submit-queue (batch tested with PRs 39418, 41175, 40355, 41114, 32325)
add e2e tests for replicasets with weight, min and max replicas
e2e test with weight, min and max replicas set
#31904#32014
@quinton-hoole @nikhiljindal @deepak-vij @kshafiee @mwielgus
Automatic merge from submit-queue (batch tested with PRs 39418, 41175, 40355, 41114, 32325)
TaintController
```release-note
This PR adds a manager to NodeController that is responsible for removing Pods from Nodes tainted with NoExecute Taints. This feature is beta (as the rest of taints) and enabled by default. It's gated by controller-manager enable-taint-manager flag.
```
Automatic merge from submit-queue (batch tested with PRs 41112, 41201, 41058, 40650, 40926)
[Federation][e2e] Fix few flakes in federation e2e tests
**What this PR does / why we need it**:
Fixes few flakes in #37105
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # partly fixes few test cases in the above mentioned issue.
**Special notes for your reviewer**:
While cleaning up in AfterEach Block some objects are returned while listing, but by the time the object is delete is issued the object is disappearing resulting in this flake occasionally.
To fix this, we need to check if the err is NotFound while deleting, its ok and need not fail the test.
**Release note**: `NONE`
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 41112, 41201, 41058, 40650, 40926)
Promote TokenReview to v1
Peer to https://github.com/kubernetes/kubernetes/pull/40709
We have multiple features that depend on this API:
- [webhook authentication](https://kubernetes.io/docs/admin/authentication/#webhook-token-authentication)
- [kubelet delegated authentication](https://kubernetes.io/docs/admin/kubelet-authentication-authorization/#kubelet-authentication)
- add-on API server delegated authentication
The API has been in use since 1.3 in beta status (v1beta1) with negligible changes:
- Added a status field for reporting errors evaluating the token
This PR promotes the existing v1beta1 API to v1 with no changes
Because the API does not persist data (it is a query/response-style API), there are no data migration concerns.
This positions us to promote the features that depend on this API to stable in 1.7
cc @kubernetes/sig-auth-api-reviews @kubernetes/sig-auth-misc
```release-note
The authentication.k8s.io API group was promoted to v1
```
Because deleteAllTestNamespaces deleted all the e2e namespaces
it interefered with other federation namespace tests running in
parallel. This change should mitigate the problem and make the
tests runnable in parallel.
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)
StatefulSet hardening
**What this PR does / why we need it**:
This PR contains the following changes to StatefulSet. Only one change effects the semantics of how the controller operates (This is described in #38418), and this change only brings the controller into conformance with its documented behavior.
1. pcb and pcb controller are removed and their functionality is encapsulated in StatefulPodControlInterface. This class modules the design contoller.PodControlInterface and provides an abstraction to clientset.Interface which is useful for testing purposes.
2. IdentityMappers has been removed to clarify what properties of a Pod are mutated by the controller. All mutations are performed in the UpdateStatefulPod method of the StatefulPodControlInterface.
3. The statefulSetIterator and petQueue classes are removed. These classes sorted Pods by CreationTimestamp. This is brittle and not resilient to clock skew. The current control loop, which implements the same logic, is in stateful_set_control.go. The Pods are now sorted and considered by their ordinal indices, as is outlined in the documentation.
4. StatefulSetController now checks to see if the Pods matching a StatefulSet's Selector also match the Name of the StatefulSet. This will make the controller resilient to overlapping, and will be enhanced by the addition of ControllerRefs.
5. The total lines of production code have been reduced, and the total number of unit tests has been increased. All new code has 100% unit coverage giving the module 83% coverage. Tests for StatefulSetController have been added, but it is not practical to achieve greater coverage in unit testing for this code (the e2e tests for StatefulSet cover these areas).
6. Issue #38418 is fixed in that StaefulSet will ensure that all Pods that are predecessors of another Pod are Running and Ready prior to launching a new Pod. This removes the potential for deadlock when a Pod needs to be rescheduled while its predecessor is hung in Pending or Initializing.
7. All reference to pet have been removed from the code and comments.
**Which issue this PR fixes**
fixes #38418,#36859
**Special notes for your reviewer**:
**Release note**:
```release-note
Fixes issue #38418 which, under circumstance, could cause StatefulSet to deadlock.
Mediates issue #36859. StatefulSet only acts on Pods whose identity matches the StatefulSet, providing a partial mediation for overlapping controllers.
```
Automatic merge from submit-queue (batch tested with PRs 40796, 40878, 36033, 40838, 41210)
Implement TTL controller and use the ttl annotation attached to node in secret manager
For every secret attached to a pod as volume, Kubelet is trying to refresh it every sync period. Currently Kubelet has a ttl-cache of secrets of its pods and the ttl is set to 1 minute. That means that in large clusters we are targetting (5k nodes, 30pods/node), given that each pod has a secret associated with ServiceAccount from its namespaces, and with large enough number of namespaces (where on each node (almost) every pod is from a different namespace), that resource in ~30 GETs to refresh all secrets every minute from one node, which gives ~2500QPS for GET secrets to apiserver.
Apiserver cannot keep up with it very easily.
Desired solution would be to watch for secret changes, but because of security we don't want a node watching for all secrets, and it is not possible for now to watch only for secrets attached to pods from my node.
So as a temporary solution, we are introducing an annotation that would be a suggestion for kubelet for the TTL of secrets in the cache and a very simple controller that would be setting this annotation based on the cluster size (the large cluster is, the bigger ttl is).
That workaround mean that only very local changes are needed in Kubelet, we are creating a well separated very simple controller, and once watching "my secrets" will be possible it will be easy to remove it and switch to that. And it will allow us to reach scalability goals.
@dchen1107 @thockin @liggitt
Automatic merge from submit-queue (batch tested with PRs 40917, 41181, 41123, 36592, 41183)
fix scheduler performance test script
**What this PR does / why we need it**:
`test-performance.sh` is in dir `kubernetes/test/integration/scheduler_perf`
the dir `kubernetes/test/component/scheduler/perf` does not exist
Thanks.
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 41074, 41147, 40854, 41167, 40045)
Fix some funky funcs.
This is code cleanup. Fix function declarations and remove stale comment.
Automatic merge from submit-queue (batch tested with PRs 41074, 41147, 40854, 41167, 40045)
Upgrade test for deployments
Upgrade test for Deployments. Should prevent issues like https://github.com/kubernetes/kubernetes/issues/40415 in the future.
@krousey @janetkuo @soltysh
Haven't managed to run it locally...
```
$ go run hack/e2e.go --up --test --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\] --upgrade-target=ci/latest --upgrade-image=gci"
2017/02/02 11:43:22 e2e.go:946: Running: ./hack/e2e-internal/e2e-down.sh
2017/02/02 11:43:22 e2e.go:948: Step './hack/e2e-internal/e2e-down.sh' finished in 7.278236ms
2017/02/02 11:43:22 e2e.go:946: Running: ./hack/e2e-internal/e2e-up.sh
2017/02/02 11:43:22 e2e.go:948: Step './hack/e2e-internal/e2e-up.sh' finished in 5.286328ms
2017/02/02 11:43:22 e2e.go:946: Running: ./cluster/kubectl.sh version --match-server-version=false
2017/02/02 11:43:22 e2e.go:948: Step './cluster/kubectl.sh version --match-server-version=false' finished in 213.847259ms
2017/02/02 11:43:22 e2e.go:946: Running: ./hack/e2e-internal/e2e-status.sh
2017/02/02 11:43:22 e2e.go:948: Step './hack/e2e-internal/e2e-status.sh' finished in 103.253183ms
2017/02/02 11:43:22 e2e.go:230: Something went wrong: encountered 2 errors: [exit status 1 exit status 1]
exit status 1
```
@krousey any eta for when the upgrade framework will be integrated in the pr builder?
Automatic merge from submit-queue (batch tested with PRs 41121, 40048, 40502, 41136, 40759)
add k8s.io/sample-apiserver to demonstrate how to build an aggregated API server
builds on https://github.com/kubernetes/kubernetes/pull/41093
This creates a sample API server is a separate staging repo to guarantee no cheating with `k8s.io/kubernetes` dependencies. The sample is run during integration tests (simple tests on it so far) to ensure that it continues to run.
@sttts @kubernetes/sig-api-machinery-misc ptal
@pwittrock @pmorie @kris-nova an aggregated API server example that will stay up to date.
Automatic merge from submit-queue (batch tested with PRs 41121, 40048, 40502, 41136, 40759)
Remove deprecated kubelet flags that look safe to remove
Removes:
```
--config
--auth-path
--resource-container
--system-container
```
which have all been marked deprecated since at least 1.4 and look safe to remove.
```release-note
The deprecated flags --config, --auth-path, --resource-container, and --system-container were removed.
```
Automatic merge from submit-queue (batch tested with PRs 41145, 38771, 41003, 41089, 40365)
Use privileged containers for statefulset e2e tests
Test containers need to run as spc_t in order to interact with the host
filesystem under /tmp, as the tests for StatefulSet are doing. Docker
will transition the container into this domain when running the container
as privileged.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
**Release note**:
```release-note
NONE
```
/cc @ncdc @soltysh @pmorie
1. pcb and pcb controller are removed and their functionality is
encapsulated in StatefulPodControlInterface.
2. IdentityMappers has been removed to clarify what properties of a Pod are
mutated by the controller. All mutations are performed in the
UpdateStatefulPod method of the StatefulPodControlInterface.
3. The statefulSetIterator and petQueue classes are removed. These classes
sorted Pods by CreationTimestamp. This is brittle and not resilient to
clock skew. The current control loop, which implements the same logic,
is in stateful_set_control.go. The Pods are now sorted and considered by
their ordinal indices, as is outlined in the documentation.
4. StatefulSetController now checks to see if the Pods matching a
StatefulSet's Selector also match the Name of the StatefulSet. This will
make the controller resilient to overlapping, and will be enhanced by
the addition of ControllerRefs.