Automatic merge from submit-queue (batch tested with PRs 56688, 56577). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add pvc as part of equivalence hash
**What this PR does / why we need it**:
Should add PVC as part of equivalence hash so that `StatefulSe`t and `Operator` will always run the volume predicate, while the `ReplicaSet` can still re-use cached ones.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56265
**Special notes for your reviewer**:
**Release note**:
```release-note
Add pvc as part of equivalence hash
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
CSI - Fix feature gate bug and add bootstrap RBAC rules
**What this PR does / why we need it**:
This PR addresses show-stopper bug https://github.com/kubernetes/kubernetes/issues/56532. It fixes the faulty feature gate logic and adds RBAC rules for kube-controller-manager and kubelet that allows `VolumeAttachment` API operations against the api-server.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56532, #56667
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
resource limits are satisfied by the input node's allocatable resources or not.
If yes, the node is assigned a score of 1, otherwise the node's score is not changed.
This admission plugin puts finalizer to every created PVC. The finalizer is
removed by PVCProtectionController when the PVC is not referenced by any
pods and thus the PVC can be deleted.
Automatic merge from submit-queue (batch tested with PRs 55952, 49112, 55450, 56178, 56151). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add PodDisruptionBudget support in pod preemption
**What this PR does / why we need it**:
This PR adds the logic to make scheduler preemption aware of PodDisruptionBudget. Preemption tries to avoid preempting pods whose PDBs are violated by preemption. If preemption does not find any other pods to preempt, it will preempt pods despite violating their PDBs.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#53913
**Special notes for your reviewer**:
**Release note**:
```release-note
Add PodDisruptionBudget support during pod preemption
```
ref/ #47604
/sig scheduling
Automatic merge from submit-queue (batch tested with PRs 52767, 55065, 55148, 56228, 56221). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add resize support for ceph RBD
Add resize support for ceph RBD
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: part of [#657](https://github.com/kubernetes/community/pull/657)
**Special notes for your reviewer**:
**Release note**:
```release-note
Add resize support for ceph RBD
```
WIP, need to add fs resize,
assign to myself first
/assign @NickrenREN
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
fixtypo
**What this PR does / why we need it**:
fixtypo
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[PodSecurityPolicy] Optimize authorization check
**What this PR does / why we need it**:
Authorizing PodSecurityPolicy use may involve a remote call, and can be slow. Rather than authorizing the user / SA for every policy in the cluster, only test authz for the policies under which the pod is valid.
This is a big improvement in the case where there are a lot of policies for which the pod is not valid (benchmark below), but should also help when the pod is valid under other policies, as it allows the authorization to short-circuit on the first accepted policy.
**Benchmark:**
Highlight from scale testing (see https://docs.google.com/document/d/1IIcHHE_No1KMAybW5krIphdN325eGa2sxF2eqg2YAPI/edit for the full results). These were run with 1000 policies under which the pods were not valid, and had no role bindings.
| | method | resource | 50th percentile | 90th percentile | 99th percentile
| -- | -- | -- | -- | -- | --
| 1.8 HEAD | POST | pods | 8.696784s | 20.497659s | 22.472421s
| 1.8 With fix | POST | pods | 25.454ms | 29.068ms | 85.817ms
(I didn't benchmark master, but expect the difference to be more drastic, since the authorization is run twice - for both Admit and Validate)
**Which issue(s) this PR fixes**:
Fixes#55521
**Special notes for your reviewer**:
The validation errors are no longer totally accurate, as they may include errors from PSPs that the user/pod isn't authorized to use. However, I think this is a worthwhile tradeoff. If this is a big concern, we could authorize all policies in the case where none admitted /validated the pod.
**Release note**:
```release-note
Improved PodSecurityPolicy admission latency, but validation errors are no longer limited to only errors from authorized policies.
```
Automatic merge from submit-queue (batch tested with PRs 51321, 55969, 55039, 56183, 55976). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Topology aware volume scheduler and PV controller changes
**What this PR does / why we need it**:
Scheduler and PV controller changes to support volume topology aware scheduling, as specified in kubernetes/community#1168
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#54435
**Special notes for your reviewer**:
* I've split the PR into logical commits to make it easier to review
* The remaining TODOs I plan to address next release unless you think it needs to be done now
**Release note**:
```release-note
Adds alpha support for volume scheduling, which allows the scheduler to make PersistentVolume binding decisions while respecting the Pod's scheduling requirements. Dynamic provisioning is not supported with this feature yet.
Action required for existing users of the LocalPersistentVolumes alpha feature:
* The VolumeScheduling feature gate also has to be enabled on kube-scheduler and kube-controller-manager.
* The NoVolumeNodeConflict predicate has been removed. For non-default schedulers, update your scheduler policy.
* The CheckVolumeBinding predicate has to be enabled in non-default schedulers.
```
@kubernetes/sig-storage-pr-reviews @kubernetes/sig-scheduling-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 55103, 56036, 56186). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Removed opaque integer resources (deprecated in v1.8)
**What this PR does / why we need it**:
* Remove opaque integer resources (OIR) support from the code base. This feature was deprecated in v1.8 and replaced by Extended Resources (ER).
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#55102
**Release note**:
```release-note
Remove opaque integer resources (OIR) support (deprecated in v1.8.)
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Implement volume resize for cinder
**What this PR does / why we need it**:
resize for cinder
xref: [resize proposal](https://github.com/kubernetes/community/pull/657)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref https://github.com/kubernetes/community/pull/657
Follow up: #49727
**Special notes for your reviewer**:
**Release note**:
```release-note
Implement volume resize for cinder
```
wip, assign to myself first
/assign @NickrenREN
Automatic merge from submit-queue (batch tested with PRs 55812, 55752, 55447, 55848, 50984). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make versioned types for webhook admission config
Versioned webhook admission config type as promised in https://github.com/kubernetes/kubernetes/pull/54414.
@kubernetes/sig-api-machinery-pr-reviews
@ericchiang as promised. fyi.
```yaml
kind: AdmissionConfiguration
apiVersion: apiserver.k8s.io/v1alpha1
plugins:
- name: GenericAdmissionWebhook
configuration:
kind: WebhookAdmission
apiVersion: apiserver.config.k8s.io/v1alpha1
kubeConfigFile: /path/to/my/file
```
`ADMISSION_CONTROL_CONFIG_FILE=../foo.yaml hack/local-up-cluster.sh`
Automatic merge from submit-queue (batch tested with PRs 56128, 56004, 56083, 55833, 56042). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Suppress the warning when a pod in binding cannot be expired
**What this PR does / why we need it**:
I have a scheduler extender, which implements the `Bind` call and takes several minutes to respond to that call. The scheduler log was full of the following error.
```
W1120 10:23:09.691188 99720 cache.go:442] Couldn't expire cache for pod default/xxx. Binding is still in progress.
```
The TTL for a pod to be expired in the scheduler cache is 30 seconds. But it's also possible that the binding (which is done asynchronously) can take longer than 30 seconds.
2cbb07a439/plugin/pkg/scheduler/factory/factory.go (L143)
The go routine that checks whether a pod has been expired is triggered every second.
2cbb07a439/plugin/pkg/scheduler/schedulercache/cache.go (L33)
So, it will print the the following warning every seconds until the pod gets expired.
2cbb07a439/plugin/pkg/scheduler/schedulercache/cache.go (L442-L443)
I think it's a valid for the binding to take more than one second, so we should downgrade this to an info to avoid polluting the scheduler log.
**Release note**:
```release-note
None
```
/sig scheduling
/assign @bsalamat
/cc @vishh
Automatic merge from submit-queue (batch tested with PRs 54316, 53400, 55933, 55786, 55794). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add support to take nominated pods into account during scheduling to avoid starvation of higher priority pods
**What this PR does / why we need it**:
When a pod preempts lower priority pods, the preemptor gets a "nominated node name" annotation. We call such a pod a nominated pod. This PR adds the logic to take such nominated pods into account when scheduling other pods on the same node that the nominated pod is expected to run. This is needed to avoid starvation of preemptor pods. Otherwise, lower priority pods may fill up the space freed after preemption before the preemptor gets a chance to get scheduled.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#54501
**Special notes for your reviewer**: This PR is built on top of #55109 and includes all the changes there as well.
**Release note**:
```release-note
Add support to take nominated pods into account during scheduling to avoid starvation of higher priority pods.
```
/sig scheduling
ref/ #47604
Automatic merge from submit-queue (batch tested with PRs 55938, 56055, 53385, 55796, 55922). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
admission: make admission metrics compositional
Metrics emission of admission plugins and the admission chain can be implemented compositionally, i.e. completely independently from the chain logic. This PR does that, moves the whole metrics code into a sub-package to contain complexity. The plumbing logic for the emitted metrics finally is cleanly done in the apiserver bootstrapping code, instead of being totally interleaved with the core admission logic.
Ratio:
- considerably less complexity
- admission plugins are compositional, including the chain. We cannot assume that there is only one chain at the outside of the admission plugin structure. Downstream projects might have more complex admission chains, i.e. multiple chain object nested.
- addition of metrics is plumbing and should be in the apiserver plumbing code. This makes it much easier to reason about the security critical admission chain.
Follow-up of #55183 and based on #55919.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add ExtendedResourceToleration admission controller.
/kind feature
/sig scheduling
/area hw-accelerators
There's elaborate discussion on this in #55080. In short, we would like to enable cluster operators and/or cloud providers to create dedicated nodes with extended resources (like GPUs, FPGAs etc.) that are reserved for pods requesting such resources. [Taints is the kubernetes concept to create dedicated nodes.](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#example-use-cases) If the cluster operator or cloud provider wants to create dedicated node pools, they are expected to taint the nodes containing extended resources with the key equal to the name of the resource and effect equal to NoSchedule. If they do that, only pods that have a toleration for such a taint can be scheduled there. To make it easy for the user, this admission controller when enabled, automatically adds a toleration with key `example.com/device`, operator `Exists` and effect `NoSchedule` if an extended resource of name `example.com/device` is requested.
**Release note**:
```release-note
Add ExtendedResourceToleration admission controller. This facilitates creation of dedicated nodes with extended resources. If operators want to create dedicated nodes with extended resources (like GPUs, FPGAs etc.), they are expected to taint the node with extended resource name as the key. This admission controller, if enabled, automatically adds tolerations for such taints to pods requesting extended resources, so users don't have to manually add these tolerations.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move of unreachable taint key out of alpha
**What this PR does / why we need it**:
Move of unreachable taint key out of alpha, which already happened in community doc.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#54198
**Special notes for your reviewer**:
Please see #54198 for the context of this inconsistency.
**Release note**:
```release-note
Move unreachable taint key out of alpha.
Please note the existing pods with the alpha toleration should be updated by user himself to tolerate the GA taint.
```