When kubelet starts a pod that refers to non-existing PV, PVC or Node, it
should clearly show that the requested element does not exist.
Previous "PersistentVolumeClaim 'default/ceph-claim-wm' is not in cache"
looks like random kubelet hiccup, while "PersistentVolumeClaim
'default/ceph-claim-wm' not found" suggests that the object may not exist at
all and it might be an user error.
Fixes#27523
Automatic merge from submit-queue
add unit and integration tests for rbac authorizer
This PR adds lots of tests for the RBAC authorizer.
The plan over the next couple days is to add a lot more test cases.
Updates #23396
cc @erictune
We had a long-lasting bug which prevented creation of volumes in
non-master zones, because the cloudprovider in the volume label
admission controller is not initialized with the multizone setting
(issue #27656).
This implements a simple workaround: if the volume is created with the
failure-domain zone label, we look for the volume in that zone. This is
more efficient, avoids introducing a new semantic, and allows users (and
the dynamic provisioner) to create volumes in non-master zones.
Fixes#27657
Automatic merge from submit-queue
Considering all nodes for the scheduler cache to allow lookups
Fixes the actual issue that led me to create https://github.com/kubernetes/kubernetes/issues/22554
Currently the nodes in the cache provided to the predicates excludes the unschedulable nodes using field level filtering for the watch results. This results in the above issue as the `ServiceAffinity` predicate uses the cached node list to look up the node metadata for a peer pod (another pod belonging to the same service). Since this peer pod could be currently hosted on a node that is currently unschedulable, the lookup could potentially fail, resulting in the pod failing to be scheduled.
As part of the fix, we are now including all nodes in the watch results and excluding the unschedulable nodes using `NodeCondition`
@derekwaynecarr PTAL
Currently the API server only checks the errors returned by an
authorizer plugin, it doesn't return or log them[0]. This makes
incorrectly configuring the wehbook authorizer plugin extremely
difficult to debug.
Add a logging statement if the request to the remove service fails
as this indicates misconfiguration.
[0] https://goo.gl/9zZFv4
Automatic merge from submit-queue
scheduler: remove unused random generator
The way scheduler selecting host has been changed to round-robin.
Clean up leftover.
Automatic merge from submit-queue
Add a NodeCondition "NetworkUnavaiable" to prevent scheduling onto a node until the routes have been created
This is new version of #26267 (based on top of that one).
The new workflow is:
- we have an "NetworkNotReady" condition
- Kubelet when it creates a node, it sets it to "true"
- RouteController will set it to "false" when the route is created
- Scheduler is scheduling only on nodes that doesn't have "NetworkNotReady ==true" condition
@gmarek @bgrant0607 @zmerlynn @cjcullen @derekwaynecarr @danwinship @dcbw @lavalamp @vishh
Automatic merge from submit-queue
reduce conflict retries
Eliminates quota admission conflicts due to latent caches on the same API server.
@derekwaynecarr
Automatic merge from submit-queue
plumb Update resthandler to allow old/new comparisons in admission
Rework how updated objects are passed to rest storage Update methods (first pass at https://github.com/kubernetes/kubernetes/pull/23928#discussion_r61444342)
* allows centralizing precondition checks (uid and resourceVersion)
* allows admission to have the old and new objects on patch/update operations (sets us up for field level authorization, differential quota updates, etc)
* allows patch operations to avoid double-GETting the object to apply the patch
Overview of important changes:
* pkg/api/rest/rest.go
* changes `rest.Update` interface to give rest storage an `UpdatedObjectInfo` interface instead of the object directly. To get the updated object, the storage must call `UpdatedObject()`, passing in the current object
* pkg/api/rest/update.go
* provides a default `UpdatedObjectInfo` impl
* passes a copy of the updated object through any provided transforming functions and returns it when asked
* builds UID preconditions from the updated object if they can be extracted
* pkg/apiserver/resthandler.go
* Reworks update and patch operations to give old objects to admission
* pkg/registry/generic/registry/store.go
* Calls `UpdatedObject()` inside `GuaranteedUpdate` so it can provide the old object
Todo:
- [x] Update rest.Update interface:
* Given the name of the object being updated
* To get the updated object data, the rest storage must pass the current object (fetched using the name) to an `UpdatedObject(ctx, oldObject) (newObject, error)` func. This is typically done inside a `GuaranteedUpdate` call.
- [x] Add old object to admission attributes interface
- [x] Update resthandler Update to move admission into the UpdatedObject() call
- [x] Update resthandler Patch to move the patch application and admission into the UpdatedObject() call
- [x] Add resttest tests to make sure oldObj is correctly passed to UpdatedObject(), and errors propagate back up
Follow-up:
* populate oldObject in admission for delete operations?
* update quota plugin to use `GetOldObject()` in admission attributes
* admission plugin to gate ownerReference modification on delete permission
* Decide how to handle preconditions (does that belong in the storage layer or in the resthander layer?)
Automatic merge from submit-queue
Introduce node memory pressure condition to scheduler
Following the work done by @derekwaynecarr at https://github.com/kubernetes/kubernetes/pull/21274, introducing memory pressure predicate for scheduler.
Missing:
* write down unit-test
* test the implementation
At the moment this is a heads up for further discussion how the new node's memory pressure condition should be handled in the generic scheduler.
**Additional info**
* Based on [1], only best effort pods are subject to filtering.
* Based on [2], best effort pods are those pods "iff requests & limits are not specified for any resource across all containers".
[1] 542668cc79/docs/proposals/kubelet-eviction.md (scheduler)
[2] https://github.com/kubernetes/kubernetes/pull/14943
Automatic merge from submit-queue
Cache Webhook Authentication responses
Add a simple LRU cache w/ 2 minute TTL to the webhook authenticator.
Kubectl is a little spammy, w/ >= 4 API requests per command. This also prevents a single unauthenticated user from being able to DOS the remote authenticator.
The PR add the length detection of the "predicateFuncs" for "findNodesThatFit" function of generic_scheduler.go.
In “findNodesThatFit” function, if the length of the "predicateFuncs" parameter is 0, it can set filtered equals nodes.Items, and needn't to traverse the nodes.Items.
It should reduce the resource data after finding the pod in the pods, because perhaps no corresponding pod in the pods of the node, at this time it shouldn't reduce the resource data of the node.
* Use an interface for OIDC Client, so that we're testing the behavior
of the client, not the go-oidc package itself
* add backoff and retry when server rejects token
This commit handles:
* Passing ID Token as Bearer token
* Refreshing of tokens using refresh-tokens
* Persisting refreshed tokens
* ability to add arbitrary extra scopes via config
* this is what enables the cross-client/azp stuff
Automatic merge from submit-queue
Make IsValidLabelValue return error strings
Part of the larger validation PR, broken out for easier review and merge. Builds on previous PRs in the series.
Automatic merge from submit-queue
Improve fatal error description in plugins.go of scheduler
The PR add more information for the fatal error in plugins.go of scheduler.
Automatic merge from submit-queue
Make IsQualifiedName return error strings
Part of the larger validation PR, broken out for easier review and merge.
@lavalamp FYI, but I know you're swamped, too.
In RegisterCustomFitPredicate, when policy.Argument is nil and fitPredicateMap has the policy.Name, it can return the policy.Name directly. Subsequent operations are redundant.
Automatic merge from submit-queue
WIP v0 NVIDIA GPU support
```release-note
* Alpha support for scheduling pods on machines with NVIDIA GPUs whose kubelets use the `--experimental-nvidia-gpus` flag, using the alpha.kubernetes.io/nvidia-gpu resource
```
Implements part of #24071 for #23587
I am not familiar with the scheduler enough to know what to do with the scores. Mostly punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl
support and docs
cc @erictune @davidopp @dchen1107 @vishh @Hui-Zhi @gopinatht
Automatic merge from submit-queue
Automatically create the kube-system namespace
At the same time we ensure that the `default` namespace is present, it also creates `kube-system` if it doesn't exist.
`kube-system` will now exist from the beginning, and will be recreated every 10s if deleted, in the same manner as the `default` ns
This makes UX much better, no need for `kubectl`ing a `kube-system.yaml` file anymore for a function that is essential to Kubernetes (addons). For instance, this makes dashboard deployment much easier when there's no need to check for the `kube-system` ns first.
A follow up in the future may remove places where logic to manually create the kube-system namespace is present.
Also fixed a small bug where `CreateNamespaceIfNeeded` ignored the `ns` parameter and was hardcoded to `api.NamespaceDefault`.
@davidopp @lavalamp @thockin @mikedanese @bryk @cheld @fgrzadkowski @smarterclayton @wojtek-t @dlorenc @vishh @dchen1107 @bgrant0607 @roberthbailey
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/25196)
<!-- Reviewable:end -->
Automatic merge from submit-queue
Add pod condition PodScheduled to detect situation when scheduler tried to schedule a Pod, but failed
Set `PodSchedule` condition to `ConditionFalse` in `scheduleOne()` if scheduling failed and to `ConditionTrue` in `/bind` subresource.
Ref #24404
@mml (as it seems to be related to "why pending" effort)
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24459)
<!-- Reviewable:end -->
Automatic merge from submit-queue
Webhook Token Authenticator
Add a webhook token authenticator plugin to allow a remote service to make authentication decisions.
Automatic merge from submit-queue
Sort resources in quota errors to avoid duplicate events
Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1333122
Errors describing why a request was rejected to quota would get variable responses (cpu=x,memory=y or memory=x,cpu=y) which caused duplicate events for the same root cause.
/cc @ncdc @jwforres
Implements part of #24071
I am not familiar with the scheduler enough to know what to do with the scores. Punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl
support and user docs
Automatic merge from submit-queue
Support persisting config from kubecfg AuthProvider plugins
Plumbs through an interface to the plugin that can persist a `map[string]string` config for just that plugin. Also adds `config` to the AuthProvider serialization type, and `Login()` to the AuthProvider plugin interface.
Modified the gcp AuthProvider to cache short-term access tokens in the kubecfg file.
Builds on #23066
@bobbyrullo @deads2k @jlowdermilk @erictune
Automatic merge from submit-queue
add namespace index for cache
@wojtek-t
Implement in this approach make the change of lister.go small, but we should replace all `NewInformer()` to `NewIndexInformer()`, even when someone not want to filter by namespace(eg. gc_controller and scheduler). Any suggestion?