* Use an interface for OIDC Client, so that we're testing the behavior
of the client, not the go-oidc package itself
* add backoff and retry when server rejects token
This commit handles:
* Passing ID Token as Bearer token
* Refreshing of tokens using refresh-tokens
* Persisting refreshed tokens
* ability to add arbitrary extra scopes via config
* this is what enables the cross-client/azp stuff
Automatic merge from submit-queue
Make IsValidLabelValue return error strings
Part of the larger validation PR, broken out for easier review and merge. Builds on previous PRs in the series.
Automatic merge from submit-queue
Improve fatal error description in plugins.go of scheduler
The PR add more information for the fatal error in plugins.go of scheduler.
Automatic merge from submit-queue
Make IsQualifiedName return error strings
Part of the larger validation PR, broken out for easier review and merge.
@lavalamp FYI, but I know you're swamped, too.
In RegisterCustomFitPredicate, when policy.Argument is nil and fitPredicateMap has the policy.Name, it can return the policy.Name directly. Subsequent operations are redundant.
Automatic merge from submit-queue
WIP v0 NVIDIA GPU support
```release-note
* Alpha support for scheduling pods on machines with NVIDIA GPUs whose kubelets use the `--experimental-nvidia-gpus` flag, using the alpha.kubernetes.io/nvidia-gpu resource
```
Implements part of #24071 for #23587
I am not familiar with the scheduler enough to know what to do with the scores. Mostly punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl
support and docs
cc @erictune @davidopp @dchen1107 @vishh @Hui-Zhi @gopinatht
Automatic merge from submit-queue
Automatically create the kube-system namespace
At the same time we ensure that the `default` namespace is present, it also creates `kube-system` if it doesn't exist.
`kube-system` will now exist from the beginning, and will be recreated every 10s if deleted, in the same manner as the `default` ns
This makes UX much better, no need for `kubectl`ing a `kube-system.yaml` file anymore for a function that is essential to Kubernetes (addons). For instance, this makes dashboard deployment much easier when there's no need to check for the `kube-system` ns first.
A follow up in the future may remove places where logic to manually create the kube-system namespace is present.
Also fixed a small bug where `CreateNamespaceIfNeeded` ignored the `ns` parameter and was hardcoded to `api.NamespaceDefault`.
@davidopp @lavalamp @thockin @mikedanese @bryk @cheld @fgrzadkowski @smarterclayton @wojtek-t @dlorenc @vishh @dchen1107 @bgrant0607 @roberthbailey
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/25196)
<!-- Reviewable:end -->
Automatic merge from submit-queue
Add pod condition PodScheduled to detect situation when scheduler tried to schedule a Pod, but failed
Set `PodSchedule` condition to `ConditionFalse` in `scheduleOne()` if scheduling failed and to `ConditionTrue` in `/bind` subresource.
Ref #24404
@mml (as it seems to be related to "why pending" effort)
<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24459)
<!-- Reviewable:end -->
Automatic merge from submit-queue
Webhook Token Authenticator
Add a webhook token authenticator plugin to allow a remote service to make authentication decisions.
Automatic merge from submit-queue
Sort resources in quota errors to avoid duplicate events
Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1333122
Errors describing why a request was rejected to quota would get variable responses (cpu=x,memory=y or memory=x,cpu=y) which caused duplicate events for the same root cause.
/cc @ncdc @jwforres
Implements part of #24071
I am not familiar with the scheduler enough to know what to do with the scores. Punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl
support and user docs
Automatic merge from submit-queue
Support persisting config from kubecfg AuthProvider plugins
Plumbs through an interface to the plugin that can persist a `map[string]string` config for just that plugin. Also adds `config` to the AuthProvider serialization type, and `Login()` to the AuthProvider plugin interface.
Modified the gcp AuthProvider to cache short-term access tokens in the kubecfg file.
Builds on #23066
@bobbyrullo @deads2k @jlowdermilk @erictune
Automatic merge from submit-queue
add namespace index for cache
@wojtek-t
Implement in this approach make the change of lister.go small, but we should replace all `NewInformer()` to `NewIndexInformer()`, even when someone not want to filter by namespace(eg. gc_controller and scheduler). Any suggestion?
The codec factory should support two distinct interfaces - negotiating
for a serializer with a client, vs reading or writing data to a storage
form (etcd, disk, etc). Make the EncodeForVersion and DecodeToVersion
methods only take Encoder and Decoder, and slight refactoring elsewhere.
In the storage factory, use a content type to control what serializer to
pick, and use the universal deserializer. This ensures that storage can
read JSON (which might be from older objects) while only writing
protobuf. Add exceptions for those resources that may not be able to
write to protobuf (specifically third party resources, but potentially
others in the future).
Automatic merge from submit-queue
fully qualify admission resources and kinds
Fully qualifies the `Kind` and `Resource` fields for admission attributes. The information was getting filtered at the `RESTHandler` before.
@derekwaynecarr
Automatic merge from submit-queue
Store node information in NodeInfo
This is significantly improving scheduler throughput.
On 1000-node cluster:
- empty cluster: ~70pods/s
- full cluster: ~45pods/s
Drop in throughput is mostly related to priority functions, which I will be looking into next (I already have some PR #24095, but we need for more things before).
This is roughly ~40% increase.
However, we still need better understanding of predicate function, because in my opinion it should be even faster as it is now. I'm going to look into it next week.
@gmarek @hongchaodeng @xiang90
Automatic merge from submit-queue
Add Subresource & Name to webhook authorizer.
Pass through the Subresource and Name fields from the `authorization.Attributes` to the `SubjectAccessReviewSpec.ResourceAttributes`.
Automatic merge from submit-queue
schedulercache: remove bind() from AssumePod
Due to #24197, we make bind() asynchronous and don't need it anymore.
Submit this PR to clean it up.
Automatic merge from submit-queue
Client auth provider plugin framework
Allows client plugins to modify the underlying transport to, for example, add custom authorization headers.
Automatic merge from submit-queue
Move predicates into library
This PR tries to implement #12744
Any suggestions/ideas are welcome. @davidopp
current state: integration test fails if including podCount check in Kubelet.
DONE:
1. refactor all predicates: predicates return fitOrNot(bool) and error(Error) in which the latter is of type PredicateFailureError or InsufficientResourceError
2. GeneralPredicates() is a predicate function, which includes serveral other predicate functions (PodFitsResource, PodFitsHost, PodFitsHostPort). It is registered as one of the predicates in DefaultAlgorithmProvider, and is also called in canAdmitPod() in Kubelet and should be called by other components (like rescheduler, etc if necessary. See discussion in issue #12744
TODO:
1. determine which predicates should be included in GeneralPredicates()
2. separate GeneralPredicates() into: a.) GeneralPredicatesEvictPod() and b.) GeneralPredicatesNotEvictPod()
3. DaemonSet should use GeneralPredicates()
DONE:
1. refactor all predicates: predicates return fitOrNot(bool) and error(Error) in which the latter is of type
PredicateFailureError or InsufficientResourceError. (For violation of either MaxEBSVolumeCount or
MaxGCEPDVolumeCount, returns one same error type as ErrMaxVolumeCountExceeded)
2. GeneralPredicates() is a predicate function, which includes serveral other predicate functions (PodFitsResource,
PodFitsHost, PodFitsHostPort). It is registered as one of the predicates in DefaultAlgorithmProvider, and
is also called in canAdmitPod() in Kubelet and should be called by other components (like rescheduler, etc)
if necessary. See discussion in issue #12744
3. remove podNumber check from GeneralPredicates
4. HostName is now verified in Kubelet's canAdminPod(). add TestHostNameConflicts in kubelet_test.go
5. add getNodeAnyWay() method in Kubelet to get node information in standaloneMode
TODO:
1. determine which predicates should be included in GeneralPredicates()
2. separate GeneralPredicates() into:
a. GeneralPredicatesEvictPod() and
b. GeneralPredicatesNotEvictPod()
3. DaemonSet should use GeneralPredicates()
AWS has soft support limit for 40 attached EBS devices. Assuming there is just
one root device, use the rest for persistent volumes.
The devices will have name /dev/xvdba - /dev/xvdcm, leaving /dev/sda - /dev/sdz
to the system.
Also, add better error handling and propagate error
"Too many EBS volumes attached to node XYZ" to a pod.
Custom quota evaluators may need to query a target namespace of an input
object during quota admission check. For this, namespace needs to be
known.
Signed-off-by: Michal Minar <miminar@redhat.com>
The provider config has changed a little bit in go-oidc. It is more
complete and now throws errors when unmarshaling provider configs
that are missing required fields (as defined by the OpenID Connect
Discovery spec).
Update the oidc plugin to use the new type.
This is a first-aid bandage to let admission controller ignore persistent
volumes that are being provisioned right now and thus may not exist in
external cloud infrastructure yet.
Instead of sorting hosts by score, find max score and choose one of
the hosts with max score directly. Saves a little time on sorting and
avoids extra copying of HostPriorityList entries.
Test had to be updated as one of the test cases relied on having a
stable order from pre-sorting.
I can't revert with github which says "Sorry, this pull request couldn’t be
reverted automatically. It may have already been reverted, or the content may
have changed since it was merged."
Reverts commit: 0c191e787b
We are (sadly) using a copy-and-paste of the GCE PD code for AWS EBS.
This code hasn't been updated in a while, and it seems that the GCE code
has some code to make volume mounting more robust that we should copy.
For certain volume types (e.g. AWS EBS or GCE PD), a limitted
number of such volumes can be attached to a given node. This commit
introduces a predicate with allows cluster admins to cap
the maximum number of volumes matching a particular type attached to a
given node.
The volume type is configurable by passing a pair of filter functions,
and the maximum number of such volumes is configurable to allow node
admins to reserve a certain number of volumes for system use.
By default, the predicate is exposed as MaxEBSVolumeCount and
MaxGCEPDVolumeCount (for AWS ElasticBlocKStore and GCE PersistentDisk
volumes, respectively), each of which can be configured using the
`KUBE_MAX_PD_VOLS` environment variable.
Fixes#7835
Combine the fields that will be used for content transformation
(content-type, codec, and group version) into a single struct in client,
and then pass that struct into the rest client and request. Set the
content-type when sending requests to the server, and accept the content
type as primary.
Will form the foundation for content-negotiation via the client.
Remove Codec from versionInterfaces in meta (RESTMapper is now agnostic
to codec and serialization). Register api/latest.Codecs as the codec
factory and use latest.Codecs.LegacyCodec(version) as an equvialent to
the previous codec.
For AWS EBS, a volume can only be attached to a node in the same AZ.
The scheduler must therefore detect if a volume is being attached to a
pod, and ensure that the pod is scheduled on a node in the same AZ as
the volume.
So that the scheduler need not query the cloud provider every time, and
to support decoupled operation (e.g. bare metal) we tag the volume with
our placement labels. This is done automatically by means of an
admission controller on AWS when a PersistentVolume is created backed by
an EBS volume.
Support for tagging GCE PVs will follow.
Pods that specify a volume directly (i.e. without using a
PersistentVolumeClaim) will not currently be scheduled correctly (i.e.
they will be scheduled without zone-awareness).
Public utility methods and JWT parsing, and controller specific logic.
Also remove the coupling between ServiceAccountTokenGetter and the
authenticator class.
1. Name default scheduler with name `kube-scheduler`
2. The default scheduler only schedules the pods meeting the following condition:
- the pod has no annotation "scheduler.alpha.kubernetes.io/name: <scheduler-name>"
- the pod has annotation "scheduler.alpha.kubernetes.io/name: kube-scheduler"
update gofmt
update according to @david's review
run hack/test-integration.sh, hack/test-go.sh and local e2e.test
To improve the throughput of current scheduler, we can do
a simple optimization by calcluating priorities in parallel.
This doubles the throughput of density test, which has the default
config with 3 priority funcs (the spreading one does not actually
consume any computation time). It matches the expectation.