Automatic merge from submit-queue
Quota integration test improvements
This PR does the following:
* allow a replication manager to get created that does not record events
* improve the shutdown behavior of replication manager and resource quota to ensure doWork funcs exit properly
* update quota integration test to use non event generating replication manager, reduce number of pods to provision
I am hoping this combination of changes fixes the referenced flake.
Fixes https://github.com/kubernetes/kubernetes/issues/25037
Automatic merge from submit-queue
Return 'too old' errors from watch cache via watch stream
Fixes #25151
This PR updates the API server to produce the same results when a watch is attempted with a resourceVersion that is too old, regardless of whether the etcd watch cache is enabled. The expected result is a `200` http status, with a single watch event of type `ERROR`. Previously, the watch cache would deliver a `410` http response.
This is the uncached watch impl:
```
// Implements storage.Interface.
func (h *etcdHelper) WatchList(ctx context.Context, key string, resourceVersion string, filter storage.FilterFunc) (watch.Interface, error) {
	if ctx == nil {
		glog.Errorf("Context is nil")
	}
	watchRV, err := storage.ParseWatchResourceVersion(resourceVersion)
	if err != nil {
		return nil, err
	}
	key = h.prefixEtcdKey(key)
	w := newEtcdWatcher(true, h.quorum, exceptKey(key), filter, h.codec, h.versioner, nil, h)
	go w.etcdWatch(ctx, h.etcdKeysAPI, key, watchRV)
	return w, nil
}
```
Once the resourceVersion parses, there is no path that returns a direct error, so all errors would have to be returned as an `ERROR` event via ResultChan().
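To illustrate that path, here is a minimal sketch (assuming the `watch` and `unversioned` packages of this era; the exact `Status` fields are assumptions, not the PR's literal code) of reporting a too-old resource version as a single `ERROR` event on an HTTP `200` watch stream:
```
// Sketch only: surface "resource version too old" as a watch ERROR event
// rather than a 410 HTTP response.
package etcd

import (
	"fmt"
	"net/http"

	"k8s.io/kubernetes/pkg/api/unversioned"
	"k8s.io/kubernetes/pkg/watch"
)

func tooOldEvent(requestedRV uint64) watch.Event {
	return watch.Event{
		Type: watch.Error,
		Object: &unversioned.Status{
			Status:  unversioned.StatusFailure,
			Code:    http.StatusGone,
			Message: fmt.Sprintf("too old resource version: %d", requestedRV),
		},
	}
}
```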
Automatic merge from submit-queue
etcd3/watcher: Event.Object should have the same rev as etcd delete
### What's the problem?
When a delete is watched, the revision should be larger than any previous one to guarantee ordering. However, etcd3 currently decodes the previous rev into the returned object:
995f022808/pkg/storage/etcd3/watcher.go (L322)
This will break, for example, the cacher's assumption here if it re-watches:
995f022808/pkg/storage/cacher.go (L579-L581)
The etcd2 impl. also takes the current ModifiedIndex to ensure it's a larger number:
995f022808/pkg/storage/etcd/etcd_watcher.go (L437-L442)
### What's this PR?
It fixes the above problem by using etcd's delete revision.
Automatic merge from submit-queue
kubenet: try to retrieve the pod IP inside the pod net namespace
Kubenet currently stores the IPs of pods in a map, and the kubelet gets the pod IP from kubenet during syncPod. If the kubelet restarts, all pods on the node lose their IPs in podStatus. This PR adds logic to retrieve the pod IP from the pod's network namespace.
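A minimal sketch, assuming the `vishvananda/netns` and `netlink` packages, an `eth0` interface inside the pod, and an illustrative helper name, of what reading the IP from the pod's network namespace can look like:
```
package kubenet

import (
	"fmt"
	"net"
	"runtime"

	"github.com/vishvananda/netlink"
	"github.com/vishvananda/netns"
)

func getPodIPFromNetns(netnsPath string) (net.IP, error) {
	// Namespace switching is per OS thread, so pin this goroutine to its thread.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	origin, err := netns.Get()
	if err != nil {
		return nil, err
	}
	defer origin.Close()
	defer netns.Set(origin) // switch back to the original namespace on return

	podNS, err := netns.GetFromPath(netnsPath)
	if err != nil {
		return nil, err
	}
	defer podNS.Close()
	if err := netns.Set(podNS); err != nil {
		return nil, err
	}

	link, err := netlink.LinkByName("eth0")
	if err != nil {
		return nil, err
	}
	addrs, err := netlink.AddrList(link, netlink.FAMILY_V4)
	if err != nil || len(addrs) == 0 {
		return nil, fmt.Errorf("no IPv4 address found on eth0: %v", err)
	}
	return addrs[0].IP, nil
}
```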
cc: @yujuhong
This prevents programmer error from resulting in incorrectly designed objects
being serialized to the wire. The normal path guards against
this, but the runtime.Unknown NestedMarshalTo fast path (which avoids an
allocation) doesn't have the same defensive guard.
Use constructor for ecrProvider
Rename package to "credentials" like golint requests
Don't wrap the lazy provider with a caching provider
Add immediate compile-time interface conformance checks for the interfaces (a tiny sketch follows this list)
Added comments
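For reference, a tiny sketch of what such compile-time conformance checks look like; the interface and concrete type names here are illustrative rather than the exact ones in the package:
```
package credentials

import "k8s.io/kubernetes/pkg/credentialprovider"

// The blank assignments fail the build if a provider type stops satisfying
// the DockerConfigProvider interface.
var (
	_ credentialprovider.DockerConfigProvider = &ecrProvider{}
	_ credentialprovider.DockerConfigProvider = &lazyEcrProvider{}
)
```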
This is step two. We now create long-lived, lazy ECR providers in all regions.
When first used, they will create the actual ECR providers doing the work
behind the scenes, namely talking to ECR in the region where the image lives,
rather than the one our instance is running in.
Also:
- moved the list of AWS regions out of the AWS cloudprovider and into the
credentialprovider, then exported it from there.
- improved logging
Behold, running in us-east-1:
```
aws_credentials.go:127] Creating ecrProvider for us-west-2
aws_credentials.go:63] AWS request: ecr:GetAuthorizationToken in us-west-2
aws_credentials.go:217] Adding credentials for user AWS in us-west-2
Successfully pulled image 123456789012.dkr.ecr.us-west-2.amazonaws.com/test:latest
```
*"One small step for a pod, one giant leap for Kube-kind."*
Automatic merge from submit-queue
Introduce skeleton of new attach/detach controller
This PR introduces the skeleton of the new attach/detach controller for #20262
Automatic merge from submit-queue
Moving StorageFactory building logic to genericapiserver
Adding a DefaultStorageFactoryBuilder which builds the required StorageFactory.
This allows us to remove the duplicated code between `cmd/kube-apiserver` and `federation/cmd/federated-apiserver`
cc @deads2k @lavalamp @jianhuiz
Automatic merge from submit-queue
Display line number on JSON errors
Related issue: https://github.com/kubernetes/kubernetes/issues/12231
This PR will introduce line numbers for all JSON errors in the CLI:
(this is the existing error reporting for YAML)
```console
$ kubectl create -f broken.yaml
yaml: line 8: mapping values are not allowed in this context
```
(this is the error reporting proposed in this PR for JSON)
```console
$ kubectl create -f broken.json
json: line 35: invalid character '{' after object key
```
(and this is the current reporting:)
```console
$ kubectl create -f broken.json
invalid character '{' after object key
```
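A minimal sketch (not the PR's actual implementation) of how a line number can be recovered from the byte offset in a `json.SyntaxError` to produce a message like the one proposed above:
```
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

func unmarshalWithLine(data []byte, v interface{}) error {
	err := json.Unmarshal(data, v)
	if syntaxErr, ok := err.(*json.SyntaxError); ok {
		// Count newlines before the failing offset; lines are 1-indexed.
		line := bytes.Count(data[:syntaxErr.Offset], []byte("\n")) + 1
		return fmt.Errorf("json: line %d: %v", line, syntaxErr)
	}
	return err
}

func main() {
	var out map[string]interface{}
	broken := []byte("{\n  \"kind\": \"Pod\"\n  {\n}")
	fmt.Println(unmarshalWithLine(broken, &out))
}
```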
Automatic merge from submit-queue
GCE: Allow node count to exceed GCE TargetPool maximums
```release-note
If the cluster node count exceeds the GCE TargetPool maximum (currently 1000),
randomly select which nodes are members of Kubernetes External Load Balancers.
```
If we would exceed the TargetPool API maximums, just randomly select a
subset of the nodes to include in the TargetPool instead.
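A minimal sketch, with an illustrative `maxTargetPoolInstances` constant and node names, of the random-subset selection described in the release note:
```
package main

import (
	"fmt"
	"math/rand"
)

const maxTargetPoolInstances = 1000 // illustrative; matches the limit cited above

func selectTargetPoolHosts(hosts []string) []string {
	if len(hosts) <= maxTargetPoolInstances {
		return hosts
	}
	// Shuffle a copy and keep a random subset within the TargetPool limit.
	shuffled := append([]string(nil), hosts...)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled[:maxTargetPoolInstances]
}

func main() {
	hosts := make([]string, 1500)
	for i := range hosts {
		hosts[i] = fmt.Sprintf("node-%d", i)
	}
	fmt.Println(len(selectTargetPoolHosts(hosts))) // 1000
}
```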
Automatic merge from submit-queue
Kubelet: Add docker operation timeout
For #23563.
Based on #24748, only the last 2 commits are new.
This PR:
1) Add a timeout for all docker operations (a sketch of the timeout wrapper follows this list).
2) Add docker operation timeout metrics.
3) Clean up kubelet stats and add runtime operation error and timeout rate monitoring.
4) Monitor runtime operation error and timeout rate in kubelet perf.
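As referenced in item 1, a minimal sketch (with an illustrative timeout value and operation runner, not the PR's exact code) of bounding each docker call with a context deadline and counting deadline overruns separately from plain errors:
```
package main

import (
	"context"
	"fmt"
	"time"
)

const operationTimeout = 2 * time.Minute // illustrative default

func runDockerOperation(op string, fn func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), operationTimeout)
	defer cancel()

	err := fn(ctx)
	if ctx.Err() == context.DeadlineExceeded {
		// Record as a timeout (distinct from an error) for the new metrics.
		return fmt.Errorf("docker operation %q timed out after %v", op, operationTimeout)
	}
	return err
}

func main() {
	err := runDockerOperation("inspect_container", func(ctx context.Context) error {
		select {
		case <-time.After(10 * time.Millisecond): // stand-in for a docker API call
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	})
	fmt.Println(err) // <nil>
}
```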
@yujuhong
/cc @gmarek because of the metrics change.
/cc @kubernetes/sig-node
Implements part of #24071
I am not familiar enough with the scheduler to know what to do with the scores. Punting for now.
Missing items from the implementation plan: limitranger, rkt support, kubectl
support and user docs
Automatic merge from submit-queue
etcd3/watcher: fix goroutine leak if ctx is canceled
### Problem
In reflector.go, Stop() could be called without retrieving all results from ResultChan(); see [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/client/cache/reflector.go#L369). A potential leak is that when an error has happened, the watcher could block on resultChan,
and then cancelling the context in Stop() wouldn't unblock it.
### What's this PR?
This fixes the problem by making the watcher also select on ctx.Done(), and by cancelling the context afterwards if an error happened.
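A minimal sketch, assuming a watcher that reports errors on an internal channel, of the fix described above:
```
package etcd3

import "context"

type watchChan struct {
	ctx     context.Context
	errChan chan error
}

// sendError delivers an error without risking a goroutine leak: if Stop()
// cancels the context, the send is abandoned instead of blocking forever.
func (wc *watchChan) sendError(err error) {
	select {
	case wc.errChan <- err:
	case <-wc.ctx.Done():
	}
}
```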
When the vSphere cloud provider object is instantiated, the VM name of the
Node on which this object is being created needs to be set. This patch
also includes vSphere as part of the cloud provider package.
This patch includes implementation for the following Instance object
interfaces:
* NodeAddresses
* ExternalID
* InstanceID
Also minor refactoring in overall Instance implementation.
Automatic merge from submit-queue
Add data structure for managing go routines by name
This PR introduces a data structure for managing go routines by name. It prevents the creation of new go routines if an existing go routine with the same name exists. This will enable parallelization of the designs in https://github.com/kubernetes/kubernetes/issues/20262 and https://github.com/kubernetes/kubernetes/issues/21931 with sufficient protection to prevent starting multiple operations on the same volume.
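A minimal sketch, with illustrative names, of such a structure: a goroutine is started only if no goroutine with the same name is already in flight.
```
package goroutinemap

import (
	"fmt"
	"sync"
)

type GoRoutineMap struct {
	mu         sync.Mutex
	operations map[string]bool
}

func New() *GoRoutineMap {
	return &GoRoutineMap{operations: map[string]bool{}}
}

// Run starts fn under the given name unless an operation with that name is
// already running, in which case it returns an error and starts nothing.
func (grm *GoRoutineMap) Run(name string, fn func()) error {
	grm.mu.Lock()
	defer grm.mu.Unlock()
	if grm.operations[name] {
		return fmt.Errorf("operation %q is already running", name)
	}
	grm.operations[name] = true
	go func() {
		defer grm.markDone(name)
		fn()
	}()
	return nil
}

func (grm *GoRoutineMap) markDone(name string) {
	grm.mu.Lock()
	defer grm.mu.Unlock()
	delete(grm.operations, name)
}
```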
Automatic merge from submit-queue
enable resource name and service account cases for impersonation
Adds the resource name check since that attribute was added for authorization. Also adds a check against a separate resource for service accounts. Having impersonation of service accounts use a different resource check places control of impersonation with the same users who have the power to get the SA tokens directly.
@kubernetes/kube-iam
@sgallagher FYI
Automatic merge from submit-queue
Kubelet eviction flag parsers and tests
The first two commits are from https://github.com/kubernetes/kubernetes/pull/24559, which has achieved LGTM.
The last commit is the only interesting part: it adds the parsing logic to handle the flags and reserves `pkg/kubelet/eviction` for the eviction manager logic.
Automatic merge from submit-queue
Add data structure for storing attach detach controller state.
This PR introduces the data structure for maintaining the in-memory state for the new attach/detach controller (#20262).
Automatic merge from submit-queue
kubelet: Remove redundant `Container.Created`
As far as I can tell, this has been supplanted by a) the `DockerJSON.CreatedAt` field and b) the
`ContainerStatus.CreatedAt`, where the first is used for creating the
second.
The `.Created` field was only written to as far as I can see.
cc @yifan-gu & @Random-Liu
Is there any reason we might want to keep this around?
Automatic merge from submit-queue
Introduce events flag for describers
Printing events for a given object is not always needed. Thus, introducing --show-events=false to ``kubectl describe`` to skip printing events.
Fixes: #24239
Automatic merge from submit-queue
Abstract node side functionality of attachable plugins
- Create PhysicalAttacher interface to abstract MountDevice and
WaitForAttach.
- Create PhysicalDetacher interface to abstract WaitForDetach and
UnmountDevice.
- Expand unit tests to check that Attach, Detach, WaitForAttach,
WaitForDetach, MountDevice, and UnmountDevice get called where
appropriate. (A rough sketch of these interfaces follows this list.)
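As mentioned above, a rough sketch of the two node-side interfaces using the working titles from this PR; the method signatures are simplified assumptions (plain strings rather than volume specs):
```
package volume

import "time"

// PhysicalAttacher abstracts the node-side half of attaching a volume.
type PhysicalAttacher interface {
	// WaitForAttach blocks until the attached volume's device shows up on the node.
	WaitForAttach(volumeName string, timeout time.Duration) (devicePath string, err error)
	// MountDevice mounts the attached device at a node-global mount point.
	MountDevice(volumeName, devicePath, deviceMountPath string) error
}

// PhysicalDetacher abstracts the node-side half of detaching a volume.
type PhysicalDetacher interface {
	// WaitForDetach blocks until the device disappears from the node.
	WaitForDetach(devicePath string, timeout time.Duration) error
	// UnmountDevice unmounts the device from its node-global mount point.
	UnmountDevice(deviceMountPath string) error
}
```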
Physical{Attacher,Detacher} are working titles; suggestions welcome. Some other thoughts:
- NodeSideAttacher or NodeAttacher.
- AttachWatcher
- Call this Attacher and call the current Attacher CloudAttacher.
- DeviceMounter (although there are way too many things called Mounter right now :/)
This is to address: https://github.com/kubernetes/kubernetes/pull/21709#issuecomment-192035382
@saad-ali
Automatic merge from submit-queue
Support persisting config from kubecfg AuthProvider plugins
Plumbs through an interface to the plugin that can persist a `map[string]string` config for just that plugin. Also adds `config` to the AuthProvider serialization type, and `Login()` to the AuthProvider plugin interface.
Modified the gcp AuthProvider to cache short-term access tokens in the kubecfg file.
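A small sketch of the pieces this plumbs through; the names are assumptions close to, but not necessarily identical to, the client config package:
```
package restclient

import "net/http"

// AuthProviderConfigPersister lets a plugin write back its own
// map[string]string config, e.g. a cached short-term access token.
type AuthProviderConfigPersister interface {
	Persist(map[string]string) error
}

// AuthProvider is the plugin interface, now including Login().
type AuthProvider interface {
	// WrapTransport decorates outgoing requests with credentials.
	WrapTransport(rt http.RoundTripper) http.RoundTripper
	// Login performs any interactive login flow the provider needs.
	Login() error
}
```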
Builds on #23066
@bobbyrullo @deads2k @jlowdermilk @erictune
Automatic merge from submit-queue
kubectl describe: show multiple labels/annotations on multiple lines
Small UX improvement: when there is more than one label/annotation, it's more readable to see them on separate lines.
Before:
```console
$ kubectl describe svc
Name: s2i-test
Namespace: test2
Labels: app=s2i-test,foo=bar
...
```
After:
```console
$ kubectl describe svc
Name: s2i-test
Namespace: test2
Labels: app=s2i-test
foo=bar
...
```
This change affects output of the labels/annotations in many of the sub-commands of the `kubectl describe`.
PTAL @smarterclayton @kargakis
Automatic merge from submit-queue
add namespace index for cache
@wojtek-t
Implementing it in this approach keeps the change to lister.go small, but we would have to replace all `NewInformer()` calls with `NewIndexInformer()`, even where there is no need to filter by namespace (e.g. gc_controller and scheduler). Any suggestions?
Automatic merge from submit-queue
kubectl: more sophisticated pod selection for logs and attach
Trying to get the logs or attach to an object other than a pod
will poll forever if that object has no replicas. This commit adds
a 20s timeout for polling.
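A minimal sketch, assuming the util/wait package of this era and an illustrative pod-lookup callback, of the 20s timeout:
```
package cmd // illustrative package name

import (
	"time"

	"k8s.io/kubernetes/pkg/util/wait"
)

// waitForAttachablePod polls for a pod belonging to the target object, but
// gives up after 20 seconds instead of polling forever.
func waitForAttachablePod(getPod func() (string, error)) (string, error) {
	var podName string
	err := wait.PollImmediate(time.Second, 20*time.Second, func() (bool, error) {
		name, err := getPod()
		if err != nil {
			return false, err
		}
		podName = name
		return podName != "", nil // keep polling until a pod exists
	})
	return podName, err
}
```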
@kubernetes/kubectl @deads2k @fabianofranz
Automatic merge from submit-queue
Add subPath to mount a child dir or file of a volumeMount
Allow users to specify a subPath in Container.volumeMounts so they can use a single volume for many mounts instead of creating many volumes. For instance, a user can now use a single PersistentVolume to store the MySQL database and the document root of an Apache server of a LAMP stack pod by mapping them to different subPaths in this single volume.
Also solves https://github.com/kubernetes/kubernetes/issues/20466.
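A small fragment, assuming the api package's VolumeMount type, showing the LAMP example from above with the new SubPath field:
```
// One volume ("site-data") backs both the MySQL data directory and the
// Apache document root via different SubPaths.
volumeMounts := []api.VolumeMount{
	{Name: "site-data", MountPath: "/var/lib/mysql", SubPath: "mysql"},
	{Name: "site-data", MountPath: "/var/www/html", SubPath: "html"},
}
```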
Automatic merge from submit-queue
Automatically Add Supplemental Groups from Volumes to Pods
This adds support for a "GID" annotation that one can add to their PVs. When this annotation is seen the kubelet automatically adds the given GID to the list of supplemental groups for the pod to which the PV is attached. This allows admins to create volumes and suggest a GID to use to access the volume. This is needed for volumes which do not support ownership management such as NFS.
@markturansky PTAL
Automatic merge from submit-queue
Handle image digests in node status and image GC
Start including Docker image digests in the node status and consider image digests during image
garbage collection.
@kubernetes/rh-cluster-infra @kubernetes/sig-node @smarterclayton
Fixes #23917
Automatic merge from submit-queue
PLEG: reinspect pods that failed prior inspections
Fix the following sequence of events:
1. relist call 1 successfully inspects a pod (just has infra container)
1. relist call 2 gets an error inspecting the same pod (has infra container and a transient
container that failed to create) and doesn't update the old/new pod records
1. relist calls 3+ don't inspect the pod any more (just has infra container so it doesn't look like
anything changed)
This change adds a new list that keeps track of pods that failed inspection and retries them the
next time relist is called. Without this change, a pod in this state would never be inspected again,
its entry in the status cache would never be updated, and the pod worker would never call syncPod
again because the most recent entry in the status cache has an error associated with it. Without
this change, pods in this state would be stuck Terminating forever, unless the user issued a
deletion with a grace period value of 0.
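A minimal sketch of that idea; the names are illustrative rather than the exact PLEG fields:
```
type podID string

type pleg struct {
	// podsToReinspect holds pods whose last inspection returned an error.
	podsToReinspect map[podID]bool
}

func (p *pleg) relistPod(id podID, changed bool, inspect func(podID) error) {
	needsInspection := changed || p.podsToReinspect[id]
	if !needsInspection {
		return
	}
	if err := inspect(id); err != nil {
		// Keep the pod on the retry list so the next relist tries again.
		p.podsToReinspect[id] = true
		return
	}
	delete(p.podsToReinspect, id)
}
```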
Fixes #24819
cc @kubernetes/rh-cluster-infra @kubernetes/sig-node
Automatic merge from submit-queue
Make ThirdPartyResource a root scoped object
ThirdPartyResource (the registration of a third party type) belongs at the cluster scope. It results in resource handlers installed in every namespace, and the same name in two namespaces collides (namespace is ignored when determining group/kind).
ThirdPartyResourceData (an actual instance of that type) is still namespace-scoped.
This PR moves ThirdPartyResource to be a root scope object. Someone previously using ThirdPartyResource definitions in alpha should be able to move them from namespace to root scope like this:
setup (run on 1.2):
```
kubectl create ns ns1
echo '{"kind":"ThirdPartyResource","apiVersion":"extensions/v1beta1","metadata":{"name":"foo.example.com"},"versions":[{"name":"v8"}]}' | kubectl create -f - --namespace=ns1
echo '{"kind":"Foo","apiVersion":"example.com/v8","metadata":{"name":"MyFoo"},"testkey":"testvalue"}' | kubectl create -f - --namespace=ns1
```
export:
```
kubectl get thirdpartyresource --all-namespaces -o yaml > tprs.yaml
```
remove namespaced kind registrations (this shouldn't remove the data of that type, which is another possible issue):
```
kubectl delete -f tprs.yaml
```
... upgrade ...
re-register the custom types at the root scope:
```
kubectl create -f tprs.yaml
```
Additionally, pre-1.3 clients that expect to read/write ThirdPartyResource at a namespace scope will not be compatible with 1.3+ servers, and 1.3+ clients that expect to read/write ThirdPartyResource at a root scope will not be compatible with pre-1.3 servers.
Automatic merge from submit-queue
Define interfaces for kubelet pod admission and eviction
There is too much code and logic in `kubelet.go`, which makes it hard to test functions in discrete pieces.
I propose an interface that an internal module can implement to make an admission decision for a pod. If folks are OK with the pattern, I want to move a) the predicate checking, b) the out-of-disk check, and c) the eviction logic that prevents best-effort pods from being admitted into their own dedicated handlers that would be easier for us to mock test. We can then just write tests to ensure that the `Kubelet` makes the call-outs, and we can write easier unit tests to ensure that the dedicated handlers do the right thing.
The second interface I propose is a `PodEvictor` that is invoked in the main kubelet sync loop to determine if pods should be proactively evicted from the machine. The current active deadline check should move into a simple evictor implementation, and I want to plug the out-of-resource killer code path in as an implementation of the same interface.
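A rough sketch, with hypothetical names and simplified signatures (and assuming the api package's Pod type), of the two interfaces being proposed:
```
// Hypothetical shapes only; the real signatures would be settled in review.
type PodAdmitHandler interface {
	// Admit decides whether the pod may start on this node and, if not, why.
	Admit(pod *api.Pod, runningPods []*api.Pod) (admit bool, reason, message string)
}

type PodEvictor interface {
	// Evict returns the pods that should be proactively evicted from the node.
	Evict(activePods []*api.Pod) []*api.Pod
}
```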
@vishh @timothysc - if you guys can ack on this, I will add some unit testing to ensure we do the call-outs.
/cc @kubernetes/sig-node @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Avoid allocations and a reflect.Call in conversion
reflect.Call is fairly expensive, performing 8 allocations and having to
set up a call stack. Using a fairly straightforward-to-generate switch
statement, we can bypass that early in conversion (as long as the
function takes responsibility for invocation). We may also be able to
avoid an allocation for the conversion scope, but I'm not positive yet.
```
benchmark old ns/op new ns/op delta
BenchmarkPodConversion-8 14713 12173 -17.26%
benchmark old allocs new allocs delta
BenchmarkPodConversion-8 80 72 -10.00%
benchmark old bytes new bytes delta
BenchmarkPodConversion-8 9133 8712 -4.61%
```
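A minimal sketch, not the generated code itself, of the idea: when the concrete type pair is known, a type switch calls the typed conversion function directly and reflect.Call is skipped. The stand-in types and function names are illustrative.
```
// Illustrative stand-ins for internal and versioned types.
type Pod struct{ Name string }
type V1Pod struct{ Name string }

func convertPodToV1Pod(in *Pod, out *V1Pod) error {
	out.Name = in.Name
	return nil
}

// fastConvert dispatches known type pairs through a switch statement.
func fastConvert(in, out interface{}) (handled bool, err error) {
	switch typedIn := in.(type) {
	case *Pod:
		if typedOut, ok := out.(*V1Pod); ok {
			return true, convertPodToV1Pod(typedIn, typedOut)
		}
	}
	// Unknown pair: the caller falls back to the reflect.Call-based path.
	return false, nil
}
```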
@wojtek-t related to #20309
Automatic merge from submit-queue
kubenet: fix up CNI bridge TX queue length if needed
CNI's bridge plugin mis-handles the TxQLen when creating the bridge,
leading to a zero-length TX queue. This doesn't typically cause
problems (since virtual interfaces don't have hard queue limits)
but when adding traffic shaping, some qdiscs pull their packet
limits from the TX queue length, leading to a packet limit of 0
in some cases. Until we can depend on a new enough version of
CNI, fix up the TX queue length internally.
Closes: https://github.com/kubernetes/kubernetes/issues/25092
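A minimal sketch of such a fix-up, shelling out to iproute2 rather than using a netlink library; the bridge name handling and target length are illustrative and this is not the PR's exact code:
```
package kubenet

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// ensureBridgeTxQueueLen reads the bridge's TX queue length from sysfs and,
// if it is shorter than desired, fixes it up with `ip link set ... txqueuelen`.
func ensureBridgeTxQueueLen(bridge string, desired int) error {
	raw, err := os.ReadFile(fmt.Sprintf("/sys/class/net/%s/tx_queue_len", bridge))
	if err != nil {
		return err
	}
	current, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		return err
	}
	if current >= desired {
		return nil
	}
	return exec.Command("ip", "link", "set", "dev", bridge, "txqueuelen", strconv.Itoa(desired)).Run()
}
```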