Automatic merge from submit-queue
Create restclient interface
Refactoring of code to allow replace *restclient.RESTClient with any RESTClient implementation that implements restclient.RESTClientInterface interface.
Automatic merge from submit-queue
CRI: Handle container/sandbox restarts for pod with RestartPolicy == …
If all sandbox and containers are dead in a pod, and the restart policy is
"Never", kubelet should not try to recreate all of them.
Automatic merge from submit-queue
Return an empty network namespace path for exited infra containers
If the infra container has already terminated, `docker inspect` will report
pid 0. The path constructed using the pid to check the network namespace of
the process will be invalid. This commit changes docker to report an empty
path to stop kubenet from erroring out whenever TearDown is called on an
exited infra container.
This is not a fix for all the plugins, as some plugins may require the actual
network namespace to tear down properly.
If the infra container has already terminated, `docker inspect` will report
pid 0. The path constructed using the pid to check the network namespace of
the process will be invalid. This commit changes docker to report an empty
path to stop kubenet from erroring out whenever TearDown is called on an
exited infra container.
This is not a fix for all the plugins, as some plugins may require the actual
network namespace to tear down properly.
Automatic merge from submit-queue
rkt: Convert image name to be a valid acidentifier
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
Fix a bug under the rkt runtime whereby image-registries with ports would not be fetched from
```
This fixes a bug whereby an image reference that included a port was not
recognized after being downloaded, and so could not be run
This is the quick-and-simple fix. In the longer term, we'll want to refactor image logic a bit more to handle the many special cases that the current code does not, mostly related to library images on dockerhub.
/cc @yifan-gu @kubernetes/sig-rktnetes
Automatic merge from submit-queue
WIP: Remove the legacy networking mode
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
Removes the deprecated configure-cbr0 flag and networking mode to avoid having untested and maybe unstable code in kubelet, see: #33789
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
fixes#30589fixes#31937
**Special notes for your reviewer**: There are a lot of deployments who rely on this networking mode. Not sure how we deal with that: force switch to kubenet or just delete the old deployment?
But please review the code changes first (the first commit)
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
Removed the deprecated kubelet --configure-cbr0 flag, and with that the "classic" networking mode as well
```
PTAL @kubernetes/sig-network @kubernetes/sig-node @mikedanese
Automatic merge from submit-queue
Remove static kubelet client, refactor ConnectionInfoGetter
Follow up to https://github.com/kubernetes/kubernetes/pull/33718
* Collapses the multi-valued return to a `ConnectionInfo` struct
* Removes the "raw" connection info method and interface, since it was only used in a single non-test location (by the "real" connection info method)
* Disentangles the node REST object from being a ConnectionInfoProvider itself by extracting an implementation of ConnectionInfoProvider that takes a node (using a provided NodeGetter) and determines ConnectionInfo
* Plumbs the KubeletClientConfig to the point where we construct the helper object that combines the config and the node lookup. I anticipate adding a preference order for choosing an address type in https://github.com/kubernetes/kubernetes/pull/34259
Automatic merge from submit-queue
kubelet: storage: don't hang kubelet on unresponsive nfs
Fixes#31272
Currently, due to the nature of nfs, an unresponsive nfs volume in a pod can wedge the kubelet such that additional pods can not be run.
The discussion thus far surrounding this issue was to wrap the `lstat`, the syscall that ends up hanging in uninterruptible sleep, in a goroutine and limiting the number of goroutines that hang to one per-pod per-volume.
However, in my investigation, I found that the callsites that request a listing of the volumes from a particular volume plugin directory don't care anything about the properties provided by the `lstat` call. They only care about whether or not a directory exists.
Given that constraint, this PR just avoids the `lstat` call by using `Readdirnames()` instead of `ReadDir()` or `ReadDirNoExit()`
### More detail for reviewers
Consider the pod mounted nfs volume at `/var/lib/kubelet/pods/881341b5-9551-11e6-af4c-fa163e815edd/volumes/kubernetes.io~nfs/myvol`. The kubelet wedges because when we do a `ReadDir()` or `ReadDirNoExit()` it calls `syscall.Lstat` on `myvol` which requires communication with the nfs server. If the nfs server is unreachable, this call hangs forever.
However, for our code, we only care what about the names of files/directory contained in `kubernetes.io~nfs` directory, not any of the more detailed information the `Lstat` call provides. Getting the names can be done with `Readdirnames()`, which doesn't need to involve the nfs server.
@pmorie @eparis @ncdc @derekwaynecarr @saad-ali @thockin @vishh @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Fix edge case in qos evaluation
If a pod has a container C1 and C2, where sum(C1.requests, C2.requests) equals (C1.Limits), the code was reporting that the pod had "Guaranteed" qos, when it should have been Burstable.
/cc @vishh @dchen1107
Automatic merge from submit-queue
Log more information on pod status updates
Also bump the logging level to V2 so that we can see them in a non-test
cluster.
Automatic merge from submit-queue
add UpdateRuntimeConfig interface
Expose UpdateRuntimeConfig interface in RuntimeService for kubelet to pass a set of configurations to runtime. Currently it only takes PodCIDR.
The use case is for kubelet to pass configs to runtime. Kubelet holds some config/information which runtime does not have, such as PodCIDR. I expect some of kubelet configurations will gradually move to runtime, but I believe cases like PodCIDR, which dynamically assigned by k8s master, need to stay for a while.
Automatic merge from submit-queue
Allow kuberuntime to get network namespace for not ready sandboxes
Kubelet calls TearDownPod to clean up the network resources for a pod sandbox.
TearDownPod relies on GetNetNS to retrieve network namespace, and the current
implementation makes this impossible for not-ready sandboxes. This change
removes the unnecessary filter to fix this issue.
Kubelet calls TearDownPod to clean up the network resources for a pod sandbox.
TearDownPod relies on GetNetNS to retrieve network namespace, and the current
implementation makes this impossible for not-ready sandboxes. This change
removes the unnecessary filter to fix this issue.
Automatic merge from submit-queue
CRI: Image pullable support in dockershim
For #33189.
The new test `ImageID should be set to the manifest digest (from RepoDigests) when available` introduced in #33014 is failing, because:
1) `docker-pullable://` conversion is not supported in dockershim;
2) `kuberuntime` and `dockershim` is using `ListImages with image name filter` to check whether image presents. However, `ListImages` doesn't support filter with `digest`.
This PR:
1) Change `kuberuntime.IsImagePresent` to use `runtime.ImageStatus` and `dockershim.InspectImage` instead. ***Notice an API change: `ImageStatus` should return `(nil, nil)` for non-existing image.***
2) Add `docker-pullable://` support.
3) Fix `RemoveImage` in dockershim https://github.com/kubernetes/kubernetes/pull/29316.
I've tried myself, the test can pass now.
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Add version cache for cri APIVersion
ref https://github.com/kubernetes/kubernetes/issues/29478
1. Added a version cache for `APIVersion()` by using object cache., with ttl=1 min
2. Leaving `Version()` as it is today
Automatic merge from submit-queue
Update godeps for libcontainer+cadvisor
Needed to unblock more progress on pod cgroup.
/cc @vishh @dchen1107 @timstclair
Automatic merge from submit-queue
Kubelet: Use RepoDigest for ImageID when available
```release-note
Use manifest digest (as `docker-pullable://`) as ImageID when available (exposes a canonical, pullable image ID for containers).
```
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`)
Related to #32159
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`)
Previously, the `InspectImage` method of the Docker interface expected a
"pullable" image ref (name, tag, or manifest digest). If you tried to
inspect an image by its ID (config digest), the inspect would fail to
validate the image against the input identifier. This commit changes
the original method to be named `InspectImageByRef`, and introduces a
new method called `InspectImageByID` which validates that the input
identifier was an image ID.
Automatic merge from submit-queue
Use nodeutil.GetHostIP consistently when talking to nodes
Most of our communications from apiserver -> nodes used
nodutil.GetNodeHostIP, but a few places didn't - and this meant that the
node name needed to be resolvable _and_ we needed to populate valid IP
addresses.
```release-note
The apiserver now uses addresses reported by the kubelet in the Node object's status for apiserver->kubelet communications, rather than the name of the Node object. The address type used defaults to `InternalIP`, `ExternalIP`, and `LegacyHostIP` address types, in that order.
```
Automatic merge from submit-queue
Add sandbox gc minage
Fixes https://github.com/kubernetes/kubernetes/issues/34272.
Fixes https://github.com/kubernetes/kubernetes/issues/33984.
This PR:
1) Change the `GetPodStatus` to get statuses of all containers in a pod instead of only containers belonging to existing sandboxes. This is because sandbox may be removed by GC or by users, kubelet should be able to deal with this case.
2) Change the CRI comment to clarify the timestamp unit (nanosecond).
2) Add MinAge for sandbox GC Policy.
@yujuhong @feiskyer @yifan-gu
/cc @kubernetes/sig-node
Automatic merge from submit-queue
remove testapi.Default.GroupVersion
I'm going to try to take this as a series of mechanicals. This removes `testapi.Default.GroupVersion()` and replaces it with `registered.GroupOrDie(api.GroupName).GroupVersion`.
@caesarxuchao I'm trying to see how much of `pkg/api/testapi` I can remove.
Automatic merge from submit-queue
Revert "Add kubelet awareness to taint tolerant match caculator."
Reverts kubernetes/kubernetes#26501
Original PR was not fully reviewed by @kubernetes/sig-node
cc/ @timothysc @resouer
Automatic merge from submit-queue
Kubelet: Use RepoDigest for ImageID when available
**Release note**:
```release-note
Use manifest digest (as `docker-pullable://`) as ImageID when available (exposes a canonical, pullable image ID for containers).
```
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`)
Related to #32159
Automatic merge from submit-queue
Add kubelet awareness to taint tolerant match caculator.
Add kubelet awareness to taint tolerant match caculator.
Ref: #25320
This is required by `TaintEffectNoScheduleNoAdmit` & `TaintEffectNoScheduleNoAdmitNoExecute `, so that node will know if it should expect the taint&tolerant
Automatic merge from submit-queue
Refactor: separate KubeletClient & ConnectionInfoGetter concepts
KubeletClient implements ConnectionInfoGetter, but it is not a complete
implementation: it does not set the kubelet port from the node record,
for example.
By renaming the method so that it does not implement the interface, we
are able to cleanly see where the "raw" GetConnectionInfo is used (it is
correct) and also have go type-checking enforce this for us.
This is related to #25532; I wanted to satisfy myself that what we were doing there was correct, and I wanted also to ensure that the compiler could enforce this going forwards.
Automatic merge from submit-queue
Add node event for container/image GC failure
Follow up to #31988. Add an event for a node when container/image GC fails.
Automatic merge from submit-queue
Fix nil pointer issue when getting metrics from volume mounter
Currently it is possible that the mounter object stored in Mounted
Volume data structure in the actual state of kubelet volume manager is
nil if this information is recovered from state sync process. This will
cause nil pointer issue when calculating stats in volume_stat_calculator.
A quick fix is to not return the volume if its mounter is nil. A more
complete fix is to also recover mounter object when reconstructing the
volume data structure which will be addressed in PR #33616
Currently it is possible that the mounter object stored in Mounted
Volume data structure in the actual state of kubelet volume manager is
nil if this information is recovered from state sync process. This will
cause nil pointer issue when calculating stats in volume_stat_calculator.
A quick fix is to not return the volume if its mounter is nil. A more
complete fix is to also recover mounter object when reconstructing the
volume data structure which will be addressed in PR #33616
Automatic merge from submit-queue
kubelet: eviction: avoid duplicate action on stale stats
Currently, the eviction code can be overly aggressive when synchronize() is called two (or more) times before a particular stat has been recollected by cadvisor. The eviction manager will take additional action based on information for which it has already taken actions.
This PR provides a method for the eviction manager to track the timestamp of the last obversation and not take action if the stat has not been updated since the last time synchronize() was run.
@derekwaynecarr @vishh @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
kubelet: eviction: allow minimum reclaim as percentage
Fixes#33354
xref #32537
**Release note**:
```release-note
The kubelet --eviction-minimum-reclaim option can now take precentages as well as absolute values for resources quantities
```
@derekwaynecarr @vishh @mtaufen
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`)
Automatic merge from submit-queue
CRI: Remove the mount name and port name.
Per discussion on https://github.com/kubernetes/kubernetes/issues/33873.
Currently the mount name is not being used and also involves some
incorrect usage (sometimes it's referencing a mount name, sometimes
it's referecing a volume name), so we decide to remove it from CRI.
The port name is also not used, so remove it as well.
Fix#33873Fix#33526
/cc @kubernetes/sig-node @kubernetes/sig-rktnetes
Automatic merge from submit-queue
CRI: Implement temporary ImageStats in kuberuntime_manager
For #33048 and #33189.
This PR:
1) Implement a temporary `ImageStats` in kuberuntime_manager.go
2) Add container name label on infra container to make the current summary api logic work with dockershim.
I run the summary api test locally and it passed for me. Notice that the original summary api test is not showing up on CRI testgrid because it was removed yesterday. It will be added back in https://github.com/kubernetes/kubernetes/pull/33779.
@yujuhong @feiskyer
Previously, the `InspectImage` method of the Docker interface expected a
"pullable" image ref (name, tag, or manifest digest). If you tried to
inspect an image by its ID (config digest), the inspect would fail to
validate the image against the input identifier. This commit changes
the original method to be named `InspectImageByRef`, and introduces a
new method called `InspectImageByID` which validates that the input
identifier was an image ID.
Per discussion on https://github.com/kubernetes/kubernetes/issues/33873.
Currently the mount name is not being used and also involves some
incorrect usage (sometimes it's referencing a mount name, sometimes
it's referecing a volume name), so we decide to remove it from CRI.
The port name is also not used, so remove it as well.
Automatic merge from submit-queue
CRI: Enable custom infra container image
A minor fix to enable custom infra container image ref #29478
- Need to address:
Not sure how do deal with infra image credential, leave it as it is today. Should we allow user to specify credentials in pod yaml?
Automatic merge from submit-queue
CRI: Add init containers
This PR adds init containers support in CRI.
CC @yujuhong @Random-Liu @yifan-gu
Also CC @kubernetes/sig-node @kubernetes/sig-rktnetes
Automatic merge from submit-queue
Kubelet: fix port forward for dockershim
This PR fixes port forward for dockershim and also adds a `kubecontainer.FormatPod`.
Locally cluster has passed `--ginkgo.focus=Port\sforwarding'` tests.
cc/ @Random-Liu @yujuhong
Automatic merge from submit-queue
Fix issue in updating device path when volume is attached multiple times
When volume is attached, it is possible that the actual state
already has this volume object (e.g., the volume is attached to multiple
nodes, or volume was detached and attached again). We need to update the
device path in such situation, otherwise, the device path would be stale
information and cause kubelet mount to the wrong device.
This PR partially fixes issue #29324
Automatic merge from submit-queue
Fix#33784, IN_CREATE event does not guarantee file content written
Fixed#33784.
The CREATE inotify event [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/config/file_linux_test.go#L275) is triggered by os.OpenFile(), however the content would be written by the following f.Write(). It will fail if the program try to process the event in between.
IN_CREAE event is triggered by open(2), mkdir(2), link(2), symlink(2), bind(2), but not all of them will guarantee the content is written ([ref](http://man7.org/linux/man-pages/man7/inotify.7.html)). <s>Hence we should not respond to IN_CREATE event for pod creation. I believe listen on IN_MODIFY and IN_MOVED_TO would be sufficient for pod addition&update.
Would like to see the Jenkins test results for further evaluation.
@Random-Liu
Automatic merge from submit-queue
Split NodeDiskPressure into NodeInodePressure and NodeDiskPressure
Added NodeInodePressure as a NodeConditionType. SignalImageFsInodesFree and SignalNodeFsInodesFree signal this pressure. Also added simple pieces to the scheduler predicates so that it takes InodePressure into account.
Automatic merge from submit-queue
Add seccomp and apparmor support.
This PR adds seccomp and apparmor support in new CRI.
This a WIP because I'm still adding unit test for some of the functions. Sent this PR here for design discussion.
This PR is similar with https://github.com/kubernetes/kubernetes/pull/33450.
The differences are:
* This PR passes seccomp and apparmor configuration via annotations;
* This PR keeps the seccomp handling logic in docker shim because current seccomp implementation is very docker specific, and @timstclair told me that even the json seccomp profile file is defined by docker.
Notice that this PR almost passes related annotations in `api.Pod` to the runtime directly instead of introducing new CRI annotation.
@yujuhong @feiskyer @timstclair
When volume is attached, it is possible that the actual state
already has this volume object (e.g., the volume is attached to multiple
nodes, or volume was detached and attached again). We need to update the
device path in such situation, otherwise, the device path would be stale
information and cause kubelet mount to the wrong device.
This PR partially fixes issue #29324
Automatic merge from submit-queue
CRI: Fix bug in dockershim to set sandbox id properly.
For https://github.com/kubernetes/kubernetes/issues/33189#issuecomment-249307796.
During debugging `Variable Expansion should allow composing env vars into new env vars`, I found that the root cause is that the sandbox was removed before all containers were deleted, which caused the pod to be started again after succeed.
This happened because the `PodSandboxID` field is not set. This PR fixes the bug.
Some other test flakes are also caused by this
```
Downward API volume should provide node allocatable (cpu) as default cpu limit if the limit is not set
Downward API volume should provide container's memory limit
EmptyDir volumes should support (non-root,0666,tmpfs)
...
```
/cc @yujuhong @feiskyer