Automatic merge from submit-queue
Made tracing of calls and container lifecycle steps in FakeDockerClient optional
Fixes#39717
Slightly refactored the FakeDockerClient code and made tracing optional (but enabled by default).
@yujuhong @Random-Liu
Automatic merge from submit-queue
Fix kubelet cross build
**What this PR does / why we need it**: Cross builds are not passing for MacOS and Windows. We are expecting Windows binaries for `kubelet` and `kube-proxy` to be released by the first time with 1.5.2 to be released later today.
**Which issue this PR fixes**:
fixes#39005fixes#39714
**Special notes for your reviewer**: /cc @feiskyer @smarterclayton @vishh this should be P0 in order to be merged before 1.5.2 and obviously fix the cross build.
Automatic merge from submit-queue (batch tested with PRs 39475, 38666, 39327, 38396, 39613)
Create k8s.io/apimachinery repo
Don't panic.
The diff is quite large, but its all generated change. The first few commits are where are all the action is. I built a script to find the fanout from
```
k8s.io/kubernetes/pkg/apimachinery/registered
k8s.io/kubernetes/pkg/runtime/serializer
k8s.io/kubernetes/pkg/runtime/serializer/yaml
k8s.io/kubernetes/pkg/runtime/serializer/streaming
k8s.io/kubernetes/pkg/runtime/serializer/recognizer/testing
```
It copied
```
k8s.io/kubernetes/pkg/api/meta
k8s.io/kubernetes/pkg/apimachinery
k8s.io/kubernetes/pkg/apimachinery/registered
k8s.io/kubernetes/pkg/apis/meta/v1
k8s.io/kubernetes/pkg/apis/meta/v1/unstructured
k8s.io/kubernetes/pkg/conversion
k8s.io/kubernetes/pkg/conversion/queryparams
k8s.io/kubernetes/pkg/genericapiserver/openapi/common - this needs to renamed post-merge. It's just types
k8s.io/kubernetes/pkg/labels
k8s.io/kubernetes/pkg/runtime
k8s.io/kubernetes/pkg/runtime/schema
k8s.io/kubernetes/pkg/runtime/serializer
k8s.io/kubernetes/pkg/runtime/serializer/json
k8s.io/kubernetes/pkg/runtime/serializer/protobuf
k8s.io/kubernetes/pkg/runtime/serializer/recognizer
k8s.io/kubernetes/pkg/runtime/serializer/recognizer/testing
k8s.io/kubernetes/pkg/runtime/serializer/streaming
k8s.io/kubernetes/pkg/runtime/serializer/versioning
k8s.io/kubernetes/pkg/runtime/serializer/yaml
k8s.io/kubernetes/pkg/selection
k8s.io/kubernetes/pkg/types
k8s.io/kubernetes/pkg/util/diff
k8s.io/kubernetes/pkg/util/errors
k8s.io/kubernetes/pkg/util/framer
k8s.io/kubernetes/pkg/util/json
k8s.io/kubernetes/pkg/util/net
k8s.io/kubernetes/pkg/util/runtime
k8s.io/kubernetes/pkg/util/sets
k8s.io/kubernetes/pkg/util/validation
k8s.io/kubernetes/pkg/util/validation/field
k8s.io/kubernetes/pkg/util/wait
k8s.io/kubernetes/pkg/util/yaml
k8s.io/kubernetes/pkg/watch
k8s.io/kubernetes/third_party/forked/golang/reflect
```
The script does the import rewriting and gofmt. Then you do a build, codegen, bazel update, and it produces all the updates.
If we agree this is the correct approach. I'll create a verify script to make sure that no one messes with any files in the "dead" packages above.
@kubernetes/sig-api-machinery-misc @smarterclayton @sttts @lavalamp @caesarxuchao
`staging/prime-apimachinery.sh && hack/update-codegen.sh && nice make WHAT="federation/cmd/federation-apiserver/ cmd/kube-apiserver" && hack/update-openapi-spec.sh && hack/update-federation-openapi-spec.sh && hack/update-codecgen.sh && hack/update-codegen.sh && hack/update-generated-protobuf.sh && hack/update-bazel.sh`
Automatic merge from submit-queue (batch tested with PRs 38212, 38792, 39641, 36390, 39005)
Set MemorySwap to zero on Windows
Fixes https://github.com/kubernetes/kubernetes/issues/39003
@dchen1107 @michmike @kubernetes/sig-node-misc
Automatic merge from submit-queue
Kubelet: add image ref to ImageService interfaces
This PR adds image ref (digest or ID, depending on runtime) to PullImage result, and pass image ref in CreateContainer instead of image name. It also
* Adds image ref to CRI's PullImageResponse
* Updates related image puller
* Updates related testing utilities
~~One remaining issue is: it breaks some e2e tests because they [checks image repoTags](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/util.go#L1941) while docker always returns digest in this PR. Should we update e2e test or continue to return repoTags in `containerStatuses.image`?~~
Fixes#38833.
Automatic merge from submit-queue (batch tested with PRs 39079, 38991, 38673)
Support systemd based pod qos in CRI dockershim
This PR makes pod level QoS works for CRI dockershim for systemd based cgroups. And will also fix#36807
- [x] Add cgroupDriver to dockerService and use docker info api to set value for it
- [x] Add a NOTE that detection only works for docker 1.11+, see [CHANGE LOG](https://github.com/docker/docker/blob/master/CHANGELOG.md#1110-2016-04-13)
- [x] Generate cgroupParent in syntax expected by cgroupDriver
- [x] Set cgroupParent to hostConfig for both sandbox and user container
- [x] Check if kubelet conflicts with cgroup driver of docker
cc @derekwaynecarr @vishh
Automatic merge from submit-queue (batch tested with PRs 38154, 38502)
Wrong comment to describe docker version
The original comment about minimal docker version fo `room_score_adj` is wrong (though the code is right).
Really sorry for misleading :/
This allows us to interrupt/kill the executed command if it exceeds the
timeout (not implemented by this commit).
Set timeout in Exec probes. HTTPGet and TCPSocket probes respect the
timeout, while Exec probes used to ignore it.
Add e2e test for exec probe with timeout. However, the test is skipped
while the default exec handler doesn't support timeouts.
Automatic merge from submit-queue
CRI: Add Status into CRI.
For https://github.com/kubernetes/kubernetes/issues/35701.
Fixes https://github.com/kubernetes/kubernetes/issues/35701.
This PR added a `Status` call in CRI, and the `RuntimeStatus` is defined as following:
``` protobuf
message RuntimeCondition {
// Type of runtime condition.
optional string type = 1;
// Status of the condition, one of true/false.
optional bool status = 2;
// Brief reason for the condition's last transition.
optional string reason = 3;
// Human readable message indicating details about last transition.
optional string message = 4;
}
message RuntimeStatus {
// Conditions is an array of current observed runtime conditions.
repeated RuntimeCondition conditions = 1;
}
```
Currently, only `conditions` is included in `RuntimeStatus`, and the definition is almost the same with `NodeCondition` and `PodCondition` in K8s api.
@yujuhong @feiskyer @bprashanth If this makes sense, I'll send a follow up PR to let dockershim return `RuntimeStatus` and let kubelet make use of it.
@yifan-gu @euank Does this make sense to rkt?
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Initial work on running windows containers on Kubernetes
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
This is the first stab at getting the Kubelet running on Windows (fixes#30279), and getting it to deploy network-accessible pods that consist of Windows containers. Thanks @csrwng, @jbhurat for helping out.
The main challenge with Windows containers at this point is that container networking is not supported. In other words, each container in the pod will get it's own IP address. For this reason, we had to make a couple of changes to the kubelet when it comes to setting the pod's IP in the Pod Status. Instead of using the infra-container's IP, we use the IP address of the first container.
Other approaches we investigated involved "disabling" the infra container, either conditionally on `runtime.GOOS` or having a separate windows-docker container runtime that re-implemented some of the methods (would require some refactoring to avoid maintainability nightmare).
Other changes:
- The default docker endpoint was removed. This results in the docker client using the default for the specific underlying OS.
More detailed documentation on how to setup the Windows kubelet can be found at https://docs.google.com/document/d/1IjwqpwuRdwcuWXuPSxP-uIz0eoJNfAJ9MWwfY20uH3Q.
cc: @ikester @brendandburns @jstarks
Automatic merge from submit-queue
[AppArmor] Hold bad AppArmor pods in pending rather than rejecting
Fixes https://github.com/kubernetes/kubernetes/issues/32837
Overview of the fix:
If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then it might contiinuously try to run the pod on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted, but not allowed to run (and therefore held in a pending state). This prevents the pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) is handled.
A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.
``` release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```
@kubernetes/sig-node @timothysc @rrati @davidopp
Automatic merge from submit-queue
Separate Direct and Indirect streaming paths, implement indirect path for CRI
This PR refactors the `pkg/kubelet/container.Runtime` interface to remove the `ExecInContainer`, `PortForward` and `AttachContainer` methods. Instead, those methods are part of the `DirectStreamingRuntime` interface which all "legacy" runtimes implement. I also added an `IndirectStreamingRuntime` which handles the redirect path and is implemented by CRI runtimes. To control the size of this PR, I did not fully setup the indirect streaming path for the dockershim, so I left legacy path behind.
Most of this PR is moving & renaming associated with the refactoring. To understand the functional changes, I suggest tracing the code from `getExec` in `pkg/kubelet/server/server.go`, which calls `GetExec` in `pkg/kubelet/kubelet_pods.go` to determine whether to follow the direct or indirect path.
For https://github.com/kubernetes/kubernetes/issues/29579
/cc @kubernetes/sig-node
Automatic merge from submit-queue
CRI: Add devices to ContainerConfig
This PR adds devices to ContainerConfig and adds experimental GPU support.
cc/ @yujuhong @Hui-Zhi @vishh @kubernetes/sig-node
Automatic merge from submit-queue
Only set sysctls for infra containers
We did set the sysctls for each container in a pod. This opens up a way to set un-whitelisted sysctls during upgrade from v1.3:
- set annotation in v1.3 with an un-whitelisted sysctl. Set restartPolicy=Always
- upgrade cluster to v1.4
- kill container process
- un-whitelisted sysctl is set on restart of the killed container.
Automatic merge from submit-queue
SELinux Overhaul
Overhauls handling of SELinux in Kubernetes. TLDR: Kubelet dir no longer has to be labeled `svirt_sandbox_file_t`.
Fixes#33351 and #33510. Implements #33951.
Automatic merge from submit-queue
Use the rawTerminal setting from the container itself
**What this PR does / why we need it**:
Checks whether the container is set for rawTerminal connection and uses the appropriate connection.
Prevents the output `Error from server: Unrecognized input header` when doing `kubectl run`.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
helps with case 1 in #28695, resolves#30159
**Special notes for your reviewer**:
**Release note**:
```
release-note-none
```
If the infra container has already terminated, `docker inspect` will report
pid 0. The path constructed using the pid to check the network namespace of
the process will be invalid. This commit changes docker to report an empty
path to stop kubenet from erroring out whenever TearDown is called on an
exited infra container.
This is not a fix for all the plugins, as some plugins may require the actual
network namespace to tear down properly.
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`)
Previously, the `InspectImage` method of the Docker interface expected a
"pullable" image ref (name, tag, or manifest digest). If you tried to
inspect an image by its ID (config digest), the inspect would fail to
validate the image against the input identifier. This commit changes
the original method to be named `InspectImageByRef`, and introduces a
new method called `InspectImageByID` which validates that the input
identifier was an image ID.
Previously, we used the docker config digest (also called "image ID"
by Docker) for the value of the `ImageID` field in the container status.
This was not particularly useful, since the config manifest is not
what's used to identify the image in a registry, which uses the manifest
digest instead. Docker 1.12+ always populates the RepoDigests field
with the manifest digests, and Docker 1.10 and 1.11 populate it when
images are pulled by digest.
This commit changes `ImageID` to point to the the manifest digest when
available, using the prefix `docker-pullable://` (instead of
`docker://`)
Previously, the `InspectImage` method of the Docker interface expected a
"pullable" image ref (name, tag, or manifest digest). If you tried to
inspect an image by its ID (config digest), the inspect would fail to
validate the image against the input identifier. This commit changes
the original method to be named `InspectImageByRef`, and introduces a
new method called `InspectImageByID` which validates that the input
identifier was an image ID.
Automatic merge from submit-queue
Kubelet: fix port forward for dockershim
This PR fixes port forward for dockershim and also adds a `kubecontainer.FormatPod`.
Locally cluster has passed `--ginkgo.focus=Port\sforwarding'` tests.
cc/ @Random-Liu @yujuhong
Automatic merge from submit-queue
Add seccomp and apparmor support.
This PR adds seccomp and apparmor support in new CRI.
This a WIP because I'm still adding unit test for some of the functions. Sent this PR here for design discussion.
This PR is similar with https://github.com/kubernetes/kubernetes/pull/33450.
The differences are:
* This PR passes seccomp and apparmor configuration via annotations;
* This PR keeps the seccomp handling logic in docker shim because current seccomp implementation is very docker specific, and @timstclair told me that even the json seccomp profile file is defined by docker.
Notice that this PR almost passes related annotations in `api.Pod` to the runtime directly instead of introducing new CRI annotation.
@yujuhong @feiskyer @timstclair
Automatic merge from submit-queue
Apply default image tags for all runtimes
Move the docker-specific logic up to the ImageManager to allow code sharing
among different implementations.
Part of #31459
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Pod creation moved outside of docker manager tests
**What this PR does / why we need it**:
It cleans up docker manager tests a little.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: related to #31550
**Special notes for your reviewer**:
I don't claim that working on this issue is finished, I cleaned up the tests just a bit
**Release note**:
```release-note
NONE
```
This allows runtimes in different packages (dockertools, rkt, kuberuntime) to
share the same logic. Before this change, only dockertools support this
feature. Now all three packages support image pull throttling.
Docker 1.10 does not guarantee that the pulled digest matches the digest
on disk when dealing with v1 schemas stored in a Docker registry. This
is the case for images like
centos/ruby-23-centos7@sha256:940584acbbfb0347272112d2eb95574625c0c60b4e2fdadb139de5859cf754bf
which as a result of #30366 cannot be pulled by Kube from a Docker 1.10
system.
Instead, use RepoDigests field as the primary match, validating the
digest, and then fall back to ID (also validating the match). Adds more
restrictive matching.