Automatic merge from submit-queue
Add e2e node test for log path
Fixes #34661
A node e2e test to check that container log files are created properly with the right content.
Since the log files under `/var/log/containers` are actually symbolic links to the Docker containers' log files, we cannot use a pod to mount them and check them (symbolic links are not supported by Docker volumes).
cc @Random-Liu
Automatic merge from submit-queue
Use indirect streaming path for remote CRI shim
Last step for https://github.com/kubernetes/kubernetes/issues/29579
- Wire through the remote indirect streaming methods in the docker remote shim
- Add the docker streaming server as a handler at `<node>:10250/cri/{exec,attach,portforward}`
- Disable legacy streaming for dockershim
Note: This requires PR https://github.com/kubernetes/kubernetes/pull/34987 to work.
Tested manually on an E2E cluster.
/cc @euank @feiskyer @kubernetes/sig-node
Automatic merge from submit-queue
kubelet bootstrap: start hostNetwork pods before we have PodCIDR
Network readiness was checked in the pod admission phase, but pods that
fail admission are not retried. Move the check to the pod start phase.
Issue #35409
Issue #35521
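A minimal sketch of the idea, assuming a stripped-down `pod` type and a `canStart` helper that are both hypothetical and not the kubelet's real code: because the check runs at start time rather than at admission, a blocked pod is simply tried again later, while a hostNetwork pod does not have to wait for the network.

```go
package main

import (
	"errors"
	"fmt"
)

// pod is a hypothetical, stripped-down stand-in for a pod spec.
type pod struct {
	name        string
	hostNetwork bool
}

// errNetworkNotReady is an illustrative error, not the kubelet's own.
var errNetworkNotReady = errors.New("network is not ready")

// canStart is meant to be called every time a pod is about to be started,
// so a pod that is blocked now gets another chance on a later attempt.
func canStart(p pod, networkReady bool) error {
	if networkReady || p.hostNetwork {
		return nil
	}
	return fmt.Errorf("pod %q: %w", p.name, errNetworkNotReady)
}

func main() {
	pods := []pod{{name: "kube-proxy", hostNetwork: true}, {name: "web"}}
	for _, p := range pods {
		if err := canStart(p, false); err != nil {
			fmt.Println("will retry later:", err)
			continue
		}
		fmt.Println("starting", p.name)
	}
}
```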
Automatic merge from submit-queue
CRI: rearrange kubelet runtime initialization
Consolidate the code used by docker+cri and remote+cri for consistency, and to
prevent changing one without the other. Enforce that
`--experimental-runtime-integration-type` has to be set in order for kubelet
to use the CRI interface, *even for out-of-process shims*. This simplifies the
temporary `if` logic in kubelet while CRI still co-exists with older logic.
Automatic merge from submit-queue
CRI: Add Status into CRI.
For https://github.com/kubernetes/kubernetes/issues/35701.
Fixes https://github.com/kubernetes/kubernetes/issues/35701.
This PR adds a `Status` call to CRI. `RuntimeStatus` is defined as follows:
``` protobuf
message RuntimeCondition {
    // Type of runtime condition.
    optional string type = 1;
    // Status of the condition, one of true/false.
    optional bool status = 2;
    // Brief reason for the condition's last transition.
    optional string reason = 3;
    // Human readable message indicating details about last transition.
    optional string message = 4;
}

message RuntimeStatus {
    // Conditions is an array of current observed runtime conditions.
    repeated RuntimeCondition conditions = 1;
}
```
Currently, only `conditions` is included in `RuntimeStatus`, and the definition is almost the same as `NodeCondition` and `PodCondition` in the Kubernetes API.
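To illustrate how a caller might consume such a status, here is a rough sketch. The Go structs are hand-written mirrors of the messages (the real types would be generated from the `.proto` file), and the `findCondition` helper and the condition names `RuntimeReady` and `NetworkReady` are assumptions made for the example only.

```go
package main

import "fmt"

// RuntimeCondition is a hand-written mirror of the protobuf message;
// generated code would look different (e.g. pointer fields for optionals).
type RuntimeCondition struct {
	Type    string
	Status  bool
	Reason  string
	Message string
}

// RuntimeStatus mirrors the protobuf message of the same name.
type RuntimeStatus struct {
	Conditions []RuntimeCondition
}

// findCondition is a hypothetical helper that returns the condition with
// the given type, or nil if the runtime did not report it.
func findCondition(s RuntimeStatus, condType string) *RuntimeCondition {
	for i := range s.Conditions {
		if s.Conditions[i].Type == condType {
			return &s.Conditions[i]
		}
	}
	return nil
}

func main() {
	// The condition names and reason below are illustrative assumptions.
	status := RuntimeStatus{Conditions: []RuntimeCondition{
		{Type: "RuntimeReady", Status: true},
		{Type: "NetworkReady", Status: false, Reason: "ExampleReason", Message: "example message"},
	}}
	if c := findCondition(status, "NetworkReady"); c != nil && !c.Status {
		fmt.Printf("%s is false: %s (%s)\n", c.Type, c.Reason, c.Message)
	}
}
```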
@yujuhong @feiskyer @bprashanth If this makes sense, I'll send a follow-up PR to let dockershim return `RuntimeStatus` and let kubelet make use of it.
@yifan-gu @euank Does this make sense for rkt?
/cc @kubernetes/sig-node
Automatic merge from submit-queue
[AppArmor] Hold bad AppArmor pods in pending rather than rejecting
Fixes https://github.com/kubernetes/kubernetes/issues/32837
Overview of the fix:
If the Kubelet needs to reject a Pod for a reason that the control plane doesn't understand (e.g. which AppArmor profiles are installed on the node), then the control plane might continuously try to run the Pod on the same rejecting node. This change adds a concept of "soft rejection", in which the Pod is admitted, but not allowed to run (and therefore held in a pending state). This prevents the Pod from being retried on other nodes, but also prevents the high churn. This is consistent with how other missing local resources (e.g. volumes) are handled.
A side effect of the change is that Pods which are not initially runnable will be retried. This is desired behavior since it avoids a race condition when a new node is brought up but the AppArmor profiles have not yet been loaded on it.
``` release-note
Pods with invalid AppArmor configurations will be held in a Pending state, rather than rejected (failed). Check the pod status message to find out why it is not running.
```
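The soft-rejection idea can be sketched roughly as follows. This is not the kubelet's admission code: the `admitResult` values, the `pod` type, and the `loadedProfiles` set are hypothetical names chosen for illustration.

```go
package main

import "fmt"

type admitResult int

const (
	admitted     admitResult = iota // run the pod
	softRejected                    // accept the pod but hold it Pending and retry later
	hardRejected                    // fail the pod
)

// pod is a hypothetical, stripped-down pod with a single AppArmor profile name.
type pod struct {
	name            string
	appArmorProfile string
}

// admitAppArmor soft-rejects a pod whose profile is not loaded on this node.
// loadedProfiles is an assumed set of profile names present on the node.
func admitAppArmor(p pod, loadedProfiles map[string]bool) admitResult {
	if p.appArmorProfile == "" || loadedProfiles[p.appArmorProfile] {
		return admitted
	}
	return softRejected
}

// phaseFor maps a result to the pod phase an observer would see.
func phaseFor(r admitResult) string {
	switch r {
	case admitted:
		return "Running"
	case softRejected:
		return "Pending"
	default:
		return "Failed"
	}
}

func main() {
	loaded := map[string]bool{}
	p := pod{name: "web", appArmorProfile: "my-profile"}
	fmt.Println(phaseFor(admitAppArmor(p, loaded))) // profile not loaded yet
	loaded["my-profile"] = true                     // profile gets loaded later
	fmt.Println(phaseFor(admitAppArmor(p, loaded))) // the retry now succeeds
}
```

Because the decision is re-evaluated on every retry, a pod that arrives before the profiles are loaded becomes runnable without being rescheduled.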
@kubernetes/sig-node @timothysc @rrati @davidopp
This change adds a container manager inside the dockershim to move the docker daemon
and associated processes to a specified cgroup. The original kubelet container
manager will continue checking the name of the cgroup, so that kubelet knows how
to report runtime stats.
Automatic merge from submit-queue
Add kubelet awareness to taint tolerant match calculator.
Ref: #25320
This is required by `TaintEffectNoScheduleNoAdmit` and `TaintEffectNoScheduleNoAdmitNoExecute`, so that the node will know whether it should expect the taint and toleration
Automatic merge from submit-queue
Add node event for container/image GC failure
Follow up to #31988. Add an event for a node when container/image GC fails.
Automatic merge from submit-queue
Add seccomp and apparmor support.
This PR adds seccomp and apparmor support in new CRI.
This is a WIP because I'm still adding unit tests for some of the functions. I sent this PR here for design discussion.
This PR is similar to https://github.com/kubernetes/kubernetes/pull/33450.
The differences are:
* This PR passes seccomp and apparmor configuration via annotations;
* This PR keeps the seccomp handling logic in docker shim because current seccomp implementation is very docker specific, and @timstclair told me that even the json seccomp profile file is defined by docker.
Note that this PR passes the related annotations in `api.Pod` to the runtime almost directly instead of introducing new CRI annotations.
@yujuhong @feiskyer @timstclair
Automatic merge from submit-queue
Node-ip is not used when cloud provider is used
Currently --node-ip in kubelet is not being used when kubelet is configured with a cloud provider. With this fix, kubelet will get a list of IPs from the provider and parse it to return the one that matches node-ip.
This fixes #23568
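The matching step might look roughly like the sketch below. The `nodeAddress` type and `pickNodeIP` function are hypothetical stand-ins, not the kubelet's or the cloud provider's actual API, and the sample addresses are invented.

```go
package main

import (
	"fmt"
	"net"
)

// nodeAddress is a hypothetical stand-in for an address reported by a cloud provider.
type nodeAddress struct {
	kind    string // e.g. "InternalIP", "ExternalIP", "Hostname"
	address string
}

// pickNodeIP returns the provider address that equals the configured node IP.
func pickNodeIP(addrs []nodeAddress, nodeIP string) (nodeAddress, error) {
	want := net.ParseIP(nodeIP)
	if want == nil {
		return nodeAddress{}, fmt.Errorf("invalid node IP %q", nodeIP)
	}
	for _, a := range addrs {
		// Entries that are not IPs (such as hostnames) fail to parse and are skipped.
		if ip := net.ParseIP(a.address); ip != nil && ip.Equal(want) {
			return a, nil
		}
	}
	return nodeAddress{}, fmt.Errorf("no provider address matches node IP %s", nodeIP)
}

func main() {
	addrs := []nodeAddress{
		{kind: "Hostname", address: "node-1.example.internal"},
		{kind: "InternalIP", address: "10.0.0.5"},
		{kind: "ExternalIP", address: "203.0.113.7"},
	}
	a, err := pickNodeIP(addrs, "10.0.0.5")
	fmt.Println(a, err)
}
```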
Automatic merge from submit-queue
Use strongly-typed types.NodeName for a node name
We had another bug where we confused the hostname with the NodeName.
Also, if we want to use different values for the Node.Name (which is
an important step for making installation easier), we need to keep
better control over this.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Automatic merge from submit-queue
Move Kubelet pod-management code into kubelet_pods.go
Finish the kubelet code moves started during the 1.3 dev cycle -- move pod management code into a file called `kubelet_pods.go`.
We had another bug where we confused the hostname with the NodeName.
To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.
A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName
Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
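As a small illustration of why a distinct type helps, consider the sketch below. `NodeName` here is a local stand-in in the spirit of `types.NodeName`, and `nodeKey` is a hypothetical helper.

```go
package main

import "fmt"

// NodeName is a distinct string type, in the spirit of types.NodeName.
type NodeName string

// nodeKey is a hypothetical helper that only accepts a NodeName.
func nodeKey(name NodeName) string {
	return "nodes/" + string(name)
}

func main() {
	hostname := "ip-10-0-0-1.internal" // a plain string variable
	// nodeKey(hostname) does not compile: a string variable is not a NodeName.
	// The explicit conversion marks the exact spot where a hostname is
	// being treated as a node name, which makes such assumptions easy to audit.
	fmt.Println(nodeKey(NodeName(hostname)))
}
```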
Automatic merge from submit-queue
Add positive logging for GC events
We have no positive logging for GC events. This PR:
1. Adds positive logging at V(4) for success cases
2. Adds positive logging at V(1) for the first successful GC after a failure
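The two cases can be sketched as a small state machine. The `gcTracker` type is hypothetical, and returning level 0 for failures is an assumption of this sketch (the description above only specifies the levels for successes).

```go
package main

import (
	"errors"
	"fmt"
)

// errGC is an illustrative error used to simulate a failed GC run.
var errGC = errors.New("gc failed")

// gcTracker remembers whether the previous GC attempt failed.
type gcTracker struct {
	lastFailed bool
}

// record returns the verbosity level at which the result would be logged:
// 0 for a failure (an assumption of this sketch), 1 for the first success
// after a failure, and 4 for a routine success.
func (g *gcTracker) record(err error) int {
	if err != nil {
		g.lastFailed = true
		return 0
	}
	if g.lastFailed {
		g.lastFailed = false
		return 1
	}
	return 4
}

func main() {
	tracker := &gcTracker{}
	for _, err := range []error{nil, errGC, nil, nil} {
		fmt.Printf("err=%v -> log at V(%d)\n", err, tracker.record(err))
	}
}
```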
Automatic merge from submit-queue
Move image pull throttling logic to pkg/kubelet/images
This is part of #31458
This allows runtimes in different packages (dockertools, rkt, kuberuntime) to
share the same logic. Before this change, only dockertools supported this
feature. Now all three packages support image pull throttling.
/cc @kubernetes/sig-node
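One way to picture the sharing is a wrapper that throttles any runtime's puller. The sketch below is an assumption-laden illustration, not the code in `pkg/kubelet/images`: the `imagePuller` interface, `throttledPuller`, and the fixed token budget (which never refills, unlike a real QPS limiter) are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errThrottled is an illustrative error for a rejected pull.
var errThrottled = errors.New("image pull rate limit exceeded")

// imagePuller is a hypothetical minimal interface a runtime could satisfy.
type imagePuller interface {
	PullImage(image string) error
}

// throttledPuller wraps any imagePuller. tokens is a fixed budget here;
// a real limiter would refill it over time.
type throttledPuller struct {
	inner  imagePuller
	tokens int
}

func (t *throttledPuller) PullImage(image string) error {
	if t.tokens <= 0 {
		return errThrottled
	}
	t.tokens--
	return t.inner.PullImage(image)
}

// fakeRuntime records pulls so the wrapper can be exercised without a real runtime.
type fakeRuntime struct {
	pulled []string
}

func (f *fakeRuntime) PullImage(image string) error {
	f.pulled = append(f.pulled, image)
	return nil
}

func main() {
	rt := &fakeRuntime{}
	p := &throttledPuller{inner: rt, tokens: 1}
	fmt.Println(p.PullImage("busybox")) // allowed
	fmt.Println(p.PullImage("nginx"))   // throttled
	fmt.Println(rt.pulled)
}
```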
Automatic merge from submit-queue
simplify RC and SVC listers
Make the RC and SVC listers use the common list functions that more closely match client APIs, are consistent with other listers, and avoid unnecessary copies.
The new flag, if specified, and if --container-runtime=docker, switches
kubelet to use the new CRI implementation for testing. This is a hidden flag
since the feature is still under heavy development and the flag may be changed
in the near future.
Automatic merge from submit-queue
Check kubeClient nil in Kubelet and bugfix
1. Check whether kubeClient is nil before using it, since it may be nil
2. The configMaps and secrets maps were not used properly; use them as caches
Automatic merge from submit-queue
Fixed TODO: move predicate check into a pod admitter
Refactor the AdmitPod func to move the predicate check into a pod admitter
Automatic merge from submit-queue
Fix hang/websocket timeout when streaming container log with no content
When streaming and following a container log, no response headers are sent from the kubelet `containerLogs` endpoint until the first byte of content is written to the log. This propagates back to the API server, which also will not send response headers until it gets response headers from the kubelet. That includes upgrade headers, which means a websocket connection upgrade is not performed and can time out.
To recreate, create a busybox pod that runs `/bin/sh -c 'sleep 30 && echo foo && sleep 10'`
As soon as the pod starts, query the kubelet API:
```
curl -N -k -v 'https://<node>:10250/containerLogs/<ns>/<pod>/<container>?follow=true&limitBytes=100'
```
or the master API:
```
curl -N -k -v 'http://<master>:8080/api/v1/namespaces/<ns>/pods/<pod>/log?follow=true&limitBytes=100'
```
In both cases, notice that the response headers are not sent until the first byte of log content is available.
This PR:
* does a 0-byte write prior to handing off to the container runtime stream copy. That commits the response header, even if the subsequent copy blocks waiting for the first byte of content from the log.
* fixes a bug with the "ping" frame sent to websocket streams, which was not respecting the requested protocol (it was sending a binary frame to a websocket that requested a base64 text protocol)
* fixes a bug in the limitwriter, which was not propagating 0-length writes, even before the writer's limit was reached
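The last point can be illustrated with a small limit writer that forwards zero-length writes while it is still under its limit. This is a sketch written for this note, not the kubelet's limitwriter; `limitWriter`, `countingWriter`, and `errLimitReached` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errLimitReached is an illustrative error for hitting the byte limit.
var errLimitReached = errors.New("write limit reached")

// limitWriter forwards at most n bytes to w.
type limitWriter struct {
	w io.Writer
	n int64 // bytes still allowed
}

func (l *limitWriter) Write(p []byte) (int, error) {
	if l.n <= 0 {
		return 0, errLimitReached
	}
	truncated := int64(len(p)) > l.n
	if truncated {
		p = p[:l.n]
	}
	// The underlying Write is called even when len(p) == 0, so a caller
	// can use an empty write to make the underlying writer commit headers.
	n, err := l.w.Write(p)
	l.n -= int64(n)
	if err == nil && truncated {
		err = errLimitReached
	}
	return n, err
}

// countingWriter records how often it was called and with how many bytes.
type countingWriter struct {
	calls int
	bytes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.bytes += len(p)
	return len(p), nil
}

func main() {
	cw := &countingWriter{}
	lw := &limitWriter{w: cw, n: 5}
	lw.Write(nil) // zero-length write still reaches cw
	n, err := lw.Write([]byte("hello!"))
	fmt.Println(n, err, cw.calls, cw.bytes)
}
```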
This refactor removes the legacy KubeletConfig object and adds a new
KubeletDeps object, which contains injected runtime objects and
separates them from static config. It also reduces NewMainKubelet to two
arguments: a KubeletConfiguration and a KubeletDeps.
Some mesos and kubemark code was affected by this change, and has been
modified accordingly.
And a few final notes:
KubeletDeps:
KubeletDeps will be a temporary bin for things we might consider
"injected dependencies", until we have a better dependency injection
story for the Kubelet. We will have to discuss this eventually.
RunOnce:
We will likely not pull new KubeletConfiguration from the API server
when in runonce mode, so it doesn't make sense to make this something
that can be configured centrally. We will leave it as a flag-only option
for now. Additionally, it is increasingly looking like nobody actually uses the
Kubelet's runonce mode anymore, so it may be a candidate for deprecation
and removal.
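The shape of the two-argument constructor can be sketched as below. Only the split between static configuration and injected dependencies is taken from the description above; every field, the `containerRuntime` interface, and the lower-case type names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// kubeletConfiguration holds static settings. The fields are hypothetical.
type kubeletConfiguration struct {
	hostnameOverride string
	maxPods          int
}

// containerRuntime is a hypothetical injected dependency.
type containerRuntime interface {
	Name() string
}

// kubeletDeps collects injected runtime objects, kept apart from static config.
type kubeletDeps struct {
	runtime containerRuntime
}

type kubelet struct {
	cfg  *kubeletConfiguration
	deps *kubeletDeps
}

// newMainKubelet takes exactly two arguments: configuration and dependencies.
func newMainKubelet(cfg *kubeletConfiguration, deps *kubeletDeps) (*kubelet, error) {
	if cfg == nil || deps == nil || deps.runtime == nil {
		return nil, errors.New("configuration and dependencies are both required")
	}
	return &kubelet{cfg: cfg, deps: deps}, nil
}

// fakeRuntime shows how a test could inject a stand-in dependency.
type fakeRuntime struct{}

func (fakeRuntime) Name() string { return "fake" }

func main() {
	k, err := newMainKubelet(&kubeletConfiguration{maxPods: 110}, &kubeletDeps{runtime: fakeRuntime{}})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(k.deps.runtime.Name(), k.cfg.maxPods)
}
```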
Automatic merge from submit-queue
Kubelet code move: volume / util
Addresses some odds and ends that I apparently missed earlier. Preparation for kubelet code-move ENDGAME.
cc @kubernetes/sig-node
Automatic merge from submit-queue
Add kubelet --network-plugin-mtu flag for MTU selection
* Add network-plugin-mtu option which lets us pass down an MTU to a network provider (currently processed by kubenet)
* Add a test, and thus make sysctl testable
MTU selection is difficult, and if there is a transport such as IPsec in
use it may be impossible. So we allow specification of the MTU with the
network-plugin-mtu flag, and we pass this down into the network
provider.
Currently implemented by kubenet.
Automatic merge from submit-queue
Kubelet: add --container-runtime-endpoint and --image-service-endpoint
Flag `--container-runtime-endpoint` (overrides `--container-runtime`) is introduced to identify the unix socket file of the remote runtime service. And flag `--image-service-endpoint` is introduced to identify the unix socket file of the image service.
This PR is part of #28789 Milestone 0.
CC @yujuhong @Random-Liu
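A rough sketch of the override rule using Go's standard `flag` package follows. The flag names come from the description above; the `runtimeFlags` struct, the `docker` default, the help strings, and the socket path in `main` are assumptions of this example rather than the kubelet's actual flag handling.

```go
package main

import (
	"flag"
	"fmt"
)

// runtimeFlags is a hypothetical holder for the three flags.
type runtimeFlags struct {
	containerRuntime string
	runtimeEndpoint  string
	imageEndpoint    string
}

// parseRuntimeFlags parses args into runtimeFlags using a private FlagSet.
func parseRuntimeFlags(args []string) (runtimeFlags, error) {
	var f runtimeFlags
	fs := flag.NewFlagSet("kubelet-sketch", flag.ContinueOnError)
	// The default value is an assumption of this sketch.
	fs.StringVar(&f.containerRuntime, "container-runtime", "docker", "in-process container runtime")
	fs.StringVar(&f.runtimeEndpoint, "container-runtime-endpoint", "", "unix socket of the remote runtime service")
	fs.StringVar(&f.imageEndpoint, "image-service-endpoint", "", "unix socket of the image service")
	err := fs.Parse(args)
	return f, err
}

// usesRemoteRuntime reports whether the endpoint flag overrides --container-runtime.
func (f runtimeFlags) usesRemoteRuntime() bool {
	return f.runtimeEndpoint != ""
}

func main() {
	// The socket path is a made-up example.
	f, err := parseRuntimeFlags([]string{"--container-runtime-endpoint=/var/run/example-runtime.sock"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(f.usesRemoteRuntime(), f.runtimeEndpoint)
}
```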
The serviceAccountName is occasionally useful for clients running on
Kube that need to know who they are when talking to other components.
The nodeName is useful for PetSet or DaemonSet pods that need to make
calls back to the API to fetch info about their node.
Both fields are immutable, and cannot easily be retrieved in another
way.
Automatic merge from submit-queue
Always return command output for exec probes and kubelet RunInContainer
Always return command output for exec probes and kubelet RunInContainer, even if the command invocation returns nonzero.
When #24921 replaced RunInContainer with ExecInContainer, it introduced a change where an exec probe that failed no longer included the stdout/stderr from the probe in the event. For example, when running at log level 4, you see:
```
I0816 15:01:36.259826 29713 exec.go:38] Exec probe response: "Failed to access the status endpoint : HTTP Error 404: Not Found.\nHawkular metrics has only been running for 7\n seconds not aborting yet.\n"
```
But the event looks like this:
```
54s 22s 5 hawkular-metrics-hjme4 Pod spec.containers{hawkular-metrics} Warning Unhealthy {kubelet corbeau} Readiness probe failed:
```
Note the absence of the exec probe response after "Readiness probe failed". This PR restores the previous behavior.
cc @kubernetes/rh-cluster-infra @mwringe
xref https://github.com/openshift/origin/issues/10424
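The restored behavior amounts to keeping the command output even when the command fails. A minimal sketch, assuming a hypothetical `execFunc` stand-in for running a command in a container (this is not the kubelet's `RunInContainer` signature):

```go
package main

import (
	"errors"
	"fmt"
)

// execFunc is a hypothetical stand-in for running a command in a container.
type execFunc func() (output []byte, err error)

// errExit is an illustrative error for a nonzero exit status.
var errExit = errors.New("exit status 1")

// probe runs the command and always carries its output into the message.
func probe(run execFunc) (healthy bool, message string) {
	out, err := run()
	if err != nil {
		// Keep the output: it usually explains why the command failed.
		return false, "Readiness probe failed: " + string(out)
	}
	return true, string(out)
}

func main() {
	failing := func() ([]byte, error) {
		return []byte("HTTP Error 404: Not Found"), errExit
	}
	ok, msg := probe(failing)
	fmt.Println(ok, msg)
}
```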
Automatic merge from submit-queue
Unblock iterative development on pod-level cgroups
In order to allow forward progress on this feature, this PR takes the commits from #28017 and #29049 and then globally disables the flag that allows these features to be exercised in the kubelet. The flag can be re-added to the kubelet when it's actually ready.
/cc @vishh @dubstack @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Add Events for operation_executor to show status of mounts, failed/successful to show in describe events
Fixes #27590
@saad-ali @pmorie @erinboyd
After talking with @pmorie last week about the above issue, I decided to poke around and see if I could remedy it. The refactoring broke my previously merged UXP PRs that correctly showed failed mount errors in the describe events. I'm not sure I implemented this correctly, but it tested out and seems to be working. Let me know what I missed or if this is not the correct approach.
```
Events:
FirstSeen  LastSeen  Count  From                  SubobjectPath  Type      Reason       Message
---------  --------  -----  ----                  -------------  --------  ------       -------
2m         2m        1      {default-scheduler }                 Normal    Scheduled    Successfully assigned nfs-bb-pod1 to 127.0.0.1
44s        44s       1      {kubelet 127.0.0.1}                  Warning   FailedMount  Unable to mount volumes for pod "nfs-bb-pod1_default(a94f64f1-37c9-11e6-9aa5-52540073d346)": timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
44s        44s       1      {kubelet 127.0.0.1}                  Warning   FailedSync   Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "nfs-bb-pod1"/"default". list of unattached/unmounted volumes=[nfsvol]
38s        38s       1      {kubelet }                           Warning   FailedMount  Unable to mount volumes for pod "a94f64f1-37c9-11e6-9aa5-52540073d346": Mount failed: exit status 32
Mounting arguments: nfs1.rhs:/opt/data99 /var/lib/kubelet/pods/a94f64f1-37c9-11e6-9aa5-52540073d346/volumes/kubernetes.io~nfs/nfsvol nfs []
Output: mount.nfs: Connection timed out
Resolution hint: Check and make sure the NFS Server exists (ensure that correct IPAddress/Hostname was given) and is available/reachable.
Also make sure firewall ports are open on both client and NFS Server (2049 v4 and 2049, 20048 and 111 for v3).
Use commands telnet <nfs server> <port> and showmount <nfs server> to help test connectivity.
```
New flag --container-runtime-endpoint (overrides --container-runtime)
is introduced to kubelet which identifies the unix socket file of
the remote runtime service. And new flag --image-service-endpoint is
introduced to kubelet which identifies the unix socket file of the
image service.
Automatic merge from submit-queue
Fix default resource limits (node allocatable) for downward api volumes and env vars
@kubernetes/rh-cluster-infra @pmorie @derekwaynecarr