Automatic merge from submit-queue
[AppArmor] Promote AppArmor annotations to beta
Justification for promoting AppArmor to beta:
1. We will provide an upgrade path to GA
2. We don't anticipate any major changes to the design, and will continue to invest in this feature
3. We will thoroughly test it. If any serious issues are uncovered we can reevaluate, and we're committed to fixing them.
4. We plan to provide beta-level support for the feature anyway (responding quickly to issues).
Note that this does not include the yet-to-be-merged status annotation (https://github.com/kubernetes/kubernetes/pull/31382). I'd like to propose keeping that one alpha for now because I'm not sure the PodStatus is the right long-term home for it (I think a separate monitoring channel, e.g. cAdvisor, would be a better solution).
/cc @thockin @matchstick @erictune
Automatic merge from submit-queue
Set imagefs rank and reclaim functions when nodefs+imagefs share comm…
Fixes#31192
I decided that the behavior should match the current output of the kubelet summary API. With no dedicated imagefs, the ranking and reclaim functions will match the nodefs ranking and reclaim functions.
/cc @ronnielai @vishh
Automatic merge from submit-queue
Add AppArmor feature gate
Add option to disable AppArmor via a feature gate. This PR treats AppArmor as Beta, and thus depends on https://github.com/kubernetes/kubernetes/pull/31471 (I will remove `do-not-merge` once that merges).
Note that disabling AppArmor means that pods with AppArmor annotations will be rejected in validation. It does not mean that the components act as though AppArmor was never implemented. This is by design, because we want to make it difficult to accidentally run a Pod with an AppArmor annotation without AppArmor protection.
/cc @dchen1107
Automatic merge from submit-queue
Add validation preventing recycle of / in a hostPath PV
Adds a validation that prevents a user from recycling `/` when it is used in a hostPath PV
cc @kubernetes/sig-storage
Automatic merge from submit-queue
add validation for PV spec to ensure correct values are used for ReclaimPolicy on initial create
k8 currently allows invalid values for ReclaimPolicy (i.e. "scotto") - this allows the PV to be created and even bound, however, when the pvc or pod is deleted and the recycler is triggered, an error is thrown
```
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
36s 36s 1 {persistentvolume-controller } Warning VolumeUnknownReclaimPolicy Volume has unrecognized PersistentVolumeReclaimPolicy
```
New behavior will not allow the user to create the PV:
```
[root@k8dev nfs]# kubectl create -f nfs-pv-bad.yaml
The PersistentVolume "pv-gce" is invalid: spec.persistentVolumeReclaimPolicy: Unsupported value: "scotto": supported values: Delete, Recycle, Retain
```
Automatic merge from submit-queue
Add get/delete cluster, delete context to kubectl config
Fixes#29794 by adding `get-clusters`, `delete-cluster` and `delete-context` actions to `kubectl config`.
Automatic merge from submit-queue
Use updated deployment after rollback
@kubernetes/deployment
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
NONE
```
Automatic merge from submit-queue
Remove annoying log
In large clusters, those were ~85% of logs in controller manager (2.3M lines), which doesn't provide any value...
Automatic merge from submit-queue
Fix hang/websocket timeout when streaming container log with no content
When streaming and following a container log, no response headers are sent from the kubelet `containerLogs` endpoint until the first byte of content is written to the log. This propagates back to the API server, which also will not send response headers until it gets response headers from the kubelet. That includes upgrade headers, which means a websocket connection upgrade is not performed and can time out.
To recreate, create a busybox pod that runs `/bin/sh -c 'sleep 30 && echo foo && sleep 10'`
As soon as the pod starts, query the kubelet API:
```
curl -N -k -v 'https://<node>:10250/containerLogs/<ns>/<pod>/<container>?follow=true&limitBytes=100'
```
or the master API:
```
curl -N -k -v 'http://<master>:8080/api/v1/<ns>/pods/<pod>/log?follow=true&limitBytes=100'
```
In both cases, notice that the response headers are not sent until the first byte of log content is available.
This PR:
* does a 0-byte write prior to handing off to the container runtime stream copy. That commits the response header, even if the subsequent copy blocks waiting for the first byte of content from the log.
* fixes a bug with the "ping" frame sent to websocket streams, which was not respecting the requested protocol (it was sending a binary frame to a websocket that requested a base64 text protocol)
* fixes a bug in the limitwriter, which was not propagating 0-length writes, even before the writer's limit was reached
Automatic merge from submit-queue
Fix getting pods from all namespaces
**What this PR does / why we need it**:
Use Heapster handler for pods from all namespaces (added in the new version).
Depends on #30993
Automatic merge from submit-queue
persist services need to be retried in service controller cache.
fix issue reported by @anguslees
more detail on #25189
Automatic merge from submit-queue
rkt: Force `rkt fetch` to fetch from remote to conform the image pull policy.
Fix https://github.com/kubernetes/kubernetes/issues/27646
Use `--no-store` option for `rkt fetch` to force it to fetch from remote.
However, `--no-store` will fetch the remote image regardless of whether the content of the image has changed or not.
This causes performance downgrade when the image tag is ':latest' and the image pull policy is 'always'.
The issue is tracked in https://github.com/coreos/rkt/issues/2937.
Automatic merge from submit-queue
Add ReclaimPolicy to the resource printer for 'get pv'
Propose we add the RECLAIMPOLICY (persistentVolumeReclaimPolicy) from resource_printer.go to show the policy when a user does a ```kubectl get pv```
```
[root@k8dev nfs]# kubectl get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM REASON AGE
pv-nfs 1Gi RWO Retain Available 1m
pv-nfs2 1Gi RWO Delete Available 4s
```
Automatic merge from submit-queue
Allow services which use same port, different protocol to use the same nodePort for both
fix#20092
@thockin @smarterclayton ptal.
Automatic merge from submit-queue
Filter internal Kubernetes labels from Prometheus metrics
**What this PR does / why we need it**:
Kubernetes uses Docker labels as storage for some internal labels. The
majority of these labels are not meaningful metric labels and a few of
them are even harmful as they're not static and cause wrong aggregation
results.
This change provides a custom labels func to only attach meaningful
labels to cAdvisor exported metrics.
**Which issue this PR fixes**
google/cadvisor#1312
**Special notes for your reviewer**:
Depends on google/cadvisor#1429. Once that is merged, I'll update the vendor update commit.
**Release note**:
```release-note
Remove environment variables and internal Kubernetes Docker labels from cAdvisor Prometheus metric labels.
Old behavior:
- environment variables explicitly whitelisted via --docker-env-metadata-whitelist were exported as `container_env_*=*`. Default is zero so by default non were exported
- all docker labels were exported as `container_label_*=*`
New behavior:
- Only `container_name`, `pod_name`, `namespace`, `id`, `image`, and `name` labels are exposed
- no environment variables will be exposed ever via /metrics, even if whitelisted
```
---
Given that we have full control over the exported label set, I shortened the pod_name, pod_namespace and container_name label names. Below an example of the change (reformatted for readability).
```
# BEFORE
container_cpu_cfs_periods_total{
container_label_io_kubernetes_container_hash="5af8c3b4",
container_label_io_kubernetes_container_name="sync",
container_label_io_kubernetes_container_restartCount="1",
container_label_io_kubernetes_container_terminationMessagePath="/dev/termination-log",
container_label_io_kubernetes_pod_name="popularsearches-web-3165456836-2bfey",
container_label_io_kubernetes_pod_namespace="popularsearches",
container_label_io_kubernetes_pod_terminationGracePeriod="30",
container_label_io_kubernetes_pod_uid="6a291e48-47c4-11e6-84a4-c81f66bdf8bd",
id="/docker/68e1f15353921f4d6d4d998fa7293306c4ac828d04d1284e410ddaa75cf8cf25",
image="redacted.com/popularsearches:42-16-ba6bd88",
name="k8s_sync.5af8c3b4_popularsearches-web-3165456836-2bfey_popularsearches_6a291e48-47c4-11e6-84a4-c81f66bdf8bd_c02d3775"
} 72819
# AFTER
container_cpu_cfs_periods_total{
container_name="sync",
pod_name="popularsearches-web-3165456836-2bfey",
namespace="popularsearches",
id="/docker/68e1f15353921f4d6d4d998fa7293306c4ac828d04d1284e410ddaa75cf8cf25",
image="redacted.com/popularsearches:42-16-ba6bd88",
name="k8s_sync.5af8c3b4_popularsearches-web-3165456836-2bfey_popularsearches_6a291e48-47c4-11e6-84a4-c81f66bdf8bd_c02d3775"
} 72819
```
Feedback requested on:
* Label names. Other suggestions? Should we keep these very long ones?
* Do we need to export io.kubernetes.pod.uid? It makes working with the metrics a bit more complicated and the pod name is already unique at any time (but not over time). The UID is aslo part of `name`.
As discussed with @timstclair, this should be added to v1.4 as the current labels are harmful.
PTAL @jimmidyson @fabxc @vishh
Automatic merge from submit-queue
Split the version metric out to its own package
This PR breaks a client dependency on prometheus. Combined with #30638, the client will no longer depend on these packages.
Automatic merge from submit-queue
return destroy func to clean up internal resources of storage
What?
Provide a destroy func to clean up internal resources of storage.
It changes **unit tests** to clean up resources. (Maybe fix integration test in another PR.)
Why?
Although apiserver is designed to be long running, there are some cases that it's not.
See https://github.com/kubernetes/kubernetes/issues/31262#issuecomment-242208771
We need to gracefully shutdown and clean up resources.
Most of the contents of docs/ has moved to kubernetes.github.io.
Development of the docs and accompanying files has continued there, making
the copies in this repo stale. I've removed everything but the .md files
which remain to redirect old links. The .yaml config files in the docs
were used by some tests, these have been moved to test/fixtures/doc-yaml,
and can remain there to be used by tests or other purposes.
Automatic merge from submit-queue
Improve godoc for goroutinemap
Improves the godoc of goroutinemap; found while preparing to use this type in another PR.
@saad-ali
Automatic merge from submit-queue
Refactor to simplify the hard-traveled path of the KubeletConfiguration object
### There are two main goals of this PR:
- Make `NewMainKubelet` take `KubeletConfiguration` and `KubeletDeps` as its only arguments.
- Finally eliminate the legacy `KubeletConfig` type.
### Why am I doing this?
Long story short, I started adding an endpoint to the Kubelet to display the *current* config that the Kubelet was running with, and I realized a few things:
- There were so many transformations to the configuration, in so many different places, before it was used that I wasn't confident the values initially passed in on the `KubeletConfiguration` would be the correct values to report by the time someone used the endpoint to check on them.
- Trying to reconstruct a `KubeletConfiguration` object from a mix of the `Kubelet` object and the legacy `KubeletConfig` object would just add to the mess (not to mention maintenance burden), and it would be much easier if we passed the `KubeletConfiguration` all the way down to where we construct the `Kubelet` object, and then just store a reference to the `KubeletConfiguration` object on the `Kubelet` for later retrieval.
- My hope is that by eliminating unnecessary internal transformations to the config information, and by consolidating the remaining ones in a single place (`NewMainKubelet`), we can have a much clearer understanding of what happens to the config before it makes it to the `Kubelet` object, and also a better ability to report up-to-date information on the status of the Kubelet.
So I started cleaning things up :-).
### Discussion points
It was relatively simple to get `NewMainKubelet` to just take the legacy `KubeletConfig` as its only argument, because most of its arguments were just passing through `KubeletConfig` fields or passing information that was generated solely from `KubeletConfig` fields.
Completely eliminating the legacy `KubeletConfig` type has been more difficult, because the fields of the `KubeletConfiguration` do not have a one-to-one relationship with the fields of the `KubeletConfig`. While I was able to eliminate many of the `KubeletConfig` fields, I'm starting to get into the nontrivial stuff and I'd like to get a discussion started on what should happen with the remaining fields (pending cherry-picking notwithstanding).
On my `kconf-refactor` branch, the legacy `KubeletConfig` object is down to the following 27 fields (from the initial 93). I'd really appreciate any guidance people have on what should happen with these fields.
```
type KubeletConfig struct {
Auth server.AuthInterface
AutoDetectCloudProvider bool
Builder KubeletBuilder
CAdvisorInterface cadvisor.Interface
Cloud cloudprovider.Interface
ContainerManager cm.ContainerManager
DockerClient dockertools.DockerInterface
EventClient *clientset.Clientset
Hostname string
HostNetworkSources []string
HostPIDSources []string
HostIPCSources []string
KubeClient *clientset.Clientset
Mounter mount.Interface
NetworkPlugins []network.NetworkPlugin
NodeName string
OOMAdjuster *oom.OOMAdjuster
OSInterface kubecontainer.OSInterface
PodConfig *config.PodConfig
Recorder record.EventRecorder
Reservation kubetypes.Reservation
TLSOptions *server.TLSOptions
Writer kubeio.Writer
VolumePlugins []volume.VolumePlugin
EvictionConfig eviction.Config
ContainerRuntimeOptions []kubecontainer.Option
Options []Option
}
```
The patterns I've seen so far with respect to eliminating `KubeletConfig` fields may be of some help:
- Some fields could just be eliminated, because they were either the same on `KubeletConfiguration` or just a typecast away from being the same.
- Some fields from `KubeletConfiguration` just ended up in substructures of `KubeletConfig`; it was easy to just remove those substructure fields from `KubeletConfig` and construct them using local vars in `NewMainKubelet` instead.
- Some fields, e.g. `Runonce`, were able to move into the `KubeletConfiguration`.
**P.S.** Part of the way I'm making the transition is by adding an extra `KubeletConfiguration` argument to functions that originally took a `KubeletConfig`, and field-by-field, switching those functions over to using information from the `KubeletConfiguration`. Once the `KubeletConfig` is gone, I'll remove the `KubeletConfig` argument, and the transition will be complete.
**Final note:**
Please try to keep in mind that this is not a general Kubelet cleanup effort, it is just me cleaning things up that are directly in the path of what I'm trying to do. Let's keep this focused on cleanup related to the path that config takes on it's way to the Kubelet.
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```
Removed Flags
- Removes the --auth-path flag. This has been deprecated in favor of --kubeconfig for two releases.
```