Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Disable memcg for testing prior to 1.11 release
**What this PR does / why we need it**:
Turn off kubelet memory cgroup notifications on GCE to unblock scalability testing.
Related issue: #62808
```release-note
NONE
```
/sig node
/kind bug
/priority critical-urgent
/assign @shyamjvs @yujuhong
Automatic merge from submit-queue (batch tested with PRs 64060, 63904, 64218, 64208, 64247). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Revert enable PodPreset admission and also enable settings.k8s.io/v1a…
…lpha1 api resource
**What this PR does / why we need it**:
Enable PodPreset admission for there are alpha feature test cases covering it. Simultaneously enable sttings.k8s.io/v1alpha1 api resource.
Fixes#63843
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 63569, 63918, 63980, 63295, 63989). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
New event exporter config with support for new stackdriver resources
New event exporter, with support for use new and old stackdriver resource model.
This should also be cherry-picked to release-1.10 branch, as all fluentd-gcp components support new and stackdriver resource model.
```release-note
Update event-exporter to version v0.2.0 that supports old (gke_container/gce_instance) and new (k8s_container/k8s_node/k8s_pod) stackdriver resources.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Deprecate InfluxDB cluster monitoring
InfluxDB cluster monitoring addon will no longer be supported and will be removed in k8s 1.12.
Default monitoring solution will be changed to `standalone`.
Heapster will still be deployed for backward compatibility of `kubectl top`
```release-note
Stop using InfluxDB as default cluster monitoring
InfluxDB cluster monitoring is deprecated and will be removed in v1.12
```
cc @piosz
Automatic merge from submit-queue (batch tested with PRs 63492, 62379, 61984, 63805, 63807). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
remove PodPreset and enable resources for Priority admission plugins in e2e-gce
**What this PR does / why we need it**:
e2e-gce start kube-apiserver without admission PodPreset and enable resources for Priority
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#62377
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Use the logging agent's node name as the metadata agent URL.
The Stackdriver Logging agent should use the node's hostname when it constructs the Stackdriver Metadata Agent's URL, currently, it's using the GKE Master's hostname, which is a bug.
**Release note:**
```release-note
[fluentd-gcp addon] Use the logging agent's node name as the metadata agent URL.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update all script shebangs to use /usr/bin/env interpreter instead of /bin/interpreter
This is required to support systems where bash doesn't reside in /bin (such as NixOS, or the *BSD family) and allow users to specify a different interpreter version through $PATH manipulation.
https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html
```release-note
Use /usr/bin/env in all script shebangs to increase portability.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add prometheus cluster monitoring addon.
This PR adds new cluster monitoring addon based on prometheus.
It adds prometheus deployment with e2e tests.
Additional components will be added iterativly in future.
Manifests based on current Helm chart.
At current state it's not intended for production use.
cc @piosz @kawych @miekg
```release-note
Add prometheus cluster monitoring addon to kube-up
```
/sig instrumentation
/kind feature
/priority important-soon
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reimplement migrate-if-needed.sh in go
The `migrate-if-needed.sh` script was already partially implemented in go (see the attachlease and rollback sub-dirs), but was still unnecessarily difficult to understand and test. This closely reimplements the original logic but with improved code structure, error handling and testing.
Where possible, go code that was previously executed as separate binaries is now statically linked into a single 'migrate' go cobra CLI app, which is then thinly wrapped by`migrate-if-needed.sh`.
There are numerous additional improvements that need to be made, but will be submitted in future PRs. This PR is focused on achieving parity with the pre-existing functionality and introducing some much needed test coverage, in particular HA cluster upgrade test coverage.
It appears that the `attachlease` and `rollback` go binaries are no longer needed as standalones and so I have consolidated them into the new `migrate` go binary. Other than that, this change aims to be 100% backward compatible.
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes**
Fluentd 0.14 has some memory leak issues that caused the e2e tests to be flaky. Downgrading to v0.12.
**Special notes for your reviewer**:
We never released any previous version with Fluentd v0.14. Only upgraded it very recently. So this downgrading is not visible to users.
**Release note**:
```release-note
Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources.
```
Automatic merge from submit-queue (batch tested with PRs 62162, 60628, 62172). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
When using custom network with IP-alias, use the former's subnet for the latter too
Currently, when we're using custom subnet and ip-alias simultaneously, the cluster fails to come up.
The reason is because we're creating a subnet in the former with one name, but expecting a differently named subnet for the latter.
This is causing [continuous failures in our 100-node job](https://k8s-testgrid.appspot.com/sig-scalability-gce#gce) where I recently turned both of them on.
cc @kubernetes/sig-network-bugs
```release-note
NONE
```
Set the default to cos-stable-65 (which is what we are using on GKE for
latest 1.9 and 1.8) and set config-test to use cos-beta-66, so that we
can get more exposure to it.
The testgrid seems to be fairly happy with these images. (both
e2e-gce-cosdev-k8sdev-default and e2e-gce-cosbeta-k8sdev-default are
generally green.)
Automatic merge from submit-queue (batch tested with PRs 61904, 61565, 61401, 61432, 61772). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove rktnetes code
**What this PR does / why we need it**:
rktnetes is scheduled to be deprecated in 1.10 (#53601). According to the deprecation policy for beta CLI and flags, we can remove the feature in 1.11.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#58721
**Special notes for your reviewer**:
**Release note**:
```release-note
Removed rknetes code, which was deprecated in 1.10.
```
/assign @yujuhong
/hold
Hold until the end of the freeze.
Automatic merge from submit-queue (batch tested with PRs 60166, 61706, 61769). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Change HAIRPIN_MODE to hairpin-veth as default
**What this PR does / why we need it**:
Change the default HAIRPIN_MODE back to "hairpin-veth".
It was previously "promiscuous-bridge" in order to workaround a kernel bug which deadlocked the machine when hairpin-veth was used. (#27498)
After some thorough manual testing on ubuntu clusters, we feel confident now that the kernel bug is fixed so we should switch back to using hairpin-veth. This will allow us to clean up some ebtables rules that were put in place to make "promiscuous-bridge" work properly.
Once this change goes in, we need to carefully monitor our e2e tests to make sure the bug has not resurfaced.
**Release note**:
```release-note
In a GCE cluster, the default HAIRPIN_MODE is now "hairpin-veth".
```
/cc @freehan @prameshj
/assign @roberthbailey
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update GCP fluentd configmap for COS audit logging on GKE node
**What this PR does / why we need it**:
This PR adds a placeholder in fluentd configmap for COS audit logging on GKE node.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump to etcd 3.1.12 to pick up critical fix
etcd [3.1.12](https://github.com/coreos/etcd/releases/tag/v3.1.12) (as well as 3.2.17 and 3.3.2) was released yesterday to fix a bug critical to kubernetes:
Fix [mvcc "unsynced" watcher restore operation](https://github.com/coreos/etcd/pull/9297).
- "unsynced" watcher is watcher that needs to be in sync with events that have happened.
- That is, "unsynced" watcher is the slow watcher that was requested on old revision.
- "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
- Which possibly causes [missing events from "unsynced" watchers](https://github.com/coreos/etcd/issues/9086).
This will be backported to 1.9 as well.
Release note:
```release-note
Upgrade the default etcd server version to 3.1.12 to pick up critical etcd "mvcc "unsynced" watcher restore operation" fix.
```
cc @gyuho @wojtek-t @shyamjvs @timothysc @jdumars
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Setting REMOUNT_VOLUME_PLUGIN_DIR for COS images in kube-env
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60725
**Special notes for your reviewer**: Not sure if it's the best place to set `REMOUNT_VOLUME_PLUGIN_DIR`.
/sig storage
/sig cluster-lifecycle
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Make log audit backend configurable in GCE
This PR will allow to enable audit logging batching by default in e2e tests, after https://github.com/kubernetes/kubernetes/pull/60739 is merged. This is an important step to prevent a regression in scale tests.
/cc @tallclair @sttts
/assign @roberthbailey
Robert, please approve
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 58171, 58036, 60540). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Changing Flexvolume plugin directory on COS in GCE to a durable directory
**What this PR does / why we need it**: The original `/etc/srv/...` directory is in an overlayfs over a path in /tmp, so Flexvolume drivers are erased across node restarts for any reason. Changing it to non-tmpfs location.
Also removing redundant Flexvolume path injection in `config-test.sh` because it's already in `cluster/common.sh`.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#57353
**Release note**:
```release-note
[action required] Default Flexvolume plugin directory for COS images on GCE is changed to `/home/kubernetes/flexvolume`.
```
/assign @roberthbailey @saad-ali
/cc @chakri-nelluri @wongma7
/sig storage
Automatic merge from submit-queue (batch tested with PRs 60433, 59982, 59128, 60243, 60440). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[fluentd-gcp addon] Update to use Stackdriver Agent image.
Update the fluentd DaemonSet to use the Stackdriver Logging Agent container image.
The Stackdriver Logging Agent container image uses fluentd v0.14.25.
We add a special label to each log record as a signal to logging backends to handle both new and legacy resource types.
**Release note:**
```release-note
[fluentd-gcp addon] Switch to the image, provided by Stackdriver.
```
Automatic merge from submit-queue (batch tested with PRs 59310, 60424, 60308, 60436, 60020). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Move kubelet flag generation from the node to the client
Pass the kubelet flags through a new variable in kube-env (KUBELET_ARGS).
Remove vars from kube-env that were only used for kubelet flags.
This will make it simpler to gradually migrate to dynamic kubelet
config, because we can gradually replace flags with config file
options in a single place without worrying about the plumbing to
move variables from the client onto the node.
/cc @verult (re: https://github.com/kubernetes/kubernetes/pull/58171)
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
action required: [GCP kube-up.sh] Some variables that were part of kube-env are no longer being set (ones only used for kubelet flags) and are being replaced by a more portable mechanism (kubelet configuration file). The individual variables in the kube-env metadata entry were never meant to be a stable interface and this release note only applies if you are depending on them.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable protection tests
**What this PR does / why we need it**:
- StorageObjectInUseProtection feature is enabled by default so the test can run in regular e2e test suite
- Rename PVC protection test, it tests only PVCs and not whole storage.
**Release note**:
```release-note
NONE
```
pass the kubelet flags through a new variable in kube-env
(KUBELET_ARGS).
Remove vars from kube-env that were only used for kubelet flags.
This will make it simpler to gradually migrate to dynamic kubelet
config, because we can gradually replace flags with config file
options in a single place without worrying about the plumbing to
move variables from the client onto the node.
- StorageObjectInUseProtection is enabled by default now so the test can run in regular tests.
- Enable StorageObjectInUseProtection admission plugins during tests
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Introduce e2e test for Stackdriver Metadata Agent
**What this PR does / why we need it**:
Introduce e2e test for Stackdriver Metadata Agent
**Release note**:
```release-note
None
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable scaling fluentd-gcp resources using ScalingPolicy.
See https://github.com/justinsb/scaler for more details about ScalingPolicy resource.
**What this PR does / why we need it**:
This is adding a way to override fluentd-gcp resources in a running cluster. The resources syncing for fluentd-gcp is decoupled from addon manager.
**Special notes for your reviewer**:
**Release note**:
```release-note
fluentd-gcp resources can be modified via a ScalingPolicy
```
cc @kawych @justinsb
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Upload container runtime log to sd/es.
I've verified this in my environment. My stackdriver has an extra `container-runtime` entry for node log, and it collects container runtime daemon log correctly.
@yujuhong @feiskyer @crassirostris @piosz
@kubernetes/sig-node-pr-reviews @kubernetes/sig-instrumentation-pr-reviews
Signed-off-by: Lantao Liu <lantaol@google.com>
**Release note**:
```release-note
Container runtime daemon (e.g. dockerd) logs in GCE cluster will be uploaded to stackdriver and elasticsearch with tag `container-runtime`
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add a new environment variable to the gce startup scripts called KUBE_PROXY_MODE
**What this PR does / why we need it**:
This PR adds a new environment variable called KUBE_PROXY_MODE to the startup scripts for gce. This variable will allow a user to specify the kube-proxy implementation they want to use, with the choices being ipvs or iptables (iptables is default).
Next steps:
1. Need to remove use of feature gateway when IPVS goes GA
2. Need to add logic of loading required ipvs kernel modules in the scripts
Question: If the proxier is IPVS, is it necessary to have the iptables sync period flags?
**Release note**:
```release-note
None
```
This is the 2nd attempt. The previous was reverted while we figured out
the regional mirrors (oops).
New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest. To publish
an image, push k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today). For now the staging is an alias to
gcr.io/google_containers (the legacy URL).
When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.
We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it. Nice and
visible, easy to keep track of.
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
doc: fix typo in cluster
**What this PR does / why we need it**:
fix typo in cluster
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 54278, 56259, 56762). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add NODE_LOCAL_SSDS_EXT to config-test
**What this PR does / why we need it**:
Add NODE_LOCAL_SSDS_EXT to config-test so we can specify it for CI.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#57468
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 56386, 57204, 55692, 57107, 57177). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
GCE: bump COS image version to cos-stable-63-10032-71-0
```release-note
GCE: bump COS image version to cos-stable-63-10032-71-0
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add CoreDNS as an optional addon in kube-up
**What this PR does / why we need it**:
This PR adds the option of installing CoreDNS as an addon instead of kube-dns in kube-up.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#56439
**Special notes for your reviewer**:
**Release note**:
```release-note
kube-up: Add optional addon CoreDNS.
Install CoreDNS instead of kube-dns by setting CLUSTER_DNS_CORE_DNS value to 'true'.
```
Automatic merge from submit-queue (batch tested with PRs 54826, 53576, 55591, 54946, 54825). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached
- Instead of the old `Accelerators` feature that added `alpha.kubernetes.io/nvidia-gpu` resource, use the new `DevicePlugins` feature that adds vendor specific resources. (In case of nvidia GPUs it will
add `nvidia.com/gpu` resource.)
- Add node label to GCE nodes with accelerators attached. This node label is the same as what GKE attaches to node pools with accelerators attached. (For example, for nvidia-tesla-p100 GPU, the label would be `cloud.google.com/gke-accelerator=nvidia-tesla-p100`) This will help us target accelerator specific
daemonsets etc. to these nodes.
- Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached.
- Some minor documentation improvements in addon manager.
**Release note**:
```release-note
GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`.
```
/sig cluster-lifecycle
/sig scheduling
/area hw-accelerators
https://github.com/kubernetes/features/issues/368