Automatic merge from submit-queue (batch tested with PRs 61818, 61800). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add CRI container log format support back for elastic search.
The CRI container log format support was removed accidentally in https://github.com/kubernetes/kubernetes/pull/58525. This PR adds that back.
I've tested it, and it works:
```
SSSSS
------------------------------
[sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
should check that logs from containers are ingested into Elasticsearch
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:39
[BeforeEach] [sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:141
STEP: Creating a kubernetes client
Mar 28 08:09:01.724: INFO: >>> kubeConfig: /home/lantaol/.kube/config
STEP: Building a namespace api object
Mar 28 08:09:02.952: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:32
[It] should check that logs from containers are ingested into Elasticsearch
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:39
Mar 28 08:09:02.988: INFO: Checking the Elasticsearch service exists.
Mar 28 08:09:03.025: INFO: Checking to make sure the Elasticsearch pods are running
Mar 28 08:09:03.066: INFO: Checking to make sure we are talking to an Elasticsearch service.
Mar 28 08:09:03.176: INFO: Checking health of Elasticsearch service.
Mar 28 08:09:03.299: INFO: Starting repeating logging pod synthlogger
STEP: Waiting for logs to ingest
Mar 28 08:09:17.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:27.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:37.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:47.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:09:57.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
Mar 28 08:10:07.420: INFO: Sending a search request to Elasticsearch with the following query: kubernetes.pod_name:synthlogger AND kubernetes.namespace_name:e2e-tests-es-logging-pqlx7
[AfterEach] [sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:142
Mar 28 08:10:07.607: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
STEP: Destroying namespace "e2e-tests-es-logging-pqlx7" for this suite.
Mar 28 08:10:57.758: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Mar 28 08:11:00.046: INFO: namespace: e2e-tests-es-logging-pqlx7, resource: bindings, ignored listing per whitelist
Mar 28 08:11:00.338: INFO: namespace e2e-tests-es-logging-pqlx7 deletion completed in 52.693713026s
• [SLOW TEST:118.614 seconds]
[sig-instrumentation] Cluster level logging using Elasticsearch [Feature:Elasticsearch]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/common/framework.go:23
should check that logs from containers are ingested into Elasticsearch
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/instrumentation/logging/elasticsearch/basic.go:39
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSMar 28 08:11:00.346: INFO: Running AfterSuite actions on all node
Mar 28 08:11:00.346: INFO: Running AfterSuite actions on node 1
Ran 1 of 845 Specs in 123.981 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 844 Skipped PASS
Ginkgo ran 1 suite in 2m4.323020647s
Test Suite Passed
2018/03/28 08:11:00 process.go:152: Step './hack/ginkgo-e2e.sh --ginkgo.focus=Cluster\slevel\slogging\susing\sElasticsearch' finished in 2m5.943972428s
2018/03/28 08:11:00 e2e.go:83: Done
```
Mark 1.10, because this is a regression for CRI container runtimes in 1.10.
The original support was added in 1.9. https://github.com/kubernetes/kubernetes/pull/54777
**Release note**:
```release-note
none
```
- Fix for kube-dns returns NXDOMAIN when not yet synced with
apiserver.
- Don't generate empty record for externalName service.
- Add validation for upstreamNameserver port.
- Update go version to 1.9.3.
Automatic merge from submit-queue (batch tested with PRs 60465, 61773, 61371, 61146). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Enable partial success in fluentd-gcp
Enable partial success in fluentd-gcp. This will allow to reduce amount of lost data in case of invalid (e.g. too big) entries: instead of dropping the whole request, only failed entries will be dropped.
```release-note
[fluentd-gcp addon] Partial success option is enabled in fluentd.
```
/assign @x13n
/cc @bmoyles0117
Automatic merge from submit-queue (batch tested with PRs 60465, 61773, 61371, 61146). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Adding resource constraints for fluentd-gcp
**What this PR does / why we need it**:
Adds resource constraints to `fluentd-gcp`. Values mostly lifted from `fluentd-es`, cpu cap set to a sensible value after reviewing various threads.
**Which issue(s) this PR fixes**
Fixes#55416
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61452, 61727, 61462, 61692, 61738). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update event-exporter image
This is a follow-up of https://github.com/GoogleCloudPlatform/k8s-stackdriver/pull/126 to apply the latest patch to the base image of event-exporter.
```release-note
[fluentd-gcp addon] Update event-exporter image to have the latest base image.
```
/assign @x13n
Could you please take a look?
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update GCP fluentd configmap for COS audit logging on GKE node
**What this PR does / why we need it**:
This PR adds a placeholder in fluentd configmap for COS audit logging on GKE node.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
NONE
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 61396, 61321, 61443, 60911, 61461). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump Heapster to v1.5.2
**What this PR does / why we need it**:
Bump Heapster to v1.5.2
**Release note**:
```release-note
Bump Heapster to v1.5.2
```
Automatic merge from submit-queue (batch tested with PRs 61354, 61366, 61386, 61394, 60755). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove 'system' prefix from Metadata Agent rbac configuration
**What this PR does / why we need it**:
Remove 'system' prefix from Metadata Agent rbac configuration.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add Troubleshooting sections to Heapster and Metrics Server addons documentation
**What this PR does / why we need it**:
Add Troubleshooting sections to Heapster and Metrics Server addons documentation
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Added network-unavailable tolerations when hostNetwork=true.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#61050
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 60722, 61269). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump fluentd-gcp-scaler version
**What this PR does / why we need it**:
This version fixes a bug in which scaler was setting resources for all containers in the pod, not only fluentd-gcp one.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes#60763
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Remove mapping to /host/lib from fluentd-gcp container.
**What this PR does / why we need it**:
This mapping is no longer needed since fluentd-gcp v2.0.16, in which it started using a container image based on Debian Stretch, in which the systemd libraries already include support for all the supported
compression algorithms.
The `/run.sh` in the image no longer accesses `/host/lib` anyways, so let's stop mapping it here.
Related changes:
- fluentd-gcp on GoogleCloudPlatform/k8s-stackdriver#101
- fluentd-es on GoogleCloudPlatform/google-fluentd#80
/assign @timstclair
/cc @crassirostris @bmoyles0117
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
N/A
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 60737, 60739, 61080, 60968, 60951). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Find most recent modified date for fluentd buffers recursively.
Fixes#60762
**What this PR does / why we need it**:
Due to updates in Fluent v0.14, the buffers directory modified date is no
longer updated when files inside the directory are changed. Therefore we
must find the most recent modified date recursively to fix liveness probe.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Bump to etcd 3.1.12 to pick up critical fix
etcd [3.1.12](https://github.com/coreos/etcd/releases/tag/v3.1.12) (as well as 3.2.17 and 3.3.2) was released yesterday to fix a bug critical to kubernetes:
Fix [mvcc "unsynced" watcher restore operation](https://github.com/coreos/etcd/pull/9297).
- "unsynced" watcher is watcher that needs to be in sync with events that have happened.
- That is, "unsynced" watcher is the slow watcher that was requested on old revision.
- "unsynced" watcher restore operation was not correctly populating its underlying watcher group.
- Which possibly causes [missing events from "unsynced" watchers](https://github.com/coreos/etcd/issues/9086).
This will be backported to 1.9 as well.
Release note:
```release-note
Upgrade the default etcd server version to 3.1.12 to pick up critical etcd "mvcc "unsynced" watcher restore operation" fix.
```
cc @gyuho @wojtek-t @shyamjvs @timothysc @jdumars
Due to updates in Fluent v0.14, the buffers directory modified date is no
longer updated when files inside the directory are changed. Therefore we
must find the most recent modified date recursively to fix liveness probe.
Automatic merge from submit-queue (batch tested with PRs 60891, 60935). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Rollback etcd server version to 3.1.11 due to #60589
Ref https://github.com/kubernetes/kubernetes/issues/60589#issuecomment-371171837
The dependencies were a bit complex (so many things relying on it) + the version was updated to 3.2.16 on top of the original bump.
So I had to mostly make manual reverting changes on a case-by-case basis - so likely to have errors :)
/cc @wojtek-t @jpbetz
```release-note
Downgrade default etcd server version to 3.1.11 due to #60589
```
(I'm not sure if we should instead remove release-notes of the original PRs)
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Reverting the anti-affinity from CoreDNS pods
**What this PR does / why we need it**:
Following #54164 and #59357, removing the anti-affinity from CoreDNS.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
This mapping is no longer needed since fluentd-gcp v2.0.16, in which it
started using a container image based on Debian Stretch, in which the
systemd libraries already include support for all the supported
compression algorithms.
The /run.sh in the image no longer accesses /host/lib anyways, so let's
stop mapping it here.
Related changes:
- fluentd-gcp on GoogleCloudPlatform/k8s-stackdriver#101
- fluentd-es on GoogleCloudPlatform/google-fluentd#80
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Update gke nvidia-gpu-device-plugin to the latest version that supports
both v1alpha and v1beta1 device plugin versions.
Re-enables nvidia-gpus e2e test after verifying the test passes now.
**What this PR does / why we need it**:
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 60433, 59982, 59128, 60243, 60440). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[fluentd-gcp addon] Update to use Stackdriver Agent image.
Update the fluentd DaemonSet to use the Stackdriver Logging Agent container image.
The Stackdriver Logging Agent container image uses fluentd v0.14.25.
We add a special label to each log record as a signal to logging backends to handle both new and legacy resource types.
**Release note:**
```release-note
[fluentd-gcp addon] Switch to the image, provided by Stackdriver.
```