github/k3s - k3s - https://git.xinac.net

Commit Graph

Author	SHA1	Message	Date
Marek Siarkowicz	9e9b906047	Update gcp images with security patches [stackdriver addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [fluentd-gcp addon] Bump fluentd-gcp-scaler to v0.5.1 to pick up security fixes. [fluentd-gcp addon] Bump event-exporter to v0.2.4 to pick up security fixes. [fluentd-gcp addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes. [metadata-proxy addon] Bump prometheus-to-sd to v0.5.0 to pick up security fixes.	2019-03-15 09:24:32 +01:00
Kubernetes Prow Robot	45e5f6053b	Merge pull request #74424 from liggitt/drop-k8s-io-node-labels Clean up self-set node labels	2019-03-06 08:24:26 -08:00
Jordan Liggitt	8975233788	Finish migration of fluentd to daemonset	2019-02-26 11:42:23 -05:00
Florent Delannoy	e627474e8f	Fix fluentd-gcp addon liveness probe

Fix three issues with the fluentd-gcp liveness probe:

1. STUCK_THRESHOLD_SECONDS was overridden by LIVENESS_THRESHOLD_SECONDS if defined

Probably a copy/paste issue introduced in `edf1ffc074`.

2. `[[` is [a bashism](https://stackoverflow.com/a/47576482), and will always fail when called with `/bin/sh`

Introduced by `a844523c20`.

Given that we call the liveness probe with `/bin/sh`, we cannot use the double-bracketed `[[` syntax for tests. It is not POSIX-compliant and will throw an error. Annoyingly, even though it prints an error, `sh` returns with exit code 0 in this case:

```bash
root@fluentd-7mprs:/# sh liveness.sh
liveness.sh: 8: liveness.sh: [[: not found
liveness.sh: 15: liveness.sh: [[: not found
root@fluentd-7mprs:/# echo $?
0
```

This means Kubernetes considers the liveness probe successful, even though it never performs the checks it was meant to perform. This is probably also why the bug wasn't reported sooner :)

Thankfully, the test can just as easily be written in POSIX-compliant form, because it doesn't use any bash-specific features inside the `[[` block.

3. Buffers are transient and cannot be relied upon for monitoring

Finally, after fixing the above issue, we started seeing the fluentd containers being restarted very often, and found an issue with the underlying logic of the liveness probe. The probe checks that the pod is still alive by running the following command:

`find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit`

This checks whether any _regular_ file under `/var/log/fluentd-buffers` is more recent than a predetermined time, and returns an empty string otherwise. The issue is that these buffers are temporary and volatile: they get created and deleted constantly. Here is an example of running that check every second on a running fluentd:

```
root@fluentd-eks-playground-jdc8m:/# LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
root@fluentd-eks-playground-jdc8m:/# STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900};
root@fluentd-eks-playground-jdc8m:/# touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
root@fluentd-eks-playground-jdc8m:/# touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:52:57 UTC 2019
Fri Feb 22 10:52:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:52:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:53:00 UTC 2019
Fri Feb 22 10:53:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:03 UTC 2019
Fri Feb 22 10:53:04 UTC 2019
Fri Feb 22 10:53:05 UTC 2019
Fri Feb 22 10:53:06 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:07 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:08 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:09 UTC 2019
Fri Feb 22 10:53:10 UTC 2019
Fri Feb 22 10:53:11 UTC 2019
Fri Feb 22 10:53:12 UTC 2019
Fri Feb 22 10:53:13 UTC 2019
Fri Feb 22 10:53:14 UTC 2019
Fri Feb 22 10:53:15 UTC 2019
Fri Feb 22 10:53:16 UTC 2019
```

We can see buffers being created, then disappearing. Under these conditions the liveness probe has a ~50% chance of failing, even though fluentd is perfectly healthy. The check is probably fine for fluentd installs that use a lot of buffering, where the liveness probe will be correct more often than not. Installs that buffer less intensively, however, are negatively impacted.

My fix is to check the last-modified time of the buffering _folders_ within `/var/log/fluentd-buffers` instead. These _do_ get updated when buffers are created, and they are not deleted when buffers are emptied, which makes them the right candidate here. Here's an example using `-type d` to match directories:

```
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:57:51 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:52 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:53 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:54 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:55 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:56 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:57 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:00 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:03 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
```

An example of the directory being updated as new buffers come in:

```
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root 6 Feb 22 11:17 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 16K
drwxr-xr-x 2 root root 224 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
-rw-r--r-- 1 root root 1.8K Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log
-rw-r--r-- 1 root root 215 Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log.meta
-rw-r--r-- 1 root root 429 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log
-rw-r--r-- 1 root root 195 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log.meta
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root 6 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
```
	2019-02-25 11:48:31 +00:00
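The three fixes in the commit above can be combined into one small check. The sketch below is an illustration under stated assumptions, not the manifest's actual probe. It uses POSIX `[` instead of `[[`, gives each threshold its own default, and looks for recently modified buffer *directories* instead of short-lived chunk files. The helper name `buffers_recent` is hypothetical. The sketch also assumes GNU `touch -d "N seconds ago"`, GNU `find -quit`, and `mktemp`, as in the Debian-based fluentd image used in the transcripts.

```shell
#!/bin/sh
# Hypothetical sketch of a POSIX-sh buffer freshness check; not the real probe.
# Assumes GNU coreutils/findutils (touch -d "N seconds ago", find -quit) and mktemp.

# Issue 1: each threshold defaults from its *own* variable
# (the buggy line read STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900}).
LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300}
STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900}

# buffers_recent DIR SECONDS
# Exit status 0 if any directory under DIR (DIR itself included) was modified
# within the last SECONDS seconds, 1 otherwise. POSIX sh has no `local`,
# so the helper's variables are global.
buffers_recent() {
  dir=$1
  threshold=$2
  marker=$(mktemp) || return 1
  # Backdate a marker file; `find -newer` compares mtimes against it.
  touch -d "${threshold} seconds ago" "$marker"
  # Issue 3: directories (-type d) get a new mtime whenever chunk files are
  # created or removed, and they outlive the chunks themselves (-type f).
  recent=$(find "$dir" -type d -newer "$marker" -print -quit 2>/dev/null)
  rm -f "$marker"
  # Issue 2: POSIX `[`, which /bin/sh (e.g. dash) understands, not bash-only `[[`.
  [ -n "$recent" ]
}
```

A probe built on this would run something like `buffers_recent /var/log/fluentd-buffers "$STUCK_THRESHOLD_SECONDS" || exit 1`. A missing buffer directory makes `find` print nothing, so that case also fails the check instead of silently passing.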
Roy Lenferink	b43c04452f	Updated OWNERS files to include link to docs	2019-02-04 22:33:12 +01:00
Yu-Ju Hong	9c892243f6	GCE: update addon DaemonSets to select node OS These DaemonSets support only Linux today, so this change updates the specs to reflect this limitation. The labels have recently been promoted to GA. The beta labels are used for now, until the node-master version skew problem no longer exists.	2019-01-23 09:01:40 -08:00
Kubernetes Prow Robot	a938f8b25e	Merge pull request #72243 from cezarygerard/patch-1 [GCP] Update scaler-deployment.yaml CPU_LIMITS	2019-01-05 05:08:15 -08:00
Jordan Liggitt	d2c1fdbcfa	Fixup apps/v1 addon manifests	2018-12-26 15:19:01 -05:00
Cezary Zawadka	1affe568e9	replace single quotes with double quotes in yaml	2018-12-20 15:23:41 +01:00
Jordan Liggitt	cc680273e8	Change add-on manifests to apps/v1	2018-12-19 17:30:59 -05:00
Cezary Zawadka	7b3946776c	Update scaler-deployment.yaml CPU_LIMITS Setting CPU_LIMITS to '1' fixes the following log output, which appeared every 60 seconds: `Running: kubectl set resources -n kube-system ds fluentd-gcp-v3.1.0 -c fluentd-gcp --requests=cpu=100m,memory=200Mi --limits=cpu=1000m,memory=500Mi` `error: info: {extensions v1beta1 daemonsets} "fluentd-gcp-v3.1.0" was not changed`. This PR does not change the scaler's behaviour: pods were already scaled correctly despite the error in the logs.	2018-12-19 21:00:22 +01:00
k8s-ci-robot	396271cf52	Merge pull request #70954 from qingling128/master Upgrade Stackdriver Logging Agent addon image to 0.6-1.6.0-1 to use Fluentd v1.2.	2018-11-25 23:09:07 -08:00
k8s-ci-robot	a19bf332de	Merge pull request #71124 from Random-Liu/make-fluentd-container-runtime-service-configurable Make fluentd container runtime service configurable.	2018-11-21 07:49:42 -08:00
Mike Danese	98c468de8d	update PSPs to allow projected volumes	2018-11-16 19:32:44 +00:00
Lantao Liu	1670b4089a	Make fluentd container runtime service configurable.	2018-11-16 02:17:55 -08:00
Ling Huang	02b7ed3291	Upgrade Stackdriver Logging Agent addon image to 0.6-1.6.0-1 to use Fluentd v1.2.	2018-11-12 13:21:44 -05:00
Ling Huang	85d8b5069b	Add tolerations for Stackdriver Logging and Metadata Agents.	2018-10-12 11:15:33 -04:00
k8s-ci-robot	1aef63124b	Merge pull request #68920 from qingling128/master Enable insertId generation, and update Stackdriver Logging Agent image to 0.5-1.5.36-1-k8s.	2018-10-11 13:44:51 -07:00
Ling Huang	d8da1baf48	Enable insertId generation, update Stackdriver Logging Agent image to 0.5-1.5.36-1-k8s and add priorityClassName for Metadata Agent.	2018-10-09 13:42:40 -04:00
Daniel Kłobuszewski	9454876318	Bump version of fluentd-gcp-scaler	2018-09-19 17:15:05 +02:00
Kubernetes Submit Queue	e2d6362c09	Merge pull request #67691 from loburm/security_fixes Automatic merge from submit-queue (batch tested with PRs 67691, 68147). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Bump versions of components with latest security patches. What this PR does / why we need it: Upgrade versions of monitoring components used on GCP, to include latest security patches. Release note: ```release-note [fluentd-gcp-scaler addon] Bump fluentd-gcp-scaler to 0.4 to pick up security fixes. [prometheus-to-sd addon] Bump prometheus-to-sd to 0.3.1 to pick up security fixes, bug fixes and new features. [event-exporter addon] Bump event-exporter to 0.2.3 to pick up security fixes. ```	2018-09-05 09:49:31 -07:00
Kubernetes Submit Queue	888546c325	Merge pull request #68029 from neolit123/fluentd-owners Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. cluster/addons: add labels to fluentd owner files What this PR does / why we need it: this PR adds SIG labels to fluentd OWNER files: - cluster/addons/fluentd-elasticsearch/OWNERS - cluster/addons/fluentd-gcp/OWNERS Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes # Special notes for your reviewer: let me know if the labels need adjustment. Release note: ```release-note NONE ``` /assign @roberthbailey @mikedanese /cc @timothysc /sig gcp /sig instrumentation /kind cleanup	2018-09-02 12:51:38 -07:00
Arnold Szederjesi	fcdef3ffcc	Put fluentd back to host network	2018-08-30 10:44:04 +02:00
Lubomir I. Ivanov	aefb5b3c0e	cluster/addons: add labels to fluentd owner files	2018-08-30 00:38:08 +03:00
Marian Lobur	ffa934a939	Bump versions of components with latest security patches.	2018-08-22 11:27:36 +02:00
liangwei	5ea138f4e9	remove rescheduler	2018-08-22 11:49:14 +08:00
Karol Wychowaniec	d5b32d8830	Fix parameter for fluentd-gcp-scaler	2018-08-16 16:18:51 +02:00
Bryan Moyles	32c2bfadfd	A large set of improvements to the Stackdriver components.

Metadata Agent Improvements
- Bump metadata agent version to 0.2-0.0.21-1.
- Expand the metadata agent's access to all API groups.
- Remove metadata agent config maps in favor of command line flags.
- Update the metadata agent's liveness probe to a new /healthz handler.

Logging Agent Improvements
- Bump logging agent version to 0.2-1.5.33-1-k8s-1.
- Appropriately set log severity for k8s_container.
- Fix detect exceptions plugin to analyze message field instead of log field.
- Fix detect exceptions plugin to analyze streams based on local resource id.
- Disable the metadata agent for monitored resource construction in logging.
- Disable timestamp adjustment in logs to optimize performance.
- Reduce logging agent buffer chunk limit to 512k to optimize performance.
	2018-08-06 11:26:35 -04:00
Marian Lobur	3b8dfb38bb	Bump version of event-exporter.	2018-07-13 13:20:58 +02:00
Robert Jacob	8f340c6c6a	Use correct field for exception detection.	2018-06-22 12:58:41 +02:00
Daniel Kłobuszewski	7773f8f5eb	Increase fluentd-gcp grace termination period to 1min By default, all pods have 30s for graceful termination. This gives fluentd an additional 30s to export logs when the node is shutting down.	2018-06-14 10:44:13 +02:00
RaviSantosh Gudimetla	872addf9e3	Revert "Remove rescheduler and corresponding tests from master"	2018-05-31 22:18:49 -04:00
ravisantoshgudimetla	aeccffc339	Phase out rescheduler in favor of priority and preemption	2018-05-29 19:52:06 -04:00
Kubernetes Submit Queue	f105ae3e6d	Merge pull request #63918 from cezarygerard/sd-event-exporter Automatic merge from submit-queue (batch tested with PRs 63569, 63918, 63980, 63295, 63989). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. New event exporter config with support for new stackdriver resources New event exporter, with support for both the new and old Stackdriver resource models. This should also be cherry-picked to the release-1.10 branch, as all fluentd-gcp components support the new and old Stackdriver resource models. ```release-note Update event-exporter to version v0.2.0 that supports old (gke_container/gce_instance) and new (k8s_container/k8s_node/k8s_pod) stackdriver resources. ```	2018-05-18 09:54:16 -07:00
Cezary Zawadka	d611aeac80	new event exporter config with support for new stackdriver resource types	2018-05-18 10:37:47 +02:00
Zhen Wang	6351e25203	Use runtime/default as default seccomp profile for unprivileged PodSecurityPolicy	2018-05-15 09:39:37 -07:00
Kubernetes Submit Queue	b617748f7b	Merge pull request #62905 from serathius/event-exporter-region Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. [fluentd-gcp addon] Pass region in separate field This PR makes the location passed to event-exporter depend on the `MULTIZONE` env variable. Fixes https://github.com/kubernetes/kubernetes/issues/62399 ```release-note NONE ``` /cc @loburm	2018-05-11 06:00:44 -07:00
Marek Siarkowicz	f351b00a99	[fluentd-gcp addon] Pass region in separate field	2018-05-11 09:50:07 +02:00
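The commits above say the location passed to event-exporter depends on `MULTIZONE`, but they do not show the logic. As an illustration only, the hypothetical helper `location_for_exporter` below reports a region for multizone clusters and the zone otherwise. It relies on the GCE naming convention `<region>-<zone letter>` (for example `us-central1-b` belongs to region `us-central1`). It is a sketch under those assumptions, not the addon's actual code.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the addon: choose the location string
# to report, depending on whether the cluster is multizone.
# Usage: location_for_exporter ZONE   (reads $MULTIZONE, default "false")
location_for_exporter() {
  zone=$1
  if [ "${MULTIZONE:-false}" = "true" ]; then
    # GCE zone names are <region>-<letter>; strip the shortest "-*" suffix
    # to get the region, e.g. us-central1-b -> us-central1.
    printf '%s\n' "${zone%-*}"
  else
    printf '%s\n' "$zone"
  fi
}
```

Reporting the region in multizone clusters avoids pinning a cluster-wide value to one arbitrary zone. Whether the real change replaces the zone or adds the region alongside it is not stated in these commits.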
Bryan Moyles	a0a7686e38	Use the logging agent's node name as the metadata agent URL.	2018-05-02 10:12:35 +02:00
Slava Semushin	044bf2e415	Update addon manifests to use policy/v1beta1 and grant permissions in policy API group.	2018-04-17 14:56:55 +02:00
Bryan Moyles	19f14ad8e2	Increase CPU limit to 1000 millicores to support 100kb/s throughput.	2018-04-11 18:08:53 -04:00
Kubernetes Submit Queue	1efd5f9456	Merge pull request #62198 from thockin/gcr-vanity Automatic merge from submit-queue (batch tested with PRs 61918, 62180, 62198). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Pass 2: k8s GCR vanity URL Also push out the old URL deprecation since we have not started the community transition yet and there are some instances of it still floating about. ```release-note NONE ```	2018-04-06 11:56:10 -07:00
Kubernetes Submit Queue	4009cb3b8b	Merge pull request #62076 from qingling128/master Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources. What this PR does / why we need it: Which issue(s) this PR fixes Fluentd 0.14 has some memory leak issues that caused the e2e tests to be flaky. Downgrading to v0.12. Special notes for your reviewer: We never released any previous version with Fluentd v0.14; we only upgraded to it very recently, so this downgrade is not visible to users. Release note: ```release-note Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources. ```	2018-04-06 09:51:32 -07:00
Tim Hockin	89ceb7ef46	Pass 2: k8s GCR vanity URL	2018-04-06 08:14:58 -07:00
Ling Huang	cbec62ada4	Add support to ingest log entries to Stackdriver against new "k8s_container" and "k8s_node" resources.	2018-04-06 08:47:19 -04:00
Mikhail Vyatskov	300fe8f179	Remove crassirostris from owners and reviewers Signed-off-by: Mikhail Vyatskov <crassirostris@yandex.com>	2018-04-04 18:36:44 +02:00
Mik Vyatskov	d6cef02a9d	Revert "Enable partial success in fluentd-gcp"	2018-03-29 11:48:01 +02:00
Kubernetes Submit Queue	70463ec4e2	Merge pull request #61773 from crassirostris/fluentd-partial-success Automatic merge from submit-queue (batch tested with PRs 60465, 61773, 61371, 61146). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Enable partial success in fluentd-gcp Enable partial success in fluentd-gcp. This reduces the amount of lost data in case of invalid (e.g. too big) entries: instead of dropping the whole request, only the failed entries are dropped. ```release-note [fluentd-gcp addon] Partial success option is enabled in fluentd. ``` /assign @x13n /cc @bmoyles0117	2018-03-28 01:34:48 -07:00
Kubernetes Submit Queue	943f8e8231	Merge pull request #60465 from frapposelli/GH-55416 Automatic merge from submit-queue (batch tested with PRs 60465, 61773, 61371, 61146). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Adding resource constraints for fluentd-gcp What this PR does / why we need it: Adds resource constraints to `fluentd-gcp`. Values mostly lifted from `fluentd-es`, cpu cap set to a sensible value after reviewing various threads. Which issue(s) this PR fixes Fixes #55416 Special notes for your reviewer: Release note: ```release-note NONE ```	2018-03-28 01:34:45 -07:00
Kubernetes Submit Queue	cc859a8624	Merge pull request #61727 from crassirostris/update-event-exporter Automatic merge from submit-queue (batch tested with PRs 61452, 61727, 61462, 61692, 61738). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md. Update event-exporter image This is a follow-up to https://github.com/GoogleCloudPlatform/k8s-stackdriver/pull/126 to apply the latest patch to the base image of event-exporter. ```release-note [fluentd-gcp addon] Update event-exporter image to have the latest base image. ``` /assign @x13n Could you please take a look?	2018-03-27 09:47:11 -07:00


256 Commits (92a2076149c9b6f7fef4baf6be47bf843c110f6a)