k3s/cluster/gce
Kubernetes Submit Queue 8e03228c1a
Merge pull request #64643 from dashpole/memcg_poll
Automatic merge from submit-queue (batch tested with PRs 64503, 64903, 64643, 64987). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Use unix.EpollWait to determine when memcg events are available to be Read

**What this PR does / why we need it**:
This fixes a file descriptor leak, introduced in https://github.com/kubernetes/kubernetes/pull/60531, that occurs when the `--experimental-kernel-memcg-notification` kubelet flag is enabled. The root of the issue is that `unix.Read` blocks indefinitely when it reads from an event file descriptor that has nothing to read. Since we refresh the memcg notifications periodically, these blocked reads accumulate until the memcg threshold is crossed, at which point all of them complete. If the node never comes under memory pressure, however, it can run out of file descriptors.

This PR changes the eviction manager to use `unix.EpollWait` to wait, with a 10-second timeout, for events to become available on the eventfd. We read from the eventfd only when an event is available, which prevents blocked `unix.Read` threads from accumulating and allows the kernel to reclaim the event file descriptors.

This PR also splits creating and updating the memcg threshold into separate steps, and performs creation before the periodic synchronize calls start. It also moves the logic for configuring memory thresholds into a separate file, `memory_threshold_notifier`.

This also reverts https://github.com/kubernetes/kubernetes/pull/64582, as the underlying leak that caused us to disable it for testing is fixed here.

Fixes #62808

**Release note**:
```release-note
NONE
```

/sig node
/kind bug
/priority critical-urgent
2018-06-11 17:29:19 -07:00
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| addons | Use runtime/default as default seccomp profile for unprivileged PodSecurityPolicy | 2018-05-15 09:39:37 -07:00 |
| gci | Merge pull request #64503 from kgolab/kg-ca-rbac | 2018-06-11 17:29:13 -07:00 |
| manifests | Merge pull request #64503 from kgolab/kg-ca-rbac | 2018-06-11 17:29:13 -07:00 |
| BUILD | Add unit test for configure-helper. | 2018-04-23 12:18:57 -07:00 |
| OWNERS | Add jingax10 as both reviewer and approver in cluster/gce. | 2018-02-21 22:11:32 -08:00 |
| config-common.sh | Use IP_ALIAS_SIZE to calculate and update IP_ALIAS_SIZE. Error added when ip-alias is not enabled when IP_ALIAS_SIZE is not empty. | 2018-05-04 14:10:08 -07:00 |
| config-default.sh | Merge pull request #64592 from ravisantoshgudimetla/revert-64364-remove-rescheduler | 2018-06-04 16:56:11 -07:00 |
| config-test.sh | re-enable memcg for testing on gce | 2018-06-07 13:03:38 -07:00 |
| cos | Symlink cluster/gce/cos to cluster/gce/gci | 2017-03-15 15:31:51 -07:00 |
| custom | add folder named custom in gce | 2018-01-05 15:36:53 -08:00 |
| delete-stranded-load-balancers.sh | Update all script to use /usr/bin/env bash in shebang | 2018-04-19 13:20:13 +02:00 |
| list-resources.sh | Update all script to use /usr/bin/env bash in shebang | 2018-04-19 13:20:13 +02:00 |
| ubuntu | Makes cluster/gce/ubuntu to be a symlink to cluster/gce/gci and changes the gci's [master\|node].yaml to enable kubernetes.target. | 2017-04-25 16:19:00 -07:00 |
| upgrade-aliases.sh | Update all script to use /usr/bin/env bash in shebang | 2018-04-19 13:20:13 +02:00 |
| upgrade.sh | Move uncordon to after the node is ready | 2018-05-11 09:57:04 -07:00 |
| util.sh | remove deprecated option '--enable-custom-metrics' | 2018-06-05 11:19:23 +08:00 |