Automatic merge from submit-queue
etcd3/store: watcher implementation
ref: https://github.com/kubernetes/kubernetes/issues/22448
This PR does:
- Provides a watcher that uses the etcd v3 API to watch changes via etcd and processes them based on the existing logic of storage.Interface.Watch() and WatchList().
- With the watcher in place, it is trivial to implement Watch() and WatchList() in the etcd3 storage.Interface implementation.
Automatic merge from submit-queue
shared controller informers
Related to https://github.com/kubernetes/kubernetes/issues/14978
This demonstrates how controllers which use an `Informer` would be able to share the same watch and store. A similar "setup and run" approach could be done for an `IndexInformer` to share that cache. I found adding listeners here to be easier than intercepting at the watch interface (problems with resourceVersion) or the reflector (same plumbing, but you have to fan out to multiple stores).
We could also use the cache we build here to back several of the admission plugins that currently run their own lookup caches today.
If there's interest, I can finish out the `SharedInformer` and switch the low hanging fruit over.
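The listener approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Event`, `Listener`, and `sharedInformer` are hypothetical names, and the store is a plain map rather than the real cache.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for a single watch notification.
type Event struct {
	Type string // "ADDED", "MODIFIED" or "DELETED"
	Key  string
	Obj  string
}

// Listener is a hypothetical callback that a controller registers.
type Listener func(Event)

// sharedInformer keeps one store and fans every event out to all listeners,
// so several controllers can share a single watch.
type sharedInformer struct {
	mu        sync.RWMutex
	store     map[string]string
	listeners []Listener
}

func newSharedInformer() *sharedInformer {
	return &sharedInformer{store: map[string]string{}}
}

// AddListener registers a controller callback.
func (s *sharedInformer) AddListener(l Listener) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.listeners = append(s.listeners, l)
}

// handle updates the shared store once, then notifies every listener.
func (s *sharedInformer) handle(e Event) {
	s.mu.Lock()
	if e.Type == "DELETED" {
		delete(s.store, e.Key)
	} else {
		s.store[e.Key] = e.Obj
	}
	// Copy the slice so callbacks run without holding the lock.
	listeners := append([]Listener(nil), s.listeners...)
	s.mu.Unlock()
	for _, l := range listeners {
		l(e)
	}
}

// Get reads from the shared store.
func (s *sharedInformer) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	obj, ok := s.store[key]
	return obj, ok
}

func main() {
	inf := newSharedInformer()
	inf.AddListener(func(e Event) { fmt.Println("controller A saw", e.Type, e.Key) })
	inf.AddListener(func(e Event) { fmt.Println("controller B saw", e.Type, e.Key) })
	inf.handle(Event{Type: "ADDED", Key: "default/pod-1", Obj: "v1"})
}
```

Fanning out after the store update means every listener sees the same ordering of events and reads from the same cache.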
@kubernetes/rh-cluster-infra @smarterclayton @liggitt @wojtek-t
Automatic merge from submit-queue
Fix PullImage and add corresponding node e2e test
Fixes #24101. This is a bug introduced by #23506, since ref #23563.
The root cause of #24101 is described [here](https://github.com/kubernetes/kubernetes/issues/24101#issuecomment-208547623).
This PR
1) Fixes #24101 by decoding the messages returned while pulling an image, and returning an error if any of the messages contains an error.
2) Adds a node e2e test to detect this kind of failure.
3) Moves the `Present()` check out of `ConformanceImage.Remove()` and `ConformanceImage.Pull()`. Sometimes we expect `PullImage()` or `RemoveImage()` to return an error; if that error does not occur, the `Present()` check would still return an error and let the test pass.
@yujuhong @freehan @liangchenye
Also /cc @resouer, because he is doing the image related functions refactoring.
Automatic merge from submit-queue
Strip comments from configure-vm.sh for gce
We are getting very close to the 32KiB limit on GCE metadata entry length. We used to strip comments before putting the value in metadata, but I think we removed it in a refactor because it wasn't absolutely necessary, and leaving it out made the scripts slightly cleaner. It's close to being necessary again.
Removing comments reduces the size from 31,609B to 27,221B: https://www.diffchecker.com/0xmmecvw.
Automatic merge from submit-queue
kubenet: Load bridge netfilter module in Init().
This lets kubenet load the bridge netfilter module and set bridge-nf-call-iptables=1.
Fix #24018
Follow-up PRs would be appreciated if we also want to load the module in the bridge plugin binary itself. Ref https://github.com/kubernetes/kubernetes/issues/24018#issuecomment-207682514
cc @kubernetes/sig-node @sjpotter @euank
Automatic merge from submit-queue
Use correct defaults when binding apiserver flags
Defaults should be set in the struct-creating function; the current struct field value is then used as the default when binding the flag.
Automatic merge from submit-queue
Expose SummaryProvider for reuse by other parts of kubelet
To support out of resource killing in the kubelet, we will introduce a new top-level module that will ensure node stability by checking if eviction thresholds have been met for memory and file-system usage on the node. In addition, it will then need information about pod memory and disk usage in order to make an eviction selection. Currently, this information is collected in `SummaryProvider` but it's hidden away and not available for re-use by other top-level modules of the kubelet. This initial refactor adds the ability to get summary stat information from the `ResourceAnalyzer` so it can be reused by other top-level modules.
I suspect we will further re-factor this area as code evolves, but this unblocks further progress on out-of-resource killing.
/cc @vishh @timothysc @kubernetes/sig-node @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Use the first version as thirdparty resource preferredVersion
First commit is a one-liner, which implements the server-half of #23985.
The other two commits rearrange the test code, and add back a commented out test of thirdparty resource.
@lavalamp @nikhiljindal
Automatic merge from submit-queue
add HOME env variable for kube-addons service
Fix https://github.com/kubernetes/kubernetes/issues/23973.
Briefly, the systemd service does not set the `HOME` environment variable, which causes kubectl to write the schema file into `/.kube` instead of the expected `/root/.kube`.
Automatic merge from submit-queue
e2e_node: port privileged pod tests from test/e2e/priviliged.go
The ported test is functionally the same as the original test.
The main difference between the two tests is that the original test relies on
`kubectl` to exec into the container, while the ported test directly uses the
REST client of the apiserver. This avoids the need to copy kubectl to the node
under test.
Automatic merge from submit-queue
Bump up etcd dependency to fix data race
ref: https://github.com/kubernetes/kubernetes/pull/23694
What this PR does
- Bumps the etcd godep to fix a data race in the etcd watcher. Without this change, watcher PR builds will fail in race detection.
- Small changes to fix builds after upgrade
Automatic merge from submit-queue
Add easy-rsa to hyperkube container
Otherwise it gets downloaded at runtime, which kind of breaks the container model.
See [comment](https://github.com/kubernetes/kubernetes/issues/20514#issuecomment-195835786) in #20514 - this causes dockerized install of k8s to fail if you're behind a proxy. make-ca-cert.sh already looks for a local copy of easy-rsa.tar.gz before downloading it, so this drops the tarball in the expected place in the container.
Automatic merge from submit-queue
Add memory available to summary stats provider
To support out of resource killing when low on memory, we want to let operators specify eviction thresholds based on available memory instead of memory usage for ease of use when working with heterogeneous nodes.
So for example, a valid eviction threshold would be the following:
* If node.memory.available < 200Mi for 30s, then evict pod(s)
For the node, `memory.availableBytes` is always known since the `memory.limit_in_bytes` is always known for root cgroup. For individual containers in pods, we only populate the `availableBytes` if the container was launched with a memory limit specified. When no memory limit is specified, the cgroupfs sets a value of 1 << 63 in the `memory.limit_in_bytes` so we look for a similar max value to handle unbounded limits, and ignore setting `memory.availableBytes`.
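The rule above can be sketched as follows. This is an illustration under assumptions: that available memory is computed as the limit minus current usage, and that any limit at or above a very large threshold counts as unbounded. The `unboundedThreshold` value and the function name are hypothetical, not taken from the PR.

```go
package main

import "fmt"

// unboundedThreshold is an assumed cut-off: limits this large are treated as
// "no limit set" rather than as a real memory limit.
const unboundedThreshold = uint64(1) << 62

// availableBytes returns limit minus usage. The second return value is false
// when the limit is unbounded, in which case no value should be reported.
func availableBytes(limit, usage uint64) (uint64, bool) {
	if limit >= unboundedThreshold {
		return 0, false
	}
	if usage > limit {
		// Guard against underflow if usage momentarily exceeds the limit.
		return 0, true
	}
	return limit - usage, true
}

func main() {
	const mi = uint64(1) << 20
	avail, ok := availableBytes(512*mi, 400*mi)
	fmt.Println(avail/mi, ok)

	_, ok = availableBytes(uint64(1)<<63, 400*mi)
	fmt.Println(ok)
}
```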
FYI @vishh @timstclair - as discussed on Slack.
/cc @kubernetes/sig-node @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Move /resetMetrics to DELETE /metrics
Reduces the surface area of the API server slightly and allows
downstream components to have deleteable metrics. After this change
genericapiserver will *not* have metrics unless the caller defines it
(allows different apiserver implementations to make that choice on their
own).
@wojtek-t
Automatic merge from submit-queue
Make etcd cache size configurable
Instead of the prior 50K limit, allow users to specify a more sensible size for their cluster.
I'm not sure what a sensible default is here. I'm still experimenting on my own clusters. 50 gives me a 270MB max footprint. 50K caused my apiserver to run out of memory as it exceeded 2GB. I believe that number is far too large for most people's use cases.
There are some other fundamental issues that I'm not addressing here:
- Old etcd items are cached and potentially never removed (it stores using modifiedIndex, and doesn't remove the old object when it gets updated)
- Cache isn't LRU, so there's no guarantee the cache remains hot. This makes its performance difficult to predict. More of an issue with a smaller cache size.
- 1.2 etcd entries seem to have a larger memory footprint (I never had an issue in 1.1, even though this cache existed there). I suspect that's due to image lists on the node status.
This is provided as a fix for #23323
Automatic merge from submit-queue
Kubelet: Refactor container related functions in DockerInterface
For #23563.
Based on #23506, will rebase after #23506 is merged.
The last 4 commits of this PR are new.
This PR refactors all container lifecycle related functions in DockerInterface, including:
* ListContainers
* InspectContainer
* CreateContainer
* StartContainer
* StopContainer
* RemoveContainer
@kubernetes/sig-node
Automatic merge from submit-queue
Add watch.Until, a conditional watch mechanism
A more powerful tool than wait.Poll: it allows a watch interface to drive conditionals that react to changes on a resource or resources. This PR provides a set of standard conditions that are in common use in the code, and updates e2e to use a few of these.
Extracted from #23567
Automatic merge from submit-queue
the component status health check should check whether the scheme of backend storage url is https or not
Fix https://github.com/kubernetes/kubernetes/issues/23897. When querying the component status of etcd (the backend storage), the scheme of the URL is not checked and `http` is always used. This commit fixes that.
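A scheme check of this kind might be sketched as below. The function name and return shape are hypothetical; only `net/url` from the standard library is used.

```go
package main

import (
	"fmt"
	"net/url"
)

// healthCheckTarget parses a backend storage URL and reports the host to probe
// and whether the probe should use TLS, based on the URL scheme.
func healthCheckTarget(backend string) (host string, useTLS bool, err error) {
	u, err := url.Parse(backend)
	if err != nil {
		return "", false, err
	}
	switch u.Scheme {
	case "https":
		return u.Host, true, nil
	case "http":
		return u.Host, false, nil
	default:
		return "", false, fmt.Errorf("unsupported scheme %q in %q", u.Scheme, backend)
	}
}

func main() {
	host, useTLS, err := healthCheckTarget("https://127.0.0.1:2379")
	fmt.Println(host, useTLS, err)
}
```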
Automatic merge from submit-queue
Flexvolume: Add support for multiple secrets
This PR adds support to pass multiple secrets for flexvolume plugins.
To allow multiple secrets, secrets are now passed as:
"kubernetes.io/secret/id-rsa":"value-2\r\n\r\n","kubernetes.io/secret/id-rsa.pub":"value-1\r\n"
Automatic merge from submit-queue
Fix expired event logic to use 404 instead of 500
It seems this logic was never updated once apiserver started returning 404s for expired (missing) events.
This change corrects it to use a 404 so events will get resent correctly if they were expired in etcd.
Fixes #23637.
Automatic merge from submit-queue
hack: specify --advertise-address in hack/local-up-cluster.sh
This fixes the bug where the script fails to launch an apiserver on a
machine without active networking (issue #24272).
Automatic merge from submit-queue
Fix spacing in usage_from_stdin and info_from_stdin (issue #24186).
If "a" is a bash array, then the syntax to append the contents of $line as a
new element to the array is a+=("$line"), not a+=$line.
Using the latter syntax just seems to append to the first element, creating a
long string and thus losing newline information.
Fixing this allows us to drop some empty lines from invocations of
usage_from_stdin.
Automatic merge from submit-queue
Asynchronous bindings
This increases scheduler throughput with "trivial algorithm" (choose random node) by at least 5x.
Such optimization is necessary if we want to significantly improve scheduling throughput.
Fix #24192
@gmarek @kubernetes/sig-scalability @hongchaodeng
Automatic merge from submit-queue
add labels to kube component static pods
```
$ k --namespace=kube-system get po -l 'tier in (control-plane)'
NAME                                 READY     STATUS    RESTARTS   AGE
kube-apiserver-k-7-master            1/1       Running   2          1m
kube-controller-manager-k-7-master   1/1       Running   1          1m
kube-scheduler-k-7-master            1/1       Running   0          54s
$ k --namespace=kube-system get po -l 'tier in (node)'
NAME                         READY     STATUS    RESTARTS   AGE
kube-proxy-k-7-minion-eheu   1/1       Running   0          1m
kube-proxy-k-7-minion-mwo9   1/1       Running   0          1m
kube-proxy-k-7-minion-xw6m   1/1       Running   0          1m
```
cc @bgrant0607 @thockin @gmarek
Fixes #21267
Automatic merge from submit-queue
add config-test.sh to cluster/centos so we can run e2e test on centos/fedora/rhel
This lets me run the e2e tests on CentOS locally using the following command:
```console
KUBERNETES_PROVIDER=centos KUBERNETES_CONFORMANCE_TEST=y ./cluster/test-e2e.sh
```