Automatic merge from submit-queue
Refactor image related functions to use docker engine-api
ref #23563
Hopes can do some help, cc @Random-Liu
If it's ok, will add more work here.
Automatic merge from submit-queue
rkt: Return `FinishedAt` for pod
This is implemented via touching a file on stop as a hook in the systemd
unit. The ctime of this file is then used to get the `finishedAt` time
in the future.
In addition, this changes the `startedAt` and `createdAt` to use the api
server's results rather than the annotations it previously used.
It's possible we might want to move this into the api in the future.
Fixes#23887
I did the following manual testing:
```
$ cat ./examples/output/exit-output.yml
apiVersion: v1
kind: Pod
metadata:
labels:
name: exit
name: exit-output
spec:
restartPolicy: Never
containers:
- name: exit
image: busybox
command: ["sh", "-c", "echo Exiting in 60; sleep 60; echo goodbye"]
$ kubectl create -f ./examples/exit/exit-output.yaml
$ # wait
$ kubectl describe pod exit-output | grep State -A 4
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 19 Apr 2016 13:23:13 -0700
Finished: Tue, 19 Apr 2016 13:24:13 -0700
$ kubectl logs exit-output
Exiting in 60
goodbye
```
I double checked as well that the file at `/var/lib/kubelet/pods/$id/finished-$id` existed and looked as expected.
This is related to https://github.com/coreos/rkt/issues/1789#issuecomment-207111814 and follows https://github.com/kubernetes/kubernetes/pull/24367 + https://github.com/coreos/rkt/issues/2445
cc @jonboulle @iaguis @yifan-gu @kubernetes/sig-node
Automatic merge from submit-queue
stop changing the root path of the root webservice
We shouldn't mutate the root path of the root webservice (see usage). Just write the path we want.
Automatic merge from submit-queue
update controllers watching all pods to share an informer
This plumbs the shared pod informer through the various controllers to avoid duplicated watches.
Automatic merge from submit-queue
let admission plugins indicate they want nothing
An admission plugin can return `nil, nil` for construction. This is useful for dealing with cases where the `config` passed to you effectively means, "no work". The calling code already handles this.
@derekwaynecarr
Automatic merge from submit-queue
Enforce --max-pods in kubelet admission; previously was only enforced in scheduler
This is an ugly hack - I spent some time trying to understand what one NodeInfo has in common with the other one, but at some point decided that I just don't have time to do that.
Fixes#24262Fixes#20263
cc @HaiyangDING @lavalamp
This adds a poll-and-timeout procedure after the pod is
started, to make sure the post-start hooks execute when the
container is actually running.
This is a temporal workaround for implementing post-hooks,
a long term solution is to use lifecycle event to trigger
those hooks, see https://github.com/kubernetes/kubernetes/issues/23084.
Also this fixes a bug of getting container ID for a non-running
container when running pre-stop hook.
This is implemented via touching a file on stop as a hook in the systemd
unit. The ctime of this file is then used to get the `finishedAt` time
in the future.
In addition, this changes the `startedAt` and `createdAt` to use the api
server's results rather than the annotations it previously used.
It's possible we might want to move this into the api in the future.
Fixes#23887
Add tests to watch behavior in both protocols (http and websocket)
against all 3 media types. Adopt the
`application/vnd.kubernetes.protobuf;stream=watch` media type for the
content that comes back from a watch call so that it can be
distinguished from a Status result.
Automatic merge from submit-queue
Default conversion for byte slices is incorrect
Nil slices are getting allocated, which is incorrect and changes
behavior in some cases.
[]byte(nil) -> []byte(nil)
@wojtek-t
Automatic merge from submit-queue
All clients under ClientSet share one RateLimiter.
Currently we create a rate limiter for each client in client set. It makes the reasoning about rate limiting behavior much harder. This PR changes this behavior and now all clients in the set share single rate limiter. Ref. #24157
cc @lavalamp @wojtek-t
Automatic merge from submit-queue
Kubelet: Refactor all but image related functions in DockerInterface
For #23563.
Based on #23699 and #23844.
Only last 3 commits are new. This PR refactored all functions except image related functions, including:
* CreateExec
* StartExec
* InspectExec
* AttachToContainer
* Logs
* Info
* Version
@kubernetes/sig-node
Automatic merge from submit-queue
Client auth provider plugin framework
Allows client plugins to modify the underlying transport to, for example, add custom authorization headers.
Automatic merge from submit-queue
Rackspace improvements (OpenStack Cinder)
This adds PV support via Cinder on Rackspace clusters. Rackspace Cloud Block Storage is pretty much vanilla OpenStack Cinder, so there is no need for a separate Volume Plugin. Instead I refactored the Cinder/OpenStack interaction a bit (by introducing a CinderProvider Interface and moving the device path detection logic to the OpenStack part).
Right now this is limited to `AttachDisk` and `DetachDisk`. Creation and deletion of Block Storage is not in scope of this PR.
Also the `ExternalID` and `InstanceID` cloud provider methods have been implemented for Rackspace.
Automatic merge from submit-queue
Add mpio support for iscsi
This allows the iscsi volume to check if a iscsi device belongs to a mpio device
If it does belong to the device then we make sure we mount the mpio device instead of
the raw device.
The code is based on the current FibreChannel volume support for mpio
example
/dev/disk/by-path/iqn-example.com.2999 -> /dev/sde
Then we check
/sys/block/[dm-X]/slaves/xx
until we find the [dm-X] containing /dev/sde and mount it
Additional work that can be done in future
1. Add multiple portal support to iscsi
2. Move the FibreChannel volume provider to use the code that has been extracted
Automatic merge from submit-queue
docker daemon complains SHM size must be greater than 0
Fixes https://github.com/kubernetes/kubernetes/issues/24588
I am hitting this on Fedora 23 w/ docker 1.9.1 using systemd cgroup-driver.
```
$ docker version
Client:
Version: 1.9.1
API version: 1.21
Package version: docker-1.9.1-9.gitee06d03.fc23.x86_64
Go version: go1.5.3
Git commit: ee06d03/1.9.1
Built:
OS/Arch: linux/amd64
Server:
Version: 1.9.1
API version: 1.21
Package version: docker-1.9.1-9.gitee06d03.fc23.x86_64
Go version: go1.5.3
Git commit: ee06d03/1.9.1
Built:
OS/Arch: linux/amd64
```
Not sure why I am on the only one hitting it right now, but putting this out here for comment.
/cc @kubernetes/sig-node @kubernetes/rh-cluster-infra @smarterclayton
Automatic merge from submit-queue
Make fake client actions use fully qualified resource
The output of a versioned clientset is version object. The fake client used to assume only internal objects will be returned. This PR removes this assumption by making fake actions initialized with a fully qualified resource instead of a resource string.
We have to regenerate fake clients in release_1_2 clientset to let it compile. For the test fakes, we are breaking the backwards compatibility promise.
Part of #24155.
Automatic merge from submit-queue
make storage enablement, serialization, and location orthogonal
This allows a caller (command-line, config, code) to specify multiple separate pieces of config information regarding storage and have them properly composed at runtime. The information provided is exposed through interfaces to allow alternate implementations, which allows us to change the expression of the config moving forward. I also fixed up the types to be correct as I moved through.
The same options still exist, but they're composed slightly differently
1. specify target etcd servers per Group or per GroupResource
1. specify storage GroupVersions per Groups or per GroupResource
1. specify etcd prefixes per GroupVersion or per GroupResource
1. specify that multiple GroupResources share the same location in etcd
1. enable GroupResources by GroupVersion or by GroupResource whitelist or GroupResource blacklist
The `storage.Interface` is built per GroupResource by:
1. find the set of possible storage GroupResource based on the priority list of cohabitators
1. choose a GroupResource from the set by looking at which Groups have the resource enabled
1. find the target etcd server, etcd prefix, and storage encoding based on the GroupResource
The API server can have its resources separately enabled, but for now I've kept them linked.
@liggitt I think we need this (or something like it) to be able to go from config to these interfaces. Given another round of refactoring, we may be able to reshape these to be more forward driving.
@smarterclayton this is important for rebasing and for a seamless 1.2 to 1.3 migration for us.
Automatic merge from submit-queue
etcd3 store: provide compactor util
What's this PR?
- Provides a util to compact keys in etcd.
Reason:
We want to save the most recent 10 minutes event history. It should be more than enough for slow watchers. It is not number based, so it can tolerate event bursts too. We do not want to save longer since the current storage API cannot take advantage of the multi-version key yet. We might keep a longer history in the future.
Automatic merge from submit-queue
Remove requirement that Endpoints IPs be IPv4
Signed-off-by: André Martins <aanm90@gmail.com>
Release Note: The `Endpoints` API object now allows IPv6 addresses to be stored. Other components of the system are not ready for IPv6 yet, and many cloud providers are not IPv6 compatible, but installations that use their own controller logic can now store v6 endpoints.
Automatic merge from submit-queue
Enable protobuf compilation by default
Enables protobuf compilation, build verification checks, and generates all initial code.
kubectl is now 47M on OSX, build time from clean on a 2014 MBP (4 core) on Go 1.6 is ~150s.
@wojtek-t
Automatic merge from submit-queue
etcd3/store: support TTL in Create, Update
ref: https://github.com/kubernetes/kubernetes/issues/22448
What's included in the PR?
- Support TTL keys in Create() and Update()
- Simple testing for create
Note
- Have TTL keys in a long window (5min) in one lease. I'm not sure if we should do it right now. We could just implement it as simple as is. Once we have etcd3 work done we could start testing and measuring with some realistic load.
Automatic merge from submit-queue
client-gen: use serializer instead of codec for versioned client
For a versioned client, because the output of every client method is a versioned object, so it should use a serializer instead of a codec that does conversion.
@lavalamp @krousey
Automatic merge from submit-queue
Fix unintended change of Service.spec.ports[].nodePort during kubectl apply
Please refer #23551 for more detail. @bgrant0607 I think this simple fix should be ok to reuse nodePort. @thockin ptal.
Release note: Fix unintended change of `Service.spec.ports[].nodePort` during `kubectl apply`.
Automatic merge from submit-queue
genericapiserver: Moving more flags to ServerRunOptions
Moving more apiserver flags to generic api server.
With this change, most of the Config is created in genericapiserver.NewConfig(). The plan is to move everything there.
I didnt touch Storage related params as they are being changed in https://github.com/kubernetes/kubernetes/pull/23208.
And I will handle authz and authn in another PR (need to figure out a few things).
cc @lavalamp @kubernetes/sig-api-machinery
Automatic merge from submit-queue
Fix session ended hint for kubectl run
Fixes#23602
Before:
```console
$ kubectl run -i --tty busybox --image=busybox
Waiting for pod default/busybox-3797442026-mt8zk to be running, status is Pending, pod ready: false
Hit enter for command prompt
/ #
/ # exit
Session ended, resume using ' busybox-3797442026-mt8zk -c busybox -i -t' command when the pod is running
↑
(incomplete command)
```
After:
```console
Session ended, resume using 'kubectl attach busybox-3797442026-mt8zk -c busybox -i -t' command when the pod is running
```
@kubernetes/kubectl
Automatic merge from submit-queue
Don't log private SSH key
Log files may have more inclusive permissions than private SSH keys, and as such we should not log the key, even if it looks invalid. I accidentally leaked my key this way when posting e2e test logs.
If it does belong to the device then we make sure we mount the mpio device instead of
the raw device.
Heuristics
Login into /dev/disk/by-path/iqn-example.com.2999 -> /dev/sde
Check if sde existsin in /sys/block/[dm-X]/slaves/xx
If it does mount /dev/[dm-x] which will look like /dev/mapper/mpiodevicename in mount
examples/iscsi has more details
Automatic merge from submit-queue
Fix goroutine leak in ssh-tunnel healthcheck.
Tunnel healthchecks were not closing the HTTP response body, leading to many open goroutines.
Automatic merge from submit-queue
etcd3/store: watcher implementation
ref: https://github.com/kubernetes/kubernetes/issues/22448
This PR does:
- Provide a watcher that uses etcd v3 API to watch changes via etcd and process them based on existing logic of storage.Interface.Watch(), WatchList().
- By using the watcher, very trivial to implement Watch() and WatchList() in etcd3 storage.Interface implementation.
Automatic merge from submit-queue
shared controller informers
Related to https://github.com/kubernetes/kubernetes/issues/14978
This demonstrates how controllers which use an `Informer`, would be able to share the same watch and store. A similar "setup and run" approach could be done for an `IndexInformer` to share that cache. I found adding listeners here to be easier than intercepting at the watch interface (problems with resourceVersion) or the reflector (same plumbing, but you have to fan out to multiple stores).
We could also use the cache we build here to back several of the admission plugins that currently run their own lookup caches today.
If there's interest, I can finish out the `SharedInformer` and switch the low hanging fruit over.
@kubernetes/rh-cluster-infra @smarterclayton @liggitt @wojtek-t
Automatic merge from submit-queue
Fix PullImage and add corresponding node e2e test
Fixes#24101. This is a bug introduced by #23506, since ref #23563.
The root cause of #24101 is described [here](https://github.com/kubernetes/kubernetes/issues/24101#issuecomment-208547623).
This PR
1) Fixes#24101 by decoding the messages returned during pulling image, and return error if any of the messages contains error.
2) Add the node e2e test to detect this kind of failure.
3) Get present check out of `ConformanceImage.Remove()` and `ConformanceImage.Pull()`. Because sometimes we may expect error to occur in `PullImage()` and `RemoveImage()`, but even that doesn't happen, the `Present()` check will still return error and let the test pass.
@yujuhong @freehan @liangchenye
Also /cc @resouer, because he is doing the image related functions refactoring.
Automatic merge from submit-queue
kubenet: Load bridge netfilter module in Init().
This lets the kubenet loads the bridge netfilter module and set bridge-nf-call-iptables=1
Fix#24018
Follow up PRs would be appreciate if we also load the module in the bridge plugin binary itself. Ref https://github.com/kubernetes/kubernetes/issues/24018#issuecomment-207682514
cc @kubernetes/sig-node @sjpotter @euank
Automatic merge from submit-queue
Use correct defaults when binding apiserver flags
defaults should be set in the struct-creating function, then the current struct field value used as the default when binding the flag
Automatic merge from submit-queue
Expose SummaryProvider for reuse by other parts of kubelet
To support out of resource killing in the kubelet, we will introduce a new top-level module that will ensure node stability by checking if eviction thresholds have been met for memory and file-system usage on the node. In addition, it will then need information about pod memory and disk usage in order to make an eviction selection. Currently, this information is collected in `SummaryProvider` but it's hidden away and not available for re-use by other top-level modules of the kubelet. This initial refactor adds the ability to get summary stat information from the `ResourceAnalyzer` so it can be reused by other top-level modules.
I suspect we will further re-factor this area as code evolves, but this unblocks further progress on out-of-resource killing.
/cc @vishh @timothysc @kubernetes/sig-node @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Use the first version as thirdparty resource preferredVersion
First commit is a one-liner, which implements the server-half of #23985.
The other two commits rearrange the test code, and add back a commented out test of thirdparty resource.
@lavalamp @nikhiljindal
Automatic merge from submit-queue
Bump up etcd dependency to fix data race
ref: https://github.com/kubernetes/kubernetes/pull/23694
What this PR does
- Bumping up the godep of etcd to fix data race in etcd watcher. Without this change, watcher PR builds will fail in race detection.
- Small changes to fix builds after upgrade
Automatic merge from submit-queue
Add memory available to summary stats provider
To support out of resource killing when low on memory, we want to let operators specify eviction thresholds based on available memory instead of memory usage for ease of use when working with heterogeneous nodes.
So for example, a valid eviction threshold would be the following:
* If node.memory.available < 200Mi for 30s, then evict pod(s)
For the node, `memory.availableBytes` is always known since the `memory.limit_in_bytes` is always known for root cgroup. For individual containers in pods, we only populate the `availableBytes` if the container was launched with a memory limit specified. When no memory limit is specified, the cgroupfs sets a value of 1 << 63 in the `memory.limit_in_bytes` so we look for a similar max value to handle unbounded limits, and ignore setting `memory.availableBytes`.
FYI @vishh @timstclair - as discussed on Slack.
/cc @kubernetes/sig-node @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Move /resetMetrics to DELETE /metrics
Reduces the surface area of the API server slightly and allows
downstream components to have deleteable metrics. After this change
genericapiserver will *not* have metrics unless the caller defines it
(allows different apiserver implementations to make that choice on their
own).
@wojtek-t
Automatic merge from submit-queue
Make etcd cache size configurable
Instead of the prior 50K limit, allow users to specify a more sensible size for their cluster.
I'm not sure what a sensible default is here. I'm still experimenting on my own clusters. 50 gives me a 270MB max footprint. 50K caused my apiserver to run out of memory as it exceeded >2GB. I believe that number is far too large for most people's use cases.
There are some other fundamental issues that I'm not addressing here:
- Old etcd items are cached and potentially never removed (it stores using modifiedIndex, and doesn't remove the old object when it gets updated)
- Cache isn't LRU, so there's no guarantee the cache remains hot. This makes its performance difficult to predict. More of an issue with a smaller cache size.
- 1.2 etcd entries seem to have a larger memory footprint (I never had an issue in 1.1, even though this cache existed there). I suspect that's due to image lists on the node status.
This is provided as a fix for #23323
Automatic merge from submit-queue
Kubelet: Refactor container related functions in DockerInterface
For #23563.
Based on #23506, will rebase after #23506 is merged.
The last 4 commits of this PR are new.
This PR refactors all container lifecycle related functions in DockerInterface, including:
* ListContainers
* InspectContainer
* CreateContainer
* StartContainer
* StopContainer
* RemoveContainer
@kubernetes/sig-node
Automatic merge from submit-queue
Add watch.Until, a conditional watch mechanism
A more powerful tool than wait.Poll, allows a watch interface to drive conditionals to react to changes on a resource or resources. Provide a set of standard conditions that are in common use in the code, and updates e2e to use a few of these.
Extracted from #23567
Automatic merge from submit-queue
the component status health check should check whether the scheme of backend storage url is https or not
fix https://github.com/kubernetes/kubernetes/issues/23897, when querying the component status of etcd (backend storage), the scheme of url is not checked and use `http` always, this commit aims to fix this.
Automatic merge from submit-queue
Flexvolume: Add support for multiple secrets
This PR adds support to pass multiple secrets for flexvolume plugins.
To allow multiple secrets, secrets are now passed as:
"kubernetes.io/secret/id-rsa":"value-2\r\n\r\n","kubernetes.io/secret/id-rsa.pub":"value-1\r\n"
Automatic merge from submit-queue
Fix expired event logic to use 404 instead of 500
It seems this logic was never updated once apiserver started returning 404s for expired (missing) events.
This change corrects it to use a 404 so events will get resent correctly if they were expired in etcd.
Fixes#23637.
Automatic merge from submit-queue
Make kubectl edit not convert GV on edits
Previously, kubectl edit was using a decoder to load in edits that
converted to the internal version. It would then re-encode this
decoded value to produce a patch. However, if you were editing
in the object in a GroupVersion that was not the internal version,
this would cause the kubectl edit command to attempt to produce
a patch which changed the GroupVersion, which would fail.
Now, we use a plain deserializer instead, so no conversion or
defaulting occurs when loading in the edited file.
Ref #23378
Automatic merge from submit-queue
phase 2 of cassandra example overhaul
Here's the next iteration in overhauling this example, towards https://github.com/kubernetes/kubernetes/issues/20961. This removes the pod adoption part, but doesn't (yet) otherwise change any of the resources used.
It also includes some README cleanup, and removes some explicit specification of labels in the rc yaml.
This PR doesn't yet add any commentary on how we're using the seed provider (re: https://github.com/kubernetes/kubernetes/issues/20961#issuecomment-190405959 etc.). Maybe we should add that.
Also: LMK if this PR should include any changes to the links out to the docs.
cc @bgrant0607 @johndmulhausen
Automatic merge from submit-queue
rkt: Fix hostnetwork.
Mount hosts' /etc/hosts, /etc/resolv.conf, set host's hostname
when running the pod in the host's network.
Fix#24235
cc @kubernetes/sig-node
Automatic merge from submit-queue
rkt: Use rkt pod's uuid as the systemd service file's name.
Previously, the service file's name is 'k8s_${POD_UID}.service',
which means we need to `systemctl daemon-reload` if the we replace
the content of the service file (e.g. pod is restarted).
However this makes the journal in the previous pod get disconnected.
This PR solves the issue by using the unique rkt uuid as the service
file's name. After the change, the service file's name will be:
'k8s_${rkt_uuid}.service'.
Fix#23691