Automatic merge from submit-queue
Define interfaces for kubelet pod admission and eviction
There is too much code and logic in `kubelet.go` that makes it hard to test functions in discrete pieces.
I propose an interface that an internal module can implement that will let it make an admission decision for a pod. If folks are ok with the pattern, I want to move the a) predicate checking, b) out of disk, c) eviction preventing best-effort pods being admitted into their own dedicated handlers that would be easier for us to mock test. We can then just write tests to ensure that the `Kubelet` calls a call-out, and we can write easier unit tests to ensure that dedicated handlers do the right thing.
The second interface I propose was a `PodEvictor` that is invoked in the main kubelet sync loop to know if pods should be pro-actively evicted from the machine. The current active deadline check should move into a simple evictor implementation, and I want to plug the out of resource killer code path as an implementation of the same interface.
@vishh @timothysc - if you guys can ack on this, I will add some unit testing to ensure we do the call-outs.
/cc @kubernetes/sig-node @kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Remove nodeName from predicate signature.
With this approach, I'm getting the initial throughput (in empty cluster) in 1000-node cluster of ~95pods/s.
Which is ~30% improvement.
@kubernetes/sig-scalability
Automatic merge from submit-queue
Kubelet: Cleanup with new engine api
Finish step 2 of #23563
This PR:
1) Cleanup go-dockerclient reference in the code.
2) Bump up the engine-api version.
3) Cleanup the code with new engine-api.
Fixes#24076.
Fixes#23809.
/cc @yujuhong
Automatic merge from submit-queue
Promote Pod Hostname & Subdomain to fields (were annotations)
Deprecating the podHostName, subdomain and PodHostnames annotations and created corresponding new fields for them on PodSpec and Endpoints types.
Annotation doc: #22564
Annotation code: #20688
Automatic merge from submit-queue
Store node information in NodeInfo
This is significantly improving scheduler throughput.
On 1000-node cluster:
- empty cluster: ~70pods/s
- full cluster: ~45pods/s
Drop in throughput is mostly related to priority functions, which I will be looking into next (I already have some PR #24095, but we need for more things before).
This is roughly ~40% increase.
However, we still need better understanding of predicate function, because in my opinion it should be even faster as it is now. I'm going to look into it next week.
@gmarek @hongchaodeng @xiang90
This is implemented via touching a file on stop as a hook in the systemd
unit. The ctime of this file is then used to get the `finishedAt` time
in the future.
In addition, this changes the `startedAt` and `createdAt` to use the api
server's results rather than the annotations it previously used.
It's possible we might want to move this into the api in the future.
Fixes#23887
Automatic merge from submit-queue
Kubelet: Better-defined Container Waiting state
For issue #20478 and #21125.
This PR corrected logic and add unit test for `ShouldContainerBeRestarted()`, cleaned up `Waiting` state related code and added unit test for `generateAPIPodStatus()`.
Fixes#20478Fixes#17971
@yujuhong
Automatic merge from submit-queue
rkt: Add pre-stop lifecycle hooks for rkt.
When a pod is being terminated, the pre-stop hooks of all the containers
will be run before the containers are stopped.
cc @yujuhong @Random-Liu @sjpotter
Automatic merge from submit-queue
Move predicates into library
This PR tries to implement #12744
Any suggestions/ideas are welcome. @davidopp
current state: integration test fails if including podCount check in Kubelet.
DONE:
1. refactor all predicates: predicates return fitOrNot(bool) and error(Error) in which the latter is of type PredicateFailureError or InsufficientResourceError
2. GeneralPredicates() is a predicate function, which includes serveral other predicate functions (PodFitsResource, PodFitsHost, PodFitsHostPort). It is registered as one of the predicates in DefaultAlgorithmProvider, and is also called in canAdmitPod() in Kubelet and should be called by other components (like rescheduler, etc if necessary. See discussion in issue #12744
TODO:
1. determine which predicates should be included in GeneralPredicates()
2. separate GeneralPredicates() into: a.) GeneralPredicatesEvictPod() and b.) GeneralPredicatesNotEvictPod()
3. DaemonSet should use GeneralPredicates()
DONE:
1. refactor all predicates: predicates return fitOrNot(bool) and error(Error) in which the latter is of type
PredicateFailureError or InsufficientResourceError. (For violation of either MaxEBSVolumeCount or
MaxGCEPDVolumeCount, returns one same error type as ErrMaxVolumeCountExceeded)
2. GeneralPredicates() is a predicate function, which includes serveral other predicate functions (PodFitsResource,
PodFitsHost, PodFitsHostPort). It is registered as one of the predicates in DefaultAlgorithmProvider, and
is also called in canAdmitPod() in Kubelet and should be called by other components (like rescheduler, etc)
if necessary. See discussion in issue #12744
3. remove podNumber check from GeneralPredicates
4. HostName is now verified in Kubelet's canAdminPod(). add TestHostNameConflicts in kubelet_test.go
5. add getNodeAnyWay() method in Kubelet to get node information in standaloneMode
TODO:
1. determine which predicates should be included in GeneralPredicates()
2. separate GeneralPredicates() into:
a. GeneralPredicatesEvictPod() and
b. GeneralPredicatesNotEvictPod()
3. DaemonSet should use GeneralPredicates()
Automatic merge from submit-queue
Kubelet: Start using the official docker engine-api
For #23563.
This is the **first step** in the roadmap of switching to docker [engine-api](https://github.com/docker/engine-api).
In this PR, I keep the old `DockerInterface` and implement it with the new engine-api.
With this approach, we could switch to engine-api with minimum change, so that we could:
* Test the engine-api without huge refactoring.
* Send following PRs to refactor functions in `DockerInterface` separately so as to avoid a huge change in one PR.
I've tested this PR locally, it passed all the node conformance test:
```
make test_e2e_node
Ran 19 of 19 Specs in 823.395 seconds
SUCCESS! -- 19 Passed | 0 Failed | 0 Pending | 0 Skipped PASS
Ginkgo ran 1 suite in 13m49.429979585s
Test Suite Passed
```
And it also passed the jenkins gce e2e test:
```
go run hack/e2e.go -test -v --test_args="--ginkgo.skip=\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]"
Ran 161 of 268 Specs in 4570.214 seconds
SUCCESS! -- 161 Passed | 0 Failed | 0 Pending | 107 Skipped PASS
Ginkgo ran 1 suite in 1h16m16.325934558s
Test Suite Passed
2016/03/25 15:12:42 e2e.go:196: Step 'Ginkgo tests' finished in 1h16m18.918754301s
```
I'm writing the design document, and will post the switching roadmap in an umbrella issue soon.
@kubernetes/sig-node
Allow network plugins to declare that they handle shaping and that
Kuberenetes should not. Will be first used by openshift-sdn which
handles shaping through OVS, but this triggers a warning when
kubelet notices the bandwidth annotations.
Add GeneratePodHostNameAndDomain() to RuntimeHelper to
get the hostname of the pod from kubelet.
Also update the logging flag to change the journal match from
_HOSTNAME to _MACHINE_ID.