Lightweight Kubernetes
 
 
 
 
Jiaying Zhang 5514a1f4dd Fixes the races around devicemanager Allocate() and endpoint deletion.
There is a race in predicateAdmitHandler Admit() in which getNodeAnyWayFunc()
could get a Node with non-zero deviceplugin resource allocatable for a
non-existing endpoint. That race can happen when a device plugin fails,
but is more likely when kubelet restarts because, with the current
registration model, there is a time gap between kubelet restart and device
plugin re-registration. During this time window, even though devicemanager
could have removed the resource initially during the GetCapacity() call,
kubelet may overwrite the device plugin resource capacity/allocatable with
the old value when a node update from the API server comes in later. This
could cause a pod to be started without the proper device runtime config set.

To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint but set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stopTime is set. This allows us to track what
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through the API server) if there is
such a use case. Currently endpointStopGracePeriod is set to 5 minutes.

Given that an endpoint is no longer immediately removed upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler and avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail the admission handler.
2018-03-09 17:00:57 -08:00
.github Merge pull request #54114 from xiangpengzhao/fix-pr-template 2017-10-30 18:37:06 -07:00
Godeps bump(6644d4): spf13/cobra: support bash completion for aliases 2018-03-02 21:28:13 +05:30
api API Changes for RunAsGroup and Implementation and e2e 2018-02-28 22:09:56 -08:00
build Merge pull request #60669 from ixdy/bazel-test-visibility 2018-03-02 15:13:21 -08:00
cluster Merge pull request #60237 from crassirostris/audit-use-buffered-backend 2018-03-01 11:42:48 -08:00
cmd Merge pull request #60237 from crassirostris/audit-use-buffered-backend 2018-03-01 11:42:48 -08:00
docs Merge pull request #52077 from krmayankk/runas 2018-03-01 15:23:51 -08:00
examples Merge pull request #59149 from verult/flex-examples 2018-02-28 04:54:29 -08:00
hack Update gazelle to latest to fix vendoring issue 2018-03-02 11:58:31 -08:00
logo Don't use strokes in the logo SVG 2017-10-12 09:38:56 -07:00
pkg Fixes the races around devicemanager Allocate() and endpoint deletion. 2018-03-09 17:00:57 -08:00
plugin Merge pull request #55019 from mikedanese/svcacct 2018-02-27 10:50:46 -08:00
staging bump(6644d4): spf13/cobra: support bash completion for aliases 2018-03-02 21:28:13 +05:30
test Fixes the races around devicemanager Allocate() and endpoint deletion. 2018-03-09 17:00:57 -08:00
third_party Merge pull request #60506 from php-coder/fix_suppress_gdate_cmd 2018-02-28 07:20:25 -08:00
translations Merge pull request #51925 from zhanghuidinah/fix-broken-link 2018-02-27 21:40:21 -08:00
vendor bump(6644d4): spf13/cobra: support bash completion for aliases 2018-03-02 21:28:13 +05:30
.bazelrc move build related files out of the root directory 2017-05-15 15:53:54 -07:00
.generated_files Move .generated_docs to docs/ so docs OWNERS can review / approve 2017-02-16 10:11:57 -08:00
.gitattributes Hide generated files only on github 2018-01-22 10:58:48 +01:00
.gitignore fix all the typos across the project 2018-02-11 11:04:14 +08:00
.kazelcfg.json Switch from gazel to kazel, and move kazelcfg into build/root 2017-07-18 12:48:51 -07:00
BUILD.bazel move build related files out of the root directory 2017-05-15 15:53:54 -07:00
CHANGELOG-1.2.md Update TOC of CHANGELOG 2017-09-09 13:38:29 +08:00
CHANGELOG-1.3.md fix the format for github error 2018-01-31 14:49:29 +08:00
CHANGELOG-1.4.md fix the format for github error 2018-02-02 18:44:27 +08:00
CHANGELOG-1.5.md fix typo in kubeadm 2018-02-06 13:48:18 +08:00
CHANGELOG-1.6.md Fix typo 2018-02-01 19:11:19 +08:00
CHANGELOG-1.7.md Update CHANGELOG-1.7.md for v1.7.13. 2018-03-01 09:06:35 +00:00
CHANGELOG-1.8.md Update CHANGELOG-1.8.md for v1.8.8. 2018-02-09 15:01:39 -08:00
CHANGELOG-1.9.md Fix incorrectly formatted URL 2018-02-22 12:20:54 -08:00
CHANGELOG-1.10.md Update CHANGELOG-1.10.md for v1.10.0-beta.1. 2018-03-01 02:59:00 -05:00
CHANGELOG.md Update release note links for 1.10 2018-01-17 22:45:12 +01:00
CONTRIBUTING.md Pointed to community/contributors/guide/README.md 2017-12-15 22:08:34 +05:30
LICENSE LICENSE: revert modifications to Apache license 2016-11-22 11:44:46 -08:00
Makefile move build related files out of the root directory 2017-05-15 15:53:54 -07:00
Makefile.generated_files move build related files out of the root directory 2017-05-15 15:53:54 -07:00
OWNERS Fix my incorrect username in #46649 2017-08-10 11:59:54 -07:00
OWNERS_ALIASES Remove spxtr from various OWNERS files. 2018-02-28 13:04:32 -08:00
README.md Update README.md 2018-02-11 04:34:01 +00:00
SUPPORT.md Add a SUPPORT.md file for github 2017-08-11 14:42:36 -04:00
WORKSPACE move build related files out of the root directory 2017-05-15 15:53:54 -07:00
code-of-conduct.md Update code-of-conduct.md 2017-12-20 13:33:36 -05:00
labels.yaml Merge pull request #51848 from xiangpengzhao/milestone-label 2017-09-05 15:46:19 -07:00

README.md

Kubernetes



Kubernetes is an open source system for managing containerized applications across multiple hosts. It provides basic mechanisms for deployment, maintenance, and scaling of applications.

Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.

Kubernetes is hosted by the Cloud Native Computing Foundation (CNCF). If you are a company that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF. For details about who's involved and how Kubernetes plays a role, read the CNCF announcement.


To start using Kubernetes

See our documentation on kubernetes.io.

Try our interactive tutorial.

Take a free course on Scalable Microservices with Kubernetes.

To start developing Kubernetes

The community repository hosts all information about building Kubernetes from source, how to contribute code and documentation, who to contact about what, etc.

If you want to build Kubernetes right away there are two options:

You have a working Go environment.

$ go get -d k8s.io/kubernetes
$ cd $GOPATH/src/k8s.io/kubernetes
$ make

You have a working Docker environment.

$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ make quick-release

For the full story, head over to the developer's documentation.

Support

If you need support, start with the troubleshooting guide, and work your way through the process that we've outlined.

That said, if you have questions, reach out to us one way or another.
