Merge pull request #10804 from saad-ali/clusterScaleDoc

Add documentation for creating large kubernetes clusters
Tim Hockin 2015-07-13 16:19:42 -07:00
commit 32699e873a
2 changed files with 115 additions and 8 deletions

docs/cluster-large.md Normal file

@@ -0,0 +1,60 @@
# Kubernetes Large Cluster
## Support
At v1.0, Kubernetes supports clusters up to 100 nodes with 30-50 pods per node and 1-2 containers per pod (as defined in the [1.0 roadmap](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/roadmap.md#reliability-and-performance)).
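At those limits, a full-size v1.0 cluster therefore tops out at roughly 100 × 50 = 5,000 pods and on the order of 10,000 containers.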
## Setup
Normally the number of nodes in a cluster is controlled by the value `NUM_MINIONS` in the platform-specific `config-default.sh` file (for example, see [GCE's `config-default.sh`](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/cluster/gce/config-default.sh)).
Simply changing that value to something very large, however, may cause the setup script to fail for many cloud providers. A GCE deployment, for example, will run into quota issues and fail to bring the cluster up.
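As a minimal sketch, the node count can be overridden at bring-up time; the value `150` is an arbitrary illustrative number, and this assumes your provider's `config-default.sh` reads `NUM_MINIONS` from the environment, as GCE's does:
```
# Sketch only: override the default node count before bringing the cluster up.
# Assumes the provider's config-default.sh honors NUM_MINIONS from the environment.
export NUM_MINIONS=150
cluster/kube-up.sh
```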
When setting up a large Kubernetes cluster, the following must be taken into consideration.
### Quota Issues
To avoid running into cloud provider quota issues when creating a cluster with many nodes, consider:
* Increasing the quota for resources such as CPUs, IP addresses, etc.
* In [GCE, for example,](https://cloud.google.com/compute/docs/resource-quotas) you'll want to increase the quota for:
* CPUs
* VM instances
* Total persistent disk reserved
* In-use IP addresses
* Firewall Rules
* Forwarding rules
* Routes
* Target pools
* Gating the setup script so that it brings up new node VMs in smaller batches with waits in between, because some cloud providers limit the number of VMs you can create during a given period (see the sketch after this list).
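A rough sketch of both ideas on GCE follows. The quota listing uses `gcloud compute project-info describe`; the batching loop is purely illustrative (`create_node_vm`, the batch size, and the sleep interval are hypothetical) and is not how the stock setup scripts work:
```
# List the project's current quotas (CPUs, in-use IP addresses, routes, etc.).
gcloud compute project-info describe --project my-project

# Illustrative batching sketch: bring up node VMs ten at a time with a pause
# between batches instead of all at once. create_node_vm is a hypothetical helper.
for batch in 0 1 2 3 4; do
  for i in $(seq 1 10); do
    create_node_vm "node-$((batch * 10 + i))" &
  done
  wait       # wait for this batch of VMs to come up
  sleep 60   # let the provider's rate limits recover before the next batch
done
```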
### Addon Resources
To prevent memory leaks or other resource issues in [cluster addons](https://github.com/GoogleCloudPlatform/kubernetes/tree/master/cluster/addons/) from consuming all of the resources available on a node, Kubernetes sets resource limits on addon containers to cap the CPU and memory they can consume (see PRs [#10653](https://github.com/GoogleCloudPlatform/kubernetes/pull/10653/files) and [#10778](https://github.com/GoogleCloudPlatform/kubernetes/pull/10778/files)).
For example:
```YAML
containers:
  - image: gcr.io/google_containers/heapster:v0.15.0
    name: heapster
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
```
These limits, however, are based on data collected from addons running on 4-node clusters (see [#10335](https://github.com/GoogleCloudPlatform/kubernetes/issues/10335#issuecomment-117861225)). The addons consume a lot more resources when running on large deployment clusters (see [#5880](https://github.com/GoogleCloudPlatform/kubernetes/issues/5880#issuecomment-113984085)). So, if a large cluster is deployed without adjusting these values, the addons may continuously get killed because they keep hitting the limits.
To avoid running into cluster addon resource issues when creating a cluster with many nodes, consider the following (an illustrative scaled-up Heapster example follows the list):
* Scale the memory and CPU limits for each of the following addons, if used, with the size of the cluster (there is one replica of each handling the entire cluster, so memory and CPU usage tends to grow proportionally with cluster size and load):
* Heapster ([GCM/GCL backed](../cluster/addons/cluster-monitoring/google/heapster-controller.yaml), [InfluxDB backed](../cluster/addons/cluster-monitoring/influxdb/heapster-controller.yaml), [InfluxDB/GCL backed](../cluster/addons/cluster-monitoring/googleinfluxdb/heapster-controller-combined.yaml), [standalone](../cluster/addons/cluster-monitoring/standalone/heapster-controller.yaml))
* [InfluxDB and Grafana](../cluster/addons/cluster-monitoring/influxdb/influxdb-grafana-controller.yaml)
* [skydns, kube2sky, and dns etcd](../cluster/addons/dns/skydns-rc.yaml.in)
* [Kibana](../cluster/addons/fluentd-elasticsearch/kibana-controller.yaml)
* Scale the number of replicas for the following addons, if used, with the size of the cluster (there are multiple replicas of each, so increasing the replica count should help handle increased load; since the load per replica also grows slightly, consider increasing the CPU/memory limits as well):
* [elasticsearch](../cluster/addons/fluentd-elasticsearch/es-controller.yaml)
* Increase the memory and CPU limits slightly for each of the following addons, if used, with the size of the cluster (there is one replica per node, but CPU/memory usage grows slightly with cluster size and load as well):
* [FluentD with ElasticSearch Plugin](../cluster/saltbase/salt/fluentd-es/fluentd-es.yaml)
* [FluentD with GCP Plugin](../cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml)
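For example, the Heapster limits from the earlier snippet might be raised along these lines for a larger cluster; the numbers below are purely illustrative, and the right values should be derived from the usage you actually observe at your cluster size:
```YAML
containers:
  - image: gcr.io/google_containers/heapster:v0.15.0
    name: heapster
    resources:
      limits:
        # Illustrative values only, scaled up from the 4-node defaults.
        cpu: 500m
        memory: 1Gi
```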
For directions on how to detect if addon containers are hitting resource limits, see the [Troubleshooting section of Compute Resources](compute_resources.md#troubleshooting).
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/cluster-large.md?pixel)]()

docs/compute_resources.md

@@ -21,6 +21,7 @@ certainly want the docs that go with that version.</h1>
- [How Pods with Resource Limits are Run](#how-pods-with-resource-limits-are-run)
- [Monitoring Compute Resource Usage](#monitoring-compute-resource-usage)
- [Troubleshooting](#troubleshooting)
- [Detecting Resource Starved Containers](#detecting-resource-starved-containers)
- [Planned Improvements](#planned-improvements)
When specifying a [pod](pods.md), you can optionally specify how much CPU and memory (RAM) each
@@ -108,18 +109,14 @@ When using Docker:
**TODO: document behavior for rkt**
If a container exceeds its memory limit, it may be terminated. If it is restartable, it will be
-restarted by kubelet, as will any other type of runtime failure. If it is killed for exceeding its
-memory limit, you will see the reason `OOM Killed`, as in this example:
-```
-$ kubectl get pods/memhog
-NAME      READY     REASON       RESTARTS   AGE
-memhog    0/1       OOM Killed   0          1h
-```
-*OOM* stands for Out Of Memory.
+restarted by kubelet, as will any other type of runtime failure.
A container may or may not be allowed to exceed its CPU limit for extended periods of time.
However, it will not be killed for excessive CPU usage.
To determine if a container cannot be scheduled or is being killed due to resource limits, see the
"Troubleshooting" section below.
## Monitoring Compute Resource Usage
The resource usage of a pod is reported as part of the Pod status.
@@ -154,6 +151,56 @@ The [resource quota](resource_quota_admin.md) feature can be configured
to limit the total amount of resources that can be consumed. If used in conjunction
with namespaces, it can prevent one team from hogging all the resources.
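As a minimal sketch of such a quota (the namespace name and the numbers are illustrative, not recommendations):
```YAML
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a   # hypothetical namespace for one team
spec:
  hard:
    cpu: "20"         # total CPU across all pods in the namespace
    memory: 100Gi     # total memory across all pods in the namespace
    pods: "10"        # maximum number of pods in the namespace
```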
### Detecting Resource Starved Containers
To check if a container is being killed because it is hitting a resource limit, call `kubectl describe pod`
on the pod you are interested in:
```
[12:54:41] $ ./cluster/kubectl.sh describe pod simmemleak-hra99
Name: simmemleak-hra99
Namespace: default
Image(s): saadali/simmemleak
Node: kubernetes-minion-tf0f/10.240.216.66
Labels: name=simmemleak
Status: Running
Reason:
Message:
IP: 10.244.2.75
Replication Controllers: simmemleak (1/1 replicas created)
Containers:
simmemleak:
Image: saadali/simmemleak
Limits:
cpu: 100m
memory: 50Mi
State: Running
Started: Tue, 07 Jul 2015 12:54:41 -0700
Ready: False
Restart Count: 5
Conditions:
Type Status
Ready False
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Tue, 07 Jul 2015 12:53:51 -0700 Tue, 07 Jul 2015 12:53:51 -0700 1 {scheduler } scheduled Successfully assigned simmemleak-hra99 to kubernetes-minion-tf0f
Tue, 07 Jul 2015 12:53:51 -0700 Tue, 07 Jul 2015 12:53:51 -0700 1 {kubelet kubernetes-minion-tf0f} implicitly required container POD pulled Pod container image "gcr.io/google_containers/pause:0.8.0" already present on machine
Tue, 07 Jul 2015 12:53:51 -0700 Tue, 07 Jul 2015 12:53:51 -0700 1 {kubelet kubernetes-minion-tf0f} implicitly required container POD created Created with docker id 6a41280f516d
Tue, 07 Jul 2015 12:53:51 -0700 Tue, 07 Jul 2015 12:53:51 -0700 1 {kubelet kubernetes-minion-tf0f} implicitly required container POD started Started with docker id 6a41280f516d
Tue, 07 Jul 2015 12:53:51 -0700 Tue, 07 Jul 2015 12:53:51 -0700 1 {kubelet kubernetes-minion-tf0f} spec.containers{simmemleak} created Created with docker id 87348f12526a
```
The `Restart Count: 5` indicates that the `simmemleak` container in this pod was terminated and restarted 5 times.
Once [#10861](https://github.com/GoogleCloudPlatform/kubernetes/issues/10861) is resolved, the reason for the termination of the last container will also be printed in this output.
Until then, you can call `kubectl get pod` with the `-o template -t ...` option to fetch the status of previously terminated containers:
```
[13:59:01] $ ./cluster/kubectl.sh get pod -o template -t '{{range.status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\r\nLastState: "}}{{.lastState}}{{end}}' simmemleak-60xbc
Container Name: simmemleak
LastState: map[terminated:map[exitCode:137 reason:OOM Killed startedAt:2015-07-07T20:58:43Z finishedAt:2015-07-07T20:58:43Z containerID:docker://0e4095bba1feccdfe7ef9fb6ebffe972b4b14285d5acdec6f0d3ae8a22fad8b2]][13:59:03] clusterScaleDoc ~/go/src/github.com/GoogleCloudPlatform/kubernetes $
```
We can see that this container was terminated because of `reason:OOM Killed`, where *OOM* stands for Out Of Memory.
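The `exitCode:137` in that output is consistent with this: 137 = 128 + 9, i.e. the process received SIGKILL, which is the signal the kernel's OOM killer delivers.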
## Planned Improvements
The current system only allows resource quantities to be specified on a container.