<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
     width="25" height="25">

<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

<strong>
The latest 1.0.x release of this document can be found
[here](http://releases.k8s.io/release-1.0/docs/design/aws_under_the_hood.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>

--

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

|
2015-09-19 16:53:19 +00:00
|
|
|
# Peeking under the hood of Kubernetes on AWS
|
2015-07-28 18:18:50 +00:00
|
|
|
|
2015-09-19 16:53:19 +00:00
|
|
|
This document provides high-level insight into how Kubernetes works on AWS and
|
|
|
|
maps to AWS objects. We assume that you are familiar with AWS.
|
2015-07-28 18:18:50 +00:00
|
|
|
|
2015-09-19 16:53:19 +00:00
|
|
|
We encourage you to use [kube-up](../getting-started-guides/aws.md) (or
|
2015-09-19 17:16:52 +00:00
|
|
|
[CloudFormation](../getting-started-guides/aws-coreos.md)) to create clusters on
|
2015-09-19 16:53:19 +00:00
|
|
|
AWS. We recommend that you avoid manual configuration but are aware that
|
|
|
|
sometimes it's the only option.
|
2015-07-28 18:18:50 +00:00
|
|
|
|
2015-09-19 16:53:19 +00:00
|
|
|
Tip: You should open an issue and let us know what enhancements can be made to
|
|
|
|
the scripts to better suit your needs.
|
|
|
|
|
|
|
|
That said, it's also useful to know what's happening under the hood when
|
|
|
|
Kubernetes clusters are created on AWS. This can be particularly useful if
|
|
|
|
problems arise or in circumstances where the provided scripts are lacking and
|
|
|
|
you manually created or configured your cluster.
|
|
|
|
|
|
|
|
### Architecture overview

Kubernetes is a cluster of several machines that consists of a Kubernetes
master and a set number of nodes (previously known as 'minions') for which the
master is responsible. See the [Architecture](architecture.md) topic for
more details.

Other documents describe the general architecture of Kubernetes (all nodes run
Docker; the kubelet agent runs on each node and launches containers; the
kube-proxy relays traffic between the nodes, etc.).

By default on AWS:

* Instances run Ubuntu 15.04 (the official AMI). It includes a sufficiently
  modern kernel that pairs well with Docker and doesn't require a
  reboot. (The default SSH user is `ubuntu` for this and other Ubuntu images.)
* By default, we run aufs over ext4 as the filesystem / container storage on
  the nodes (mostly because this is what GCE uses).

You can override these defaults by passing different environment variables to
kube-up.

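As a minimal sketch of overriding these defaults, you can export environment
variables before invoking kube-up. The variables shown are ones this document
mentions (`KUBE_AWS_ZONE`, `DOCKER_STORAGE`); the exact set that kube-up
honors is version-dependent, so treat the values here as illustrative:

```shell
# Override cluster defaults before running kube-up (values are examples).
export KUBERNETES_PROVIDER=aws
export KUBE_AWS_ZONE=us-west-2a
export DOCKER_STORAGE=btrfs   # use btrfs instead of the default aufs
# cluster/kube-up.sh          # then launch (requires AWS credentials)
```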
### Storage

AWS supports persistent volumes by using [Elastic Block Store
(EBS)](../user-guide/volumes.md#awselasticblockstore). These can then be
attached to pods that should store persistent data (e.g. if you're running a
database).

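A minimal sketch of such a pod follows. All names and the volume ID are
hypothetical placeholders; the `awsElasticBlockStore` volume type expects an
EBS volume that already exists in the same AZ as the node:

```shell
# Write an example pod manifest that mounts a pre-created EBS volume.
cat > ebs-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ebs-test
spec:
  containers:
  - name: db
    image: mysql
    volumeMounts:
    - name: data
      mountPath: /var/lib/mysql
  volumes:
  - name: data
    awsElasticBlockStore:
      volumeID: vol-0123456789abcdef0   # hypothetical pre-created EBS volume
      fsType: ext4
EOF
# kubectl create -f ebs-pod.yaml   # requires a running cluster on AWS
```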
By default, nodes in AWS use [instance storage](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html)
unless you create pods with persistent volumes
[(EBS)](../user-guide/volumes.md#awselasticblockstore). In general, Kubernetes
containers do not have persistent storage unless you attach a persistent
volume, and so nodes on AWS use instance storage. Instance storage is cheaper,
often faster, and historically more reliable. This does mean that you should
pick an instance type that has sufficient instance storage, unless you can make
do with whatever space is left on your root partition.

Note: The master uses a persistent volume ([etcd](architecture.md#etcd)) to track
its state. Similar to the nodes, containers are mostly run against instance
storage, except that we repoint some important data onto the persistent volume.

The default storage driver for Docker images is aufs. Specifying btrfs (by
passing the environment variable `DOCKER_STORAGE=btrfs` to kube-up) is also a
good choice for a filesystem. btrfs is relatively reliable with Docker and has
improved its reliability with modern kernels. It can easily span multiple
volumes, which is particularly useful when we are using an instance type with
multiple ephemeral instance disks.

### AutoScaling

Nodes (but not the master) are run in an
[AutoScalingGroup](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/AutoScalingGroup.html)
on AWS. Currently auto-scaling (e.g. based on CPU) is not actually enabled
([#11935](http://issues.k8s.io/11935)). Instead, the auto-scaling group means
that AWS will relaunch any nodes that are terminated.

We do not currently run the master in an AutoScalingGroup, but we should
([#11934](http://issues.k8s.io/11934)).

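Because the group size is what determines the node count, you can resize the
cluster from the AWS CLI. A dry-run sketch (the group name is the kube-up
default described under "AWS Objects"; the size of 5 is an arbitrary example):

```shell
# Build (but only print) the CLI call that resizes the node group.
GROUP="kubernetes-minion-group"
SIZE=5
CMD="aws autoscaling update-auto-scaling-group \
 --auto-scaling-group-name $GROUP \
 --min-size $SIZE --max-size $SIZE --desired-capacity $SIZE"
echo "$CMD"   # printed rather than executed; running it requires AWS credentials
```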
### Networking

Kubernetes uses an IP-per-pod model. This means that a node, which runs many
pods, must have many IPs. AWS uses virtual private clouds (VPCs) and advanced
routing support, so each node is assigned a /24 CIDR from which its pods draw
their IPs; the assigned CIDR is then configured to route to that instance in
the VPC routing table.

It is also possible to use overlay networking on AWS, but that is not the
configuration of the kube-up script.

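A dry-run sketch of the kind of route that ends up in the VPC route table (all
IDs and the example CIDR are hypothetical; the master normally programs this
automatically):

```shell
# Build (but only print) the CLI call that routes one node's pod CIDR.
ROUTE_TABLE_ID="rtb-0123456789abcdef0"
INSTANCE_ID="i-0123456789abcdef0"
POD_CIDR="10.244.1.0/24"   # example node CIDR; your cluster's ranges will differ
CMD="aws ec2 create-route --route-table-id $ROUTE_TABLE_ID \
 --destination-cidr-block $POD_CIDR --instance-id $INSTANCE_ID"
echo "$CMD"   # printed rather than executed
```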
### NodePort and LoadBalancing

Kubernetes on AWS integrates with [Elastic Load Balancing
(ELB)](http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/US_SetUpASLBApp.html).
When you create a service with `Type=LoadBalancer`, Kubernetes (the
kube-controller-manager) will create an ELB, create a security group for the
ELB which allows access on the service ports, attach all the nodes to the ELB,
and modify the security group for the nodes to allow traffic from the ELB to
the nodes. This traffic reaches kube-proxy, where it is then forwarded to the
pods.

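A minimal sketch of such a service (the names, selector, and ports are
hypothetical examples):

```shell
# Write an example service manifest that requests an ELB from the cloud provider.
cat > lb-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-frontend
spec:
  type: LoadBalancer
  selector:
    app: frontend
  ports:
  - port: 80          # public port on the ELB
    targetPort: 8080  # port the pods listen on
EOF
# kubectl create -f lb-service.yaml   # requires a running cluster on AWS
```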
ELB has some restrictions: it requires that all nodes listen on a single port,
and it acts as a forwarding proxy (i.e. the source IP is not preserved). To
work with these restrictions, in Kubernetes, [LoadBalancer
services](../user-guide/services.md#type-loadbalancer) are exposed as
[NodePort services](../user-guide/services.md#type-nodeport). Then
kube-proxy listens externally on the cluster-wide port that's assigned to
NodePort services and forwards traffic to the corresponding pods. So ELB is
configured to proxy traffic on the public port (e.g. port 80) to the NodePort
that is assigned to the service (e.g. 31234). Any incoming traffic sent to
the NodePort (e.g. port 31234) is recognized by kube-proxy and then sent to the
correct pods for that service.

Note that we do not automatically open NodePort services in the AWS firewall
(although we do open LoadBalancer services). This is because we expect that
NodePort services are more of a building block for things like inter-cluster
services or for LoadBalancer. To consume a NodePort service externally, you
will likely have to open the port in the node security group
(`kubernetes-minion-<clusterid>`).

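A dry-run sketch of opening such a port; `<clusterid>` is a placeholder for
your cluster's id, and 31234 matches the example NodePort above:

```shell
# Build (but only print) the CLI call that opens a NodePort to the world.
NODE_SG="kubernetes-minion-<clusterid>"
NODE_PORT=31234
CMD="aws ec2 authorize-security-group-ingress --group-name $NODE_SG \
 --protocol tcp --port $NODE_PORT --cidr 0.0.0.0/0"
echo "$CMD"   # printed rather than executed
```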
### Identity and Access Management (IAM)

kube-up sets up two IAM roles, one for the master called
[kubernetes-master](../../cluster/aws/templates/iam/kubernetes-master-policy.json)
and one for the nodes called
[kubernetes-minion](../../cluster/aws/templates/iam/kubernetes-minion-policy.json).

The master is responsible for creating ELBs and configuring them, as well as
setting up advanced VPC routing. Currently it has blanket permissions on EC2,
along with rights to create and destroy ELBs.

The nodes do not need a lot of access to the AWS APIs. They need to download
a distribution file, and then each node is responsible for attaching and
detaching EBS volumes from itself.

The node policy is relatively minimal. The master policy is probably overly
permissive. The security conscious may want to lock down the IAM policies
further ([#11936](http://issues.k8s.io/11936)).

We should make it easier to extend IAM permissions and also ensure that they
are correctly configured ([#14226](http://issues.k8s.io/14226)).

### Tagging

All AWS resources are tagged with a tag named "KubernetesCluster", with a value
that is the unique cluster-id. This tag is used to identify a particular
'instance' of Kubernetes, even if two clusters are deployed into the same VPC.
Resources are considered to belong to the same cluster if and only if they have
the same value in the tag named "KubernetesCluster". (The kube-up script is
not configured to create multiple clusters in the same VPC by default, but it
is possible to create another cluster in the same VPC.)

Within the AWS cloud provider logic, we filter requests to the AWS APIs to
match resources with our cluster tag. By filtering the requests, we ensure
that we see only our own AWS objects.

Important: If you choose not to use kube-up, you must pick a unique cluster-id
value, and ensure that all AWS resources have a tag with
`Name=KubernetesCluster,Value=<clusterid>`.

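The same tag filter is handy for inspecting a cluster by hand. A dry-run
sketch (the cluster-id is a hypothetical example):

```shell
# Build (but only print) a CLI call that lists only this cluster's instances.
CLUSTER_ID="mycluster"
CMD="aws ec2 describe-instances \
 --filters Name=tag:KubernetesCluster,Values=$CLUSTER_ID"
echo "$CMD"   # printed rather than executed
```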
### AWS Objects

The kube-up script does a number of things in AWS:

* Creates an S3 bucket (`AWS_S3_BUCKET`) and then copies the Kubernetes distribution
  and the salt scripts into it. They are made world-readable and the HTTP URLs
  are passed to instances; this is how Kubernetes code gets onto the machines.
* Creates two IAM profiles based on templates in [cluster/aws/templates/iam](../../cluster/aws/templates/iam):
  * `kubernetes-master` is used by the master.
  * `kubernetes-minion` is used by nodes.
* Creates an AWS SSH key named `kubernetes-<fingerprint>`. Fingerprint here is
  the OpenSSH key fingerprint, so that multiple users can run the script with
  different keys and their keys will not collide (with near-certainty). It will
  use an existing key if one is found at `AWS_SSH_KEY`, otherwise it will create
  one there. (With the default Ubuntu images, if you have to SSH in: the user is
  `ubuntu` and that user can `sudo`.)
* Creates a VPC for use with the cluster (with a CIDR of 172.20.0.0/16) and
  enables the `dns-support` and `dns-hostnames` options.
* Creates an internet gateway for the VPC.
* Creates a route table for the VPC, with the internet gateway as the default
  route.
* Creates a subnet (with a CIDR of 172.20.0.0/24) in the AZ `KUBE_AWS_ZONE`
  (defaults to us-west-2a). Currently, each Kubernetes cluster runs in a
  single AZ on AWS. There are, however, two philosophies in discussion on how
  to achieve High Availability (HA):
  * cluster-per-AZ: An independent cluster for each AZ, where each cluster
    is entirely separate.
  * cross-AZ-clusters: A single cluster spans multiple AZs.

  The debate is open: cluster-per-AZ is discussed as more robust, but
  cross-AZ-clusters are more convenient.
* Associates the subnet to the route table.
* Creates security groups for the master (`kubernetes-master-<clusterid>`)
  and the nodes (`kubernetes-minion-<clusterid>`).
* Configures security groups so that masters and nodes can communicate. This
  includes intercommunication between masters and nodes, opening SSH publicly
  for both masters and nodes, and opening port 443 on the master for the HTTPS
  API endpoints.
* Creates an EBS volume for the master of size `MASTER_DISK_SIZE` and type
  `MASTER_DISK_TYPE`.
* Launches a master with a fixed IP address (172.20.0.9) that is also
  configured for the security group and all the necessary IAM credentials. An
  instance script is used to pass vital configuration information to Salt. Note:
  The hope is that over time we can reduce the amount of configuration
  information that must be passed in this way.
* Once the instance is up, it attaches the EBS volume and sets up a manual
  routing rule for the internal network range (`MASTER_IP_RANGE`, defaults to
  10.246.0.0/24).
* For auto-scaling, kube-up creates a launch configuration and an auto-scaling
  group for the nodes. The name for both is
  `<KUBE_AWS_INSTANCE_PREFIX>-minion-group`; the default name is
  kubernetes-minion-group. The auto-scaling group has a min and max size that
  are both set to `NUM_MINIONS`. You can change the size of the auto-scaling
  group to add or remove nodes from within the AWS API or Console. Each node
  self-configures: it comes up, runs Salt with the stored configuration,
  connects to the master, and is assigned an internal CIDR; the master then
  configures the route table with the assigned CIDR. The kube-up script
  performs a health-check on the nodes, but it's a self-check that is not
  required.

If attempting this configuration manually, I highly recommend following along
with the kube-up script, and being sure to tag everything with a tag with name
`KubernetesCluster` and value set to a unique cluster-id. Also, passing the
right configuration options to Salt when not using the script is tricky: the
plan here is to simplify this by having Kubernetes take on more node
configuration, and even potentially remove Salt altogether.

### Manual infrastructure creation

While this work is not yet complete, advanced users might choose to manually
create certain AWS objects while still making use of the kube-up script (to
configure Salt, for example). These objects can currently be manually created:

* Set the `AWS_S3_BUCKET` environment variable to use an existing S3 bucket.
* Set the `VPC_ID` environment variable to reuse an existing VPC.
* Set the `SUBNET_ID` environment variable to reuse an existing subnet.
* If your route table has a matching `KubernetesCluster` tag, it will
  be reused.
* If your security groups are appropriately named, they will be reused.

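Combining the environment variables above, a minimal sketch of pointing
kube-up at pre-created objects (all IDs are hypothetical placeholders):

```shell
# Reuse an existing bucket, VPC, and subnet instead of letting kube-up create them.
export AWS_S3_BUCKET=my-existing-kube-bucket
export VPC_ID=vpc-0123456789abcdef0
export SUBNET_ID=subnet-0123456789abcdef0
# cluster/kube-up.sh   # then launch as usual (requires AWS credentials)
```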
Currently there is no way to do the following with kube-up:

* Use an existing AWS SSH key with an arbitrary name.
* Override the IAM credentials in a sensible way
  ([#14226](http://issues.k8s.io/14226)).
* Use different security group permissions.
* Configure your own auto-scaling groups.

If any of the above items apply to your situation, open an issue to request an
enhancement to the kube-up script. You should provide a complete description of
the use-case, including all the details around what you want to accomplish.

### Instance boot

The instance boot procedure is currently pretty complicated, primarily because
we must marshal configuration from Bash to Salt via the AWS instance script.
As we move more post-boot configuration out of Salt and into Kubernetes, we
will hopefully be able to simplify this.

When the kube-up script launches instances, it builds an instance startup
script which includes some configuration options passed to kube-up, and
concatenates some of the scripts found in the cluster/aws/templates directory.
These scripts are responsible for mounting and formatting volumes, downloading
Salt and Kubernetes from the S3 bucket, and then triggering Salt to actually
install Kubernetes.

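The concatenation approach can be sketched in a self-contained way: prepend
marshalled configuration variables, then append template fragments. The file
names and the `SALT_MASTER` variable below are hypothetical, not the actual
kube-up template names:

```shell
# Assemble a startup script from a config header plus template fragments.
mkdir -p templates
printf '%s\n' 'echo "formatting disks"'    > templates/format-disks.sh
printf '%s\n' 'echo "downloading release"' > templates/download-release.sh

{
  echo '#!/bin/bash'
  echo "readonly SALT_MASTER='172.20.0.9'"   # configuration marshalled from Bash
  cat templates/format-disks.sh templates/download-release.sh
} > user-data.sh
```

The resulting `user-data.sh` is what would be passed as the instance's
user-data at launch.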

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/aws_under_the_hood.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->