mirror of https://github.com/k3s-io/k3s
Merge pull request #25162 from mikebrow/devel-tree-80col-updates-C
Automatic merge from submit-queue

devel/ tree further minor edits

Address line wrap issue #1488. Also cleans up other minor editing issues in the docs/devel/* tree such as spelling errors, links, content tables...

Signed-off-by: Mike Brown <brownwm@us.ibm.com>
commit
7339b7c094
@@ -31,34 +31,62 @@ Documentation for other releases can be found at

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

## GitHub Issues for the Kubernetes Project

A quick overview of how we will review and prioritize incoming issues at
https://github.com/kubernetes/kubernetes/issues

### Priorities

We use GitHub issue labels for prioritization. The absence of a priority label
means the bug has not been reviewed and prioritized yet.

We try to apply these priority labels consistently across the entire project,
but if you notice an issue that you believe to be incorrectly prioritized,
please do let us know and we will evaluate your counter-proposal.

- **priority/P0**: Must be actively worked on as someone's top priority right
now. Stuff is burning. If it's not being actively worked on, someone is expected
to drop what they're doing immediately to work on it. Team leaders are
responsible for making sure that all P0's in their area are being actively
worked on. Examples include user-visible bugs in core features, broken builds or
tests, and critical security issues.

- **priority/P1**: Must be staffed and worked on either currently, or very soon,
ideally in time for the next release.

- **priority/P2**: There appears to be general agreement that this would be good
to have, but we may not have anyone available to work on it right now or in the
immediate future. Community contributions would be most welcome in the meantime
(although it might take a while to get them reviewed if reviewers are fully
occupied with higher priority issues, for example immediately before a release).

- **priority/P3**: Possibly useful, but not yet enough support to actually get
it done. These are mostly placeholders for potentially good ideas, so that they
don't get completely forgotten, and can be referenced/deduped every time they
come up.

### Milestones

We additionally use milestones, based on minor version, for determining if a bug
should be fixed for the next release. These milestones will be especially
scrutinized as we get to the weeks just before a release. We can release a new
version of Kubernetes once they are empty. We will have two milestones per minor
release.

- **vX.Y**: The list of bugs that will be merged for that milestone once ready.
- **vX.Y-candidate**: The list of bugs that we might merge for that milestone. A
bug shouldn't be in this milestone for more than a day or two towards the end of
a milestone. It should be triaged either into vX.Y, or moved out of the release
milestones.

The above priority scheme still applies. P0 and P1 issues are work we feel must
get done before release. P2 and P3 issues are work we would merge into the
release if it gets done, but we wouldn't block the release on it. A few days
before release, we will probably move all P2 and P3 bugs out of that milestone
in bulk.

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/issues.md?pixel)]()
@@ -32,14 +32,14 @@ Documentation for other releases can be found at

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Kubectl Conventions

Updated: 8/27/2015

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->

- [Kubectl Conventions](#kubectl-conventions)
  - [Principles](#principles)
  - [Command conventions](#command-conventions)
    - [Create commands](#create-commands)

@@ -54,45 +54,89 @@ Updated: 8/27/2015

## Principles

* Strive for consistency across commands

* Explicit should always override implicit

* Environment variables should override default values

* Command-line flags should override default values and environment variables
(see the sketch after this list)

  * `--namespace` should also override the value specified in a specified
    resource
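
To make the precedence concrete, here is a minimal, self-contained Go sketch of
the flag > environment variable > default ordering described above. It is
illustrative only: the `KUBECTL_NAMESPACE` variable name is an assumption, not a
real kubectl setting, and real kubectl resolution also involves kubeconfig
context and resource files.

```go
package main

import (
	"fmt"
	"os"
)

// resolveNamespace illustrates the precedence above: an explicit command-line
// flag wins over an environment variable, which wins over the built-in default.
func resolveNamespace(flagValue string) string {
	if flagValue != "" { // explicit always overrides implicit
		return flagValue
	}
	if env := os.Getenv("KUBECTL_NAMESPACE"); env != "" { // hypothetical variable name
		return env
	}
	return "default"
}

func main() {
	fmt.Println(resolveNamespace(""))            // falls back to env or "default"
	fmt.Println(resolveNamespace("kube-system")) // flag wins
}
```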

## Command conventions

* Command names are all lowercase, and hyphenated if multiple words.

* kubectl VERB NOUNs for commands that apply to multiple resource types.

* Command itself should not have built-in aliases.

* NOUNs may be specified as `TYPE name1 name2` or `TYPE/name1 TYPE/name2` or
`TYPE1,TYPE2,TYPE3/name1`; TYPE is omitted when only a single type is expected.

* Resource types are all lowercase, with no hyphens; both singular and plural
forms are accepted.

* NOUNs may also be specified by one or more file arguments: `-f file1 -f file2
...`

* Resource types may have 2- or 3-letter aliases.

* Business logic should be decoupled from the command framework, so that it can
be reused independently of kubectl, cobra, etc.

* Ideally, commonly needed functionality would be implemented server-side in
order to avoid problems typical of "fat" clients and to make it readily
available to non-Go clients.

* Commands that generate resources, such as `run` or `expose`, should obey
specific conventions; see [generators](#generators).

* A command group (e.g., `kubectl config`) may be used to group related
non-standard commands, such as custom generators, mutations, and computations.

### Create commands

`kubectl create <resource>` commands fill the gap between "I want to try
Kubernetes, but I don't know or care what gets created" (`kubectl run`) and "I
want to create exactly this" (author yaml and run `kubectl create -f`). They
provide an easy way to create a valid object without having to know the vagaries
of particular kinds, nested fields, and object key typos that are ignored by the
yaml/json parser. Because editing an already created object is easier than
authoring one from scratch, these commands only need to have enough parameters
to create a valid object and set common immutable fields. They should default as
much as is reasonably possible. Once that valid object is created, it can be
further manipulated using `kubectl edit` or the eventual `kubectl set` commands.

`kubectl create <resource> <special-case>` commands help in cases where you need
to perform non-trivial configuration generation/transformation tailored for a
common use case. `kubectl create secret` is a good example: there's a `generic`
flavor with keys mapping to files, there's a `docker-registry` flavor that is
tailored for creating an image pull secret, and there's a `tls` flavor for
creating tls secrets. You create these as separate commands to get distinct
flags and separate help that is tailored for the particular usage.

## Flag conventions

* Flags are all lowercase, with words separated by hyphens (see the sketch after
this list)

* Flag names and single-character aliases should have the same meaning across
all commands

* Command-line flags corresponding to API fields should accept API enums
exactly (e.g., `--restart=Always`)

* Do not reuse flags for different semantic purposes, and do not use different
flag names for the same semantic purpose -- grep for `"Flags()"` before adding a
new flag

* Use short flags sparingly, only for the most frequently used options; prefer
lowercase over uppercase for the most common cases; try to stick to well-known
conventions for UNIX commands and/or Docker, where they exist; and update this
list when adding new short flags

* `-f`: Resource file
  * also used for `--follow` in `logs`, but should be deprecated in favor of `-F`
* `-l`: Label selector

@@ -111,51 +155,116 @@ and there's a `tls` flavor for creating tls secrets. You create these as separa

* `-r`: Replicas
* `-u`: Unix socket
* `-v`: Verbose logging level

* `--dry-run`: Don't modify the live state; simulate the mutation and display
the output. All mutations should support it.

* `--local`: Don't contact the server; just do local read, transformation,
generation, etc., and display the output

* `--output-version=...`: Convert the output to a different API group/version

* `--validate`: Validate the resource schema
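
As a hedged illustration of these flag conventions (lowercase hyphenated names,
API enums accepted exactly, `Flags()` as the registration point), here is a
small sketch using spf13/cobra; the `fake-create` command and its flag set are
invented for the example and are not kubectl code.

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// newFakeCreateCommand registers flags following the conventions above. The
// command and flags are illustrative only.
func newFakeCreateCommand() *cobra.Command {
	cmd := &cobra.Command{
		Use:   "fake-create -f FILENAME",
		Short: "Illustrates flag-naming conventions",
		RunE: func(cmd *cobra.Command, args []string) error {
			dryRun, _ := cmd.Flags().GetBool("dry-run")
			fmt.Println("dry-run:", dryRun)
			return nil
		},
	}
	// Lowercase, hyphenated names; a short flag only for the common -f case.
	cmd.Flags().StringP("filename", "f", "", "Resource file")
	cmd.Flags().Bool("dry-run", false, "Only simulate the mutation and display the output")
	cmd.Flags().String("restart", "Always", "Accepts the API enum exactly, e.g. Always, OnFailure, Never")
	cmd.Flags().String("output-version", "", "Convert the output to a different API group/version")
	return cmd
}

func main() {
	if err := newFakeCreateCommand().Execute(); err != nil {
		os.Exit(1)
	}
}
```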

## Output conventions

* By default, output is intended for humans rather than programs
  * However, affordances are made for simple parsing of `get` output

* Only errors should be directed to stderr

* `get` commands should output one row per resource, and one resource per row

* Column titles and values should not contain spaces in order to facilitate
commands that break lines into fields: cut, awk, etc. Instead, use `-` as the
word separator (a toy example follows this list).

* By default, `get` output should fit within about 80 columns
  * Eventually we could perhaps auto-detect width
  * `-o wide` may be used to display additional columns

* The first column should be the resource name, titled `NAME` (may change this
to an abbreviation of resource type)

* NAMESPACE should be displayed as the first column when --all-namespaces is
specified

* The last default column should be time since creation, titled `AGE`

* `-Lkey` should append a column containing the value of label with key `key`,
with `<none>` if not present

* json, yaml, Go template, and jsonpath template formats should be supported
and encouraged for subsequent processing
  * Users should use --api-version or --output-version to ensure the output
    uses the version they expect

* `describe` commands may output on multiple lines and may include information
from related resources, such as events. Describe should add additional
information from related resources that a normal user may need to know - if a
user would always run "describe resource1" and then immediately want to run a
"get type2" or "describe resource2", consider including that info. Examples:
persistent volume claims for pods that reference claims, events for most
resources, and nodes and the pods scheduled on them. When fetching related
resources, a targeted field selector should be used in favor of client side
filtering of related resources.

* For fields that can be explicitly unset (booleans, integers, structs), the
output should say `<unset>`. Likewise, for arrays `<none>` should be used.
Lastly, `<unknown>` should be used where an unrecognized field type was
specified.

* Mutations should output TYPE/name verbed by default, where TYPE is singular;
`-o name` may be used to just display TYPE/name, which may be used to specify
resources in other commands
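
The column conventions above can be illustrated with a toy `get`-style printer.
This is a stand-alone sketch using only the Go standard library, not kubectl's
actual printers; the `row` type and label key are invented for the example.

```go
package main

import (
	"fmt"
	"os"
	"text/tabwriter"
	"time"
)

// row is a toy stand-in for a resource.
type row struct {
	name    string
	created time.Time
	labels  map[string]string
}

func main() {
	rows := []row{
		{name: "nginx-1", created: time.Now().Add(-90 * time.Minute), labels: map[string]string{"app": "nginx"}},
		{name: "nginx-2", created: time.Now().Add(-3 * time.Hour)},
	}
	w := tabwriter.NewWriter(os.Stdout, 0, 8, 2, ' ', 0)
	// Column titles contain no spaces, using `-` as the word separator,
	// so tools like cut and awk can split rows on whitespace.
	fmt.Fprintln(w, "NAME\tAPP-LABEL\tAGE")
	for _, r := range rows {
		label := r.labels["app"]
		if label == "" {
			label = "<none>" // convention for a missing label value
		}
		fmt.Fprintf(w, "%s\t%s\t%s\n", r.name, label, time.Since(r.created).Round(time.Minute))
	}
	w.Flush()
}
```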

## Documentation conventions

* Commands are documented using Cobra; docs are then auto-generated by
`hack/update-generated-docs.sh`.

* Use should contain a short usage string for the most common use case(s), not
an exhaustive specification

* Short should contain a one-line explanation of what the command does

* Long may contain multiple lines, including additional information about
input, output, commonly used flags, etc.

* Example should contain examples
  * Start commands with `$`
  * A comment should precede each example command, and should begin with `#`

* Use "FILENAME" for filenames

* Use "TYPE" for the particular flavor of resource type accepted by kubectl,
rather than "RESOURCE" or "KIND"

* Use "NAME" for resource names

## Command implementation conventions

For every command there should be a `NewCmd<CommandName>` function that creates
the command and returns a pointer to a `cobra.Command`, which can later be added
to other parent commands to compose the structure tree. There should also be a
`<CommandName>Config` struct with a variable for every flag and argument declared
by the command (and any other variable required for the command to run). This
makes tests and mocking easier. The struct ideally exposes three methods:

* `Complete`: Completes the struct fields with values that may or may not be
directly provided by the user, for example, by flag pointers, by the `args`
slice, by using the Factory, etc.

* `Validate`: performs validation on the struct fields and returns appropriate
errors.

* `Run<CommandName>`: runs the actual logic of the command, taking as an
assumption that the struct is complete with all required values to run, and
that they are valid.

Sample command skeleton:

@@ -221,19 +330,41 @@ func (o MineConfig) RunMine() error {

}
```

The `Run<CommandName>` method should contain the business logic of the command,
and as noted in [command conventions](#command-conventions), ideally that logic
should exist server-side so any client could take advantage of it. Note that
this is not a mandatory structure and not every command is implemented this way,
but it is a nice convention, so try to be compliant with it. As an example, have
a look at how [kubectl logs](../../pkg/kubectl/cmd/logs.go) is implemented.
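
The sample skeleton itself is elided in this diff view. As a hedged sketch of
the pattern just described, reusing the doc's `Mine` example name but with
details that are assumptions rather than the actual skeleton, a command wiring
`Complete`, `Validate`, and `RunMine` might look like this:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"

	"github.com/spf13/cobra"
)

// MineConfig holds a field for every flag and argument the command declares,
// which keeps the logic easy to test without cobra. Field names are illustrative.
type MineConfig struct {
	Name   string
	DryRun bool
	Out    io.Writer
}

// NewCmdMine returns the cobra command so it can be attached to a parent command.
func NewCmdMine(out io.Writer) *cobra.Command {
	options := &MineConfig{Out: out}
	cmd := &cobra.Command{
		Use:   "mine NAME",
		Short: "Illustrative command following the Complete/Validate/Run pattern",
		RunE: func(cmd *cobra.Command, args []string) error {
			if err := options.Complete(cmd, args); err != nil {
				return err
			}
			if err := options.Validate(); err != nil {
				return err
			}
			return options.RunMine()
		},
	}
	cmd.Flags().BoolVar(&options.DryRun, "dry-run", false, "Only display what would be done")
	return cmd
}

// Complete fills in values that come from flags, args, or other implicit sources.
func (o *MineConfig) Complete(cmd *cobra.Command, args []string) error {
	if len(args) > 0 {
		o.Name = args[0]
	}
	return nil
}

// Validate checks that the completed configuration is usable.
func (o *MineConfig) Validate() error {
	if o.Name == "" {
		return errors.New("NAME is required")
	}
	return nil
}

// RunMine contains the command's business logic and assumes a valid, complete config.
func (o *MineConfig) RunMine() error {
	_, err := fmt.Fprintf(o.Out, "mine %q (dry-run=%v)\n", o.Name, o.DryRun)
	return err
}

func main() {
	if err := NewCmdMine(os.Stdout).Execute(); err != nil {
		os.Exit(1)
	}
}
```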

## Generators

Generators are kubectl commands that generate resources based on a set of inputs
(other resources, flags, or a combination of both).

The point of generators is:

* to enable users using kubectl in a scripted fashion to pin to a particular
behavior which may change in the future. Explicit use of a generator will always
guarantee that the expected behavior stays the same.

* to enable potential expansion of the generated resources for scenarios other
than just creation, similar to how -f is supported for most general-purpose
commands.

Generator commands should obey the following conventions:

* A `--generator` flag should be defined. Users can then choose between
different generators, if the command supports them (for example, `kubectl run`
currently supports generators for pods, jobs, replication controllers, and
deployments), or between different versions of a generator so that users
depending on a specific behavior may pin to that version (for example, `kubectl
expose` currently supports two different versions of a service generator).

* Generation should be decoupled from creation. A generator should implement the
`kubectl.StructuredGenerator` interface and have no dependencies on cobra or the
Factory. See, for example, how the first version of the namespace generator is
defined:

```go
// NamespaceGeneratorV1 supports stable generation of a namespace

@@ -264,8 +395,14 @@ func (g *NamespaceGeneratorV1) validate() error {

}
```

The generator struct (`NamespaceGeneratorV1`) holds the necessary fields for
namespace generation. It also satisfies the `kubectl.StructuredGenerator`
interface by implementing the `StructuredGenerate() (runtime.Object, error)`
method, which configures the generated namespace that callers of the generator
(`kubectl create namespace` in our case) need to create; a sketch of such a
caller follows this list.

* `--dry-run` should output the resource that would be created, without
creating it.
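
As a hedged sketch of the decoupling described above (a command consuming a
generator through the `StructuredGenerate() (runtime.Object, error)` shape
quoted in the text), the following stand-alone example uses invented stand-in
types (`fakeNamespaceGeneratorV1`, `runtimeObject`) rather than the real kubectl
and runtime packages:

```go
package main

import "fmt"

// runtimeObject stands in for runtime.Object so the sketch is self-contained.
type runtimeObject interface{}

// structuredGenerator mirrors the interface shape described above.
type structuredGenerator interface {
	StructuredGenerate() (runtimeObject, error)
}

// fakeNamespace and fakeNamespaceGeneratorV1 are illustrative stand-ins,
// not the real kubectl types.
type fakeNamespace struct{ Name string }

type fakeNamespaceGeneratorV1 struct{ Name string }

func (g *fakeNamespaceGeneratorV1) StructuredGenerate() (runtimeObject, error) {
	if g.Name == "" {
		return nil, fmt.Errorf("name must be specified")
	}
	return &fakeNamespace{Name: g.Name}, nil
}

// createFromGenerator shows the decoupling: the command only asks the generator
// for an object, then decides whether to create it or, with --dry-run, just
// display it.
func createFromGenerator(gen structuredGenerator, dryRun bool) error {
	obj, err := gen.StructuredGenerate()
	if err != nil {
		return err
	}
	if dryRun {
		fmt.Printf("would create: %#v\n", obj)
		return nil
	}
	fmt.Printf("creating: %#v\n", obj) // a real command would POST to the API server here
	return nil
}

func main() {
	_ = createFromGenerator(&fakeNamespaceGeneratorV1{Name: "demo"}, true)
}
```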

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
@@ -36,27 +36,37 @@ Documentation for other releases can be found at

## Introduction

Kubemark is a performance testing tool which allows users to run experiments on
simulated clusters. The primary use case is scalability testing, as simulated
clusters can be much bigger than the real ones. The objective is to expose
problems with the master components (API server, controller manager or
scheduler) that appear only on bigger clusters (e.g. small memory leaks).

This document serves as a primer to understand what Kubemark is, what it is not,
and how to use it.

## Architecture

On a very high level, a Kubemark cluster consists of two parts: real master
components and a set of “Hollow” Nodes. The prefix “Hollow” means an
implementation/instantiation of a component with all “moving” parts mocked out.
The best example is HollowKubelet, which pretends to be an ordinary Kubelet, but
does not start anything, nor mount any volumes - it just lies that it does. More
detailed design and implementation details are at the end of this document.
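
To make the “moving parts mocked out” idea concrete, here is a purely
illustrative Go sketch, not actual Kubemark code: a hollow implementation
satisfies the same interface as the real component but only pretends to do the
work, the way HollowKubelet pretends to start containers.

```go
package main

import "fmt"

// containerRuntime is an illustrative stand-in for the interface a real
// component (e.g. a Kubelet's runtime manager) would depend on.
type containerRuntime interface {
	StartContainer(name string) error
}

// hollowRuntime "implements" the interface with all moving parts mocked out:
// it reports success without starting anything.
type hollowRuntime struct {
	started []string // remembers what it claimed to start, for status reporting
}

func (h *hollowRuntime) StartContainer(name string) error {
	h.started = append(h.started, name)
	fmt.Printf("hollow: pretending container %q is running\n", name)
	return nil
}

func main() {
	var rt containerRuntime = &hollowRuntime{}
	_ = rt.StartContainer("nginx") // no real container is created
}
```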

Currently master components run on a dedicated machine(s), and HollowNodes run
on an ‘external’ Kubernetes cluster. This design has a slight advantage, over
running master components on the external cluster, of completely isolating
master resources from everything else.

## Requirements

To run Kubemark you need a Kubernetes cluster for running all your HollowNodes
and a dedicated machine for a master. The master machine has to be directly
routable from the HollowNodes. You also need access to some Docker repository.

Currently the scripts are written to be easily usable on GCE, but it should be
relatively straightforward to port them to different providers or bare metal.

## Common use cases and helper scripts

@@ -66,71 +76,116 @@ Common workflow for Kubemark is:

- monitoring test execution and debugging problems
- turning down Kubemark cluster

Included in the descriptions there will be comments helpful for anyone who’ll
want to port Kubemark to different providers.

### Starting a Kubemark cluster

To start a Kubemark cluster on GCE you need to create an external cluster (it
can be GCE, GKE or any other cluster) by yourself, build a kubernetes release
(e.g. by running `make quick-release`) and run the
`test/kubemark/start-kubemark.sh` script. This script will create a VM for
master components, Pods for HollowNodes and do all the setup necessary to let
them talk to each other. It will use the configuration stored in
`cluster/kubemark/config-default.sh` - you can tweak it however you want, but
note that some features may not be implemented yet, as the implementation of
Hollow components/mocks will probably be lagging behind the ‘real’ one. For
performance tests the interesting variables are `NUM_NODES` and `MASTER_SIZE`.
After the start-kubemark script is finished you’ll have a ready Kubemark
cluster; a kubeconfig file for talking to the Kubemark cluster is stored in
`test/kubemark/kubeconfig.loc`.

Currently we're running HollowNodes with a limit of 0.05 of a CPU core and ~60MB
of memory which, taking into account default cluster addons and fluentD running
on an 'external' cluster, allows running ~17.5 HollowNodes per core.

#### Behind the scenes details:

The start-kubemark script does quite a lot of things:

- Creates a master machine called hollow-cluster-master and a PD for it (*uses
gcloud, should be easy to do outside of GCE*)

- Creates a firewall rule which opens port 443\* on the master machine (*uses
gcloud, should be easy to do outside of GCE*)

- Builds a Docker image for HollowNode from the current repository and pushes it
to the Docker repository (*GCR for us, using scripts from
`cluster/gce/util.sh` - it may get tricky outside of GCE*)

- Generates certificates and kubeconfig files, writes a kubeconfig locally to
`test/kubemark/kubeconfig.loc` and creates a Secret which stores the kubeconfig
for HollowKubelet/HollowProxy use (*uses gcloud to transfer files to the master,
should be easy to do outside of GCE*).

- Creates a ReplicationController for HollowNodes and starts them up. (*will
work exactly the same everywhere as long as MASTER_IP is populated correctly,
but you’ll need to update the docker image address if you’re not using GCR and
the default image name*)

- Waits until all HollowNodes are in the Running phase (*will work exactly the
same everywhere*)

<sub>\* Port 443 is a secured port on the master machine which is used for all
external communication with the API server. In the last sentence *external*
means all traffic coming from other machines, including all the Nodes, not only
from outside of the cluster. Currently local components, i.e. the
ControllerManager and Scheduler, talk to the API server using the insecure port
8080.</sub>

### Running e2e tests on Kubemark cluster

To run a standard e2e test on your Kubemark cluster created in the previous step
you execute the `test/kubemark/run-e2e-tests.sh` script. It will configure
ginkgo to use the Kubemark cluster instead of something else and start an e2e
test. This script should not need any changes to work on other cloud providers.

By default (if nothing is passed to it) the script will run a Density '30 test.
If you want to run a different e2e test you just need to provide the flags you
want to be passed to the `hack/ginkgo-e2e.sh` script, e.g.
`--ginkgo.focus="Load"` to run the Load test.

By default, at the end of each test, it will delete namespaces and everything
under them (e.g. events, replication controllers) on the Kubemark master, which
takes a lot of time. Such work isn't needed in most cases: if you delete your
Kubemark cluster after running `run-e2e-tests.sh`; if you don't care about
namespace deletion performance, specifically related to etcd; etc. There is a
flag that enables you to avoid namespace deletion: `--delete-namespace=false`.
Adding the flag should let you see in logs: `Found DeleteNamespace=false,
skipping namespace deletion!`

### Monitoring test execution and debugging problems

Run-e2e-tests prints the same output on Kubemark as on an ordinary e2e cluster,
but if you need to dig deeper you need to learn how to debug HollowNodes and how
the master machine (currently) differs from the ordinary one.

If you need to debug the master machine you can do similar things as you do on
your ordinary master. The difference between the Kubemark setup and the ordinary
setup is that in Kubemark etcd is run as a plain docker container, and all
master components are run as normal processes. There’s no Kubelet overseeing
them. Logs are stored in exactly the same place, i.e. the `/var/logs/`
directory. Because the binaries are not supervised by anything they won't be
restarted in the case of a crash.

To help you with debugging from inside the cluster, the startup script puts a
`~/configure-kubectl.sh` script on the master. It downloads the `gcloud` and
`kubectl` tools and configures kubectl to work on the unsecured master port
(useful if there are problems with security). After the script is run you can
use the kubectl command from the master machine to play with the cluster.

Debugging HollowNodes is a bit more tricky, as if you experience a problem on
one of them you need to learn which hollow-node pod corresponds to a given
HollowNode known by the master. During self-registration HollowNodes provide
their cluster IPs as Names, which means that if you need to find a HollowNode
named `10.2.4.5` you just need to find a Pod in the external cluster with this
cluster IP. There’s a helper script
`test/kubemark/get-real-pod-for-hollow-node.sh` that does this for you.

When you have a Pod name you can use `kubectl logs` on the external cluster to
get logs, or use a `kubectl describe pod` call to find the external Node on
which this particular HollowNode is running so you can ssh to it.

E.g. you want to see the logs of the HollowKubelet on which pod `my-pod` is
running. To do so you can execute:

```
$ kubectl --kubeconfig=kubernetes/test/kubemark/kubeconfig.loc describe pod my-pod

@@ -142,7 +197,8 @@ Which outputs pod description and among it a line:

Node: 1.2.3.4/1.2.3.4
```

To learn the `hollow-node` pod corresponding to node `1.2.3.4` you use the
aforementioned script:

```
$ kubernetes/test/kubemark/get-real-pod-for-hollow-node.sh 1.2.3.4

@@ -164,17 +220,23 @@ All those things should work exactly the same on all cloud providers.

### Turning down Kubemark cluster

On GCE you just need to execute the `test/kubemark/stop-kubemark.sh` script,
which will delete the HollowNode ReplicationController and all the resources for
you. On other providers you’ll need to delete all this stuff by yourself.

## Some current implementation details

The Kubemark master uses exactly the same binaries as ordinary Kubernetes does.
This means that it will never be out of date. On the other hand HollowNodes use
an existing fake for the Kubelet (called SimpleKubelet), which mocks its runtime
manager with `pkg/kubelet/fake-docker-manager.go`, where most of the logic sits.
Because there’s no easy way of mocking other managers (e.g. VolumeManager), they
are not supported in Kubemark (e.g. we can’t schedule Pods with volumes in them
yet).

As time passes more fakes will probably be plugged into HollowNodes, but it’s
crucial to keep them as simple as possible to allow running a big number of
Hollows on a single core.

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
@@ -31,13 +31,17 @@ Documentation for other releases can be found at

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

## Logging Conventions

The following are conventions for the glog levels to use.
[glog](http://godoc.org/github.com/golang/glog) is globally preferred to
[log](http://golang.org/pkg/log/) for better runtime control.

* glog.Errorf() - Always an error

* glog.Warningf() - Something unexpected, but probably not an error

* glog.Infof() has multiple levels:
  * glog.V(0) - Generally useful for this to ALWAYS be visible to an operator
    * Programmer errors

@@ -56,7 +60,9 @@ The following conventions for the glog levels to use. [glog](http://godoc.org/g

  * glog.V(4) - Debug level verbosity (for now)
    * Logging in particularly thorny parts of code where you may want to come back later and check it

As per the comments, the practical default level is V(2). Developers and QE
environments may wish to run at V(3) or V(4). If you wish to change the log
level, you can pass in `-v=X` where X is the desired maximum level to log.
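
A small illustrative snippet (not from the Kubernetes tree) showing these levels
in use with the glog package; running it with `-logtostderr=true -v=4` prints
everything, while the practical default `-v=2` hides the V(4) line:

```go
package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	flag.Parse() // glog registers -v, -logtostderr, etc. on the standard flag set
	defer glog.Flush()

	glog.Errorf("always an error: %v", "something broke")
	glog.Warningf("unexpected, but probably not an error")
	glog.V(0).Infof("always visible to an operator")
	glog.V(2).Infof("practical default level for steady-state information")
	glog.V(4).Infof("debug-level detail for thorny code paths")
}
```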

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
@@ -38,10 +38,14 @@ This documents the process for making release notes for a release.

### 1) Note the PR number of the previous release

Find the most-recent PR that was merged with the previous .0 release. Remember
this as $LASTPR.

- _TODO_: Figure out a way to record this somewhere to save the next
release engineer time.

Find the most-recent PR that was merged with the current .0 release. Remember
this as $CURRENTPR.

### 2) Run the release-notes tool

@@ -52,7 +56,7 @@ ${KUBERNETES_ROOT}/build/make-release-notes.sh $LASTPR $CURRENTPR

### 3) Trim the release notes

This generates a list of the entire set of PRs merged since the last minor
release. It is likely long and many PRs aren't worth mentioning. If any of the
PRs were cherrypicked into patches on the last minor release, you should exclude
them from the current release's notes.

@@ -67,9 +71,13 @@ With the final markdown all set, cut and paste it to the top of `CHANGELOG.md`

### 5) Update the Release page

* Switch to the [releases](https://github.com/kubernetes/kubernetes/releases)
page.

* Open up the release you are working on.

* Cut and paste the final markdown from above into the release notes

* Press Save.
@ -36,129 +36,207 @@ Documentation for other releases can be found at
|
|||
|
||||
## Introduction
|
||||
|
||||
We have observed two different cluster management architectures, which can be categorized as "Borg-style" and "Mesos/Omega-style."
|
||||
(In the remainder of this document, we will abbreviate the latter as "Mesos-style.")
|
||||
Although out-of-the box Kubernetes uses a Borg-style architecture, it can also be configured in a Mesos-style architecture,
|
||||
and in fact can support both styles at the same time. This document describes the two approaches and describes how
|
||||
to deploy a Mesos-style architecture on Kubernetes.
|
||||
We have observed two different cluster management architectures, which can be
|
||||
categorized as "Borg-style" and "Mesos/Omega-style." In the remainder of this
|
||||
document, we will abbreviate the latter as "Mesos-style." Although out-of-the
|
||||
box Kubernetes uses a Borg-style architecture, it can also be configured in a
|
||||
Mesos-style architecture, and in fact can support both styles at the same time.
|
||||
This document describes the two approaches and describes how to deploy a
|
||||
Mesos-style architecture on Kubernetes.
|
||||
|
||||
(As an aside, the converse is also true: one can deploy a Borg/Kubernetes-style architecture on Mesos.)
|
||||
As an aside, the converse is also true: one can deploy a Borg/Kubernetes-style
|
||||
architecture on Mesos.
|
||||
|
||||
This document is NOT intended to provide a comprehensive comparison of Borg and Mesos. For example, we omit discussion
|
||||
of the tradeoffs between scheduling with full knowledge of cluster state vs. scheduling using the "offer" model.
|
||||
(That issue is discussed in some detail in the Omega paper (see references section at the end of this doc).)
|
||||
This document is NOT intended to provide a comprehensive comparison of Borg and
|
||||
Mesos. For example, we omit discussion of the tradeoffs between scheduling with
|
||||
full knowledge of cluster state vs. scheduling using the "offer" model. That
|
||||
issue is discussed in some detail in the Omega paper.
|
||||
(See [references](#references) below.)
|
||||
|
||||
|
||||
## What is a Borg-style architecture?
|
||||
|
||||
A Borg-style architecture is characterized by:
|
||||
* a single logical API endpoint for clients, where some amount of processing is done on requests, such as admission control and applying defaults
|
||||
* generic (non-application-specific) collection abstractions described declaratively,
|
||||
* generic controllers/state machines that manage the lifecycle of the collection abstractions and the containers spawned from them
|
||||
|
||||
* a single logical API endpoint for clients, where some amount of processing is
|
||||
done on requests, such as admission control and applying defaults
|
||||
|
||||
* generic (non-application-specific) collection abstractions described
|
||||
declaratively,
|
||||
|
||||
* generic controllers/state machines that manage the lifecycle of the collection
|
||||
abstractions and the containers spawned from them
|
||||
|
||||
* a generic scheduler
|
||||
|
||||
For example, Borg's primary collection abstraction is a Job, and every application that runs on Borg--whether it's a user-facing
|
||||
service like the GMail front-end, a batch job like a MapReduce, or an infrastructure service like GFS--must represent itself as
|
||||
a Job. Borg has corresponding state machine logic for managing Jobs and their instances, and a scheduler that's responsible
|
||||
for assigning the instances to machines.
|
||||
For example, Borg's primary collection abstraction is a Job, and every
|
||||
application that runs on Borg--whether it's a user-facing service like the GMail
|
||||
front-end, a batch job like a MapReduce, or an infrastructure service like
|
||||
GFS--must represent itself as a Job. Borg has corresponding state machine logic
|
||||
for managing Jobs and their instances, and a scheduler that's responsible for
|
||||
assigning the instances to machines.
|
||||
|
||||
The flow of a request in Borg is:

1. Client submits a collection object to the Borgmaster API endpoint

1. Admission control, quota, applying defaults, etc. run on the collection

1. If the collection is admitted, it is persisted, and the collection state
machine creates the underlying instances

1. The scheduler assigns a hostname to the instance, and tells the Borglet to
start the instance's container(s)

1. Borglet starts the container(s)

1. The instance state machine manages the instances and the collection state
machine manages the collection during their lifetimes

Out-of-the-box Kubernetes has *workload-specific* abstractions (ReplicaSet, Job,
DaemonSet, etc.) and corresponding controllers, and in the future may have
[workload-specific schedulers](../../docs/proposals/multiple-schedulers.md),
e.g. different schedulers for long-running services vs. short-running batch. But
these abstractions, controllers, and schedulers are not *application-specific*.

The usual request flow in Kubernetes is very similar, namely:

1. Client submits a collection object (e.g. ReplicaSet, Job, ...) to the API
server

1. Admission control, quota, applying defaults, etc. run on the collection

1. If the collection is admitted, it is persisted, and the corresponding
collection controller creates the underlying pods

1. Admission control, quota, applying defaults, etc. run on each pod; if there
are multiple schedulers, one of the admission controllers will write the
scheduler name as an annotation based on a policy

1. If a pod is admitted, it is persisted

1. The appropriate scheduler assigns a nodeName to the instance, which triggers
the Kubelet to start the pod's container(s)

1. Kubelet starts the container(s)

1. The controller corresponding to the collection manages the pod and the
collection during their lifetime
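
As a minimal illustration of step 1 only (it is not part of the flow description
above), the sketch below submits a collection object--a `batch/v1` Job--to the
API server using client-go. It assumes a recent client-go release and a
kubeconfig reachable via the `KUBECONFIG` environment variable; everything after
the `Create` call (admission, persistence, controller fan-out, scheduling,
kubelet start) happens server-side as listed above.

```go
// Minimal sketch (assumes a recent client-go): submit a collection object
// (here a batch/v1 Job) to the API server, corresponding to step 1 above.
package main

import (
	"context"
	"fmt"
	"os"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load credentials from the kubeconfig pointed to by $KUBECONFIG.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "example-job"},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:    "worker",
						Image:   "busybox",
						Command: []string{"sh", "-c", "echo hello"},
					}},
				},
			},
		},
	}

	// Step 1: the client submits the collection object; admission control,
	// defaulting, persistence, controller fan-out, and scheduling happen on
	// the server side.
	created, err := clientset.BatchV1().Jobs("default").Create(
		context.TODO(), job, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("submitted Job:", created.Name)
}
```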

In the Borg model, application-level scheduling and cluster-level scheduling are
handled by separate components. For example, a MapReduce master might request
Borg to create a job with a certain number of instances with a particular
resource shape, where each instance corresponds to a MapReduce worker; the
MapReduce master would then schedule individual units of work onto those
workers.

## What is a Mesos-style architecture?

Mesos is fundamentally designed to support multiple application-specific
"frameworks." A framework is composed of a "framework scheduler" and a
"framework executor." We will abbreviate "framework scheduler" as "framework"
since "scheduler" means something very different in Kubernetes (something that
just assigns pods to nodes).

Unlike Borg and Kubernetes, where there is a single logical endpoint that
receives all API requests (the Borgmaster and API server, respectively), in
Mesos every framework is a separate API endpoint. Mesos does not have any
standard set of collection abstractions, controllers/state machines, or
schedulers; the logic for all of these things is contained in each
[application-specific framework](http://mesos.apache.org/documentation/latest/frameworks/)
individually. (Note that the notion of application-specific does sometimes blur
into the realm of workload-specific, for example
[Chronos](https://github.com/mesos/chronos) is a generic framework for batch
jobs. However, regardless of what set of Mesos frameworks you are using, the key
properties remain: each framework is its own API endpoint with its own
client-facing and internal abstractions, state machines, and scheduler.)

A Mesos framework can integrate application-level scheduling and cluster-level
scheduling into a single component.

Note: Although Mesos frameworks expose their own API endpoints to clients, they
consume a common infrastructure via a common API endpoint for controlling tasks
(launching, detecting failure, etc.) and learning about available cluster
resources. More details are available
[here](http://mesos.apache.org/documentation/latest/scheduler-http-api/).

## Building a Mesos-style framework on Kubernetes

Implementing the Mesos model on Kubernetes boils down to enabling
application-specific collection abstractions, controllers/state machines, and
scheduling. There are just three steps:

* Use API plugins to create API resources for your new application-specific
collection abstraction(s)

* Implement controllers for the new abstractions (and for managing the lifecycle
of the pods the controllers generate)

* Implement a scheduler with the application-specific scheduling logic

Note that the last two can be combined: a Kubernetes controller can do the
scheduling for the pods it creates, by writing the node name to the pods when it
creates them.
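
The sketch below illustrates that shortcut. It is not taken from any existing
controller: the package, function, and label names are invented for this
example, and it assumes a recent client-go release. The controller fills in
`spec.nodeName` at pod-creation time, so no separate scheduler ever sees the
pod.

```go
// Hypothetical sketch: a controller that combines scheduling with pod creation
// by filling in Spec.NodeName itself (assumes a recent client-go).
package foocontroller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createScheduledPod creates one pod for a (hypothetical) application-specific
// collection and places it directly by writing the chosen node name into the
// pod spec, so the pod never waits for a scheduler.
func createScheduledPod(ctx context.Context, client kubernetes.Interface,
	namespace, podName, nodeName string) (*corev1.Pod, error) {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:   podName,
			Labels: map[string]string{"app": "foo"},
		},
		Spec: corev1.PodSpec{
			// Setting NodeName up front is the "controller does the
			// scheduling" shortcut described above: the kubelet on this node
			// sees the pod and starts its containers.
			NodeName: nodeName,
			Containers: []corev1.Container{{
				Name:  "foo-worker",
				Image: "busybox",
			}},
		},
	}
	return client.CoreV1().Pods(namespace).Create(ctx, pod, metav1.CreateOptions{})
}
```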

Once you've done this, you end up with an architecture that is extremely similar
to the Mesos style--the Kubernetes controller is effectively a Mesos framework.
The remaining differences are:

* In Kubernetes, all API operations go through a single logical endpoint, the
API server (we say logical because the API server can be replicated). In
contrast, in Mesos, API operations go to a particular framework. However, the
Kubernetes API plugin model makes this difference fairly small.

* In Kubernetes, application-specific admission control, quota, defaulting, etc.
rules can be implemented in the API server rather than in the controller. Of
course you can choose to make these operations be no-ops for your
application-specific collection abstractions, and handle them in your
controller.

* On the node level, Mesos allows application-specific executors, whereas
Kubernetes only has executors for Docker and rkt containers.

The end-to-end flow is:

1. Client submits an application-specific collection object to the API server

2. The API server plugin for that collection object forwards the request to the
API server that handles that collection type

3. Admission control, quota, applying defaults, etc. run on the collection
object

4. If the collection is admitted, it is persisted

5. The collection controller sees the collection object and in response creates
the underlying pods and chooses which nodes they will run on by setting the node
name

6. Kubelet sees the pods with node name set and starts the container(s)

7. The collection controller manages the pods and the collection during their
lifetimes

*Note: if the controller and scheduler are separated, then step 5 breaks down
into multiple steps:*

(5a) The collection controller creates pods with an empty node name.

(5b) API server admission control, quota, defaulting, etc. run on the pods; one
of the admission controller steps writes the scheduler name as an annotation on
each pod (see pull request `#18262` for more details).

(5c) The corresponding application-specific scheduler chooses a node and writes
the node name, which triggers the Kubelet to start the pod's container(s).
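
For the separated case, here is a rough sketch of what step 5c could look like
for an application-specific scheduler, assuming a recent client-go release. The
scheduler name, the annotation key, and the node-picking policy are illustrative
placeholders, not a prescribed interface.

```go
// Rough sketch of step 5c: a separated, application-specific scheduler that
// binds pending pods claimed by it to nodes. Assumes a recent client-go; the
// annotation key and node-picking policy are illustrative only.
package fooscheduler

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const schedulerName = "foo-scheduler" // hypothetical name written in step 5b

// scheduleOnce binds every pending pod claimed by this scheduler to a node
// chosen by the application-specific pickNode policy.
func scheduleOnce(ctx context.Context, client kubernetes.Interface,
	namespace string, pickNode func(*corev1.Pod) string) error {
	// Pods with an empty spec.nodeName have not been scheduled yet (step 5a).
	pending, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=",
	})
	if err != nil {
		return err
	}
	for i := range pending.Items {
		pod := &pending.Items[i]
		// Only schedule pods that the admission step (5b) assigned to us.
		if pod.Annotations["scheduler.alpha.kubernetes.io/name"] != schedulerName {
			continue
		}
		binding := &corev1.Binding{
			ObjectMeta: metav1.ObjectMeta{Name: pod.Name},
			Target:     corev1.ObjectReference{Kind: "Node", Name: pickNode(pod)},
		}
		// Writing the binding sets spec.nodeName, which triggers the Kubelet
		// on the target node to start the pod's container(s).
		if err := client.CoreV1().Pods(namespace).Bind(ctx, binding, metav1.CreateOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```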

As a final note, the Kubernetes model allows multiple levels of iterative
refinement of runtime abstractions, as long as the lowest level is the pod. For
example, clients of application Foo might create a `FooSet`, which is picked up
by the FooController, which in turn creates `BatchFooSet` and `ServiceFooSet`
objects, which are picked up by the BatchFoo controller and ServiceFoo
controller respectively, which in turn create pods. In between each of these
steps there is an opportunity for object-specific admission control, quota, and
defaulting to run in the API server, though these can instead be handled by the
controllers.
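
The refinement chain is easier to see with concrete (but entirely hypothetical)
types; none of the types below exist in Kubernetes, and the fields are only
meant to show the direction of refinement.

```go
// Purely hypothetical sketch of the iterative-refinement chain described
// above; none of these types exist in Kubernetes.
package foo

// FooSet is the application-level object a client of application Foo submits.
type FooSet struct {
	Name          string
	BatchWorkers  int // refined into a BatchFooSet by the FooController
	ServingShards int // refined into a ServiceFooSet by the FooController
}

// BatchFooSet and ServiceFooSet are the intermediate, more specialized
// objects created by the FooController.
type BatchFooSet struct {
	Name     string
	Replicas int
}

type ServiceFooSet struct {
	Name     string
	Replicas int
}

// refine shows the direction of the chain: FooSet -> {BatchFooSet,
// ServiceFooSet} -> pods, with API-server admission/quota/defaulting (or
// controller-side equivalents) possible at every arrow.
func refine(fs FooSet) (BatchFooSet, ServiceFooSet) {
	return BatchFooSet{Name: fs.Name + "-batch", Replicas: fs.BatchWorkers},
		ServiceFooSet{Name: fs.Name + "-serve", Replicas: fs.ServingShards}
}
```
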
## References