mirror of https://github.com/k3s-io/k3s
194 lines
6.5 KiB
Markdown
194 lines
6.5 KiB
Markdown
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
|
|
|
|
<!-- BEGIN STRIP_FOR_RELEASE -->
|
|
|
|
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
|
|
width="25" height="25">
|
|
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
|
|
width="25" height="25">
|
|
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
|
|
width="25" height="25">
|
|
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
|
|
width="25" height="25">
|
|
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
|
|
width="25" height="25">
|
|
|
|
<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
|
|
|
|
If you are using a released version of Kubernetes, you should
|
|
refer to the docs that go with that version.
|
|
|
|
<!-- TAG RELEASE_LINK, added by the munger automatically -->
|
|
<strong>
|
|
The latest release of this document can be found
|
|
[here](http://releases.k8s.io/release-1.1/docs/proposals/job.md).
|
|
|
|
Documentation for other releases can be found at
|
|
[releases.k8s.io](http://releases.k8s.io).
|
|
</strong>
|
|
--
|
|
|
|
<!-- END STRIP_FOR_RELEASE -->
|
|
|
|
<!-- END MUNGE: UNVERSIONED_WARNING -->
|
|
|
|
# Job Controller
|
|
|
|
## Abstract
|
|
|
|
A proposal for implementing a new controller - Job controller - which will be responsible
|
|
for managing pod(s) that require running once to completion even if the machine
|
|
the pod is running on fails, in contrast to what ReplicationController currently offers.
|
|
|
|
Several existing issues and PRs were already created regarding that particular subject:
|
|
* Job Controller [#1624](https://github.com/kubernetes/kubernetes/issues/1624)
|
|
* New Job resource [#7380](https://github.com/kubernetes/kubernetes/pull/7380)
|
|
|
|
|
|
## Use Cases
|
|
|
|
1. Be able to start one or several pods tracked as a single entity.
|
|
1. Be able to run batch-oriented workloads on Kubernetes.
|
|
1. Be able to get the job status.
|
|
1. Be able to specify the number of instances performing a job at any one time.
|
|
1. Be able to specify the number of successfully finished instances required to finish a job.
|
|
|
|
|
|
## Motivation
|
|
|
|
Jobs are needed for executing multi-pod computation to completion; a good example
|
|
here would be the ability to implement any type of batch oriented tasks.
|
|
|
|
|
|
## Implementation
|
|
|
|
Job controller is similar to replication controller in that they manage pods.
|
|
This implies they will follow the same controller framework that replication
|
|
controllers already defined. The biggest difference between a `Job` and a
|
|
`ReplicationController` object is the purpose; `ReplicationController`
|
|
ensures that a specified number of Pods are running at any one time, whereas
|
|
`Job` is responsible for keeping the desired number of Pods to a completion of
|
|
a task. This difference will be represented by the `RestartPolicy` which is
|
|
required to always take value of `RestartPolicyNever` or `RestartOnFailure`.
|
|
|
|
|
|
The new `Job` object will have the following content:
|
|
|
|
```go
|
|
// Job represents the configuration of a single job.
|
|
type Job struct {
|
|
TypeMeta
|
|
ObjectMeta
|
|
|
|
// Spec is a structure defining the expected behavior of a job.
|
|
Spec JobSpec
|
|
|
|
// Status is a structure describing current status of a job.
|
|
Status JobStatus
|
|
}
|
|
|
|
// JobList is a collection of jobs.
|
|
type JobList struct {
|
|
TypeMeta
|
|
ListMeta
|
|
|
|
Items []Job
|
|
}
|
|
```
|
|
|
|
`JobSpec` structure is defined to contain all the information how the actual job execution
|
|
will look like.
|
|
|
|
```go
|
|
// JobSpec describes how the job execution will look like.
|
|
type JobSpec struct {
|
|
|
|
// Parallelism specifies the maximum desired number of pods the job should
|
|
// run at any given time. The actual number of pods running in steady state will
|
|
// be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
|
|
// i.e. when the work left to do is less than max parallelism.
|
|
Parallelism *int
|
|
|
|
// Completions specifies the desired number of successfully finished pods the
|
|
// job should be run with. Defaults to 1.
|
|
Completions *int
|
|
|
|
// Selector is a label query over pods running a job.
|
|
Selector map[string]string
|
|
|
|
// Template is the object that describes the pod that will be created when
|
|
// executing a job.
|
|
Template *PodTemplateSpec
|
|
}
|
|
```
|
|
|
|
`JobStatus` structure is defined to contain informations about pods executing
|
|
specified job. The structure holds information about pods currently executing
|
|
the job.
|
|
|
|
```go
|
|
// JobStatus represents the current state of a Job.
|
|
type JobStatus struct {
|
|
Conditions []JobCondition
|
|
|
|
// CreationTime represents time when the job was created
|
|
CreationTime unversioned.Time
|
|
|
|
// StartTime represents time when the job was started
|
|
StartTime unversioned.Time
|
|
|
|
// CompletionTime represents time when the job was completed
|
|
CompletionTime unversioned.Time
|
|
|
|
// Active is the number of actively running pods.
|
|
Active int
|
|
|
|
// Successful is the number of pods successfully completed their job.
|
|
Successful int
|
|
|
|
// Unsuccessful is the number of pods failures, this applies only to jobs
|
|
// created with RestartPolicyNever, otherwise this value will always be 0.
|
|
Unsuccessful int
|
|
}
|
|
|
|
type JobConditionType string
|
|
|
|
// These are valid conditions of a job.
|
|
const (
|
|
// JobComplete means the job has completed its execution.
|
|
JobComplete JobConditionType = "Complete"
|
|
)
|
|
|
|
// JobCondition describes current state of a job.
|
|
type JobCondition struct {
|
|
Type JobConditionType
|
|
Status ConditionStatus
|
|
LastHeartbeatTime unversioned.Time
|
|
LastTransitionTime unversioned.Time
|
|
Reason string
|
|
Message string
|
|
}
|
|
```
|
|
|
|
## Events
|
|
|
|
Job controller will be emitting the following events:
|
|
* JobStart
|
|
* JobFinish
|
|
|
|
## Future evolution
|
|
|
|
Below are the possible future extensions to the Job controller:
|
|
* Be able to limit the execution time for a job, similarly to ActiveDeadlineSeconds for Pods. *now implemented*
|
|
* Be able to create a chain of jobs dependent one on another. *will be implemented in a separate type called Workflow*
|
|
* Be able to specify the work each of the workers should execute (see type 1 from
|
|
[this comment](https://github.com/kubernetes/kubernetes/issues/1624#issuecomment-97622142))
|
|
* Be able to inspect Pods running a Job, especially after a Job has finished, e.g.
|
|
by providing pointers to Pods in the JobStatus ([see comment](https://github.com/kubernetes/kubernetes/pull/11746/files#r37142628)).
|
|
* help users avoid non-unique label selectors ([see this proposal](../../docs/design/selector-generation.md))
|
|
|
|
|
|
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
|
|
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/job.md?pixel)]()
|
|
<!-- END MUNGE: GENERATED_ANALYTICS -->
|