According to AWS, the ELB healthy threshold is "Number of consecutive health check successes before declaring an EC2 instance healthy." It has an unusual interaction with Kubernetes, since all nodes will enter either an unhealthy state or a healthy state together depending on the service's healthiness as a whole.
We have observed that if our service goes down for the unhealthy threshold (which is 2 checks at 30 second intervals = 60 seconds), then the ELB will stop serving traffic to all nodes in the cluster, and will wait for the healthy threshold (currently 10 * 30 = 300 seconds) AFTER the service is restored to add back the cluster nodes, meaning it remains unreachable for an extra 300 seconds.
With the new settings, the ELB will continue to time out dead nodes after 60 seconds, but will restore healthy nodes after 20 seconds. The minimum value for healthyThreshold is 2, and the minimum value for interval is 5 seconds. I chose 10 seconds rather than the minimum somewhat arbitrarily, because I was not sure how much this value might affect the scalability of clusters in EC2, as it does put some extra load on the kube-proxy.
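For reference, here are the resulting settings expressed as a direct call against the classic ELB API; this is a minimal sketch only, with the load balancer name and health check target made up for illustration, and the unhealthy threshold inferred from the 60-second arithmetic above:
```
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elb"
)

func main() {
	svc := elb.New(session.Must(session.NewSession()))
	// healthyThreshold 2 * interval 10s = 20s to restore healthy nodes;
	// unhealthyThreshold 6 * interval 10s = 60s to time out dead nodes.
	_, err := svc.ConfigureHealthCheck(&elb.ConfigureHealthCheckInput{
		LoadBalancerName: aws.String("my-lb"), // hypothetical name
		HealthCheck: &elb.HealthCheck{
			HealthyThreshold:   aws.Int64(2),
			UnhealthyThreshold: aws.Int64(6),
			Interval:           aws.Int64(10),
			Timeout:            aws.Int64(5),
			Target:             aws.String("TCP:10250"), // hypothetical target
		},
	})
	_ = err
}
```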
We decided to remove this test, as there's no way to get an upper bound
on its running time. Etcd restart behavior should be tested in
integration or e2e tests.
When job.spec.completions is nil, only
one task needs to succeed for the job to succeed,
and parallelism can be scaled freely during runtime.
Added tests.
Release Note:
This causes two minor changes to the API.
First, parallelism, if unset, was previously defaulted to be
equal to completions. Now it always defaults to 1 if unset.
Second, having parallelism=N and completions unset would previously
be defaulted to 1 completion and N parallelism.
(this is not something we expect people to do, though)
Now, no defaulting occurs in that case, and the job's
behavior is different (any completion causes success).
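For illustration, the new success rule as a minimal sketch (names are hypothetical; the real logic lives in the job controller):
```
func jobSucceeded(completions *int, succeeded int) bool {
	if completions == nil {
		// completions unset: any single successful pod completes the job
		return succeeded > 0
	}
	return succeeded >= *completions
}
```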
We don't cope well if a PD is in multiple zones, but this is actually
fairly easy to detect. This is probably justified purely on the basis
that we never want to delete the wrong volume (DeleteDisk), but also
because this means that we now warn on creation if a disk is in multiple
zones (with the labeling admission controller).
This also means that, with the scheduling predicate in place, many
of our volume problems "go away" in practice: you still can't create or
delete a volume when it is ambiguous, but thereafter the volume will be
labeled with its zone, that label will match it only to nodes in the same
zone, and then we query for the volume in that zone when we
attach/detach it.
This updates the dockertools.dockerVersion to use a semantic versioning
library to more gracefully support engine versions which include
additional version fields.
Previously, go-dockerclient's APIVersion struct was used, which only
handles plain numeric x.y.z version strings. With #19675, that library
was applied to the Docker engine string; however, it is possible for the
engine string to include additional information for beta, rc, or
distro-specific builds.
This PR also enables the TestDockerRuntimeVersion test, which was
previously just a FIXME, and updates it to pass and to exercise the
version string that caused #20005.
This negates the need for fsouza/go-dockerclient#451, since even with
that change, if a user was running Docker 1.10.0-rc1, this would cause
the kubelet to report it as simply 1.10.0.
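To illustrate the difference, here is a sketch using the blang/semver library (whether that is the exact library vendored here is an assumption):
```
package main

import (
	"fmt"

	"github.com/blang/semver"
)

func main() {
	// A plain numeric x.y.z parser would either reject "1.10.0-rc1" or
	// silently truncate it; a semver parser keeps the pre-release field.
	v, err := semver.Parse("1.10.0-rc1")
	if err != nil {
		panic(err)
	}
	fmt.Println(v.Major, v.Minor, v.Patch) // 1 10 0
	fmt.Println(v.Pre)                     // [rc1]
}
```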
Components can write services during startup, which results in the IP allocator map being updated. Since core controllers *must* succeed for
the masters to start, we should retry a few times in order to pass.
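A hedged sketch of the retry shape, using the wait utility (the helper name and timings are hypothetical, not the actual repair-loop code):
```
err := wait.Poll(100*time.Millisecond, 15*time.Second, func() (bool, error) {
	if err := repairServiceIPs(); err != nil {
		return false, nil // allocator map may still be settling; retry
	}
	return true, nil
})
```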
Most of the logic related to type and kind retrieval belongs in the
codec, not in the various classes. Make it explicit that the codec
should handle these details.
Factory now returns a universal Decoder and a JSONEncoder to assist code
in kubectl that needs to specifically deal with JSON serialization
(apply, merge, patch, edit, jsonpath). Add comments to indicate the
serialization is explicit in those places. These methods decode to
internal and encode to the preferred API version as before, although
in the future they may be changed.
React to removing Codec from version interfaces and RESTMapping by
passing it in to all the places that it is needed.
In general, everything in kubectl/* needs to be ignorant of api/* unless
it deals with a concrete type - this change forces resource_printer to
accept interface abstractions (that are already part of kubectl).
Pass down into the server initialization the necessary interface for
handling client/server content type negotiation. Add integration tests
for the negotiation.
Remove Codec from versionInterfaces in meta (RESTMapper is now agnostic
to codec and serialization). Register api/latest.Codecs as the codec
factory and use latest.Codecs.LegacyCodec(version) as an equivalent to
the previous codec.
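A usage sketch of that equivalence (import paths and the Encode helper are assumed):
```
gv := unversioned.GroupVersion{Group: "", Version: "v1"}
codec := latest.Codecs.LegacyCodec(gv) // behaves like the old per-version codec
data, err := runtime.Encode(codec, obj)
```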
A NegotiatedSerializer is passed into the API installer (and
ParameterCodec, which abstracts conversion of query params) that can be
used to negotiate client/server request/response serialization. All
error paths are now negotiation aware, and are at least minimally
version aware.
Watch is specially coded to only allow application/json - a follow up
change will convert it to use negotiation.
Ensure the swagger scheme will include supported serializations - this
now includes application/yaml as a negotiated option.
Add a recognizer that is capable of sniffing content type from data by
asking each serializer to try to decode - this is for a "universal
decoder/deserializer" which can be used by client logic.
Add codec factory, which provides the core primitives for content type
negotiation. Codec factories depend only on schemes, serializers, and
groupversion pairs.
Break Codec into two general purpose interfaces, Encoder and Decoder,
and move parameter codec responsibilities to ParameterCodec.
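The shape of the split, simplified (the real interfaces also carry group/version hints and streaming variants; Object here stands in for runtime.Object):
```
type Object interface{}

type Encoder interface {
	Encode(obj Object, w io.Writer) error
}

type Decoder interface {
	Decode(data []byte) (Object, error)
}

// A Codec is simply both halves together.
type Codec interface {
	Encoder
	Decoder
}
```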
Make unversioned types explicit when registering - these types go
through conversion without modification.
Switch to use "__internal" instead of "" to represent the internal
version. Future commits will also add group defaulting (so that "" is
expanded internally into a known group version, and only cleared during
set).
For embedded types like runtime.Object -> runtime.RawExtension, put the
responsibility on the caller of Decode/Encode to handle transformation
into destination serialization. Future commits will expand RawExtension
and Unknown to accept a content encoding as well as bytes.
Make Unknown a bit more powerful and use it to carry unrecognized types.
This removes a panic I mistakenly introduced when an instance is not
found, and also restores the exact prior behaviour for
getInstanceByName, where it returns cloudprovider.InstanceNotFound when
the instance is not found.
- Ignore the "not found" error on deletion.
- Recognize the "already exists" error on creation and check whether the existing
pod meets the requirement. If so, don't report an error.
- Immediately create a mirror pod after a successful deletion, if needed.
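A sketch of those tolerant paths (helper names are hypothetical; errors is the API errors package):
```
if err := deleteMirrorPod(name); err != nil && !errors.IsNotFound(err) {
	return err // "not found" on deletion is ignored
}
if err := createMirrorPod(pod); err != nil {
	if errors.IsAlreadyExists(err) && mirrorPodMatches(existingMirrorPod(pod), pod) {
		return nil // existing pod meets the requirement; not an error
	}
	return err
}
```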
We adapt the existing code to work across all zones in a region.
We require a feature-flag to enable Ubernetes-Lite.
Reasons:
* There are some behavioural changes if users create volumes with
the same name in two zones.
* We don't want to make one API call per zone if we're not running
Ubernetes-Lite.
* Ubernetes-Lite is still experimental.
There isn't a parallel flag implemented for AWS, because at the moment
there would be no behaviour changes from this.
Used like:
```
var pod *api.Pod
err := client.RetryOnConflict(client.DefaultBackoff, func() (err error) {
	pod, err = c.Pods("mynamespace").UpdateStatus(update)
	return
})
// err may be conflict
```
This addresses a TODO when collecting the node version information so it
will properly report the configured runtime and its version. Previously,
this was hardcoded to "docker://" and the docker version, and would show
"docker://1.9.1" even when the kubelet was configured to use rkt.
With this change, it will use the runtime's Type() and Version() data.
This also changes the container.Runtime interface to add an APIVersion()
method. This can be used when the runtime has separate versions for the
engine and the API, such as with Docker. The Docker minimum version
validation has been updated to use APIVersion(), and
DockerManager.Version() now returns the engine version.
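The interface change in simplified form (the real Version values are a comparable type rather than plain strings):
```
type Runtime interface {
	Type() string                // e.g. "docker" or "rkt"
	Version() (string, error)    // engine version, e.g. "1.9.1"
	APIVersion() (string, error) // API version, where it differs from the engine's
}
```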
findInstancesByNodeNames was a simple loop around
findInstanceByNodeName, which made a separate EC2 API call for each node.
We've had trouble with this sort of behaviour hitting EC2 rate limits on
bigger clusters (e.g. #11979).
Instead, change this method to fetch _all_ the tagged EC2 instances, and
then loop through the local results. This is one API call (modulo
paging).
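A hedged sketch of the batched lookup (types and helper names are hypothetical):
```
func findInstancesByNodeNames(names []string) []*Instance {
	all := describeAllTaggedInstances() // one DescribeInstances call (modulo paging)
	want := make(map[string]bool, len(names))
	for _, n := range names {
		want[n] = true
	}
	var out []*Instance
	for _, inst := range all {
		if want[inst.NodeName] {
			out = append(out, inst)
		}
	}
	return out
}
```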
We are currently only using findInstancesByNodeNames for the load
balancer, where we attach every node, so we were fetching all but one of
the instances anyway.
Issue #11979
Support a desired replica count of 0 for the new RC. Users sometimes
want to roll out a new "inactive" template with the intent of scaling
it up manually later.
Currently, BuildConfigFromFlags() prints a log message like the one below
when the config path isn't specified:
```
W0120 15:30:06.196820 13323 client_config.go:359] error creating
inClusterConfig, falling back to default config: %vunable to load
in-cluster configuration, KUBERNETES_SERVICE_HOST and
KUBERNETES_SERVICE_PORT must be defined
```
This commit removes the needless "%v".
We can either fix it here or at every callsite. Every callsite is
currently using this method incorrectly.
Signed-off-by: Mike Danese <mikedanese@google.com>
It makes more sense for `ValidatePositiveField` and
`ValidatePositiveQuantity` methods to be named `ValidateNonnegativeField`
and `ValidateNonnegativeQuantity` as that is what is truly being
checked. This commit simply updates the method names everywhere they are
used.
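The distinction in one line (call shape assumed from the validation package):
```
allErrs := ValidateNonnegativeField(int64(replicas), fldPath.Child("replicas"))
// replicas == 0  -> no error: the check is "nonnegative", not "positive"
// replicas == -1 -> field error
```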
This is part of migrating kubelet configuration to the componentconfig API
group and is preliminary to retrofitting client configuration and
implementing full-fledged API group machinery.
Signed-off-by: Mike Danese <mikedanese@google.com>
Add a `/stats/summary` endpoint to the kubelet which will return an
empty Summary{} struct (json formatted), as a summary API
placeholder. Once the next cAdvisor release is vendored, the summary
API will be filled in.
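A hedged sketch of the placeholder handler (the wiring and the stats package are assumed):
```
func handleSummary(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	// Empty Summary{} until the vendored cAdvisor release can fill it in.
	json.NewEncoder(w).Encode(stats.Summary{})
}
```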
We will have the right formula to generate correct maxCIDRs now.
The previous code assumed the cluster CIDR is a /8, which may not be true.
Now it generates maxCIDRs based on the actual cluster CIDR.
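A worked sketch of the formula (mask sizes are illustrative):
```
_, clusterCIDR, _ := net.ParseCIDR("10.244.0.0/16")
clusterMaskSize, _ := clusterCIDR.Mask.Size()
nodeMaskSize := 24
maxCIDRs := 1 << uint(nodeMaskSize-clusterMaskSize) // 2^(24-16) = 256 node subnets
```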
This cache will be used to store the PodStatus of all pods/containers
visible on the node. This will eliminate the need for pod workers to query the
container runtime directly.
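A hypothetical sketch of the cache shape (the real interface also tracks errors and timestamps):
```
type Cache interface {
	Get(id types.UID) (*PodStatus, error) // served from the cache, not the runtime
	Set(id types.UID, status *PodStatus)
}
```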
Add `kube-reserved` and `system-reserved` flags for configuring
resources reserved for usage outside of Kubernetes pods. Allocatable is
provided by the Kubelet according to the formula:
```
Allocatable = Capacity - KubeReserved - SystemReserved
```
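A worked example with made-up numbers:
```
capacity := int64(8000)      // node capacity in millicores
kubeReserved := int64(500)   // --kube-reserved
systemReserved := int64(500) // --system-reserved
allocatable := capacity - kubeReserved - systemReserved // 7000m
```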
Also provides a method for estimating a reasonable default for
`KubeReserved`, but the current implementation is probably low and needs
more tuning.
Currently, pleg would report an event if a container transitions from running to
exited between relistings. However, it would not report any event if a container
gets stopped and removed between relistings. Such an event would eventually be
handled when the pod syncs periodically, but this is undesirable. This change
ensures that we detect all such events.
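A sketch of the relist diff that catches stop-and-remove (names are hypothetical):
```
for id, old := range oldContainers {
	if _, stillPresent := newContainers[id]; !stillPresent {
		// The container vanished between relists: it was stopped *and*
		// removed, so without this check no event would be generated.
		events = append(events, &PodLifecycleEvent{ID: old.PodID, Type: ContainerDied})
	}
}
```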