Commit Graph

407 Commits (ddea2dd56ff76168d2597c9cea62aa110a78f81c)

Author SHA1 Message Date
Antoine Pelisse ca964a1872 Update OWNERS approvers and reviewers: pkg/cloudprovider 2017-01-17 13:42:07 -08:00
Clayton Coleman 9a2a50cda7
refactor: use metav1.ObjectMeta in other types 2017-01-17 16:17:19 -05:00
Justin Santa Barbara 99d475d3fe AWS: Fix a few log messages Info -> Infof 2017-01-16 16:27:39 -05:00
Hemant Kumar aaa56e2c56 Remove duplicate calls to describeInstance from aws
This change removes all duplicate calls to describeInstance
from aws volume code path.
2017-01-12 21:46:52 -05:00
deads2k 6a4d5cd7cc start the apimachinery repo 2017-01-11 09:09:48 -05:00
Jeff Grafton 20d221f75c Enable auto-generating sources rules 2017-01-05 14:14:13 -08:00
Kubernetes Submit Queue 6d0efbc9d6 Merge pull request #38766 from jsafrane/wait-attach-backoff
Automatic merge from submit-queue

AWS: Add exponential backoff to waitForAttachmentStatus() and createTags()

We should use exponential backoff while waiting for a volume to get attached/detached to/from a node. This will lower AWS load and reduce API call throttling.

This partly fixes #33088

@justinsb, can you please take a look?
2017-01-05 03:08:04 -08:00
Mike Danese 161c391f44 autogenerated 2016-12-29 13:04:10 -08:00
Jan Safranek 65f6bcb927 AWS: Add sequential allocator for device names.
On AWS, we should not reuse device names as long as possible, see
https://aws.amazon.com/premiumsupport/knowledge-center/ebs-stuck-attaching/

"If you specify a device name that is not in use by EC2, but is being used by
the block device driver within the EC2 instance, the attachment of the EBS
volume does not succeed and the EBS volume is stuck in the attaching state."

This patch adds a device name allocator that tries to find a name that's next
to the last used device name instead of using the first available one.
This way we will loop through all device names ("xvdba" .. "xvdzz") before
a device name is reused.
2016-12-15 17:22:19 +01:00
Jan Safranek be3fcd4383 AWS: Add exponential backoff to createTags()
We should have something more reliable than 1 second sleep
2016-12-14 16:51:54 +01:00
Jan Safranek 92e576e01c AWS: Add exponential backoff to waitForAttachmentStatus()
We should use exponential backoff while waiting for a volume to get attached/
detached to/from a node. This will lower AWS load and reduce its API call
throttling.
2016-12-14 14:00:29 +01:00
Mike Danese c87de85347 autoupdate BUILD files 2016-12-12 13:30:07 -08:00
Dhawal Patel e4bebc7c40 Remove check that forces loadBalancerSourceRanges to be 0.0.0.0/0. Also, remove check that forces service.beta.kubernetes.io/aws-load-balancer-internal annotation to be 0.0.0.0/0. Ideally, it should be a boolean, but for backward compatibility, leaving it to be a non-empty value 2016-12-12 10:53:49 -08:00
Angus Lees 8a7e103191 providers: Remove long-deprecated Instances.List()
This method has been unused by k8s for some time, and yet is the last
piece of the cloud provider API that encourages provider names to be
human-friendly strings (this method applies a regex to instance names).

Actually removing this deprecated method is part of a long effort to
migrate from instance names to instance IDs in at least the OpenStack
provider plugin.
2016-12-10 22:36:12 +11:00
Kubernetes Submit Queue 44e25b1087 Merge pull request #33570 from justinsb/aws_elb_more_logging
Automatic merge from submit-queue (batch tested with PRs 38260, 32811, 28458, 33570, 37096)

AWS: include ELB name in health-check logging
2016-12-08 02:11:24 -08:00
Kubernetes Submit Queue cffaf1b71b Merge pull request #31321 from anguslees/lb-nodes
Automatic merge from submit-queue (batch tested with PRs 37328, 38102, 37261, 31321, 38146)

Pass full Node objects to provider LoadBalancer methods
2016-12-05 20:16:53 -08:00
Clayton Coleman 3454a8d52c
refactor: update bazel, codec, and gofmt 2016-12-03 19:10:53 -05:00
Clayton Coleman 5df8cc39c9
refactor: generated 2016-12-03 19:10:46 -05:00
Angus Lees 398c62d1ff aws: Update LB API hosts->nodes
Update EnsureLoadBalancer/UpdateLoadBalancer API to use node objects.
2016-12-01 09:53:54 +11:00
Pengfei Ni f584ed4398 Fix package aliases to follow golang convention 2016-11-30 15:40:50 +08:00
Chao Xu bcc783c594 run hack/update-all.sh 2016-11-23 15:53:09 -08:00
Chao Xu c962c2602a dependencies: pkg/cloudprovider 2016-11-23 15:53:09 -08:00
Kubernetes Submit Queue f4738ff575 Merge pull request #35883 from justinsb/aws_strong_volumetype
Automatic merge from submit-queue

AWS: strong-typing for k8s vs aws volume ids
2016-11-05 02:29:17 -07:00
Justin Santa Barbara 3cdbfc98af AWS: strong-typing for k8s vs aws volume ids
We are more liberal in what we accept as a volume id in k8s, and indeed
we ourselves generate names that look like `aws://<zone>/<id>` for
dynamic volumes.

This volume id (hereafter a KubernetesVolumeID) cannot directly be
compared to an AWS volume ID (hereafter an awsVolumeID).

We introduce types for each, to prevent accidental comparison or
confusion.

Issue #35746
2016-11-02 09:42:55 -04:00
Kubernetes Submit Queue 674b770a20 Merge pull request #35066 from justinsb/typo_attachment
Automatic merge from submit-queue

Fix typo: attachement -> attachment
2016-11-02 03:07:45 -07:00
Jing Xu abbde43374 Add sync state loop in master's volume reconciler
At master volume reconciler, the information about which volumes are
attached to nodes is cached in actual state of world. However, this
information might be out of date in case that node is terminated (volume
is detached automatically). In this situation, reconciler assume volume
is still attached and will not issue attach operation when node comes
back. Pods created on those nodes will fail to mount.

This PR adds the logic to periodically sync up the truth for attached volumes kept in the actual state cache. If the volume is no longer attached to the node, the actual state will be updated to reflect the truth. In turn, reconciler will take actions if needed.

To avoid issuing many concurrent operations on cloud provider, this PR
tries to add batch operation to check whether a list of volumes are
attached to the node instead of one request per volume.

More details are explained in PR #33760
2016-10-28 09:24:53 -07:00
Mike Danese 3b6a067afc autogenerated 2016-10-21 17:32:32 -07:00
Justin Santa Barbara c53d62a554 Fix typo: attachement -> attachment 2016-10-18 17:52:48 -04:00
Doug Davis 9d5bac6330 Change minion to node
Contination of #1111

I tried to keep this PR down to just a simple search-n-replace to keep
things simple.  I may have gone too far in some spots but its easy to
roll those back if needed.

I avoided renaming `contrib/mesos/pkg/minion` because there's already
a `contrib/mesos/pkg/node` dir and fixing that will require a bit of work
due to a circular import chain that pops up. So I'm saving that for a
follow-on PR.

I rolled back some of this from a previous commit because it just got
to big/messy. Will follow up with additional PRs

Signed-off-by: Doug Davis <dug@us.ibm.com>
2016-09-28 10:53:30 -07:00
Kubernetes Submit Queue b1e8c9fc13 Merge pull request #29491 from justinsb/aws_deprecate_orempty
Automatic merge from submit-queue

AWS: Deprecate a few functions in favor of aws-sdk-go
2016-09-28 03:01:39 -07:00
Kubernetes Submit Queue c20965c652 Merge pull request #33067 from justinsb/better_aws_logging
Automatic merge from submit-queue

Better AWS logging around volumes
2016-09-28 00:20:56 -07:00
Kubernetes Submit Queue 4b4e8ad6a7 Merge pull request #33569 from justinsb/fix_31127
Automatic merge from submit-queue

AWS: Add log line when we're updating ELB attributes
2016-09-27 22:58:20 -07:00
Justin Santa Barbara 5bfd15e49e AWS: include ELB name in health-check logging
Makes more supportable
2016-09-27 11:20:42 -04:00
Justin Santa Barbara 54309acd84 AWS: Add log line when we're updating ELB attributes
We want to be sure that reflect.DeepEqual doesn't give false positives

Fix #31127
2016-09-27 11:19:19 -04:00
Justin Santa Barbara 310423a4f9 AWS: more information in volume log messages 2016-09-27 11:10:40 -04:00
Justin Santa Barbara 54195d590f Use strongly-typed types.NodeName for a node name
We had another bug where we confused the hostname with the NodeName.

To avoid this happening again, and to make the code more
self-documenting, we use types.NodeName (a typedef alias for string)
whenever we are referring to the Node.Name.

A tedious but mechanical commit therefore, to change all uses of the
node name to use types.NodeName

Also clean up some of the (many) places where the NodeName is referred
to as a hostname (not true on AWS), or an instanceID (not true on GCE),
etc.
2016-09-27 10:47:31 -04:00
Jan Safranek 9903b389b3 Update cloud providers 2016-09-15 10:33:57 +02:00
Justin Santa Barbara 3688dc4a72 AWS: More robust volume-mount poll
When we are mounting a lot of volumes, we frequently hit rate limits.

Reduce the frequency with which we poll the status; introduces a bit of
latency but probably matches common attach times pretty closely, and
avoids causing rate limit problems everywhere.

Also, we now poll for longer, as when we timeout, the volume is in an
indeterminate state: it may be about to complete.  The volume controller
can tolerate a slow attach/detach, but it is harder to tolerate the
indeterminism.

Finally, we ignore a sequence of errors in DescribeVolumes (up to 5 in a
row currently).  So we will eventually return an error, but a one
off-failure (e.g. due to rate limits) does not cause us to spuriously
fail.
2016-09-14 16:47:53 -04:00
Kubernetes Submit Queue 61dda4d34a Merge pull request #31773 from pigmej/typos_englishify_some_pkgs
Automatic merge from submit-queue

Typos and englishify pkg/cloudprovider + pkg/dns + pkg/kubectl

**What this PR does / why we need it**: Just fixed some typos + "englishify" in pkg/cloudprovider + pkg/dns + pkg/kubectl

**Which issue this PR fixes** : None

**Special notes for your reviewer**: It's just fixes typos

**Release note**: `NONE`
2016-09-05 11:10:09 -07:00
Kubernetes Submit Queue 130051b2d9 Merge pull request #31090 from justinsb/fix_29324
Automatic merge from submit-queue

AWS: fix volume device assignment race condition

* Move volume attachment map to cloud level
* Perform sanity check after volume attach, to double-check everything is right
2016-09-02 16:19:57 -07:00
Jedrzej Nowak 9e2abd4b02 Fix various typos in pgk/cloudprovider,dns,kubectl 2016-08-31 18:56:52 +02:00
Justin Santa Barbara 7e5c6877d7 Fixes per code review 2016-08-25 22:51:04 -04:00
Kubernetes Submit Queue 49ff2e8831 Merge pull request #31115 from jsafrane/add-constants
Automatic merge from submit-queue

Add constants and documentation around AWS magic numbers

Also, bumped max IOPS/GB to 50, it changed from 30 since last time I checked.

Source: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

@kubernetes/sig-storage
2016-08-24 12:59:50 -07:00
Justin Santa Barbara 6a1f892c1d AWS: Sanity checks after volume attach
In the light of issue #29324, double check that the volume was attached
correctly where we expect it, before returning.

Issue #29324
2016-08-24 13:00:38 -04:00
Justin Santa Barbara 81240da858 AWS: move volume attachment map to cloud level
The problem is that attachments are now done on the master, and we are
only caching the attachment map persistently for the local instance.  So
there is now a race, because the attachment map is cleared every time.

Issue #29324
2016-08-24 13:00:33 -04:00
Jan Safranek 8cd5e263b8 Fix AWS reporting "The parameter KmsKeyId requires the parameter Encrypted to be set."
- use aws.String/Int/Bool functions
- don't set the key to empty string, use nil instead
2016-08-24 10:05:07 +02:00
Kubernetes Submit Queue bfafb6f425 Merge pull request #30695 from krancour/manage-elb-attributes
Automatic merge from submit-queue

AWS: More ELB attributes via service annotations

Replaces #25015 and addresses all of @justinsb's feedback therein. This is a new PR because I was unable to reopen #25015 to amend it.

I noticed recently that there is existing (but undocumented) precedent for the AWS cloud provider to manage ELB-specifc load balancer configuration based on service annotations.  In particular, one can _already_ designate an ELB as "internal" or enable PROXY protocol.

This PR extends this capability to the management of ELB attributes, which includes the following items:
* Access logs:
    * Enabled / disabled
    * Emit interval
    * S3 bucket name
    * S3 bucket prefix
* Connection draining:
    * Enabled / disabled
    * Timeout
* Connection:
    * Idle timeout
* Cross-zone load balancing:
    * Enabled / disabled

Some of these are possibly more useful than others.  Use cases that immediately come to mind:

* Enabling cross-zone load balancing is potentially useful for "Ubernetes Light," or anyone otherwise attempting to spread worker nodes around multiple AZs.
* Increasing idle timeout is useful for the benefit of anyone dealing with long-running requests. An example I personally care about would be git pushes to Deis' builder component.
2016-08-22 10:24:12 -07:00
Jan Safranek a596668de7 Add constants and documentation aroung AWS magic numbers
Also, remove check for IOPS per GB, AWS checks it on its own.
2016-08-22 15:30:47 +02:00
Kubernetes Submit Queue 364d696fd5 Merge pull request #30563 from knarz/master
Automatic merge from submit-queue

AWS: Support HTTP->HTTP mode for ELB

**What this PR does / why we need it**:

Right now it is not possible to create an AWS ELB that listens for HTTP and where the backend pod also listens for HTTP.
I asked @justinsb in slack and he said that this seems to be an oversight, so I'd like to use this PR as a step towards solving this.

**Special notes for your reviewer**:

I've only added a simple unit test. Are any integration tests needed? I'm not familiar with the code base.

cc @therc
2016-08-22 00:54:44 -07:00
markturansky 9a2645aa5e add encryption to aws provisioner and cloud provider 2016-08-18 15:42:44 -04:00
Jan Safranek 4b97db202c AWS changes for new provisioning model 2016-08-18 10:36:49 +02:00
Kent Rancourt 96dad1f0f3 Add support for managing ELB attributes with service annotations 2016-08-16 13:07:49 -04:00
Sascha Hanse 9a111fffc8 enables the aws-load-balancer-backend-protocol annotion to be used without a cert to be able to create an HTTP->HTTP ELB 2016-08-13 02:30:35 +02:00
Kubernetes Submit Queue 2537f66f0e Merge pull request #29230 from luxas/goimport
Automatic merge from submit-queue

Run goimport for the whole repo

While removing GOMAXPROC and running goimports, I noticed quite a lot of other files also needed a goimport format. Didn't commit `*.generated.go`, `*.deepcopy.go` or files in `vendor`

This is more for testing if it builds.
The only strange thing here is the gopkg.in/gcfg.v1 => github.com/scalingdata/gcfg replace.
cc @jfrazelle @thockin
2016-08-05 16:22:01 -07:00
Rohith 0da5f50b03 - fixing the spelling mistakes 2016-08-04 10:17:59 +01:00
Lucas Käldström c88a07ce1a Run goimports 2016-08-02 15:12:39 +03:00
Cole Mickens 2ebffb431d implement azure cloudprovider 2016-07-26 14:50:33 -07:00
Cole Mickens 6ad9dc659f add clusterName to Loadbalancer methods 2016-07-26 14:50:33 -07:00
saadali 89fd358c52 Assume volume detached if node doesn't exist
Fixes #29358
2016-07-22 22:07:32 -07:00
Justin Santa Barbara aa9f2b2cda AWS: Deprecate a few functions in favor of aws-sdk-go
We have a few functions that predate aws-sdk-go, but they have natural
equivalents in aws-sdk-go.  Document them as deprecated, and replace
the implementation with the equivalent in aws-sdk-go to make it obvious
that they are the same.
2016-07-22 22:08:20 -04:00
k8s-merge-robot a0da4153b6 Merge pull request #29260 from lixiaobing10051267/masterErr
Automatic merge from submit-queue

Modify err output format from %s to %v

t.Errorf err output format should be %v
2016-07-20 11:11:46 -07:00
k8s-merge-robot 60f9ce8a41 Merge pull request #29253 from lixiaobing10051267/masterLBname
Automatic merge from submit-queue

format number not consistent with real variable number

glog.Infof format number not consistent with real variable number, should add %s for second var because loadBalancerName is string:
func (c *Cloud) ensureLoadBalancer(namespacedName types.NamespacedName, loadBalancerName string, ...
2016-07-20 11:11:42 -07:00
lixiaobing10051267 e3bff25dbb Modify err output format from %s to %v 2016-07-20 15:06:47 +08:00
k8s-merge-robot a3110dcb41 Merge pull request #28417 from kevensen/awszonefix
Automatic merge from submit-queue

AWS: Added experimental option to skip zone check

This pull request resolves #28380.  In the vast majority of cases, it is appropriate to validate the AWS region against a known set of regions.  However, there is the edge case where this is undesirable as Kubernetes may be deployed in an AWS-like environment where the region is not one of the known regions.

By adding the optional **DisableStrictZoneCheck true** to the **[Global]** section in the aws.conf file (e.g. /etc/aws/aws.conf) one can bypass the ragion validation.
2016-07-19 21:03:28 -07:00
lixiaobing10051267 6de7bc0085 format number not consistent with real variable number 2016-07-20 11:14:45 +08:00
Kenneth D. Evensen d69fe11c09
Fixing gofmt errors 2016-07-19 16:50:55 -04:00
Davanum Srinivas ee8507a5ae Use Infof/Warningf when appropriate
When we use a format string, we should use Infof/Warningf instead
of Info/Warning
2016-07-19 12:10:53 -04:00
lixiaobing10051267 1a01308356 glog.Warning output content not complete, lack of string fomat "%s" 2016-07-19 20:35:52 +08:00
Quinton Hoole 791dd215d2 Deprecate the term "Ubernetes" in favor of "Cluster Federation" and "Multi-AZ Clusters" 2016-07-06 15:42:56 -07:00
Kenneth D. Evensen 274411b94e
Adding comments 2016-07-02 16:54:41 -04:00
Kenneth D. Evensen 7e4af9a66b
Added flag to skip zone check 2016-07-02 05:58:25 -04:00
David McMahon ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
Rudi Chiarito 8db551f674 golint fixes for aws cloudprovider 2016-06-24 17:06:38 -04:00
k8s-merge-robot 07471cf90f Merge pull request #27553 from justinsb/pvc_zone_spreading_2
Automatic merge from submit-queue

AWS/GCE: Spread PetSet volume creation across zones, create GCE volumes in non-master zones

Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.
    
We hash the volume name so we don't bias to the first few zones.
    
If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset.  In that case we hash
the base name.
2016-06-22 01:22:16 -07:00
Justin Santa Barbara 404c501c0d AWS: Add missing error check for #27774
There was an error check missing, which seems likely to have caused 27774

Issue #27774
2016-06-21 15:37:18 -04:00
Justin Santa Barbara dd94997619 Add comments & misc review fixes
Lots of comments describing the heuristics, how it fits together and the
limitations.

In particular, we can't guarantee correct volume placement if the set of
zones is changing between allocating volumes.
2016-06-21 15:22:16 -04:00
k8s-merge-robot 554b7010fa Merge pull request #27677 from justinsb/fix_24254
Automatic merge from submit-queue

AWS: Enable ICMP Type 3 Code 4 for ELBs

This enables MTU discovery.

Fixes #24254
2016-06-20 11:07:40 -07:00
k8s-merge-robot 536ed2843e Merge pull request #27628 from justinsb/fix_27534
Automatic merge from submit-queue

AWS volumes: Use /dev/xvdXX names with EC2

We are using HVM style names, which cannot be paravirtual style names.

See
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html

This also fixes problems introduced when moving volume mounting to KCM.

Fix #27534
2016-06-19 13:17:09 -07:00
Justin Santa Barbara fddc9d61fa AWS: Enable ICMP Type 3 Code 4 for ELBs
This enables MTU discovery.

Fixes #24254
2016-06-18 21:52:10 -04:00
Justin Santa Barbara e711cbf912 GCE/AWS: Spread PetSet volume creation across zones
Long term we plan on integrating this into the scheduler, but in the
short term we use the volume name to place it onto a zone.

We hash the volume name so we don't bias to the first few zones.

If the volume name "looks like" a PetSet volume name (ending with
-<number>) then we use the number as an offset.  In that case we hash
the base name.

Fixes #27256
2016-06-17 23:27:31 -04:00
goltermann 218645b346 Fix several spelling errors in comments. 2016-06-17 10:41:18 -07:00
Justin Santa Barbara 3af950f8f4 AWS volumes: Use /dev/xvdXX names with EC2
We are using HVM style names, which cannot be paravirtual style names.

See
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html

This also fixes problems introduced when moving volume mounting to KCM.

Fix #27534
2016-06-17 13:09:26 -04:00
Rudi Chiarito e29709df73 AWS: cache values from getInstancesByNodeName() 2016-06-11 13:46:06 -04:00
k8s-merge-robot 4793372a85 Merge pull request #25888 from rootfs/attacher-aws-cinder
Automatic merge from submit-queue

implement EBS and Cinder attacher/detacher 

follow up with #21709

@kubernetes/sig-storage
2016-06-10 05:39:22 -07:00
Huamin Chen d1e0a13924 support AWS and Cinder attacher
Signed-off-by: Huamin Chen <hchen@redhat.com>
2016-06-08 12:56:24 +00:00
Rudi Chiarito 4ff9e9319f AWS: support mixed plaintext/encrypted ports in ELBs
Fixes #26268

Implements the second SSL ELB annotation, per #24978

service.beta.kubernetes.io/aws-load-balancer-ssl-ports=* (or e.g. https)

If not specified, all ports are secure (SSL or HTTPS).
2016-06-07 18:39:53 -04:00
Andrew Williams 01d9cddda5 Add Amazon ELB proxy protocol support
Add ELB proxy protocol support via the annotation
"service.beta.kubernetes.io/aws-load-balancer-proxy-protocol". This
allows servers like Nginx and Haproxy to retrieve the real IP address of
a remote client.
2016-05-31 10:33:16 -05:00
Minhan Xia a1bd33f510 promote sourceRange into service spec 2016-05-26 10:42:30 -07:00
Rudi Chiarito ca8699e83e Fix long-standing bug in aws.stringSetToPointers
Instead of N pointers, we were returning N null pointers, followed by the real
thing. It's not clear why we didn't trip on this until now, maybe there is a
new server-side check for empty subnetID strings.
2016-05-26 10:56:37 -04:00
k8s-merge-robot d33fa39abf Merge pull request #23254 from jsafrane/devel/ulimited-aws-devices
Automatic merge from submit-queue

AWS: Move enforcement of attached AWS device limit from kubelet to scheduler

Limit of nr. of attached EBS volumes to a node is now enforced by scheduler. It can be adjusted by `KUBE_MAX_PD_VOLS` env. variable there. Therefore we don't need the same check in kubelet. If the system admin wants to attach more, we should allow it.

Kubelet limit is now 650 attached volumes ('ba'..'zz').

Note that the scheduler counts only *pods* assigned to a node. When a pod is deleted and a new pod is scheduled on a node, kubelet start (slowly) detaching the old volume and (slowly) attaching the new volume. Depending on AWS speed **it may happen that more than KUBE_MAX_PD_VOLS volumes are actually attached to a node for some time!** Kubelet will clean it up in few seconds / minutes (both attach/detach is quite slow).

Fixes #22994
2016-05-19 06:13:42 -07:00
k8s-merge-robot 24c46acd16 Merge pull request #24369 from Clarifai/ecr
Automatic merge from submit-queue

AWS: Allow cross-region image pulling with ECR

Fixes #23298
Definitely should be in the release notes; should maybe get merged in 1.2 along with #23594 after some soaking. Documentation changes to follow.

cc @justinsb @erictune @rata @miguelfrde

This is step two. We now create long-lived, lazy ECR providers in all regions.
When first used, they will create the actual ECR providers doing the work
behind the scenes, namely talking to ECR in the region where the image lives,
rather than the one our instance is running in.

Also:
- moved the list of AWS regions out of the AWS cloudprovider and into the
credentialprovider, then exported it from there.
- improved logging

Behold, running in us-east-1:

```
aws_credentials.go:127] Creating ecrProvider for us-west-2
aws_credentials.go:63] AWS request: ecr:GetAuthorizationToken in us-west-2
aws_credentials.go:217] Adding credentials for user AWS in us-west-2
Successfully pulled image "123456789012.dkr.ecr.us-west-2.amazonaws.com/test:latest"
```

*"One small step for a pod, one giant leap for Kube-kind."*

<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24369)
<!-- Reviewable:end -->
2016-05-13 15:15:45 -07:00
Rudi Chiarito 6e6ea46182 Include changes from feedback
Use constructor for ecrProvider
Rename package to "credentials" like golint requests
Don't wrap the lazy provider with a caching provider
Add immedita compile-time interface conformance checks for the interfaces
Added comments
2016-05-10 12:03:40 -04:00
Rudi Chiarito eea29e8851 Allow cross-region image pulling with AWS' ECR
This is step two. We now create long-lived, lazy ECR providers in all regions.
When first used, they will create the actual ECR providers doing the work
behind the scenes, namely talking to ECR in the region where the image lives,
rather than the one our instance is running in.

Also:

- moved the list of AWS regions out of the AWS cloudprovider and into the
credentialprovider, then exported it from there.
- improved logging

Behold, running in us-east-1:

```
aws_credentials.go:127] Creating ecrProvider for us-west-2
aws_credentials.go:63] AWS request: ecr:GetAuthorizationToken in us-west-2
aws_credentials.go:217] Adding credentials for user AWS in us-west-2
Successfully pulled image 123456789012.dkr.ecr.us-west-2.amazonaws.com/test:latest"
```

*"One small step for a pod, one giant leap for Kube-kind."*
2016-05-10 12:03:39 -04:00
Rudi Chiarito 59334408a6 Change default when no BE proto given, add test for that
Also improve error message when BE proto is wrong
2016-05-10 11:53:44 -04:00
Rudi Chiarito e19c069b9d Add comment, rename getListener to buildListener 2016-05-10 11:40:34 -04:00
Rudi Chiarito 898df1f52b Fix API fields to use new int32 sizes 2016-05-02 19:20:50 -04:00
Rudi Chiarito 61471965d8 Split annotation in two 2016-05-02 19:18:02 -04:00
Rudi Chiarito 7b7dd7861f Add support for HTTPS->HTTP ELB listeners through annotations
Moved listener creation to a separate function, which had the nice
side effect of allowing tests (added eight cases).
2016-05-02 19:18:01 -04:00
k8s-merge-robot 928990730e Merge pull request #24457 from leokhoa/master
Automatic merge from submit-queue

AWS: Add support for ap-northeast-2 region (Seoul)

This PR does:
- Support AWS Seoul region: ap-northeast-2. 
Currently, I can not setup Kubernetes on AWS Seoul.  
Error Messages: 

> 
> ip-10-0-0-50 core # docker logs 0697db
> I0419 07:57:44.569174       1 aws.go:466] Zone not specified in configuration file; querying AWS metadata service
> F0419 07:57:44.570380       1 controllermanager.go:279] Cloud provider could not be initialized: could not init cloud provider "aws": not a valid AWS zone (unknown region): ap-northeast-2a
2016-05-02 09:46:47 -07:00
Khoa Le ce771effc6 Added AWS Seoul region 2016-04-19 15:20:39 +07:00
goltermann c226c9435b Fix misspellings in comments.
https://goreportcard.com/report/k8s.io/kubernetes#misspell
2016-04-14 13:57:45 -07:00
k8s-merge-robot 0c06f31cb8 Merge pull request #23340 from justinsb/fix_23339
Auto commit by PR queue bot
2016-03-29 05:04:27 -07:00
k8s-merge-robot 4e4ad61260 Merge pull request #23366 from goltermann/vet
Auto commit by PR queue bot
2016-03-24 21:50:56 -07:00
Jan Safranek e4dc6709de Remove limit of attached AWS devices from kubelet.
Limit of nr. of attached EBS volumes to a node is now enforced by scheduler. It
can be adjusted by KUBE_MAX_PD_VOLS env. variable there.

Therefore we don't need the same check in kubelet. If the system admin wants to
attach more, we should allow it.

Kubelet limit is now 650 attached volumes ('ba'..'zz').
2016-03-23 12:07:16 +01:00
Chris Batey and James Ravn be9ce30897 Change LoadBalancer methods to take api.Service
This is a better abstraction than passing in specific pieces of the
Service that each of the cloudproviders may or may not need. For
instance, many of the providers don't need a region, yet this is passed
in. Similarly many of the providers want a string IP for the load
balancer, but it passes in a converted net ip. Affinity is unused by
AWS. A provider change may also require adding a new parameter which has
an effect on all other cloud provider implementations.

Further, this will simplify adding provider specific load balancer
options, such as with labels or some other metadata. For example, we
could add labels for configuring the details of an AWS elastic load
balancer, such as idle timeout on connections, whether it is
internal or external, cross-zone load balancing, and so on.

Authors: @chbatey, @jsravn
2016-03-23 10:48:11 +00:00
goltermann 34d4eaea08 Fixing several (but not all) go vet errors. Most are around string formatting, or unreachable code. 2016-03-22 17:26:50 -07:00
Justin Santa Barbara 59013f5507 AWS: Fix problems with >2 security groups
The previous logic was incorrect; if we saw two untagged security groups
before seeing the first tagged security, we would incorrectly return an
error.

Fix #23339
2016-03-22 13:00:14 -04:00
k8s-merge-robot 8c02a46c4d Merge pull request #22486 from thockin/update-gcfg-dep
Auto commit by PR queue bot
2016-03-21 18:47:21 -07:00
k8s-merge-robot 0a28a38110 Merge pull request #22280 from justinsb/fix_error_message_formats
Auto commit by PR queue bot
2016-03-21 16:53:38 -07:00
Tim Hockin a073c80e45 Use newer home for gcfg package
Switch from obsolete "github.com/scalingdata/gcfg" to "gopkg.in/gcfg.v1".
2016-03-16 08:42:08 -07:00
Jan Safranek f270cb1b9b Allow 39 atached EBS devices.
AWS has soft support limit for 40 attached EBS devices. Assuming there is just
one root device, use the rest for persistent volumes.

The devices will have name /dev/xvdba - /dev/xvdcm, leaving /dev/sda - /dev/sdz
to the system.

Also, add better error handling and propagate error
"Too many EBS volumes attached to node XYZ" to a pod.
2016-03-15 17:28:59 +01:00
k8s-merge-robot bf1bb5d309 Merge pull request #22899 from justinsb/aws_fix_e2e
Auto commit by PR queue bot
2016-03-14 17:37:13 -07:00
Justin Santa Barbara 3e12e8b7cc AWS e2e: don't try to build a full cloudprovider in e2e
We have previously tried building a full cloudprovider in e2e for AWS;
this wasn't the best idea, because e2e runs on a different machine than
normal operations, and often doesn't even run in AWS.  In turn, this
meant that the cloudprovider had to do extra work and have extra code,
which we would like to get rid of.  Indeed, I got rid of some code which
tolerated not running in AWS, and this broke e2e.
2016-03-12 06:14:45 -05:00
k8s-merge-robot 45064e19d1 Merge pull request #22793 from justinsb/fix_22792
Auto commit by PR queue bot
2016-03-11 20:37:25 -08:00
k8s-merge-robot 26e309c753 Merge pull request #22784 from justinsb/fix_17626
Auto commit by PR queue bot
2016-03-11 20:04:13 -08:00
k8s-merge-robot de7193a095 Merge pull request #22788 from justinsb/fix_22786
Auto commit by PR queue bot
2016-03-11 15:13:06 -08:00
Justin Santa Barbara e40595fa57 AWS volumes: Release disk from attaching map on error
If AWS gives us an actual error (vs just timing out), we know the disk
did not attach, and so we can remove it immediately from the attaching
map.
2016-03-11 11:40:39 -05:00
Justin Santa Barbara 16730aba96 AWS: Tag created EBS volumes with our cluster tag
Fix #22792
2016-03-10 08:13:50 -05:00
Justin Santa Barbara 79b2b7edef AWS EBS: Remove the attached volumes cache
There are known issues with the attached-volume state cache that just aren't
possible to fix with the current interface.

Replace it with a map of the active attach jobs (that was the original
requirement, to avoid a nasty race condition).

This costs us an extra DescribeInstance call on attach/detach, but that
seems worth it if it ends this class of bugs.

Fix #15073
2016-03-10 07:50:35 -05:00
Justin Santa Barbara 0921af4aca AWS: Don't pass empty filters to AWS requests
It gives an error: `The filter 'null' is invalid`

Instead of a zero-length filter list, provide a nil value.

Fix #22786
2016-03-10 07:22:31 -05:00
Justin Santa Barbara 7c82fe7389 AWS: Increase timeout deleting ELB; log remaining security groups
Either ELB is slow to delete (in which case the bumped timeout will
help), or the security groups are otherwise blocked (in which case
logging them will help us track this down).

Fix #17626
2016-03-10 06:57:13 -05:00
Justin Santa Barbara cb818a01d0 AWS: Fix some error messages
Some error messages had incorrect spacing.  Prefer Warningf to Warning,
and fix some of those problems.
2016-03-08 06:29:29 -05:00
Justin Santa Barbara 02e79b9e52 AWS: If we have no subnets, bail out early
We know the ELB call will fail, so we error out early rather than
hitting the API.  Preserves rate limit quota, and also allows us to give
a more self-evident message.

Fix #21993
2016-03-06 09:41:29 -05:00
Justin Santa Barbara 5cf837452b AWS: Fix problems identifying subnets for internal ELBs
We tacitly supported this before, but we broke this with the
public-subnet detection.

Fix #22527
2016-03-06 09:41:29 -05:00
Justin Santa Barbara 43e6602c42 AWS: Fix test failure introduced by rebase 2016-03-05 08:11:30 -05:00
Justin Santa Barbara cff564b1a6 AWS: Remove dead code and fix up comments 2016-03-05 08:09:40 -05:00
Justin Santa Barbara f8e6098e4d AWS: Update tests for refactoring 2016-03-05 08:09:40 -05:00
Justin Santa Barbara af9efa02b4 AWS: Remove getSelfAWSInstance, use field directly
Now that we always populate the local instance, we don't need a getter.
2016-03-05 08:09:40 -05:00
Justin Santa Barbara 8c492c7536 AWS: Don't store the AZ on the cloud
Now we have Ubernetes-Lite, an AWSCloud can span multiple AZs.
2016-03-05 08:09:40 -05:00
Justin Santa Barbara ddb5072a54 AWS: Don't pretend getSelfAWSInstance can return an error
It can't any more; this simplifies calling code.
2016-03-05 08:09:40 -05:00
Justin Santa Barbara 40d0afbb1b AWS: Capture VPC ID into AWSCloud, avoiding requeries
By storing the VPC ID on AWSCloud, we avoid the need to requery or to
pass it around.
2016-03-05 08:09:40 -05:00
Justin Santa Barbara 00b666f853 AWS: Rename getInfo -> describeInstance/describeVolume
Makes it clearer that we are making an AWS API call
2016-03-05 08:09:40 -05:00
Justin Santa Barbara efa68a3590 AWS: Build awsInstance as part of cloud provider creation
We need getSelfAWSInstance to be working anyway; we might as well build
it early, and then we can use its methods to extract e.g. the VPC ID
2016-03-05 08:09:40 -05:00
Justin Santa Barbara 0375fa057f AWS: Refactor newAWSInstance
Now that we can't build an awsInstance from metadata, because of the
PrivateDnsName issue, we might as well simplify the arguments.

Create a 'placeholder' method though - newAWSInstanceFromMetadata - that
documents the desire to use metadata, shows how we would get it, but
links to the bug which explains why we can't use it.
2016-03-05 08:09:40 -05:00
k8s-merge-robot a80f6a7ea6 Merge pull request #21905 from justinsb/aws_wrap_security_group_error
Auto commit by PR queue bot
2016-03-05 01:25:38 -08:00
k8s-merge-robot 264f5786ea Merge pull request #21987 from justinsb/fixx_21895
Auto commit by PR queue bot
2016-03-04 17:07:57 -08:00
k8s-merge-robot 40778f3d2c Merge pull request #22194 from chbatey/testify-for-aws-provider
Auto commit by PR queue bot
2016-03-04 02:29:32 -08:00
Justin Santa Barbara 62e34da125 AWS: Remove extra load balancer security group ingress rules
The ingress CIDRs are going to be dynamic, and in general we don't want
to leave old ingress rules around.

Fix #21895
2016-03-03 18:55:49 -05:00
k8s-merge-robot de72b6be1b Merge pull request #21907 from justinsb/load_balancer_source_ranges
Auto commit by PR queue bot
2016-03-03 14:10:47 -08:00
Justin Santa Barbara cb92133dfa LB Source Ranges: Move validation into API layer
Had to move other things around too to avoid a weird api ->
cloudprovider dependency.

Also adding fixes per code reviews.

(This is a squash of the previously approved commits)
2016-03-03 10:27:17 -05:00
k8s-merge-robot d81d823ca5 Merge pull request #22393 from eparis/blunderbuss
Auto commit by PR queue bot
2016-03-02 18:51:56 -08:00
Eric Paris 5e5a823294 Move blunderbuss assignees into tree 2016-03-02 20:46:32 -05:00
Christopher Batey aebd4c95e1 Use testify for AWS provider
This has two main advantages:

* The use of the mock package to verify API calls against the aws SDK
* Nicer error messages for asserts without having to use if statements
2016-03-01 14:32:45 +00:00
k8s-merge-robot 6e6550a105 Merge pull request #21989 from justinsb/fix_21986
Auto commit by PR queue bot
2016-03-01 03:51:43 -08:00
Justin Santa Barbara 49e1149227 AWS: Add support for load balancer source ranges
This refactors #21431 to pull a lot of the code into cloudprovider so it
can be reused by AWS.

It also changes the name of the annotation to be non-GCE specific:
service.beta.kubernetes.io/load-balancer-source-ranges

Fix #21651
2016-02-29 19:32:08 -05:00
James Ravn f568b6511a Handle aws implicit and shared routing tables
Fix the AWS subnet lookup that checks if a subnet is public, which was
missing a few cases:

- Subnets without explicit routing tables, which use the main VPC
  routing table.
- Routing tables not tagged with KubernetesCluster. The filter for this
  is now removed.
2016-02-25 22:52:26 +00:00
Justin Santa Barbara 1cdfc9ad84 AWS: Find the correct security group by looking at tags
Like everything else AWS, we differentiate between k8s-owned security
groups and k8s-not-owned security groups using tags.

When we are setting up the ingress rule for ELBs, pick the security
group that is tagged over any others.

We continue to tolerate a single security group being untagged, but
having multiple security groups without tagging is now an error, as it
leads to undefined behaviour.

We also log at startup if the cluster tag is not defined.

Fix #21986
2016-02-25 11:20:58 -05:00
k8s-merge-robot 2a58c0062d Merge pull request #17913 from jtblin/jtblin/17912-pick-public-subnets
Auto commit by PR queue bot
2016-02-24 23:48:15 -08:00
Justin Santa Barbara e50ae40301 AWS: Wrap AWS error when failing to create security group ingress
All AWS errors should be wrapped in a user-friendly error before
returning.
2016-02-24 14:13:44 -05:00
k8s-merge-robot 9c1d8bf99d Merge pull request #21399 from sky-uk/disable-ingress-sg
Auto commit by PR queue bot
2016-02-24 00:05:47 -08:00
James Ravn and Yoseph Samuel 9f62e81be5 Disable aws node security group ingress creation
Add aws cloud config:

    [global]
    disableSecurityGroupIngress = true

The aws provider creates an inbound rule per load balancer on the node
security group. However, this can quickly run into the AWS security
group rule limit of 50.

This disables the automatic ingress creation. It requires that the user
has setup a rule that allows inbound traffic on kubelet ports from the
local VPC subnet (so load balancers can access it). E.g.  `10.82.0.0/16
30000-32000`.

Limits: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html#vpc-limits-security-groups

Authors: @jsravn, @balooo
2016-02-23 15:24:50 +00:00
Chris Batey, James Ravn and Yoseph Samuel 087ff78cf9 Only find running aws hosts by nodename
When finding instance by node name in AWS, only retrieve running
instances.  Otherwise terminated, old nodes can show up with the same
tag when rebuilding nodes in the cluster.

Another improvement made is to filter instances by the node names
provided, rather than selecting all instances and filtering in code.

Authors: @jsravn, @chbatey, @balooo
2016-02-23 14:47:16 +00:00
k8s-merge-robot 9470a7e61c Merge pull request #21627 from justinsb/fix_11324
Auto commit by PR queue bot
2016-02-23 03:45:10 -08:00
Justin Santa Barbara 7e69426b8b AWS: Only warn about nameless instances if they are running
Otherwise this is just confusing logspam.

Fix #20912
2016-02-22 10:27:13 -05:00
k8s-merge-robot 84891cabad Merge pull request #19335 from justinsb/aws_delay_when_requestlimitexceeded
Auto commit by PR queue bot
2016-02-21 18:38:18 -08:00
Justin Santa Barbara f8af47b645 AWS: Pass globals into Backoff struct
Thanks for the suggestion bprashanth!
2016-02-21 20:17:22 -05:00
k8s-merge-robot 2e3053a204 Merge pull request #21431 from freehan/sourcerange
Auto commit by PR queue bot
2016-02-21 16:14:42 -08:00
Justin Santa Barbara b269e8f43c AWS: Delay all AWS calls when we observe RequestLimitExceeded errors
This applies a cross-request time delay when we observe
RequestLimitExceeded errors, unlike the default library behaviour which
only applies a *per-request* backoff.

Issue #12121
2016-02-21 15:09:42 -05:00
Justin Santa Barbara 22d719018a AWS: Recover if tags missing on security group
In the AWS API (generally) we tag things we create, and then we filter
to find them.  However, creation & tagging are typically two separate
calls.  So there is a chance that we will create an object, but fail to
tag it.

We fix this (done here in the case of security groups, but we can do
this more generally) by retrieving the resource without a tag filter.
If the retrieved resource has the correct tags, great.  If it has the
tags for another cluster, that's a problem, and we raise an error.  If
it has no tags at all, we add the tags.

This only works where the resource is uniquely named (or we can
otherwise retrieve it uniquely).  For security groups, the SG name comes
from the service UUID, so that's unique.

Fixes #11324
2016-02-20 14:58:19 -05:00
k8s-merge-robot f08a8f23c1 Merge pull request #20959 from justinsb/fix_20911
Auto commit by PR queue bot
2016-02-20 00:56:53 -08:00
k8s-merge-robot 6be4417aff Merge pull request #17649 from jtblin/jtblin/17647-aws-elb-creation-no-tags
Auto commit by PR queue bot
2016-02-18 19:41:15 -08:00
Minhan Xia 7ffb123abe add source range support for loadbalancer on gce 2016-02-18 17:05:02 -08:00
Jerome Touffe-Blin 62a8b3d44c Fix #17912 - pick public subnets only on ELB creation 2016-02-16 23:16:38 +11:00
Rudi Chiarito da4ba61232 Fix AWS IPPermission check for case with preexisting groups and ranges 2016-02-15 11:41:18 -05:00
Rudi Chiarito b3863eae82 Add instance-type label to cloud providers
Fully implemented for AWS and GCE
2016-02-12 15:02:03 -05:00
Justin Santa Barbara a0093eb503 e2e: Don't try to create a UDP LoadBalancer on AWS
AWS doesn't support type=LoadBalancer with UDP services.  For now, we
simply skip over the test with type=LoadBalancer on AWS for the UDP
service.

Fix #20911
2016-02-10 12:31:18 -05:00
k8s-merge-robot 9520bb5ddf Merge pull request #20731 from Clarifai/ensure-lb-servicename
Auto commit by PR queue bot
2016-02-10 01:25:07 -08:00
k8s-merge-robot 2ec49efd54 Merge pull request #19945 from Clarifai/fix-formatting
Auto commit by PR queue bot
2016-02-09 16:05:00 -08:00
David Pratt 57782459e6 Log missing public IP from AWS metadata. 2016-02-09 12:06:02 -06:00
David Pratt adde7f548f Fix AWS kubelet registration.
This commit allows the AWS cloud provider plugin to work on EC2 instances
that do not have a public IP. The EC2 metadata service returns a 404 for the
'public-ipv4' endpoint for private instances, and the plugin was bubbling this
up as a fatal error.
2016-02-09 11:46:53 -06:00
Rudi Chiarito 5874b0cb9d Pass namespaced service name to cloudprovider's EnsureLoadBalancer
Also has an AWS implementation that plugs the service name into the ELB and SG.
Log the service name under GCE and OpenStack.
Fixes #20668
2016-02-09 06:50:53 -05:00
k8s-merge-robot 318705feb9 Merge pull request #20378 from jtblin/jtblin/11543-aws-cloud-provider-private-hosted-dns-zone-via-api
Auto commit by PR queue bot
2016-02-08 23:43:59 -08:00
k8s-merge-robot fec0d127b3 Merge pull request #15938 from justinsb/aws_ebs_cleanup
Auto commit by PR queue bot
2016-02-08 21:42:52 -08:00
k8s-merge-robot 257c3ad776 Merge pull request #20153 from sky-uk/fix-sg-comparison
Auto commit by PR queue bot
2016-02-06 12:25:26 -08:00
Jerome Touffe-Blin 0a36db19bf Fix #11543 - use DescribeInstances API to retrieve the correct private DNS name 2016-02-04 08:48:25 +11:00
Justin Santa Barbara f61a5d0400 AWS: Switch arguments to AttachDisk/DetachDisk to match GCE 2016-02-03 20:43:23 +00:00
Justin Santa Barbara 6c87a4be7c AWS: Handle deleting volume that no longer exists
The tests in particular double-delete volumes, so we need to handle this
graciously.
2016-02-03 20:43:14 +00:00
Justin Santa Barbara 1ae1db6027 AWS: Update copy-paste of GCE PD code to latest version
We are (sadly) using a copy-and-paste of the GCE PD code for AWS EBS.
This code hasn't been updated in a while, and it seems that the GCE code
has some code to make volume mounting more robust that we should copy.
2016-02-03 20:43:14 +00:00
Rudi Chiarito a0831a2378 Mass fix of Infof and co. missing the trailing "f", even when formatting placeholders are used 2016-02-03 11:34:59 -05:00
k8s-merge-robot 5a6cf15c09 Merge pull request #19874 from justinsb/aws_fix_findinstancesbynodenames
Auto commit by PR queue bot
2016-02-01 21:38:46 -08:00
k8s-merge-robot 3db1a6c3ce Merge pull request #19976 from aarondav/elb-timeout
Auto commit by PR queue bot
2016-01-30 18:57:56 -08:00
James Ravn c3383b3422 Fix formatting of aws_test.go 2016-01-26 15:13:25 +00:00
James Ravn b4bbc3b3ef Fix removal of aws security group ingress
The ip permission method now checks for containment, not equality, so
order of parameters matter. This change fixes
`removeSecurityGroupIngress` to pass in the removal permission first to
compare against the existing permission.
2016-01-26 13:57:52 +00:00
Dogan Narinc and James Ravn 190c829ac5 Fix AWS ip permission comparison on security group
Change isEqualIPPermission to consider the entire list of security group
ids on when checking if a security group id has already been added.

This is used for example when adding and removing ingress rules to the
cluster nodes from an elastic load balancer. Without this, once there
are multiple load balancers, the method as it stands incorrectly returns
false even if the security group id is in the list of group ids. This
causes a few problems: dangling security groups which fill up an
account's limit since they don't get removed, and inability to recreate
load balancers in certain situations (receiving an
InvalidPermission.Duplicate from AWS when adding the same security
group).
2016-01-26 11:30:27 +00:00
Rudi Chiarito 76e29ed455 Register ECR credential plugin only when an AWS cloud instance is created 2016-01-25 22:18:45 -05:00
Aaron Davidson 97689c326d Reduce healthy threshold and check interval for Amazon ELBs
According to AWS, the ELB healthy threshold is "Number of consecutive health check successes before declaring an EC2 instance healthy." It has an unusual interaction with Kubernetes, since all nodes will enter either an unhealthy state or a healthy state together depending on the service's healthiness as a whole.

We have observed that if our service goes down for the unhealthy threshold (which is 2 checks at 30 second intervals = 60 seconds), then the ELB will stop serving traffic to all nodes in the cluster, and will wait for the healthy threshold (currently 10 * 30 = 300 seconds) AFTER the service is restored to add back the cluster nodes, meaning it remains unreachable for an extra 300 seconds.

With the new settings, the ELB will continue to timeout dead nodes after 60 seconds, but will restore healthy nodes after 20 seconds. The minimum value for healthyThreshold is 2, and the minimum value for interval is 5 seconds. I went for 10 seconds instead of the minimum sort of arbitrarily because I was not sure how much this value may affect the scalability of clusters in EC2, as it does put some extra load on the kube-proxy.
2016-01-23 11:10:37 -08:00
Zach Loafman 7189db3701 Merge pull request #19396 from justinsb/aws_mountdevice
AWS: Use a strongly typed mountDevice
2016-01-22 11:04:23 -08:00
Alex Mohr f64a40f315 Merge pull request #19863 from justinsb/aws_fix_loadbalancer_tcp_check
AWS: Eliminate assumptions about all load-balancer ports matching
2016-01-21 10:52:20 -08:00
Alex Mohr b8f8a62775 Merge pull request #19862 from justinsb/aws_fix_tcploadbalancer_comment
AWS: Fix comment to reflect new method name
2016-01-21 10:51:37 -08:00
Alex Mohr 8e9bebb424 Merge pull request #19861 from justinsb/aws_remove_dead_code
AWS: Remove dead code
2016-01-21 10:51:06 -08:00
Alex Mohr 819cdc85d3 Merge pull request #19390 from danielschonfeld/optimize-kubelet-heartbeat-aws
optimize kubelet heartbeats instance data to be cached for AWS
2016-01-21 10:33:00 -08:00
Justin Santa Barbara 9f44c72ba9 AWS: findInstancesByNodeNames should not make O(N) API calls
findInstancesByNodeNames was a simple loop around
findInstanceByNodeName, which made an EC2 API call for each call.

We've had trouble with this sort of behaviour hitting EC2 rate limits on
bigger clusters (e.g. #11979).

Instead, change this method to fetch _all_ the tagged EC2 instances, and
then loop through the local results.  This is one API call (modulo
paging).

We are currently only using findInstancesByNodeNames for the load
balancer, where we attach every node, so we were fetching all but one of
the instances anyway.

Issue #11979
2016-01-20 12:37:22 -05:00
Justin Santa Barbara 0586d866de AWS: Eliminate assumptions about all load-balancer ports matching
It costs us basically nothing to just check all the ports, and
protects us against future changes to the controller.
2016-01-20 09:30:46 -05:00
Justin Santa Barbara 12dd568662 AWS: Fix comment to reflect new method name
TCPLoadBalancer.EnsureTCPLoadBalancer => LoadBalancer.EnsureLoadBalancer
2016-01-20 09:28:28 -05:00
Justin Santa Barbara 30882265b6 AWS: Remove dead code
I think I added these functions by mistake; they aren't used and
apparently never were.
2016-01-20 09:23:53 -05:00
Daniel Schonfeld 12cbc9ff89 optimize ExternalId() and InstanceId() to returned cached results directly from the metadata service 2016-01-13 23:36:57 -05:00
k8s-merge-robot 37b5726716 Merge pull request #14431 from Defensative/UDP-LB
Auto commit by PR queue bot
2016-01-08 12:39:02 -08:00
k8s-merge-robot e0e305c6be Merge pull request #19337 from danielschonfeld/optimize-list-routes
Auto commit by PR queue bot
2016-01-08 10:19:47 -08:00
Justin Santa Barbara 03900b1dc9 AWS: Use a strongly typed mountDevice
We've had problems in the past from using a string with passing the
wrong value when detaching; stronger typing would have caught this for
us.
2016-01-08 00:25:11 -05:00
Daniel Schonfeld 24c44e7a8e optimize ListRoutes to fetch instances only once per call
Issue #12121 - fixes courtesy of @justinsb - thank you
2016-01-07 14:32:37 -05:00