Commit Graph

442 Commits (865321c2d69d249d95079b7f8e2ca99f5430d79e)

Author SHA1 Message Date
Fabio Bertinatto 5abe207eef Add metric for throttled requests in AWS 2018-05-02 12:35:37 +02:00
Fabio Bertinatto 81bf821d69 Capitalize acronyms in AWS metrics-related code 2018-04-30 15:50:18 +02:00
Jesse Haka de967b717d PR #59323, fix bug and remove one api call, add node util dependency to cloud controller 2018-04-22 20:32:26 +03:00
Kubernetes Submit Queue 2196b0a074
Merge pull request #61391 from hanxiaoshuai/fixtodo0320
Automatic merge from submit-queue (batch tested with PRs 60476, 62462, 61391, 62535, 62394). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Use function aws.Int64Value replace of deprecated function orZero

**What this PR does / why we need it**:

```
// orZero returns the value, or 0 if the pointer is nil
// Deprecated: prefer aws.Int64Value
func orZero(v *int64) int64 {
    return aws.Int64Value(v)
}
```
Use function aws.Int64Value replace of deprecated function orZero and remove unused orZero .

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2018-04-13 11:20:17 -07:00
Anup Navare 1335e6e2d4 Cleanup the use of ExternalID as it is deprecated
The patch removes ExternalID usage from node_controller
and node_lifecycle_oontroller. The code instead uses InstanceID
which returns the cloud provider ID as well.
2018-04-02 10:15:32 -07:00
Mikhail Mazurskiy 468655b76a
Use typed events client directly 2018-04-01 18:57:29 +10:00
hangaoshuai f78c7d3db0 Use function aws.Int64Value replace of deprecated function orZero 2018-03-20 17:27:20 +08:00
Kubernetes Submit Queue 3d60b3cd67
Merge pull request #60490 from jsafrane/fix-aws-delete
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Volume deletion should be idempotent

- Describe* calls should return `aws.Error` so caller can handle individual errors. `aws.Error` already has enough context (`"InvalidVolume.NotFound: The volume 'vol-0a06cc096e989c5a2' does not exist"`)
- Deletion of already deleted volume should succeed.


**Release note**:


Fixes: #60778

```release-note
NONE
```

/sig storage
/sig aws

/assign @justinsb @gnufied
2018-03-05 12:42:22 -08:00
wackxu f737ad62ed update import 2018-02-27 20:23:35 +08:00
Jan Safranek 38c0ce75c3 Volume deletion should be idempotent
- Describe* calls should return aws.Error so caller can handle individual
  errors. aws.Error already has enough context ("InvalidVolume.NotFound: The
  volume 'vol-0a06cc096e989c5a2' does not exist")
- Deletion of already deleted volume should succeed.
2018-02-27 09:46:38 +01:00
Kubernetes Submit Queue 3ca89a3469
Merge pull request #60125 from vainu-arto/aws-missing-tags-error
Automatic merge from submit-queue (batch tested with PRs 60435, 60334, 60458, 59301, 60125). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Return missing ClusterID error instead of ignoring it

This fixes issue #57382. In the cases I'm aware of kubelet cannot function if it can't detect the cluster it is running in, so the error should be passed up to the caller preventing initialization when kubelet would fail. This way the error can be detected and kubelet startup attempted again later (giving AWS time to apply the tags).

```release-note
On AWS kubelet returns an error when started under conditions that do not allow it to work (AWS has not yet tagged the instance).
```
2018-02-26 17:48:54 -08:00
Kubernetes Submit Queue 98b1c79e2b
Merge pull request #59756 from tsmetana/refactor-describe-volume
Automatic merge from submit-queue (batch tested with PRs 57326, 60076, 60293, 59756, 60370). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix #59601: AWS: Check error code returned from describeVolume

The errors returned by the describeVolume call are not all equal:
if the error is of InvalidVolume.NotFound type it does not necessarily
mean the desired operation cannot be finished successfully.

Fixes #59601

```release-note
NONE
```
2018-02-26 09:20:49 -08:00
Arto Jantunen cba110aa3d Return missing ClusterID error instead of ignoring it
This fixes issue #57382.
2018-02-26 14:50:58 +02:00
Arto Jantunen a58f16bdfa Add clusterid tags to the instances in AWS tests
In practice these were in most cases required to exist, but kubelet did not
previously enforce this. It now does, so these tests need to change a bit.
2018-02-26 14:50:58 +02:00
Kubernetes Submit Queue 62c5f21d5d
Merge pull request #58767 from 2rs2ts/tag-elb-sgs
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Tag Security Group created for AWS ELB with same additional tags as ELB

/sig aws

(I worked on this with @bkochendorfer)

Tags the SG created for the ELB with the same additional tags the ELB gets from the `service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags` annotation. This is useful for identifying orphaned resources.

We think that reusing the annotation is a simpler and less intrusive approach than adding a new annotation, and most users will want the same set of tags applied.

We weren't sure how to write a test for this because it looks like the fake EC2 code doesn't store the state of the security groups. If new tests are a requirement for merging, we'll need help writing them.

Fixes #53489

```release-note
AWS Security Groups created for ELBs will now be tagged with the same additional tags as the ELB (i.e. the tags specified by the "service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags" annotation.)
```
2018-02-25 11:59:53 -08:00
Kubernetes Submit Queue d1f3de9a39
Merge pull request #57569 from micahhausler/nlb-remove-fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix AWS NLB delete error

**What this PR does / why we need it**:

Fixes an error when deleting an NLB in AWS

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #57568

**Special notes for your reviewer**:


**Release note**:

```release-note
Fixes an error when deleting an NLB in AWS - Fixes #57568
```

@justinsb  How do I get this into the `release-1.9` branch?
2018-02-25 11:07:07 -08:00
Kubernetes Submit Queue 96ec318718
Merge pull request #59842 from ixdy/update-rules_go-02-2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

 Update bazelbuild/rules_go, kubernetes/repo-infra, and gazelle dependencies

**What this PR does / why we need it**: updates our bazelbuild/rules_go dependency in order to bump everything to go1.9.4. I'm separating this effort into two separate PRs, since updating rules_go requires a large cleanup, removing an attribute from most build rules.

**Release note**:

```release-note
NONE
```
2018-02-19 22:23:05 -08:00
Kubernetes Submit Queue 6d0b71740f
Merge pull request #59968 from kubernetes/revert-59323-nodetaint
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Revert "add node shutdown taint"

Reverts kubernetes/kubernetes#59323

Node becomes unready, but is never removed. I've found the following in [kube-controller-manager.log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-autoscaling/6055/artifacts/bootstrap-e2e-master/cluster-autoscaler.log) from test run for one such node:

`E0216 01:14:27.084923       1 node_lifecycle_controller.go:686] Error determining if node bootstrap-e2e-minion-group-01b1 shutdown in cloud: failed to get instance ID from cloud provider: instance not found`

This goes on for the rest of the run (~6h). Looks like the node is stuck in Unready state because of this check: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/nodelifecycle/node_lifecycle_controller.go#L684. Previously, there was no such check and the node was removed.

Reverting as this would affect all users attempting to resize their node groups on GCE.

```release-note
NONE
```
2018-02-16 20:12:56 -08:00
Andrew Garrett 39f46806b7 Don't assume SG is for ELB; pass tags directly 2018-02-16 22:00:22 +00:00
Jeff Grafton ef56a8d6bb Autogenerated: hack/update-bazel.sh 2018-02-16 13:43:01 -08:00
Kubernetes Submit Queue 0e81651e77
Merge pull request #59909 from jsafrane/volumemanager-approvers
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add jsafrane as AWS approver.

**What this PR does / why we need it**:
I contrinbuted several PRs in AWS storage and I'm willing to share review/approval duty.

**Release note**:

```release-note
NONE
```

/assign @justinsb
2018-02-16 06:02:01 -08:00
Aleksandra Malinowska 2d54ba3e0f
Revert "add node shutdown taint" 2018-02-16 12:24:27 +01:00
Bryce Carman 3b99e1b487 Add AWS cloud provider option for IAM role
Currently the AWS cloud provider uses the EC2 instance role when
interacting with AWS APIs. This change gives the option to provide and IAM
role that the cloud provider will assume before calling the APIs. All
resources created by the role will be owned by that account instead of
the account where the EC2 instance is running.
2018-02-15 21:14:58 -08:00
Kubernetes Submit Queue 27daaab224
Merge pull request #59323 from zetaab/nodetaint
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

add node shutdown taint

**What this PR does / why we need it**: we need node stopped taint in order to detach volumes immediately without waiting timeout. More info in issue ticket #58635 

**Which issue(s) this PR fixes** 
Fixes #58635

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
2018-02-15 09:52:10 -08:00
Jan Safranek 161972272c Add jsafrane as AWS approver. 2018-02-14 17:41:18 +01:00
Kubernetes Submit Queue 757c24d224
Merge pull request #57969 from jsafrane/aws-approver
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add gnufied as AWS approver.

@gnufied has been maintaining the storage part of AWS cloud provider for a long while and he deserves to be approver.

```release-note
NONE
```

/sig aws
2018-02-12 19:41:02 -08:00
Tomas Smetana 6832d3a056 Fix #59601: AWS: Check error code returned from describeVolume
The errors returned by the describeVolume call are not all equal:
if the error is of InvalidVolume.NotFound type it does not necessarily
mean the desired operation cannot be finished successfully.

Fixes #59601
2018-02-12 17:15:11 +01:00
Kubernetes Submit Queue 30f54b4028
Merge pull request #56974 from gnufied/speed-up-attach-detach
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Make AWS attach/detach operations faster

Most attach/detach operations on AWS finish within 1-4seconds.
By using a shorter time interval and higher exponetial
factor  we can shorten time taken for attach and detach to complete. 

After this change retry interval looks like:

```
[1, 1.8, 3.24, 5.832000000000001, 10.4976]
```

Before it was:
```
[10, 12.0, 14.399999999999999, 17.279999999999998]
```


/sig aws

```release-note
AWS: Make attach/detach operations faster. from 10-12s to 2-6s
```
2018-02-08 20:24:10 -08:00
Tomas Smetana c879e531f9 AWS: Do not ignore errors from EC2::DescribeVolumee in DetachDisk
The DetachDisk method of AWS cloudprovider indirectly calls
EC2::DescribeVolume AWS API function to check if the volume
being detached is really attached to the specified node.

The AWS API call may fail and return error which is logged however
the DetachDisk then finishes successfully. This may cause the AWS
volumes to remain attached to the instances forever because the
attach/detach controller will mark the volume as attached. The PV
controller will never be able to delete those disks and they need
to be detached manually.

This patch ensures on error from DescribeVolume is propagated to
attach/detach controller and the detach operation is re-tried.
2018-02-08 13:51:53 +01:00
Jesse Haka 3cf5b172fa add node shutdown taint
shutdowned -> stopped

use shutdown everywhere

use patch in taints api call

use notimplemented in clouds use AddOrUpdateTaintOnNode

correct log text

add fake cloud

try to fix bazel

add shutdown tests

add context
2018-02-08 12:56:06 +02:00
Walter Fender e18e8ec3c0 Add context to all relevant cloud APIs
This adds context to all the relevant cloud provider interface signatures.
Callers of those APIs are currently satisfied using context.TODO().
There will be follow on PRs to push the context through the stack.
For an idea of the full scope of this change please look at PR #58532.
2018-02-06 12:49:17 -08:00
Andrew Garrett 20a682b643 Tag Security Group created for AWS ELB with same additional tags as ELB
Reuse the
service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags
annotation for these tags.

We think that this is a simpler and less intrusive approach than adding
a new annotation, and most users will want the same set of tags applied.

Signed-off-by: Brett Kochendorfer <brett.kochendorfer@gmail.com>
2018-01-24 20:16:45 +00:00
Kubernetes Submit Queue eff07b6f55
Merge pull request #56759 from aledbf/fix-nlb-icmp
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix NLB icmp permission duplication

**What this PR does / why we need it**:

Fixes an issue with the ICMP rule for MTU during the creation of a NLB

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

Fixes #56703
2018-01-09 12:20:29 -08:00
Jan Safranek 1a2e651a1c Add gnufied as AWS approver. 2018-01-08 14:21:24 +01:00
Kubernetes Submit Queue 21a15c5673
Merge pull request #50923 from rowleyaj/typo_fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix typo in comment
2018-01-07 09:03:45 -08:00
Andrew Pilloud a57d492713 Move DefaultMaxEBSVolumes constant into scheduler
A constant only used by the scheduler lives in the aws cloudprovider
package. Moving the constant into the only package where it is used
reduces import bloat.
2018-01-03 08:19:38 -08:00
Jeff Grafton efee0704c6 Autogenerate BUILD files 2017-12-23 13:12:11 -08:00
Micah Hausler 4fa5be320b Fix AWS NLB delete error 2017-12-22 17:10:36 -05:00
Kubernetes Submit Queue 2ae99cf146
Merge pull request #56955 from feiskyer/scrub-dns
Automatic merge from submit-queue (batch tested with PRs 56997, 57008, 56984, 56975, 56955). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove unused ScrubDNS interface from cloudprovider

**What this PR does / why we need it**:

DNS scrubber from kubelet has been removed in #36785 and cloudprovider's `ScrubDNS()` interface is not used anywhere.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #56953.

**Special notes for your reviewer**:

**Release note**:

```release-note
Remove ScrubDNS interface from cloudprovider.
```
2017-12-16 15:23:54 -08:00
chrislovecnm 4a16f16af4 Adding myself as a reviewer to aws cloud provider 2017-12-13 22:42:07 -07:00
Manuel de Brito Fontes bea2d8d1aa Fix NLB icmp permission duplication 2017-12-13 20:31:09 -03:00
Hemant Kumar 0e6a541036 Make AWS attach/detach operations faster
Most attach/detach operations on AWS finish within 1-4seconds.
By using a shorter time interval and higher exponetial
factor  we can shorten time taken for attach and detach to complete.
2017-12-08 15:28:58 -05:00
Pengfei Ni 65efeee64f Remove unused ScrubDNS interface from cloudprovider 2017-12-08 16:03:56 +08:00
Hemant Kumar 514f219c22 cloud-provider needs cluster-role to apply taint to the node
When volume is stuck in attaching state on AWS, cloud-provider
needs to taint the node. But the node can not be tainted
without proper access.
2017-12-04 10:57:21 -05:00
Justin Santa Barbara 8bfb676378 AWS: Support for mounting nvme volumes 2017-11-30 14:48:33 -05:00
Hemant Kumar ac2c68ad8f AWS: Implement fix for detaching volume from stopped instances
Clean up detach disk functions and remove duplication
2017-11-23 11:02:09 -05:00
Hemant Kumar 8c49d1db02 Implement disk resizing for AWS
Update bazel files
2017-11-22 21:38:54 -05:00
Kubernetes Submit Queue 6b1b6d504a
Merge pull request #56024 from dimpavloff/aws-elb-set-hc-params
Automatic merge from submit-queue (batch tested with PRs 56211, 56024). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

allow ELB Healthcheck configuration via Service annotations

**What this PR does / why we need it**:
The default settings which are set on the ELB HC work well but there are cases when it would be better to tweak its parameters -- for example, faster detection of unhealthy backends. This PR makes it possible to override any of the healthcheck's parameters via annotations on the Service, with the exception of the Target setting which continues to be inferred from the Service's spec.

**Release note**:
```release-note
It is now possible to override the healthcheck parameters for AWS ELBs via annotations on the corresponding service. The new annotations are `healthy-threshold`, `unhealthy-threshold`, `timeout`, `interval` (all prefixed with `service.beta.kubernetes.io/aws-load-balancer-healthcheck-`)
```
2017-11-22 08:48:43 -08:00
Kubernetes Submit Queue 63d4b85bf4
Merge pull request #53400 from micahhausler/aws-nlb
Automatic merge from submit-queue (batch tested with PRs 54316, 53400, 55933, 55786, 55794). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add Amazon NLB support

**What this PR does / why we need it**:

This adds support for AWS's NLB for `LoadBalancer` services.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

Fixes #52173

**Special notes for your reviewer**:

This is NOT yet ready for merge, but I'd love any feedback before it is.

This requires at least `v1.10.40` of the [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go), which is not yet included in Kubernetes. Per @justinsb, I'm waiting on possibly #48314 to update to `v1.10.40`  or some other PR. 

I tried to make the change as easy to review as possible, so some LoadBalancer logic is duplicated in the `if isNLB(annotations)` blocks. I can refactor that and sprinkle more `isNLB()` switches around, but it might be harder to view the diff.

**Other Notes:**

* NLB's subnets cannot be modified after creation (maybe look for public subnets in all AZ's?).  Currently, I'm just using `c.findELBSubnets()`
* Health check uses TCP with all the NLB default values. I was thinking HTTP health checks via annotation could be added later. Should that go into this PR?
* ~~`externalTrafficPolicy`/`healthCheckNodePort` are ignored. Should those be implemented for this PR?~~
* `externalTrafficPolicy` and subsequent `healthCheckNodePort` are handled properly. This may come with uneven load balancing, as NLB doesn't support weighted backends.
* With classic ELB, you have a security group the ELB is inside of to associate Instance (k8s node) SG rules with a LoadBalancer (k8s Service), but NLB's don't have a security group. Instead, I use the `Description` field on [`ec2.IpRange`](https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#IpRange) with the following annotations. Is this ok? I couldn't think of another way to associate SG rule to the NLB
    * Node SG gets an rule added for VPC cidr on NodePort for Health Check with annotation in description `kubernetes.io/rule/nlb/health=<loadBalancerName>`
    * Node SG gets an rule added for `loadBalancerSourceRanges` to  NodePort for client traffic with annotation in description `kubernetes.io/rule/nlb/client=<loadBalancerName>`
    * **Note: if `loadBalancerSourceRanges` is unspecified, this opens instance security groups to traffic from `0.0.0.0/0` on the service's nodePorts**
* Respects internal annotation
* Creates a TargetGroup per frontend port: simplifies updates when you have same backend port for multiple front end ports.
* Does not (yet) verify that we're under the NLB limits in terms of # of listeners
* `UpdateLoadBalancer()`  basically just calls `EnsureLoadBalancer` for NLB's. Is this ok?

**Areas for future improvement or optimization**:

* A new annotation indicating a new security group should be created for NLB traffic and instances would be placed in this new SG. (Could bump up against the default limit of 5 SG's per instance)
* Only create a client health check security group rule when the VPC cidr is not a subset of `spec.loadBalancerSourceRanges`
* Consolidate TargetGroups if a service has 2+ frontend ports and the same nodePort.
* A new annotation for specifying TargetGroup Health Check options.

**Release note**:

```release-notes
Add Amazon NLB support - Fixes #52173
```

ping @justinsb @bchav
2017-11-21 15:04:25 -08:00
Kubernetes Submit Queue d1e711a6af
Merge pull request #55307 from xiangpengzhao/fix-aws-panic
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Check if SleepDelay of AWS request is nil before sign.

**What this PR does / why we need it**:

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #55309

**Special notes for your reviewer**:
/cc @justinsb 

**Release note**:

```release-note
NONE
```
2017-11-21 06:47:30 -08:00