Commit Graph

47278 Commits (09747e6bee78fb8b71356309d6ede84142aee9eb)

Author SHA1 Message Date
Kubernetes Submit Queue 09747e6bee Merge pull request #44510 from bowei/gce-metrics
Automatic merge from submit-queue (batch tested with PRs 44124, 44510)

Add metrics to all major gce operations (latency, errors)

```release-note
Add metrics to all major gce operations {latency, errors}

The new metrics are:

  cloudprovider_gce_api_request_duration_seconds{request, region, zone}
  cloudprovider_gce_api_request_errors{request, region, zone}
 
`request` is the specific function that is used.
`region` is the target region (Will be "<n/a>" if not applicable)
`zone` is the target zone (Will be "<n/a>" if not applicable)

Note: this fixes some issues with the previous implementation of
metrics for disks:
- Time duration tracked was of the initial API call, not the entire
  operation.
- Metrics label tuple would have resulted in many independent
  histograms stored, one for each disk. (Did not aggregate well).
```
2017-04-27 16:14:58 -07:00
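The two fixes noted in the commit above (timing the entire operation, and keeping the label tuple small so that series aggregate) can be illustrated with a minimal Python sketch. It is not the Go code from the PR: the in-memory dictionaries stand in for a real metrics library, and `observe`, `create_disk`, and the request name `disk_insert` are hypothetical names chosen for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

NOT_APPLICABLE = "<n/a>"

# Observed durations, keyed by the (request, region, zone) label tuple.
durations = defaultdict(list)
# Error counts, keyed by the same label tuple.
errors = defaultdict(int)


@contextmanager
def observe(request, region=NOT_APPLICABLE, zone=NOT_APPLICABLE):
    """Time the whole wrapped operation and count it as an error if it raises."""
    labels = (request, region, zone)
    start = time.monotonic()
    try:
        yield
    except Exception:
        errors[labels] += 1
        raise
    finally:
        durations[labels].append(time.monotonic() - start)


def create_disk(name, zone):
    # The disk name is deliberately not a label, so all disks share one series.
    with observe("disk_insert", zone=zone):
        pass  # hypothetical: issue the API call AND wait for the operation to finish


create_disk("disk-a", "us-central1-b")
create_disk("disk-b", "us-central1-b")
print(len(durations))  # number of distinct label tuples recorded
```

Because the disk name is not part of the key, both calls land in the same series, which is the aggregation property the commit message describes.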
Kubernetes Submit Queue 684df6e421 Merge pull request #44124 from vmware/VSANPolicySupportPVCScaleCreationFix
Automatic merge from submit-queue (batch tested with PRs 44124, 44510)

Optimize the time taken to create Persistent volumes with VSAN storage capabilities at scale and handle VPXD crashes

Currently, creating persistent volumes with VSAN storage capabilities at scale takes a very long time. We have tested at a scale of 500-600 PVCs, and it takes a long time for all the PVC requests to go from the Pending state to the Bound state.

- In our current design we use a single systemVM - "kubernetes-helper-vm" as a means to create a persistent volume with the VSAN policy configured. 

- Since all the operations are on a single system VM, all requests at scale get queued and executed serially on this system VM. Because of this, creating a large number of PVCs takes a very long time.

- Since it's a single system VM, parallel PVC requests most of the time tend to take the same SCSI adapter on the system VM and also the same unit number on that SCSI adapter. Therefore the error rate is high.

In order to overcome these issues and to optimize the time taken to create persistent volumes with VSAN storage capabilities at scale, we have slightly modified the design as described below:

- In this model, we create a VM on the fly for every persistent volume that is being created. Since the reconfigure operations to create a disk with the VSAN policy configured each run on their own VM, all of these PVC requests execute in parallel, independently of one another.

- With this new design, there will be no such errors at all.

Also, we have overcome the problem of vpxd crashes and any other intermediate problems by checking the type of the errors.

Fixes https://github.com/vmware/kubernetes/issues/122, https://github.com/vmware/kubernetes/issues/124

@kerneltime  @tusharnt @divyenpatel @pdhamdhere

**Release note**:

```release-note
None
```
2017-04-27 16:14:56 -07:00
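The design change in the commit above (one VM per volume instead of one shared helper VM) is essentially a move from a serial queue to independent parallel work items. A rough Python sketch of that shape follows. It is an illustration only: `create_volume_on_dedicated_vm` is a hypothetical stand-in for the real vSphere calls, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def create_volume_on_dedicated_vm(pvc_name):
    """Hypothetical stand-in for: create a temporary VM, attach a disk with
    the VSAN policy, then clean the VM up. Returns (pvc_name, vm_name)."""
    # Each request gets its own VM, so requests never compete for the
    # same SCSI adapter or unit number.
    vm_name = f"temp-vm-{pvc_name}"
    return pvc_name, vm_name


def create_volumes(pvc_names, max_workers=8):
    """Run one independent creation task per PVC in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(create_volume_on_dedicated_vm, pvc_names))


result = create_volumes([f"pvc-{i}" for i in range(5)])
print(result["pvc-0"])
```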
Kubernetes Submit Queue c2595909e9 Merge pull request #44966 from a-robinson/insecure
Automatic merge from submit-queue

Fix cockroachdb statefulset test read/write commands

Explicitly specifying `--insecure` is required on insecure clusters,
which started being enforced in a very recent release. In 2 weeks
we'll have a stable image version that we can reliably pin the
relevant statefulset yaml file to in order to avoid stupid failures
like this. I'm really sorry for the flakes!

**What this PR does / why we need it**:

It fixes the currently broken statefulset test suite - https://storage.googleapis.com/k8s-gubernator/triage/index.html?job=gci-gce-statefulset&test=CockroachDB

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:

N/A

**Special notes for your reviewer**:

N/A

**Release note**:

```release-note
NONE
```

@kow3ns
2017-04-27 14:56:35 -07:00
Kubernetes Submit Queue 963e056515 Merge pull request #45044 from juju-solutions/gkk/e2e-snap
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)

Update kubernetes-e2e charm to use snaps

**What this PR does / why we need it**:

This updates the kubernetes-e2e charm to use snaps instead of Juju resources for payload delivery.

The main advantage of this is that it decouples the charm from the e2e payload, allowing us to support multiple versions of Kubernetes with a single release of the charm.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Update kubernetes-e2e charm to use snaps
```
2017-04-27 13:27:09 -07:00
Kubernetes Submit Queue 8b9625d2ea Merge pull request #41627 from gyliu513/kubelet-types
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)

Improved code coverage for /pkg/kubelet/types

**What this PR does / why we need it**:
The test coverage for /pkg/kubelet/types was increased from 50% to 87.5%

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-04-27 13:27:06 -07:00
Kubernetes Submit Queue a3c4d9d603 Merge pull request #45039 from shyamjvs/report-metrics-grabber-error
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)

Log the error (if any) in e2e metrics gathering step

Because why not.

Ref https://github.com/kubernetes/kubernetes/issues/45038

cc @wojtek-t @gmarek
2017-04-27 13:27:04 -07:00
Kubernetes Submit Queue a984a7ed09 Merge pull request #44980 from csbell/sync-daemonset
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)

[Federation] Convert Daemonset to use the generic sync controller

To be rebased on master when @perotinus's configmaps PR merges.

Tested integration and e2e.
2017-04-27 13:27:02 -07:00
Kubernetes Submit Queue 8ab63dd9ea Merge pull request #42740 from mtaufen/tarball-cleanup
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)

Cleanup some of the tarball producing code for e2e node tests

This is some e2e node cleanup work I found sitting in a local branch while deleting old local git branches. It looks like it's still useful.
2017-04-27 13:27:00 -07:00
Bowei Du ee847ebf8a Add metrics to all major gce operations {latency, errors}
The new metrics are:

  cloudprovider_gce_api_request_duration_seconds{request, region, zone}
  cloudprovider_gce_api_request_errors{request, region, zone}

`request` is the specific function that is used.
`region` is the target region (Will be "<n/a>" if not applicable)
`zone` is the target zone (Will be "<n/a>" if not applicable)

Note: this fixes some issues with the previous implementation of
metrics for disks:
- Time duration tracked was of the initial API call, not the entire
  operation.
- Metrics label tuple would have resulted in many independent
  histograms stored, one for each disk. (Did not aggregate well).
2017-04-27 12:49:30 -07:00
Mike Danese 310c914a6d Merge pull request #45046 from mikedanese/fix-build
update libc to pick up some fixes
2017-04-27 12:20:49 -07:00
Kubernetes Submit Queue 33f51926f6 Merge pull request #45027 from MaciekPytel/ca_test_gcloud_log
Automatic merge from submit-queue (batch tested with PRs 41106, 44346, 44929, 44979, 45027)

Log error before failing in autoscaling e2e

The gcloud alpha command in e2e fails, but no useful information (error message) is logged.
2017-04-27 12:11:08 -07:00
Kubernetes Submit Queue 6b38c11dbe Merge pull request #44979 from shyamjvs/fix-metrics-json-filename
Automatic merge from submit-queue (batch tested with PRs 41106, 44346, 44929, 44979, 45027)

Make metrics filenames for e2e tests indicate the test better

Currently the json files with metrics for e2e tests are named by appending a timestamp of the test to the `SummaryKind`. Because of this, it took me some time to figure out which file corresponds to which e2e test. This changes the names to use the test name instead of the timestamp.

cc @wojtek-t @gmarek
2017-04-27 12:11:06 -07:00
Kubernetes Submit Queue 6251ff47c3 Merge pull request #44929 from liggitt/proxy-subresource-patch
Automatic merge from submit-queue (batch tested with PRs 41106, 44346, 44929, 44979, 45027)

Add PATCH to supported list of proxy subresource verbs

Follow up to #41421 for the proxy subresources

```release-note
The proxy subresource APIs for nodes, services, and pods now support the HTTP PATCH method.
```
2017-04-27 12:11:03 -07:00
Kubernetes Submit Queue 14a557b1a2 Merge pull request #44346 from mikedanese/build-static
Automatic merge from submit-queue (batch tested with PRs 41106, 44346, 44929, 44979, 45027)

bazel: statically link dockerized components
2017-04-27 12:11:00 -07:00
Mike Danese 7bb880de8d update libc 2017-04-27 11:59:18 -07:00
Kubernetes Submit Queue 98398d5d6e Merge pull request #41106 from spxtr/gen3
Automatic merge from submit-queue

Don't check in zz_generated.openapi.go.

`zz_generated.openapi.go` is the file that causes the most merge conflicts of all. In #33440, @thockin updated the makefile to support generating these files on demand, but that didn't play well with bazel/gazel.

In this PR, I add a new build macro that will generate this file with a `go_genrule`. I added support for keeping the BUILD file up to date in mikedanese/gazel#34.

**Release note**:
```release-note
NONE
```
2017-04-27 11:40:31 -07:00
Kubernetes Submit Queue a2eb8888fb Merge pull request #45031 from crassirostris/fluent-gcp-monitoring-fix
Automatic merge from submit-queue

Remove too verbose label from fluentd metrics

/cc @fabxc
2017-04-27 10:55:20 -07:00
Kubernetes Submit Queue 68850b87b1 Merge pull request #44549 from shashidharatd/federation-e2e
Automatic merge from submit-queue (batch tested with PRs 44591, 44549)

[Federation][e2e] Fix a failing federation e2e testcase in gce-serial

This is to fix the failing test case in federation [gce-serial](https://k8s-testgrid.appspot.com/cluster-federation#gce-serial) tests. The test case has been failing consistently since we registered the clusters in suite-init instead of doing it in every test case.
Instead of registering and then unregistering, we will now unregister and then register the cluster to federation. This test will be run in serial and will not affect other test cases.


**Release note**:
```
NONE
```
2017-04-27 10:54:57 -07:00
Kubernetes Submit Queue 549bd4b7d5 Merge pull request #44591 from ixdy/bazel-push-build
Automatic merge from submit-queue (batch tested with PRs 44591, 44549)

Update repo-infra bazel dependency and use new gcs_upload rule

This PR provides similar functionality to push-build.sh entirely within Bazel rules (though it relies on gsutil).

It's an alternative to #44306.

Depends on https://github.com/kubernetes/repo-infra/pull/13.

**Release note**:

```release-note
NONE
```
2017-04-27 10:54:56 -07:00
Shyam Jeedigunta 87bfad85b1 Log the error (if any) in e2e metrics gathering step 2017-04-27 18:37:52 +02:00
Kubernetes Submit Queue 493e4486b6 Merge pull request #45017 from MaciekPytel/ca_logging
Automatic merge from submit-queue

Update cluster-autoscaler logging config

Previously cluster-autoscaler would duplicate all logs,
writing to master /var/log and /tmp inside pod.
2017-04-27 09:23:12 -07:00
Kubernetes Submit Queue 120fd322bd Merge pull request #44804 from humblec/glusterfs-rearrange
Automatic merge from submit-queue (batch tested with PRs 44996, 44804)

Rearrange glusterfs artifacts to own directory.
2017-04-27 08:57:32 -07:00
Kubernetes Submit Queue dc92a6fcc7 Merge pull request #44996 from liggitt/token-test
Automatic merge from submit-queue

Update token controller test to test async retry

Fixes #44819

https://github.com/kubernetes/kubernetes/pull/44625 changed the token controller to queue a retry if the live service account's resourceVersion did not match our cache.

This updates the unit test that was testing that condition to test async queue behavior (which this condition now drives)
2017-04-27 08:39:04 -07:00
Mik Vyatskov 0625697dd5 Remove too verbose label from fluentd metrics 2017-04-27 17:16:25 +02:00
Christian Bell 15e81959e9 [Federation] Convert Daemonset to use the generic sync controller 2017-04-27 08:07:33 -07:00
Maciej Pytel adc1d6a428 Log error before failing in autoscaling e2e 2017-04-27 16:52:53 +02:00
Jordan Liggitt 3f4ded12be Add PATCH to supported list of proxy subresource verbs 2017-04-27 10:38:10 -04:00
Marcin Wielgus e36cca4b5d Update CHANGELOG.md for v1.5.7. 2017-04-27 08:07:09 -04:00
Kubernetes Submit Queue a2f40cafcf Merge pull request #44847 from crassirostris/sd-logging-e2e-timeout
Automatic merge from submit-queue

Increase timeout for Stackdriver Logging e2e tests

They're failing in CI, because Stackdriver Logging's List method is too slow for this purpose. Quick fix, should be gone completely when reading is implemented properly

/cc @piosz
2017-04-27 05:03:05 -07:00
Kubernetes Submit Queue f81e272483 Merge pull request #44993 from FengyunPan/Cleanup
Automatic merge from submit-queue

Cleanup storeToClusterConditionLister

ClusterConditionPredicate() has been deleted,
storeToClusterConditionLister will be unused.
2017-04-27 05:02:56 -07:00
Maciej Pytel b6574bd7b9 Update cluster-autoscaler logging config
Previously it would duplicate all logs,
writing to master /var/log and /tmp inside pod.
2017-04-27 13:32:32 +02:00
Shyam Jeedigunta 647a1563dc Make metrics filenames for e2e tests indicate the test 2017-04-27 13:03:39 +02:00
Kubernetes Submit Queue 0597a85f51 Merge pull request #41197 from aleksandra-malinowska/monitoring-test
Automatic merge from submit-queue

Add Stackdriver monitoring test
2017-04-27 03:44:22 -07:00
shashidharatd 814bb0b80f Fix a failing test case for cluster object creation/deletion 2017-04-27 14:42:19 +05:30
Aleksandra Malinowska 8c335ea4db Add monitoring test 2017-04-27 11:06:37 +02:00
Kubernetes Submit Queue 65838085b0 Merge pull request #43618 from xilabao/fix-kubectl-run-output
Automatic merge from submit-queue (batch tested with PRs 44970, 43618)

fix kubectl run output

fixes https://github.com/kubernetes/kubernetes/issues/40440
2017-04-26 22:58:13 -07:00
Kubernetes Submit Queue c3df35df7b Merge pull request #44970 from Random-Liu/fix-stop-container-timeout
Automatic merge from submit-queue (batch tested with PRs 44970, 43618)

CRI: Fix StopContainer timeout

Fixes https://github.com/kubernetes/kubernetes/issues/44956.

I verified this PR with the example provided in https://github.com/kubernetes/kubernetes/issues/44956, and now pod deletion will respect grace period timeout:
```
NAME                         READY     STATUS        RESTARTS   AGE
gracefully-terminating-pod   1/1       Terminating   0          6m
```

@dchen1107 @yujuhong @feiskyer /cc @kubernetes/sig-node-bugs
2017-04-26 22:58:11 -07:00
Jordan Liggitt f1207de4ea Update token controller test to test async retry 2017-04-27 00:33:27 -04:00
Kubernetes Submit Queue e885c77ffd Merge pull request #44986 from dashpole/fix_image_gc
Automatic merge from submit-queue

Allow Partial Success for ImageGC

Fixes #44951.  When the eviction manager is under disk pressure, it first attempts to reclaim disk space by deleting images.  However, if there are any errors during the image deletion process, the eviction manager treats that as a failed attempt to delete images, even if some were successfully deleted.

This change essentially makes the eviction manager ignore errors during image garbage collection, and instead rely solely on the quantity of resources reclaimed.  If image deletion completely fails, for example, then this should still work as it would return 0 bytes freed.  This allows for partial success, because any resources freed are counted, regardless of whether some images fail to be deleted.

This does not require any changes to the image manager, as the current behavior is already to return the disk space freed along with any errors.

```release-note
Fixes a bug where pods were evicted even after images are successfully deleted.
```

cc @dchen1107 @vishh @kubernetes/kubernetes-release-managers

note to reviewers: this is mostly whitespace changes, so it will make more sense in reviewable
2017-04-26 20:52:18 -07:00
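The decision rule described in the commit above (judge image GC by the bytes it freed, not by whether it returned an error) can be sketched in a few lines of Python. This is only an illustration of the rule, not the eviction manager's Go code: `delete_unused_images` is a hypothetical callable that returns a `(bytes_freed, error)` pair, mirroring the statement that the image manager already returns both.

```python
def reclaim_by_image_gc(delete_unused_images, bytes_needed):
    """Return True if image GC freed enough space, even if it reported errors."""
    bytes_freed, err = delete_unused_images()
    if err is not None:
        # Log and carry on: the byte count is still meaningful on partial failure.
        print(f"image GC reported an error: {err}")
    return bytes_freed >= bytes_needed


def partial_success():
    # Some images were deleted, one deletion failed.
    return 500, "failed to delete image X"


def total_failure():
    # Nothing could be deleted, so 0 bytes were freed.
    return 0, "failed to delete any image"


print(reclaim_by_image_gc(partial_success, 400))
print(reclaim_by_image_gc(total_failure, 400))
```

A complete failure needs no special case here: it reports 0 bytes freed, so the comparison alone tells the caller that more reclamation is still required.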
FengyunPan 7d4c66c5b5 Cleanup storeToClusterConditionLister
ClusterConditionPredicate() has been deleted,
storeToClusterConditionLister will be unused.
2017-04-27 11:51:26 +08:00
Kubernetes Submit Queue 2e7cc0222d Merge pull request #44935 from yifan-gu/fix_poll
Automatic merge from submit-queue (batch tested with PRs 44940, 44974, 44935)

apimachinery/pkg/util/wait: Fix potential goroutine leak in pollInternal().

**What this PR does / why we need it**:

Without the change, the wait function wouldn't exit until the timeout
happens, so if the timeout is set to a big value and the Poll() is run
inside a loop, then the total goroutines will increase indefinitely.

This PR fixes the issue by closing the stop channel to tell the wait function
to exit immediately if condition is true or any error happens.
2017-04-26 20:34:14 -07:00
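The leak and its fix in the commit above can be illustrated with a Python threading analogue. This is a sketch of the pattern only, not the Go code in `pollInternal()`: a worker thread plays the role of the goroutine, a `threading.Event` plays the role of the stop channel, and the names `ticker` and `poll` are chosen for illustration. `poll` returns the worker thread purely so that the caller can check that it has exited.

```python
import queue
import threading
import time


def ticker(interval, timeout, ticks, stop):
    """Put a tick on the queue every `interval` seconds until timeout or stop."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # wait() sleeps for `interval` but returns True at once when stop is set.
        if stop.wait(interval):
            break
        ticks.put(True)
    ticks.put(None)  # sentinel: no more ticks will follow


def poll(interval, timeout, condition):
    """Call `condition` on every tick. Returns (condition_met, worker_thread)."""
    ticks = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(
        target=ticker, args=(interval, timeout, ticks, stop), daemon=True
    )
    worker.start()
    try:
        while True:
            if ticks.get() is None:
                return False, worker  # timed out before the condition held
            if condition():
                return True, worker
    finally:
        # The fix: always signal the worker, so it does not linger until
        # the (possibly very long) timeout expires.
        stop.set()


ok, worker = poll(0.01, 60.0, lambda: True)
worker.join(timeout=1.0)
print(ok, worker.is_alive())
```

Without the `stop.set()` in the `finally` block, each call with a 60 second timeout would leave a worker running for the full minute, which is the accumulation the commit message describes for `Poll()` inside a loop.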
Kubernetes Submit Queue c446132a97 Merge pull request #44974 from caesarxuchao/remove-client-go-api-listers
Automatic merge from submit-queue (batch tested with PRs 44940, 44974, 44935)

Remove import of internal api package in generated external-versioned listers

Follow up of https://github.com/kubernetes/kubernetes/pull/44523

One line change in cmd/libs/go2idl/lister-gen/generators/lister.go, and simple changes in pkg/apis/autoscaling/v2alpha1/register.go, other changes are generated.

The internal api package will be eliminated from client-go, so these imports should be removed. Also, it's more correct to report the versioned resource in the error.
2017-04-26 20:34:13 -07:00
Kubernetes Submit Queue a92007b43c Merge pull request #44940 from sjenning/bump-runc
Automatic merge from submit-queue

Bump runc to d223e2a

Fixes https://github.com/kubernetes/kubernetes/issues/43856

@derekwaynecarr
2017-04-26 20:09:24 -07:00
Kubernetes Submit Queue 904b020756 Merge pull request #43469 from enisoc/has-conflicts
Automatic merge from submit-queue

Fix mergepatch.HasConflicts().

**What this PR does / why we need it**:

This fixes some false negatives:

* If a map had multiple entries, only the first was checked.
* If a list had multiple entries, only the first was checked.

**Which issue this PR fixes**:

**Special notes for your reviewer**:

**Release note**:
```release-note
NONE
```
2017-04-26 18:32:33 -07:00
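The two false negatives listed in the commit above both come from returning after inspecting only the first entry. A Python sketch of a conflict check that visits every entry is shown below. It illustrates the idea and is not a port of `mergepatch.HasConflicts()`: the rule that lists of different lengths conflict is an assumption made for this sketch.

```python
def has_conflicts(left, right):
    """Return True if two patch-like values disagree anywhere they overlap."""
    if isinstance(left, dict):
        if not isinstance(right, dict):
            return True
        # Check every shared key, not just the first one.
        return any(
            has_conflicts(value, right[key])
            for key, value in left.items()
            if key in right
        )
    if isinstance(left, list):
        # Assumption for this sketch: lists of different lengths conflict.
        if not isinstance(right, list) or len(left) != len(right):
            return True
        # Check every position, not just the first one.
        return any(has_conflicts(l, r) for l, r in zip(left, right))
    return left != right


print(has_conflicts({"a": 1, "b": 2}, {"a": 1, "b": 3}))
print(has_conflicts([1, 2], [1, 3]))
```

In both calls the first entry matches and only the second differs, which is exactly the case a first-entry-only check would miss.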
Kubernetes Submit Queue 2c8a156579 Merge pull request #43188 from vmware/VSANPolicyTest
Automatic merge from submit-queue

e2e tests for VSAN policy support in Kubernetes for vSphere

The following e2e test cases have been implemented for VSAN policy support in Kubernetes for vSphere. These e2e tests are for PR #42974, which is out for review.

A total of 8 test cases are implemented for the 5 scenarios listed below.

Test cases:

1. Validation of VSAN capabilities.
- hostFailuresToTolerate : Minimum 1 and Max 3. Should be integer.
- stripeWidth: Minimum is 1 and Maximum is 12. Should be integer.
- proportionalCapacity: Expressed in percentage. Should be between 0 and 100. Should be integer.
- iopsLimit: Should be greater than 0. Should be integer.

2. Use a VSAN testbed setup. Try valid VSAN capabilities which are supported by VSAN testbed. Make sure the disk is created with policy configured with it.
- Ex: Using hostFailuresToTolerate=0 and cacheReservation=12
- Ex: diskStripes=1 and objectSpaceReservation=30

3. Use a VSAN testbed setup. Try valid VSAN capabilities which are not supported by VSAN testbed. Make sure that the disk creation fails and PVC status is pending.

4. Try using VSAN capabilities on a non-VSAN datastore. PVC status will be pending and an error is reported to the user saying that VSAN capabilities are not supported on a non-VSAN testbed.

5. Try scenarios 1 to 4 with a custom datastore specified by the user.

@jeffvance  @divyenpatel

**Release note**:

```release-note
None
```
2017-04-26 17:49:08 -07:00
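The limits listed under test case 1 in the commit above can be written down as a small validation table. The Python sketch below is an illustration of those limits only, not the validation code from the PR: it assumes values arrive as strings, treats the stated bounds as inclusive, and `validate_capability` is a hypothetical name.

```python
# Inclusive (minimum, maximum) bounds from test case 1; None means unbounded.
LIMITS = {
    "hostFailuresToTolerate": (1, 3),
    "stripeWidth": (1, 12),
    "proportionalCapacity": (0, 100),
    "iopsLimit": (1, None),  # "greater than 0" for an integer means at least 1
}


def validate_capability(name, raw_value):
    """Return the parsed integer value, or raise ValueError if it is invalid."""
    if name not in LIMITS:
        raise ValueError(f"unknown VSAN capability: {name}")
    try:
        value = int(raw_value)
    except (TypeError, ValueError):
        raise ValueError(f"{name} must be an integer, got {raw_value!r}")
    low, high = LIMITS[name]
    if value < low or (high is not None and value > high):
        raise ValueError(f"{name}={value} is out of range")
    return value


print(validate_capability("stripeWidth", "12"))
```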
David Ashpole 958e290c8d still consider quantity reclaimed even when errors are returned 2017-04-26 17:40:30 -07:00
Kubernetes Submit Queue 433aec11c8 Merge pull request #44531 from pwittrock/kubectl-openapi
Automatic merge from submit-queue

OpenAPI support for kubectl

Support for openapi spec in kubectl.

Includes:
- downloading and caching openapi spec to a local file
- parsing openapi spec into binary serializable datastructures (10x faster load times 600ms -> 40ms)
- caching parsed openapi spec in memory for each command

```release-note
NONE
```
2017-04-26 16:59:17 -07:00
Random-Liu cfd0efff11 Fix StopContainer timeout 2017-04-26 15:48:12 -07:00
Kubernetes Submit Queue 15533fac30 Merge pull request #44967 from chuckbutler/43461
Automatic merge from submit-queue

Fixes #43461

**What this PR does / why we need it**:
The master-components started state triggers a daemon recycle. The guard
was to prevent the daemons from being cycled too often and interrupting
normal workflow. This additional state check probes the etcd
connection and, if it has changed, triggers a re-configure and recycle of the API control
plane when etcd units are scaling up and down.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #43461

**Special notes for your reviewer**:
Check the contents of /var/snap/kube-apiserver/current/args after scaling etcd both up and down and the values will have changed, and kube-apiserver will have recycled to read the new connection string.

**Release note**:

```release-note
kubernetes-master juju charm properly detects etcd-scale events and reconfigures appropriately.
```
2017-04-26 15:41:00 -07:00
Kubernetes Submit Queue 274df99e9b Merge pull request #44451 from ncdc/spdy-follow-redirects
Automatic merge from submit-queue

Add redirect support to SpdyRoundTripper

Add support for following redirects to the SpdyRoundTripper. This is
necessary for clients using it directly (e.g. the apiserver talking
directly to the kubelet) because the CRI streaming server issues a
redirect for streaming requests.

We need this in OpenShift because we have code that executes inside our apiserver that talks directly to the node to perform an attach request, and we need to be able to follow that redirect.

This code was adapted from the upgrade-aware proxy handler.

cc @smarterclayton @sttts @liggitt @timstclair @kubernetes/sig-api-machinery-pr-reviews
2017-04-26 14:47:41 -07:00
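The redirect handling described in the commit above amounts to a bounded loop: send the request, and if the answer is a redirect, resolve its `Location` and try again. The Python sketch below shows that loop in isolation. It is not the SpdyRoundTripper code: `send` is a hypothetical callable returning `(status_code, headers)`, and both the set of status codes treated as redirects and the limit of 10 hops are assumptions made for this sketch.

```python
from urllib.parse import urljoin

# Assumption: which status codes count as redirects in this sketch.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(send, url, max_redirects=10):
    """Call send(url) until the response is not a redirect.

    Returns the final (url, status_code).
    """
    for _ in range(max_redirects + 1):
        status, headers = send(url)
        if status not in REDIRECT_STATUSES:
            return url, status
        location = headers.get("Location")
        if not location:
            raise ValueError(f"redirect from {url} has no Location header")
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"gave up after {max_redirects} redirects")


def fake_send(url):
    # Stand-in transport: the first endpoint redirects to a streaming URL.
    if url.endswith("/exec"):
        return 302, {"Location": "/stream/abc"}
    return 101, {}


final_url, status = follow_redirects(fake_send, "https://node.example:10250/exec")
print(final_url, status)
```

Bounding the number of hops keeps a misbehaving server from holding the client in a redirect loop.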