2577 Commits (286bcc6f5c0e4b1bd9315cb5b898966b48efb4b1)

Author SHA1 Message Date
Kubernetes Submit Queue cc6e51c6e8 Merge pull request #45427 from ncdc/gc-shared-informers
Automatic merge from submit-queue (batch tested with PRs 46201, 45952, 45427, 46247, 46062)

Use shared informers in gc controller if possible

Modify the garbage collector controller to try to use shared informers for resources, if possible, to reduce the number of unique reflectors listing and watching the same thing.

cc @kubernetes/sig-api-machinery-pr-reviews @caesarxuchao @deads2k @liggitt @sttts @smarterclayton @timothysc @soltysh @kargakis @kubernetes/rh-cluster-infra @derekwaynecarr @wojtek-t @gmarek
2017-05-22 20:58:03 -07:00
Kubernetes Submit Queue c2c5051adf Merge pull request #44899 from smarterclayton/burst
Automatic merge from submit-queue (batch tested with PRs 38990, 45781, 46225, 44899, 43663)

Support parallel scaling on StatefulSets

Fixes #41255

```release-note
StatefulSets now include an alpha scaling feature accessible by setting the `spec.podManagementPolicy` field to `Parallel`.  The controller will not wait for pods to be ready before adding the other pods, and will replace deleted pods as needed.  Since parallel scaling creates pods out of order, you cannot depend on predictable membership changes within your set.
```
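
The difference between ordered and parallel pod management described in the release note can be illustrated with a toy model. This is only a sketch: the `pod` struct and `nextCreates` function are hypothetical names invented for illustration and are not the StatefulSet controller's code.

```go
package main

import "fmt"

// pod is a toy record for illustration; it is NOT the Kubernetes Pod type.
type pod struct {
	exists bool
	ready  bool
}

// nextCreates returns the ordinals a toy controller would create in one pass.
// With parallel=false it creates at most one pod and stops at the first
// existing pod that is not ready. With parallel=true it creates every
// missing pod without waiting for readiness.
func nextCreates(pods []pod, parallel bool) []int {
	var create []int
	for i, p := range pods {
		if p.exists {
			if !parallel && !p.ready {
				return create // ordered: wait for this pod to become ready
			}
			continue
		}
		create = append(create, i)
		if !parallel {
			return create // ordered: one pod at a time
		}
	}
	return create
}

func main() {
	// Pod 0 exists but is not ready yet; pods 1 and 2 are missing.
	pods := []pod{{exists: true, ready: false}, {}, {}}
	fmt.Println("ordered: ", nextCreates(pods, false))
	fmt.Println("parallel:", nextCreates(pods, true))
}
```

In this model the ordered pass creates nothing while pod 0 is unready, whereas the parallel pass creates ordinals 1 and 2 immediately.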
2017-05-22 19:07:09 -07:00
Eric Tune b17e3c14eb Move PDB controller and type ownership to SIG-Apps
Created OWNERS_ALIASES called sig-apps-reviewers from the union of reviewers in:
 pkg/controller/{cronjob,deployment,daemon,job,replicaset,statefulset}/OWNERS
except removed inactive user bprashanth

Created OWNERS_ALIASES called sig-apps-api-reviewers as the intersection
of sig-apps-reviewers and the approvers from pkg/api/OWNERS.

Used those OWNERS_ALIASES as the reviewers/approvers for the disruption controller,
and API.
2017-05-22 12:55:28 -07:00
Andy Goldstein 2480f2ceb6 Use shared informers in gc controller if possible 2017-05-22 12:51:37 -04:00
Kubernetes Submit Queue 16b5093feb Merge pull request #46037 from ncdc/ns-controller-aggregate-errors
Automatic merge from submit-queue (batch tested with PRs 46164, 45471, 46037)

NS controller: don't stop deleting GVRs on error

**What this PR does / why we need it**:

If the namespace controller encounters an error trying to delete a
single GroupVersionResource, add the error to an aggregated list of
errors and continue attempting to delete all the GroupVersionResources
instead of stopping at the first error. Return the aggregated error list
(if any) when done. This allows us to delete as much of the content in
the namespace as we can in each pass.
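
The collect-and-continue pattern described above can be sketched as follows. This is a minimal illustration, not the namespace controller's code: `deleteFunc`, `deleteAll`, `errTimeout`, and the resource names are hypothetical, and the standard library's `errors.Join` (Go 1.20+) is swapped in here as the aggregation mechanism.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout is a hypothetical failure used only for this illustration.
var errTimeout = errors.New("server timeout")

// deleteFunc is a hypothetical stand-in for whatever deletes one resource type.
type deleteFunc func(resource string) error

// deleteAll attempts every resource and keeps going after a failure.
// It returns all failures joined into one error, or nil if none failed.
func deleteAll(resources []string, del deleteFunc) error {
	var errs []error
	for _, r := range resources {
		if err := del(r); err != nil {
			// Record the failure and continue with the next resource.
			errs = append(errs, fmt.Errorf("deleting %s: %w", r, err))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	del := func(r string) error {
		if r == "deployments" {
			return errTimeout
		}
		return nil
	}
	err := deleteAll([]string{"pods", "deployments", "secrets"}, del)
	fmt.Println(err)
}
```

Here `secrets` is still attempted even though `deployments` failed, which is the point of the change: each pass removes as much as it can.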

**Special notes for your reviewer**:

This may help with some of the namespace deletions taking too long in our e2e tests.

**Release note**:

```release-note
```
2017-05-22 09:08:56 -07:00
Kubernetes Submit Queue 574608d2e9 Merge pull request #46169 from kargakis/progress-when-ready
Automatic merge from submit-queue (batch tested with PRs 45864, 46169)

Account newly ready replicas as progress

@kubernetes/sig-apps-pr-reviews
2017-05-22 08:08:56 -07:00
Clayton Coleman 20d45af694
Combine statefulset burst and monotonic scaling tests
Use subtests to avoid duplicating entire suite of control logic.
2017-05-21 01:14:30 -04:00
Clayton Coleman 2861ae5eb9
Support burst in stateful set scale up and down
The alpha field podManagementPolicy defines how pods are created,
deleted, and replaced. The new `Parallel` policy will replace pods
as fast as possible, not waiting for the pod to be `Ready` or providing
an order. This allows for advanced clustered software to take advantage
of rapid changes in scale.
2017-05-21 01:14:26 -04:00
Clayton Coleman ad720cc651
generated: bazel 2017-05-20 21:58:38 -04:00
Michail Kargakis 7910dc3131
Account newly ready replicas as progress
Signed-off-by: Michail Kargakis <mkargaki@redhat.com>
2017-05-20 21:14:50 +02:00
Clayton Coleman 784e3ae5fa
Switch the tokens controller to use shared informers
Tokens controller previously needed a bit of extra help in order to be
safe for concurrent use. The new MutationCache allows it to keep a local
cache and still use a shared informer. The filtering event handler lets
it only see changes to secrets it cares about.
2017-05-20 14:19:49 -04:00
Clayton Coleman 3e095d12b4
Refactor move of client-go/util/clock to apimachinery 2017-05-20 14:19:48 -04:00
Kubernetes Submit Queue f499606bfe Merge pull request #45346 from codablock/fix_double_attach
Automatic merge from submit-queue

Don't try to attach volumes which are already attached to other nodes

This PR is a replacement for https://github.com/kubernetes/kubernetes/pull/40148. I was not able to push fixes and rebases to the original branch as I don't have access to the GitHub organization anymore.

CC @saad-ali You probably have to update the PR link in [Q2 2017 (v1.7)](https://docs.google.com/spreadsheets/d/1t4z5DYKjX2ZDlkTpCnp18icRAQqOE85C1T1r2gqJVck/edit#gid=14624465)

I assume the PR will need a new "ok to test" 

**ORIGINAL PR DESCRIPTION**

This PR fixes an issue with the attach/detach volume controller. There are cases where the `desiredStateOfWorld` contains the same volume for multiple nodes, resulting in the attach/detach controller attaching this volume to multiple nodes. This of course fails for volumes like AWS EBS, Azure Disks, ...

I observed this situation on Azure when using Azure Disks and replication controllers which start to reschedule PODs. When you delete a POD that belongs to a RC, the RC will immediately schedule a new POD on another node. This results in a short time (max a few seconds) where you have 2 PODs which try to attach/mount the same volume on different nodes. As the old POD is still alive, the attach/detach controller does not try to detach the volume and starts to attach the volume to the new POD immediately.

This behavior was probably not noticed before on other clouds as the bogus attempt to attach probably fails pretty fast and thus is unnoticed. As the situation with the 2 PODs disappears after a few seconds, a detach for the old POD is initiated and thus the new POD can attach successfully.

On Azure, however, attaching and detaching take quite long, so the first bogus attach attempt already eats up a lot of time.
When attaching fails on Azure and reports that it is already attached somewhere else, the cloud provider immediately does a detach call for the same volume+node it tried to attach to. This is done to make sure the failed attach request is aborted immediately. You can find this here: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/azure/azure_storage.go#L74

The complete flow of attach->fail->abort eats up valuable time and the attach/detach controller cannot proceed with other work while this is happening. This means that if the old POD disappears in the meantime, the controller can't even start the detach for the volume, which delays the whole process of rescheduling and reattaching.

Also, other people and I have observed very strange behavior where disks ended up being "attached" to multiple VMs at the same time as reported by Azure Portal. This causes the controller to fail reattaching forever. It's hard to figure out why and when this happens and there is no reproducer known yet. I can imagine, however, that this behavior correlates with what I described above.

I was not sure if there are actually cases where it is perfectly fine to have a volume mounted to multiple PODs/nodes. At least technically, this should be possible with network based volumes, e.g. nfs. Can someone with more knowledge about volumes help me here? I may need to add a check before skipping attaching in `reconcile`.

CC @colemickens @rootfs

```release-note
Don't try to attach volume to new node if it is already attached to another node and the volume does not support multi-attach.
```
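
The rule in the release note can be expressed as a small decision function. This is a sketch under stated assumptions: the `volume` struct, its `multiAttach` flag, and `shouldAttach` are hypothetical names for illustration and do not reflect the attach/detach controller's actual types.

```go
package main

import "fmt"

// volume is a toy record; multiAttach is an assumed flag meaning the
// volume type tolerates being attached to several nodes at once.
type volume struct {
	name        string
	multiAttach bool
}

// shouldAttach reports whether a toy reconciler should start attaching v
// to node. attachedTo maps a volume name to the nodes it is attached to.
func shouldAttach(v volume, node string, attachedTo map[string][]string) bool {
	nodes := attachedTo[v.name]
	for _, n := range nodes {
		if n == node {
			return false // already attached to this node
		}
	}
	if len(nodes) > 0 && !v.multiAttach {
		return false // attached elsewhere and cannot be shared: skip for now
	}
	return true
}

func main() {
	attached := map[string][]string{
		"disk-a":    {"node-1"},
		"nfs-share": {"node-1"},
	}
	fmt.Println(shouldAttach(volume{name: "disk-a"}, "node-2", attached))
	fmt.Println(shouldAttach(volume{name: "nfs-share", multiAttach: true}, "node-2", attached))
}
```

Skipping the attach, rather than trying and failing, leaves the reconciler free to process the pending detach from the old node first.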
2017-05-19 21:54:42 -07:00
Kubernetes Submit Queue 2473c24f81 Merge pull request #45979 from bowei/owners
Automatic merge from submit-queue

Add bowei to OWNERS: e2e/test dns,network; cloud route, node, service…
2017-05-19 19:39:05 -07:00
Bowei Du 3af1c0efcb Add bowei to OWNERS: e2e/test dns,network; cloud route, node, service controller 2017-05-19 14:49:43 -07:00
Wojciech Tyczynski d2529bb6b6 Avoid sleep in endpoint controller 2017-05-19 13:57:36 +02:00
Andy Goldstein e8e87cb1c2 NS controller: don't stop deleting GVRs on error
If the namespace controller encounters an error trying to delete a
single GroupVersionResource, add the error to an aggregated list of
errors and continue attempting to delete all the GroupVersionResources
instead of stopping at the first error. Return the aggregated error list
(if any) when done. This allows us to delete as much of the content in
the namespace as we can in each pass.
2017-05-18 12:01:40 -04:00
Clayton Coleman bdd4d34c7d
generated: api changes 2017-05-18 10:07:47 -04:00
Alexander Block 06baeb33b2 Don't try to attach volumes which are already attached to other nodes 2017-05-18 06:56:30 +02:00
Kubernetes Submit Queue 7df0178076 Merge pull request #42975 from smarterclayton/time_namespace
Automatic merge from submit-queue (batch tested with PRs 40234, 45885, 42975)

Log how much time it takes e2e tests to clean up the namespace
2017-05-17 20:27:52 -07:00
Kubernetes Submit Queue 6dbe853e29 Merge pull request #45544 from ianchakeres/reconciler-err-cleanup
Automatic merge from submit-queue (batch tested with PRs 45990, 45544, 45745, 45742, 45678)

Refactor reconciler volume log and error messages

**What this PR does / why we need it**:
Utilizes volume-specific error and log messages introduced in #44969, inside files that also log volume information. 

Specifically: 

- pkg/kubelet/volumemanager/reconciler/reconciler.go, 
- pkg/controller/volume/attachdetach/reconciler/reconciler.go, and
- pkg/kubelet/volumemanager/populator/desired_state_of_world_populator.go


**Which issue this PR fixes** : fixes #40905

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-05-17 18:40:51 -07:00
Clayton Coleman 7da310ea28
Fix namespace controller logging to be consistent
time.Now() was wrong, simplify namespace controller output
2017-05-17 17:45:05 -04:00
Kubernetes Submit Queue 4a9a702ee1 Merge pull request #45926 from MrHohn/api-annotations-move
Automatic merge from submit-queue

Move all API related annotations into annotation_key_constants.go

Separate from #45869. See https://github.com/kubernetes/kubernetes/pull/45869#discussion_r116839411 for details.

This PR does nothing but move constants around :)

/assign @caesarxuchao 

**Release note**:

```release-note
NONE
```
2017-05-17 10:34:53 -07:00
Kubernetes Submit Queue 8863bd4353 Merge pull request #45709 from YuPengZTE/devGetAllDaemonSetPods
Automatic merge from submit-queue (batch tested with PRs 45709, 41939)

delete err when return _

Signed-off-by: yupengzte <yu.peng36@zte.com.cn>



**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-05-16 23:11:49 -07:00
Zihong Zheng 5992425588 Autogenerated files 2017-05-16 21:55:51 -07:00
Zihong Zheng c0920f75cf Move API annotations into annotation_key_constants and remove api/annotations package 2017-05-16 21:55:23 -07:00
Kubernetes Submit Queue 3f0ebbe884 Merge pull request #45247 from mbohlool/c3
Automatic merge from submit-queue (batch tested with PRs 45247, 45810, 45034, 45898, 45899)

Apiregistration v1alpha1→v1beta1

Promoting apiregistration api from v1alpha1 to v1beta1.

API Registration is responsible for registering an API `Group`/`Version` with
another Kubernetes-like API server. The `APIService` holds information
about the other API server in the `APIServiceSpec` type as well as the general
`TypeMeta` and `ObjectMeta`. The `APIServiceSpec` type has the main
configuration needed to do the aggregation. Any request for the
specified `Group`/`Version` will be directed to the service defined by
`ServiceReference` (on port 443) after validating the target using the provided
`CABundle`, or skipping validation if the development flag `InsecureSkipTLSVerify`
is set. `Priority` controls the order of this API group in the overall
discovery document.
The returned status is a set of conditions for this aggregation. Currently
there is only one condition, named "Available"; if true, it means the
api/server requests will be redirected to the specified API server.

```release-note
API Registration is now in beta.
```
2017-05-16 19:01:55 -07:00
yupengzte e463c28db6 delete err when return _
Signed-off-by: yupengzte <yu.peng36@zte.com.cn>
2017-05-16 16:15:06 +08:00
Kubernetes Submit Queue 746f5d6a28 Merge pull request #45664 from tacy/fix45213
Automatic merge from submit-queue (batch tested with PRs 45664, 45861)

Fix #45213: Syncing jobs would return error when podController exception

**What this PR does / why we need it**:
Jobcontroller:  Syncing jobs would return error when podController exception
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes #45213 
**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-05-15 21:08:48 -07:00
Kubernetes Submit Queue 24cb1cec64 Merge pull request #45758 from caesarxuchao/remove-vendor-heapster-v1alpha1
Automatic merge from submit-queue (batch tested with PRs 44337, 45775, 45832, 45574, 45758)

Stop vendoring heapster v1alpha1

Changes to use the one in staging/metrics.

TODO: remove the exception in https://github.com/kubernetes/kubernetes/pull/45176

Implementing https://github.com/kubernetes/kubernetes/issues/45498#issuecomment-301013423.
2017-05-15 18:39:20 -07:00
Kubernetes Submit Queue c03e4952a3 Merge pull request #45574 from zhangxiaoyu-zidif/format-daemondset-for
Automatic merge from submit-queue (batch tested with PRs 44337, 45775, 45832, 45574, 45758)

daemoncontroller.go:format for

**What this PR does / why we need it**:
format for.
delete redundant para.
make code clean.

**Release note**:

```release-note
NONE
```
2017-05-15 18:39:18 -07:00
Kubernetes Submit Queue 521d7d1ac5 Merge pull request #42472 from timchenxiaoyu/requesttypo
Automatic merge from submit-queue

fix request typo
2017-05-15 15:57:57 -07:00
mbohlool e2f20a3539 Promote apiregistration from v1alpha1 to v1beta1 2017-05-15 15:34:33 -07:00
Chao Xu 36e2d0b4cb hack/update-bazel.sh
hack/update-godep-license
2017-05-15 13:51:39 -07:00
Chao Xu d9c2fbbf23 rely on staging/metrics instead of on
vendor/k8s.io/heapster/metrics/apis/metrics/v1alpha1

fix some related imports
2017-05-15 13:50:49 -07:00
Kubernetes Submit Queue 3f054a03a9 Merge pull request #42471 from timchenxiaoyu/resourceytpo
Automatic merge from submit-queue

fix resource typo
2017-05-15 10:39:06 -07:00
lichunlong ab1476b1df Fix #45213: Syncing jobs would return error when podController exception 2017-05-15 22:52:50 +08:00
Kubernetes Submit Queue abaffb243e Merge pull request #45692 from caesarxuchao/limit-client-go-package-import-2
Automatic merge from submit-queue (batch tested with PRs 44748, 45692)

Limiting client go packages visibility, round 3

Continue the work in the merged PR https://github.com/kubernetes/kubernetes/pull/45258

These packages in client-go will be gone after #44065 is fixed:
pkg/api/helper, pkg/api/util, internal version of api groups, API install packages. 

This PR removes the dependency on these packages and adds bazel visibility rules to prevent relapse.
2017-05-12 16:04:37 -07:00
Kubernetes Submit Queue 521badc4b7 Merge pull request #44748 from zhangxiaoyu-zidif/cleancode-1-graph_builder
Automatic merge from submit-queue

cleancode: graph_builder.go

It makes the code clean and light.
2017-05-12 15:35:42 -07:00
Kubernetes Submit Queue da6fda3631 Merge pull request #45685 from derekwaynecarr/quota-enqueue
Automatic merge from submit-queue (batch tested with PRs 45685, 45572, 45624, 45723, 45733)

resource quota full resync was removed in error

**What this PR does / why we need it**:
the quota controller should have had a full resync interval, and it was inadvertently removed in the move to shared informers.

**Which issue this PR fixes** 
This fixes quota recalculation happening at the specified interval.

**Special notes for your reviewer**:

**Release note**:
```release-note
the resource quota controller was not adding quota to be resynced at proper interval
```
2017-05-12 14:00:50 -07:00
Ian Chakeres b1315f4491 Refactor reconciler volume log and error messages 2017-05-11 22:33:17 -07:00
Hemant Kumar 951a36aac7 Add Keepterminatedpodvolumes as an annotation on node
and make sure that the controller respects it
and doesn't detach mounted volumes.
2017-05-11 22:31:14 -04:00
Hemant Kumar 9a1a9cbe08 detach the volume when pod is terminated
Make sure volume is detached when pod is terminated because
of any reason and not deleted from api server.
2017-05-11 22:18:22 -04:00
Chao Xu 14045d253d hack/update-bazel.sh 2017-05-11 15:59:04 -07:00
Chao Xu c354076aa4 remove invocation of k8s.io/client-go/pkg/api/install
change import of client-go/api/helper to kubernetes/api/helper

remove unnecessary use of client-go/api.registry

change use of client-go/pkg/util to kubernetes/pkg/util

remove dependency on client-go/pkg/apis/extensions

remove unnecessary invocation of k8s.io/client-go/extension/install

change use of k8s.io/client-go/pkg/apis/authentication to v1
2017-05-11 15:03:46 -07:00
Derek Carr 430f078f93 resource quota full resync was removed in error 2017-05-11 15:58:55 -04:00
Kubernetes Submit Queue fc7ae99327 Merge pull request #45478 from HardySimpson/fix-endpoints-del
Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550)

fix endpoints controller del lead-election endpoints

When there are multiple controller-manager instances, we observe that the endpoints controller deletes the leader-election endpoints after 5 minutes and causes a re-election. Add a check to avoid that.
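
One way such a check could look is sketched below. Everything here is an assumption made for illustration: the PR text does not say how leader-election endpoints are recognized, so the annotation key, the `endpoints` struct, and the `leftovers` function are hypothetical.

```go
package main

import "fmt"

// leaderAnnotation is an ASSUMED marker key for leader-election endpoints;
// the source text does not state how they are identified.
const leaderAnnotation = "control-plane.alpha.kubernetes.io/leader"

// endpoints is a toy record; it is NOT the Kubernetes Endpoints type.
type endpoints struct {
	name        string
	annotations map[string]string
}

// leftovers returns the names of endpoints that have no matching service
// and are therefore candidates for cleanup. Endpoints carrying the
// leader-election marker are skipped so that cleanup never deletes them.
func leftovers(eps []endpoints, services map[string]bool) []string {
	var out []string
	for _, ep := range eps {
		if _, isLeaderLock := ep.annotations[leaderAnnotation]; isLeaderLock {
			continue // used for leader election, not backed by a service
		}
		if !services[ep.name] {
			out = append(out, ep.name)
		}
	}
	return out
}

func main() {
	eps := []endpoints{
		{name: "kube-scheduler", annotations: map[string]string{leaderAnnotation: "{}"}},
		{name: "orphan"},
		{name: "web"},
	}
	fmt.Println(leftovers(eps, map[string]bool{"web": true}))
}
```

In this sketch only `orphan` is reported for cleanup; `kube-scheduler` has no service either but is left alone because of the marker.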

Fixes #45585

error log

```
192.168.0.5 - - [02/May/2017:15:10:13 +0000] "GET /api/v1/endpoints HTTP/1.1" 200 1175 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0/endpoint-controller"
192.168.0.5 - - [02/May/2017:15:10:13 +0000] "DELETE /api/v1/namespaces/kube-system/endpoints/kube-controller-manager HTTP/1.1" 200 46 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0/endpoint-controller"
192.168.0.5 - - [02/May/2017:15:10:13 +0000] "DELETE /api/v1/namespaces/kube-system/endpoints/kube-scheduler HTTP/1.1" 200 46 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0/endpoint-controller"
192.168.0.7 - - [02/May/2017:15:10:14 +0000] "GET /api/v1/namespaces/kube-system/endpoints/kube-scheduler HTTP/1.1" 404 123 "-" "kube-scheduler/V100R001C00B012 (linux/amd64) kubernetes/bede5a0"
192.168.0.7 - - [02/May/2017:15:10:14 +0000] "POST /api/v1/namespaces/kube-system/endpoints HTTP/1.1" 201 398 "-" "kube-scheduler/V100R001C00B012 (linux/amd64) kubernetes/bede5a0"
192.168.0.6 - - [02/May/2017:15:10:14 +0000] "GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager HTTP/1.1" 404 141 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0"
192.168.0.6 - - [02/May/2017:15:10:14 +0000] "POST /api/v1/namespaces/kube-system/endpoints HTTP/1.1" 201 416 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) kubernetes/bede5a0"
192.168.0.7 - - [02/May/2017:15:10:14 +0000] "GET /api/v1/namespaces/kube-system/endpoints/kube-controller-manager HTTP/1.1" 200 416 "-" "kube-controller-manager/V100R001C00B012 (linux/amd64) ku
```



release-note

```release-note
none
```
2017-05-10 21:34:43 -07:00
zhangxiaoyu-zidif 00b67443f0 daemoncontroller.go:format for 2017-05-10 14:06:34 +08:00
Kubernetes Submit Queue f8f9d7db93 Merge pull request #45304 from deads2k/controller-03-ns-discovery
Automatic merge from submit-queue (batch tested with PRs 45304, 45006, 45527)

increase the QPS for namespace controller

The namespace controller is really chatty. Especially to discovery since that involves two requests for every API version available. This bumps the QPS and burst on the namespace controller to avoid being stuck waiting.
2017-05-09 12:04:41 -07:00
Kubernetes Submit Queue 202a9f8445 Merge pull request #42317 from NickrenREN/attach-detach-error-info-print
Automatic merge from submit-queue

add and clear err message about RemoveVolumeFromReportAsAttached()

**Release note**:

```release-note
NONE
```
2017-05-09 10:44:32 -07:00