Commit Graph

19776 Commits (de92f90f12dbc17c8ecdc1b54121d976e17dcd15)

Author SHA1 Message Date
Anthony Yeh de92f90f12 Deployment: Clear obsolete OverlapAnnotaiton.
This ensures old clients will not assume the Deployment is blocked.
2017-03-16 14:52:01 -07:00
Anthony Yeh fa23729a6d kubectl: Use v1.5-compatible ownership logic when listing dependents.
In particular, we should not assume ControllerRefs are necessarily set.
However, we can still use ControllerRefs that do exist to avoid
interfering with controllers that do use it.
2017-03-16 12:28:38 -07:00
Anthony Yeh 725ec0cc5e kubectl: Check for Deployment overlap annotation in reaper.
This effectively reverts the client-side changes in
cec3899b96.
We have to maintain the old behavior on the client side to support
version skew when talking to old servers that set the annotation.

However, the new server-side behavior is still to NOT set the
annotation.
2017-03-16 12:28:28 -07:00
Kubernetes Submit Queue 754effe332 Merge pull request #42949 from wenlxie/master
Automatic merge from submit-queue

recycle pod can't get the event since channel closed

What this PR does / why we need it:
We create a   hostPath type  PV with "Recycle" persistentVolumeReclaimPolicy,  and bind a PVC to it, but after deleted the PVC, the PV cannot become to available status. This is happened after we upgrade etcd to 3.0. The reason is:
If the channel used to get the pod message and events been abnormal closed(for example, the event channel maybe closed because of "required revision has been compacted" error), the function internalRecycleVolumeByWatchingPodUntilCompletion will stuck in a loop, and the recycle pod will not been deleted, the PV can not become into available status

Special notes for your reviewer:
None
Release note:
2017-03-16 02:41:11 -07:00
Kubernetes Submit Queue 2bec20ce55 Merge pull request #43122 from liggitt/protobuf-default
Automatic merge from submit-queue

Prevent protobuf storage with etcd2

Prevents accidentally storing protobuf content in etcd2 when upgrading to 1.6

c.f. https://github.com/kubernetes/kubernetes/issues/42976#issuecomment-286537139

```release-note
if kube-apiserver is started with `--storage-backend=etcd2`, the media type `application/json` is used.
```
2017-03-15 22:07:03 -07:00
Kubernetes Submit Queue ba25afd278 Merge pull request #40964 from tanshanshan/kubelet-unit-test
Automatic merge from submit-queue (batch tested with PRs 40964, 42967, 43091, 43115)

Improve code coverage for pkg/kubelet/status/generate.go

**What this PR does / why we need it**:

Improve code coverage for pkg/kubelet/status/generate.go  from #39559

Thanks.

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-03-15 16:08:23 -07:00
Jordan Liggitt 87e32c7532
Force etcd2 to use application/json, add base64-wrapper decoder as fallback 2017-03-15 11:24:12 -04:00
Kubernetes Submit Queue 0afdcfcaf6 Merge pull request #43114 from enisoc/deployment-upgrade-test
Automatic merge from submit-queue

Fix Deployment upgrade test.

**What this PR does / why we need it**:

When the upgrade test operates on Deployments in a pre-1.6 cluster (i.e. during the Setup phase), it needs to use the v1.5 deployment/util logic. In particular, the v1.5 logic does not filter children to only those with a matching ControllerRef.

**Which issue this PR fixes**:

Fixes #42738

**Special notes for your reviewer**:

**Release note**:
```release-note
```
cc @kubernetes/sig-apps-pr-reviews
2017-03-15 07:03:19 -07:00
Kubernetes Submit Queue df97434e8a Merge pull request #43110 from caesarxuchao/fix-42952
Automatic merge from submit-queue (batch tested with PRs 43106, 43110)

Wait for garbagecollector to be synced in test

Fix #42952

Without the `cache.WaitForCacheSync` in the test, it's possible for the GC to get a merged event of RC's creation and its update (update to deletionTimestamp != 0), before GC gets the creation event of the pod, so it's possible the GC will handle the foreground deletion of the RC before it adds the Pod to the dependency graph, thus the race.

With the `cache.WaitForCacheSync` in the test, because GC runs a single thread to process graph changes, it's guaranteed the Pod will be added to the dependency graph before GC handles the foreground deletion of the RC.

Note that this pull fixes the race in the test. The race described in the first point of #26120 still exists.
2017-03-15 06:14:21 -07:00
Kubernetes Submit Queue 222f69cf3c Merge pull request #43030 from yujuhong/rm_corrupted_checkpoint
Automatic merge from submit-queue (batch tested with PRs 42747, 43030)

dockershim: remove corrupted sandbox checkpoints

This is a workaround to ensure that kubelet doesn't block forever when
the checkpoint is corrupted.

This is a workaround for #43021
2017-03-14 22:56:20 -07:00
Kubernetes Submit Queue 1b4433d4c3 Merge pull request #42747 from dcbw/iptables-proxy-optimize
Automatic merge from submit-queue (batch tested with PRs 42747, 43030)

kube-proxy/iptables: optimize endpoint map creation by excluding invalid endpoints earlier

We don't need to do as much work as we were doing, if we exclude invalid endpoints earlier in the endpoints processing.

Fixes: https://github.com/kubernetes/kubernetes/issues/42210

@freehan @liggitt @thockin if you could review this with a fine-toothed comb...  I can't immediately think of why invalid endpoints would be useful for the HealthChecker, and this PR prevents the HC from seeing these endpoints.
2017-03-14 22:56:18 -07:00
Kubernetes Submit Queue 792a24ac01 Merge pull request #43118 from yujuhong/sync_checkpoint
Automatic merge from submit-queue

dockershim: call sync() after writing the checkpoint

This ensures the checkpoint files are persisted.

This fixes #43021
2017-03-14 21:12:49 -07:00
Kubernetes Submit Queue fea42bade0 Merge pull request #42854 from vladimirvivien/scaleio-k8s-fix-readOnly
Automatic merge from submit-queue (batch tested with PRs 42854, 43105, 43090)

Update ScaleIO volume plugin default readOnly value

This commit updates the code to set readOnly attribute to be set to false.

**What this PR does / why we need it**:
This PR is a minor fix that updates the default value of `readOnly` attribute to `false`.

**Release note**:

```release-note
NONE
```
2017-03-14 18:44:18 -07:00
Yu-Ju Hong 48afc7d4e0 dockershim: call sync() after writing the checkpoint
This ensures the checkpoint files are persisted.
2017-03-14 18:36:51 -07:00
Anthony Yeh 0c015927c4 Fix Deployment upgrade test.
When the upgrade test operates on Deployments in a pre-1.6 cluster
(i.e. during the Setup phase), it needs to use the v1.5 deployment/util
logic. In particular, the v1.5 logic does not filter children to only
those with a matching ControllerRef.
2017-03-14 17:39:29 -07:00
Chao Xu 0605ba7a6d wait for garbagecollector to be synced in test 2017-03-14 16:19:33 -07:00
Kubernetes Submit Queue f8dd2569a1 Merge pull request #42991 from jsafrane/fix-default-beta
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)

Remove 'beta' from default storage class annotation

I forgot to update default storage class annotation in my storage.k8s.io/v1beta1 -> v1 PRs. Let's fix it before 1.6 is released.

I consider it as a bugfix, in #40088 I already updated the release notes to include non-beta annotation  `storageclass.kubernetes.io/is-default-class`

```release-note
NONE
```


@kubernetes/sig-storage-pr-reviews 
@msau42, please help with merging.
2017-03-14 13:52:41 -07:00
Kubernetes Submit Queue c11498a34d Merge pull request #42775 from gyliu513/kubectl-taint
Automatic merge from submit-queue (batch tested with PRs 42775, 42991, 42968, 43029)

`kubectl taint` update for `NoExecute`.

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:
Fixes #42774
**Release note**:

```release-note
```
2017-03-14 13:52:39 -07:00
Kubernetes Submit Queue a8d8542fc7 Merge pull request #42765 from janetkuo/ds-update-validation-fix
Automatic merge from submit-queue

Add DaemonSet templateGeneration validation and tests, and fix a bunch of validation test errors

For DaemonSet update:
1. Validate that templateGeneration is increased when and only when template is changed
1. Validate that templateGeneration is never decreased
1. Added validation tests for templateGeneration 
1. Fix a bunch of errors in validate tests
   - fake tests: almost all validation test error cases failed on "missing resource version", "name changes", "missing update strategy", "selector/template labels mismatch", not on the real validation we wanted to test
   - some error cases should be success cases

@kargakis @lukaszo @kubernetes/sig-apps-bugs 

*I've verified locally that all DaemonSet e2e tests pass with this change.*
2017-03-14 12:54:49 -07:00
Vladimir Vivien 0715b32439 Update ScaleIO volume plugin default readOnly value
This commit updates the code to set the default value of the readOnly attribute to false.
It also updates the example docs to add full list of supported plugin attributes and doc.
2017-03-14 14:19:48 -04:00
Kubernetes Submit Queue 70b3848bce Merge pull request #42935 from yifan-gu/fix_flock
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)

pkg/util/flock: Fix the flock so it actually locks.

With this PR, the second call to `Acquire()` will block unless the lock is released (process exits).
Also removed the memory mutex in the previous code since we don't need `Release()` here so no need to save and protect the local fd.

Fix #42929.
2017-03-14 10:19:18 -07:00
Kubernetes Submit Queue 6de28fab7d Merge pull request #42942 from vishh/gpu-cont-fix
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)

[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs

Fixes #42412

**Background**
Support for multiple GPUs is an experimental feature in v1.6. 
Container restarts were handled incorrectly which resulted in stranding of GPUs
Kubelet is incorrectly using runtime cache to track running pods which can result in race conditions (as it did in other parts of kubelet). This can result in same GPU being assigned to multiple pods.

**What does this PR do**
This PR tracks assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
GPU manager is updated to consume a list of active pods derived from apiserver cache instead of runtime cache.
Node e2e has been extended to validate this failure scenario.

**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.

**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever`  in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.

cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews 

Replaces #42560
2017-03-14 10:19:17 -07:00
Kubernetes Submit Queue f1e9004da9 Merge pull request #42927 from Random-Liu/fix-kubelet-panic
Automatic merge from submit-queue (batch tested with PRs 42802, 42927, 42669, 42988, 43012)

Fix kubelet panic in cgroup manager.

Fixes https://github.com/kubernetes/kubernetes/issues/42920.
Fixes https://github.com/kubernetes/kubernetes/issues/42875
Fixes #42927 
Fixes #43059

Check the error in walk function, so that we don't use info when there is an error.

@yujuhong @dchen1107 @derekwaynecarr @vishh /cc @kubernetes/sig-node-bugs
2017-03-14 07:31:31 -07:00
Guangya Liu ab94015ee8 `kubectl taint` update for `NoExecute`. 2017-03-14 18:24:40 +08:00
wenlxie 33385214bc recycle pod can't get the event since the channel been closed 2017-03-14 10:35:08 +08:00
Yu-Ju Hong 035afab901 dockershim: remove corrupted sandbox checkpoints
This is a workaround to ensure that kubelet doesn't block forever when
the checkpoint is corrupted.
2017-03-13 15:41:01 -07:00
Yifan Gu a489bd2674 pkg/util/flock: Fix the flock so it actually locks.
With this PR, the second call to `Acquire()` will block unless the lock is released (process exits).
Also removed the memory mutex in the previous code since we don't need `Release()` here so no need to save and protect the local fd.

Fix #42929.
2017-03-13 14:24:59 -07:00
Random-Liu e6341cc3c7 Fix kubelet panic in cgroup manager. 2017-03-13 12:06:08 -07:00
Vishnu kannan ad743a922a remove dead code in gpu manager
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Vishnu kannan ff158090b3 use active pods instead of runtime pods in gpu manager
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Vishnu Kannan 8ed9bff073 handle container restarts for GPUs
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Jan Safranek 06feaccead Remove 'beta' from default storage class annotation 2017-03-13 12:53:41 +01:00
Kubernetes Submit Queue e1248bcbbc Merge pull request #42962 from k82cn/fix_min_tolerant_time
Automatic merge from submit-queue

Fixed incorrect result of getMinTolerationTime.

For the following case, `getMinTolerationTime` should return one; but  it returned -1 :
1. for tolerations[0], TolerationSeconds is nil, minTolerationTime is not set 
2. for tolerations[1], it's TolerationSeconds (1) is bigger than `minTolerationTime`, so minTolerationTime is still -1 which means infinite.

```
+		{
+			tolerations: []v1.Toleration{
+				{
+					TolerationSeconds: nil,
+				},
+				{
+					TolerationSeconds: &one,
+				},
+			},
+		},
```
2017-03-12 23:55:39 -07:00
Kubernetes Submit Queue 65ddace3ed Merge pull request #42702 from smarterclayton/printer_owners
Automatic merge from submit-queue

Add pkg/printers OWNERS

Should also include more sig-api-machinery as this will be moving to server side
2017-03-12 21:04:57 -07:00
tanshanshan 26ab52a3cb fix 2017-03-13 10:00:19 +08:00
Klaus Ma d0e04427d7 Fixed incorrect result of getMinTolerationTime. 2017-03-12 20:21:14 +08:00
Kubernetes Submit Queue e315c388b2 Merge pull request #42944 from liggitt/patch-defaulting
Automatic merge from submit-queue

Ensure patched objects are defaulted correctly

Restores defaulting behavior for patch API calls removed in e34e1abe33 (diff-517d1b81963bbc7c9b0a16e6eb3c0e2f)

Restores the unit test that ensures we get a defaulted result after applying a patch

Fixes https://github.com/kubernetes/kubernetes/issues/42764
Fixes #42834
2017-03-11 17:49:41 -08:00
Kubernetes Submit Queue 3f660a9779 Merge pull request #42913 from aveshagarwal/master-fix-taint-based-eviction-no-node-cidr
Automatic merge from submit-queue

Fix taint based pod eviction for clusters where controller manager is not running with allocate-node-cidrs set

Fixes https://github.com/kubernetes/kubernetes/issues/42733

In my cluster, I have not set allocate-node-cidr, and It is causing taint based pod eviction to fail. 

@gmarek @kubernetes/sig-scheduling-bugs @davidopp @derekwaynecarr
2017-03-11 14:02:45 -08:00
Kubernetes Submit Queue 8cb14a4f7f Merge pull request #42755 from aveshagarwal/master-fix-default-toleration-seconds
Automatic merge from submit-queue (batch tested with PRs 41794, 42349, 42755, 42901, 42933)

Fix DefaultTolerationSeconds admission plugin

DefaultTolerationSeconds is not working as expected. It is supposed to add default tolerations (for unreachable and notready conditions). but no pod was getting these toleration. And api server was throwing this error:

```
Mar 08 13:43:57 fedora25 hyperkube[32070]: E0308 13:43:57.769212   32070 admission.go:71] expected pod but got Pod
Mar 08 13:43:57 fedora25 hyperkube[32070]: E0308 13:43:57.789055   32070 admission.go:71] expected pod but got Pod
Mar 08 13:44:02 fedora25 hyperkube[32070]: E0308 13:44:02.006784   32070 admission.go:71] expected pod but got Pod
Mar 08 13:45:39 fedora25 hyperkube[32070]: E0308 13:45:39.754669   32070 admission.go:71] expected pod but got Pod
Mar 08 14:48:16 fedora25 hyperkube[32070]: E0308 14:48:16.673181   32070 admission.go:71] expected pod but got Pod
```

The reason for this error is that the input to admission plugins is internal api objects not versioned objects so expecting versioned object is incorrect. Due to this, no pod got desired tolerations and it always showed:

```
Tolerations: <none>
```

After this fix, the correct  tolerations are being assigned to pods as follows:

```
Tolerations:	node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
		node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
```

@davidopp @kevin-wangzefeng @kubernetes/sig-scheduling-pr-reviews @kubernetes/sig-scheduling-bugs @derekwaynecarr 

Fixes https://github.com/kubernetes/kubernetes/issues/42716
2017-03-10 22:02:18 -08:00
Jordan Liggitt 464db160b4
Ensure patched objects are defaulted correctly 2017-03-10 22:07:10 -05:00
Kubernetes Submit Queue 59aa924a9b Merge pull request #42642 from fraenkel/envfrom
Automatic merge from submit-queue

Invalid environment var names are reported and pod starts

When processing EnvFrom items, all invalid keys are collected and
reported as a single event.

The Pod is allowed to start.

fixes #42583
2017-03-10 17:37:31 -08:00
Kubernetes Submit Queue 9590f694c8 Merge pull request #41830 from irfanurrehman/fed-rbac-1
Automatic merge from submit-queue

[Federation] Kubefed Init should use the right RBAC API version clientset

**What this PR does / why we need it**:
Implements the need as described in https://github.com/kubernetes/kubernetes/issues/41263
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/41263

**Special notes for your reviewer**:
@madhusudancs @shashidharatd @marun 
cc @kubernetes/sig-federation-bugs

**Release note**:

```
NONE
```
2017-03-10 15:56:47 -08:00
Kubernetes Submit Queue 486ec2b7c9 Merge pull request #42862 from caesarxuchao/sync-warning
Automatic merge from submit-queue (batch tested with PRs 38805, 42362, 42862)

Let GC print specific message for RESTMapping failure

Make the error messages reported in https://github.com/kubernetes/kubernetes/issues/39816 to be more specific, also only print the message once.

I'll also update the garbage collector's doc to clearly state we don't support tpr yet.

We'll wait for the watchable discovery feature (@sttts are you going to work on that?) to land in 1.7, and then enable the garbage collector to handle TPR.

cc @hongchaodeng @MikaelCluseau @djMax
2017-03-10 14:01:23 -08:00
Kubernetes Submit Queue d2d3884f83 Merge pull request #42362 from soltysh/deployment_generators
Automatic merge from submit-queue (batch tested with PRs 38805, 42362, 42862)

Fix deployment generator after introducing deployments in apps/v1beta1

This PR does two things:

1. Switches all generator to produce versioned objects, to bypass the problem of having an object in multiple versions, which then results in not having stable generator (iow. producing exactly the same object).
2. Introduces new generator for `apps/v1beta1` deployments.

@kargakis @janetkuo ptal

@kubernetes/sig-apps-pr-reviews @kubernetes/sig-cli-pr-reviews ptal

This is a followup to https://github.com/kubernetes/kubernetes/pull/39683, so I'm adding 1.6 milestone.

```release-note
Introduce new generator for apps/v1beta1 deployments
```
2017-03-10 14:01:21 -08:00
Kubernetes Submit Queue e2218290cf Merge pull request #42444 from jingxu97/Mar/deleteVolume
Automatic merge from submit-queue (batch tested with PRs 42608, 42444)

Return nil when deleting non-exist GCE PD

When gce cloud tries to delete a disk, if the disk could not be found
from the zones, the function should return nil error. This modified behavior is also consistent with AWS
2017-03-10 12:50:24 -08:00
Avesh Agarwal c3a80719a2 Fix taint based pod eviction for clusters where controller manager
is not running with --allocate-node-cidrs set.
2017-03-10 15:39:21 -05:00
Chao Xu d7aef0a338 Let GC print specific message for RESTMapping failure 2017-03-10 11:38:57 -08:00
Kubernetes Submit Queue e261cabb09 Merge pull request #42877 from gmarek/taint_cleanup
Automatic merge from submit-queue (batch tested with PRs 42877, 42853)

Remove unused functions and make logs slightly better

Zero risk cleanup, removing function that are not used anymore, and adding few more logs to help debugging problems.

cc @aveshagarwal
2017-03-10 09:54:21 -08:00
Kubernetes Submit Queue 18ffc95308 Merge pull request #36704 from fabxc/client-metrics2
Automatic merge from submit-queue

Use Prometheus instrumentation conventions

The `System` and `Subsystem` parameters are subject to removal.
(x-ref: https://github.com/prometheus/client_golang/issues/240)

All metrics should use base units, which is seconds in the duration
case.

Counters should always end in `_total` and metrics should avoid
referring to potential label dimensions. Those should rather be
mentioned in the documentation string.

@kubernetes/sig-instrumentation 

Reference docs:
https://prometheus.io/docs/practices/instrumentation/
https://prometheus.io/docs/practices/naming/

**Release note**:
```
Breaking change: Renamed REST client Prometheus metrics to follow the instrumentation conventions ("request_latency_microseconds" -> "rest_client_request_latency_seconds", "request_status_codes" -> "rest_client_requests_total"). Please update your alerting pipeline if you rely on them. 
```
2017-03-10 09:04:18 -08:00
Maciej Szulik 597a359c38 Error out when cronjob generator not specified, but cronjobs are not available 2017-03-10 12:08:01 +01:00