Commit Graph

54818 Commits (3a0dabcaeab738207b0a9881d77732cf1e74e1fa)

Author SHA1 Message Date
Janet Kuo 3a0dabcaea Refactor function 2017-09-25 10:27:31 -07:00
Janet Kuo 241f4fbc98 Move deployment collision avoidance e2e test to integration 2017-09-25 10:27:31 -07:00
Kubernetes Submit Queue 00c1ec5201 Merge pull request #52582 from foxish/statefulset-upgrade-tests3
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add statefulset upgrade tests to be run as part of upgrade testing

Statefulset upgrade testing is not running at all in any testsuite. This has caused issues in the past like: https://github.com/kubernetes/kubernetes/issues/48327
Changing the tag to make it run in existing upgrade test clusters.

@krzyzacy @kubernetes/sig-apps-misc @kubernetes/sig-release-members @kow3ns @enisoc
2017-09-18 11:47:24 -07:00
Kubernetes Submit Queue 8ca1d9f19b Merge pull request #52550 from piosz/owners
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Added OWNERS for metrics-server

https://github.com/kubernetes/features/issues/271
2017-09-17 23:09:33 -07:00
Kubernetes Submit Queue 1a44e26670 Merge pull request #48216 from kargakis/update-pds-api-comment
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

api: update progressdeadlineseconds comment for deployments

@kubernetes/sig-apps-api-reviews we may never end up doing autorollback - this drops the comment from the pds field for now
2017-09-16 15:33:18 -07:00
Kubernetes Submit Queue 8163d147ec Merge pull request #52572 from DirectXMan12/bug/fix-missing-hpa-metrics-policy
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add bootstrap policy for HPA metrics REST clients

Since we weren't running the HPA with metrics REST clients by default,
we had no bootstrap policy enabling the HPA controller to talk to the
metrics APIs.

This adds permissions for the HPA controller to list
pods.metrics.k8s.io, and to list any resource in custom.metrics.k8s.io.

```release-note
Introduce policy to allow the HPA to consume the metrics.k8s.io and custom.metrics.k8s.io API groups.
```
2017-09-16 13:41:32 -07:00
Kubernetes Submit Queue d48611a1da Merge pull request #43152 from ncdc/watch-cache-retry-live-object-on-conflict
Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

etcd3 store: retry with live object on conflict if there was a suggestion

Retry with a live object instead of the cached version if the watch
cache receives a conflict trying to do the update.
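
The retry described above can be illustrated with a minimal sketch. It does not reproduce the etcd3 store code: `InMemoryStore`, `ConflictError`, and `guaranteed_update` are hypothetical names, and a toy versioned dictionary stands in for etcd. The point is only the control flow, where a stale cached suggestion triggers a live lookup before the update is retried.

```python
class ConflictError(Exception):
    """Raised by the toy store when a write is based on a stale version."""


class InMemoryStore:
    """Hypothetical versioned key-value store standing in for etcd."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        # Live lookup: always returns the current (version, value) pair.
        return self._data[key]

    def put(self, key, value):
        # Unconditional write that bumps the version.
        version = self._data.get(key, (0, None))[0] + 1
        self._data[key] = (version, value)
        return version

    def compare_and_swap(self, key, expected_version, value):
        # Conditional write: fails if someone else wrote in the meantime.
        current_version, _ = self._data[key]
        if current_version != expected_version:
            raise ConflictError(key)
        self._data[key] = (current_version + 1, value)
        return current_version + 1


def guaranteed_update(store, key, update_fn, suggestion=None):
    """Apply update_fn, starting from a cached suggestion when one is given."""
    version, value = suggestion if suggestion is not None else store.get(key)
    while True:
        try:
            return store.compare_and_swap(key, version, update_fn(value))
        except ConflictError:
            # The object we started from was stale: fetch the live object
            # and retry instead of surfacing the conflict to the caller.
            version, value = store.get(key)
```

In this sketch a caller holding an outdated `(version, value)` pair from a cache still succeeds, at the cost of one extra read.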

Fixes #41892
2017-09-16 09:45:31 -07:00
Kubernetes Submit Queue 3277de69b4 Merge pull request #52176 from liggitt/heartbeat-timeout
Automatic merge from submit-queue (batch tested with PRs 52176, 43152). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Eliminate hangs/throttling of node heartbeat

Fixes https://github.com/kubernetes/kubernetes/issues/48638
Fixes #50304

Stops kubelet from wedging when updating node status if unable to establish tcp connection.

Note that this only affects the node status loop. The pod sync loop would still hang until the dead TCP connections time out, so more work is needed to keep the sync loop responsive in the face of network issues. This change does, however, let existing pods coast without the node controller trying to evict them.

```release-note
kubelet to master communication when doing node status updates now has a timeout to prevent indefinite hangs
```
2017-09-16 09:45:29 -07:00
Kubernetes Submit Queue d3731ddb8b Merge pull request #52576 from fabriziopandini/fixAddonPhase
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

fix Kubeadm phase addon regression

What this PR does / why we need it:
fix Kubeadm phase addon regression

Special notes for your reviewer:
CC @luxas
2017-09-16 08:14:03 -07:00
Kubernetes Submit Queue 3899491d2b Merge pull request #52524 from karataliu/ccm_clustername
Automatic merge from submit-queue (batch tested with PRs 52486, 52588, 52524). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add cluster-name option for cloud controller manager

**What this PR does / why we need it**:
`cluster-name` is used by servicecontroller and routecontroller. For controller-manager we have a parameter to set it, but for cloud-controller-manager it always has the default value 'kubernetes'.

One example of the impact is Azure's load balancer: the load balancer resource created will always have the name 'kubernetes', whereas it used to take the cluster name set via the controller manager's option.

**Which issue this PR fixes**
Fixes #52522

**Special notes for your reviewer**:

**Release note**:
```release-note
```
2017-09-16 06:34:27 -07:00
Kubernetes Submit Queue 02731cc767 Merge pull request #52588 from bobbypage/cristatsprovider-fixes
Automatic merge from submit-queue (batch tested with PRs 52486, 52588, 52524). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Fix nil pointer dereference in cri stats provider when there are no image file systems

**What this PR does / why we need it**:

This PR fixes a nil pointer dereference in CRI stats provider when there are no image filesystems. See https://github.com/kubernetes/kubernetes/pull/51152 for discussion.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
```

/cc yujuhong yguo0905
2017-09-16 06:34:23 -07:00
Kubernetes Submit Queue 5079186eb4 Merge pull request #52486 from oracle/for/upstream/master/cnc-recorder-fix
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Changes the node cloud controller to use its name for events

**What this PR does / why we need it**:

Updates the event recorder component to be `cloud-node-controller` instead of `cloudcontrollermanager`. This aligns with how other controllers are set up: for example, the daemonset controller uses `daemonset-controller` and the ccm uses `cloud-controller-manager`.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Use `cloud-node-controller` for cloud node controller events
```

/cc @luxas @wlan0 @jhorwit2
2017-09-16 06:13:25 -07:00
Kubernetes Submit Queue 0f7aa6727c Merge pull request #52548 from piosz/ms-bump
Automatic merge from submit-queue (batch tested with PRs 52488, 52548). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Bumped Metrics Server to v0.2.0

ref https://github.com/kubernetes/features/issues/271

**Release note**:
```release-note
Introduced Metrics Server in version v0.2.0. For more details see https://github.com/kubernetes-incubator/metrics-server/releases/tag/v0.2.0.
```
2017-09-15 18:34:27 -07:00
Kubernetes Submit Queue 549bd71ea7 Merge pull request #52488 from kawych/master
Automatic merge from submit-queue (batch tested with PRs 52488, 52548). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Enable overriding Heapster resource requirements in GCP

This PR enables overriding Heapster resource requirements in GCP.

**Release note:**
```release-note
```
2017-09-15 18:34:25 -07:00
David Porter aee1e58d58 Handle nil WritableLayer 2017-09-16 00:13:17 +00:00
David Porter 0b1f806557 Fix nil dereference if storage id is nil 2017-09-16 00:13:04 +00:00
Anirudh 1762bc428e update tag 2017-09-15 16:47:42 -07:00
Anirudh 83ad6900e5 Add statefulset upgrade tests to be run as part of
all upgrade testsuites
2017-09-15 16:29:06 -07:00
Kubernetes Submit Queue c4f3017f15 Merge pull request #52539 from piosz/metrics-v1alpha1
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>..

Do not install metrics/v1alpha1 by default

We want to have `metrics/v1alpha1` in the repo in order to support the previous version of HPA, but we don't want to install them by default.

ref https://github.com/kubernetes-incubator/metrics-server/pull/15
2017-09-15 15:08:52 -07:00
fabriziopandini d21040b8e6 fix addon error 2017-09-16 00:03:35 +02:00
Solly Ross 8cbbbac27d Add bootstrap policy for HPA metrics REST clients
Since we weren't running the HPA with metrics REST clients by default,
we had no bootstrap policy enabling the HPA controller to talk to the
metrics APIs.

This adds permissions for the HPA controller to list
pods.metrics.k8s.io, and to list any resource in custom.metrics.k8s.io.
2017-09-15 17:27:38 -04:00
Piotr Szczesniak cc072e868d Do not install metrics/v1alpha1 by default 2017-09-15 21:44:58 +02:00
Andy Goldstein bf33df16b5 etcd3 store: retry w/live object on conflict
In GuaranteedUpdate, if it was called with a suggestion (e.g. via the
watch cache), and the suggested object is stale, perform a live lookup
and then retry the update.

Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
2017-09-15 14:02:09 -04:00
Kubernetes Submit Queue 98ed5dd8a2 Merge pull request #52518 from xiangpengzhao/move-out-1.3-release-notes
Automatic merge from submit-queue

Move 1.3.* release notes out of CHANGELOG.md

**What this PR does / why we need it**:
The latest CHANGELOG has almost reached 1M in size again. Move the 1.3 release notes out of CHANGELOG.md.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
ref: #48985 #52174

**Special notes for your reviewer**:
I wish I could split out all <1.7 releases in one PR, but that is a bit painful to do, so I am splitting out one release at a time.

/cc @jdumars @dchen1107 

**Release note**:

```release-note
NONE
```
2017-09-15 09:19:12 -07:00
Kubernetes Submit Queue 9ef9a1b8f0 Merge pull request #52544 from shyamjvs/increase-window-of-prometheus-metric
Automatic merge from submit-queue

Increase sliding window to 5hr for request_latencies metric

We're seeing high latency values for a couple of types of api calls in our density test (ref https://github.com/kubernetes/kubernetes/issues/51899). And we're recording values from only the last 1 hour in the metric (as @wojtek-t told me offline) - so our test result is pretty much counting only the calls during the delete phase.

cc @kubernetes/sig-scalability-misc @kubernetes/sig-api-machinery-misc @gmarek
2017-09-15 09:19:06 -07:00
Kubernetes Submit Queue d7770f9c11 Merge pull request #52536 from aleksandra-malinowska/revert-update-addon-resizer-version
Automatic merge from submit-queue

Revert "Update addon-resizer version"

This reverts #46850 due to several issues with new version of addon resizer (#50599, #52535), as recommended by @piosz. 

cc @fgrzadkowski @x13n
2017-09-15 09:18:48 -07:00
Kubernetes Submit Queue 9f954c146f Merge pull request #52543 from shyamjvs/add-more-traces-to-delete-resource-handler
Automatic merge from submit-queue

Add extra steps to delete resource handler trace

Based on https://github.com/kubernetes/kubernetes/issues/51899#issuecomment-329786131

cc @kubernetes/sig-scalability-misc @kubernetes/sig-api-machinery-misc @gmarek
2017-09-15 08:27:43 -07:00
Piotr Szczesniak 45f84b9c9f Added OWNERS for metrics-server 2017-09-15 17:03:37 +02:00
Piotr Szczesniak c632649ec7 Bumped Metrics Server to v0.2.0 2017-09-15 16:38:57 +02:00
Shyam Jeedigunta 6089cadab3 Add extra steps to delete resource handler trace 2017-09-15 16:12:13 +02:00
Shyam Jeedigunta e1ba3da16c Increase sliding window to 5hr for request_latencies metric 2017-09-15 16:11:18 +02:00
Kubernetes Submit Queue 9aef242a4c Merge pull request #52223 from bsalamat/approver
Automatic merge from submit-queue (batch tested with PRs 51796, 52223)

Add bsalamat to sig-scheduling-maintainers

**What this PR does / why we need it**:
Adds bsalamat to sig-scheduling-maintainers.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # N/A

**Release note**:

```release-note
NONE
```

@kubernetes/sig-scheduling-pr-reviews @davidopp @timothysc @k82cn @wojtek-t
2017-09-15 05:51:23 -07:00
Kubernetes Submit Queue ea22affd08 Merge pull request #51796 from Dirbaio/fix/pod-node-switch
Automatic merge from submit-queue (batch tested with PRs 51796, 52223)

Fix pod and node names switched around in error message.

**What this PR does / why we need it**: This PR fixes a pod name and a node name switched around in an error message from the daemon controller.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: No issue that I know of

**Special notes for your reviewer**: -

**Release note**:

```release-note
NONE
```
2017-09-15 05:51:21 -07:00
Aleksandra Malinowska 68d3a9db2a Revert "Update addon-resizer version"
This reverts commit 63ccedcfa7.
2017-09-15 14:30:47 +02:00
Kubernetes Submit Queue 87a1b5f6d7 Merge pull request #52476 from clamoriniere1A/bugfix/e2e_job_backoff_flaky
Automatic merge from submit-queue

Bugfix: Fix e2e Flaky Apps/Job BackoffLimit test

This fix is linked to PR #51153, which introduced `JobSpec.BackoffLimit`.

Previously, the timeout used in the test was too aggressive and produced flaky test runs. The test now uses the default `framework.JobTimeout` used in other tests.



**What this PR does / why we need it**:
This PR should fix the flaky "[sig-apps] Job should exceed backoffLimit" test, which was caused by a timeout that was too short.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes #51153 

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-09-15 03:30:27 -07:00
Karol Wychowaniec 8cfeb4f172 Enable overriding Heapster resource requirements in GCP 2017-09-15 11:45:37 +02:00
Kubernetes Submit Queue b5fbd71bbc Merge pull request #52290 from jiayingz/deviceplugin-failure
Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290)

Fixes device plugin re-registration handling logic to make sure:

- If a device plugin exits, its exported resource will be removed.
- No capacity change if a new device plugin instance comes up to replace the old instance.



**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/kubernetes/kubernetes/issues/52510

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-09-15 02:00:08 -07:00
Kubernetes Submit Queue b7953a787e Merge pull request #52260 from andyzhangx/azuremounter-issue
Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290)

fix azure disk mounter issue

**What this PR does / why we need it**:
Fix an azure disk mounter issue. It is a P1 bug that exists in the 1.7 and 1.8 releases and should be cherry-picked to 1.7 and 1.8.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes #52261 

Consider the following issue:
1) A pod mounts an azure disk on a k8s agent
2) The kubelet is restarted on that k8s agent
3) The pod cannot start up and always reports the following error:

  4d            1m              3065    kubelet, 14777acs9000                   Warning         FailedMount     MountVolume.SetUp failed for volume "pvc-7a0cdeb9-92c7-11e7-b86b-000d3a36d70c" : azureDisk - Not a mounting point for disk andykubewin175-dynamic-pvc-7a0cdeb9-92c7-11e7-b86b-000d3a36d70c on \var\lib\kubelet\pods\d146c023-92c7-11e7-b86b-000d3a36d70c\volumes\kubernetes.io~azure-disk\pvc-7a0cdeb9-92c7-11e7-b86b-000d3a36d70c
  4d            1m              3157    kubelet, 14777acs9000                   Warning         FailedMount     Error syncing pod

**Special notes for your reviewer**:
If you take a look at the following vsphere and gce implementations, they return nil instead of an error:
https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/vsphere_volume/vsphere_volume.go#L217-L220
https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/gce_pd/gce_pd.go#L273-L275

The logic that parses the returned info is here; given that logic, it is wrong to return an error:
https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/util/operationexecutor/operation_generator.go#L469-L475

**Release note**:

```release-note
```
2017-09-15 02:00:03 -07:00
Kubernetes Submit Queue 0c1dcb01c5 Merge pull request #52115 from jcbsmpsn/flag-enable-kubelet-certificate-rotation
Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290)

Add env var to enable kubelet rotation in kube-up.sh.

Fixes https://github.com/kubernetes/kubernetes/issues/52114

```release-note
Adds ROTATE_CERTIFICATES environment variable to kube-up.sh script for GCE
clusters. When that var is set to true, the command line flag enabling kubelet
client certificate rotation will be added to the kubelet command line.
```
2017-09-15 01:59:59 -07:00
Kubernetes Submit Queue 935726f109 Merge pull request #52452 from gnufied/fix-quota-on-update
Automatic merge from submit-queue (batch tested with PRs 52452, 52115, 52260, 52290)

Fix support for updating quota on update

This PR implements support for properly handling quota when resources are updated. We never take negative values and add them up.
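
One possible reading of "never take negative values" is sketched below. This is an interpretation, not the PR's code: `usage_delta_on_update` is a hypothetical helper, and usage is modeled as a plain dictionary of resource name to integer amount. Under that assumption, an update is charged only for the resources whose usage grows, and a shrinking resource contributes zero rather than a negative amount.

```python
def usage_delta_on_update(old_usage: dict, new_usage: dict) -> dict:
    """Return the per-resource increase in usage caused by an update.

    Decreases are clamped to zero so that a negative change in one
    resource is never added into the charged amount (assumed behavior).
    """
    delta = {}
    for resource in set(old_usage) | set(new_usage):
        change = new_usage.get(resource, 0) - old_usage.get(resource, 0)
        delta[resource] = max(change, 0)
    return delta
```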

Fixes https://github.com/kubernetes/kubernetes/issues/51736 

cc @derekwaynecarr 

/sig storage

```release-note
Make sure that resources being updated are handled correctly by Quota system
```
2017-09-15 01:59:56 -07:00
Kubernetes Submit Queue 93ddb7be5f Merge pull request #52237 from smarterclayton/watch_metric
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)

Improve apiserver metrics reporting

Normalize "WATCHLIST" to "WATCH", add "scope" to the other metrics (listing 50k pods is != listing pods in a namespace), and add a new scope "resource" to cover individual resource calls.

This roughly aligns metrics with our ACL model (technically resource scope is GET, but POST to a subresource and POST to a namespace are not the same thing).
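
As a rough illustration of the labeling described above, the sketch below derives a scope from the request's namespace and resource name and normalizes the verb. The function names and the exact precedence (a named resource wins over a namespace) are assumptions made for this example, not a copy of the apiserver code.

```python
def normalize_verb(verb: str) -> str:
    """Report WATCHLIST calls under the WATCH verb."""
    return "WATCH" if verb == "WATCHLIST" else verb


def request_scope(namespace: str = "", name: str = "") -> str:
    """Classify a request as 'resource', 'namespace', or 'cluster'."""
    if name:
        # The call targets one individual resource.
        return "resource"
    if namespace:
        # The call is limited to a single namespace.
        return "namespace"
    # Otherwise the call spans the whole cluster.
    return "cluster"
```

With this split, listing every pod in the cluster and listing the pods in one namespace land in different series.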

```release-note
WATCHLIST calls are now reported as WATCH verbs in prometheus for the apiserver_request_* series.  A new "scope" label is added to all apiserver_request_* values that is either 'cluster', 'resource', or 'namespace' depending on which level the query is performed at.
```
2017-09-15 01:08:11 -07:00
Kubernetes Submit Queue 6b5462803e Merge pull request #52009 from zjj2wry/export-svc
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)

fix issue(#47976)Invalid value error when creating service from expor…

…ted config



**What this PR does / why we need it**:
close issue #47976
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-09-15 01:08:08 -07:00
Kubernetes Submit Queue 86dc5fceda Merge pull request #52451 from yujuhong/enable-cri-stats
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)

kubelet: enable CRI container metrics

Fixes #46984
2017-09-15 01:08:05 -07:00
Kubernetes Submit Queue 7181dd4946 Merge pull request #50476 from caesarxuchao/plumb-proxy
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)

Plumbing the proxy dialer to the webhook admission plugin

* Fixing https://github.com/kubernetes/kubernetes/issues/49987. Plumb the `Dial` function to the `transport.Config`
* Fixing https://github.com/kubernetes/kubernetes/issues/52366. Let the webhook admission plugin set the `TLSConfig.ServerName`.

I tested it in my gke setup. I don't have time to implement an e2e test before 1.8 release. I think it's ok to add the test later, because *i)* the change only affects the alpha webhook admission feature, and *ii)* the webhook feature is unusable without the fix. That said, it's up to my reviewer to decide.

Filed https://github.com/kubernetes/kubernetes/issues/52368 for the missing e2e test.

( The second commit is https://github.com/kubernetes/kubernetes/pull/52372, which is just a cleanup of client configuration in e2e tests. It removed a function that marshalled the client config to json and then unmarshalled it. It is a prerequisite of this PR, because this PR added the `Dial` function to the config which is not json marshallable.)

```release-note
Fixed the webhook admission plugin so that it works even if the apiserver and the nodes are in two networks (e.g., in GKE).
Fixed the webhook admission plugin so that webhook authors can use the DNS name of the service as the CommonName when generating the server cert for the webhook.

Action required:
Anyone who generated a server cert for admission webhooks needs to regenerate it. Previously, the CN value did not matter when generating the server cert for the admission webhook. Now you must set it to the DNS name of the webhook service, i.e., `<service.Name>.<service.Namespace>.svc`.
```
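
The CN format from the release note can be built with a small helper. This is only a sketch: `webhook_common_name` and the example service name are hypothetical, and only the `<service.Name>.<service.Namespace>.svc` pattern comes from the note itself.

```python
def webhook_common_name(service_name: str, service_namespace: str) -> str:
    """Build the CommonName expected in the webhook's server certificate."""
    return f"{service_name}.{service_namespace}.svc"


# Example with a made-up service: prints the CN to put in the certificate.
print(webhook_common_name("my-webhook", "kube-system"))
```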
2017-09-15 01:08:01 -07:00
Kubernetes Submit Queue b3e641d7f3 Merge pull request #51824 from ihmccreery/oss-mdc
Automatic merge from submit-queue (batch tested with PRs 51824, 50476, 52451, 52009, 52237)

Allow metadata firewall & proxy on in GCE, off by default

**What this PR does / why we need it**: Add necessary variables in kube-env to allow a user to turn on metadata firewall and proxy for K8s on GCE.

Ref #8867.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: 

**Special notes for your reviewer**:

**Release note**:

```release-note
GCE users can enable the metadata firewall and metadata proxy with KUBE_FIREWALL_METADATA_SERVER and ENABLE_METADATA_PROXY, respectively.
```
2017-09-15 01:07:58 -07:00
Kubernetes Submit Queue 9d8c11924f Merge pull request #51781 from bsalamat/preemption_tests
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)

Add more tests for pod preemption

**What this PR does / why we need it**:
Adds more e2e and integration tests for pod preemption.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:
This PR is based on #50949. Only the last commit is new.

**Release note**:

```release-note
NONE
```

ref/ #47604

@kubernetes/sig-scheduling-pr-reviews @davidopp
2017-09-15 00:11:17 -07:00
Kubernetes Submit Queue ce5c41ab0f Merge pull request #52363 from balajismaniam/fix-cpuman-restartpol-never-bug
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)

Make CPU manager release CPUs when Pod enters completed phase. 

**What this PR does / why we need it**: When CPU manager is enabled, this PR releases allocated CPUs when container is not running and is non-restartable. 
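
The condition described above can be sketched as a predicate. This is a simplified model, not the CPU manager code: `is_restartable` and `should_release_cpus` are hypothetical names, and treating an `OnFailure` container with exit code 0 as non-restartable is an assumption of this example.

```python
def is_restartable(restart_policy: str, exit_code) -> bool:
    """Decide whether a stopped container could be started again."""
    if restart_policy == "Always":
        return True
    if restart_policy == "OnFailure":
        # Only a failed container is restarted under OnFailure.
        return exit_code != 0
    # restartPolicy Never: the container is never restarted.
    return False


def should_release_cpus(running: bool, restart_policy: str, exit_code=None) -> bool:
    """Release exclusive CPUs once the container is stopped for good."""
    return not running and not is_restartable(restart_policy, exit_code)
```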

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #52351

**Special notes for your reviewer**:
This bug is only reproduced for pods with `restartPolicy` = `Never` or `OnFailure`. The following output is from a 4 CPU node. The bug can be reproduced as long as at least half of the cores are requested.

pod1.yaml:
```
apiVersion: v1
kind: Pod
metadata:
  name: test-pod1
spec:
  containers:
  - image: ubuntu
    command: ["/bin/bash"]
    args: ["-c", "sleep 5"]
    name: test-container1
    resources:
      requests:
        cpu: 2
        memory: 100Mi
      limits:
        cpu: 2
        memory: 100Mi
  restartPolicy: "Never"
```

pod2.yaml:
```
apiVersion: v1
kind: Pod
metadata:
  name: test-pod2
spec:
  containers:
  - image: ubuntu
    command: ["/bin/bash"]
    args: ["-c", "sleep 5"]
    name: test-container1
    resources:
      requests:
        cpu: 2
        memory: 100Mi
      limits:
        cpu: 2
        memory: 100Mi
  restartPolicy: "Never"
```
Run a local Kubernetes cluster with CPU manager enabled. 
```sh
KUBELET_FLAGS='--feature-gates=CPUManager=true --cpu-manager-policy=static --cpu-manager-reconcile-period=1s --kube-reserved=cpu=500m' ./hack/local-up-cluster.sh
```
_Before:_
Create `test-pod1` using pod1.yaml. 
```
./cluster/kubectl.sh create -f pod1.yaml
```
Wait for the pod to complete, then wait another 90 seconds (enough time for GC to kick in).

Create `test-pod2` using pod2.yaml. 
```
./cluster/kubectl.sh create -f pod2.yaml
```

Get all pods in the cluster. 
```
./cluster/kubectl.sh get pods -a
NAME        READY     STATUS                                         RESTARTS   AGE
test-pod1   0/1       Completed                                      0          1m
test-pod2   0/1       not enough cpus available to satisfy request   0          9s
```

_After:_
Create `test-pod1` using pod1.yaml. 
```
./cluster/kubectl.sh create -f pod1.yaml
```
Wait for the pod to complete, then wait another 90 seconds (enough time for GC to kick in).

Create `test-pod2` using pod2.yaml. 
```
./cluster/kubectl.sh create -f pod2.yaml
```

Get all pods in the cluster. 
```
./cluster/kubectl.sh get pods -a
NAME        READY     STATUS      RESTARTS   AGE
test-pod1   0/1       Completed    0          1m
test-pod2   0/1       Completed    0          9s
```
2017-09-15 00:11:14 -07:00
Kubernetes Submit Queue 20a4112e88 Merge pull request #46542 from derekwaynecarr/quota-ignore-pod-whose-node-lost
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)

Ignore pods for quota marked for deletion whose node is unreachable

**What this PR does / why we need it**:
Traditionally, we charge to quota all pods that are in a non-terminal phase.  We have a user report that noted the behavior change in kube 1.5 for the node controller to no longer force delete pods whose nodes have been lost.  Instead, the pod is marked for deletion, and the reason is updated to state that the node is unreachable.  The user expected the quota to be released.  If the user was at their quota limit, their application may not be able to create a new replica given the current behavior.  As a result, this PR ignores pods marked for deletion that have exceeded their grace period.
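
The rule described above can be sketched as a predicate over a simplified pod record. The `Pod` dataclass, its field names, and the way the deadline is computed (deletion timestamp plus grace period) are assumptions made for illustration and are not taken from the quota evaluator's source.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

TERMINAL_PHASES = {"Succeeded", "Failed"}


@dataclass
class Pod:
    """Hypothetical, trimmed-down view of a pod for quota purposes."""
    phase: str
    deletion_timestamp: Optional[datetime] = None
    deletion_grace_period_seconds: Optional[int] = None


def counts_against_quota(pod: Pod, now: datetime) -> bool:
    """Return True if the pod should still be charged to quota."""
    if pod.phase in TERMINAL_PHASES:
        # Pods in a terminal phase were never charged.
        return False
    if (pod.deletion_timestamp is not None
            and pod.deletion_grace_period_seconds is not None):
        deadline = pod.deletion_timestamp + timedelta(
            seconds=pod.deletion_grace_period_seconds)
        if now > deadline:
            # Marked for deletion and past its grace period, for example
            # because its node is unreachable: stop charging it.
            return False
    return True
```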

**Which issue this PR fixes**
xref https://bugzilla.redhat.com/show_bug.cgi?id=1455743
fixes https://github.com/kubernetes/kubernetes/issues/52436

**Release note**:
```release-note
Ignore pods marked for deletion that exceed their grace period in ResourceQuota
```
2017-09-15 00:11:10 -07:00
Kubernetes Submit Queue 1646db0ba7 Merge pull request #52247 from wackxu/atd
Automatic merge from submit-queue (batch tested with PRs 52442, 52247, 46542, 52363, 51781)

Add some test case in default_test.go

**What this PR does / why we need it**:

Add some test case in default_test.go

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #


**Release note**:

```release-note
NONE
```
2017-09-15 00:11:08 -07:00
Kubernetes Submit Queue 2c81db53ce Merge pull request #52442 from crassirostris/sd-logging-e2e-fix-trimming
Automatic merge from submit-queue

[fluentd-gcp addon] Remove some e2e tests out of blocking suites

Fixes https://github.com/kubernetes/kubernetes/issues/52433

Some Stackdriver Logging e2e tests are broken in release-blocking suites:

- Due to a change in Docker 1.13, logs on some systems are automatically split into 16K chunks. This PR removes an e2e test that assumes otherwise
- In large clusters, it's not possible to ingest system logs from all nodes

Since this is not a Kubernetes problem per se, this PR mitigates it by removing these tests from blocking suites.
2017-09-14 23:38:04 -07:00