Commit Graph

6704 Commits (ab5a82d6e6fd9afc322b82ae448b6930c98d540b)

Author SHA1 Message Date
Kubernetes Submit Queue 747b153265 Merge pull request #42661 from dashpole/garbage_fix
Automatic merge from submit-queue

[Bug Fix] Garbage Collect Node e2e Failing

This node e2e test uses its own deletion timeout (1 minute) instead of the default (3 minutes).
#41644 likely increased time for deletion.  See that PR for analysis on that.
There may be other problems with this test, but those are difficult to pick apart from hitting this low timeout.

This PR changes the Garbage Collector test to use the default timeout.  This should allow us to discern if there are any actual bugs to fix.

cc @kubernetes/sig-node-bugs @calebamiles @derekwaynecarr
2017-03-07 16:09:44 -08:00
Kubernetes Submit Queue d6ae61c3c9 Merge pull request #42588 from krousey/upgrades
Automatic merge from submit-queue

Create "framework" per upgrade test

There were already a few tests just using the default framework
namespace instead of creating a new one. Also there are several
testing libraries that use the default framework's default namespace
as well. It's just easier this way.
2017-03-07 14:56:12 -08:00
David Ashpole 6a0d5506c2 use default timeout 2017-03-07 11:45:59 -08:00
Kubernetes Submit Queue feb81270d1 Merge pull request #42346 from kargakis/e2e-deployment-fix
Automatic merge from submit-queue

e2e: require minimum completeness in deployment tests

@janetkuo should reduce deployment-related flakes

For example:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gke/5334#k8sio-deployment-scaled-rollout-deployment-should-not-block-on-annotation-check
Deployment with 5 pods, all of them updated to the latest replica set. For some reason one of them is never scheduled and even though the strategy parameters in the Deployment allow 2 maxUnavailable the test is stuck for 5mins waiting for the last pod.
```
I0228 12:33:31.406] Feb 28 12:33:20.255: INFO: Pod nginx-2629236890-r7z5n is not available:
I0228 12:33:31.406] {TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:nginx-2629236890-r7z5n GenerateName:nginx-2629236890- Namespace:e2e-tests-deployment-gctjw SelfLink:/api/v1/namespaces/e2e-tests-deployment-gctjw/pods/nginx-2629236890-r7z5n UID:7095ad3e-fdf4-11e6-af4a-42010af00010 ResourceVersion:7804 Generation:0 CreationTimestamp:2017-02-28 12:28:18.408376434 -0800 PST DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[pod-template-hash:2629236890 name:nginx] Annotations:map[kubernetes.io/created-by:{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"e2e-tests-deployment-gctjw","name":"nginx-2629236890","uid":"7085ba3f-fdf4-11e6-af4a-42010af00010","apiVersion":"extensions","resourceVersion":"7743"}}
I0228 12:33:31.407] ] OwnerReferences:[{APIVersion:extensions/v1beta1 Kind:ReplicaSet Name:nginx-2629236890 UID:7085ba3f-fdf4-11e6-af4a-42010af00010 Controller:0xc420beccf0}] Finalizers:[] ClusterName:} Spec:{Volumes:[{Name:default-token-58h80 VolumeSource:{HostPath:nil EmptyDir:nil GCEPersistentDisk:nil AWSElasticBlockStore:nil GitRepo:nil Secret:&SecretVolumeSource{SecretName:default-token-58h80,Items:[],DefaultMode:*420,Optional:nil,} NFS:nil ISCSI:nil Glusterfs:nil PersistentVolumeClaim:nil RBD:nil FlexVolume:nil Cinder:nil CephFS:nil Flocker:nil DownwardAPI:nil FC:nil AzureFile:nil ConfigMap:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil Projected:nil}}] InitContainers:[] Containers:[{Name:nginx Image:gcr.io/google_containers/update-demo:kitten Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-58h80 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false}] RestartPolicy:Always TerminationGracePeriodSeconds:0xc420becd28 ActiveDeadlineSeconds:<nil> DNSPolicy:ClusterFirst NodeSelector:map[] ServiceAccountName:default DeprecatedServiceAccount:default AutomountServiceAccountToken:<nil> NodeName:gke-bootstrap-e2e-default-pool-800b6140-fsl5 HostNetwork:false HostPID:false HostIPC:false SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,} ImagePullSecrets:[] Hostname: Subdomain: Affinity:nil SchedulerName:default-scheduler Tolerations:[]} Status:{Phase:Pending Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-02-28 12:28:18 -0800 PST Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-02-28 12:28:18 -0800 PST Reason:ContainersNotReady Message:containers with unready status: [nginx]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-02-28 12:28:18 -0800 PST Reason: Message:}] Message: Reason: HostIP:10.240.0.2 PodIP: StartTime:2017-02-28 12:28:18 -0800 PST InitContainerStatuses:[] ContainerStatuses:[{Name:nginx State:{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,} Running:nil Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:gcr.io/google_containers/update-demo:kitten ImageID: ContainerID:}] QOSClass:BestEffort}}
```

@kubernetes/sig-apps-misc
2017-03-07 09:11:25 -08:00
Kubernetes Submit Queue cf1160702b Merge pull request #42641 from deads2k/api-09-patch
Automatic merge from submit-queue

deflake TestPatch by waiting for cache

Fixes #39471

Rather than retry on conflicts for an unknown number of times, we have the resource version after the previous patch and we can use that to wait for a GET response that is at least as current as that resourceVersion.

@kubernetes/sig-api-machinery-pr-reviews @liggitt @wojtek-t
2017-03-07 08:11:07 -08:00
Kubernetes Submit Queue 5cc6a4e269 Merge pull request #42609 from intelsdi-x/test-out-of-oir
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)

Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available (e2e disabled).

#41870 was reverted because it introduced an e2e test flake. This is the same code with the e2e for OIR disabled again.

We can attempt to enable the e2e test cases one-by-one in follow-up PRs, but it would be preferable to get the main fix merged in time for 1.6 since OIR is broken on master (see #41861).

cc @timothysc
2017-03-07 08:10:46 -08:00
Kubernetes Submit Queue ed04316828 Merge pull request #41890 from soltysh/issue37166
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)

Remove everything that is not new from batch/v2alpha1

Fixes #37166.

@lavalamp you've asked for it 
@erictune this is a prereq for moving CronJobs to beta. I initially planned to put all in one PR, but after I did that I figured out it'll be easier to review separately. ptal 

@kubernetes/api-approvers @kubernetes/sig-api-machinery-pr-reviews ptal
2017-03-07 08:10:38 -08:00
deads2k 5fd0162a8a deflake TestPatch by waiting for cache 2017-03-07 08:53:29 -05:00
Kubernetes Submit Queue 302aab9ba2 Merge pull request #42584 from shashidharatd/federation
Automatic merge from submit-queue (batch tested with PRs 42506, 42585, 42596, 42584)

[Federation][e2e] Add wait for namespaces to appear in federated cluster

Some federation e2e tests are failing because the namespace (created for every test case) does not seem to appear in one (or more) of federated clusters.
This may be a timing issue, so introducing a wait until the namespace is created in federated clusters.

cc @madhusudancs, @nikhiljindal, @kubernetes/sig-federation-bugs
2017-03-06 22:20:15 -08:00
Kubernetes Submit Queue 31db570a00 Merge pull request #42497 from derekwaynecarr/lower_cgroup_names
Automatic merge from submit-queue

cgroup names created by kubelet should be lowercased

**What this PR does / why we need it**:
This PR modifies the kubelet to create cgroupfs names that are lowercased.  This better aligns us with the naming convention for cgroups v2 and other cgroup managers in ecosystem (docker, systemd, etc.)

See: https://www.kernel.org/doc/Documentation/cgroup-v2.txt
"2-6-2. Avoid Name Collisions"

**Special notes for your reviewer**:
none

**Release note**:
```release-note
kubelet created cgroups follow lowercase naming conventions
```
2017-03-06 20:43:03 -08:00
Connor Doyle 364dbc0ca5 Revert "Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available.""
- This reverts commit 60758f3fff.
- Disabled opaque integer resource end-to-end tests.
2017-03-06 17:48:09 -08:00
Kubernetes Submit Queue 20c13c1564 Merge pull request #42555 from nicksardo/upgrade-ingress-flake
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)

Fix resource cleanup in ingress_utils.go within e2e/framework

**What this PR does / why we need it**:
The GLBC is failing to delete resources during the etcd rollback tests and the e2e cleanup is leaking them. After a short while, tests are failing to create new resources. 
This PR addresses the e2e/framework's ability to delete GLBC-created resources and adds more logging.

**Which issue this PR fixes**:
Helps #38569 but does not completely close this flake

**Special notes for your reviewer**:
Resources were not being deleted because resource names were being truncated and then their ability to be deleted was determined by the entire cluster id existing in the name. Truncated names also have an extra '0' append to the end of their name (unknown origin). This PR tries to match on a common prefix. 

Minor changes were made to improve log readability. 

**Testing this PR**:
This was tested by running a master upgrade test and by adding a second forwarding-rule mid-run.  This forwarding rule referenced the same url-map used by the first forwarding-rule created by the GLBC. Therefore, the GLBC will be able to delete the forwarding-rule but not anymore L7 resources. This second forwarding rule's name was nearly identical to the first forwarding rule so that the cleanup code will find it. 

As you can see from the test run below, the cleanup code deleted all the resources that the GLBC could not.

```log
...
Mar  5 18:35:53.112: INFO: Monitoring glbc's cleanup of gce resources:
k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420 (forwarding rule)
k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (target-https-proxy)
k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260 (url-map)
k8s-be-32331--5f38ac0e2426f796 (backend-service)
k8s-be-32613--5f38ac0e2426f796 (backend-service)
k8s-be-32331--5f38ac0e2426f796 (http-health-check)
k8s-be-32613--5f38ac0e2426f796 (http-health-check)
k8s-ig--5f38ac0e2426f796 (instance-group)
k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (ssl-certificate)

STEP: Performing final delete of any remaining resources
Mar  5 18:35:54.055: INFO: Deleting forwarding-rules: k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420
Mar  5 18:36:06.945: INFO: Deleting target-https-proxies: k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
Mar  5 18:36:14.301: INFO: Deleting url-map: k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260
Mar  5 18:36:18.309: INFO: Deleting backed-service: k8s-be-32331--5f38ac0e2426f796
Mar  5 18:36:22.112: INFO: Deleting backed-service: k8s-be-32613--5f38ac0e2426f796
Mar  5 18:36:26.192: INFO: Deleting http-health-check: k8s-be-32331--5f38ac0e2426f796
Mar  5 18:36:29.846: INFO: Deleting http-health-check: k8s-be-32613--5f38ac0e2426f796
Mar  5 18:36:33.722: INFO: Deleting instance-group: k8s-ig--5f38ac0e2426f796
Mar  5 18:36:37.762: INFO: Deleting ssl-certificate: k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
STEP: No resources leaked.
Mar  5 18:36:46.441: INFO: Deleting addresses: e2e-tests-ingress-upgrade-0px85-static-ip
Mar  5 18:36:53.902: INFO: L7 controller failed to delete all cloud resources on time. timed out waiting for the condition
...
```
2017-03-06 17:16:17 -08:00
Kubernetes Submit Queue 8e52bec3cd Merge pull request #42598 from kubernetes/revert-41870-test-out-of-oir
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)

Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available."

Reverts kubernetes/kubernetes#41870 for stopping bleeding edge: #42597

cc/ @ConnorDoyle @kubernetes/release-team 

Connor if there is a pending pr to fix the issue, please point it out to me. We can close this one, otherwise, I would like to revert the pr first. You can resubmit the fix. Thanks!
2017-03-06 17:16:15 -08:00
Kubernetes Submit Queue d50a59ec66 Merge pull request #42080 from enisoc/controller-ref-ss
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)

StatefulSet: Respect ControllerRef

**What this PR does / why we need it**:

This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings StatefulSet into full compliance with ControllerRef. See the individual commit messages for details.

**Which issue this PR fixes**:

Fixes #36859

**Special notes for your reviewer**:

**Release note**:

```release-note
StatefulSet now respects ControllerRef to avoid fighting over Pods. At the time of upgrade, **you must not have StatefulSets with selectors that overlap** with any other controllers (such as ReplicaSets), or else [ownership of Pods may change](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md#upgrading).
```
cc @erictune @kubernetes/sig-apps-pr-reviews
2017-03-06 17:16:10 -08:00
shashidharatd a4b6c72f26 Add a wait for namespaces to appear in federated cluster 2017-03-07 05:53:55 +05:30
Kubernetes Submit Queue e82834e4d8 Merge pull request #42405 from jszczepkowski/e2e-upgrade-hpa
Automatic merge from submit-queue (batch tested with PRs 41826, 42405)

Fixed too long name in HPA e2e upgrade test.

Fixed too long name in HPA e2e upgrade test.

```release-note
NONE
```
2017-03-06 15:06:07 -08:00
Kubernetes Submit Queue d731dc7546 Merge pull request #41826 from bowei/stub-2
Automatic merge from submit-queue (batch tested with PRs 41826, 42405)

Add stubDomains and upstreamNameservers configuration to kube-dns

```release-note
Updates the dnsmasq cache/mux layer to be managed by dnsmasq-nanny.
dnsmasq-nanny manages dnsmasq based on values from the
kube-system:kube-dns configmap:

"stubDomains": {
	"acme.local": ["1.2.3.4"]
},

is a map of domain to list of nameservers for the domain. This is used
to inject private DNS domains into the kube-dns namespace. In the above
example, any DNS requests for *.acme.local will be served by the
nameserver 1.2.3.4.

"upstreamNameservers": ["8.8.8.8", "8.8.4.4"]

is a list of upstreamNameservers to use, overriding the configuration
specified in /etc/resolv.conf.
```
2017-03-06 15:06:04 -08:00
Dawn Chen 60758f3fff Revert "Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available." 2017-03-06 14:27:17 -08:00
Kris a2eac2160e Create "framework" per upgrade test
There were already a few tests just using the default framework
namespace instead of creating a new one. Also there are several
testing libraries that use the default framework's default namespace
as well. It's just easier this way.
2017-03-06 11:32:23 -08:00
Kubernetes Submit Queue 0fad9ce5e2 Merge pull request #41870 from intelsdi-x/test-out-of-oir
Automatic merge from submit-queue (batch tested with PRs 31783, 41988, 42535, 42572, 41870)

Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available.

This appears to be a regression since v1.5.0 in scheduler behavior for opaque integer resources, reported in https://github.com/kubernetes/kubernetes/issues/41861.

- [X] Add failing e2e test to trigger the regression
- [x] Restore previous behavior (pods pending due to insufficient OIR get scheduled once sufficient OIR becomes available.)
2017-03-06 11:30:24 -08:00
Kubernetes Submit Queue cbfbf090c5 Merge pull request #42572 from deads2k/api-08-initializer
Automatic merge from submit-queue (batch tested with PRs 31783, 41988, 42535, 42572, 41870)

update names for kube plugin initializer to avoid conflicts

Fixes #42581

Other API servers are likely to create admission plugin initializers and so the names we choose for our interfaces matter (they may want to run multiple initializers in the chain).  This updates the names for the plugin initializers to be more specific.  No other changes.

@ncdc
2017-03-06 11:30:18 -08:00
Anthony Yeh d8a0017559 StatefulSet: Use synchronous Delete of SS in e2e.
This is needed because we changed the default for SS to
OrphanDependents.
2017-03-06 09:46:04 -08:00
Derek Carr 48d822eafe cgroup names created by kubelet should be lowercased 2017-03-06 11:19:21 -05:00
deads2k d89862beca update names for kube plugin initializer to avoid conflicts 2017-03-06 10:18:21 -05:00
Seth Jennings ccd87fca3f kubelet: add cgroup manager metrics 2017-03-06 08:53:47 -06:00
Maciej Szulik 6c0cd24865 Drop NewDefultGroupVersionFramework in favor of client automatically handling proper GroupVersions 2017-03-06 12:12:38 +01:00
Maciej Szulik 7cba9d9c92 Issue 37166: remove everything from batch/v2alpha1 that is not new 2017-03-06 12:12:38 +01:00
Nick Sardo 6bba665c81 Cleanup gce resources from ingress upgrade tests 2017-03-05 18:38:45 -08:00
Kubernetes Submit Queue 79883dc48d Merge pull request #42070 from luxas/remove_kube_discovery
Automatic merge from submit-queue

Remove the kube-discovery binary from the tree

**What this PR does / why we need it**:

kube-discovery was a temporary solution to implementing proposal: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/bootstrap-discovery.md

However, this functionality is now gonna be implemented in the core for v1.6 and will fully replace kube-discovery:
 - https://github.com/kubernetes/kubernetes/pull/36101 
 - https://github.com/kubernetes/kubernetes/pull/41281
 - https://github.com/kubernetes/kubernetes/pull/41417

So due to that `kube-discovery` isn't used in any v1.6 code, it should be removed.
The image `gcr.io/google_containers/kube-discovery-${ARCH}:1.0` should and will continue to exist so kubeadm <= v1.5 continues to work.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Remove cmd/kube-discovery from the tree since it's not necessary anymore
```
@jbeda @dgoodwin @mikedanese @dmmcquay @lukemarsden @errordeveloper @pires
2017-03-04 12:58:23 -08:00
Connor Doyle e402c5d5e5 Added missing OIR e2e test: sched after release. 2017-03-04 09:26:20 -08:00
Kubernetes Submit Queue 2ebf6edef3 Merge pull request #41942 from csbell/fw-name
Automatic merge from submit-queue

Add ProviderUid support to Federated Ingress

This PR (along with GLBC support [here](https://github.com/kubernetes/ingress/pull/278)) is a proposed fix for #39989. The Ingress controller uses a configMap reconciliation process to ensure that all underlying ingresses agree on a unique UID. This works for all of GLBC's resources except firewalls which need their own cluster-unique UID. This PR introduces a ProviderUid which is maintained and synchronized cross-cluster much like the UID. We chose to derive the ProviderUid from the cluster name (via md5 hash).

Testing here is augmented to guarantee that configMaps are adequately propagated prior to Ingress creation.

```release-note
Federated Ingress over GCE no longer requires separate firewall rules to be created for each cluster to circumvent flapping firewall health checks.
```

cc @madhusudancs @quinton-hoole
2017-03-04 02:51:04 -08:00
Kubernetes Submit Queue 52f4d38069 Merge pull request #42370 from janetkuo/ds-e2e-ignore-no-schedule-taint
Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370)

In DaemonSet e2e test, don't check nodes with NoSchedule taints

Fixes #42345 

For example, master node has a ismaster:NoSchedule taint. We don't expect pods to be created there without toleration. 

cc @marun @lukaszo @kargakis @yujuhong @Random-Liu @davidopp @kubernetes/sig-apps-pr-reviews
2017-03-04 00:17:47 -08:00
Kubernetes Submit Queue cb0728c50f Merge pull request #42457 from yujuhong/do_not_panic
Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370)

node e2e: apparmor test should fail instead of panicking

This doesn't fix #42420, but at least stop the test from panicking.
2017-03-04 00:17:42 -08:00
Kubernetes Submit Queue c6d9d9c5ad Merge pull request #42456 from Random-Liu/update-npd-in-kubemark
Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370)

Update npd in kubemark since #42201 is merged.

Revert https://github.com/kubernetes/kubernetes/pull/41716.

#42201 has been merged, and #41713 is fixed. Now we could retry update npd in kubemark.

/cc @shyamjvs @wojtek-t @dchen1107
2017-03-04 00:17:40 -08:00
Kubernetes Submit Queue a56bba263d Merge pull request #42455 from gmarek/kubemark-logs
Automatic merge from submit-queue (batch tested with PRs 42369, 42375, 42397, 42435, 42455)

Add alsologtostderr flag to hollow node

@yujuhong @wojtek-t that should solve the kubemark log issue.
2017-03-03 23:21:49 -08:00
Kubernetes Submit Queue f9ccee7714 Merge pull request #42435 from dashpole/timestamps_for_fsstats
Automatic merge from submit-queue (batch tested with PRs 42369, 42375, 42397, 42435, 42455)

[Bug Fix]: Avoid evicting more pods than necessary by adding Timestamps for fsstats and ignoring stale stats

Continuation of #33121.  Credit for most of this goes to @sjenning.  I added volume fs timestamps.

**why is this a bug** 
This PR attempts to fix part of https://github.com/kubernetes/kubernetes/issues/31362 which results in multiple pods getting evicted unnecessarily whenever the node runs into resource pressure. This PR reduces the chances of such disruptions by avoiding reacting to old/stale metrics.
Without this PR, kubernetes nodes under resource pressure will cause unnecessary disruptions to user workloads. 
This PR will also help deflake a node e2e test suite.

The eviction manager currently avoids evicting pods if metrics are old.  However, timestamp data is not available for filesystem data, and this causes lots of extra evictions.
See the [inode eviction test flakes](https://k8s-testgrid.appspot.com/google-node#kubelet-flaky-gce-e2e) for examples.
This should probably be treated as a bugfix, as it should help mitigate extra evictions.

cc: @kubernetes/sig-storage-pr-reviews  @kubernetes/sig-node-pr-reviews @vishh @derekwaynecarr @sjenning
2017-03-03 23:21:48 -08:00
Kubernetes Submit Queue 2d319bd406 Merge pull request #42204 from dashpole/allocatable_eviction
Automatic merge from submit-queue

Eviction Manager Enforces Allocatable Thresholds

This PR modifies the eviction manager to enforce node allocatable thresholds for memory as described in kubernetes/community#348.
This PR should be merged after #41234. 

cc @kubernetes/sig-node-pr-reviews @kubernetes/sig-node-feature-requests @vishh 

** Why is this a bug/regression**

Kubelet uses `oom_score_adj` to enforce QoS policies. But the `oom_score_adj` is based on overall memory requested, which means that a Burstable pod that requested a lot of memory can lead to OOM kills for Guaranteed pods, which violates QoS. Even worse, we have observed system daemons like kubelet or kube-proxy being killed by the OOM killer.
Without this PR, v1.6 will have node stability issues and regressions in an existing GA feature `out of Resource` handling.
2017-03-03 20:20:12 -08:00
Kubernetes Submit Queue f750721b84 Merge pull request #42367 from kow3ns/ss-flakes
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)

Fix StatefulSet e2e flake

**What this PR does / why we need it**:
Fixes StatefulSet e2e flake by ensuring that the StatefulSet controller has observed the unreadiness of Pods prior to attempting to exercise scale functionality.
**Which issue this PR fixes** 
fixes #41889
```release-note
NONE
```
2017-03-03 18:08:42 -08:00
Kubernetes Submit Queue 67500b3947 Merge pull request #42443 from Random-Liu/fix-node-e2e-npd
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)

Cast system uptime to time.Duration to fix cross build.

Fixes https://github.com/kubernetes/kubernetes/issues/42441.

Cast system uptime to `time.Duration` to avoid different behavior on different architectures.

@sjenning @ixdy @ncdc
2017-03-03 18:08:38 -08:00
Kubernetes Submit Queue 98eae9b222 Merge pull request #42341 from dashpole/critial_pod_test
Automatic merge from submit-queue

Critial pod test uses allocatable instead of capacity

This solves #42239.

When this test was first introduced, pods could request up to the capacity of the node.
With the addition of allocatable introduced in #41234, this is no longer the case, and pods can only use up to allocatable.

This should be included in 1.6, as it is a bug related to a 1.6 feature.

cc @vish @yujuhong
2017-03-03 14:34:37 -08:00
Kenneth Owens 08f95aff0f Fixes e2e flake by ensuring that the StatefulSet observes mutations to
Pods prior mutating the StatefulSet object to trigger sclaing.

Add ObervedVersion check
2017-03-03 11:24:47 -08:00
Kubernetes Submit Queue a2c7eb2754 Merge pull request #42266 from wojtek-t/fix_secret_tests_in_large_clusters
Automatic merge from submit-queue (batch tested with PRs 41306, 42187, 41666, 42275, 42266)

Bump test timeouts to make secret tests work in large clusters
2017-03-03 10:54:45 -08:00
Clayton Coleman f3d9ad678a
Handle potential conflicts on service endpoint test deletion
Cleanup should ignore conflicts and notfound errors
2017-03-03 01:29:57 -05:00
Anthony Yeh bdc4540e92 e2e/framework: Deflake cleanup of RCs in service_util.
The previous Get/Update pattern with no retry on resource version mismatch
would flake with the following error:

"the object has been modified; please apply your changes to the latest
version and try again"
2017-03-02 16:51:14 -08:00
Yu-Ju Hong 1d907dbf4f node e2e: apparmor test should fail instead of panicking 2017-03-02 16:36:52 -08:00
Random-Liu 3f30532b0f Update npd in kubemark since #42201 is merged. 2017-03-02 16:29:24 -08:00
gmarek 30b9490d66 Add alsologtostderr flag to hollow node 2017-03-03 01:29:02 +01:00
Janet Kuo 7bdf54a30a In DaemonSet e2e test, don't check nodes with NoSchedule taints 2017-03-02 16:04:39 -08:00
David Ashpole a90c7951d4 add volume timestamps 2017-03-02 15:01:59 -08:00
Random-Liu d41c2503e7 Cast system uptime to time.Duration to fix cross build. 2017-03-02 14:48:09 -08:00