Commit Graph

659 Commits (5e29e1ee05556800cd2fe0dc5341d5de54b4ab47)

Author SHA1 Message Date
shashidharatd a14f8dc346 auto generated bazel build files 2017-03-11 01:39:56 +05:30
shashidharatd 662f0ef531 Add framework for federation upgrade tests 2017-03-11 01:39:56 +05:30
shashidharatd 4443a1b40d Move few reusable functions to upgrade_utils.go 2017-03-11 01:39:56 +05:30
Kubernetes Submit Queue 4ff0af821a Merge pull request #42879 from jsafrane/test-pod-logs
Automatic merge from submit-queue

e2e test: Log container output on TestContainerOutput error

When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.

This would help us to see what wrong in https://github.com/kubernetes/kubernetes/issues/40811 - a container is running there for 3 minutes and dies and we want to see what it did for these 3 minutes.

```release-note
NONE
```
2017-03-10 06:13:44 -08:00
Jan Safranek bc06c636d1 e2e test: Log container output on TestContainerOutput error
When a pod started with TestContainerOutput or TestContainerOutputRegexp
fails from unknown reason, we should log all output of all its containers
so we can analyze what went wrong.
2017-03-10 10:08:57 +01:00
Kubernetes Submit Queue 4540674b04 Merge pull request #42758 from krousey/downgrades
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Implement automated downgrade testing.

Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
2017-03-09 15:06:56 -08:00
Kris cc84e0895a Implement automated downgrade testing.
Node version cannot be higher than the master version, so we must
switch the node version first. Also, we must use the upgrade script
from the appropriate version for GCE.
2017-03-09 12:45:20 -08:00
David Ashpole 3806d386df use default timeout for deletion 2017-03-08 14:40:19 -08:00
Kubernetes Submit Queue 1402cc588f Merge pull request #42681 from yujuhong/fix_restart
Automatic merge from submit-queue (batch tested with PRs 42652, 42681, 42708, 42730)

e2e: fix restarting the apiserver

The string used to match the image name of the apiserver (e.g., `gcr.io/google_containers/kube-apiserver:3be...`),
but this no longer works. Change the test to locate the kube-apiserver container by name.
2017-03-08 11:38:07 -08:00
Kubernetes Submit Queue d306acca86 Merge pull request #42175 from enisoc/controller-ref-dep
Automatic merge from submit-queue

Deployment: Fully Respect ControllerRef

**What this PR does / why we need it**:

This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings Deployment into full compliance with ControllerRef. See the individual commit messages for details.

**Which issue this PR fixes**:

This ensures that Deployment does not fight with other controllers over control of Pods and ReplicaSets.

Ref: https://github.com/kubernetes/kubernetes/issues/24433

**Special notes for your reviewer**:

**Release note**:

```release-note
Deployment now fully respects ControllerRef to avoid fighting over Pods and ReplicaSets. At the time of upgrade, **you must not have Deployments with selectors that overlap**, or else [ownership of ReplicaSets may change](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md#upgrading).
```
cc @erictune @kubernetes/sig-apps-pr-reviews
2017-03-07 20:44:36 -08:00
Yu-Ju Hong 9401c39a4e e2e: fix restarting the apiserver 2017-03-07 14:54:59 -08:00
Kubernetes Submit Queue feb81270d1 Merge pull request #42346 from kargakis/e2e-deployment-fix
Automatic merge from submit-queue

e2e: require minimum completeness in deployment tests

@janetkuo should reduce deployment-related flakes

For example:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gke/5334#k8sio-deployment-scaled-rollout-deployment-should-not-block-on-annotation-check
Deployment with 5 pods, all of them updated to the latest replica set. For some reason one of them is never scheduled and even though the strategy parameters in the Deployment allow 2 maxUnavailable the test is stuck for 5mins waiting for the last pod.
```
I0228 12:33:31.406] Feb 28 12:33:20.255: INFO: Pod nginx-2629236890-r7z5n is not available:
I0228 12:33:31.406] {TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:nginx-2629236890-r7z5n GenerateName:nginx-2629236890- Namespace:e2e-tests-deployment-gctjw SelfLink:/api/v1/namespaces/e2e-tests-deployment-gctjw/pods/nginx-2629236890-r7z5n UID:7095ad3e-fdf4-11e6-af4a-42010af00010 ResourceVersion:7804 Generation:0 CreationTimestamp:2017-02-28 12:28:18.408376434 -0800 PST DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[pod-template-hash:2629236890 name:nginx] Annotations:map[kubernetes.io/created-by:{"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"e2e-tests-deployment-gctjw","name":"nginx-2629236890","uid":"7085ba3f-fdf4-11e6-af4a-42010af00010","apiVersion":"extensions","resourceVersion":"7743"}}
I0228 12:33:31.407] ] OwnerReferences:[{APIVersion:extensions/v1beta1 Kind:ReplicaSet Name:nginx-2629236890 UID:7085ba3f-fdf4-11e6-af4a-42010af00010 Controller:0xc420beccf0}] Finalizers:[] ClusterName:} Spec:{Volumes:[{Name:default-token-58h80 VolumeSource:{HostPath:nil EmptyDir:nil GCEPersistentDisk:nil AWSElasticBlockStore:nil GitRepo:nil Secret:&SecretVolumeSource{SecretName:default-token-58h80,Items:[],DefaultMode:*420,Optional:nil,} NFS:nil ISCSI:nil Glusterfs:nil PersistentVolumeClaim:nil RBD:nil FlexVolume:nil Cinder:nil CephFS:nil Flocker:nil DownwardAPI:nil FC:nil AzureFile:nil ConfigMap:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil Projected:nil}}] InitContainers:[] Containers:[{Name:nginx Image:gcr.io/google_containers/update-demo:kitten Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-58h80 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false}] RestartPolicy:Always TerminationGracePeriodSeconds:0xc420becd28 ActiveDeadlineSeconds:<nil> DNSPolicy:ClusterFirst NodeSelector:map[] ServiceAccountName:default DeprecatedServiceAccount:default AutomountServiceAccountToken:<nil> NodeName:gke-bootstrap-e2e-default-pool-800b6140-fsl5 HostNetwork:false HostPID:false HostIPC:false SecurityContext:&PodSecurityContext{SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,SupplementalGroups:[],FSGroup:nil,} ImagePullSecrets:[] Hostname: Subdomain: Affinity:nil SchedulerName:default-scheduler Tolerations:[]} Status:{Phase:Pending Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-02-28 12:28:18 -0800 PST Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-02-28 12:28:18 -0800 PST Reason:ContainersNotReady Message:containers with unready status: [nginx]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2017-02-28 12:28:18 -0800 PST Reason: Message:}] Message: Reason: HostIP:10.240.0.2 PodIP: StartTime:2017-02-28 12:28:18 -0800 PST InitContainerStatuses:[] ContainerStatuses:[{Name:nginx State:{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,} Running:nil Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:gcr.io/google_containers/update-demo:kitten ImageID: ContainerID:}] QOSClass:BestEffort}}
```

@kubernetes/sig-apps-misc
2017-03-07 09:11:25 -08:00
Kubernetes Submit Queue ed04316828 Merge pull request #41890 from soltysh/issue37166
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)

Remove everything that is not new from batch/v2alpha1

Fixes #37166.

@lavalamp you've asked for it 
@erictune this is a prereq for moving CronJobs to beta. I initially planned to put all in one PR, but after I did that I figured out it'll be easier to review separately. ptal 

@kubernetes/api-approvers @kubernetes/sig-api-machinery-pr-reviews ptal
2017-03-07 08:10:38 -08:00
Kubernetes Submit Queue 20c13c1564 Merge pull request #42555 from nicksardo/upgrade-ingress-flake
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)

Fix resource cleanup in ingress_utils.go within e2e/framework

**What this PR does / why we need it**:
The GLBC is failing to delete resources during the etcd rollback tests and the e2e cleanup is leaking them. After a short while, tests are failing to create new resources. 
This PR addresses the e2e/framework's ability to delete GLBC-created resources and adds more logging.

**Which issue this PR fixes**:
Helps #38569 but does not completely close this flake

**Special notes for your reviewer**:
Resources were not being deleted because resource names were being truncated and then their ability to be deleted was determined by the entire cluster id existing in the name. Truncated names also have an extra '0' append to the end of their name (unknown origin). This PR tries to match on a common prefix. 

Minor changes were made to improve log readability. 

**Testing this PR**:
This was tested by running a master upgrade test and by adding a second forwarding-rule mid-run.  This forwarding rule referenced the same url-map used by the first forwarding-rule created by the GLBC. Therefore, the GLBC will be able to delete the forwarding-rule but not anymore L7 resources. This second forwarding rule's name was nearly identical to the first forwarding rule so that the cleanup code will find it. 

As you can see from the test run below, the cleanup code deleted all the resources that the GLBC could not.

```log
...
Mar  5 18:35:53.112: INFO: Monitoring glbc's cleanup of gce resources:
k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420 (forwarding rule)
k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (target-https-proxy)
k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260 (url-map)
k8s-be-32331--5f38ac0e2426f796 (backend-service)
k8s-be-32613--5f38ac0e2426f796 (backend-service)
k8s-be-32331--5f38ac0e2426f796 (http-health-check)
k8s-be-32613--5f38ac0e2426f796 (http-health-check)
k8s-ig--5f38ac0e2426f796 (instance-group)
k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420 (ssl-certificate)

STEP: Performing final delete of any remaining resources
Mar  5 18:35:54.055: INFO: Deleting forwarding-rules: k8s-fws-e2e-tests-ingress-upgrsde-0px85-static-ip--5f38ac0e2420
Mar  5 18:36:06.945: INFO: Deleting target-https-proxies: k8s-tps-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
Mar  5 18:36:14.301: INFO: Deleting url-map: k8s-um-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e24260
Mar  5 18:36:18.309: INFO: Deleting backed-service: k8s-be-32331--5f38ac0e2426f796
Mar  5 18:36:22.112: INFO: Deleting backed-service: k8s-be-32613--5f38ac0e2426f796
Mar  5 18:36:26.192: INFO: Deleting http-health-check: k8s-be-32331--5f38ac0e2426f796
Mar  5 18:36:29.846: INFO: Deleting http-health-check: k8s-be-32613--5f38ac0e2426f796
Mar  5 18:36:33.722: INFO: Deleting instance-group: k8s-ig--5f38ac0e2426f796
Mar  5 18:36:37.762: INFO: Deleting ssl-certificate: k8s-ssl-e2e-tests-ingress-upgrade-0px85-static-ip--5f38ac0e2420
STEP: No resources leaked.
Mar  5 18:36:46.441: INFO: Deleting addresses: e2e-tests-ingress-upgrade-0px85-static-ip
Mar  5 18:36:53.902: INFO: L7 controller failed to delete all cloud resources on time. timed out waiting for the condition
...
```
2017-03-06 17:16:17 -08:00
Kubernetes Submit Queue d50a59ec66 Merge pull request #42080 from enisoc/controller-ref-ss
Automatic merge from submit-queue (batch tested with PRs 42080, 41653, 42598, 42555)

StatefulSet: Respect ControllerRef

**What this PR does / why we need it**:

This is part of the completion of the [ControllerRef](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md) proposal. It brings StatefulSet into full compliance with ControllerRef. See the individual commit messages for details.

**Which issue this PR fixes**:

Fixes #36859

**Special notes for your reviewer**:

**Release note**:

```release-note
StatefulSet now respects ControllerRef to avoid fighting over Pods. At the time of upgrade, **you must not have StatefulSets with selectors that overlap** with any other controllers (such as ReplicaSets), or else [ownership of Pods may change](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md#upgrading).
```
cc @erictune @kubernetes/sig-apps-pr-reviews
2017-03-06 17:16:10 -08:00
Anthony Yeh f020c9ae6c Deployment: Update overlapping e2e test for ControllerRef. 2017-03-06 15:12:07 -08:00
Anthony Yeh 887acb07ea Deployment/util: Filter by ControllerRef.
The list functions in deployment/util are used outside the Deployment
controller itself. Therefore, they don't do actual adoption/orphaning.
However, they still need to avoid listing things that don't belong.
2017-03-06 15:12:06 -08:00
Anthony Yeh d8a0017559 StatefulSet: Use synchronous Delete of SS in e2e.
This is needed because we changed the default for SS to
OrphanDependents.
2017-03-06 09:46:04 -08:00
Seth Jennings ccd87fca3f kubelet: add cgroup manager metrics 2017-03-06 08:53:47 -06:00
Maciej Szulik 6c0cd24865 Drop NewDefultGroupVersionFramework in favor of client automatically handling proper GroupVersions 2017-03-06 12:12:38 +01:00
Nick Sardo 6bba665c81 Cleanup gce resources from ingress upgrade tests 2017-03-05 18:38:45 -08:00
Kubernetes Submit Queue f750721b84 Merge pull request #42367 from kow3ns/ss-flakes
Automatic merge from submit-queue (batch tested with PRs 42443, 38924, 42367, 42391, 42310)

Fix StatefulSet e2e flake

**What this PR does / why we need it**:
Fixes StatefulSet e2e flake by ensuring that the StatefulSet controller has observed the unreadiness of Pods prior to attempting to exercise scale functionality.
**Which issue this PR fixes** 
fixes #41889
```release-note
NONE
```
2017-03-03 18:08:42 -08:00
Kenneth Owens 08f95aff0f Fixes e2e flake by ensuring that the StatefulSet observes mutations to
Pods prior mutating the StatefulSet object to trigger sclaing.

Add ObervedVersion check
2017-03-03 11:24:47 -08:00
Kubernetes Submit Queue a2c7eb2754 Merge pull request #42266 from wojtek-t/fix_secret_tests_in_large_clusters
Automatic merge from submit-queue (batch tested with PRs 41306, 42187, 41666, 42275, 42266)

Bump test timeouts to make secret tests work in large clusters
2017-03-03 10:54:45 -08:00
Clayton Coleman f3d9ad678a
Handle potential conflicts on service endpoint test deletion
Cleanup should ignore conflicts and notfound errors
2017-03-03 01:29:57 -05:00
Anthony Yeh bdc4540e92 e2e/framework: Deflake cleanup of RCs in service_util.
The previous Get/Update pattern with no retry on resource version mismatch
would flake with the following error:

"the object has been modified; please apply your changes to the latest
version and try again"
2017-03-02 16:51:14 -08:00
Kubernetes Submit Queue effbad9ad0 Merge pull request #42020 from nicksardo/ingress_upgrade
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)

Ingress-glbc upgrade tests

Basically #41676 but with some fixes and added comments.  @bprashanth has been away this week and it's desirable to have this in before code freeze.
2017-03-01 15:30:32 -08:00
Nick Sardo 9dc98f5296 upgrade tests: ingress/glbc 2017-03-01 11:52:55 -08:00
Michail Kargakis 3921a2a2f6 e2e: require minimum completeness in deployment tests 2017-03-01 19:25:12 +01:00
Wojciech Tyczynski 9db6aa50f0 Make secret tests work in large clusters 2017-03-01 17:58:30 +01:00
Kubernetes Submit Queue ed479163fa Merge pull request #42116 from vishh/gpu-experimental-support
Automatic merge from submit-queue

Extend experimental support to multiple Nvidia GPUs

Extended from #28216

```release-note
`--experimental-nvidia-gpus` flag is **replaced** by `Accelerators` alpha feature gate along with  support for multiple Nvidia GPUs. 
To use GPUs, pass `Accelerators=true` as part of `--feature-gates` flag.
Works only with Docker runtime.
```

1. Automated testing for this PR is not possible since creation of clusters with GPUs isn't supported yet in GCP.
1. To test this PR locally, use the node e2e.
```shell
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true' FOCUS=GPU SKIP="" make test-e2e-node
```

TODO:

- [x] Run manual tests
- [x] Add node e2e
- [x] Add unit tests for GPU manager (< 100% coverage)
- [ ] Add unit tests in kubelet package
2017-03-01 04:52:50 -08:00
Chao Xu 31cb266340 tests 2017-02-28 23:05:41 -08:00
Vishnu kannan 318f4e102a adding an e2e for GPUs
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-02-28 13:42:08 -08:00
Kubernetes Submit Queue 5891fa021b Merge pull request #42193 from mml/etcd-upgrade-test
Automatic merge from submit-queue

Change etcd test image to 3.0.17.

Follow-up from #41540
2017-02-27 23:10:45 -08:00
Matt Liggett f37f7d42ba Change etcd test image to 3.0.17. 2017-02-27 14:23:14 -08:00
Kenneth Owens 116eda0909 Implements an upgrade test for Job. Job common functionality is refactored into the framework package to allow for code sharing between e2e and upgrade tests. 2017-02-27 09:24:59 -08:00
Kubernetes Submit Queue 37acb8423c Merge pull request #42019 from msau42/pv_upgrade_test
Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054)

PV upgrade test

**What this PR does / why we need it**:
This PR adds a PV upgrade test to the new upgrade test framework.  Before, this test had to be done manually.  Currently the upgrade test framework only works on the GCE environment, so I plan to add support for other providers later.  In order to write the test, I had to modify and refactor some volume test util libraries.  I reran the impacted tests to make sure they still passed.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:
It's probably easier to review the two commits separately.  I split it up into the refactor changes, and the upgrade test changes.

**Release note**:

NONE


cc @saad-ali @krousey
2017-02-27 00:16:58 -08:00
Kubernetes Submit Queue f1fecc3074 Merge pull request #41964 from sttts/sttts-upgrade-test-sysctl
Automatic merge from submit-queue (batch tested with PRs 35408, 41915, 41992, 41964, 41925)

e2e/upgrade: add sysctls

Add sysctl upgrade tests.

How can these be effectively tested?
2017-02-26 18:07:59 -08:00
Kubernetes Submit Queue 16f87fe7d8 Merge pull request #40952 from dashpole/premption
Automatic merge from submit-queue (batch tested with PRs 41994, 41969, 41997, 40952, 40576)

Guaranteed admission for Critical Pods

This is the first step in implementing node-level preemption for critical pods.
It defines the AdmissionFailureHandler interface, which allows callers, like the kubelet, to define how failed predicates are handled, and take steps to correct failures if necessary.
In the kubelet's implementation, it triggers preemption if the pod being admitted is critical, and if the only failed predicates are InsufficientResourceErrors, then it prempts (not yet implemented) other other pods to allow admission of the critical pod.

cc: @vishh
2017-02-26 12:57:59 -08:00
Kubernetes Submit Queue f2c2791e87 Merge pull request #41852 from mml/etcd-upgrade-test
Automatic merge from submit-queue (batch tested with PRs 42106, 42094, 42069, 42098, 41852)

Write etcd_upgrade test.

Part of the fix for #40636
2017-02-26 04:34:02 -08:00
Zihong Zheng 7eb9b81d67 Updates test/e2e/addon_update.go to match addon-manager's new behavior 2017-02-24 16:44:21 -08:00
Matt Liggett 281a57aeaa Add etcd upgrade test. 2017-02-24 10:40:04 -08:00
gmarek 6637592b1d generated 2017-02-24 09:24:33 +01:00
gmarek d88af7806c NodeController sets NodeTaints instead of deleting Pods 2017-02-24 09:24:33 +01:00
Michelle Au c522b77505 Enhance and refactor volume test util functions
Fix bash command's execution in MakePod().
Add isPriviledged as a parameter to MakePod().
Move PD utils to pv_util.go

Ran all the tests in pd.go, persistent_volumes.go,
persistent_volumes-disruptive.go.

These changes are needed for the PV upgrade test I am working on.
2017-02-23 21:05:12 -08:00
deads2k bf30b0c71b add WATCH to list of excluded verbs for latency metrics 2017-02-23 15:47:28 -05:00
David Ashpole c58970e47c critical pods can preempt other pods to be admitted 2017-02-23 10:31:20 -08:00
Kubernetes Submit Queue e5c2d716d9 Merge pull request #41887 from liggitt/watch-verb
Automatic merge from submit-queue (batch tested with PRs 39855, 41433, 41567, 41887, 41652)

Use watch param instead of deprecated /watch/ prefix

Reopen of https://github.com/kubernetes/kubernetes/pull/41722 after reverted in https://github.com/kubernetes/kubernetes/pull/41774

Required https://github.com/kubernetes/kubernetes/pull/41797 to merge first

cc @deads2k @wojtek-t
2017-02-23 09:36:35 -08:00
Dr. Stefan Schimanski ea52f59c8c e2e/upgrade: add sysctls 2017-02-23 10:49:41 +01:00
Wojciech Tyczynski 59cec9c1a6 Merge pull request #41886 from wojtek-t/allow_for_disabling_log_dump
Add ability to disable dumping logs
2017-02-23 08:08:25 +01:00