Automatic merge from submit-queue
Node E2E: Fix wrong permission bit for log file.
When creating the log file for logs collected from journald, we use `0755`, which is weird to me because a log file does not need execute bits.
This PR changes it to `0666`.
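To illustrate why `0755` looks odd on a log file, here is a minimal sketch that checks a mode for execute bits. The helper name `hasExecBit` is hypothetical and is not part of the node e2e code.

```go
package main

import (
	"fmt"
	"os"
)

// hasExecBit reports whether any execute bit (owner, group, or other) is set.
// Hypothetical helper for illustration only.
func hasExecBit(mode os.FileMode) bool {
	return mode.Perm()&0111 != 0
}

func main() {
	// Compare the old and new permission bits mentioned in this PR.
	for _, m := range []os.FileMode{0755, 0666} {
		fmt.Printf("%#o (%s): has execute bit = %t\n", uint32(m.Perm()), m, hasExecBit(m))
	}
}
```

Note that the mode passed at file creation is still filtered by the process umask, so the bits on disk may be narrower than the ones requested.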
Automatic merge from submit-queue
Add e2e test for Source IP preservation (pod to service cluster IP)
Working on #27134.
This PR adds an e2e test for source IP preservation (pod to service cluster IP) in service.go. The test scenario is as follows:
- Pick two different nodes in cluster.
- Create a clusterIP type service.
- Create an echo server, which echoes back client IP, to be part of the service.
- Create a client on another node. Hit the server through service cluster IP.
- Verify the source IP.
@girishkalele @freehan
Automatic merge from submit-queue
Bumped memory limit for resource consumer. Fixes #31591.
Bumped memory limit for resource consumer from 100 MB to 200 MB, increased request sizes so that the number of consumers will be smaller. Fixes #31591.
Automatic merge from submit-queue
Check server version when running scheduled job e2e tests
@janetkuo this is the promised followup to #30575, which checks the minimum server version when running the ScheduledJob e2e tests.
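A minimum-version gate of this kind could look roughly like the sketch below. It is an assumption-laden illustration: `parseMajorMinor` and `atLeast` are hypothetical helpers, only major.minor is compared, and the e2e framework's actual version handling is not reproduced here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts major and minor from strings such as "v1.4.0-beta.3".
func parseMajorMinor(v string) (int, int, error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("bad major in %q: %v", v, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor in %q: %v", v, err)
	}
	return major, minor, nil
}

// atLeast reports whether server >= minimum, comparing major.minor only.
func atLeast(server, minimum string) (bool, error) {
	sMaj, sMin, err := parseMajorMinor(server)
	if err != nil {
		return false, err
	}
	mMaj, mMin, err := parseMajorMinor(minimum)
	if err != nil {
		return false, err
	}
	if sMaj != mMaj {
		return sMaj > mMaj, nil
	}
	return sMin >= mMin, nil
}

func main() {
	// The version strings are made-up inputs for the sketch.
	for _, server := range []string{"v1.3.7", "v1.4.0"} {
		ok, err := atLeast(server, "v1.4.0")
		fmt.Println(server, "meets minimum:", ok, "error:", err)
	}
}
```

A test would typically skip itself, rather than fail, when the server is older than the minimum.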
Automatic merge from submit-queue
update e2e test for federation replicaset controller
e2e test to verify that replicasets are synced to the underlying clusters.
@quinton-hoole @nikhiljindal @deepak-vij @kshafiee @mwielgus
Automatic merge from submit-queue
Return detailed error message for better debugging.
Try to provide a more detailed error message for debugging when the flake in #31561 happens again.
@pwittrock
Automatic merge from submit-queue
Bump nfs server image tag in pv e2e
The image modified in https://github.com/kubernetes/kubernetes/pull/30084 has been pushed, so we can bump this back up to enable the part where the pod writes to the server with restrictive permissions.
Automatic merge from submit-queue
Adding namespaces/finalizer subresource to federation apiserver
Fixes https://github.com/kubernetes/kubernetes/issues/31077
cc @kubernetes/sig-cluster-federation @mwielgus
Verified manually that I can delete federation namespaces now.
Will update the federation-namespace e2e test to verify that the namespace is deleted successfully.
Automatic merge from submit-queue
test/e2e: fix flake in kubelet expose should create services for rc
**Release note**:
```release-note
NONE
```
Add a loop to retry the request to account for the TLS Timeout and API
credential error responses outlined by the flakes in #29227.
Fixes #29227
Automatic merge from submit-queue
Move wait for pressure to subside to AfterEach
so that we still wait even if the body of the eviction order test fails.
Automatic merge from submit-queue
Automated Docker Validation: Add automated docker performance validation.
Use the node e2e performance benchmark to automatically validate the newest docker release.
It can also help us validate docker 1.12 for this release.
@dchen1107 @coufon
Automatic merge from submit-queue
Skip hazelcast E2E test
**What this PR does / why we need it**:
Skip hazelcast e2e test due to flakiness, which in turn is (most likely) due to a race condition upstream. See https://github.com/pires/hazelcast-kubernetes-bootstrapper/issues/9 for comments.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
https://github.com/kubernetes/kubernetes/issues/30672
**Special notes for your reviewer**:
This is temporary pending upstream changes.
**Release note**:
NONE
Automatic merge from submit-queue
Pick a specific GCI version by default on GCE.
Prior to this change, a K8s branch (master as well as release) was
pinned to a GCI milestone. It would pick up the latest GCI release on
that milestone at the time of cluster creation. The rationale was the
K8s users would automatically get the bug fixes in newer versions of
GCI. However in practice, it makes the runtime environment
non-deterministic, and lack of continuous e2e tests means we would run
into breakages sooner or later.
With this change, each K8s release will pick a specific version
of GCI by default (similar to how the Debian-based container-vm gets used).
Users can override the default version through KUBE_GCE_MASTER_IMAGE and
KUBE_GCE_NODE_IMAGE environment variables.
We expect the default GCI version will be updated relatively frequently to stay
current with newer GCI releases. We can also automate the process of
bumping the hard-coded GCI version in the future.
@vishh @adityakali can you please review?
cc @kubernetes/goog-image FYI
Automatic merge from submit-queue
Create a file from data stored in gobindata to fix kubectl-based exam…
Fix #31539
Adding 1.4 milestone as this fixes P0 flake issue (test completely broken by moving to gobindata). @pwittrock
cc @jayunit100
Automatic merge from submit-queue
test Metadata.ClusterName not saved into etcd
integration test that verifies that we are not storing ClusterName in etcd.
#28921
@nikhiljindal @deepak-vij @quinton-hoole
This commit enables the dynamic kubelet configuration feature for the
node e2e Jenkins serial tests, which is where the test for dynamic kubelet
configuration currently runs.
This gives the node e2e test binary a --feature-gates flag that populates a
FeatureGates field on the test context. The value of this field is forwarded
to the kubelet's --feature-gates flag and is also used to populate the global
DefaultFeatureGate object so that statically-linked components see the same
feature gate settings as provided via the flag.
This means that you can set feature gates via the TEST_ARGS environment
variable when running node e2e tests. For example:
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'
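For illustration, the `key=value` format carried by `--feature-gates` can be parsed as in the sketch below. `parseFeatureGates` is a hypothetical helper written for this example; it is not the code that populates the FeatureGates field or DefaultFeatureGate.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates turns "A=true,B=false" into a map of gate name to state.
// Hypothetical helper; shown only to make the flag format concrete.
func parseFeatureGates(s string) (map[string]bool, error) {
	gates := map[string]bool{}
	for _, pair := range strings.Split(s, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue // tolerate empty input and trailing commas
		}
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("missing '=' in %q", pair)
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(kv[1]))
		if err != nil {
			return nil, fmt.Errorf("invalid value for %q: %v", kv[0], err)
		}
		gates[strings.TrimSpace(kv[0])] = enabled
	}
	return gates, nil
}

func main() {
	gates, err := parseFeatureGates("DynamicKubeletConfig=true")
	fmt.Println(gates, err)
}
```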
Automatic merge from submit-queue
Improve e2e framework namespace deletion
This PR addresses the following:
1. The framework would delete the same namespace multiple times in subsequent tests if the namespace failed to delete in a previous test. This caused incorrect error reporting on subsequent tests. Updated the framework to call delete on all namespaces, and then always clear out the list of namespaces to delete.
1. deleteNs was not verifying that all content was removed from the namespace, just pods. This made flakes hard to debug in tests that did not create pods and whose namespace didn't delete. Updated the framework to verify that all content is removed from the namespace.
1. Improved debugging output when a namespace did not delete, with more detail on what remains.
This should stop the test from flaking while we figure out why there is
a mismatch between the reported pressure condition and the eviction
manager's decision to evict due to memory pressure.
Automatic merge from submit-queue
Make a scheduler predicates test resilient to race for scheduledCondi…
Fix #31341
@pwittrock - this fixes a P1 flake.
FYI @mwielgus - I don't think that the race that caused this flake can impact cluster autoscaling, but you probably should know about it.
cc @wojtek-t
Automatic merge from submit-queue
Node E2E: Move host info around test result.
Discussed offline with @yujuhong and @dchen1107. Currently, the node e2e result is organized as:
```
================================================================
Success Finished Host tmp-node-e2e-b6c375c7-e2e-node-containervm-v20160321-image Test Suite
{ginkgo-output}
{framework-error}
================================================================
```
This makes it painful to find which image the test is failing on. The `{ginkgo-output}` is usually quite long, so we have to scroll up and down to find the host name.
This PR changes the test result to:
```
================================================================
Start Host tmp-node-e2e-b6c375c7-e2e-node-containervm-v20160321-image Test Suite
{ginkgo-output}
Success Finished Host tmp-node-e2e-b6c375c7-e2e-node-containervm-v20160321-image Test Suite
{framework-error}
================================================================
```
This is not perfect, but much better than before. We can easily find the host name under the ginkgo test result, like this:
```
================================================================
Start Host test-gci-dev-54-8743-3-0 Test Suite
Running Suite: E2eNode Suite
============================
Random Seed: 1472511489 - Will randomize all specs
Will run 0 of 131 specs
Running in parallel across 8 nodes
I0829 22:58:13.727764 1143 e2e_node_suite_test.go:98] Pre-pulling images so that they are cached for the tests.
I0829 22:58:28.562459 1143 e2e_node_suite_test.go:111] Node services started. Running tests...
I0829 22:58:28.562477 1143 e2e_node_suite_test.go:116] Wait for the node to be ready
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
I0829 22:58:29.742596 1143 e2e_node_suite_test.go:136] Stopping node services...
I0829 22:58:29.742650 1143 services.go:673] Killing process 1423 (services) with -TERM
I0829 22:58:29.860893 1143 e2e_node_suite_test.go:141] Tests Finished
Ran 0 of 131 Specs in 16.185 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 131 Skipped
Ginkgo ran 1 suite in 19.939034297s
Test Suite Passed
Success Finished Host test-gci-dev-54-8743-3-0 Test Suite
================================================================
```
In a following PR, I'll print the test results from different images into different files to make debugging clearer. Marking this v1.4 because it helps us de-flake tests.
/cc @kubernetes/sig-node
Automatic merge from submit-queue
Explicitly delete pods in node performance tests
This PR explicitly deletes all created pods at the end in node e2e performance related tests.
The large number of pods may cause namespace cleanup to time out (see #30878), so we explicitly delete all pods during cleanup.
Automatic merge from submit-queue
Rewrite disruption e2e test to use versioned client.
This currently includes the changes from #31638. I will rebase once that is merged.
Automatic merge from submit-queue
increase latency and resource limit according to test results
This PR increases the latency limit of the node e2e density test according to previous test results.
Fixes #30878
Automatic merge from submit-queue
e2e: log wget output on CheckConnectivityToHost error
Log output might help to diagnose e2e flakes, whether they are caused by dns issues or connection timeouts.
Might help with flake https://github.com/kubernetes/kubernetes/issues/28188.
Automatic merge from submit-queue
Fix make test-integration under OSX
Just throw in a doc.go so there's something compilable in the
test/integration/metrics directory.
Fixes #31587
Automatic merge from submit-queue
test/node-e2e: Update CoreOS update disabling
Previously in this saga... #25004
This disables update-engine and locksmithd with ignition instead of
cloud-init so that they're really totally 100% disabled. Our ignition guy promises.
Pretty much every way of disabling them with cloud-init is mildly racy.
Fixes #31633
I think @vishh can say "I told you so" after the comment on https://github.com/kubernetes/kubernetes/pull/30023#discussion-diff-73431324 .. he was right, but it turns out "stop" there doesn't really work either because of the mess that is cloud-init. Fortunately, converting our cloud-init to json and calling it "ignition" works quite well 😄
Testing done: I ssh'd in and verified that yes, they're disabled. I didn't wait on the e2e tests to pass, so we'll let this PR check that.
Automatic merge from submit-queue
Add e2e tests for Federated Ingress
This is e2e code for federation ingress controller.
Building on the existing util functions, add federation ingress e2e cases (reusing the current k8s ones) and add logic to validate the results.
Automatic merge from submit-queue
E2E tests for the Source IP Preservation for LoadBalancers
Breaking out the E2E changes from the main PR. These tests require the Alpha feature gate for this feature to be turned on; otherwise they will consistently fail.
This disables update-engine and locksmithd with ignition instead of
cloud-init so that they're really totally 100% disabled.
Pretty much every way of disabling them with cloud-init is mildly racy.
Fixes #31633
Add a loop to retry the request to account for the TLS Timeout and API
credential error responses outlined by the flakes in #29227.
Fixes #29227
Signed-off-by: Jess Frazelle <me@jessfraz.com>
Automatic merge from submit-queue
add retries for add/update/remove taints on node in taints e2e
fixes taint update conflict in taints e2e by adding retries for add/update/remove taints on node.
ref #27655 and #31066
Automatic merge from submit-queue
Get network name via e2e environment.
This should work, right? I plan to pipe it through into the TestContext soon, just not today, and I'd like some test runtime over the weekend. Open to suggestions.
Automatic merge from submit-queue
[AppArmor] Promote AppArmor annotations to beta
Justification for promoting AppArmor to beta:
1. We will provide an upgrade path to GA
2. We don't anticipate any major changes to the design, and will continue to invest in this feature
3. We will thoroughly test it. If any serious issues are uncovered we can reevaluate, and we're committed to fixing them.
4. We plan to provide beta-level support for the feature anyway (responding quickly to issues).
Note that this does not include the yet-to-be-merged status annotation (https://github.com/kubernetes/kubernetes/pull/31382). I'd like to propose keeping that one alpha for now because I'm not sure the PodStatus is the right long-term home for it (I think a separate monitoring channel, e.g. cAdvisor, would be a better solution).
/cc @thockin @matchstick @erictune
Automatic merge from submit-queue
Delete the broken Celery+RabbitMQ example
The celery container used in the example is broken and does not come up
on most distros. The e2e test that was validating this example was not
detecting the fact that the celery pod was crash looping.
I attempted to fix the celery container, but it proved to be tedious.
The proposed fix is to update the glibc version to >= 2.23. In this case
it requires updating the python docker image and the celery base image.
https://github.com/kubernetes/kubernetes/issues/31456
has more details.
I'm deleting the example instead of marking it as broken because a user
might overlook the broken warning and it should be trivial to revert
this PR if someone can fix the celery container.
This function does not actually attempt to connect to the docker daemon,
it just creates a client object that can be used to do so later. The old
name was confusing, as it implied that a failure to touch the docker daemon
could cause program termination (rather than just a failure to create the
client).