Automatic merge from submit-queue
Fix the equality checks for numeric values in cluster/gce/util.sh.
**What this PR does / why we need it**: This PR fixes an error in the gce shell scripts that results in inconsistent/incorrect behavior.
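For illustration only (not the exact lines from util.sh), shell string comparison and numeric comparison can disagree on numbers:
```
NUM_NODES=9
# String comparison is lexicographic, so "9" sorts after "10":
[[ "${NUM_NODES}" > "10" ]] && echo "string compare: 9 > 10 (wrong)"
# Numeric comparison gives the intended result:
[[ "${NUM_NODES}" -gt 10 ]] || echo "numeric compare: 9 <= 10 (right)"
```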
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #37385
**Special notes for your reviewer**: This needs to be backported to 1.5 and 1.4.
@jszczepkowski
Automatic merge from submit-queue
kubelet: don't reject pods without adding them to the pod manager
kubelet relies on the pod manager as a cache of the pods in the apiserver (and other sources). The cache should be kept up-to-date even when rejecting pods. Without this, the kubelet may decide at any point to drop the status update (the request to the apiserver) for the rejected pod, since it would think the pod no longer exists in the apiserver.
Also check whether the to-be-admitted pod has terminated or not; if it has, skip the admission process completely.
This should fix #37658
Automatic merge from submit-queue
build: clean platform envs to prevent cross-contamination
**What this PR does / why we need it**: As I described in https://github.com/kubernetes/kubernetes/issues/37079#issuecomment-263733509, we are leaking platform compilation envs between build stages for different platforms in the non-parallel dockerized cross build. This PR uses a subshell for the non-parallel build, more closely matching the parallel build.
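A minimal sketch of the subshell pattern, with illustrative platform values rather than the actual build code:
```
for platform in linux/amd64 linux/arm darwin/amd64; do
  (
    # The exports live only inside the subshell, so they cannot
    # contaminate the next platform's build stage.
    export GOOS="${platform%/*}" GOARCH="${platform#*/}"
    echo "building for ${GOOS}/${GOARCH}"
  )
done
```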
This also adds some logging which, had it existed previously, might have made the bug more immediately obvious.
**Which issue this PR fixes**: fixes #37079
cc @sebgoa @iTagir @saad-ali
Automatic merge from submit-queue
hack/e2e.go: Dump cluster logs in case of Up failure
**What this PR does / why we need it**: A failure in `Up` currently results in no attempt to grab cluster logs. This fixes that hole. (Sigh, a ton of holes for this diagnosis path.)
Automatic merge from submit-queue
remove checking mount point in cleanupOrphanedPodDirs
To avoid the NFS hang problem, remove the mountpoint-checking code in
cleanupOrphanedPodDirs(). This removal should still be safe because the function checks whether there are still directories under the pod's volume directory and, if so, does not delete the pod directory.
Note: after removing the mountpoint check in cleanupOrphanedPodDirs(), the directories might not be cleaned up in the following situation:
1. A pod is deleted, and the kubelet reconciler unmounts the volume directory successfully.
2. Before the reconciler can delete the volume directory, the kubelet gets restarted.
3. Since volume directories still exist (but are not mounted) under the pod directory, cleanupOrphanedPodDirs() will not clean them up.
Will work on a follow-up PR to solve the above issue.
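A rough shell analogue of the remaining safety check (the real code is Go inside the kubelet; the path and variable names here are illustrative):
```
pod_dir="/var/lib/kubelet/pods/${POD_UID}"
# Only remove the orphaned pod directory when nothing remains under
# volumes/; leftover volume directories mean cleanup must wait.
if [[ -z "$(ls -A "${pod_dir}/volumes" 2>/dev/null)" ]]; then
  rm -rf "${pod_dir}"
fi
```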
Automatic merge from submit-queue
Fix photon controller plugin to construct with correct PdID
**What this PR does / why we need it**:
This PR fixes a mismatched unmount path in the photon volume plugin, which results from assigning the volume spec name to the persistent disk ID. Without this patch, the unmounting process stalls in the reconciler when a pod is deleted, and restarting the same pod then hits a mount failure because the previous unmount is still in progress.
The input variable of the function ConstructVolumeSpec is the volume spec name, not the persistent disk ID. Previously the function constructed a new volume spec by directly assigning the volume spec name to the persistent disk ID, which results in a mismatched mount path. The fix looks up the pdID from the mount path and constructs the volume spec with the correct pdID.
I have tested the patch with back-to-back pod creation/deletion, and mounting/unmounting of the photon persistent disk volume source performs normally now.
This needs to be cherry-picked to the 1.5 release branch.
Automatic merge from submit-queue
Update Stateful Set example files for 1.5
1. Remove initialized annotation from statefulset examples
2. Update storage class annotation to beta in statefulset examples
3. Remove alpha limitation on PetSet in cassandra example
cc @erictune @foxish @kow3ns @enisoc @chrislovecnm @kubernetes/sig-apps
```release-note
NONE
```
Automatic merge from submit-queue
move parts of the mega generic run struct out
This splits the main `ServerRunOptions` into composable pieces that are bindable separately and adds easy paths for composing servers to run delegating authentication and authorization.
@sttts @ncdc alright, I think this is as far as I need to go to make the composing servers reasonable to write. I'll try leaving it here.
Automatic merge from submit-queue
controller manager refactors
The controller manager needs some significant cleanup. This starts us down the path by respecting parameters like `stopCh`, simplifying discovery checks, removing unnecessary parameters, preventing unnecessary fatals, and using our client builder.
@sttts @ncdc
Here the variable err is the one returned by NewForConfig(&eventClientConfig) only if the CreateAPIServerClientConfig() function executed correctly. We should use else instead of if, or move the (if err != nil) block into the (if err == nil) block above.
Automatic merge from submit-queue
support customizing the repository prefix of images through environment KUBE…
## Problem
kubeadm does not support customizing the repository prefix of images. This prevents us from using our own image repository to deploy k8s.
## Fix
Make `gcr.io/google_containers/` configurable.
Add the environment variable KUBE_REPO_PREFIX.
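A hypothetical usage sketch (the registry name is an example, not from this PR):
```
# Pull control-plane images from a private mirror instead of
# gcr.io/google_containers/:
export KUBE_REPO_PREFIX="registry.example.com/kubernetes"
kubeadm init
```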
Signed-off-by: yaoyao.xyy <yaoyao.xyy@alibaba-inc.com>
Automatic merge from submit-queue
Fix package aliases to follow golang convention
Some package aliases do not align with the golang convention https://blog.golang.org/package-names. This PR fixes them. It also adds a verify script and presubmit checks.
Fixes #35070.
cc/ @timstclair @Random-Liu
Automatic merge from submit-queue
Collect logs for dead kubelets too
Collect logs via journalctl if journalctl is installed, rather than only if
kubelet.service is running. The old way resulted in us losing logs any
time the kubelet was failing. This, of course, breaks on a node if
someone decided to install journalctl but not use it. But that is not
the case on any of the images used by cluster-level tests at present.
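Roughly, the collection logic becomes the following (a sketch, not the exact log-dump code):
```
if which journalctl >/dev/null 2>&1; then
  # Works even when kubelet.service is dead: journald retains the logs.
  journalctl --no-pager -u kubelet.service > kubelet.log
else
  cp /var/log/kubelet.log kubelet.log
fi
```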
^^^^FYI @Random-Liu not sure if `which journalctl` implies that journalctl is actually used on all of the nodes we test in the node-e2e suites. This may be of consequence if we move to using `cluster/log-dump.sh` to scrape logs for node-e2e.
P0 because this is somewhat in the way of debugging https://github.com/kubernetes/kubernetes/issues/33882
@jessfraz @saad-ali This should be cherry-picked to 1.4 and 1.5 as well.
Automatic merge from submit-queue
When --grace-period=0 is provided, wait for deletion
The grace-period is automatically set to 1 unless --force is provided, and the client waits until the object is deleted.
This preserves backwards compatibility with 1.4 and earlier. It does not handle scenarios where the object is deleted and a new object is created with the same name because we don't have the initial object loaded (and that's a larger change for 1.5).
Fixes #37117 by relaxing the guarantees provided.
```release-note
When deleting an object with `--grace-period=0`, the client will begin a graceful deletion and wait until the resource is fully deleted. To force deletion, use the `--force` flag.
```
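For example (the pod name is illustrative):
```
# Begins a graceful deletion (grace period coerced to 1) and waits:
kubectl delete pod mypod --grace-period=0
# Forces immediate deletion without waiting:
kubectl delete pod mypod --grace-period=0 --force
```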
Automatic merge from submit-queue
log-dump: Change USE_KUBECTL path to instead call out to a custom function
**What this PR does / why we need it**: The LOG_DUMP_USE_KUBECTL path is fine, once the cluster is up. However, we've had a continuous low-grade Up flake in the kops builds, so I'd like to grab logs using the aws CLI.
This makes log-dump.sh extensible, so you can do:
```
function log-dump-custom-get-instances() { ... }
export -f log-dump-custom-get-instances
go run hack/e2e.go ...
```
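For example, a hypothetical AWS-backed implementation might look like this (the tag filter and query are assumptions, not part of this PR):
```
# Hypothetical: list instance IPs for an AWS cluster via the aws CLI.
function log-dump-custom-get-instances() {
  aws ec2 describe-instances \
    --filters "Name=tag:KubernetesCluster,Values=${CLUSTER_NAME}" \
    --query 'Reservations[].Instances[].PublicIpAddress' \
    --output text
}
export -f log-dump-custom-get-instances
```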
Automatic merge from submit-queue
Fixes Addon Manager's pruning issue for old Deployments
Fixes #37641.
Attaches the `last-applied` annotations to the existing Deployments for pruning.
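For context, `kubectl apply` records the applied manifest in that annotation, which pruning later uses to decide ownership (the file and object names below are illustrative):
```
kubectl apply -f dashboard-deployment.yaml --namespace=kube-system
# Inspect the annotation that pruning relies on:
kubectl get deployment kubernetes-dashboard --namespace=kube-system -o yaml \
  | grep -A1 'kubectl.kubernetes.io/last-applied-configuration'
```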
The following images are built and pushed:
- gcr.io/google-containers/kube-addon-manager:v6.1
- gcr.io/google-containers/kube-addon-manager-amd64:v6.1
- gcr.io/google-containers/kube-addon-manager-arm:v6.1
- gcr.io/google-containers/kube-addon-manager-arm64:v6.1
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.1
@mikedanese
cc @saad-ali @krousey
Automatic merge from submit-queue
Fix infinite loop in federated ingress controller
Previously the ingress controller was constantly scheduling reconciliation, even if no updates were needed. That behavior created a big mess in the logs and consumed resources.
This PR also fixes the stop function of the federated ingress controller.
cc: @nikhiljindal @madhusudancs
Automatic merge from submit-queue
Revision handling in federated deployment controller
The deployment controller in regular Kubernetes automatically adds a revision annotation to each deployment. This causes a bit of confusion in the controller and tests. This PR skips the revision annotation in checks. In the next K8S release we will need better support for deployment revisions.
Helps with #36588
cc: @nikhiljindal @madhusudancs
Automatic merge from submit-queue
fix typo in `kubectl proxy` command line help
**What this PR does / why we need it**: improves docs
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: none
**Special notes for your reviewer**: doc only
**Release note**:
```release-note
```
(docs only) Fixed the port from 8011 to 8001 (the default) because in that particular line no specific port is specified, so the default is used.
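For reference, with no `--port` flag `kubectl proxy` serves on the default port:
```
kubectl proxy &                      # listens on 127.0.0.1:8001 by default
curl http://127.0.0.1:8001/api/
```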
A hardcoded known-stable version will be returned if the user
didn't request a specific version and kubeadm is for some reason
unable to fetch the latest stable information from the release servers.
For now, the fallback version is v1.4.6.
Now, defaults can point to "stable" and users will always get the
latest available stable build of Kubernetes via kubeadm.
There is no longer a need to hardcode the version string inside the
kubeadm binary.
It is also possible to use labels like "latest" or to point to an exact
branch: "stable-1.4".
Automatic merge from submit-queue
Stop deleting underlying services when federation service is deleted
Fixes https://github.com/kubernetes/kubernetes/issues/36799
Fixes the federation service controller to not delete services from underlying clusters when the federated service is deleted.
None of the federation controllers should do this unless explicitly asked by the user via DeleteOptions. This is the only federation controller that does so.
cc @kubernetes/sig-cluster-federation @madhusudancs
```release-note
federation service controller: stop deleting services from underlying clusters when federated service is deleted.
```
Automatic merge from submit-queue
hold namespaces briefly before processing deletion
possible fix for #36891
In HA scenarios (either HA apiserver or HA etcd), it is possible for resource deletion during namespace cleanup to race with creation of objects in the terminating namespace.
HA master timeline:
1. "delete namespace n" API call goes to apiserver 1, deletion timestamp is set in etcd
2. namespace controller observes namespace deletion, starts cleaning up resources, lists deployments
3. "create deployment d" API call goes to apiserver 2, gets persisted to etcd
4. apiserver 2 observes namespace deletion, stops allowing new objects to be created
5. namespace controller finishes deleting listed deployments, deletes namespace
HA etcd timeline:
1. "create deployment d" API call goes to apiserver, gets persisted to etcd
2. "delete namespace n" API call goes to apiserver, deletion timestamp is set in etcd
3. namespace controller observes namespace deletion, starts cleaning up resources, lists deployments
4. list call goes to non-leader etcd member that hasn't observed the new deployment or the deleted namespace yet
5. namespace controller finishes deleting the listed deployments, deletes namespace
In both cases, simply waiting before cleaning up the namespace (either for etcd members to observe objects created at the last second in the namespace, or for other apiservers to observe the namespace move to the terminating phase and disallow additional creations) resolves the issue.
Possible other fixes:
* do a second sweep of objects before deleting the namespace
* have the namespace controller check for and clean up objects in namespaces that no longer exist
* ...?
Automatic merge from submit-queue
make drain retry forever and use a new graceful period
Implemented the 1st approach according to https://github.com/kubernetes/kubernetes/issues/37460#issuecomment-263437516
1) Make drain retry forever if the error is always Too Many Requests (429), which is generated by a Pod Disruption Budget.
2) Use a new grace period per #37460.
3) Update the message printed when successfully deleting or evicting a pod.
fixes #37460
cc: @davidopp @erictune