Automatic merge from submit-queue (batch tested with PRs 38294, 37009, 36778, 38130, 37835)
Only configure basic auth on gci if KUBE_USER and KUBE_PASSWORD are specified.
This should not change the existing flow when KUBE_USER/KUBE_PASSWORD are specified.
It makes not specifying those a valid option that means "don't turn on basic auth".
I only did it for cluster/gce/gci for now, but others should be somewhat similar.
Automatic merge from submit-queue
add a configuration for kubelet to register as a node with taints
and deprecate --register-schedulable
ref #28687#29178
cc @dchen1107 @davidopp @roberthbailey
Automatic merge from submit-queue (batch tested with PRs 38049, 37823, 38000, 36646)
Fixes kubedns logging level
We should have bumped up the verbose level to v=2 for `kubedns` after cutting the last release, as the TODO indicates.
@bowei @thockin
Automatic merge from submit-queue (batch tested with PRs 37692, 37785, 37647, 37941, 37856)
Use unified gcp fluentd image for gci and cvm
Follow-up of https://github.com/kubernetes/kubernetes/pull/37681
Actually unify the pod specs for CVM and GCI, to simplify the configuration
CC @piosz
Automatic merge from submit-queue
Exit with error if <version number or publication> is not the final parameter.
getopts stops parsing flags after a non-flag, non-arg-to-a-flag parameter.
This commit adds an error message if any parameters are passed after the
first non-flag, non-arg-to-a-flag parameter in the arg list.
getopts stops parsing flags after a non-flag, non-arg-to-a-flag parameter.
This commit adds an error message if any parameters are passed after the
first non-flag, non-arg-to-a-flag parameter in the arg list.
Automatic merge from submit-queue
Unify fluentd-gcp configurations
There're two different configs and two different pod specs for fluentd agent for GCL: one for GCI and one for CVM. This PR makes it possible to use only one config and only one pod spec.
CC @piosz
Automatic merge from submit-queue
Don't update gcloud in cluster/*/util.sh
**What this PR does / why we need it**:
Removes automatic gcloud update commands from `cluster/gce/util.sh`, `cluster/gke/util.sh`. Setting env `KUBE_PROMPT_FOR_UPDATE=y` will update required components, otherwise it will only verify that required components are present and at a minimum required version.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#35834
**Special notes for your reviewer**:
Inline python is nasty but I *really* don't want to do version comparison in bash. Open to other suggestions for verifying required version of gcloud components. cc @kubernetes/sig-cluster-lifecycle, @kubernetes/sig-testing
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
`kube-up.sh`/`kube-down.sh` no longer force update gcloud for provider=gce|gke.
```
Automatic merge from submit-queue
Set strategy spec for kube-dns to support zero downtime rolling update
From #37728 and coreos/kube-aws#111.
Set `maxUnavailable` to 0 to prevent DNS service outage during update when the replica number is only 1.
Also keeps all kube-dns yaml files in sync.
@bowei @thockin
Automatic merge from submit-queue
Fix the equality checks for numeric values in cluster/gce/util.sh.
**What this PR does / why we need it**: This PR fixes an error in the gce shell scripts that results in inconsistent/incorrect behavior.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#37385
**Special notes for your reviewer**: This needs to be backported to 1.5 and 1.4.
@jszczepkowski
Automatic merge from submit-queue
Update Stateful Set example files for 1.5
1. Remove initialized annotation from statefulset examples
2. Update storage class annotation to beta in statefulset examples
3. Remove alpha limitation on PetSet in cassandra example
cc @erictune @foxish @kow3ns @enisoc @chrislovecnm @kubernetes/sig-apps
```release-note
NONE
```
Automatic merge from submit-queue
Collect logs for dead kubelets too
Collect logs via journalctl if journalctl is installed, rather than only if
kubelet.service is running. The old way resulted in us losing logs any
time the kubelet was failing. This, of course, breaks on a node if
someone decided to install journalctl but not use it. But that is not
the case on any of the images used by cluster-level tests at present.
^^^^FYI @Random-Liu not sure if `which journalctl` implies that journalctl is actually used on all of the nodes we test in the node-e2e suites. This may be of consequence if we move to using `cluster/log-dump.sh` to scrape logs for node-e2e.
P0 because this is somewhat in the way of debugging https://github.com/kubernetes/kubernetes/issues/33882
@jessfraz @saad-ali This should be cherry-picked to 1.4 and 1.5 as well.
Automatic merge from submit-queue
log-dump: Change USE_KUBECTL path to instead call out to a custom function
**What this PR does / why we need it**: The LOG_DUMP_USE_KUBECTL path is fine, once the cluster is up. However, we've had a continuous low-grade Up flake in the kops builds, so I'd like to grab logs using the aws CLI.
This makes log-dump.sh extensible, so you can do:
```
function log-dump-custom-get-instances() { ... }
export -f log-dump-custom-get-instances
go run hack/e2e.go ...
```
Automatic merge from submit-queue
Fixes Addon Manager's pruning issue for old Deployments
Fixes#37641.
Attaches the `last-applied`annotations to the existing Deployments for pruning.
Below images are built and pushed:
- gcr.io/google-containers/kube-addon-manager:v6.1
- gcr.io/google-containers/kube-addon-manager-amd64:v6.1
- gcr.io/google-containers/kube-addon-manager-arm:v6.1
- gcr.io/google-containers/kube-addon-manager-arm64:v6.1
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.1
@mikedanese
cc @saad-ali @krousey
Collect logs via journalctl if journalctl is installed, rather than only if
kubelet.service is running. The old way resulted in us losing logs any
time the kubelet was failing. This, of course, breaks on a node if
someone decided to install journalctl but not use it. But that is not
the case on any of the images used by cluster-level tests at present.
Automatic merge from submit-queue
kubemark: add KUBEMARK_NUM_NODES and KUBEMARK_MASTER_SIZE config
A lot of test infra scripts are using these two parameters and repeatedly set NUM_NODES and MASTER_SIZE before running kubemark. When we try to use those scripts, we need to manually set these again and again.
It would come handy if kubemark config could take these into account and reduce duplication.
Automatic merge from submit-queue
Set Dashboard UI version to v1.5.0-beta1
There will be one more such PR coming for 1.5 release. In one week.
Setting release note to none. Will set notes for final version PR.
Github release info:
https://github.com/kubernetes/dashboard/releases/tag/v1.5.0-beta1
Automatic merge from submit-queue
Deploy a default StorageClass instance on AWS and GCE
This needs a newer kubectl in kube-addons-manager container. It's quite tricky to test as I cannot push new container image to gcr.io and I must copy the newer container manually.
cc @kubernetes/sig-storage
**Release note**:
```release-note
Kubernetes now installs a default StorageClass object when deployed on AWS, GCE and
OpenStack with kube-up.sh scripts. This StorageClass will automatically provision
a PeristentVolume in corresponding cloud for a PersistentVolumeClaim that cannot be
satisfied by any existing matching PersistentVolume in Kubernetes.
To override this default provisioning, administrators must manually delete this default StorageClass.
```
The LOG_DUMP_USE_KUBECTL path is fine, once the cluster is up.
However, we've had a continuous low-grade Up flake in the kops builds,
so I'd like to grab logs using the aws CLI.
This makes log-dump.sh extensible, so you can do:
function log_dump_custom_get_instances() { ... }
export -f log_dump_custom_get_instances
go run hack/e2e.go ...
Automatic merge from submit-queue
Use gsed on the mac.
**What this PR does / why we need it**: Fixes node upgrades when run from a mac
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#37474
**Special notes for your reviewer**:
Automatic merge from submit-queue
Bumps up Addon Manager to v6.0 with full support of kubectl apply
Below images are built and pushed:
- gcr.io/google-containers/kube-addon-manager:v6.0
- gcr.io/google-containers/kube-addon-manager-amd64:v6.0
- gcr.io/google-containers/kube-addon-manager-arm:v6.0
- gcr.io/google-containers/kube-addon-manager-arm64:v6.0
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.0
The actual change made is upgrade kubectl version from `v1.5.0-alpha.1` to `v1.5.0-beta.1`, which is released today.
@mikedanese
@saad-ali This need to get into 1.5 because Addon Manager v6.0-alpha.1 (currently in used) does not have full support of `kubectl apply --prune`.
Automatic merge from submit-queue
fix: elasticsearch template mapping to parse kubernetes.labels
**What this PR does / why we need it**:
This PR updates the field mappings for the elasticsearch template that ships with the EFK stack implementation.
Specifically, elasticsearch cannot parse the `kubernetes.labels` object because it attempts to treat it as a string and produces an error. This update treats `kubernetes.labels` as an object and all of the properties within as a string, allowing accurate indexing and allowing users in kibana to search on `kubernetes.labels.*`.
**Release note**:
```release-note
Fluentd/Elastisearch add-on: correctly parse and index kubernetes labels
```
Automatic merge from submit-queue
Fix salt master check using hard coded string
<!-- Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, read our contributor guidelines https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md and developer guide https://github.com/kubernetes/kubernetes/blob/master/docs/devel/development.md
2. If you want *faster* PR reviews, read how: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/faster_reviews.md
3. Follow the instructions for writing a release note: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/pull-requests.md#release-notes
-->
**What this PR does / why we need it**:
**Which issue this PR fixes**
This is for vsphere only.
If var $INSTANCE_PREFIX is changed in cluster/vsphere/config-default.sh, then salt master check will fail due to the hard coded string "kubernetes-master". The fix uses $MASTER_NAME instead.
**Special notes for your reviewer**:
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
```
Automatic merge from submit-queue
Add missing variable to openstack provider
`FIXED_NETWORK_CIDR` environment variable is mandatory by
openstack-heat kubernetes provider, but it's missing as
default value. Adding this environment variable is helpful
to build kubernetes cluster using openstack-heat provider.
So this patch adds it.
Automatic merge from submit-queue
Fix an else branch in configure-helper.sh
**What this PR does / why we need it**: bug fix for upgrade.sh needed in 1.5
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#37257
Automatic merge from submit-queue
Use shasum if sha1sum doesn't exist in the path
**What this PR does / why we need it**: bug fix for running upgrade.sh from a mac
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#37355
Automatic merge from submit-queue
Modify GCI mounter to enable NFSv3
In order to make NFSv3 work, mounter needs to start rpcbind daemon. This
change modify mounter's Dockerfile and mounter script to start the
rpcbind daemon if it is not running on the host.
After this change, need to make push the image and update the sha number in Changelog.
Automatic merge from submit-queue
Fixed e2e tests for HA master.
Set of fixes that allows HA master e2e tests to pass for removal/addition master replicas.
The summary of changes:
- fixed host name in etcd certs,
- added cluster validation after kube-down,
- fixed the number of master replicas in cluster validation,
- made MULTIZONE=true required for HA master deployments, ensured we correctly handle MULTIZONE=true when user wants to create HA master but not kubelets in multiple zones,
- extended verification of master replicas in HA master e2e tests.
`FIXED_NETWORK_CIDR` environment variable is mandatory by
openstack-heat kubernetes provider, but it's missing as
default value. Adding this environment variable is helpful
to build kubernetes cluster using openstack-heat provider.
So this patch adds it.
In order to make NFSv3 work, mounter needs to start rpcbind daemon. This
change modify mounter's Dockerfile and mounter script to start the
rpcbind daemon if it is not running on the host.
After this change, need to make push the image and update the sha number in Changelog.
Automatic merge from submit-queue
Add kube-proxy logs to fluentd configs
Related to https://github.com/kubernetes/kubernetes/issues/37107
Makes fluentd collect logs from kube-proxy. It's completely backward-compatible change that does not cause problems currently, so I suggest not to bump version.
cc @piosz
Automatic merge from submit-queue
Fix etcd unavailable error when performing kube-up.sh for Ubuntu prov…
**What this PR does / why we need it**:
This PR fixes 'etcd unavailable error' when performing kube-up.sh for Ubuntu provider
**Which issue this PR fixes**
fixes: https://github.com/kubernetes/kubernetes/issues/36340
Automatic merge from submit-queue
Eviction Thresholds Update
Sets the defaults for the eviction-hard threshold for GCE based on what we were using during testing: "memory.available<250Mi,nodefs.available<10%,nodefs.inodesFree<5%".
Sets flags for e2e tests to use eviction-minimum-reclaim: "nodefs.available<5%,nodefs.inodesFree<5%"
this fixes#32537
- Adds command line flags --config-map, --config-map-ns.
- Fixes 36194 (https://github.com/kubernetes/kubernetes/issues/36194)
- Update kube-dns yamls
- Update bazel (hack/update-bazel.sh)
- Update known command line flags
- Temporarily reference new kube-dns image (this will be fixed with
a separate commit when the DNS image is created)
Automatic merge from submit-queue
Add e2e test for statefulset updates
Verify that one can (manually) update statefulset template
cc @erictune @foxish @kow3ns @kubernetes/sig-apps
Automatic merge from submit-queue
K8s 1.5 keeps container-vm as default node image on GCE
There is a concern that some GCE users may be running automation that
(a) turns up ephemeral clusters and (b) always uses the latest K8s
release. If any of these workloads fall outside the set supported on
GCI, cutting the release will break the automation. We are therefore
delaying this change until we have provided sufficient warning.
```release-note
K8s 1.5 keeps container-vm as the default node image on GCE for backwards compatibility reasons. Please beware that container-vm is officially deprecated and you should replace it with GCI if at all possible. You can review the migration guide here for more detail: https://cloud.google.com/container-engine/docs/node-image-migration
```
/cc @aronchick @vishh @roberthbailey
There is a concern that some GCE users may be running automation that
(a) turns up ephemeral clusters and (b) always uses the latest K8s
release. If any of these workloads fall outside the set supported on
GCI, cutting the release will break the automation. We are therefore
delaying this change until we have provided sufficient warning.
Automatic merge from submit-queue
Update Dashboard UI version to 1.4.2
**What this PR does / why we need it**:
Dashboard 1.4.2 contains a fix for an XSS security bug, so I think it would be prudent to update the Dashboard version 'shipped' with kubernetes to this version
**Special notes for your reviewer**:
**Release note**:
- Updated dashboard version in addons to 1.4.2```
Automatic merge from submit-queue
SSL certificates for etcd cluster.
Added generation of SSL certificates for etcd cluster's internal communication.
Turned on on GCE (gci, trusty and debain).
Automatic merge from submit-queue
Fix Docker Registry image version to 2.5.1
`registry:2` is constantly being updated with new versions. This means there's a possibility that the image may be changed unintentionally. For example, when the Pod is rescheduled on nodes that does not already have the image, depending on the time of the pull, `registry:2` may result in different images.
Fix this to the latest `registry:2.5.1` instead to avoid this problem.
@uluyol @freehan
Automatic merge from submit-queue
Change master to advertise external IP in kubernetes service.
Change master to advertise external IP in kubernetes service.
In effect, in HA mode in case of multiple masters, IP of external load
balancer will be advertise in kubernetes service.
Dashboard 1.4.2 contains a fix for an XSS security bug, so I think it would be prudent to update the Dashboard version 'shipped' with kubernetes to this version
Automatic merge from submit-queue
Migrates addons from RCs to Deployments
Fixes#33698.
Below addons are being migrated:
- kube-dns
- GLBC default backend
- Dashboard UI
- Kibana
For the new deployments, the version suffixes are removed from their names. Version related labels are also removed because they are confusing and not needed any more with regard to how Deployment and the new Addon Manager works.
The `replica` field in `kube-dns` Deployment manifest is removed for the incoming DNS horizontal autoscaling feature #33239.
The `replica` field in `Dashboard` Deployment manifest is also removed because the rescheduler e2e test is manually scaling it.
Some resource limit related fields in `heapster-controller.yaml` are removed, as they will be set up by the `addon resizer` containers. Detailed reasons in #34513.
Three e2e tests are modified:
- `rescheduler.go`: Changed to resize Dashboard UI Deployment instead of ReplicationController.
- `addon_update.go`: Some namespace related changes in order to make it compatible with the new Addon Manager.
- `dns_autoscaling.go`: Changed to examine kube-dns Deployment instead of ReplicationController.
Both of above two tests passed on my own cluster. The upgrade process --- from old Addons with RCs to new Addons with Deployments --- was also tested and worked as expected.
The last commit upgrades Addon Manager to v6.0. It is still a work in process and currently waiting for #35220 to be finished. (The Addon Manager image in used comes from a non-official registry but it mostly works except some corner cases.)
@piosz @gmarek could you please review the heapster part and the rescheduler test?
@mikedanese @thockin
cc @kubernetes/sig-cluster-lifecycle
---
Notes:
- Kube-dns manifest still uses *-rc.yaml for the new Deployment. The stale file names are preserved here for receiving faster review. May send out PR to re-organize kube-dns's file names after this.
- Heapster Deployment's name remains in the old fashion(with `-v1.2.0` suffix) for avoiding describe this upgrade transition explicitly. In this way we don't need to attach fake apply labels to the old Deployments.
Automatic merge from submit-queue
Use new fluentd-gcp image version
In #35618 we used new version of fluentd agent, which includes new version of jeamalloc, allowing us to use it.
Additionally, we came up with a hacky way to encourage Ruby GC to be invoked more often by using RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR variable.
@piosz
Automatic merge from submit-queue
Better messaging for missing volume binaries on host
**What this PR does / why we need it**:
When mount binaries are not present on a host, the error returned is a generic one.
This change is to check the mount binaries before the mount and return a user-friendly error message.
This change is specific to GCI and the flag is experimental now.
https://github.com/kubernetes/kubernetes/issues/36098
**Release note**:
Introduces a flag `check-node-capabilities-before-mount` which if set, enables a check (`CanMount()`) prior to mount operations to verify that the required components (binaries, etc.) to mount the volume are available on the underlying node. If the check is enabled and `CanMount()` returns an error, the mount operation fails. Implements the `CanMount()` check for NFS.
Sample output post change :
rkouj@rkouj0:~/go/src/k8s.io/kubernetes$ kubectl describe pods
Name: sleepyrc-fzhyl
Namespace: default
Node: e2e-test-rkouj-minion-group-oxxa/10.240.0.3
Start Time: Mon, 07 Nov 2016 21:28:36 -0800
Labels: name=sleepy
Status: Pending
IP:
Controllers: ReplicationController/sleepyrc
Containers:
sleepycontainer1:
Container ID:
Image: gcr.io/google_containers/busybox
Image ID:
Port:
Command:
sleep
6000
QoS Tier:
cpu: Burstable
memory: BestEffort
Requests:
cpu: 100m
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
data:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 127.0.0.1
Path: /export
ReadOnly: false
default-token-d13tj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-d13tj
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
7s 7s 1 {default-scheduler } Normal Scheduled Successfully assigned sleepyrc-fzhyl to e2e-test-rkouj-minion-group-oxxa
6s 3s 4 {kubelet e2e-test-rkouj-minion-group-oxxa} Warning FailedMount Unable to mount volume kubernetes.io/nfs/32c7ef16-a574-11e6-813d-42010af00002-data (spec.Name: data) on pod sleepyrc-fzhyl (UID: 32c7ef16-a574-11e6-813d-42010af00002). Verify that your node machine has the required components before attempting to mount this volume type. Required binary /sbin/mount.nfs is missing
Automatic merge from submit-queue
Fix startup script bug in kibana image
Big thanks to @lhopki01 for noticing this!
As mention in discussion in https://github.com/kubernetes/kubernetes/pull/36103 current image crashes if we don't want to work behind proxy because of string interpolation in bash.
@piosz
Automatic merge from submit-queue
Fixes token_found bug in addon manager
From #35832.
Above PR exposed addon manager's logs on Jenkins, found below error on the gce e2e test artifacts:
```
Error from server: serviceaccounts "default" not found
error executing template "{{with index .secrets 0}}{{.name}}{{end}}": template: output:1:7: executing "output" at <index .secrets 0>: error calling index: index of untyped nil
== default service account in the kube-system namespace has token Error executing template: template: output:1:7: executing "output" at <index .secrets 0>: error calling index: index of untyped nil. Printing more information for debugging the template:
template was:
{{with index .secrets 0}}{{.name}}{{end}}
raw data was:
{"kind":"ServiceAccount","apiVersion":"v1","metadata":{"name":"default","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/serviceaccounts/default","uid":"de3f2f85-9d6a-11e6-9df3-42010af00002","resourceVersion":"48","creationTimestamp":"2016-10-29T00:01:40Z"}}
object given to template engine was:
map[apiVersion:v1 metadata:map[selfLink:/api/v1/namespaces/kube-system/serviceaccounts/default uid:de3f2f85-9d6a-11e6-9df3-42010af00002 resourceVersion:48 creationTimestamp:2016-10-29T00:01:40Z name:default namespace:kube-system] kind:ServiceAccount] ==
```
Seems like the script failed to retrieve service token at the first time and mistakenly used the error message as the token content. Fixes by replacing `|| true` with if condition.
https://hub.docker.com/r/library/registry/tags/
`registry:2` is constantly being updated with new versions. This means there's a possibility that the image may be changed unintentionally. For example, when the Pod is rescheduled on nodes that does not already have the image, depending on the time of the pull, `registry:2` may result in different images.
Fix this to the latest `registry:2.5.1` instead to avoid this problem.