Automatic merge from submit-queue
fix kubedns-sa.yaml missing "namespace: kube-system" value
The file kubedns-sa.yaml missing `namespace: kube-system`, so it will create ServiceAccount kube-dns in default namespace, this will cause kube-dns deployment's pods be blocked forever;
Some logs as following:
> - lastTransitionTime: 2017-04-06T19:02:12Z
> lastUpdateTime: 2017-04-06T19:02:12Z
> message: 'unable to create pods: pods "kube-dns-699984412-" is forbidden: service
> account kube-system/kube-dns was not found, retry after the service account
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42025, 44169, 43940)
if we have a dedicated serviceaccount keypair, use it to verify serviceaccounts
Automatic merge from submit-queue
[Federation] Remove FEDERATIONS_DOMAIN_MAP references
Remove all references to FEDERATIONS_DOMAIN_MAP as this method is no longer is used and is replaced by adding federation domain map to kube-dns configmap.
cc @madhusudancs @kubernetes/sig-federation-pr-reviews
**Release note**:
```
[Federation] Mechanism of adding `federation domain maps` to kube-dns deployment via `--federations` flag is superseded by adding/updating `federations` key in `kube-system/kube-dns` configmap. If user is using kubefed tool to join cluster federation, adding federation domain maps to kube-dns is already taken care by `kubefed join` and does not need further action.
```
Automatic merge from submit-queue
Use shared informers for proxy endpoints and service configs
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
Follow-up to #43295 cc @wojtek-t
Will race with #43937 for conflicting changes 😄 cc @thockin
cc @smarterclayton @sttts @liggitt @deads2k @derekwaynecarr @eparis @kubernetes/rh-cluster-infra
Automatic merge from submit-queue (batch tested with PRs 44047, 43514, 44037, 43467)
Juju: Enable GPU mode if GPU hardware detected
**What this PR does / why we need it**:
Automatically configures kubernetes-worker node to utilize GPU hardware when such hardware is detected.
layer-nvidia-cuda does the hardware detection, installs CUDA and Nvidia
drivers, and sets a state that the k8s-worker can react to.
When gpu is available, worker updates config and restarts kubelet to
enable gpu mode. Worker then notifies master that it's in gpu mode via
the kube-control relation.
When master sees that a worker is in gpu mode, it updates to privileged
mode and restarts kube-apiserver.
The kube-control interface has subsumed the kube-dns interface
functionality.
An 'allow-privileged' config option has been added to both worker and
master charms. The gpu enablement respects the value of this option;
i.e., we can't enable gpu mode if the operator has set
allow-privileged="false".
**Special notes for your reviewer**:
Quickest test setup is as follows:
```bash
# Bootstrap. If your aws account doesn't have a default vpc, you'll need to
# specify one at bootstrap time so that juju can provision a p2.xlarge.
# Otherwise you can leave out the --config "vpc-id=vpc-xxxxxxxx" bit.
juju bootstrap --config "vpc-id=vpc-xxxxxxxx" --constraints "cores=4 mem=16G root-disk=64G" aws/us-east-1 k8s
# Deploy the bundle containing master and worker charms built from
# https://github.com/tvansteenburgh/kubernetes/tree/gpu-support/cluster/juju/layers
juju deploy cs:~tvansteenburgh/bundle/kubernetes-gpu-support-3
# Setup kubectl locally
mkdir -p ~/.kube
juju scp kubernetes-master/0:config ~/.kube/config
juju scp kubernetes-master/0:kubectl ./kubectl
# Download a gpu-dependent job spec
wget -O /tmp/nvidia-smi.yaml https://raw.githubusercontent.com/madeden/blogposts/master/k8s-gpu-cloud/src/nvidia-smi.yaml
# Create the job
kubectl create -f /tmp/nvidia-smi.yaml
# You should see a new nvidia-smi-xxxxx pod created
kubectl get pods
# Wait a bit for the job to run, then view logs; you should see the
# nvidia-smi table output
kubectl logs $(kubectl get pods -l name=nvidia-smi -o=name -a)
```
kube-control interface: https://github.com/juju-solutions/interface-kube-control
nvidia-cuda layer: https://github.com/juju-solutions/layer-nvidia-cuda
(Both are registered on http://interfaces.juju.solutions/)
**Release note**:
```release-note
Juju: Enable GPU mode if GPU hardware detected
```
Automatic merge from submit-queue
get-kube-local.sh checks pods with option "--namespace=kube-system"
**What this PR does / why we need it**:
Local cluster creation using get-kube-local.sh is never finished.
The get-kube-local.sh monitors running_count of pods such as etcd,
master and kube-proxy, but these pods are created under the namespace
kube-system. Therefore kubectl can't find these pods then cluster
creation isn't completed.
The get-kube-local.sh should monitor created pods with option
"--namespace=kube-system".
**Which issue this PR fixes** : fixes#42517
**Release note**:
```
`NONE`
```
Automatic merge from submit-queue
cluster/log-dump - chmod files before dumping
We make the files world-readable, so that installation techniques that
lock down the logfiles can still be dumped.
Issue https://github.com/kubernetes/test-infra/issues/2397
```release-note
NONE
```
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
Automatic merge from submit-queue
Adding more proxy options and header to nginx load-balancer.
**What this PR does / why we need it**: The kubeapi-load-balancer uses nginx to proxy commands to the kube-apiserver. It currently does not support SPDY and therefore the `kubectl exec` command is broken.
**Which issue this PR fixes** :
fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/226
fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/201
**Special notes for your reviewer**: This only changes the nginx configuration no code change was required.
**Release note**:
```release-note
Using http2 in kubeapi-load-balancer to fix kubectl exec uses
```
Automatic merge from submit-queue (batch tested with PRs 42379, 42668, 42876, 41473, 43260)
Silence error messages from the docker rmi call we expect to fail
**What this PR does / why we need it**: when we removed `docker tag -f` in #34361 we added a bunch of `docker rmi` calls to preserve behavior for older docker versions. That step is usually a no-op, however, and results in confusing messages like
```
Tagging docker image gcr.io/google_containers/kube-proxy:c8d0b2e7a06b451117a8ac58fc3bb3d3 as gcr.io/kubernetes-release-test/kube-proxy-amd64:v1.5.4
Error response from daemon: No such image: gcr.io/kubernetes-release-test/kube-proxy-amd64:v1.5.4
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42665
**Special notes for your reviewer**: I could probably remove the `docker rmi` calls entirely, though I don't know if folks are still using docker < 1.10. (I think Jenkins still has 1.9.1.)
**Release note**:
```release-note
NONE
```
cc @jessfraz
Automatic merge from submit-queue (batch tested with PRs 43726, 43643)
Make a smaller redis image for testing, based on Alpine.
**What this PR does / why we need it**:
This shrinks gcr.io/google_containers/redis from 400MB to 5MB, which should reduce flakes.
**Which issue this PR fixes**:
fixes#43631
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Moves dns-horizontal-autoscaler to a separate service account
Similar to #38816.
As one of the cluster add-ons, dns-horizontal-autoscaler is now using the default service account in kube-system namespace, which is introduced by https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/e2e-rbac-bindings/random-addon-grabbag.yaml for the ease of transition. This default service account will be removed in the future.
This PR subdivides dns-horizontal-autoscaler to a separate service account and setup the necessary permissions.
@bowei
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43518, 42467)
install/kube-up: fix some errors while install k8s through kube-up/down.sh
What this PR does / why we need it:
etcd2.3.1 will be installed follow this scripts, but k8s use etcd3 as default storage backend, so the next error will always be apprear:
API server: rpc error: code = 13 desc = transport is closing
so i think we should change the version of etcd
thank you!
Automatic merge from submit-queue
Centos provider: generate SSL certificates for etcd cluster.
**What this PR does / why we need it**:
Support secure etcd cluster for centos provider, generate SSL certificates for etcd in default. Running it w/o SSL is exposing cluster data to everyone and is not recommended. [#39462](https://github.com/kubernetes/kubernetes/pull/39462#issuecomment-271601547)
/cc @jszczepkowski @zmerlynn
**Release note**:
```release-note
Support secure etcd cluster for centos provider.
```
Automatic merge from submit-queue
added prompt warning if etcd3 media type isn't set during upgrade
**What this PR does / why we need it**:
This adds a prompt confirming the upgrade when `STORAGE_MEDIA_TYPE` is not explicitly set. This is to prevent users from accidentally upgrading to protobuf.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Alongs with docs, addresses #43669
**Special notes for your reviewer**:
Should be cherrypicked onto `release-1.6`
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41297, 42638, 42666, 43039, 42567)
Allow minion floating IPs to be optional
**What this PR does / why we need it**:
Makes the generation of floating IPs for worker nodes optional, based on an env var. To quote the original issue:
> Currently, the OpenStack installation method assigns a floating IP to every single worker node. While this is fine for smaller clusters with a good sized IP pool, it can cause issues in environments with high node counts or less IPs available.
**Which issue this PR fixes**:
https://github.com/kubernetes/kubernetes/issues/40737
**Special notes for your reviewer**:
I used the conditions section of the Heat spec: https://docs.openstack.org/developer/heat/template_guide/hot_spec.html#conditions-section
**Release note**:
```release-note
OpenStack clusters can now specify whether worker nodes are assigned a floating IP
```
Automatic merge from submit-queue (batch tested with PRs 43048, 43624, 43649)
Remove E2E_UPGRADE_TEST check in config-test.sh
Once https://github.com/kubernetes/test-infra/pull/2330 merges, the upgrade tests will drive the exact behavior they want, and we can remove the check for envvars leaked from the job env
Automatic merge from submit-queue
Update NPD rbac.
I recently enabled NPD in gke.
However, I found that in gke e2e test (https://k8s-testgrid.appspot.com/google-gke#gci-gke), npd on the node could not talk with apiserver, and reported full of following errors:
```
E0324 05:08:26.745545 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
E0324 05:08:37.719423 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
E0324 05:08:47.719694 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
```
I created a GKE cluster (v1.7.0-alpha.0.1483+1e879c69ecf09e) myself, and found that addon manager could not create npd binding with the following error:
```
error: error validating "/etc/kubernetes/addons/node-problem-detector/standalone/npd-binding.yaml": error validating data: couldn't find type: v1alpha1.ClusterRoleBinding; if you choose to ignore these errors, turn validation off with --validate=false
```
I found that rbac was updated to beta, but npd was missed because it was merged after 9e6a3496b4 (diff-b05c70853d9a772b310db71a61297841).
I updated rbac to beta in the master manifest and npd on the node could talk with apiserver immediately.
We must get this in 1.6 to make NPD working. @dchen1107
@dchen1107 @fabioy @liggitt
Automatic merge from submit-queue (batch tested with PRs 43546, 43544)
Default to enabling legacy ABAC policy in non-test kube-up.sh environments
Fixes https://github.com/kubernetes/kubernetes/issues/43541
In 1.5, we unconditionally stomped the abac policy file if KUBE_USER was set, and unconditionally used ABAC mode pointing to that file.
In 1.6, unless the user opts out (via `ENABLE_LEGACY_ABAC=false`), we want the same legacy policy included as a fallback to RBAC.
This PR:
* defaults legacy ABAC **on** in normal deployments
* defaults legacy ABAC **on** in upgrade E2Es (ensures combination of ABAC and RBAC works properly for upgraded clusters)
* defaults legacy ABAC **off** in non-upgrade E2Es (ensures e2e tests 1.6+ run with tightened permissions, and that default RBAC roles cover the required core components)
GKE changes to drive the `ENABLE_LEGACY_ABAC` envvar were made by @cjcullen out of band
```release-note
`kube-up.sh` using the `gce` provider enables both RBAC authorization and the permissive legacy ABAC policy that makes all service accounts superusers. To opt out of the permissive ABAC policy, export the environment variable `ENABLE_LEGACY_ABAC=false` before running `cluster/kube-up.sh`.
```
Automatic merge from submit-queue
Bump CNI consumers to v0.5.1
**What this PR does / why we need it**:
- vendored CNI plugins properly handle `DEL` on missing resources
- update CNI version refs
**Which issue this PR fixes**
fixes#43488
**Release note**:
`bumps CNI to version v0.5.1 where plugins properly handle DEL on non existent resources`
layer-nvidia-cuda does the hardware detection and sets a state that the
worker can react to.
When gpu is available, worker updates config and restarts kubelet to
enable gpu mode. Worker then notifies master that it's in gpu mode via
the kube-control relation.
When master sees that a worker is in gpu mode, it updates to privileged
mode and restarts kube-apiserver.
The kube-control interface has subsumed the kube-dns interface
functionality.
An 'allow-privileged' config option has been added to both worker and
master charms. The gpu enablement respects the value of this option;
i.e., we can't enable gpu mode if the operator has set
allow-privileged="false".
Automatic merge from submit-queue
Add an env KUBE_ENABLE_MASTER_NOSCHEDULE_TAINT and disable it by default
This PR changed master `NoSchedule` taint to opt-in.
As is discussed with @bgrant0607 @janetkuo, `NoSchedule` master taint breaks existing user workload, we should not enable it by default.
Previously, NPD required the taint because it can only support one OS distro with a specific configuration. If master and node are using different OS distros, NPD will not work either on master or node. However, we've already fixed this in https://github.com/kubernetes/kubernetes/pull/40206, so for NPD it's fine to disable the taint.
This should work, but I'll still try it in my cluster to confirm.
@kubernetes/sig-scheduling-misc @dchen1107 @mikedanese