Automatic merge from submit-queue
Ensure only 1 Swift URL is used in cluster operations
**What this PR does / why we need it**:
Extracts only 1 Swift URL if multiple are returned from Keystone.
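A minimal sketch of the idea, assuming the URL is read from the Keystone service catalog via the `openstack` CLI (the parsing here is illustrative, not the PR's actual code):
```bash
# Keystone may list several object-store endpoints; keep only the first
# public one rather than passing the whole list into the Heat stack.
SWIFT_URL=$(openstack catalog show object-store -f value -c endpoints \
  | awk '/public/ {print $NF; exit}')
echo "Using Swift URL: ${SWIFT_URL}"
```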
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
https://github.com/kubernetes/kubernetes/issues/34930
**Special notes for your reviewer**:
**Release note**:
```release-note
Heat cluster operations now support environments that have multiple Swift URLs
```
Automatic merge from submit-queue
Use auto mode networks instead of legacy networks in GCP
Use of the --range flag creates legacy networks in GCP.
Legacy networks will not support new GCP features.
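For contrast, a hedged example using current gcloud flag names (network name illustrative):
```bash
# Legacy network, roughly what the old --range invocation produced:
#   gcloud compute networks create kube-net --range 10.240.0.0/16
# Auto mode network: one subnet per region, keeps access to new GCP features.
gcloud compute networks create kube-net --subnet-mode auto
```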
```release-note
NONE
```
Automatic merge from submit-queue
Add support for IP aliases for pod IPs (GCP alpha feature)
```release-note
Adds support for allocation of pod IPs via IP aliases.
# Adds KUBE_GCE_ENABLE_IP_ALIASES flag to the cluster up scripts (`kube-{up,down}.sh`).
KUBE_GCE_ENABLE_IP_ALIASES=true will enable allocation of PodCIDR IPs
using the IP alias mechanism rather than using routes. This feature is currently
only available on GCE.
## Usage
$ CLUSTER_IP_RANGE=10.100.0.0/16 KUBE_GCE_ENABLE_IP_ALIASES=true bash -x cluster/kube-up.sh
# Adds CloudCIDRAllocator to the node CIDR allocator (kube-controller-manager).
If CIDRAllocatorType is set to `CloudCIDRAllocator`, then CIDR allocation
is instead done by the external cloud provider, and the node controller is
only responsible for reflecting the allocation into the node spec.
- Splits off the rangeAllocator from the cidr_allocator.go file.
- Adds cloudCIDRAllocator, which is used when the cloud provider allocates
the CIDR ranges externally. (GCE support only)
- Updates RBAC permission for node controller to include PATCH
```
KUBE_GCE_ENABLE_IP_ALIASES=true enables allocation of PodCIDR IPs
via the IP alias mechanism rather than routes.
NODE_IP_RANGE controls the node instance IP CIDR.
KUBE_GCE_IP_ALIAS_SIZE controls the size of each pod CIDR.
IP_ALIAS_SUBNETWORK controls the name of the subnet created for the cluster.
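A usage sketch with illustrative values (only the variable names above come from the PR):
```bash
export KUBE_GCE_ENABLE_IP_ALIASES=true  # allocate pod IPs via IP aliases, not routes
export CLUSTER_IP_RANGE=10.100.0.0/16   # pod IP space carved into per-node ranges
export KUBE_GCE_IP_ALIAS_SIZE=/24       # assumed format: mask size of each pod CIDR
export IP_ALIAS_SUBNETWORK=kube-subnet  # example name for the cluster subnet
bash -x cluster/kube-up.sh
```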
Automatic merge from submit-queue (batch tested with PRs 43866, 42748)
hack/cluster: download cfssl if not present
hack/local-up-cluster.sh uses cfssl to generate certificates and
will exit if cfssl is not already installed. But other cluster-up
mechanisms (GCE) that generate certs just download cfssl if not
present. Make local-up-cluster.sh do that too so users don't have
to bother installing it from somewhere.
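A rough sketch of the download-if-missing behavior (the install dir is an assumption; the URLs are the well-known cfssl release locations, but treat them as illustrative):
```bash
CFSSL_DIR="${HOME}/.local/bin"  # assumed location; the real script picks its own
mkdir -p "${CFSSL_DIR}"
if ! command -v cfssl >/dev/null 2>&1; then
  curl -sL -o "${CFSSL_DIR}/cfssl"     https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
  curl -sL -o "${CFSSL_DIR}/cfssljson" https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
  chmod +x "${CFSSL_DIR}/cfssl" "${CFSSL_DIR}/cfssljson"
  export PATH="${CFSSL_DIR}:${PATH}"
fi
```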
Automatic merge from submit-queue
fix kubedns-sa.yaml missing "namespace: kube-system" value
The file kubedns-sa.yaml is missing `namespace: kube-system`, so it creates the ServiceAccount kube-dns in the default namespace, which blocks the kube-dns deployment's pods forever.
Sample logs:
> - lastTransitionTime: 2017-04-06T19:02:12Z
> lastUpdateTime: 2017-04-06T19:02:12Z
> message: 'unable to create pods: pods "kube-dns-699984412-" is forbidden: service
> account kube-system/kube-dns was not found, retry after the service account
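The fix amounts to pinning the namespace in the manifest, roughly:
```bash
# Without the namespace line, the ServiceAccount lands in "default" and the
# kube-dns pods stay blocked waiting for kube-system/kube-dns.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-dns
  namespace: kube-system
EOF
```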
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42025, 44169, 43940)
if we have a dedicated serviceaccount keypair, use it to verify serviceaccounts
Automatic merge from submit-queue
[Federation] Remove FEDERATIONS_DOMAIN_MAP references
Remove all references to FEDERATIONS_DOMAIN_MAP, as this method is no longer used; it has been replaced by adding the federation domain map to the kube-dns configmap.
cc @madhusudancs @kubernetes/sig-federation-pr-reviews
**Release note**:
```
[Federation] Mechanism of adding `federation domain maps` to kube-dns deployment via `--federations` flag is superseded by adding/updating `federations` key in `kube-system/kube-dns` configmap. If user is using kubefed tool to join cluster federation, adding federation domain maps to kube-dns is already taken care by `kubefed join` and does not need further action.
```
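A hedged example of the new mechanism (`kubefed join` does this for you; the federation name and domain below are placeholders):
```bash
kubectl create configmap kube-dns \
  --namespace kube-system \
  --from-literal=federations=myfederation=example.com \
  --dry-run -o yaml | kubectl apply -f -
```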
Automatic merge from submit-queue
Use shared informers for proxy endpoints and service configs
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
Follow-up to #43295 cc @wojtek-t
Will race with #43937 for conflicting changes 😄 cc @thockin
cc @smarterclayton @sttts @liggitt @deads2k @derekwaynecarr @eparis @kubernetes/rh-cluster-infra
Automatic merge from submit-queue (batch tested with PRs 44047, 43514, 44037, 43467)
Juju: Enable GPU mode if GPU hardware detected
**What this PR does / why we need it**:
Automatically configures the kubernetes-worker node to utilize GPU hardware when such hardware is detected.
layer-nvidia-cuda does the hardware detection, installs CUDA and Nvidia
drivers, and sets a state that the k8s-worker can react to.
When gpu is available, worker updates config and restarts kubelet to
enable gpu mode. Worker then notifies master that it's in gpu mode via
the kube-control relation.
When master sees that a worker is in gpu mode, it updates to privileged
mode and restarts kube-apiserver.
The kube-control interface has subsumed the kube-dns interface
functionality.
An 'allow-privileged' config option has been added to both worker and
master charms. The gpu enablement respects the value of this option;
i.e., we can't enable gpu mode if the operator has set
allow-privileged="false".
**Special notes for your reviewer**:
Quickest test setup is as follows:
```bash
# Bootstrap. If your aws account doesn't have a default vpc, you'll need to
# specify one at bootstrap time so that juju can provision a p2.xlarge.
# Otherwise you can leave out the --config "vpc-id=vpc-xxxxxxxx" bit.
juju bootstrap --config "vpc-id=vpc-xxxxxxxx" --constraints "cores=4 mem=16G root-disk=64G" aws/us-east-1 k8s
# Deploy the bundle containing master and worker charms built from
# https://github.com/tvansteenburgh/kubernetes/tree/gpu-support/cluster/juju/layers
juju deploy cs:~tvansteenburgh/bundle/kubernetes-gpu-support-3
# Setup kubectl locally
mkdir -p ~/.kube
juju scp kubernetes-master/0:config ~/.kube/config
juju scp kubernetes-master/0:kubectl ./kubectl
# Download a gpu-dependent job spec
wget -O /tmp/nvidia-smi.yaml https://raw.githubusercontent.com/madeden/blogposts/master/k8s-gpu-cloud/src/nvidia-smi.yaml
# Create the job
kubectl create -f /tmp/nvidia-smi.yaml
# You should see a new nvidia-smi-xxxxx pod created
kubectl get pods
# Wait a bit for the job to run, then view logs; you should see the
# nvidia-smi table output
kubectl logs $(kubectl get pods -l name=nvidia-smi -o=name -a)
```
kube-control interface: https://github.com/juju-solutions/interface-kube-control
nvidia-cuda layer: https://github.com/juju-solutions/layer-nvidia-cuda
(Both are registered on http://interfaces.juju.solutions/)
**Release note**:
```release-note
Juju: Enable GPU mode if GPU hardware detected
```
Automatic merge from submit-queue
get-kube-local.sh checks pods with option "--namespace=kube-system"
**What this PR does / why we need it**:
Local cluster creation using get-kube-local.sh never finishes.
get-kube-local.sh monitors the running_count of pods such as etcd,
master, and kube-proxy, but these pods are created under the
kube-system namespace, so kubectl can't find them and cluster
creation never completes.
get-kube-local.sh should monitor the created pods with the option
"--namespace=kube-system", as sketched below.
**Which issue this PR fixes**: fixes #42517
**Release note**:
```
`NONE`
```
Automatic merge from submit-queue
cluster/log-dump - chmod files before dumping
We make the files world-readable, so that logs can still be dumped
from installation techniques that lock down the logfiles.
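A minimal sketch of the idea (paths illustrative):
```bash
# Make logs on the node world-readable before scp'ing them off, so
# locked-down installs can still be dumped; never fail the dump itself.
sudo chmod a+r /var/log/kube*.log /var/log/etcd*.log 2>/dev/null || true
```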
Issue https://github.com/kubernetes/test-infra/issues/2397
```release-note
NONE
```
Automatic merge from submit-queue
Adding more proxy options and header to nginx load-balancer.
**What this PR does / why we need it**: The kubeapi-load-balancer uses nginx to proxy commands to the kube-apiserver. It currently does not support SPDY and therefore the `kubectl exec` command is broken.
**Which issue this PR fixes** :
fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/226
fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/201
**Special notes for your reviewer**: This only changes the nginx configuration; no code change was required.
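A hedged sketch of the relevant nginx directive (the real charm template contains much more):
```bash
cat <<'EOF' >/etc/nginx/sites-enabled/apilb   # path illustrative
server {
    listen 443 ssl http2;   # "http2" here is the key addition for kubectl exec
    # ... certs, proxy_pass to the kube-apiserver, headers, timeouts ...
}
EOF
```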
**Release note**:
```release-note
Use HTTP/2 in the kubeapi-load-balancer to fix `kubectl exec`
```
Automatic merge from submit-queue (batch tested with PRs 42379, 42668, 42876, 41473, 43260)
Silence error messages from the docker rmi call we expect to fail
**What this PR does / why we need it**: when we removed `docker tag -f` in #34361 we added a bunch of `docker rmi` calls to preserve behavior for older docker versions. That step is usually a no-op, however, and results in confusing messages like
```
Tagging docker image gcr.io/google_containers/kube-proxy:c8d0b2e7a06b451117a8ac58fc3bb3d3 as gcr.io/kubernetes-release-test/kube-proxy-amd64:v1.5.4
Error response from daemon: No such image: gcr.io/kubernetes-release-test/kube-proxy-amd64:v1.5.4
```
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #42665
**Special notes for your reviewer**: I could probably remove the `docker rmi` calls entirely, though I don't know if folks are still using docker < 1.10. (I think Jenkins still has 1.9.1.)
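The usual shell idiom for a cleanup step that is expected to fail (image variables hypothetical):
```bash
# Discard stderr and ignore the exit code; on docker >= 1.10 the image
# simply isn't there to remove.
docker rmi "${REGISTRY}/${IMAGE}:${TAG}" 2>/dev/null || true
```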
**Release note**:
```release-note
NONE
```
cc @jessfraz
Per Clayton's suggestion, move stuff from cluster/lib/util.sh to
hack/lib/util.sh. Also consolidate ensure-temp-dir and use the
hack/lib/util.sh implementation rather than cluster/common.sh.
Automatic merge from submit-queue (batch tested with PRs 43726, 43643)
Make a smaller redis image for testing, based on Alpine.
**What this PR does / why we need it**:
This shrinks gcr.io/google_containers/redis from 400MB to 5MB, which should reduce flakes.
**Which issue this PR fixes**:
fixes #43631
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Moves dns-horizontal-autoscaler to a separate service account
Similar to #38816.
As one of the cluster add-ons, dns-horizontal-autoscaler is now using the default service account in kube-system namespace, which is introduced by https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/e2e-rbac-bindings/random-addon-grabbag.yaml for the ease of transition. This default service account will be removed in the future.
This PR moves dns-horizontal-autoscaler to a separate service account and sets up the necessary permissions.
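A sketch of the dedicated identity, with the name assumed from the addon (manifest trimmed to the ServiceAccount):
```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-dns-autoscaler   # assumed name for illustration
  namespace: kube-system
EOF
```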
@bowei
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43518, 42467)
install/kube-up: fix some errors while installing k8s through kube-up/down.sh
**What this PR does / why we need it**:
etcd 2.3.1 is installed by these scripts, but k8s uses etcd3 as the default storage backend, so the following error keeps appearing:
API server: rpc error: code = 13 desc = transport is closing
So I think we should change the etcd version.
Thank you!
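A quick way to see the mismatch (endpoint illustrative):
```bash
curl -s http://127.0.0.1:2379/version
# => {"etcdserver":"2.3.1",...} from the installed etcd2, while the
#    apiserver's default etcd3 storage backend needs a 3.x server.
```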
Automatic merge from submit-queue
Centos provider: generate SSL certificates for etcd cluster.
**What this PR does / why we need it**:
Support secure etcd cluster for centos provider, generate SSL certificates for etcd in default. Running it w/o SSL is exposing cluster data to everyone and is not recommended. [#39462](https://github.com/kubernetes/kubernetes/pull/39462#issuecomment-271601547)
/cc @jszczepkowski @zmerlynn
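A rough sketch of bootstrapping the etcd CA with cfssl (filenames illustrative; the provider then wires the resulting certs into etcd's peer/client flags):
```bash
cat > ca-csr.json <<'EOF'
{"CN": "etcd-ca", "key": {"algo": "rsa", "size": 2048}}
EOF
cfssl gencert -initca ca-csr.json | cfssljson -bare ca   # emits ca.pem, ca-key.pem
```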
**Release note**:
```release-note
Support secure etcd cluster for centos provider.
```
Automatic merge from submit-queue
added prompt warning if etcd3 media type isn't set during upgrade
**What this PR does / why we need it**:
This adds a prompt confirming the upgrade when `STORAGE_MEDIA_TYPE` is not explicitly set. This is to prevent users from accidentally upgrading to protobuf.
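Setting the variable explicitly skips the prompt; the invocation below is illustrative:
```bash
# application/json preserves the pre-upgrade media type; set
# application/vnd.kubernetes.protobuf only if you mean to migrate.
STORAGE_MEDIA_TYPE=application/json cluster/gce/upgrade.sh -M v1.6.0
```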
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Along with docs, this addresses #43669
**Special notes for your reviewer**:
Should be cherrypicked onto `release-1.6`
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41297, 42638, 42666, 43039, 42567)
Allow minion floating IPs to be optional
**What this PR does / why we need it**:
Makes the generation of floating IPs for worker nodes optional, based on an env var. To quote the original issue:
> Currently, the OpenStack installation method assigns a floating IP to every single worker node. While this is fine for smaller clusters with a good sized IP pool, it can cause issues in environments with high node counts or less IPs available.
**Which issue this PR fixes**:
https://github.com/kubernetes/kubernetes/issues/40737
**Special notes for your reviewer**:
I used the conditions section of the Heat spec: https://docs.openstack.org/developer/heat/template_guide/hot_spec.html#conditions-section
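A hedged fragment of the Heat pattern (parameter and resource names are illustrative):
```bash
cat >> kubeminion.yaml <<'EOF'
conditions:
  assign_floating_ip: {equals: [{get_param: assign_floating_ip}, "true"]}
resources:
  kube_minion_floating_ip:
    type: OS::Neutron::FloatingIP
    condition: assign_floating_ip   # resource only created when true
    properties:
      floating_network: {get_param: external_network}
EOF
```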
**Release note**:
```release-note
OpenStack clusters can now specify whether worker nodes are assigned a floating IP
```
Automatic merge from submit-queue (batch tested with PRs 43048, 43624, 43649)
Remove E2E_UPGRADE_TEST check in config-test.sh
Once https://github.com/kubernetes/test-infra/pull/2330 merges, the upgrade tests will drive the exact behavior they want, and we can remove the check for envvars leaked from the job env
Automatic merge from submit-queue
Update NPD rbac.
I recently enabled NPD in GKE.
However, I found in the GKE e2e test (https://k8s-testgrid.appspot.com/google-gke#gci-gke) that NPD on the node could not talk to the apiserver, and its log was full of the following errors:
```
E0324 05:08:26.745545 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
E0324 05:08:37.719423 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
E0324 05:08:47.719694 1328 manager.go:160] failed to update node conditions: the server does not allow access to the requested resource (patch nodes gke-bootstrap-e2e-default-pool-fd91d792-mqh4)
```
I created a GKE cluster (v1.7.0-alpha.0.1483+1e879c69ecf09e) myself and found that the addon manager could not create the NPD binding, failing with the following error:
```
error: error validating "/etc/kubernetes/addons/node-problem-detector/standalone/npd-binding.yaml": error validating data: couldn't find type: v1alpha1.ClusterRoleBinding; if you choose to ignore these errors, turn validation off with --validate=false
```
I found that rbac had been updated to beta, but NPD was missed because it was merged after 9e6a3496b4 (diff-b05c70853d9a772b310db71a61297841).
I updated rbac to beta in the master manifest, and NPD on the node could talk to the apiserver immediately.
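The fix itself is essentially a one-line apiVersion bump (sed invocation illustrative; manifest path taken from the error above):
```bash
sed -i 's|rbac.authorization.k8s.io/v1alpha1|rbac.authorization.k8s.io/v1beta1|' \
  /etc/kubernetes/addons/node-problem-detector/standalone/npd-binding.yaml
```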
We must get this into 1.6 to make NPD work.
@dchen1107 @fabioy @liggitt