This is the 2nd attempt. The previous was reverted while we figured out
the regional mirrors (oops).
New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest. To publish
an image, push k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today). For now the staging is an alias to
gcr.io/google_containers (the legacy URL).
When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.
We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it. Nice and
visible, easy to keep track of.
I'm running two Kubernetes clusters on GCE. One for production and one for staging. The instance prefix I use for production is `kubernetes` and for staging it's `staging-kubernetes`. This caused a problem when running `kube-up.sh` for production because when it tries to find all instances which match `kubernetes(-...)?` it finds both the production and staging instances. This probably results in multiple problems, but the most noticeable one for me was that I`NITIAL_ETCD_CLUSTER` was incorrect and so etcd wouldn't start up correctly so the api server doesn't start up correctly so nothing else starts up. I tested this manually and it seems to work for me, but I didn't write an automated test.
Automatic merge from submit-queue (batch tested with PRs 58246, 58247). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
cluster: remove support for cvm from gce kube-up
see #49213
```release-note
Remove deprecated ContainerVM support from GCE kube-up.
```
Automatic merge from submit-queue (batch tested with PRs 57670, 56888). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Limit number of pods listed as master liveness check.
**What this PR does / why we need it**:
Another step in making #55686 less likely.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 56208, 55690). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Dump last curl output if cluster fails to come up.
**What this PR does / why we need it**:
This is a step toward solving #55686
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Extract gnu-sed detection into a function
**What this PR does / why we need it**:
Moves gnu-sed detection into a reusable function across scripts (considering it's in multiple places).
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Fix log collection for kubeadm-gce tests
**What this PR does / why we need it**:
Separate out kuberenetes-anywhere provider under cluster/ but
delegate all the functionality to the "gce" one since the code
would be the same. Except for the name of the node, the
NODE_INSTANCE_PREFIX will be different, so account for that.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Separate out kuberenetes-anywhere provider under cluster/ but
delegate all the functionality to the "gce" one since the code
would be the same. Except for the name of the node, the
NODE_INSTANCE_PREFIX will be different, so account for that.
Automatic merge from submit-queue (batch tested with PRs 52868, 53196, 54207). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Allow users to configure the service account made available on their nodes
**What this PR does / why we need it**: This allows users (and tests) to configure what GCP service account nodes are given when they are created, to allow users to grant fewer permissions to their nodes via IAM (instead of scopes). Read more about service accounts and scopes here: https://cloud.google.com/compute/docs/access/service-accounts
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#53603
**Special notes for your reviewer**:
**Release note**:
```release-note
Allow GCE users to configure the service account made available on their nodes
```
Automatic merge from submit-queue (batch tested with PRs 53051, 52489, 53920). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Test gcloud exit status when detecting master for GCE e2e test
e2e tests exit on error, so without testing the exit status of a command its scripted error message will never be printed.
**What this PR does / why we need it**: This prints the intended "could not detect Kubernetes master" error message instead of a stack trace from e2e test
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#52474
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
[GCE kube-up] Allow creating/deleting custom network
**What this PR does / why we need it**:
From https://github.com/kubernetes/test-infra/issues/4472.
This is the first step to make PR jobs use custom network instead of auto network (so that we will be less likely hitting subnetwork quota issue).
The last commit is purely for testing out the changes on PR jobs. It will be removed after review.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE.
**Special notes for your reviewer**:
/assign @bowei @nicksardo
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Auto-calculate master disk and root disk sizes in GCE
@gmarek PR https://github.com/kubernetes/kubernetes/pull/49282 didn't fix the issue because MASTER_DISK_SIZE was defaulting to 20GB in config-test.sh before being calculated inside get-master-disk-size() where you use pre-existing value if any.
It should be fixed by this now.
Automatic merge from submit-queue (batch tested with PRs 48565, 49172)
On GCE check whether NODE_LOCAL_SSDS=0 and handle this case appropriately
**What this PR does / why we need it**: Presently if you are using a mac and GCE and specify NODE_LOCAL_SSDS=0, or use the default, you end up with 2 local SSDs.
**Which issue this PR fixes** : fixes https://github.com/kubernetes/kubernetes/issues/49171
**Special notes for your reviewer**:
I've discovered that this issue is due to b353792f9c/cluster/gce/util.sh (L579)
If NODE_LOCAL_SSDS=0, this evaluates to $(seq 0)
```
$ for i in $(seq 0); do echo $i; done
1
0
```
From man seq on mac osx
```
The seq utility prints a sequence of numbers, one per line (default), from first (default 1),
to near last as possible, in increments of incr (default 1).When first is larger than last the
default incr is -1.
```
This was run on mac with the seq manpage indicating it comes from BSD Feb 19 2010.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 47403, 46646, 46906, 46527, 46792)
Avoid redundant copying of tars during kube-up for gce if the same file already exists
**What this PR does / why we need it**:
Whenever I execute cluster/kube-up.sh it copies my tar files to google cloud, even if the files haven't changed. This PR checks to see whether the files already exist, and avoids uploading them again. These files are large and can take a long time to upload.
**Which issue this PR fixes**: fixes#46791
**Special notes for your reviewer**:
Here is the new output:
cluster/kube-up.sh
... Starting cluster in us-central1-b using provider gce
... calling verify-prereqs
... calling verify-kube-binaries
... calling kube-up
Project: PROJECT
Zone: us-central1-b
+++ Staging server tars to Google Storage: gs://kubernetes-staging-PROJECT/kubernetes-devel
+++ kubernetes-server-linux-amd64.tar.gz uploaded earlier, cloud and local file md5 match (md5 = 3a095kcf27267a71fe58f91f89fab1bc)
**Release note**:
```cluster/kube-up.sh on gce now avoids redundant copying of kubernetes tars if the local and cloud files' md5 hash match```
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)
Create a subnet for reserving the service cluster IP range
This will be done if IP aliases is enabled on GCP.
```release-note
NONE
```
IP aliases are an alpha feature, and node accelerators are a beta
feature. $gcloud determines which is appropriate.
Before, this would try to run "gcloud alpha beta", which is incoherent.
Using Ubuntu on GCE to run cluster e2e tests requires slightly different
node.yaml and master.yaml files than GCI, because Ubuntu uses systemd as
PID 1, wheras GCI uses upstart with a systemd delegate. Therefore the
e2e tests fail using those files since the kubernetes services are not
brought back up after a node/master reboot.
Automatic merge from submit-queue
Use auto mode networks instead of legacy networks in GCP
Use of the --range flag creates legacy networks in GCP.
Legacy networks will not support new GCP features.
```release-note
NONE
```
Automatic merge from submit-queue
Add support for IP aliases for pod IPs (GCP alpha feature)
```release-note
Adds support for allocation of pod IPs via IP aliases.
# Adds KUBE_GCE_ENABLE_IP_ALIASES flag to the cluster up scripts (`kube-{up,down}.sh`).
KUBE_GCE_ENABLE_IP_ALIASES=true will enable allocation of PodCIDR ips
using the ip alias mechanism rather than using routes. This feature is currently
only available on GCE.
## Usage
$ CLUSTER_IP_RANGE=10.100.0.0/16 KUBE_GCE_ENABLE_IP_ALIASES=true bash -x cluster/kube-up.sh
# Adds CloudAllocator to the node CIDR allocator (kubernetes-controller manager).
If CIDRAllocatorType is set to `CloudCIDRAllocator`, then allocation
of CIDR allocation instead is done by the external cloud provider and
the node controller is only responsible for reflecting the allocation
into the node spec.
- Splits off the rangeAllocator from the cidr_allocator.go file.
- Adds cloudCIDRAllocator, which is used when the cloud provider allocates
the CIDR ranges externally. (GCE support only)
- Updates RBAC permission for node controller to include PATCH
```
KUBE_GCE_ENABLE_IP_ALIASES=true will enable allocation of PodCIDR ips
using the ip alias mechanism rather than using routes.
NODE_IP_RANGE will control the node instance IP cidr
KUBE_GCE_IP_ALIAS_SIZE controls the size of each podCIDR
IP_ALIAS_SUBNETWORK controls the name of the subnet created for the cluster
Per Clayton's suggestion, move stuff from cluster/lib/util.sh to
hack/lib/util.sh. Also consolidate ensure-temp-dir and use the
hack/lib/util.sh implementation rather than cluster/common.sh.
Automatic merge from submit-queue
Use the cluster name in the names of the firewall rules that allow cluster-internal traffic to disambiguate the rules belonging to different clusters.
Also dropping the network name from these firewall rule names.
Network name was used to disambiguate firewall rules in a given network.
However, since two clusters cannot share a name in a GCE project, this
sufficiently disambiguates the firewall rule names. A potential confusion
arises when someone tries to create a firewall rule with the same name
in a different network, but that's also an indication that they shouldn't
be doing that.
@jszczepkowski due to PR #33094
@ixdy for test-infra
cc @kubernetes/sig-federation @nikhiljindal
Network name was used to disambiguate firewall rules in a given network.
However, since two clusters cannot share a name in a GCE project, this
sufficiently disambiguates the firewall rule names. A potential confusion
arises when someone tries to create a firewall rule with the same name
in a different network, but that's also an indication that they shouldn't
be doing that.
Automatic merge from submit-queue
Don't update gcloud in cluster/*/util.sh
**What this PR does / why we need it**:
Removes automatic gcloud update commands from `cluster/gce/util.sh`, `cluster/gke/util.sh`. Setting env `KUBE_PROMPT_FOR_UPDATE=y` will update required components, otherwise it will only verify that required components are present and at a minimum required version.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#35834
**Special notes for your reviewer**:
Inline python is nasty but I *really* don't want to do version comparison in bash. Open to other suggestions for verifying required version of gcloud components. cc @kubernetes/sig-cluster-lifecycle, @kubernetes/sig-testing
**Release note**:
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
-->
```release-note
`kube-up.sh`/`kube-down.sh` no longer force update gcloud for provider=gce|gke.
```
Automatic merge from submit-queue
Fix the equality checks for numeric values in cluster/gce/util.sh.
**What this PR does / why we need it**: This PR fixes an error in the gce shell scripts that results in inconsistent/incorrect behavior.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#37385
**Special notes for your reviewer**: This needs to be backported to 1.5 and 1.4.
@jszczepkowski
Automatic merge from submit-queue
Change master to advertise external IP in kubernetes service.
Change master to advertise external IP in kubernetes service.
In effect, in HA mode in case of multiple masters, IP of external load
balancer will be advertise in kubernetes service.
Change master to advertise external IP in kubernetes service.
In effect, in HA mode in case of multiple masters, IP of external load
balancer will be advertise in kubernetes service.
Automatic merge from submit-queue
Remove several red herring error messages in GCE cluster scripts
This fixes things like
```
I1018 15:57:53.524] Bringing down cluster
W1018 15:57:53.524] NODE_NAMES=
W1018 15:57:55.995] ERROR: (gcloud.compute.ssh) could not parse resource: []
W1018 15:57:56.392] ERROR: (gcloud.compute.ssh) could not parse resource: []
```
and
```
I1018 16:32:34.947] property "clusters.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0" unset.
I1018 16:32:35.079] property "users.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0" unset.
I1018 16:32:35.195] property "users.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0-basic-auth" unset.
I1018 16:32:35.307] property "contexts.kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0" unset.
W1018 16:32:35.420] failed to get client config: Error in configuration: context was not found for specified context: kubernetes-pr-cri-validation_cri-e2e-gce-agent-pr-25-0
```
It seems like the `kubectl` behavior was introduced in #29236: if `current-context` is set to something invalid, it now complains.
Automatic merge from submit-queue
Implemented KUBE_DELETE_NODES flag in kube-down.
Implemented KUBE_DELETE_NODES flag in kube-down script.
It prevents removal of nodes when shutting down a HA master replica.
Automatic merge from submit-queue
cluster/gce: Update master root disk size
As part of #29213, the hyperkube image will be deployed alongside
existing dependencies.
This ends up just running over the root disk size of 10 during
extraction.
cc @yifan-gu @aaronlevy
As part of #29213, the hyperkube image will be deployed alongside
existing dependencies.
This ends up just running over the root disk size of 10 during
extraction.
Prior to this change, a K8s branch (master as well as release) was
pinned to a GCI milestone. It would pick up the latest GCI release on
that milestone at the time of cluster creation. The rationale was the
K8s users would automatically get the bug fixes in newer versions of
GCI. However in practice, it makes the runtime environment
non-deterministic, and lack of continuous e2e tests mean we would run
into breakages sooner or later.
With this change, each K8s release will pick a specific version
of GCI by default (similar to how the Debian-based container-vm gets used).
Users can override the default version through KUBE_GCE_MASTER_IMAGE and
KUBE_GCE_NODE_IMAGE environment variables.
We expect the default GCI version will be updated relatively frequently stay
updated with newer GCI releases. We can also automate the process to
automatically bump the hard-coded GCI version in future.
It can run tests against multiple existing images that match a regex.
GCI images will be using a regex.
Signed-off-by: Vishnu kannan <vishnuk@google.com>