Commit Graph

5945 Commits (f32b390cf08a9afd9f9899e0d97a90eb162b32a8)

Author SHA1 Message Date
Kubernetes Submit Queue 787f5e0fe5 Merge pull request #48735 from nicksardo/gce-empty-network-proj
Automatic merge from submit-queue (batch tested with PRs 48698, 48712, 48516, 48734, 48735)

GCE: Allow empty NETWORK_PROJECT_ID env var

Changes:
1. Adds `GCE_API_ENDPOINT` logic to container-linux as it was added to GCI in #47881.
1. Apply `NETWORK_PROJECT_ID` value to gce.conf only if the env var is set.

/sig network
/area platform/gce

**Release note**:
```release-note
NONE
```
2017-07-12 04:56:33 -07:00
Nick Sardo ebce7d2497 Allow missing NETWORK_PROJECT_ID env var 2017-07-10 14:26:47 -07:00
Mik Vyatskov b11084e76c Bump event-exporter version 2017-07-10 17:32:40 +02:00
Nick Sardo 06e328627c Use network project id for firewall/route mgmt and zone listing 2017-07-06 16:58:27 -07:00
Kubernetes Submit Queue 20e629b1c6 Merge pull request #44394 from rthallisey/pre-existing-provider
Automatic merge from submit-queue

Launch kubemark with an existing Kubemark master

In order to expand the use of kubemark, allow developers to use kubemark with a pre-existing Kubernetes cluster.

Ref issue  #44393
2017-07-06 04:41:53 -07:00
Kubernetes Submit Queue 40a21312d1 Merge pull request #48144 from juju-solutions/bug/worker-termination
Automatic merge from submit-queue (batch tested with PRs 48399, 48450, 48144)

Skip errors when unregistering juju kubernetes-workers

**What this PR does / why we need it**: When removing a kubernetes node from using Juju and for some reason kubernetes master fails we should not error the node, instead we should proceed with the removal of the node and the master will recognise that node as unavailable because it will fail heartbeats.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/300

**Special notes for your reviewer**:

**Release note**:

```
Clean decommission of Juju kubernetes worker units 
```
2017-07-05 08:58:39 -07:00
Kubernetes Submit Queue 7b13208d61 Merge pull request #48450 from wwwtyro/rye/lxc-disable-conntrack-max
Automatic merge from submit-queue (batch tested with PRs 48399, 48450, 48144)

configure kube-proxy to run with unset conntrack param when in lxc

**What this PR does / why we need it**: Configures the Juju Charm code to run kube-proxy with `conntrack-max-per-core` set to `0` when in an lxc as a workaround for issues when mounting `/sys/module/nf_conntrack/parameters/hashsize`

**Release note**:

```release-note
Configures the Juju Charm code to run kube-proxy with conntrack-max-per-core set to 0 when in an lxc as a workaround for issues when mounting /sys/module/nf_conntrack/parameters/hashsize
```
2017-07-05 08:58:37 -07:00
Ryan Hallisey 82e1d208f6 Launch kubemark with an existing Kubemark Master
In order to expand the use of kubemark, allow developers to
use kubemark with a pre-existing Kubemark master.
2017-07-05 09:14:53 -04:00
Konstantinos Tsakalozos 90a57931af Skip errors when unregistering juju kubernetes-workers 2017-07-05 16:00:37 +03:00
Kubernetes Submit Queue 5d21390561 Merge pull request #41790 from wojtek-t/allow_for_enabling_conversion_mismatch_detecto
Automatic merge from submit-queue

Add ability to enable patch conversion detector

Will rebase and fix once #41326 is merged.
2017-07-04 13:18:22 -07:00
Kubernetes Submit Queue 3823270b9e Merge pull request #48446 from Cynerva/gkk/stop-snaps
Automatic merge from submit-queue (batch tested with PRs 47043, 48448, 47515, 48446)

Fix charms leaving services running after remove-unit

**What this PR does / why we need it**:

This fixes a case where removed charm units can sometimes leave behind running services that interfere with the rest of the cluster.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Fix charms leaving services running after remove-unit
```
2017-07-04 09:12:57 -07:00
Wojciech Tyczynski 37b5a214bc Add ability to enable patch conversion detector 2017-07-04 14:33:24 +02:00
Rye Terrell 05fbc7a7f8 configure kube-proxy to run with unset conntrack param when in lxc 2017-07-03 21:58:54 -05:00
Kubernetes Submit Queue e1d9ab205a Merge pull request #48440 from Cynerva/gkk/snap-upgrades-restart-services
Automatic merge from submit-queue (batch tested with PRs 48439, 48440, 48394)

Fix kubernetes charms not restarting services after snap upgrades

**What this PR does / why we need it**:

This fixes a problem where the Kubernetes charms don't restart services after upgrading snaps. This can cause certain fixes not to be picked up (for example https://github.com/juju-solutions/release/pull/10)

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
Fixed kubernetes charms not restarting services after snap upgrades
```
2017-07-03 13:05:28 -07:00
Kubernetes Submit Queue 937369bc21 Merge pull request #48439 from juju-solutions/bug/namespaces-path
Automatic merge from submit-queue (batch tested with PRs 48439, 48440, 48394)

Fix: namespace-create have kubectl in path

**What this PR does / why we need it**: In juju deployed clusters namespace-create action is failing

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/326

**Special notes for your reviewer**:

**Release note**:

```Fix: namespace-create action on Juju deployed clusters
```
2017-07-03 13:05:27 -07:00
George Kraft f0e08818d8 Fix charms leaving services running after unit removal 2017-07-03 14:55:07 -05:00
Konstantinos Tsakalozos cea934bcbc Fix: namespace-create have kubectl in path 2017-07-03 18:22:20 +03:00
George Kraft c21b305fe9 Fix kubernetes charms not restarting services after snap upgrades 2017-07-03 09:47:08 -05:00
Konstantinos Tsakalozos e2571a853a Non leaders should overwrite any local copies of keys they have with what the leader has. 2017-07-03 17:45:43 +03:00
Kubernetes Submit Queue 9848cdb3ac Merge pull request #48281 from hogepodge/configure-swift-store
Automatic merge from submit-queue

Add configuration for swift container name

**What this PR does / why we need it:**
This review updates the OpenStack Heat provider to allow for configuring the name of the Swift object store.

**Which issue this PR fixes:**
fixes #47966

**Special notes for your reviewer**:
Note that the terminology for OpenStack Swift conflicts with K8S terminology. In this instance, container is referring to the organization structure of Swift storage objects.

**Release note**:
```release-note
Adds configuration option for Swift object store container name to OpenStack Heat provider.
```
2017-07-02 08:02:42 -07:00
Kubernetes Submit Queue dc597291c1 Merge pull request #48351 from juju-solutions/bug/get-pass
Automatic merge from submit-queue (batch tested with PRs 48317, 48313, 48351, 48357, 48115)

Ensure get_password is accessing a file that exists.

**What this PR does / why we need it**: get_password will throw an exception instead of returning None in case the basic_auth.csv file is missing but /root/cdk/ is there in a juju deployment.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/324

**Special notes for your reviewer**:

**Release note**:

```
Fix race condition where /root/cdk is not yet initialised in kubernetes-master setup by Juju  
```
2017-06-30 19:54:27 -07:00
Kubernetes Submit Queue c0337c92cc Merge pull request #47881 from cadmuxe/endpoint
Automatic merge from submit-queue (batch tested with PRs 47918, 47964, 48151, 47881, 48299)

Add ApiEndpoint support to GCE config.

**What this PR does / why we need it**:
Add the ability to change ApiEndpoint  for GCE.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:
```release-note
None
```
2017-06-30 18:42:40 -07:00
Kubernetes Submit Queue 87c6fb5de2 Merge pull request #42376 from jingxu97/Feb/mounter
Automatic merge from submit-queue (batch tested with PRs 43558, 48261, 42376, 46803, 47058)

Add bind mount /etc/resolv.conf from host to containerized mounter

Currently, in containerized mounter rootfs, there is no DNS setup. If client
try to set up volume with host name instead of IP address, it will fail to resolve
the host name. 
By bind mount the host's /etc/resolv.conf to mounter rootfs, VM hosts name
could be resolved when using host name during mount. 

```release-note
Fixes issue where you could not mount NFS or glusterFS volumes using hostnames on GCI/GKE with COS images.
```
2017-06-30 16:28:46 -07:00
Konstantinos Tsakalozos cd34d8f80d Ensure get_password is accessing a file that exists. 2017-06-30 20:24:35 +03:00
Kubernetes Submit Queue d19773d855 Merge pull request #47835 from juju-solutions/feature/security
Automatic merge from submit-queue (batch tested with PRs 47850, 47835, 46197, 47250, 48284)

Securing the cluster created by Juju

**What this PR does / why we need it**: This PR secures the deployments done with Juju master. Works around certain security issues inherent to kubernetes (see for example dashboard access)

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```
Securing Juju kubernetes dashboard
```
2017-06-29 15:16:39 -07:00
Chris Hoge c0621061c8 Add configuration for swift container name
Fixes Issue #47966
2017-06-29 10:15:55 -07:00
Koonwah Chen c4e84e263c Change KUBE_GCE_API_ENDPOINT to GCE_API_ENDPOINT 2017-06-29 10:04:53 -07:00
Kubernetes Submit Queue d3aa0d5a8a Merge pull request #46850 from x13n/nanny-version
Automatic merge from submit-queue (batch tested with PRs 46850, 47984)

Update addon-resizer version

Update addon-resizer version and remove the flags that have been deprecated in the new version.

**What this PR does / why we need it**:
ref kubernetes/contrib#2623

**Special notes for your reviewer**:
Need to wait for merging kubernetes/contrib#2623 first.

**Release note**:
```release-note
addon-resizer flapping behavior was removed.
```
2017-06-29 07:18:32 -07:00
Kubernetes Submit Queue 7018479968 Merge pull request #48204 from shyamjvs/logdump-only-n-nodes
Automatic merge from submit-queue

Allow log-dumping only N randomly-chosen nodes in the cluster

This should let us save "lots" (~3-4 hours) of time in our 5000-node cluster scale tests as we copy logs from all the nodes to jenkins worker and then upload all of them to gcs (while we don't need too many).
This will also prevent the jenkins container facing "No space left on device" error while dumping logs, that we saw in runs 12-13 of gce-enormous-cluster.

The longterm fix will be to enable [logexporter](https://github.com/kubernetes/test-infra/tree/master/logexporter) for our tests.

cc @kubernetes/sig-scalability-misc @kubernetes/test-infra-maintainers @gmarek @fejta
2017-06-29 04:23:58 -07:00
Daniel Kłobuszewski 63ccedcfa7 Update addon-resizer version
Also, remove the flags that have been deprecated in the new version.
2017-06-29 11:03:43 +02:00
Koonwah Chen b3956a689e Add KUBE_GCE_API_ENDPOINT for GCE API endpoint config. 2017-06-28 16:03:18 -07:00
Shyam Jeedigunta b960a0da12 Allow log-dumping only N randomly-chosen nodes in the cluster 2017-06-28 23:01:08 +02:00
Shyam Jeedigunta cc8bb857f9 Allow creating special node for heapster in GCE 2017-06-28 21:27:36 +02:00
Kubernetes Submit Queue a17f15a8a9 Merge pull request #48205 from piosz/heapster-1.4
Automatic merge from submit-queue (batch tested with PRs 48004, 48205, 48130, 48207)

Bumped Heapster to v1.4.0

``` release-note
Bumped Heapster to v1.4.0.
More details about the release https://github.com/kubernetes/heapster/releases/tag/v1.4.0
```

follow up #47961
The release candidate `v1.4.0-beta.0` turned out to be stable.
2017-06-28 10:35:12 -07:00
Kubernetes Submit Queue 63d4af44ac Merge pull request #48004 from dnardo/gke
Automatic merge from submit-queue (batch tested with PRs 48004, 48205, 48130, 48207)

Do not set CNI in cases where there is a private master and network policy provider is set.

**What this PR does / why we need it**:

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
In GCE and in a "private master" setup, do not set the network-plugin provider to CNI by default if a network policy provider is given.
```
2017-06-28 10:35:10 -07:00
Kubernetes Submit Queue ec729ad66d Merge pull request #48182 from gmarek/fluentd
Automatic merge from submit-queue (batch tested with PRs 48192, 48182)

Add generic NoSchedule toleration to fluentd in gcp config as a quick…

…-fix for #44445
2017-06-28 09:33:08 -07:00
Piotr Szczesniak 43280e274d Bumped Heapster to v1.4.0 2017-06-28 16:40:35 +02:00
gmarek 10ce8e2c0d Fix bug cluster-subnet logic 2017-06-28 14:27:52 +02:00
gmarek 3f57d8dba3 Add generic NoSchedule toleration to fluentd in gcp config as a quick-fix for #44445 2017-06-28 10:35:58 +02:00
Konstantinos Tsakalozos 0525b84a45 Disable anonymous-auth 2017-06-28 10:47:45 +03:00
Zach Loafman 903bc643b1 Bump GCE ContainerVM to container-vm-v20170627
Remove the built-in kubelet (finally), pick up security fixes.
2017-06-27 16:14:55 -07:00
Kubernetes Submit Queue 89579c45a4 Merge pull request #48054 from juju-solutions/bug/terminate-etcd
Automatic merge from submit-queue (batch tested with PRs 48139, 48042, 47645, 48054, 48003)

Add a failsafe for etcd not returning a connection string

**What this PR does / why we need it**: Removing a kubernetes-master will fail as described on this issue: https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/311

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/311

**Special notes for your reviewer**: This is a two liner defensive code. I am not totally sold on this patch. I might not be the right place to address the above issue. However, solving the problem on the etcd side and updating the interface scope to be unit (as suggested) seems much more involving.

**Release note**:

```
Fix error when removing juju kubernetes-master unit
```
2017-06-27 14:08:19 -07:00
Kubernetes Submit Queue f1b58f4e5f Merge pull request #48139 from crassirostris/fix-fluentd-config
Automatic merge from submit-queue (batch tested with PRs 48139, 48042, 47645, 48054, 48003)

Fix fluentd-gcp configuration to facilitate JSON parsing

There's a bug in https://github.com/kubernetes/kubernetes/pull/45734, because of which each records gets additional field and google-cloud plugin thinks it's not JSON (https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/blob/master/lib/fluent/plugin/out_google_cloud.rb#L569)

Fixes https://github.com/kubernetes/kubernetes/issues/48108

/cc @piosz @fgrzadkowski
2017-06-27 14:08:07 -07:00
Kubernetes Submit Queue ede78d9ee7 Merge pull request #47513 from gmarek/subnet
Automatic merge from submit-queue

Make big clusters work again after introduction of subnets

This PR does two things: 
  - make IP aliases automatically pick Node IP Range based on number of Nodes,
  - fix logic for starting clusters >4095 Nodes that was broken by introduction of subnets,

cc @wojtek-t @shyamjvs 

```release-note
Setting env var ENABLE_BIG_CLUSTER_SUBNETS=true will allow kube-up.sh to start clusters bigger that 4095 Nodes on GCE.
```

Ref https://github.com/kubernetes/kubernetes/issues/47344
2017-06-27 08:52:50 -07:00
Kubernetes Submit Queue d65b87a00d Merge pull request #47847 from chuckbutler/cluster-juju-approvers
Automatic merge from submit-queue

Insert Cynerva and Kjackal to approvers list

**What this PR does / why we need it**:
Per the membership reviews, we're looking to promote Konstantinos and
George to approvers to help distribute the review/bug load for the `cluster/juju` code
tree.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: 

**Special notes for your reviewer**:
cc @marcoceppi and @tvansteenburgh 

**Release note**:

```release-note
NONE
```
2017-06-27 08:36:30 -07:00
Mik Vyatskov b6a0e442ce Fix fluentd-gcp configuration to facilitate JSON parsing 2017-06-27 16:16:00 +02:00
Maciej Pytel 04f7a96340 Fix typo in cluster-autoscaler config 2017-06-27 13:49:19 +02:00
Maciej Pytel b11175f73f Set cluster-autoscaler node balancing flag 2017-06-27 12:10:33 +02:00
Konstantinos Tsakalozos 0b01cd743b Improve security of Juju deployed clusters 2017-06-27 12:19:21 +03:00
Kubernetes Submit Queue 0dad2d0803 Merge pull request #47983 from yguo0905/memcg
Automatic merge from submit-queue (batch tested with PRs 48092, 47894, 47983)

Enables memcg notification in cluster/node e2e tests

Ref: https://github.com/kubernetes/kubernetes/issues/42676

This PR sets Kubelet flag `--experimental-kernel-memcg-notification=true` when running cluster/node e2e tests on COS and Ubuntu images.

Tested:
```
e2e-node-cos:
I0623 00:09:06.641776    1080 server.go:147] Starting server "kubelet" with command "/usr/bin/systemd-run --unit=kubelet-777178888.service --slice=runtime.slice --remain-after-exit /tmp/node-e2e-20170622T170739/kubelet --kubelet-cgroups=/kubelet.slice --cgroup-root=/ --api-servers http://localhost:8080 --address 0.0.0.0 --port 10250 --read-only-port 10255 --volume-stats-agg-period 10s --allow-privileged true --serialize-image-pulls false --pod-manifest-path /tmp/node-e2e-20170622T170739/pod-manifest571288056 --file-check-frequency 10s --pod-cidr 10.100.0.0/24 --eviction-pressure-transition-period 30s --feature-gates  --eviction-hard memory.available<250Mi,nodefs.available<10%%,nodefs.inodesFree<5%% --eviction-minimum-reclaim nodefs.available=5%%,nodefs.inodesFree=5%% --v 4 --logtostderr --network-plugin=kubenet --cni-bin-dir /tmp/node-e2e-20170622T170739/cni/bin --cni-conf-dir /tmp/node-e2e-20170622T170739/cni/net.d --hostname-override tmp-node-e2e-bfe5799d-cos-stable-59-9460-64-0 --experimental-mounter-path=/tmp/node-e2e-20170622T170739/cluster/gce/gci/mounter/mounter --experimental-kernel-memcg-notification=true"

e2e-node-ubuntu:
I0623 00:03:28.526984    2279 server.go:147] Starting server "kubelet" with command "/usr/bin/systemd-run --unit=kubelet-1407651753.service --slice=runtime.slice --remain-after-exit /tmp/node-e2e-20170622T170203/kubelet --kubelet-cgroups=/kubelet.slice --cgroup-root=/ --api-servers http://localhost:8080 --address 0.0.0.0 --port 10250 --read-only-port 10255 --volume-stats-agg-period 10s --allow-privileged true --serialize-image-pulls false --pod-manifest-path /tmp/node-e2e-20170622T170203/pod-manifest083943734 --file-check-frequency 10s --pod-cidr 10.100.0.0/24 --eviction-pressure-transition-period 30s --feature-gates  --eviction-hard memory.available<250Mi,nodefs.available<10%%,nodefs.inodesFree<5%% --eviction-minimum-reclaim nodefs.available=5%%,nodefs.inodesFree=5%% --v 4 --logtostderr --network-plugin=kubenet --cni-bin-dir /tmp/node-e2e-20170622T170203/cni/bin --cni-conf-dir /tmp/node-e2e-20170622T170203/cni/net.d --hostname-override tmp-node-e2e-e48cdd73-ubuntu-gke-1604-xenial-v20170420-1 --experimental-kernel-memcg-notification=true"

e2e-node-containervm:
I0623 00:14:35.392383    2774 server.go:147] Starting server "kubelet" with command "/tmp/node-e2e-20170622T171318/kubelet --runtime-cgroups=/docker-daemon --kubelet-cgroups=/kubelet --cgroup-root=/ --system-cgroups=/system --api-servers http://localhost:8080 --address 0.0.0.0 --port 10250 --read-only-port 10255 --volume-stats-agg-period 10s --allow-privileged true --serialize-image-pulls false --pod-manifest-path /tmp/node-e2e-20170622T171318/pod-manifest507536807 --file-check-frequency 10s --pod-cidr 10.100.0.0/24 --eviction-pressure-transition-period 30s --feature-gates  --eviction-hard memory.available<250Mi,nodefs.available<10%,nodefs.inodesFree<5% --eviction-minimum-reclaim nodefs.available=5%,nodefs.inodesFree=5% --v 4 --logtostderr --network-plugin=kubenet --cni-bin-dir /tmp/node-e2e-20170622T171318/cni/bin --cni-conf-dir /tmp/node-e2e-20170622T171318/cni/net.d --hostname-override tmp-node-e2e-9e3fdd7c-e2e-node-containervm-v20161208-image"

e2e-cos:
Jun 23 17:54:38 e2e-test-ygg-minion-group-t5r0 kubelet[2005]: I0623 17:54:38.646374    2005 flags.go:52] FLAG: --experimental-kernel-memcg-notification="true"

e2e-ubuntu:
Jun 23 18:25:27 e2e-test-ygg-minion-group-19qp kubelet[1547]: I0623 18:25:27.722253    1547 flags.go:52] FLAG: --experimental-kernel-memcg-notification="true"

e2e-containervm:
I0623 18:55:51.886632    3385 flags.go:52] FLAG: --experimental-kernel-memcg-notification="false"
```

**Release note**:
```
None
```

/sig node
/area node-e2e
/assign @dchen1107 @dashpole
2017-06-26 21:08:10 -07:00