Kubernetes initiates "graceful shutdown" by sending SIGTERM to pid 1.
The way the existing startup scripts worked, this signal arrived at
the shell wrapper, not elasticsearch, and the shell wrapper exited,
killing the container immediately.
Before this change:
1 ? Ss 0:00 /bin/sh -c /run.sh
6 ? S 0:00 /bin/bash /run.sh
13 ? S 0:00 \_ /bin/su -c /elasticsearch/bin/elasticsearch elasticsearch
14 ? Ss 0:00 \_ sh -c /elasticsearch/bin/elasticsearch
15 ? Sl 19:18 \_ /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java ... org.elasticsearch.bootstrap.Elasticsearch start
After this change:
1 ? Ssl 0:29 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java ... org.elasticsearch.bootstrap.Elasticsearch start
Automatic merge from submit-queue (batch tested with PRs 41826, 42405)
Add stubDomains and upstreamNameservers configuration to kube-dns
```release-note
Updates the dnsmasq cache/mux layer to be managed by dnsmasq-nanny.
dnsmasq-nanny manages dnsmasq based on values from the
kube-system:kube-dns configmap:
"stubDomains": {
"acme.local": ["1.2.3.4"]
},
is a map of domain to list of nameservers for the domain. This is used
to inject private DNS domains into the kube-dns namespace. In the above
example, any DNS requests for *.acme.local will be served by the
nameserver 1.2.3.4.
"upstreamNameservers": ["8.8.8.8", "8.8.4.4"]
is a list of upstreamNameservers to use, overriding the configuration
specified in /etc/resolv.conf.
```
Automatic merge from submit-queue
Fixing unbound bash variable.
**What this PR does / why we need it**: this fixes a bug introduced in 1.6 for ABAC.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**: without this, we hit an unbound variable and fail to bring up the kube-apiserver with ABAC enabled.
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 42070, 42127)
Remove fluentd-gcp image sources
This PR removes fluentd-gcp image sources from the main kubernetes repo to move it the `contrib`: https://github.com/kubernetes/contrib/pull/2426
Once image is moved, it will be maintained by Stackdriver team (@igorpeshansky, @qingling128 and @dhrupadb)
CC @ixdy @timstclair
Automatic merge from submit-queue
Remove the kube-discovery binary from the tree
**What this PR does / why we need it**:
kube-discovery was a temporary solution to implementing proposal: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/bootstrap-discovery.md
However, this functionality is now gonna be implemented in the core for v1.6 and will fully replace kube-discovery:
- https://github.com/kubernetes/kubernetes/pull/36101
- https://github.com/kubernetes/kubernetes/pull/41281
- https://github.com/kubernetes/kubernetes/pull/41417
So due to that `kube-discovery` isn't used in any v1.6 code, it should be removed.
The image `gcr.io/google_containers/kube-discovery-${ARCH}:1.0` should and will continue to exist so kubeadm <= v1.5 continues to work.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Remove cmd/kube-discovery from the tree since it's not necessary anymore
```
@jbeda @dgoodwin @mikedanese @dmmcquay @lukemarsden @errordeveloper @pires
Automatic merge from submit-queue (batch tested with PRs 41919, 41149, 42350, 42351, 42285)
Juju: Disable anonymous auth on kubelet
**What this PR does / why we need it**:
This disables anonymous authentication on kubelet when deployed via Juju.
I've also adjusted a few other TLS options for kubelet and kube-apiserver. The end result is that:
1. kube-apiserver can now authenticate with kubelet
2. kube-apiserver now verifies the integrity of kubelet
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/219
**Special notes for your reviewer**:
This is dependent on PR #41251, where the tactics changes are being merged in separately.
Some useful pages from the documentation:
* [apiserver -> kubelet](https://kubernetes.io/docs/admin/master-node-communication/#apiserver---kubelet)
* [Kubelet authentication/authorization](https://kubernetes.io/docs/admin/kubelet-authentication-authorization/)
**Release note**:
```release-note
Juju: Disable anonymous auth on kubelet
```
Automatic merge from submit-queue (batch tested with PRs 41306, 42187, 41666, 42275, 42266)
remove support for debian masters in GCE
Asked about this on the mailing list and no one objects.
@zmerlynn @roberthbailey
```release-note
Remove support for debian masters in GCE kube-up.
```
Automatic merge from submit-queue (batch tested with PRs 41672, 42084, 42233, 42165, 42273)
remove azure getting kube-ups.
Haven't been touched in > 7 months.
@colemickens , i"m going to send out an email about this.
```release-note
Remove Azure kube-up as the Azure community has focused efforts elsewhere.
```
Automatic merge from submit-queue (batch tested with PRs 42126, 42130, 42232, 42245, 41932)
Update fluentd-gcp configuration for hosted masters
This PR makes use of the new fluentd-gcp image, which is not configured per se, for the hosted masters, which cannot use configmaps.
Mirroring https://github.com/kubernetes/kubernetes/pull/42126
Automatic merge from submit-queue
Move fluentd DS config to configmap
This is the logical continuation of https://github.com/kubernetes/kubernetes/pull/41998. This PR makes fluentd-gcp DaemonSet use the new image configured using ConfigMap.
This PR doesn't change the way fluentd-gcp works in case master is not registered, that'll be fixed in a separate PR
CC @ixdy @timstclair @igorpeshansky @qingling128 @dhrupadb
**Release note:**
```release-note
Fluentd-gcp containers spawned by DaemonSet are now configured using ConfigMap
```
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)
Adding legacy ABAC for 1.6
This is a fork of a previous [pull request](https://github.com/kubernetes/kubernetes/pull/42014) to include feedback as the original author is unavailable.
Adds a mechanism to optionally enable legacy abac for 1.6 to provide a migration path for existing users.
Automatic merge from submit-queue (batch tested with PRs 41931, 39821, 41841, 42197, 42195)
Admission Controller: Add Pod Preset
Based off the proposal in https://github.com/kubernetes/community/pull/254
cc @pmorie @pwittrock
TODO:
- [ ] tests
**What this PR does / why we need it**: Implements the Pod Injection Policy admission controller
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Added new Api `PodPreset` to enable defining cross-cutting injection of Volumes and Environment into Pods.
```
Automatic merge from submit-queue (batch tested with PRs 41644, 42020, 41753, 42206, 42212)
Update defaultbackend image to 1.3
Update `gcr.io/google-containers/defaultbackend` to the latest version.
See https://github.com/kubernetes/contrib/pull/2386
/cc @ixdy
export functions from pkg/api/validation
add settings API
add settings to pkg/registry
add settings api to pkg/master/master.go
add admission control plugin for pod preset
add new admission control plugin to kube-apiserver
add settings to import_known_versions.go
add settings to codegen
add validation tests
add settings to client generation
add protobufs generation for settings api
update linted packages
add settings to testapi
add settings install to clientset
add start of e2e
add pod preset plugin to config-test.sh
Signed-off-by: Jess Frazelle <acidburn@google.com>
Automatic merge from submit-queue (batch tested with PRs 38676, 41765, 42103, 41833, 41702)
Update addons yaml files for converting tolerations to api fields.
Automatic merge from submit-queue (batch tested with PRs 42162, 41973, 42015, 42115, 41923)
add nrpe-external-master relation to kubernetes-master and kubernetes-worker
**What this PR does / why we need it**:
This PR adds an an nrpe-external-master relation to the kubernetes-worker, kubernetes-master and kubeapi-load-balancer charms. This is needed to monitor the state of the workers, the masters and the load-balancers via Nagios.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/165
**Special notes for your reviewer**:
Original work by @axinojolais in PR #40897. All I've done is squash commits on his behalf.
**Release note**:
```release-note
The kubernetes-master, kubernetes-worker and kubeapi-load-balancer charms have gained an nrpe-external-master relation, allowing the integration of their monitoring in an external Nagios server.
```
Automatic merge from submit-queue
remove trusty GCE kube-up.sh
Asked on the mailing list. No one objected. Lot's of people were in favor.
cc @roberthbailey
```release-note
Remove support for trusty in GCE kube-up.
```
Automatic merge from submit-queue (batch tested with PRs 35094, 42095, 42059, 42143, 41944)
Use chroot for containerized mounts
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
Automatic merge from submit-queue (batch tested with PRs 40746, 41699, 42108, 42174, 42093)
Avoid fake node names in user info
Node usernames should follow the format `system:node:<node-name>`,
but if we don't know the node name, it's worse to put a fake one in.
In the future, we plan to have a dedicated node authorizer, which would
start rejecting requests from a user with a bogus node name like this.
The right approach is to either mint correct credentials per node, or use node bootstrapping so it requests a correct client certificate itself.
Automatic merge from submit-queue (batch tested with PRs 41937, 41151, 42092, 40269, 42135)
GCE will properly regenerate basic_auth.csv on kube-apiserver start.
**What this PR does / why we need it**:
If basic_auth.csv does not exist we will generate it as normal.
If basic_auth.csv exists we will remove the old admin password before adding the "new" one. (Turns in to a no-op if the password exists).
This did not work properly before because we were replacing by key, where the key was the password. New password would not match and so not replace the old password.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#41935
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Updates the dnsmasq cache/mux layer to be managed by dnsmasq-nanny.
dnsmasq-nanny manages dnsmasq based on values from the
kube-system:kube-dns configmap:
"stubDomains": {
"acme.local": ["1.2.3.4"]
},
is a map of domain to list of nameservers for the domain. This is used
to inject private DNS domains into the kube-dns namespace. In the above
example, any DNS requests for *.acme.local will be served by the
nameserver 1.2.3.4.
"upstreamNameservers": ["8.8.8.8", "8.8.4.4"]
is a list of upstreamNameservers to use, overriding the configuration
specified in /etc/resolv.conf.
Automatic merge from submit-queue (batch tested with PRs 42058, 41160, 42065, 42076, 39338)
Bump up dns-horizontal-autoscaler to 1.1.1
cluster-proportional-autoscaler 1.1.1 is releasing by kubernetes-incubator/cluster-proportional-autoscaler#26, also bump it up for dns-horizontal-autoscaler to introduce below features:
- Add PreventSinglePointFailure option in linear mode.
- Use protobufs for communication with apiserver.
- Support switching control mode on-the-fly.
Note:
The new entry `"preventSinglePointFailure":true` ensures kube-dns to have at least 2 replicas when there is more than one node. Mitigate the issue mentioned in #40063.
@bowei @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42058, 41160, 42065, 42076, 39338)
Juju: Fix shebangs in charm actions to use python3
**What this PR does / why we need it**:
This fixes the microbot and create-rbd-pv actions by reverting them back to python3. We accidentally switched them to python2 to match the boilerplate checker's expectations for python files.
It looks like hack/verify-boilerplate.sh does not check these since they don't have the .py extension, so we should be good with no changes there.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/212
**Special notes for your reviewer**:
**Release note**:
```release-note
Juju: Fix shebangs in charm actions to use python3
```
Automatic merge from submit-queue
Use docker log rotation mechanism instead of logrotate
This is a solution for https://github.com/kubernetes/kubernetes/issues/38495.
Instead of rotating logs using logrotate tool, which is configured quite rigidly, this PR makes docker responsible for the rotation and makes it possible to configure docker logging parameters. It solves the following problems:
* Logging agent will stop loosing lines upon rotation
* Container's logs size will be more strictly constrained. Instead of checking the size hourly, size will be checked upon write, preventing https://github.com/kubernetes/kubernetes/issues/27754
It's still far from ideal, for example setting logging options per pod, as suggested in https://github.com/kubernetes/kubernetes/issues/15478 would be much more flexible, but latter approach requires deep changes, including changes in API, which may be in vain because of CRI and long-term vision for logging.
Changes include:
* Change in salt. It's possible to configure docker log parameters, using variables in pillar. They're exported from env variables on `gce`, but for different cloud provider they have to be exported first.
* Change in `configure-helper.sh` scripts for those os on `gce` that don't use salt + default values exposed via env variables
This change may be problematic for kubelet logs functionality with CRI enabled, that will be tackled in the follow-up PR, if confirmed.
CC @piosz @Random-Liu @yujuhong @dashpole @dchen1107 @vishh @kubernetes/sig-node-pr-reviews
```release-note
On GCI by default logrotate is disabled for application containers in favor of rotation mechanism provided by docker logging driver.
```
Automatic merge from submit-queue
Cleanup fluentd-gcp image, rebase on debian-base
**Why we need this PR**:
There are several problems with our current fluentd-gcp image:
- It pulls in lots of unused packages, which expose unnecessary risk and create noise in CVE scans (and scare customers). The most notable example is the fluent-ui, which pulls in rails.
- `curl | sh ` is not a good practice for a Dockerfile. First, the script is not checked in the same source control branch, so builds are not reproducible. Second, the actions it is taking are opaque. Third, in this case, using non-standard packages means they're harder to manage with CVE scans & upstream fixes.
**What is changed by this PR?**
- Rather than relying on td-agent (which includes fluent-ui), use standard upstream packages. This is largely based off the [official fluentd debian-based image](https://github.com/fluent/fluentd-docker-image/blob/master/v0.12/debian/Dockerfile).
- Rebases the image on debian-base (depends on https://github.com/kubernetes/kubernetes/pull/41915). We would like to move towards a single full-distro base image we can maintain. This change should be relatively minor.
As a result of these changes, the image size is reduced from 360.6 MB to 185.8 MB (nearly half). Many packages were removed, and the full diff (focus on the unversioned files) is listed here: 3fb704f977
**Which issue this PR fixes** https://github.com/kubernetes/kubernetes/issues/40248
**Special notes for your reviewer**:
This change both addresses security concerns, and is expected to greatly reduce the maintenance burden of the fluentd-gcp image. I'd *really* like to get this into 1.6, so please prioritize this review if possible.
I tested this by running the default e2e suite on a private e2e cluster using the new image. If there are other tests you'd like me to run, please let me know ASAP.
**Release note**:
```release-note
Cleanup fluentd-gcp image: rebase on debian-base, switch to upstream packages, remove fluent-ui & rails
```
Automatic merge from submit-queue (batch tested with PRs 35408, 41915, 41992, 41964, 41925)
bump version numbers for heapster/influxdb/grafana images
**What this PR does / why we need it**:
It updates version of monitoring components (heapster/influxdb/grafana) to the latest one used by heapster
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
[e2e/monitoring.go](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/monitoring.go) test seems to be passing without modifications
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 40932, 41896, 41815, 41309, 41628)
enable DefaultTolerationSeconds admission controller by default
**What this PR does / why we need it**:
Continuation of PR #41414, enable DefaultTolerationSeconds admission controller by default.
**Which issue this PR fixes**:
fixes: #41860
related Issue: #1574, #25320
related PRs: #34825, #41133, #41414
**Special notes for your reviewer**:
**Release note**:
```release-note
enable DefaultTolerationSeconds admission controller by default
```
If the file does not exist we will generate it as normal.
If the file exists we will remove the old admin password before adding
the "new" one. (Turns in to a no-op if the password exists).
This did not work properly before because we were replacing by key,
where the key was the password. New password would not match and so
not replace the old password.
Added a METADATA_CLOBBERS_CONFIG flag
METADATA_CLOBBERS_CONFIG controls if we consider the values on disk or in
metadata to be the canonical source of truth. Currently defaulting to
false for GCE and forcing to true for GKE.
Added handling for older forms of the basic_auth.csv file.
Fixed comment to reflect new METADATA_CLOBBERS_CONFIG var.
Automatic merge from submit-queue
Protect kubeproxy deployed via kube-up from system OOMs
This change is necessary until it can be moved to Guaranteed QoS Class.
For #40573
Automatic merge from submit-queue (batch tested with PRs 41854, 41801, 40088, 41590, 41911)
Bump gcr.io/google-containers/rescheduler to v0.2.2
**What this PR does / why we need it**: updates the rescheduler image to one based on busybox instead of ubuntu-slim. Changes for the image were in https://github.com/kubernetes/contrib/pull/2390.
Do you think this merits a release note? I'm leaning towards no.
**Release note**:
```release-note
Update gcr.io/google-containers/rescheduler to v0.2.2, which uses busybox as a base image instead of ubuntu.
```
cc @timstclair
Automatic merge from submit-queue (batch tested with PRs 41854, 41801, 40088, 41590, 41911)
Default storage class for vSphere Fixes#40070
**What this PR does / why we need it**:
Create default storage class for vSphere. This is part of the storage class GA effort https://github.com/kubernetes/features/issues/36
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
fixes#40070
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 40665, 41094, 41351, 41721, 41843)
Multi master patch
**What this PR does / why we need it**: Corrects a sync files issue present when running in a HA Master configuration. This PR adds logic to syncronize on first deployment for `/etc/kubernetes/serviceaccount.key` which will cause cypto verification failure if not 1:1 on each master unit. Additionally syncs basic_auth and additional files in /srv/kubernetes.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#41019
**Special notes for your reviewer**: This requires PR #41251 as a dependency before merging.
**Release note**:
```release-note
Juju - K8s master charm now properly keeps distributed master files in sync for an HA control plane.
```
Automatic merge from submit-queue
Supports 'ensure exist' class addon in Addon-manager
Fixes#39561, fixes#37047 and fixes#36411. Depends on #40057.
This PR splits cluster addons into two categories:
- Reconcile: Addons that need to be reconciled (`kube-dns` for instance).
- EnsureExists: Addons that need to be exist but changeable (`default-storage-class`).
The behavior for the 'EnsureExists' class addon would be:
- Create it if not exist.
- Users could do any modification they want, addon-manager will not reconcile it.
- If it is deleted, addon-manager will recreate it with the given template.
- It will not be updated/clobbered during upgrade.
As Brian pointed out in [#37048/comment](https://github.com/kubernetes/kubernetes/issues/37048#issuecomment-272510835), this may not be the best solution for addon-manager. Though #39561 needs to be fixed in 1.6 and we might not have enough bandwidth to do a big surgery.
@mikedanese @thockin
cc @kubernetes/sig-cluster-lifecycle-misc
---
Tasks for this PR:
- [x] Supports 'ensure exist' class addon and switch to use new labels in addon-manager.
- [x] Updates READMEs regarding the new behavior of addon-manager.
- [x] Updated `test/e2e/addon_update.go` to match the new behavior.
- [x] Go through all current addons and apply the new labels on them regarding what they need.
- [x] Bump addon-manager and update its template files.
This PR is to modify the containerized mounter script to use chroot
instead of rkt fly. This will avoid the problem of possible large number
of mounts caused by rkt containers if they are not cleaned up.
Automatic merge from submit-queue
Remove ivan4th from reviewers
**What this PR does / why we need it**:
Per @ivan4th request in #41351 he would like to be removed from the
reviewers list in this directory tree. This commit addresses that
request.
**Special notes for your reviewer**:
As Ivan has already investigated the PR in question under 41351 I would like to see that driven to landing before landing this OWNERS file change, unless another reviewer would like to step in and help land that open PR.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Base etcd-empty-dir-cleanup on busybox, run as nobody, and update to etcdctl 3.0.14
**What this PR does / why we need it**: since the `etcd-empty-dir-cleanup` image just uses a simple shell script and `etcdctl`, we can base it on busybox, which is a smaller target than alpine.
I've also updated this to use an `etcdctl` from etcd 3.0.14, which matches the version of etcd we're running in 1.6 clusters (I believe), and changed the tag to match the `etcdctl` version.
Tested in my own e2e cluster, where it seems to work.
I haven't pushed the image yet, so e2e tests *may* fail. Tagging `do-not-merge`; if you think this looks good, I'll push the image and retest.
**Release note**:
```release-note
```
cc @timstclair @mml @wojtek-t
Automatic merge from submit-queue
move kube-dns to a separate service account
Switches the kubedns addon to run as a separate service account so that we can subdivide RBAC permission for it. The RBAC permissions will need a little more refinement which I'm expecting to find in https://github.com/kubernetes/kubernetes/pull/38626 .
@cjcullen @kubernetes/sig-auth since this is directly related to enabling RBAC with subdivided permissions
@thockin @kubernetes/sig-network since this directly affects now kubedns is added.
```release-note
`kube-dns` now runs using a separate `system:serviceaccount:kube-system:kube-dns` service account which is automatically bound to the correct RBAC permissions.
```
Automatic merge from submit-queue (batch tested with PRs 39855, 41433, 41567, 41887, 41652)
Add fluentd monitoring to fluentd-gcp image
Right now we are not able to monitor the state of fluentd in cluster, which may result in logging subsystem quietly failing. This PR tries to address that problem by introducing the fluentd container monitoring:
* fluentd internal metrics, like number of buffers and number of data in buffers
* `logging_line_count`, number of lines, read by fluentd from application containers' logs
* Has `tag` label, corresponding to the fluentd tag of the entry
* `logging_entry_count`, number of entries, emitted to the output plugin
* With label `component` set to `container`, generated by application containers
* With label `component` set to `system`, generated by system components like kubelet, docker, scheduler, etc.
* Has `tag` label, corresponding to the fluentd tag of the entry
CC @fabxc @igorpeshansky @edsiper
Automatic merge from submit-queue (batch tested with PRs 41812, 41665, 40007, 41281, 41771)
Bump golang versions to 1.7.5
**What this PR does / why we need it**: While #41636 might not make it in until 1.7, this would bump current golang versions from 1.7.4 to 1.7.5 to integrate the fixes from that patch version. This would include, among other things, a fix to ensure cross-built binaries for darwin don't have certificate validation errors (golang/go#18688)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: none
**Special notes for your reviewer**:
**Release note**:
```release-note
Upgrade golang versions to 1.7.5
```
Automatic merge from submit-queue (batch tested with PRs 41797, 41793, 41795, 41807, 41781)
Remove unnecessary metrics (http/process/go) from being exposed by etcd-version-monitor
Unregister metrics we do not want from the etcd version metrics handler.
cc @wojtek-t @piosz
Automatic merge from submit-queue (batch tested with PRs 41797, 41793, 41795, 41807, 41781)
Turn fluentd supervisor off for fluentd-gcp
By default, turn fluentd supervisor off so that when fluentd process fails, for example due to OOM, container fails completely and it would be easy to detect.
CC @igorpeshansky @qingling128
This branch sync's the crypto keys, and flat-files used for auth with
all the masters when scaling the kubernetes-master units. This should
fix the mis-matched crypto keys seen when rebooting units after first
deploy.
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)
Update kubectl in addon-manager to use HPA in autoscaling/v1
Addon-manager is broken since HPA objects were removed from extensions api group.
Came across the logs from [the latest addon-manager on Jenkins](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/4290/artifacts/bootstrap-e2e-master/kube-addon-manager.log):
```
INFO: == Entering periodical apply loop at 2017-02-16T17:33:37+0000 ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:38+0000. 2 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:46+0000. 1 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl apply --namespace=kube-system -f /etc/kubernetes/addons --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:53+0000. 0 tries remaining. ==
WRN: == Kubernetes addon update completed with errors at 2017-02-16T17:33:58+0000 ==
```
And notice this commit (f66679a4e9) came in two weeks ago, which removed HorizontalPodAutoscaler from extensions/v1beta1.
Addon-manager is now partially functioning that it could successfully create and update addons, but will fail to prune objects, which means upgrade tests may mostly fail.
Pushed another version of addon-manager with kubectl v1.6.0-alpha.2 ([release 2 days ago](https://github.com/kubernetes/kubernetes/releases/tag/v1.6.0-alpha.2)) for fixing, including below images:
- gcr.io/google-containers/kube-addon-manager:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.4-alpha.2
@mikedanese
cc @wojtek-t @shyamjvs
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)
Lint fixes for the master and worker Python code.
**What this PR does / why we need it**: lint fixes for the python code.
**Which issue this PR fixes** none
**Special notes for your reviewer**: This is lint fixes for the Juju python code.
**Release note**:
```release-note
NONE
```
Please consider these changes so we can pass flake8 lint tests in our build process.
Automatic merge from submit-queue (batch tested with PRs 41364, 40317, 41326, 41783, 41782)
Add ability to enable cache mutation detector in GCE
Add the ability to enable the cache mutation detector in GCE. The current default behavior (disabled) is retained.
When paired with https://github.com/kubernetes/test-infra/pull/1901, we'll be able to detect shared informer cache mutations in gce e2e PR jobs.
Automatic merge from submit-queue (batch tested with PRs 41706, 39063, 41330, 41739, 41576)
[Kubemark] Add option to log hollow-node logs
Ref https://github.com/kubernetes/kubernetes/issues/41613
Added an option to log kubemark hollow-node logs which includes kubelet, kubeproxy and npd logs for each hollow-node.
Setting the env var `ENABLE_HOLLOW_NODE_LOGS=true` should now enable logging for tests.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @yujuhong @Random-Liu
Automatic merge from submit-queue
Add standalone npd on GCI.
This PR added standalone NPD in GCE GCI cluster. I already verified the PR, and it should work.
/cc @dchen1107 @fabioy @andyxning @kubernetes/sig-node-misc
Automatic merge from submit-queue
Fix the output of health-mointor.sh
The script show prints the errors/response of the health check, but not
show the progress of `curl`.
Automatic merge from submit-queue
Added a basic monitor for providing etcd version related info
Fixes#41071
This tool scrapes metrics partly from etcd's /version and /metrics endpoints and partly using etcdctl and exposes them as prometheus metrics at `http://localhost:9101/metrics` endpoint on the master. Here is a summary of the metrics it exposes (self-explanatory from the code):
- etcdVersionFetchCount = prometheus.NewCounterVec(
prometheus.CounterOpts{
Namespace: "etcd",
Name: "version_info_fetch_count",
Help: "Number of times etcd's version info was fetched, labeled by etcd's server binary and cluster version",
},
[]string{"serverversion", "clusterversion"})
- etcdGRPCRequestsTotal = prometheus.NewCounterVec(
prometheus.CounterOpts{
Namespace: namespace,
Name: "grpc_requests_total",
Help: "Counter of received grpc requests, labeled by grpc method and grpc service names",
},
[]string{"grpc_method", "grpc_service"})
For further info on how to run this as a binary/docker-container/kubernetes-pod and checking the metrics, have a look at the README.md file.
cc @fgrzadkowski @wojtek-t @piosz
Allow cache mutation detector enablement by PRs in an attempt to find
mutations before they're merged in to the code base. It's just for the
apiserver and controller-manager for now. If/when the other components
start using a SharedInformerFactory, we should set them up just like
this as well.
Automatic merge from submit-queue
Reduce default value of kubemark's NUM_NODES to 10
Changing the default value of kubemark's NUM_NODES from 100 to 10, as it would then be possible to start kubemark on gce clusters that have been started using kube-up that uses the default config of three n1-standard-2 nodes. I've already been asked by a couple of people about why kubemark is not starting on their cluster because of this. More people shouldn't be facing this issue in future.
cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
Automatic merge from submit-queue
Bump fluentd-gcp google_cloud plugin version
Bump the version of `fluent-plugin-google-cloud` in fluentd-gcp image, because it's broken for version `0.5.2`.
Recently, gem `google-api-client` was updated to version `0.10.0`. The new version broke `fluent-plugin-google-cloud` which doesn't specify the upper version of `google-api-client` gem. I'm bumping the version used in our image to allow future changes in this release to be run and tested.
This PR doesn't bump the version, since no effective changes has happened, leaving this for the next PR to do.
CC @igorpeshansky
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)
Add toleration to fluentd daemonset to make it run on master
Because of https://github.com/kubernetes/kubernetes/pull/41172 fluentd pods stopped being allocated on master node.
This PR introduces toleration for master taint for fluentd.
CC @davidopp @janetkuo @kubernetes/sig-scheduling-bugs
Unfortunately, we don't have e2e tests to ensure that master logs are being ingested. This problem is a great signal to work on https://github.com/kubernetes/kubernetes/issues/41411
Automatic merge from submit-queue (batch tested with PRs 40000, 41508, 41489)
Make fluentd use default dns instead of cluster dns to make it work o…
Fix https://github.com/kubernetes/kubernetes/issues/41415
Fluentd for Stackdriver requires external urls (e.g. `logging.googleapis.com`) to be available in order to work. If fluentd runs on master, it cannot access the service endpoint of cluster DNS. This change makes fluentd use default dns to fix this problem.
CC @thockin @bowei
Automatic merge from submit-queue (batch tested with PRs 41104, 41245, 40722, 41439, 41502)
openstack-heat: do not daemonize salt-minion
_openstack-heat_ does currently not setup a _salt-master_, so it is not necessary to daemonize it.
**What this PR does / why we need it**:
as stated in #40721:
> The _openstack-heat_ provider only installs _salt-minions_, no _salt-master_. The configuration does not take this into account which causes the following issues:
>
> - the _salt minion_ is not able to DNS resolve `salt` (see fist part of error log below)
> - the _salt-minion_ is daemonized and fails finding the master (second part of error log below). From my understanding is not required when there is no salt-master, as the setup uses `salt-call`
> anyway (see [gce provider](https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/configure-vm.sh#L328-L339) as reference).
>
> ```
> Jan 31 03:00:04 kube-stack-master salt-minion[9795]: [ERROR ] DNS lookup of 'salt' failed.
> Jan 31 03:00:04 kube-stack-master salt-minion[9795]: [ERROR ] Master hostname: 'salt' not found. Retrying in 30 seconds
> ...
> Jan 31 02:35:30 kube-stack-master salt-minion[9690]: [ERROR ] Error while bringing up minion for multi-master. Is master at salt responding?
> ```
>
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#40721
**Release note**:
```release-note
Do not daemonize `salt-minion` for the openstack-heat provider.
```
Automatic merge from submit-queue (batch tested with PRs 41104, 41245, 40722, 41439, 41502)
Change the etcd rollback tool to do rollback to 2.2.1 version.
I did some tests of it and for my 3-node cluster with 1 deployment it worked fine.
But before merging this, we should probably do way more testing (we should rerun tests that @mml was doing for the previous script).
@lavalamp @xiang90
Automatic merge from submit-queue
Added configurable etcd initial-cluster-state to kube-up script.
Added configurable etcd initial-cluster-state to kube-up script. This
allows creation of multi-master cluster from scratch. This is a
cherry-pick of #41320 from 1.5 branch.
```release-note
Added configurable etcd initial-cluster-state to kube-up script.
```
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)
Refactored kubemark code into provider-specific and provider-independent parts [Part-3]
Fixes#38967
Applying final part of the changes in PR #39033 (which refactored kubemark code completely). The changes included in this PR are:
- Removed `test/kubemark/common.sh` and moved relevant parts of its code to the right places in start-kubemark/stop-kubemark scripts.
- Added DOCKER_REGISTRY, PROJECT, KUBEMARK_IMAGE_MAKE_TARGET variables to `/test/kubemark/cloud-provider-config.sh` to make the kubemark image push location variable wrt provider.
- Removed get-real-pod-for-hollow-node.sh as it doesn't seem to do anything useful.
@kubernetes/sig-scalability-misc @wojtek-t @gmarek