Commit Graph

266 Commits (dc4d92e1544749a755f1ead1f8111b7279e4105e)

Author SHA1 Message Date
Random-Liu 1d3979190c Bump up npd version to v0.4.0 2017-06-06 16:30:02 -07:00
Kubernetes Submit Queue c0407972e9 Merge pull request #46938 from shyamjvs/no-reprinting-gcloud-result
Automatic merge from submit-queue

Avoid double printing output of gcloud commands in kubemark

Just noticed we were unnecessarily echoing the result again.

/cc @wojtek-t
2017-06-06 04:07:55 -07:00
Kubernetes Submit Queue f842bb9987 Merge pull request #46802 from shyamjvs/npd-kernel-config
Automatic merge from submit-queue (batch tested with PRs 46972, 42829, 46799, 46802, 46844)

Add KernelDeadlock condition to hollow NPD

Ref https://github.com/kubernetes/kubernetes/issues/44701

/cc @wojtek-t @gmarek
2017-06-05 17:46:55 -07:00
Shyam Jeedigunta 163f1de5ed Avoid double printing output of gcloud commands in kubemark 2017-06-04 20:07:36 +02:00
Shyam Jeedigunta b655953e21 Enable DefaultTolerationSeconds and PodPreset admission plugins for kubemark 2017-06-04 19:52:57 +02:00
Clayton Coleman 4ce3907639 Add Initializers to all admission control paths by default 2017-06-02 22:09:04 -04:00
Shyam Jeedigunta 23ef37d9ce Add KernelDeadlock condition to hollow NPD 2017-06-01 22:23:28 +02:00
Sen Lu d237e54a24 Switch gcloud compute copy-files to scp 2017-05-30 10:19:33 -07:00
Shyam Jeedigunta 02092312bb Make kubemark scripts fail fast 2017-05-30 11:59:13 +02:00
Kubernetes Submit Queue 52337d5db6 Merge pull request #46502 from gmarek/run_kubemark_tests
Automatic merge from submit-queue

Fix kubemark/run-e2e-tests.sh

This should make most common arguments work.

cc @shyamjvs
2017-05-29 09:35:01 -07:00
gmarek 0ca6aeb95c Fix kubemark/run-e2e-tests.sh 2017-05-29 15:20:54 +02:00
Shyam Jeedigunta b72cbc074c chmod +x kubemark scripts 2017-05-26 22:03:12 +02:00
Kubernetes Submit Queue c60bc53921 Merge pull request #46434 from shyamjvs/kubemark-config-upload
Automatic merge from submit-queue (batch tested with PRs 46124, 46434, 46089, 45589, 46045)

Copy kubeconfig to kubemark master

This should save the effort of digging through the jenkins agent and its container to get the kubeconfig.
Ideally we should have kubectl working directly on the kubemark master, but I'm facing some issues due to the older version of kubectl present by default on the node.

cc @wojtek-t @gmarek
2017-05-25 21:39:59 -07:00
Shyam Jeedigunta 8f2b4c3b33 Copy kubeconfig to kubemark master 2017-05-25 14:55:28 +02:00
gmarek 2437cf4d59 fix typo in start-kubemark 2017-05-25 11:48:01 +02:00
Kubernetes Submit Queue 1e2105808b Merge pull request #45136 from vishh/cos-nvidia-driver-install
Automatic merge from submit-queue

Enable "kick the tires" support for Nvidia GPUs in COS

This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User space libraries and debug utilities from the Nvidia driver installation are made available in a special directory on the host -
* `/home/kubernetes/bin/nvidia/lib` for libraries
*  `/home/kubernetes/bin/nvidia/bin` for debug utilities

Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes.
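For illustration, a minimal sketch of such a pod spec, assuming a hypothetical image name and in-container mount path (only the host directory comes from this PR):

```sh
# Hypothetical CUDA pod consuming the driver libraries from the host.
kubectl create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cuda-demo
spec:
  containers:
  - name: cuda-app
    image: example.com/cuda-app:latest  # placeholder image
    volumeMounts:
    - name: nvidia-libs
      mountPath: /usr/local/nvidia/lib  # wherever the app expects the libs
  volumes:
  - name: nvidia-libs
    hostPath:
      path: /home/kubernetes/bin/nvidia/lib  # host directory from this PR
EOF
```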

Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.

This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons:
1. Driver installation requires disabling a kernel feature in COS. 
2. The kernel API for disabling this interface changed across COS versions
3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR.
4. Instead of having to post 3 PRs, one each for adding the basic installer, updating COS to m59, and then updating the installer again, this PR combines all the changes to reduce review overhead, latency, and the additional noise that would be created when GPU tests break.

**Try out this PR**
1. Get Quota for GPUs in any region
2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh`
4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml`
5. Run your CUDA app in a pod.

**Another option is to run an e2e manually to try out this PR**
1. Get Quota for GPUs in any region
2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"`
4. `go run hack/e2e.go -- --up` 
5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"`
The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.

TODO:
- [x] Update COS image version to m59 release.
- [x] Remove sleep from the install script and add it to the daemonset
- [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
- [x] Setup a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759
- [x] Update node e2e serial configs to install nvidia drivers on COS by default
2017-05-23 10:46:10 -07:00
Kubernetes Submit Queue 03ba1324cf Merge pull request #46224 from gmarek/kubemark_heapster
Automatic merge from submit-queue (batch tested with PRs 46133, 46211, 46224, 46205, 45910)

Make CPU request for heapster in kubemark scale with the number of Nodes
2017-05-22 15:50:03 -07:00
gmarek 27fc7be396 Make CPU request for heapster in kubemark scale with the number of Nodes 2017-05-22 16:20:27 +02:00
Vishnu kannan 86b5edb79a Update COS version to m59
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-05-20 21:17:19 -07:00
Shyam Jeedigunta 360054a75f Add script to dump kubemark master logs 2017-05-20 13:12:38 +02:00
Kubernetes Submit Queue a1c2db2fec Merge pull request #45950 from shyamjvs/revert-proxier
Automatic merge from submit-queue

Make real proxier in hollow-proxy optional (default=true)

Ref https://github.com/kubernetes/kubernetes/pull/45622
This allows using the real proxier for hollow-proxy, but we use the fake one by default.
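A sketch of how such a toggle might be used; the variable name `USE_REAL_PROXIER` is an assumption here, not confirmed by this log:

```sh
# Assumed toggle name; check the kubemark scripts for the actual variable.
export USE_REAL_PROXIER=true  # opt in to the real proxier for hollow-proxy
test/kubemark/start-kubemark.sh
```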

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-05-18 07:55:09 -07:00
Shyam Jeedigunta 804a4f558c Make usage of real proxier in hollow-proxy optional (default=true) 2017-05-18 14:30:12 +02:00
Michael Taufen 2ee2ec5e21 Remove the deprecated --babysit-daemons kubelet flag 2017-05-17 09:08:57 -07:00
Shyam Jeedigunta 87cde074f8 Minor fix in run-gcloud-compute-with-retries output piping 2017-05-15 13:39:10 +02:00
Shyam Jeedigunta 0f1d5e6e36 Remove kubemark.sh as we don't use pod IP from it anymore 2017-05-12 13:47:13 +02:00
Kubernetes Submit Queue e939019900 Merge pull request #45604 from shyamjvs/start-km-master-fix
Automatic merge from submit-queue (batch tested with PRs 45569, 45602, 45604, 45478, 45550)

Minor bug fix in start-kubemark-master script

cc @wojtek-t @gmarek
2017-05-10 21:34:41 -07:00
Shyam Jeedigunta 1078e9580c Minor bug fix in start-kubemark-master script 2017-05-10 19:51:14 +02:00
Shyam Jeedigunta 1fc831e0ec Fix bug in hollow-node deletion in stop-kubemark script 2017-05-10 12:57:43 +02:00
Shyam Jeedigunta 0759289dcf Stream output of run-gcloud-compute-with-retries to stdout in realtime 2017-05-09 13:44:48 +02:00
Shyam Jeedigunta 2e800eef20 Fix add-metadata command for kubemark master 2017-05-08 20:44:20 +02:00
Shyam Jeedigunta efc84378b8 Fix gcloud retries cmd to rightly capture return code 2017-05-08 19:34:26 +02:00
Shyam Jeedigunta 395d3bf3b4 Move hollow-node's initContainer from annotation to field 2017-05-04 11:41:33 +02:00
Dan Williams b3705b6e35 hack/cluster: consolidate cluster/ utils to hack/lib/util.sh
Per Clayton's suggestion, move stuff from cluster/lib/util.sh to
hack/lib/util.sh.  Also consolidate ensure-temp-dir and use the
hack/lib/util.sh implementation rather than cluster/common.sh.
2017-03-30 22:34:46 -05:00
Kubernetes Submit Queue 4f606b9c8d Merge pull request #42820 from MrHohn/addon-kubemark-v6.4-beta.1
Automatic merge from submit-queue (batch tested with PRs 42672, 42770, 42818, 42820, 40849)

kubemark test: Bump addon-manager to v6.4-beta.1

Follow-up to PR #42760. This PR bumps addon-manager to v6.4-beta.1 for the kubemark test.

**Release note**:

```release-note
NONE
```
2017-03-25 14:27:27 -07:00
Piotr Szczesniak 69fd7aafd0 Bumped Heapster to v1.3.0 2017-03-17 15:45:52 +01:00
Random-Liu c4b3fd4e63 Update npd to the official v0.3.0 release. 2017-03-15 14:26:12 -07:00
Zihong Zheng 34b8d008ec kubemark test: Bump addon-manager to v6.4-beta.1 2017-03-09 10:13:07 -08:00
Kubernetes Submit Queue c6d9d9c5ad Merge pull request #42456 from Random-Liu/update-npd-in-kubemark
Automatic merge from submit-queue (batch tested with PRs 42456, 42457, 42414, 42480, 42370)

Update npd in kubemark since #42201 is merged.

Revert https://github.com/kubernetes/kubernetes/pull/41716.

#42201 has been merged, and #41713 is fixed. Now we can retry updating npd in kubemark.

/cc @shyamjvs @wojtek-t @dchen1107
2017-03-04 00:17:40 -08:00
Random-Liu 3f30532b0f Update npd in kubemark since #42201 is merged. 2017-03-02 16:29:24 -08:00
gmarek 30b9490d66 Add alsologtostderr flag to hollow node 2017-03-03 01:29:02 +01:00
Kubernetes Submit Queue db5e85af5f Merge pull request #41980 from shyamjvs/one-more-time
Automatic merge from submit-queue (batch tested with PRs 41980, 42192, 42223, 41822, 42048)

Modified kubemark startup scripts to restore master on reboot

Fixes #41735 

As discussed in the issue, modified the scripts to satisfy the conditions of restoring the master env, running non-idempotent operations only the first time, and persisting important data like pki/auth files on a PD.
Also attached `start-kubemark-master.sh` as startup-script metadata to the master instance (on GCE) so that it is called automatically on each boot.
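On GCE, attaching a startup script looks roughly like this (instance name, zone, and file path are illustrative):

```sh
# GCE re-runs startup-script metadata on every boot, so the master
# re-creates its environment automatically after a reboot.
gcloud compute instances add-metadata "${MASTER_NAME}" \
  --zone "${ZONE}" \
  --metadata-from-file startup-script=test/kubemark/resources/start-kubemark-master.sh
```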

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-03-02 00:59:13 -08:00
Shyam JVS ab78b20bc1 Make kubemark hollow node logging verbosity configurable 2017-03-01 20:24:30 +01:00
Kubernetes Submit Queue 32d59cbb2f Merge pull request #42201 from shyamjvs/inotify-limit
Automatic merge from submit-queue (batch tested with PRs 42316, 41618, 42201, 42113, 42191)

[Kubemark] Add init container in hollow node for setting inotify limit of node to 200

Fixes #41713 

Along with adding the init container, I also changed the manifest to yaml, as otherwise the entire init container annotation would have to be on a single line (with escaped characters), since json doesn't allow multi-line strings.
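The init container presumably bumps the host sysctl along these lines (the exact sysctl key is an assumption, not stated in this log):

```sh
# Run privileged in the init container so the write hits the host kernel.
sysctl -w fs.inotify.max_user_instances=200
```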

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @Random-Liu
2017-03-01 07:48:20 -08:00
Kubernetes Submit Queue 1a35155025 Merge pull request #41973 from wojtek-t/build_non_alpha_3_0_17_etcd_image
Automatic merge from submit-queue (batch tested with PRs 42162, 41973, 42015, 42115, 41923)

Release 3.0.17 etcd image
2017-02-28 22:05:59 -08:00
Shyam Jeedigunta 4574900634 Modified kubemark startup scripts to restore master on reboots 2017-02-28 19:51:00 +01:00
Kubernetes Submit Queue dac0296f0b Merge pull request #42093 from liggitt/avoid-fake-node-names
Automatic merge from submit-queue (batch tested with PRs 40746, 41699, 42108, 42174, 42093)

Avoid fake node names in user info

Node usernames should follow the format `system:node:<node-name>`,
but if we don't know the node name, it's worse to put a fake one in.

In the future, we plan to have a dedicated node authorizer, which would
start rejecting requests from a user with a bogus node name like this.

The right approach is to either mint correct credentials per node, or use node bootstrapping so it requests a correct client certificate itself.
2017-02-28 07:51:33 -08:00
Shyam JVS 75e602ca28 Convert hollow-node manifest to yaml and add init container for setting inotify limit 2017-02-28 00:53:36 +01:00
Wojciech Tyczynski 74266e0dc0 Release 3.0.17 etcd image 2017-02-27 16:23:44 +01:00
Kubernetes Submit Queue 61a2bd64a2 Merge pull request #42054 from fejta/kubemark
Automatic merge from submit-queue (batch tested with PRs 41962, 42055, 42062, 42019, 42054)

Update flag to --check-version-skew instead of --check_version_skew

https://github.com/kubernetes/test-infra/issues/2012

Also add a `--` to send the flags to kubetest without triggering a warning.
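For example (a sketch; the trailing flag is illustrative):

```sh
# Everything after `--` is passed through to kubetest unparsed,
# so the wrapper emits no unknown-flag warning.
go run hack/e2e.go -- --check-version-skew=false
```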
2017-02-27 00:17:00 -08:00
Jordan Liggitt 34ac0dc302 Avoid fake node names in user info 2017-02-25 02:09:55 -05:00
Zihong Zheng 64ba52ae71 Bumps addon-manager to v6.4-alpha.3 and updates template files 2017-02-24 16:52:31 -08:00
Erick Fejta db5a355336 Update flag to --check-version-skew instead of --check_version_skew 2017-02-24 07:49:55 -08:00
Kubernetes Submit Queue ac293b857c Merge pull request #41858 from shyamjvs/npd-logs
Automatic merge from submit-queue (batch tested with PRs 38702, 41810, 41778, 41858, 41872)

[Kubemark] Fixed hollow-npd container command to log to file

Fixes #41802 

cc @wojtek-t @gmarek @Random-Liu
2017-02-23 07:54:40 -08:00
Wojciech Tyczynski b70e392161 Update clusters to use 3.0.17 etcd 2017-02-23 10:08:50 +01:00
Kubernetes Submit Queue fe34705f8a Merge pull request #41587 from MrHohn/addon-manager-fix-hpa
Automatic merge from submit-queue (batch tested with PRs 41349, 41532, 41256, 41587, 41657)

Update kubectl in addon-manager to use HPA in autoscaling/v1

Addon-manager is broken since HPA objects were removed from extensions api group.

Came across the logs from [the latest addon-manager on Jenkins](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/4290/artifacts/bootstrap-e2e-master/kube-addon-manager.log):
```
INFO: == Entering periodical apply loop at 2017-02-16T17:33:37+0000 ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:38+0000. 2 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:46+0000. 1 tries remaining. ==
error: error pruning namespaced object extensions/v1beta1, Kind=HorizontalPodAutoscaler: the server could not find the requested resource
WRN: == Failed to execute /usr/local/bin/kubectl  apply --namespace=kube-system -f /etc/kubernetes/addons     --prune=true -l kubernetes.io/cluster-service=true --recursive >/dev/null at 2017-02-16T17:33:53+0000. 0 tries remaining. ==
WRN: == Kubernetes addon update completed with errors at 2017-02-16T17:33:58+0000 ==
```

Notice that this commit (f66679a4e9), which came in two weeks ago, removed HorizontalPodAutoscaler from extensions/v1beta1.

Addon-manager is now only partially functional: it can successfully create and update addons, but fails to prune objects, which means upgrade tests will mostly fail.

Pushed another version of addon-manager with kubectl v1.6.0-alpha.2 ([released 2 days ago](https://github.com/kubernetes/kubernetes/releases/tag/v1.6.0-alpha.2)) to fix this, including the images below:
- gcr.io/google-containers/kube-addon-manager:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-amd64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-arm64:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.4-alpha.2
- gcr.io/google-containers/kube-addon-manager-s390x:v6.4-alpha.2

@mikedanese 

cc @wojtek-t @shyamjvs
2017-02-22 08:12:46 -08:00
Wojciech Tyczynski 6d303d3329 Increase cpu for kubeproxy in kubemark in large clusters 2017-02-22 08:44:34 +01:00
Shyam Jeedigunta f40b5eed5d [Kubemark] Fixed hollow-npd container command to log to file 2017-02-22 02:38:38 +01:00
Kubernetes Submit Queue 70c9eebd21 Merge pull request #41739 from shyamjvs/hollow-node-logs
Automatic merge from submit-queue (batch tested with PRs 41706, 39063, 41330, 41739, 41576)

[Kubemark] Add option to log hollow-node logs

Ref https://github.com/kubernetes/kubernetes/issues/41613

Added an option to capture kubemark hollow-node logs, which include the kubelet, kubeproxy and npd logs for each hollow-node.
Setting the env var `ENABLE_HOLLOW_NODE_LOGS=true` should now enable logging for tests.
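A minimal usage sketch, assuming the kubemark scripts read the variable from the environment:

```sh
# Collect kubelet, kubeproxy and npd logs from every hollow-node.
export ENABLE_HOLLOW_NODE_LOGS=true
test/kubemark/run-e2e-tests.sh
```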

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek @yujuhong @Random-Liu
2017-02-21 02:24:43 -08:00
Zihong Zheng 2c8e89820a Update kubectl in addon-manager to use HPA in autoscaling/v1 instead of extensions/v1beta1 2017-02-20 10:49:10 -08:00
Kubernetes Submit Queue 5fb6b91faf Merge pull request #41751 from shyamjvs/fix-kubemark-default-suite
Automatic merge from submit-queue

Fix kubemark default e2e test suite's name

It seems the suite "[Feature:performance]" no longer triggers any tests. Changed it to "[Feature:Performance]" in kubemark's run-e2e-tests.sh.
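Ginkgo focus patterns are case-sensitive regexes, so the corrected invocation would look like:

```sh
# "[Feature:performance]" matches nothing; the suite tag is capitalized.
hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:Performance\]"
```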

cc @wojtek-t @gmarek
2017-02-20 09:27:22 -08:00
Shyam Jeedigunta 7802c82671 Fix kubemark default e2e test suite's name 2017-02-20 16:08:28 +01:00
Shyam Jeedigunta ed0ab3cd8e [Kubemark] Add option to log hollow-node logs 2017-02-20 11:52:49 +01:00
Wojciech Tyczynski 4426156aa6 More resources for hollowproxy in large kubemarks 2017-02-20 09:26:17 +01:00
Random-Liu 47fc1d684d Revert the npd change in kubemark. 2017-02-19 04:14:30 -08:00
Random-Liu cd194bd9cc Fix kubemark hollow-npd. 2017-02-18 21:01:56 -08:00
Random-Liu d40c0a7099 Add standalone npd on GCI. 2017-02-17 16:18:08 -08:00
Shyam Jeedigunta 3c9a8a3b68 Modify kubemark run-e2e-tests.sh to run the right command based on whether it's in a docker/normal environment 2017-02-17 00:51:48 +01:00
Shyam Jeedigunta 4e43de4fc2 Bump addon-manager version to v6.4-alpha.1 in kubemark 2017-02-15 20:11:31 +01:00
Jordan Liggitt d69a75d50f Mount kubeconfig file into kube-scheduler in kubemark 2017-02-15 10:03:57 -05:00
Kubernetes Submit Queue 5cc2f73bc9 Merge pull request #41134 from shyamjvs/refactor-final-blow
Automatic merge from submit-queue (batch tested with PRs 41134, 41410, 40177, 41049, 41313)

Refactored kubemark code into provider-specific and provider-independent parts [Part-3]

Fixes #38967
Applying final part of the changes in PR #39033 (which refactored kubemark code completely). The changes included in this PR are:

- Removed `test/kubemark/common.sh` and moved relevant parts of its code to the right places in start-kubemark/stop-kubemark scripts.
- Added DOCKER_REGISTRY, PROJECT and KUBEMARK_IMAGE_MAKE_TARGET variables to `/test/kubemark/cloud-provider-config.sh` to make the kubemark image push location depend on the provider.
- Removed get-real-pod-for-hollow-node.sh as it doesn't seem to do anything useful.

@kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-02-15 05:58:15 -08:00
Kubernetes Submit Queue e4a4fe4a89 Merge pull request #41285 from liggitt/kube-scheduler-role
Automatic merge from submit-queue (batch tested with PRs 40297, 41285, 41211, 41243, 39735)

Secure kube-scheduler

This PR:
* Adds a bootstrap `system:kube-scheduler` clusterrole
* Adds a bootstrap clusterrolebinding to the `system:kube-scheduler` user
* Sets up a kubeconfig for kube-scheduler on GCE (following the controller-manager pattern)
* Switches kube-scheduler to running with kubeconfig against secured port (salt changes, beware)
* Removes superuser permissions from kube-scheduler in local-up-cluster.sh
* Adds detailed RBAC deny logging

```release-note
On kube-up.sh clusters on GCE, kube-scheduler now contacts the API on the secured port.
```
2017-02-15 03:25:10 -08:00
Jordan Liggitt cc11d7367a Switch kube-scheduler to secure API access 2017-02-15 01:05:42 -05:00
Jordan Liggitt 9e6a3496b4 Update rbac data to v1beta1 2017-02-14 00:50:31 -05:00
Shyam Jeedigunta 3ac0e22f62 Refactored kubemark code into provider-specific and provider-independent parts [Part-3] 2017-02-08 17:03:13 +01:00
Michael Taufen 982df56c52 Replace uses of --config with --pod-manifest-path 2017-02-07 14:32:37 -08:00
Kubernetes Submit Queue 702ac1c504 Merge pull request #40622 from shyamjvs/refactor-returns-again
Automatic merge from submit-queue (batch tested with PRs 40978, 40994, 41008, 40622)

Refactored kubemark code into provider-specific and provider-independent parts [Part-2]

Applying part of the changes of PR https://github.com/kubernetes/kubernetes/pull/39033 (which refactored kubemark code completely). The changes included in this PR are:

- Added test/kubemark/skeleton/util.sh, which defines a well-commented interface that any cloud provider should implement to run kubemark (see the sketch after this list).
  This includes functions for creating the master machine instance along with its resources, remotely executing a given command on the master (ssh-like), copying files to it (scp-like), and deleting the master instance and its resources.
  All these functions have to be overridden by each cloud provider inside the file /test/kubemark/$CLOUD_PROVIDER/util.sh
- Implemented the above mentioned interface for gce in /test/kubemark/$CLOUD_PROVIDER/util.sh
- Made the start-kubemark and stop-kubemark scripts (almost) provider-independent by making them source the interface implementation based on the cloud provider.
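A hedged sketch of what such a skeleton interface could look like; the function names below are illustrative, not taken from the actual file:

```sh
# test/kubemark/skeleton/util.sh (hypothetical shape): every provider must
# override these in test/kubemark/$CLOUD_PROVIDER/util.sh before sourcing.
function create-master-instance-with-resources {
  echo "must be overridden by the cloud provider" >&2; return 1
}
function execute-cmd-on-master {      # ssh-like remote execution
  echo "must be overridden by the cloud provider" >&2; return 1
}
function copy-files-to-master {       # scp-like file transfer
  echo "must be overridden by the cloud provider" >&2; return 1
}
function delete-master-instance-and-resources {
  echo "must be overridden by the cloud provider" >&2; return 1
}
```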

@kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-02-06 06:45:10 -08:00
Shyam Jeedigunta dd133769de Refactored kubemark code into provider-specific and provider-independent parts [Part-2] 2017-01-31 13:55:57 +01:00
Kubernetes Submit Queue 3a3ca50653 Merge pull request #40619 from Random-Liu/update-kubemark-npd-version
Automatic merge from submit-queue (batch tested with PRs 40132, 39302, 40194, 40619, 40601)

Update NPD version to v0.3.0-alpha.0 in kubemark.

@wojtek-t @shyamjvs Update the NPD version in kubemark.

I just built the alpha release https://github.com/kubernetes/node-problem-detector/releases/tag/v0.3.0-alpha.0.

And the PR https://github.com/kubernetes/node-problem-detector/pull/79 is included.

However, I'm not sure whether a 1 minute period is long enough.

If it's still not long enough, we can extend it by splitting the resync and the heartbeat:
* Every 1 minute, check whether there is any inconsistency between the apiserver and npd, and only update when there is. (1 GET/m)
* Every > 2 minutes, do a forced update as a heartbeat. (<0.5 PATCH/m)

And I can also make the sync period configurable after we finalize the sync mechanism.
2017-01-27 18:32:26 -08:00
Random-Liu e2abfb7120 Update NPD version to v0.3.0-alpha.0 in kubemark. 2017-01-27 11:16:24 -08:00
Shyam Jeedigunta c62e5214c3 Refactored kubemark code into provider-specific and provider-independent parts [Part-1] 2017-01-26 22:54:14 +01:00
Kubernetes Submit Queue df569fd42a Merge pull request #40419 from shyamjvs/fix-heapster-eventer
Automatic merge from submit-queue (batch tested with PRs 40130, 40419, 40416)

fixing source for heapster eventer in kubemark

Fixing the incorrectly set heapster eventer source IP.

cc @wojtek-t @gmarek
2017-01-25 07:22:58 -08:00
Shyam Jeedigunta b48de58311 Added OWNERS to kubemark subdirectories 2017-01-25 14:37:57 +01:00
Shyam Jeedigunta cad541eb0c fixing source for heapster eventer in kubemark 2017-01-25 14:16:06 +01:00
Wojciech Tyczynski fbd5c7c380 Revert "Refactored kubemark into cloud-provider independent code and GCE specific code" 2017-01-24 10:42:17 +01:00
Shyam Jeedigunta d2fadbe30f Refactored kubemark code into provider-specific and provider-independent parts 2017-01-19 15:34:13 +01:00
Kubernetes Submit Queue da7d17c8dd Merge pull request #39951 from shyamjvs/fix-kubemark-npd
Automatic merge from submit-queue (batch tested with PRs 40081, 39951)

Passing correct master address to kubemark NPD & authenticating+authorizing it with apiserver

Fixes #39245 
Fixes https://github.com/kubernetes/node-problem-detector/issues/50

Added RBAC for npd and fixed issue with the npd falling back to inClusterConfig.

cc @kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-01-19 05:01:04 -08:00
Kubernetes Submit Queue b29d9cdbcf Merge pull request #39898 from ixdy/bazel-release-tars
Automatic merge from submit-queue

Build release tars using bazel

**What this PR does / why we need it**: builds equivalents of the various kubernetes release tarballs, solely using bazel.

For example, you can now do
```console
$ make bazel-release
$ hack/e2e.go -v -up -test -down
```

**Special notes for your reviewer**: this is currently dependent on 3b29803eb5, which I have yet to turn into a pull request, since I'm still trying to figure out if this is the best approach.

Basically, the issue comes up with the way we generate the various server docker image tarfiles and load them on nodes:
* we `md5sum` the binary being encapsulated (e.g. kube-proxy) and save that to `$binary.docker_tag` in the server tarball
* we then build the docker image and tag using that md5sum (e.g. `gcr.io/google_containers/kube-proxy:$MD5SUM`)
* we `docker save` this image, which embeds the full tag in the `$binary.tar` file.
* on cluster startup, we `docker load` these tarballs, which are loaded with the tag that we'd created at build time. The nodes then use the `$binary.docker_tag` file to find the right image (sketched in shell below).
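In shell terms, the scheme above is roughly (binary and paths illustrative):

```sh
# Build time: tag the image with the md5 of the binary and save both artifacts.
md5=$(md5sum kube-proxy | awk '{print $1}')
echo "$md5" > kube-proxy.docker_tag
docker build -t "gcr.io/google_containers/kube-proxy:$md5" .
docker save -o kube-proxy.tar "gcr.io/google_containers/kube-proxy:$md5"

# Node startup: the loaded image must carry the tag recorded in .docker_tag.
docker load -i kube-proxy.tar
```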

With the current bazel `docker_build` rule, the tag isn't saved in the docker image tar, so the node is unable to find the image after `docker load`ing it.

My changes to the rule save the tag in the docker image tar, though I don't know if there are subtle issues with it. (Maybe we want to only tag when `--stamp` is given?)

Also, the docker images produced by bazel have the timestamp set to the unix epoch, which is not great for debugging. Might be another thing to change with a `--stamp`.

Long story short, we probably need to follow up with bazel folks on the best way to solve this problem.

**Release note**:

```release-note
NONE
```
2017-01-18 14:24:48 -08:00
Shyam Jeedigunta cc78a3f428 Passing correct master address to kubemark NPD & authenticating+authorizing it with apiserver 2017-01-18 18:23:23 +01:00
Kubernetes Submit Queue 6dfe5c49f6 Merge pull request #38865 from vwfs/ext4_no_lazy_init
Automatic merge from submit-queue

Enable lazy initialization of ext3/ext4 filesystems

**What this PR does / why we need it**: It enables lazy inode table and journal initialization in ext3 and ext4.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #30752, fixes #30240

**Release note**:
```release-note
Enable lazy inode table and journal initialization for ext3 and ext4
```

**Special notes for your reviewer**:
This PR removes the extended options to mkfs.ext3/mkfs.ext4, so that the defaults (enabled) for lazy initialization are used.
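Concretely, the change is of this shape (device path illustrative; `lazy_itable_init`/`lazy_journal_init` are the standard mke2fs extended options):

```sh
# Before: lazy initialization explicitly disabled via extended options.
mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/sdb
# After: no extended options, so the defaults keep lazy init enabled.
mkfs.ext4 /dev/sdb
```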

These extended options come from a script that was historically located at */usr/share/google/safe_format_and_mount* and later ported to Go so that the dependency on the script could be removed. After some searching, I found the original script here: https://github.com/GoogleCloudPlatform/compute-image-packages/blob/legacy/google-startup-scripts/usr/share/google/safe_format_and_mount

Checking the history of this script, I found the commit [Disable lazy init of inode table and journal.](4d7346f7f5). This one introduces the extended flags with this description:
```
Now that discard with guaranteed zeroing is supported by PD,
initializing them is really fast and prevents perf from being affected
when the filesystem is first mounted.
```

The problem is that this is not true for all cloud providers and all disk types, e.g. Azure and AWS. I only tested with magnetic disks on Azure and AWS, so maybe it's different for SSDs on these cloud providers. The result is that this performance optimization dramatically increases the time needed to format a disk in such cases.

When mkfs.ext4 is told not to lazily initialize the inode tables and the check for guaranteed zeroing on discard fails, it falls back to a very naive implementation that simply loops and writes zeroed buffers to the disk. Performance here depends heavily on free memory, and the process consumes all of it for write caching, reducing the performance of everything else in the system.

As of https://github.com/kubernetes/kubernetes/issues/30752, there is also something inside kubelet that somehow degrades performance of all this. It's not exactly known what it is, but I'd assume it has something to do with cgroups throttling IO or memory.

I checked the kernel code for lazy inode table initialization. The nice thing is that the kernel also does the guaranteed-zeroing-on-discard check. If zeroing is guaranteed, the kernel uses discard for the lazy initialization, which should finish in just a few seconds. If it is not guaranteed, it falls back to using *bio*s, which does not require the write cache. The result is that free memory is not required and not touched, so performance is maximal and the system does not suffer.

As the original reason for disabling lazy init was a performance optimization, and the kernel already does this optimization by default (and in a much better way), I'd suggest completely removing these flags and relying on the kernel to do it in the best way.
2017-01-18 09:09:52 -08:00
Shyam Jeedigunta 9b0d8b9747 Added RBAC for heapster in kubemark 2017-01-18 13:47:08 +01:00
Shyam Jeedigunta 491c26feca Fix RBAC role for kube-proxy in Kubemark 2017-01-17 11:39:00 +01:00
Jeff Grafton bc4b6ac397 Build release tarballs in bazel and add `make bazel-release` rule 2017-01-13 16:17:44 -08:00
Aleksandra Malinowska 043e809b8f update heapster version to 1.3.0-beta.0 2017-01-12 13:42:31 +01:00
Shyam Jeedigunta 312e2f85a6 run-gcloud-compute-with-retries in kubemark handles resource already exists case 2017-01-06 23:21:15 +01:00
Kubernetes Submit Queue 85ad3045be Merge pull request #39349 from shyamjvs/rbac-for-kubemark
Automatic merge from submit-queue

Updated kubemark with RBAC for controllers, proxy and kubelet

Fixes issue #39244 

@kubernetes/sig-scalability-misc @wojtek-t @gmarek
2017-01-06 13:42:54 -08:00
Shyam Jeedigunta ce8c207328 Updated kubemark with RBAC for controller-manager, kubecfg, kubelet and proxy 2017-01-06 08:54:54 +01:00
Kubernetes Submit Queue f4a8713088 Merge pull request #36229 from wojtek-t/bump_etcd_version
Automatic merge from submit-queue (batch tested with PRs 36229, 39450)

Bump etcd to 3.0.14 and switch to v3 API in etcd.

Ref #20504

**Release note**:

```release-note
Switch default etcd version to 3.0.14.
Switch default storage backend flag in apiserver to `etcd3` mode.
```
2017-01-04 17:36:06 -08:00
Shyam Jeedigunta ac30fb28bd Fixing 'systemd restart docker' command in kubemark master 2016-12-21 11:46:33 +01:00
Shyam Jeedigunta 7e12fd4bfd Added 'hollow'-node-problem-detector to hollow-nodes in kubemark 2016-12-20 12:04:24 +01:00
Wojciech Tyczynski 76f115a8ee Bump etcd to 3.0.14 2016-12-20 11:57:45 +01:00