58 Commits (8c25b80913d54de547b763a8f9fd30d235049bb7)

Author SHA1 Message Date
Peter Hornyack 0bb25290c8 Update log-dump.sh for Windows nodes.
Tested:
```
$ PROJECT=${CLOUDSDK_CORE_PROJECT} KUBERNETES_SKIP_CONFIRM=y NUM_NODES=2 \
  NUM_WINDOWS_NODES=2 KUBE_GCE_ENABLE_IP_ALIASES=true go run \
  ./hack/e2e.go -- --up
$ cluster/log-dump/log-dump.sh
$ ls _artifacts
```

Also tested with NUM_NODES=2 NUM_WINDOWS_NODES=0 and with NUM_NODES=0 NUM_WINDOWS_NODES=2.
2019-02-26 12:10:19 -08:00
Kubernetes Prow Robot 16cbb6b965
Merge pull request #73848 from krzysied/logexporter_custom_fix
Handling for use_custom_instance_list in dump_nodes_with_logexporter
2019-02-18 15:23:48 -08:00
Matt Matejczyk d7d46013cb Start using new version of logexporter. 2019-02-13 08:52:04 +01:00
Krzysztof Siedlecki bc42602024 adding handling for use_custom_instance_list in dump_nodes_with_logexporter 2019-02-08 14:02:06 +01:00
Matt Matejczyk 5e6171790b Propagate dump_systemd_journal to logexporter job.
Log exporter changes have been made in
https://github.com/kubernetes/test-infra/pull/11121 and a new version has
been pushed in https://github.com/kubernetes/test-infra/pull/11149
2019-02-06 15:49:29 +01:00
Matt Matejczyk 35543f8989 Allow dumping full systemd journal in log-dump.sh.
The feature is gated behind a newly introduced 'dump-systemd-journal' flag.
We want to dump the full systemd journal in our scalability performance tests.
2019-02-03 21:28:37 +01:00
Jordan Liggitt cc680273e8 Change add-on manifests to apps/v1 2018-12-19 17:30:59 -05:00
Maciej Borsz 2aee491bf8 Fix detect_node_failures for gke 2018-12-19 08:14:22 +01:00
Maciej Borsz 325511d0ab Check if INSTANCE_GROUPS is empty in detect_node_failures. 2018-12-18 11:59:11 +01:00
Maciej Borsz 8e879db938 Revert "Revert "Check for hostError and automaticRestart when test finishes.""
This reverts commit 047aa25484.
2018-12-18 11:57:03 +01:00
Maciej Borsz 047aa25484
Revert "Check for hostError and automaticRestart when test finishes." 2018-11-30 17:55:27 +01:00
Maciej Borsz 0514aa17a6 Check for hostError and automaticRestart when test finishes. 2018-11-27 15:13:56 +01:00
Katharine Berry 3578696846 DRY 2018-09-06 16:54:13 -07:00
Katharine Berry ed0f3f5d3c Don't bother dumping coverage info if it won't exist. 2018-09-06 16:24:32 -07:00
Katharine Berry e17499c8e6 Include coverage information when dumping logs. 2018-09-06 16:24:32 -07:00
Shyam Jeedigunta 898fb4c936 Bump logexporter version 2018-08-30 12:13:31 +02:00
Kubernetes Submit Queue d67a03183a
Merge pull request #67687 from Lion-Wei/remote-reschrduler
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

remove rescheduler since scheduling DS pods by default scheduler is moving to beta

**What this PR does / why we need it**:

remove rescheduler since scheduling DS pods by default scheduler is moving to beta

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #64725

**Special notes for your reviewer**:

**Release note**:
```release-note
Remove rescheduler since scheduling DS pods by default scheduler is moving to beta.
```
2018-08-23 12:32:17 -07:00
liangwei 5ea138f4e9 remove rescheduler 2018-08-22 11:49:14 +08:00
Maciej Borsz 598be75757 Store logs from 'logexporter' to allow debugging it. 2018-08-14 15:43:32 +02:00
Maciej Borsz 496c2cd1bb Use gcr.io/k8s-testimages/logexporter:v0.1.2. 2018-08-09 13:23:34 +02:00
wojtekt 0316faba9d Fix dumping logs with logexporter 2018-07-02 15:24:25 +02:00
Kubernetes Submit Queue 624dec20c0
Merge pull request #65139 from wojtek-t/fix_logexporter
Automatic merge from submit-queue (batch tested with PRs 65123, 65176, 65139, 65084, 65056). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Periodically fetch logexported nodes instead of sleeping
2018-06-21 16:56:13 -07:00
wojtekt 43d217f904 Periodically fetch logexported nodes instead of sleeping 2018-06-18 14:29:14 +02:00
Shyam Jeedigunta 87225c0b9a Increase logexporter timeout and add debug logs 2018-06-12 16:30:04 +02:00
RaviSantosh Gudimetla 872addf9e3
Revert "Remove rescheduler and corresponding tests from master" 2018-05-31 22:18:49 -04:00
ravisantoshgudimetla aeccffc339 Phase out rescheduler in favor of priority and preemption 2018-05-29 19:52:06 -04:00
Kubernetes Submit Queue 7b8bb6e7d3
Merge pull request #63357 from Random-Liu/install-and-use-crictl
Automatic merge from submit-queue (batch tested with PRs 63167, 63357). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Install and use crictl in gce kube-up.sh

Download and use crictl in gce kube-up.sh.

This PR:
1. Downloads crictl `v1.0.0-beta.0` onto the node, which supports CRI v1alpha2. We'll upgrade it to `v1.0.0-beta.1` soon after the release is cut.
2. Changes `kube-docker-monitor` to `kube-container-runtime-monitor`, and lets it use `crictl` to do health monitoring.
3. Changes `e2e-image-puller` to use `crictl`. Because of https://github.com/kubernetes/kubernetes/issues/63355, it doesn't work now. But in `crictl v1.0.0-beta.1`, we are going to statically link it, and the `e2e-image-puller` should work again.
4. Uses `systemctl kill --kill-who=main` instead of `pkill`, because:
  a. `pkill docker` will send `SIGTERM` to all processes including `dockerd`, `docker-containerd`, `docker-containerd-shim`. This is not a problem for Docker 17.03 CE, because `containerd-shim` in containerd 0.2.x doesn't exit on SIGTERM (see [code](https://github.com/containerd/containerd/blob/v0.2.x/containerd-shim/main.go#L123)). However, `containerd-shim` in containerd 1.0+ does exit on SIGTERM (see [code](https://github.com/containerd/containerd/blob/master/cmd/containerd-shim/main_unix.go#L200)). This means that `pkill docker` and `pkill containerd` will kill all shim processes for Docker 17.11+ and containerd 1.0+.
  b. We can use `pkill -x` instead. However, the docker systemd service name is `docker`, but the daemon process name is `dockerd`. We would have to introduce another environment variable to specify the "daemon process name". Given that, it seems easier to just use `systemctl kill`, which only requires the systemd service name. `systemctl kill --kill-who=main` makes sure only the main process receives SIGTERM.
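
The name-matching difference between `pkill docker` and `pkill -x docker` described in item 4 can be illustrated without signalling anything. This is only a minimal sketch: it assumes that `grep -E` and `grep -Ex` are a fair stand-in for pkill's substring and exact (`-x`) matching, it reuses the process names mentioned above, and it touches no real processes.

```shell
# Process names taken from the description above (not read from a live system).
names='dockerd
docker-containerd
docker-containerd-shim'

# Stand-in for `pkill docker`: the pattern may match anywhere in the name.
substring_matches=$(printf '%s\n' "$names" | grep -E 'docker')

# Stand-in for `pkill -x docker`: the whole name must equal the pattern.
# grep exits non-zero when nothing matches, so tolerate that case.
exact_matches=$(printf '%s\n' "$names" | grep -Ex 'docker' || true)

echo "pattern 'docker' (substring) would match:"
printf '  %s\n' $substring_matches
echo "pattern 'docker' (exact) would match: ${exact_matches:-nothing}"
```

The exact match finds nothing because the daemon is named `dockerd`, not `docker`, which is why `pkill -x` would need a separately configured process name.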

Signed-off-by: Lantao Liu <lantaol@google.com>

/cc @filbranden @yujuhong @feiskyer @mrunalp @kubernetes/sig-node-pr-reviews @kubernetes/sig-cluster-lifecycle-pr-reviews 

**Release note**:

```release-note
Kubernetes clusters on GCE now have crictl installed. Users can use it to help debug their nodes. The crictl documentation can be found at https://github.com/kubernetes-incubator/cri-tools/blob/master/docs/crictl.md.
```
2018-05-15 21:18:12 -07:00
Lantao Liu 884e08e33c Collect logs for health monitor services.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-05-03 17:18:00 -07:00
Matthias Bertschy 9b15af19b2 Update all scripts to use /usr/bin/env bash in the shebang 2018-04-19 13:20:13 +02:00
Tim Hockin 3586986416 Switch to k8s.gcr.io vanity domain
This is the 2nd attempt.  The previous was reverted while we figured out
the regional mirrors (oops).

New plan: k8s.gcr.io is a read-only facade that auto-detects your source
region (us, eu, or asia for now) and pulls from the closest.  To publish
an image, push to k8s-staging.gcr.io and it will be synced to the regionals
automatically (similar to today).  For now the staging is an alias to
gcr.io/google_containers (the legacy URL).

When we move off of google-owned projects (working on it), then we just
do a one-time sync, and change the google-internal config, and nobody
outside should notice.

We can, in parallel, change the auto-sync into a manual sync - send a PR
to "promote" something from staging, and a bot activates it.  Nice and
visible, easy to keep track of.
2018-02-07 21:14:19 -08:00
Tim Hockin e9dd8a68f6 Revert k8s.gcr.io vanity domain
This reverts commit eba5b6092a.

Fixes https://github.com/kubernetes/kubernetes/issues/57526
2017-12-22 14:36:16 -08:00
Tim Hockin eba5b6092a Use k8s.gcr.io vanity domain for container images 2017-12-18 09:18:34 -08:00
Kubernetes Submit Queue ebd3d68039
Merge pull request #55831 from Random-Liu/rename-log-dump-env
Automatic merge from submit-queue (batch tested with PRs 55392, 55491, 51914, 55831, 55836). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Rename log-dump env to `LOG_DUMP_SYSTEMD_SERVICES`.

For https://github.com/kubernetes/features/issues/286.

Rename `SYSTEMD_SERVICES` to `LOG_DUMP_SYSTEMD_SERVICES`. test-infra disables log dump in our e2e framework, and uses a different log dump logic https://github.com/kubernetes/test-infra/blob/master/kubetest/e2e.go#L480-L497. So the flags we added in https://github.com/kubernetes/kubernetes/pull/55288 will not work in test-infra.

Fortunately, test-infra is using the same script `cluster/log-dump/log-dump.sh`, so we could still configure systemd services by setting the environment variable globally.

The original environment variable name is too general to set globally, so change it to a more specific name.
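
A minimal sketch of how a script might consume such a specifically named variable. The helper name `resolve_services`, the fixed `kubelet` entry, and the `docker` fallback are assumptions made for illustration, not something this commit message confirms.

```shell
# Hypothetical helper: builds the list of systemd services to collect logs for.
resolve_services() {
  # Falls back to "docker" when LOG_DUMP_SYSTEMD_SERVICES is unset or empty
  # (the fallback value is an assumption for this sketch).
  echo "kubelet ${LOG_DUMP_SYSTEMD_SERVICES:-docker}"
}

# Without the variable set, the fallback is used.
(unset LOG_DUMP_SYSTEMD_SERVICES; resolve_services)

# With the variable exported globally, the override is picked up.
(export LOG_DUMP_SYSTEMD_SERVICES="containerd"; resolve_services)
```

Because the name carries a `LOG_DUMP_` prefix, exporting it for a whole CI job is unlikely to collide with an unrelated script that happens to read a generic `SYSTEMD_SERVICES`.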

**Release note**:

```release-note
none
```
2017-11-17 00:18:25 -08:00
Lantao Liu 0085e2208d Rename log-dump env to `LOG_DUMP_SYSTEMD_SERVICES`. 2017-11-16 00:41:27 +00:00
Marcin Owsiany 310ab8c3c4 Do not crash on empty NODE_NAMES array. 2017-11-14 14:43:30 +01:00
Lantao Liu 32c4295bcf Support collecting log for alternative container runtime in e2e test. 2017-11-10 18:46:48 +00:00
Davanum Srinivas 9a217217c1 Fix log collection for kubeadm-gce tests
Separate out the kubernetes-anywhere provider under cluster/ but
delegate all the functionality to the "gce" one since the code
would be the same. The only difference is the node name: the
NODE_INSTANCE_PREFIX will be different, so account for that.
2017-10-26 07:57:42 -04:00
zouyee 5f9d931804 [cluster/log-dump] bump daemonset version 2017-10-24 10:30:20 +08:00
Jordan Liggitt d7699028f6
Include audit log in master log capture 2017-09-24 19:59:53 -04:00
Shyam Jeedigunta 6ae0eb8806 Fix bug with gke in logdump 2017-09-13 14:03:03 +02:00
Shyam Jeedigunta 05fcefc0df Make log-dump use 'gcloud ssh' for GKE also 2017-09-13 00:14:57 +02:00
Shyam Jeedigunta c483c13aee Correct logdump logic for kubemark master 2017-09-04 12:59:36 +02:00
Shyam Jeedigunta a31703631f Make logdump work for GKE with 'use_custom_instance_list' defined 2017-09-02 00:29:16 +02:00
Shyam Jeedigunta aac1837218 Make logdump for kubemark logs independent of KUBERNETES_PROVIDER 2017-09-01 23:56:00 +02:00
Kubernetes Submit Queue f2335d33d6 Merge pull request #50713 from MrHohn/dump-master-log-fix
Automatic merge from submit-queue (batch tested with PRs 50713, 47660, 51198, 51159, 51195)

Dump installation and configuration logs for master

**What this PR does / why we need it**:
We are dumping out empty configuration and installation logs on master, see `kube-node-configuration.log` and `kube-node-installation.log` on http://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/12818/artifacts/bootstrap-e2e-master/.

I guess it is just because [we name the services on master differently](https://github.com/kubernetes/kubernetes/blob/v1.7.3/cluster/gce/gci/master.yaml#L4-L40)?

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE

**Special notes for your reviewer**:

**Release note**:

```release-note
NONE
```
2017-08-24 11:17:01 -07:00
Shyam Jeedigunta d2b6705dc8 Add some debug statements to logdump script 2017-08-23 11:51:58 +02:00
Zihong Zheng 7654e6a9d6 Dump installation and configuration logs for master 2017-08-15 13:50:02 -07:00
Shyam Jeedigunta 7456716120 Add debug logs to log-dump 2017-08-08 21:43:09 +02:00
Shyam Jeedigunta 73b419447f Don't stop log-dumping if logexporter fails 2017-08-01 17:39:50 +02:00
Shyam Jeedigunta 80084f0621 Reduce kubectl calls from O(#nodes) to O(1) in cluster logdump 2017-07-31 13:20:53 +02:00