Commit Graph

17 Commits (e5f55dd9d0969af3a37e7880d9b88c9aef8c654d)

Author SHA1 Message Date
Tomoe Sugihara da23396e22 Dump Stack when docker fails on healthcheck
Send SIGUSR1 to dockerd to save stack dump of docker daemon
in order to be able to investigate why docker daemon was
unresposive to health check done by `docker ps`.

See https://github.com/moby/moby/blob/master/daemon/daemon.go on
how docker sets up a trap for SIGUSR1 with `setupDumpStackTrap()`
2018-05-21 11:39:59 +09:00
Kubernetes Submit Queue 7b8bb6e7d3
Merge pull request #63357 from Random-Liu/install-and-use-crictl
Automatic merge from submit-queue (batch tested with PRs 63167, 63357). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Install and use crictl in gce kube-up.sh

Download and use crictl in gce kube-up.sh.

This PR:
1. Downloads crictl `v1.0.0-beta.0` onto the node, which supports CRI v1alpha2. We'll upgrade it to `v1.0.0-beta.1` soon after the release is cut.
2. Change `kube-docker-monitor` to `kube-container-runtime-monitor`, and let it use `crictl` to do health monitoring.
3. Change `e2e-image-puller` to use `crictl`. Because of https://github.com/kubernetes/kubernetes/issues/63355, it doesn't work now. But in `crictl v1.0.0-beta.1`, we are going to statically link it, and the `e2e-image-puller` should work again.
4. Use `systemctl kill --kill-who=main` instead of `pkill`, the reason is that:
  a. `pkill docker` will send `SIGTERM` to all processes including `dockerd`, `docker-containerd`, `docker-containerd-shim`. This is not a problem for Docker 17.03 CE, because `containerd-shim` in containerd 0.2.x doesn't exit with SIGERM (see [code](https://github.com/containerd/containerd/blob/v0.2.x/containerd-shim/main.go#L123)). However, `containerd-shim` in containerd 1.0+ does exit with SIGTERM (see [code](https://github.com/containerd/containerd/blob/master/cmd/containerd-shim/main_unix.go#L200)). This means that `pkill docker` and `pkill containerd` will kill all shim processes for Docker 17.11+ and containerd 1.0+.
  b. We can use `pkill -x` instead. However, docker systemd service name is `docker`, but daemon process name is `dockerd`. We have to introduce another environment variable to specify "daemon process name". Given so, it seems easier to just use `systemctl kill` which only requires systemd service name. `systemctl kill --kill-who=main` will make sure only main process receives SIGTERM.

Signed-off-by: Lantao Liu <lantaol@google.com>

/cc @filbranden @yujuhong @feiskyer @mrunalp @kubernetes/sig-node-pr-reviews @kubernetes/sig-cluster-lifecycle-pr-reviews 

**Release note**:

```release-note
Kubernetes cluster on GCE have crictl installed now. Users can use it to help debug their node. The documentation of crictl can be found https://github.com/kubernetes-incubator/cri-tools/blob/master/docs/crictl.md.
```
2018-05-15 21:18:12 -07:00
Lantao Liu f952b093a7 Still use `docker ps` for docker health monitoring.
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-05-15 00:42:25 -07:00
Lantao Liu d94a2b39d9 Install and use crictl in gce kube-up.sh
Signed-off-by: Lantao Liu <lantaol@google.com>
2018-05-03 17:17:55 -07:00
Matthias Bertschy 9b15af19b2 Update all script to use /usr/bin/env bash in shebang 2018-04-19 13:20:13 +02:00
Dawn Chen fe36fdde6c Increase waiting time (120s) for docker startup in health-monitor.sh 2017-10-17 15:31:15 -07:00
Yu-Ju Hong d3e24e1085 Fix the output of health-mointor.sh
The script show prints the errors/response of the health check, but not
show the progress of `curl`.
2017-02-15 18:08:27 -08:00
Kubernetes Submit Queue ebc8e40694 Merge pull request #39691 from yujuhong/bump_timeout
Automatic merge from submit-queue (batch tested with PRs 39694, 39383, 39651, 39691, 39497)

Bump container-linux and gci timeout for docker health check

The command `docker ps` can take longer time to respond under heavy load or
when encountering some known issues. In these cases, the containers are running
fine, so aggressive health check could cause serious disruption. Bump the
timeout to 60s to be consistent with the debian-based containerVM.

This addresses #38588
2017-01-10 21:25:16 -08:00
Yu-Ju Hong 4e87973a9b Bump container-linux and gci timeout for docker health check
The command `docker ps` can take longer time to respond under heavy load or
when encountering some known issues. In these cases, the containers are running
fine, so aggressive health check could cause serious disruption. Bump the
timeout to 60s to be consistent with the debian-based containerVM.
2017-01-10 13:07:21 -08:00
CJ Cullen d0997a3d1f Generate a kubelet CA and kube-apiserver cert-pair for kubelet auth.
Plumb through to kubelet/kube-apiserver on gci & cvm.
2017-01-03 14:30:45 -08:00
David McMahon ef0c9f0c5b Remove "All rights reserved" from all the headers. 2016-06-29 17:47:36 -07:00
Andy Zheng e6b744c85a Revert "Revert "GCI: add support for network plugin""
This reverts commit 8207eddd99.
2016-06-14 09:52:34 -07:00
Dawn Chen 8207eddd99 Revert "GCI: add support for network plugin" 2016-06-09 13:24:05 -07:00
Andy Zheng 66d6b43b67 GCI: add support for kubenet 2016-06-08 13:20:44 -07:00
Andy Zheng f31c4f6d69 Revert "Revert "Add support for running GCI on the GCE cloud provider""
This reverts commit 40f53b1765.
2016-05-23 00:52:08 -07:00
Daniel Smith 40f53b1765 Revert "Add support for running GCI on the GCE cloud provider" 2016-05-18 21:31:28 -07:00
Andy Zheng a737e1eba1 Add support for running GCI on the GCE cloud provider 2016-05-18 15:15:05 -07:00