- For ginkgo parallel mode, pass -p so that ginkgo auto-detects the number of nodes.
- Disable parallel mode by default (false).
- Use provider "skeleton" instead of "local".
- Make run_e2e.sh pass shellcheck.
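A minimal sketch of how run_e2e.sh might plumb these defaults through (the variable names are illustrative, not necessarily the real manifest keys):

```bash
#!/usr/bin/env bash
# Illustrative only: parallel mode defaults to off, provider to "skeleton".
E2E_PARALLEL="${E2E_PARALLEL:-false}"
E2E_PROVIDER="${E2E_PROVIDER:-skeleton}"

ginkgo_args=()
if [ "${E2E_PARALLEL}" = "true" ]; then
  # -p lets ginkgo auto-detect a suitable number of parallel nodes.
  ginkgo_args+=("-p")
fi

ginkgo "${ginkgo_args[@]}" ./test/e2e -- --provider="${E2E_PROVIDER}"
```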
The Dockerfile has a few more options; let's surface those in the manifest as well, and use the same defaults in the manifest as in the Dockerfile itself.
Change-Id: Ib7419cf7999430db15f39ac414c80ee362fcda76
I recently lowered the timeout to 2 minutes, but that turned out to be too aggressive for now; 3 minutes is usually long enough for the test containers to come up.
Note that this test is only used during development; it does not run continuously anywhere.
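For illustration only, the relaxed wait might look like this (the variable name and readiness check are placeholders, not the actual test code):

```bash
# Placeholder sketch: poll until the test containers are up, give up after 3 minutes.
TIMEOUT_SECONDS=180  # 2 minutes proved too aggressive
elapsed=0
until kubectl get pods --no-headers 2>/dev/null | grep -q ' Running '; do
  if [ "${elapsed}" -ge "${TIMEOUT_SECONDS}" ]; then
    echo "timed out waiting for test containers" >&2
    exit 1
  fi
  sleep 5
  elapsed=$((elapsed + 5))
done
```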
--hostname-override should not be necessary; we can rely on kube-proxy to auto-detect it, just as we do in the kubelet.
I also added an optional kube_env variable, KUBEPROXY_ARGS, that we can use to remotely extend kube-proxy's command line if necessary.
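As a sketch of the idea (this is not the actual GCE startup script), the node startup code can append the kube-env value to the generated flags:

```bash
# KUBEPROXY_ARGS comes from the instance's kube-env; if set, append it
# to kube-proxy's command line.
flags="--v=2"
if [ -n "${KUBEPROXY_ARGS:-}" ]; then
  flags="${flags} ${KUBEPROXY_ARGS}"
fi
# Word splitting of ${flags} is intentional: it holds multiple flags.
# shellcheck disable=SC2086
exec kube-proxy ${flags}
```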
A common issue users run into is wanting a list of the tests a given regexp will match, without actually running them. Ginkgo supports this with the dryRun flag, but it was not exposed via the kube-conformance image. This change sets the flag when the E2E_DRYRUN environment variable is set.
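A simplified sketch of the change (`E2E_ARGS` is an illustrative variable name; `--ginkgo.dryRun` is the flag the e2e binary exposes):

```bash
# When E2E_DRYRUN is set, list the tests the focus/skip regexps select
# instead of running them.
if [ -n "${E2E_DRYRUN:-}" ]; then
  E2E_ARGS="${E2E_ARGS} --ginkgo.dryRun=true"
fi
```

Running the image with `-e E2E_DRYRUN=true` then prints the matching test names without executing them.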
Fixes #74727
This PR adds support for GCE PD volumes on Windows. For now, it uses a workaround
(https://github.com/pjh/gce-tools) to get the disk number in Windows for a given GCE PD name.
Fix three issues with the fluentd-gcp liveness probe:
## 1. STUCK_THRESHOLD_SECONDS was overridden by LIVENESS_THRESHOLD_SECONDS if defined
Probably a copy/paste issue introduced in edf1ffc074
## 2. `[[` is [a bashism](https://stackoverflow.com/a/47576482) and will always fail when called with `/bin/sh`
Introduced by a844523c20
Given that we call the liveness probe with `/bin/sh`, we cannot use the
double-bracketed `[[` syntax for test, as it is not POSIX-compliant and
will throw an error.
Annoyingly, even though it prints an error, `sh` returns with exit code 0
in this case:
```bash
root@fluentd-7mprs:/# sh liveness.sh
liveness.sh: 8: liveness.sh: [[: not found
liveness.sh: 15: liveness.sh: [[: not found
root@fluentd-7mprs:/# echo $?
0
```
This means the liveness probe is considered successful by Kubernetes,
despite failing to test what it was intended to test. This is also
probably the reason why this bug wasn't reported sooner :)
Thankfully, the test in this case can just as easily be written in a
POSIX-compliant way, as it doesn't use any bash-specific features within the
`[[` block.
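For example, a check of this shape (the marker path is taken from the probe shown below) only needs single brackets:

```sh
# bash-only: under /bin/sh this line prints "[[: not found", the branch is
# skipped, and the script still exits 0, so the probe always "succeeds".
if [[ ! -e /tmp/marker-liveness ]]; then
  touch /tmp/marker-liveness
fi

# POSIX-compliant equivalent, correct under both /bin/sh and bash:
if [ ! -e /tmp/marker-liveness ]; then
  touch /tmp/marker-liveness
fi
```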
## 3. Buffers are transient and cannot be relied upon for monitoring
Finally, after fixing the above issue, we started seeing the fluentd
containers being restarted very often, and found an issue with the
underlying logic of the liveness probe.
The probe checks that the pod is still alive by running the following
command:
`find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit`
This checks if any _regular_ file exists under `/var/log/fluentd-buffers`
that is more recent than a predetermined time, and will return an empty
string otherwise.
The issue is that these buffers are temporary and volatile: they get created and
deleted constantly. Here is an example of running that check every second against a
running fluentd:
```
root@fluentd-eks-playground-jdc8m:/# LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
root@fluentd-eks-playground-jdc8m:/# STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900};
root@fluentd-eks-playground-jdc8m:/# touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
root@fluentd-eks-playground-jdc8m:/# touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:52:57 UTC 2019
Fri Feb 22 10:52:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:52:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:53:00 UTC 2019
Fri Feb 22 10:53:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:03 UTC 2019
Fri Feb 22 10:53:04 UTC 2019
Fri Feb 22 10:53:05 UTC 2019
Fri Feb 22 10:53:06 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:07 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:08 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:09 UTC 2019
Fri Feb 22 10:53:10 UTC 2019
Fri Feb 22 10:53:11 UTC 2019
Fri Feb 22 10:53:12 UTC 2019
Fri Feb 22 10:53:13 UTC 2019
Fri Feb 22 10:53:14 UTC 2019
Fri Feb 22 10:53:15 UTC 2019
Fri Feb 22 10:53:16 UTC 2019
```
We can see buffers being created and then disappearing. A liveness probe running
under these conditions has a ~50% chance of failing, despite fluentd being
perfectly happy.
I believe that check is probably OK for fluentd installs that use large
numbers of buffers, in which case the liveness probe will be correct more
often than not, but installs that use buffering less intensively
will be negatively impacted by this.
My solution is to check the last-updated time of buffering
_folders_ within `/var/log/fluentd-buffers`. These _do_ get updated when
buffers are created, and they do not get deleted as buffers are emptied,
making them the perfect candidate for our check.
Here's an example of the same check using `-type d` to match directories instead:
```
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:57:51 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:52 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:53 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:54 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:55 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:56 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:57 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:00 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:03 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
```
And an example of the directory being updated as new buffers come in:
```
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root 6 Feb 22 11:17 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 16K
drwxr-xr-x 2 root root 224 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
-rw-r--r-- 1 root root 1.8K Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log
-rw-r--r-- 1 root root 215 Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log.meta
-rw-r--r-- 1 root root 429 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log
-rw-r--r-- 1 root root 195 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log.meta
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root 6 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
```
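Putting both fixes together, a sketch of the corrected POSIX-compliant check (using the marker file and threshold defaults shown above):

```sh
STUCK_THRESHOLD_SECONDS="${STUCK_THRESHOLD_SECONDS:-900}"
touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck
# Look at buffer *directories*: they are updated whenever buffers are
# written, but are not removed when buffers drain.
if [ -z "$(find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit)" ]; then
  exit 1  # no recent buffer activity: fluentd appears stuck
fi
```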
Also switches to the most recent 64-bit version of OpenSSH for Windows.
Tested:
PROJECT=${CLOUDSDK_CORE_PROJECT} KUBERNETES_SKIP_CONFIRM=y NUM_NODES=2 \
NUM_WINDOWS_NODES=2 KUBE_GCE_ENABLE_IP_ALIASES=true TEST_CLUSTER=true \
./cluster/kube-up.sh