75304 Commits (ca096f8069aff73b774c8ef38900dca898c61938)

Author SHA1 Message Date
Kubernetes Prow Robot 81e7858ece
Merge pull request #74501 from RA489/fixptrtofunction
Refactor etcd client function have same signatures in etcd.go
2019-02-25 09:56:37 -08:00
Kubernetes Prow Robot 3b11f95810
Merge pull request #72827 from errordeveloper/drain-pkg
Refactor most of `kubectl drain` as a library
2019-02-25 06:06:36 -08:00
Florent Delannoy e627474e8f Fix fluentd-gcp addon liveness probe
Fix three issues with the fluentd-gcp liveness probe:

h1. STUCK_THRESHOLD_SECONDS was overridden by LIVENESS_THRESHOLD_SECONDS
if defined

Probably a copy/paste issue introduced in edf1ffc074

h1. `[[` is [a bashism](https://stackoverflow.com/a/47576482), and will always fail when called with `/bin/sh`

Introduced by a844523c20

Given that we call the liveness probe with `/bin/sh`, we cannot use the
double-bracketed `[[` syntax for test, as it is not POSIX-compliant and
will throw an error.

Annoyingly, even though it prints an error, `sh` returns with exit code 0
in this case:

```bash
root@fluentd-7mprs:/# sh liveness.sh
liveness.sh: 8: liveness.sh: [[: not found
liveness.sh: 15: liveness.sh: [[: not found
root@fluentd-7mprs:/# echo $?
0
```

Which means the liveness probe is considered successful by Kubernetes,
despite failing to test things as it was intended. This is also
probably the reason why this bug wasn't reported sooner :)

Thankfully, the test in this case can just as easily be written in a
POSIX-compliant way, as it doesn't use any bash-specific features within the
`[[` block.
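
As an illustration only, here is a minimal sketch of what such a POSIX-compliant test can look like. The helper name `has_output` and the sample string are hypothetical and not taken from the actual liveness script; the point is simply that a non-empty check needs nothing beyond the single-bracket `[`:

```shell
#!/bin/sh
# Hypothetical helper (not from the real liveness probe script).
# Succeeds when its first argument is a non-empty string, using the
# POSIX `[` test rather than the bash-only `[[`.
has_output() {
  [ -n "$1" ]
}

# Stand-in for the output of a `find ... -print -quit` call.
sample="buffer.log"

if has_output "$sample"; then
  echo "recent buffer found"
else
  echo "no recent buffer"
fi
```

Because `[` is specified by POSIX, this form behaves the same under `/bin/sh` and `bash`.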

h1. Buffers are transient and cannot be relied upon for monitoring

Finally, after fixing the above issue, we started seeing the fluentd
containers being restarted very often, and found an issue with the
underlying logic of the liveness probe.

The probe checks that the pod is still alive by running the following
command:

`find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit`

This checks if any _regular_ file exists under `/var/log/fluentd-buffers`
that is more recent than a predetermined time, and will return an empty
string otherwise.

The issue is that these buffers are temporary and volatile: they get created and
deleted constantly. Here is an example of running that check every second on a
running fluentd:

```
root@fluentd-eks-playground-jdc8m:/# LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
root@fluentd-eks-playground-jdc8m:/# STUCK_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-900};
root@fluentd-eks-playground-jdc8m:/# touch -d "${STUCK_THRESHOLD_SECONDS} seconds ago" /tmp/marker-stuck;
root@fluentd-eks-playground-jdc8m:/# touch -d "${LIVENESS_THRESHOLD_SECONDS} seconds ago" /tmp/marker-liveness;
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type f -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:52:57 UTC 2019
Fri Feb 22 10:52:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:52:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964ccf4c7004103c3fa7c8533f85.log
Fri Feb 22 10:53:00 UTC 2019
Fri Feb 22 10:53:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827964fb8b2eedcccd2763ea7775cc2.log
Fri Feb 22 10:53:03 UTC 2019
Fri Feb 22 10:53:04 UTC 2019
Fri Feb 22 10:53:05 UTC 2019
Fri Feb 22 10:53:06 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:07 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:08 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer/buffer.b5827965564883997b673d703af54848b.log
Fri Feb 22 10:53:09 UTC 2019
Fri Feb 22 10:53:10 UTC 2019
Fri Feb 22 10:53:11 UTC 2019
Fri Feb 22 10:53:12 UTC 2019
Fri Feb 22 10:53:13 UTC 2019
Fri Feb 22 10:53:14 UTC 2019
Fri Feb 22 10:53:15 UTC 2019
Fri Feb 22 10:53:16 UTC 2019
```

We can see buffers being created, then disappearing. The LivenessProbe running
under these conditions has a ~50% chance of failing, despite fluentd being
perfectly happy.

I believe that check is probably ok for fluentd installs using large
amounts of buffers, in which case the liveness probe will be correct more
often than not, but fluentd installs that use buffering less intensively
will be negatively impacted by this.

My solution to fix this is to check the last updated time of buffering
_folders_ within `/var/log/fluentd-buffers`. These _do_ get updated when
buffers are created, and do not get deleted as buffers are emptied,
making them the perfect candidate for our use.
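
A rough sketch of that directory-based check as a reusable function follows. The function name `buffers_active`, the use of `mktemp` for the marker file, and the demo directory are assumptions made for illustration, not the code in this commit; like the commands in this message, it relies on GNU `touch -d` and GNU `find -quit`:

```shell
#!/bin/sh
# Sketch under stated assumptions: function name and temp-file handling
# are hypothetical, not copied from the real liveness probe.
# Succeeds if any directory under $1 was modified in the last $2 seconds.
buffers_active() {
  dir=$1
  threshold=$2
  marker=$(mktemp)
  # Backdate the marker so that -newer compares against "threshold ago".
  touch -d "${threshold} seconds ago" "$marker"
  found=$(find "$dir" -type d -newer "$marker" -print -quit)
  rm -f "$marker"
  [ -n "$found" ]
}

# Demo on a throwaway directory rather than the real buffer path.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/kubernetes.system.buffer"
if buffers_active "$demo_dir" 900; then
  echo "buffer directories updated recently"
else
  echo "buffer directories look stuck"
fi
rm -rf "$demo_dir"
```

Creating or deleting a file inside a directory updates that directory's modification time, which is why the directory stays "recent" even after the buffer files themselves are gone.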

Here's an example with the `-type d` test for directories:
```
root@fluentd-eks-playground-jdc8m:/# while true; do date ; find /var/log/fluentd-buffers -type d -newer /tmp/marker-stuck -print -quit ; sleep 1 ; done
Fri Feb 22 10:57:51 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:52 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:53 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:54 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:55 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:56 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:57 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:58 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:57:59 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:00 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:01 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:02 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
Fri Feb 22 10:58:03 UTC 2019
/var/log/fluentd-buffers/kubernetes.system.buffer
```

An example of the directory being updated as new buffers come in:
```
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:17 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 16K
drwxr-xr-x 2 root root  224 Feb 22 11:18 .
drwxr-xr-x 3 root root   38 Feb 22 11:14 ..
-rw-r--r-- 1 root root 1.8K Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log
-rw-r--r-- 1 root root  215 Feb 22 11:18 buffer.b58279be6e21e8b29fc333a7d50096ed0.log.meta
-rw-r--r-- 1 root root  429 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log
-rw-r--r-- 1 root root  195 Feb 22 11:18 buffer.b58279be6f09bdfe047a96486a525ece2.log.meta
root@fluentd-eks-playground-jdc8m:/# ls -lah /var/log/fluentd-buffers/kubernetes.system.buffer
total 0
drwxr-xr-x 2 root root  6 Feb 22 11:18 .
drwxr-xr-x 3 root root 38 Feb 22 11:14 ..
```
2019-02-25 11:48:31 +00:00
Adam Harrison c9dd2a2a45 kubectl run --quiet suppresses deletion messages
The `--quiet` option should prevent kubectl run from polluting the
output from an attached container - make it apply to the resource
deletion messages caused by `--rm`.
2019-02-25 11:10:07 +00:00
SataQiu 09ba08f8f4 fix some golint failures for pkg/apis/... 2019-02-25 18:06:08 +08:00
André Bauer 2d15ffc9cc updated to 6.5.2
Signed-off-by: André Bauer <monotek23@gmail.com>
2019-02-25 10:56:50 +01:00
André Bauer 0c29ea1a2e Update es-statefulset.yaml 2019-02-25 10:55:23 +01:00
André Bauer 53a936c359 Update Makefile 2019-02-25 10:55:23 +01:00
André Bauer 0e44fa6359 updated elasticsearch to 6.5.0 2019-02-25 10:55:23 +01:00
PingWang 88d6e89279 Fix typos
Signed-off-by: PingWang <wang.ping5@zte.com.cn>
2019-02-25 17:12:44 +08:00
danielqsj 6322025d5c fix golint failures for test/e2e/upgrades 2019-02-25 16:36:26 +08:00
Ilya Dmitrichenko 8c09a71e1d
Refactor core functionality of `kubectl drain` as a library
- structured pod filter functions
- naming improvements
  - consistent use of daemonSets and DaemonSets
  - rename field to reflect its usage
- new cordon/uncordon helper
  - use Core API client directly instead of generic CLI runtime
2019-02-25 08:15:07 +00:00
danielqsj 7c8498ab03 fix golint failures for test/e2e/upgrades/storage 2019-02-25 15:41:31 +08:00
RA489 a0ee4b471d Refactor etcd client function have same signatures in etcd.go 2019-02-25 12:54:12 +05:30
andyzhangx 433ebe3616 fix parse devicePath issue on Azure Disk 2019-02-25 07:02:35 +00:00
Pengfei Ni 8d0c5d9727 Fix subnet annotation checking for Azure internal loadbalancer 2019-02-25 14:48:53 +08:00
Yecheng Fu 618917e210 VolumeSpec may be nil in volume reconstruction scenario 2019-02-25 13:52:21 +08:00
danielqsj 8916ccabaf fix golint failures for test/e2e/upgrades/apps 2019-02-25 13:32:15 +08:00
mattjmcnaughton b4d086f914 Fix shellcheck for hack/verify-generated-*
All of the `hack/verify-generated-*` files now pass shellcheck and are
removed from `hack/.shellcheck_failures`.
2019-02-24 23:50:59 -05:00
mattjmcnaughton 57c51c741d Fix shellcheck for more scripts in hack
Making more of the scripts in hack pass the shellcheck linter.
2019-02-24 23:48:21 -05:00
SataQiu d357bcd2cd fix some shellcheck failures in hack 2019-02-25 11:38:56 +08:00
Clayton Coleman 7f01e23380
Ignore the sticky gid mode bit when a test is running on memory EmptyDir
While running unit tests for perf on a Kube cluster with a memory backed
emptydir as TMPDIR, TestSafeMakeDir failed with:

```
--- FAIL: TestSafeMakeDir (0.01s)
	mount_linux_test.go:661: test "directory-exists": expected permissions 20000000750, got 20020000750
```

(TMPDIR set to /tmp/volume, /tmp/volume is EmptyDir with type Memory)

The test doesn't actually care about `os.ModeSetgid`, so specifically mask it out when testing this way.
2019-02-24 17:30:37 -08:00
ducnv e11916da8e kubeadm cleanup: master -> control-plane (cont.4) 2019-02-25 08:29:19 +07:00
danielqsj 7d051e1a75 update juju shell 2019-02-24 20:46:20 +08:00
danielqsj 7e655e8666 fix shellcheck in cluster/juju 2019-02-24 20:40:59 +08:00
danielqsj f02a986081 add comments to shell function 2019-02-24 20:35:46 +08:00
danielqsj e698682a0e change a way to pass SC2164 in etcd.sh 2019-02-24 20:26:59 +08:00
danielqsj c215966d22 fix shellcheck failure in etcd shell 2019-02-24 20:19:50 +08:00
SataQiu 9cda80e836 fix shellcheck lint errors in cluster and hack scripts 2019-02-24 11:15:35 +08:00
Kubernetes Prow Robot 139a13d312
Merge pull request #74269 from moshe010/kubelet_gen_cert
Move kubelet cert generation when starting kubelet
2019-02-23 18:41:10 -08:00
Kubernetes Prow Robot ba8fcafaf8
Merge pull request #74467 from ixdy/bazel-cgo-crossbuild
bazel: create genrules to produce debs and RPMs without arch-specific names
2019-02-23 17:04:30 -08:00
Jeff Grafton a92c26d843 bazel: create genrules to produce debs and RPMs without arch-specific names 2019-02-23 15:44:34 -08:00
Kubernetes Prow Robot 5312ade3d1
Merge pull request #74457 from neolit123/fix-kubeproxy-winkernel
kubeadm: fix issue with missing kubeproxy fields in test data
2019-02-23 14:05:15 -08:00
Kubernetes Prow Robot 1cf8001e53
Merge pull request #74449 from xichengliudui/fix190223
make more of the shell pass lints
2019-02-23 12:52:34 -08:00
Kubernetes Prow Robot 6a29f8ca5f
Merge pull request #74451 from xichengliudui/fixshellcheckout190223
fix shellcheck in hack/...
2019-02-23 10:23:15 -08:00
Lubomir I. Ivanov b2cc473388 kubeadm: fix issue with missing kubeproxy fields in test data 2019-02-23 19:13:16 +02:00
Kubernetes Prow Robot 1cfaf2bdc0
Merge pull request #74454 from bart0sh/PR0064-kubeadm-1419-fix-ValidateURLs
kubeadm: fix url validation code
2019-02-23 09:09:18 -08:00
Ed Bartosh f8d235be9e kubeadm: fix url validation code
Fixed nil pointer dereference in url validation code that
caused kubeadm panic:

  panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xa7930c]

  goroutine 1 [running]:
  kubeadm/validation.ValidateURLs(0x40000bafe0, 0x2, 0x2, 0x1, 0x40002967b0, 0x0, 0x40002967b0, 0xf302a0)
    kubeadm/validation/validation.go:324 +0xcc
  kubeadm/validation.ValidateEtcd(0x400000b490, 0x4000296720, 0x0, 0x0, 0x0)
    kubeadm/validation/validation.go:291 +0x1f0
      ...

Fixes: kubernetes/kubeadm#1419

Signed-off-by: Ed Bartosh <eduard.bartosh@intel.com>
2019-02-23 16:48:22 +01:00
Kubernetes Prow Robot 0133d14170
Merge pull request #72939 from runyontr/test-cmd-what
Test cmd what
2019-02-23 02:54:36 -08:00
Kubernetes Prow Robot 8993fbc543
Merge pull request #74328 from daixiang0/delete-blank
delete all duplicate empty blanks
2019-02-23 01:43:58 -08:00
Kubernetes Prow Robot 795ae35201
Merge pull request #74318 from cblecker/fix-swagger
Fix verify-generated-swagger-docs script
2019-02-23 01:43:48 -08:00
Kubernetes Prow Robot e6cc851fc3
Merge pull request #74270 from Huang-Wei/wei-scheduler-approver
Nominate Huang-Wei to scheduler approvers
2019-02-23 01:43:37 -08:00
Kubernetes Prow Robot 686c4912e9
Merge pull request #73930 from ixdy/bazel-cgo-crossbuild
bazel: initial support for cross-compilation
2019-02-23 01:43:27 -08:00
Kubernetes Prow Robot b5566c7818
Merge pull request #71896 from awly/client-go-keyutil
client-go: extract new keyutil package from util/cert
2019-02-23 01:43:16 -08:00
Kubernetes Prow Robot 1847c071cf
Merge pull request #74445 from msau42/fix-localssd-e2e
Fix localssd test panic
2019-02-22 23:27:53 -08:00
Kubernetes Prow Robot 1d2d2d0ab2
Merge pull request #74390 from vanduc95/cleanup-kubeadm-cont.3-20190222
kubeadm cleanup: master -> control-plane (cont.3)
2019-02-22 23:27:40 -08:00
Kubernetes Prow Robot 4938cc37d3
Merge pull request #73509 from mikedanese/cloudproviderdep
enforce that cloud providers are only linked in main or app packages
2019-02-22 21:49:31 -08:00
Kubernetes Prow Robot 0b9f13227c
Merge pull request #70302 from tallclair/authzcache
Don't cache rediculous subject access reviews
2019-02-22 21:49:21 -08:00
Jordan Liggitt e752a48a30 Explicitly set GVK when sending objects to webhooks 2019-02-23 00:19:47 -05:00
Kubernetes Prow Robot b96378c058
Merge pull request #74436 from ksubrmnn/overlay_dsr
Temporarily remove V2 API check
2019-02-22 19:19:37 -08:00