github/k3s - k3s - https://git.xinac.net

Commit Graph

Author	SHA1	Message	Date
Kubernetes Submit Queue	a244d8a48f	Merge pull request #63130 from vikaschoudhary16/dp_e2e_alloc Automatic merge from submit-queue (batch tested with PRs 61455, 63346, 63130, 63404). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. [Device-Plugin]: Extend e2e test to cover node allocatables What this PR does / why we need it: Extends device plugin e2e to cover node allocatable Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes # Special notes for your reviewer: Release note: ```release-note None ``` /sig node /area hw-accelerators /cc @jiayingz @vishh @RenaudWasTaken	2018-05-03 14:24:10 -07:00
vikaschoudhary16	b953f852f5	[Device-Plugin]: Extend e2e test to cover node allocatables	2018-05-03 14:19:29 -04:00
Kubernetes Submit Queue	592c39bccc	Merge pull request #62541 from filbranden/cgroupname1 Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Use a []string for CgroupName, which is a more accurate internal representation What this PR does / why we need it: This is purely a refactoring and should bring no essential change in behavior. It does clarify the cgroup handling code quite a bit. It is preparation for further changes we might want to do in the cgroup hierarchy. (But it's useful on its own, so even if we don't do any, it should still be considered.) Special notes for your reviewer: The slice of strings more precisely captures the hierarchic nature of the cgroup paths we use to represent pods and their groupings. It also ensures we're reducing the chances of passing an incorrect path format to a cgroup driver that requires a different path naming, since now explicit conversions are always needed. The new constructor `NewCgroupName` starts from an existing `CgroupName`, which enforces a hierarchy where a root is always needed. It also performs checking on the component names to ensure invalid characters ("/" and "_") are not in use. A `RootCgroupName` for the top of the cgroup hierarchy tree is introduced. This refactor results in a net reduction of around 30 lines of code, mainly with the demise of ConvertCgroupNameToSystemd which had fairly complicated logic in it and was doing just too many things. There's a small TODO in a helper `updateSystemdCgroupInfo` that was introduced to make this commit possible. That logic really belongs in libcontainer, I'm planning to send a PR there to include it there. (The API already takes a field with that information, only that field is only processed in cgroupfs and not systemd driver, we should fix that.) Tested: By running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs driver) and CentOS 7 (with systemd driver.) NOTE: I only tested this with dockershim, we should double-check that this works with the CRI endpoints too, both in cgroupfs and systemd modes. /assign @derekwaynecarr /assign @dashpole /assign @Random-Liu Release note: ```release-note NONE ```	2018-05-03 08:16:45 -07:00
Kubernetes Submit Queue	b5f61ac129	Merge pull request #62657 from matthyx/master Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Update all script shebangs to use /usr/bin/env interpreter instead of /bin/interpreter This is required to support systems where bash doesn't reside in /bin (such as NixOS, or the *BSD family) and allow users to specify a different interpreter version through $PATH manipulation. https://www.cyberciti.biz/tips/finding-bash-perl-python-portably-using-env.html ```release-note Use /usr/bin/env in all script shebangs to increase portability. ```	2018-05-02 19:44:32 -07:00
Filipe Brandenburger	b230fb8ac4	Use a []string for CgroupName, which is a more accurate internal representation The slice of strings more precisely captures the hierarchic nature of the cgroup paths we use to represent pods and their groupings. It also ensures we're reducing the chances of passing an incorrect path format to a cgroup driver that requires a different path naming, since now explicit conversions are always needed. The new constructor NewCgroupName starts from an existing CgroupName, which enforces a hierarchy where a root is always needed. It also performs checking on the component names to ensure invalid characters ("/" and "_") are not in use. A RootCgroupName for the top of the cgroup hierarchy tree is introduced. This refactor results in a net reduction of around 30 lines of code, mainly with the demise of ConvertCgroupNameToSystemd which had fairly complicated logic in it and was doing just too many things. There's a small TODO in a helper updateSystemdCgroupInfo that was introduced to make this commit possible. That logic really belongs in libcontainer, I'm planning to send a PR there to include it there. (The API already takes a field with that information, only that field is only processed in cgroupfs and not systemd driver, we should fix that.) Tested by running the e2e-node tests on both Ubuntu 16.04 (with cgroupfs driver) and CentOS 7 (with systemd driver.)	2018-05-01 08:29:06 -07:00
Kubernetes Submit Queue	452b8c9e0d	Merge pull request #62101 from bart0sh/PR0010-e2e_node-kubelet-command-line-fix Automatic merge from submit-queue (batch tested with PRs 58474, 60034, 62101, 63198). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Fix wrong usage of kubelet option What this PR does / why we need it: "--allow-privileged true" is incorrect usage of boolean option. It means setting '--allow-priviledged' to its default value plus non-existing subcommand 'true'. "--allow-privileged false" is even more confusing as it sets allow-priviledged flag to its default value 'true' This is true for any boolean command line option. Fixed this by using correct syntax --allow-priviledged=true Special notes for your reviewer: This is a show-stopper for PR #61833 Release note: ```release-note NONE ```	2018-04-30 13:24:12 -07:00
Kubernetes Submit Queue	e01858c595	Merge pull request #63252 from liztio/e2e_node_utils Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. E2e path utils What this PR does / why we need it: A bunch of useful methods for getting k8s paths and stuff are secreted away in `e2e_node`. This PR pulls them out so they can be used in other E2E method. Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Special notes for your reviewer: This is motivated by the upcoming kubeadm-specific E2E tests. Those tests will be added in a follow-up to this PR. Release note: ```release-note NONE ```	2018-04-27 11:43:15 -07:00
liz	1ec02b1cd5	Move path management from e2e_node to common test/utils directory enables reuse of these methods for other e2e tests	2018-04-27 11:12:10 -04:00
liz	432b542218	Generated artefacts	2018-04-27 11:11:45 -04:00
Jordan Liggitt	1bddcdcf44	Bump QPS on namespace controller https://github.com/kubernetes/kubernetes/pull/62913 switched from using a client pool, where each groupVersionResource got its own rest client, to a single client. This increases the QPS to account for increased requests using a single rest client rate limiter.	2018-04-27 10:11:14 -04:00
David Eads	3632037e60	add easy to use dynamic client	2018-04-25 08:55:26 -04:00
Kubernetes Submit Queue	44b57338d5	Merge pull request #59692 from mtaufen/dkcfg-unpack-configmaps Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. unpack dynamic kubelet config payloads to files This PR unpacks the downloaded ConfigMap to a set of files on the node. This enables other config files to ride alongside the KubeletConfiguration, and the KubeletConfiguration to refer to these cohabitants with relative paths. This PR also stops storing dynamic config metadata (e.g. current, last-known-good config records) in the same directory as config checkpoints. Instead, it splits the storage into `meta` and `checkpoints` dirs. The current store dir structure is as follows: ``` - dir named by --dynamic-config-dir (root for managing dynamic config) \| - meta (dir for metadata, e.g. which config source is currently assigned, last-known-good) \| - current (a serialized v1 NodeConfigSource object, indicating the assigned config) \| - last-known-good (a serialized v1 NodeConfigSource object, indicating the last-known-good config) \| - checkpoints (dir for config checkpoints) \| - uid1 (dir for unpacked config, identified by uid1) \| - file1 \| - file2 \| - ... \| - uid2 \| - ... ``` There are some likely changes to the above structure before dynamic config goes beta, such as renaming "current" to "assigned" for clarity, and extending the checkpoint identifier to include a resource version, as part of resolving #61643. ```release-note NONE ``` /cc @luxas @smarterclayton	2018-04-24 12:01:37 -07:00
Michael Taufen	c9d398d01e	unpack dynamic kubelet config payloads to files This PR unpacks the downloaded ConfigMap to a set of files on the node. This enables other config files to ride alongside the KubeletConfiguration, and the KubeletConfiguration to refer to these cohabitants with relative paths. This PR also stops storing dynamic config metadata (e.g. current, last-known-good config records) in the same directory as config checkpoints. Instead, it splits the storage into `meta` and `checkpoints` dirs.	2018-04-19 09:18:53 -07:00
Matthias Bertschy	9b15af19b2	Update all script to use /usr/bin/env bash in shebang	2018-04-19 13:20:13 +02:00
Kubernetes Submit Queue	dd8f8819e4	Merge pull request #62768 from krzyzacy/clean-up-jenkins Automatic merge from submit-queue (batch tested with PRs 62445, 62768, 60633). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. clean up *.properties files ref https://github.com/kubernetes/kubernetes/issues/62754 to double check, is any of the node config yaml files are still being used outside of CI? I'll make a follow up one to clean them up as well. /assign @Random-Liu @mindprince @yujuhong	2018-04-18 12:25:08 -07:00
Kubernetes Submit Queue	1ddb0e05e5	Merge pull request #62761 from Random-Liu/lower-usage-nano-cores-in-summary Automatic merge from submit-queue (batch tested with PRs 62761, 62715). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Lower UsageNanoCores boundary in summary api test. We recently switched to use `p2p` instead of `bridge` in containerd https://github.com/containerd/cri/pull/742. However, after that switch, the `UsageNanoCores` becomes lower, and constantly fails the test. An example failure: * https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/containerd_cri/740/pull-cri-containerd-node-e2e/690/ This is probably because: 1) The test container used in summary test does `ping`. https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/summary_test.go#L352 2) `p2p` is simpler than `bridge`, "Maybe cycles are saved from waiving Mac learning" - @jingax10. This PR lowers the boundary by 1 magnitude. Signed-off-by: Lantao Liu <lantaol@google.com> Release note: ```release-note none ```	2018-04-17 22:38:10 -07:00
Sen Lu	854132fdcc	clean up *.properties files	2018-04-17 21:44:32 -07:00
Lantao Liu	002483fe72	Lower UsageNanoCores boundary in summary api test. Signed-off-by: Lantao Liu <lantaol@google.com>	2018-04-17 18:37:51 -07:00
Lantao Liu	c86e85c420	Fix extra-log flag for node e2e. Signed-off-by: Lantao Liu <lantaol@google.com>	2018-04-17 21:48:26 +00:00
Lantao Liu	27105c90ec	Fix kubelet flags. Signed-off-by: Lantao Liu <lantaol@google.com>	2018-04-16 20:42:40 +00:00
Yu-Ju Hong	9a47bd0b67	Node E2E: Remove the simple mount test There are EmptyDir volume tests in test/e2e/common already. The test does not add any more coverage.	2018-04-12 17:05:28 -07:00
Ed Bartosh	7e3d28b30f	Fix wrong usage of kubelet options "--allow-privileged true" is incorrect usage of boolean option. It means setting '--allow-priviledged' to its default value plus non-existing subcommand 'true'. "--allow-privileged false" is even more confusing as it sets allow-priviledged flag to its default value 'true' This is true for any boolean command line option. Fixed this by using correct syntax --allow-priviledged=true Fixed generating of kubelet command line in addKubeletConfigFlags function.	2018-04-12 15:19:49 +03:00
Kubernetes Submit Queue	1dc6e87f57	Merge pull request #62206 from yujuhong/rm-rkt-refs Automatic merge from submit-queue (batch tested with PRs 62192, 61866, 62206, 62360). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Remove rkt references in the codebase ```release-note None ```	2018-04-10 23:52:21 -07:00
Kubernetes Submit Queue	3bc1a0a1d0	Merge pull request #60900 from dashpole/eviction_test_no_pressure Automatic merge from submit-queue (batch tested with PRs 60900, 62215, 62196). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. [Flaky test fix] Use memory.force_empty before and after eviction tests What this PR does / why we need it: (copied from https://github.com/kubernetes/kubernetes/pull/60720): MemoryAllocatableEviction tests have been somewhat flaky: https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-serial-gce-e2e&include-filter-by-regex=MemoryAllocatable The failure on the flakes is ["Pod ran to completion"](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial/3785#k8sio-memoryallocatableeviction-slow-serial-disruptive-when-we-run-containers-that-should-cause-memorypressure-should-eventually-evict-all-of-the-correct-pods). Looking at [an example log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet-serial/3785/artifacts/tmp-node-e2e-6070a774-cos-stable-63-10032-71-0/kubelet.log) (and search for memory-hog-pod, we can see that this pod fails admission because the allocatable memory threshold has already been crossed. `eviction manager: thresholds - ignoring grace period: threshold [signal=allocatableMemory.available, quantity=250Mi] observed 242404Ki` https://github.com/kubernetes/kubernetes/pull/60720 wasn't effective. To clean-up after each eviction test, and prepare for the next, use memory.force_empty to make the kernel reclaim memory in the allocatable cgroup before and after eviction tests. Special notes for your reviewer: I tested to make sure this doesn't break Cgroup Manager tests. It should work on both cgroupfs and systemd based systems, although I have only tested in on cgroupfs. Release note: ```release-note NONE ``` /assign @yujuhong @Random-Liu /sig node /priority important-soon /kind bug its getting a little late in the release cycle, so we can probably wait until after code freeze is lifted for this.	2018-04-06 21:30:06 -07:00
Kubernetes Submit Queue	1e767ddf60	Merge pull request #62135 from jiayingz/kubelet-restart-fix Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Fixes restartKubelet in test/e2e_node failure. Looks like there is some recent change on how we start kubelet service in test_e2e_node. Fixes restartKubelet() to get right kubelet service name to cope with the change. What this PR does / why we need it: Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes kubelet-serial-gce-e2e test failure: https://k8s-testgrid.appspot.com/wg-resource-management#kubelet-serial-gce-e2e Thanks a lot to @mindprince for noticing this! Special notes for your reviewer: Release note: ```release-note ```	2018-04-06 15:46:19 -07:00
David Ashpole	3254bdc1a4	use memory.force_empty before and after eviction tests	2018-04-06 14:01:11 -07:00
Yu-Ju Hong	59741bdfbd	Remove rkt references in the codebase	2018-04-06 12:02:11 -07:00
Manjunath A Kumatagi	1bb810e749	Use pause manifest image	2018-04-06 11:00:50 +05:30
Jiaying Zhang	0138007bdd	Fixes restartKubelet in test/e2e_node failure. Looks like there is some recent change on how we start kubelet service in test_e2e_node. Fixes restartKubelet() to get right kubelet service name to cope with the change.	2018-04-04 13:18:08 -07:00
hzxuzhonghu	8cce8bdc85	make kube-apiserver ServerRunOptions setdefault and Validate before use	2018-04-04 11:19:55 +08:00
Kubernetes Submit Queue	043204b1e5	Merge pull request #61498 from mindprince/delete-in-tree-gpu Automatic merge from submit-queue (batch tested with PRs 61498, 62030). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Delete in-tree support for NVIDIA GPUs. This removes the alpha Accelerators feature gate which was deprecated in 1.10 (#57384). The alternative feature DevicePlugins went beta in 1.10 (#60170). Fixes #54012 ```release-note Support for "alpha.kubernetes.io/nvidia-gpu" resource which was deprecated in 1.10 is removed. Please use the resource exposed by DevicePlugins instead ("nvidia.com/gpu"). ```	2018-04-03 02:02:04 -07:00
Rohit Agarwal	87dda3375b	Delete in-tree support for NVIDIA GPUs. This removes the alpha Accelerators feature gate which was deprecated in 1.10. The alternative feature DevicePlugins went beta in 1.10.	2018-04-02 20:17:01 -07:00
Christoph Blecker	710c8563b4	Fix go vet errors	2018-04-02 17:57:44 -07:00
Kubernetes Submit Queue	99fd98a893	Merge pull request #61740 from filbranden/nodetest1 Automatic merge from submit-queue (batch tested with PRs 61482, 61740). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Make systemd service name for kubelet use a timestamp in e2e-node tests. What this PR does / why we need it: This makes it easier to figure out which execution was last when looking at the output of `systemd list-units kubelet-.service`. We try to find the name of the /tmp/node-e2e- directory and use the same timestamp if we can. Otherwise, we just call Now() again, which isn't as nice (as the unit name and directory name will not match) but will still produce unit names that will be ordered when launching multiple subsequent executions on the same host. Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): N/A Special notes for your reviewer: Tested using `make test-e2e-node REMOTE=true` and then checking `systemctl list-units kubelet-.service` on the target host. ``` $ systemctl list-units kubelet-.service kubelet-20180326T142016.service loaded active exited /tmp/node-e2e-20180326T142016/kubelet --kubeconfig /tmp/node-e2e-20180326T142016/kubeconfig --root-dir /var/lib/kubelet ... kubelet-20180326T143550.service loaded active exited /tmp/node-e2e-20180326T143550/kubelet --kubeconfig /tmp/node-e2e-20180326T143550/kubeconfig --root-dir /var/lib/kubelet ... ``` The units are sorted in the order they were launched. Release note: ```release-note NONE ```	2018-03-29 21:10:03 -07:00
Filipe Brandenburger	b8c39b7055	In summary_test, make Docker cpu/memory checks optional if unavailable. The numbers will only be available when docker.service has its own memory and cpu cgroups, which doesn't necessarily happen unless the unit has Delegate=yes configured. Let's work around that by checking the status of Delegate, in the case where we are: * running Docker * running Systemd * able to check the status through systemctl * the status is explicitly Delegate=no (the default) If all of those are true, let's make CPU and Memory expectations optional. Tested: make test-e2e-node REMOTE=true HOSTS=centos-e2e-node FOCUS="Summary API"	2018-03-29 18:12:30 -07:00
Filipe Brandenburger	351a70b60e	In summary_test, create a file outside the test volume too. This is necessary to show any RootFs usage on systems where the backing filesystem of overlay2 is xfs. The current test only created directories (for mount points) in the upper layer of the overlay. Outside of the mount namespace, only the directories are visible. When running `du` on those, usually filesystems will show some usage, but not xfs, which shows a disk usage of 0 for directories. Fix this by creating a file in the root directory, outside the volumes, in order to trigger some disk usage that can be measured by `du`. Tested: make test-e2e-node REMOTE=true HOSTS=centos-e2e-node FOCUS="Summary API"	2018-03-29 18:12:29 -07:00
Kubernetes Submit Queue	5ae7bba496	Merge pull request #60100 from mtaufen/node-authz-nodeconfigsource Automatic merge from submit-queue (batch tested with PRs 61829, 61908, 61307, 61872, 60100). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. node authorizer sets up access rules for dynamic config This PR makes the node authorizer automatically set up access rules for dynamic Kubelet config. I also added some validation to the node strategy, which I discovered we were missing while writing this. This PR is based on another WIP from @liggitt. ```release-note The node authorizer now automatically sets up rules for Node.Spec.ConfigSource when the DynamicKubeletConfig feature gate is enabled. ```	2018-03-29 17:37:18 -07:00
Filipe Brandenburger	76ef9c9074	Make systemd service name for kubelet use a timestamp in e2e-node tests. This makes it easier to figure out which execution was last when looking at the output of `systemd list-units kubelet-.service`. We try to find the name of the /tmp/node-e2e- directory and use the same timestamp if we can. Otherwise, we just call Now() again, which isn't as nice (as the unit name and directory name will not match) but will still produce unit names that will be ordered when launching multiple subsequent executions on the same host.	2018-03-29 11:17:42 -07:00
Filipe Brandenburger	451faff4ef	Use curl instead of wget to fetch the CNI tarball in e2e-node test Curl is more ubiquitous than wget. For instance, the GCE centos-7 and rhel-7 image families ship curl by default, but not wget. Looking at the shell scripts under cluster/, they tend to use curl more than wget. (The ones that use wget, such as get-kube.sh, try curl first and only fallback to wget if it's not available.) Tested: by running node-e2e-test on Ubuntu, COS and CentOS.	2018-03-27 09:41:09 -07:00
Michael Taufen	ab8dc12333	node authorizer sets up access rules for dynamic config This PR makes the node authorizer automatically set up access rules for dynamic Kubelet config. I also added some validation to the node strategy, which I discovered we were missing while writing this.	2018-03-27 08:49:45 -07:00
Kubernetes Submit Queue	915798d229	Merge pull request #60563 from hzxuzhonghu/replace-context Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Replace package "golang.org/x/net/context" with "context" What this PR does / why we need it: Replace package "golang.org/x/net/context" with "context" Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #60560 Special notes for your reviewer: As of Go 1.7 this package(golang.org/x/net/context) is available in the standard library under the name context. see (https://godoc.org/golang.org/x/net/context) It is almost machinery replace. Release note: ```release-note NONE ```	2018-03-23 16:34:23 -07:00
Kubernetes Submit Queue	1b6b2ee790	Merge pull request #61478 from shyamjvs/capture-pod-startup-phases-as-metrics Automatic merge from submit-queue (batch tested with PRs 61378, 60915, 61499, 61507, 61478). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Capture pod startup phases as metrics Learning from https://github.com/kubernetes/kubernetes/issues/60589, we should also start collecting and graphing sub-parts of pod-startup latency. /sig scalability /kind feature /priority important-soon /cc @wojtek-t ```release-note NONE ```	2018-03-22 07:15:33 -07:00
hzxuzhonghu	70e45eccf2	Replace "golang.org/x/net/context" with "context"	2018-03-22 20:57:14 +08:00
Shyam Jeedigunta	0f0c754eb4	Get rid of duplicate VerifyPodStartupLatency util in node density tests	2018-03-21 16:58:31 +01:00
Shyam Jeedigunta	b0dd166fa3	Capture different parts of pod-startup latency as metrics	2018-03-21 16:58:25 +01:00
Lantao Liu	9fc2795d55	Change pods memory boundary. Signed-off-by: Lantao Liu <lantaol@google.com>	2018-03-20 23:24:16 +00:00
Kubernetes Submit Queue	c64f19dd1b	Merge pull request #59728 from wgliang/master.append Automatic merge from submit-queue (batch tested with PRs 59740, 59728, 60080, 60086, 58714). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. more concise to merge the slice What this PR does / why we need it: more concise to merge the slice Special notes for your reviewer:	2018-03-19 21:34:30 -07:00
Kubernetes Submit Queue	a3f40dd8df	Merge pull request #60856 from jiayingz/race-fix Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Fixes the races around devicemanager Allocate() and endpoint deletion. There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc() could get Node with non-zero deviceplugin resource allocatable for a non-existing endpoint. That race can happen when a device plugin fails, but is more likely when kubelet restarts as with the current registration model, there is a time gap between kubelet restart and device plugin re-registration. During this time window, even though devicemanager could have removed the resource initially during GetCapacity() call, Kubelet may overwrite the device plugin resource capacity/allocatable with the old value when node update from the API server comes in later. This could cause a pod to be started without proper device runtime config set. To solve this problem, introduce endpointStopGracePeriod. When a device plugin fails, don't immediately remove the endpoint but set stopTime in its endpoint. During kubelet restart, create endpoints with stopTime set for any checkpointed registered resource. The endpoint is considered to be in stopGracePeriod if its stoptime is set. This allows us to track what resources should be handled by devicemanager during the time gap. When an endpoint's stopGracePeriod expires, we remove the endpoint and its resource. This allows the resource to be exported through other channels (e.g., by directly updating node status through API server) if there is such use case. Currently endpointStopGracePeriod is set as 5 minutes. Given that an endpoint is no longer immediately removed upon disconnection, mark all its devices unhealthy so that we can signal the resource allocatable change to the scheduler to avoid scheduling more pods to the node. When a device plugin endpoint is in stopGracePeriod, pods requesting the corresponding resource will fail admission handler. Tested: Ran GPUDevicePlugin e2e_node test 100 times and all passed now. What this PR does / why we need it: Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes https://github.com/kubernetes/kubernetes/issues/60176 Special notes for your reviewer: Release note: ```release-note Fixes the races around devicemanager Allocate() and endpoint deletion. ```	2018-03-12 02:50:13 -07:00
Jiaying Zhang	5514a1f4dd	Fixes the races around devicemanager Allocate() and endpoint deletion. There is a race in predicateAdmitHandler Admit() that getNodeAnyWayFunc() could get Node with non-zero deviceplugin resource allocatable for a non-existing endpoint. That race can happen when a device plugin fails, but is more likely when kubelet restarts as with the current registration model, there is a time gap between kubelet restart and device plugin re-registration. During this time window, even though devicemanager could have removed the resource initially during GetCapacity() call, Kubelet may overwrite the device plugin resource capacity/allocatable with the old value when node update from the API server comes in later. This could cause a pod to be started without proper device runtime config set. To solve this problem, introduce endpointStopGracePeriod. When a device plugin fails, don't immediately remove the endpoint but set stopTime in its endpoint. During kubelet restart, create endpoints with stopTime set for any checkpointed registered resource. The endpoint is considered to be in stopGracePeriod if its stoptime is set. This allows us to track what resources should be handled by devicemanager during the time gap. When an endpoint's stopGracePeriod expires, we remove the endpoint and its resource. This allows the resource to be exported through other channels (e.g., by directly updating node status through API server) if there is such use case. Currently endpointStopGracePeriod is set as 5 minutes. Given that an endpoint is no longer immediately removed upon disconnection, mark all its devices unhealthy so that we can signal the resource allocatable change to the scheduler to avoid scheduling more pods to the node. When a device plugin endpoint is in stopGracePeriod, pods requesting the corresponding resource will fail admission handler.	2018-03-09 17:00:57 -08:00
Kubernetes Submit Queue	ae7be34c32	Merge pull request #60509 from verb/pid-e2e Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add node-e2e test for ShareProcessNamespace What this PR does / why we need it: Adds a node-e2e test for kubernetes/features#495 Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #59554 Special notes for your reviewer: This requires a feature gate to be enabled in both the kubelet and API server. I'm not sure which jenkins configs need to be updated (or if these are even still used) so I just updated a pile of them. opened kubernetes/test-infra#7030 for https://github.com/kubernetes/test-infra/blob/master/jobs/config.json Release note: ```release-note NONE ```	2018-03-05 14:20:14 -08:00

1 2 3 4 5 ...

1309 Commits (f4b2452654967a44dd0821a48d8833aa839916c0)