Automatic merge from submit-queue (batch tested with PRs 45871, 46498, 46729, 46144, 46804)
PD e2e test: Ready node check now uses the most up-to-date node count.
Follow-up to PR #46746
<!-- Steps to write your release note:
1. Use the release-note-* labels to set the release note state (if you have access)
2. Enter your extended release note in the below block; leaving it blank means using the PR title as the release note. If no release note is required, just write `NONE`.
Automatic merge from submit-queue
Respect PDBs during node upgrades and add test coverage to the ServiceTest upgrade test.
This is still a WIP... needs to be squashed at least, and I don't think it's currently passing until I increase the scale of the RC, but please have a look at the general outline. Thanks!
Fixes#38336
@kow3ns @bdbauer @krousey @erictune @maisem @davidopp
```
On GCE, node upgrades will now respect PodDisruptionBudgets, if present.
```
Automatic merge from submit-queue (batch tested with PRs 46681, 46786, 46264, 46680, 46805)
Enable Dialer on the Aggregator
Centralize the creation of the dialer during startup.
Have the dialer then passed in to both APIServer and Aggregator.
Aggregator the uses the dialer as its Transport base.
**What this PR does / why we need it**:Enables the Aggregator to use the Dialer/SSHTunneler to connect to the user-apiserver.
**Which issue this PR fixes** : fixes ##46679
**Special notes for your reviewer**:
**Release note**: None
Automatic merge from submit-queue
Implement Daemonset history
~Depends on #45867 (the 1st commit, ignore it when reviewing)~ (already merged)
Ref https://github.com/kubernetes/community/pull/527/ and https://github.com/kubernetes/community/pull/594
@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews @erictune @kow3ns @lukaszo @kargakis
---
TODOs:
- [x] API changes
- [x] (maybe) Remove rollback subresource if we decide to do client-side rollback
- [x] deployment controller
- [x] controller revision
- [x] owner ref (claim & adoption)
- [x] history reconstruct (put revision number, hash collision avoidance)
- [x] de-dup history and relabel pods
- [x] compare ds template with history
- [x] hash labels (put it in controller revision, pods, and maybe deployment)
- [x] clean up old history
- [x] Rename status.uniquifier when we reach consensus in #44774
- [x] e2e tests
- [x] unit tests
- [x] daemoncontroller_test.go
- [x] update_test.go
- [x] ~(maybe) storage_test.go // if we do server side rollback~
kubectl part is in #46144
---
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 46620, 46732, 46773, 46772, 46725)
Fix AppArmor test for docker 1.13
... & better debugging.
The issue is that we run the pod containers in a shared PID namespace with docker 1.13, so PID 1 is no longer the container's root process. Since it's messy to get the container's root process, I switched to using `/proc/self` to read the apparmor profile. While this wouldn't catch a regression that caused only the init process to run with the wrong profile, I think it's a good approximation.
/cc @aulanov @Amey-D
Automatic merge from submit-queue (batch tested with PRs 46620, 46732, 46773, 46772, 46725)
Added missing documentation to NodeInstanceGroup.
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue
Add initializer support to admission and uninitialized filtering to rest storage
Initializers are the opposite of finalizers - they allow API clients to react to object creation and populate fields prior to other clients seeing them.
High level description:
1. Add `metadata.initializers` field to all objects
2. By default, filter objects with > 0 initializers from LIST and WATCH to preserve legacy client behavior (known as partially-initialized objects)
3. Add an admission controller that populates .initializer values per type, and denies mutation of initializers except by certain privilege levels (you must have the `initialize` verb on a resource)
4. Allow partially-initialized objects to be viewed via LIST and WATCH for initializer types
5. When creating objects, the object is "held" by the server until the initializers list is empty
6. Allow some creators to bypass initialization (set initializers to `[]`), or to have the result returned immediately when the object is created.
The code here should be backwards compatible for all clients because they do not see partially initialized objects unless they GET the resource directly. The watch cache makes checking for partially initialized objects cheap. Some reflectors may need to change to ask for partially-initialized objects.
```release-note
Kubernetes resources, when the `Initializers` admission controller is enabled, can be initialized (defaulting or other additive functions) by other agents in the system prior to those resources being visible to other clients. An initialized resource is not visible to clients unless they request (for get, list, or watch) to see uninitialized resources with the `?includeUninitialized=true` query parameter. Once the initializers have completed the resource is then visible. Clients must have the the ability to perform the `initialize` action on a resource in order to modify it prior to initialization being completed.
```
Automatic merge from submit-queue
Add local storage (scratch space) allocatable support
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
This feature is part of local storage capacity isolation and described in the proposal https://github.com/kubernetes/community/pull/306
**Release note**:
```release-note
This feature exposes local storage capacity for the primary partitions, and supports & enforces storage reservation in Node Allocatable
```
Automatic merge from submit-queue (batch tested with PRs 46239, 46627, 46346, 46388, 46524)
Dynamic webhook admission control plugin
Unit tests pass.
Needs plumbing:
* [ ] service resolver (depends on @wfender PR)
* [x] client cert (depends on ????)
* [ ] hook source (depends on @caesarxuchao PR)
Also at least one thing will need to be renamed after Chao's PR merges.
```release-note
Allow remote admission controllers to be dynamically added and removed by administrators. External admission controllers make an HTTP POST containing details of the requested action which the service can approve or reject.
```
Automatic merge from submit-queue (batch tested with PRs 46239, 46627, 46346, 46388, 46524)
move labels to components which own the APIs
During the apimachinery split in 1.6, we accidentally moved several label APIs into apimachinery. They don't belong there, since the individual APIs are not general machinery concerns, but instead are the concern of particular components: most commonly the kubelet. This pull moves the labels into their owning components and out of API machinery.
@kubernetes/sig-api-machinery-misc @kubernetes/api-reviewers @kubernetes/api-approvers
@derekwaynecarr since most of these are related to the kubelet
Add support for creating resources that are not immediately visible to
naive clients, but must first be initialized by one or more privileged
cluster agents. These controllers can mark the object as initialized,
allowing others to see them.
Permission to override initialization defaults or modify an initializing
object is limited per resource to a virtual subresource "RESOURCE/initialize"
via RBAC.
Initialization is currently alpha.
Automatic merge from submit-queue
[Flaky PR Test] Fix summary test
fixes issue: #46797
As we can see in the [example failure build log](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-node-kubelet/4319/build-log.txt), the summary containers are pinging google 100s of times a second. This causes the summary container to be killed occasionally, and fail the test. The summary containers are only supposed to ping every 10 seconds according to the current test. As it turns out, we were missing a semicolon, and were not sleeping between pings. For background, we ping google to generate network traffic, so that the summary test can validate network metrics.
This PR adds the semicolon to make the container sleep between calls, and decreases the sleep time from 10 seconds to 1 second, as 1 call / 10 seconds did not produce enough activity.
cc @kubernetes/kubernetes-build-cops @dchen1107
Automatic merge from submit-queue (batch tested with PRs 46648, 46500, 46238, 46668, 46557)
Add an e2e test for AdvancedAuditing
Enable a simple "advanced auditing" setup for e2e tests running on GCE, and add an e2e test that creates & deletes a pod, a secret, and verifies that they're audited.
Includes https://github.com/kubernetes/kubernetes/pull/46548
For https://github.com/kubernetes/features/issues/22
/cc @ericchiang @sttts @soltysh @ihmccreery
Automatic merge from submit-queue (batch tested with PRs 46648, 46500, 46238, 46668, 46557)
Support validating package versions in node conformance test
**What this PR does / why we need it**:
This PR adds a package validator in node conformance test for checking whether the locally installed packages meet the image spec.
**Special notes for your reviewer**:
The image spec for GKE (which has the package spec) will be in a separate PR. Then we will publish a new node conformance test image for GKE whose name should use the convention in https://github.com/kubernetes/kubernetes/issues/45760 and have `gke` in it.
**Release note**:
```
NONE
```
Respect PDBs during node upgrades and add test coverage to the
ServiceTest upgrade test. Modified that test so that we include pod
anti-affinity constraints and a PDB.
This PR adds the check for local storage request when admitting pods. If
the local storage request exceeds the available resource, pod will be
rejected.
This PR adds the support for allocatable local storage (scratch space).
This feature is only for root file system which is shared by kubernetes
componenets, users' containers and/or images. User could use
--kube-reserved flag to reserve the storage for kube system components.
If the allocatable storage for user's pods is used up, some pods will be
evicted to free the storage resource.
Automatic merge from submit-queue (batch tested with PRs 43505, 45168, 46439, 46677, 46623)
Add TPR to CRD migration helper.
This is a helper for migrating TPR data to CustomResource. It's rather hacky because it requires crossing apiserver boundaries, but doing it this way keeps the mess contained to the TPR code, which is scheduled for deletion anyway.
It's also not completely hands-free because making it resilient enough to be completely automated is too involved to be worth it for an alpha-to-beta migration, and would require investing significant effort to fix up soon-to-be-deleted TPR code. Instead, this feature will be documented as a best-effort helper whose results should be verified by hand.
The intended benefit of this over a totally manual process is that it should be possible to copy TPR data into a CRD without having to tear everything down in the middle. The process would look like this:
1. Upgrade to k8s 1.7. Nothing happens to your TPRs.
1. Create CRD with group/version and resource names that match the TPR. Still nothing happens to your TPRs, as the CRD is hidden by the overlapping TPR.
1. Delete the TPR. The TPR data is converted to CustomResource data, and the CRD begins serving at the same REST path.
Note that the old TPR data is left behind by this process, so watchers should not receive DELETE events. This also means the user can revert to the pre-migration state by recreating the TPR definition.
Ref. https://github.com/kubernetes/kubernetes/issues/45728
Centralize the creation of the dialer during startup.
Have the dialer then passed in to both APIServer and Aggregator.
Aggregator the sets the dialer on its Transport base.
This should allow the SSTunnel to be used but also allow the Aggregation
Auth to work with it.
Depending on Environment InsecureSkipTLSVerify *may* need to be set to
true.
Fixed as few tests to call CreateDialer as part of start-up.
Automatic merge from submit-queue (batch tested with PRs 46076, 43879, 44897, 46556, 46654)
Local storage plugin
**What this PR does / why we need it**:
Volume plugin implementation for local persistent volumes. Scheduler predicate will direct already-bound PVCs to the node that the local PV is at. PVC binding still happens independently.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
Part of #43640
**Release note**:
```
Alpha feature: Local volume plugin allows local directories to be created and consumed as a Persistent Volume. These volumes have node affinity and pods will only be scheduled to the node that the volume is at.
```
Automatic merge from submit-queue
Node authorizer
This PR implements the authorization portion of https://github.com/kubernetes/community/blob/master/contributors/design-proposals/kubelet-authorizer.md and kubernetes/features#279:
* Adds a new authorization mode (`Node`) that authorizes requests from nodes based on a graph of related pods,secrets,configmaps,pvcs, and pvs:
* Watches pods, adds edges (secret -> pod, configmap -> pod, pvc -> pod, pod -> node)
* Watches pvs, adds edges (secret -> pv, pv -> pvc)
* When both Node and RBAC authorization modes are enabled, the default RBAC binding that grants the `system:node` role to the `system:nodes` group is not automatically created.
* Tightens the `NodeRestriction` admission plugin to require identifiable nodes for requests from users in the `system:nodes` group.
This authorization mode is intended to be used in combination with the `NodeRestriction` admission plugin, which limits the pods and nodes a node may modify. To enable in combination with RBAC authorization and the NodeRestriction admission plugin:
* start the API server with `--authorization-mode=Node,RBAC --admission-control=...,NodeRestriction,...`
* start kubelets with TLS boostrapping or with client credentials that place them in the `system:nodes` group with a username of `system:node:<nodeName>`
```release-note
kube-apiserver: a new authorization mode (`--authorization-mode=Node`) authorizes nodes to access secrets, configmaps, persistent volume claims and persistent volumes related to their pods.
* Nodes must use client credentials that place them in the `system:nodes` group with a username of `system:node:<nodeName>` in order to be authorized by the node authorizer (the credentials obtained by the kubelet via TLS bootstrapping satisfy these requirements)
* When used in combination with the `RBAC` authorization mode (`--authorization-mode=Node,RBAC`), the `system:node` role is no longer automatically granted to the `system:nodes` group.
```
```release-note
RBAC: the automatic binding of the `system:node` role to the `system:nodes` group is deprecated and will not be created in future releases. It is recommended that nodes be authorized using the new `Node` authorization mode instead. Installations that wish to continue giving all members of the `system:nodes` group the `system:node` role (which grants broad read access, including all secrets and configmaps) must create an installation-specific ClusterRoleBinding.
```
Follow-up:
- [ ] enable e2e CI environment with admission and authorizer enabled (blocked by kubelet TLS bootstrapping enablement in https://github.com/kubernetes/kubernetes/pull/40760)
- [ ] optionally enable this authorizer and admission plugin in kubeadm
- [ ] optionally enable this authorizer and admission plugin in kube-up
Automatic merge from submit-queue
Switch gcloud compute copy-files to scp
gcloud is deprecating `gcloud compute copy-files` and switching to `gcloud compute scp`. Make the change before things start to break.
https://cloud.google.com/sdk/gcloud/reference/compute/copy-files
Warnings we get: `W0529 10:28:59.097] WARNING: `gcloud compute copy-files` is deprecated. Please use `gcloud compute scp` instead. Note that `gcloud compute scp` does not have recursive copy on by default. To turn on recursion, use the `--recurse` flag.`
/cc @jlowdermilk
Automatic merge from submit-queue
Support grabbing test suite metrics
**What this PR does / why we need it**:
Add support for grabbing metrics that cover the entire test suite's execution.
Update the "interesting" controller-manager metrics to match the
current names for the garbage collector, and add namespace controller
metrics to the list.
If you enable `--gather-suite-metrics-at-teardown`, the metrics file is written to a file with a name such as `MetricsForE2ESuite_2017-05-25T20:25:57Z.json` in the `--report-dir`. If you don't specify `--report-dir`, the metrics are written to the test log output.
I'd like to enable this for some of the `pull-*` CI jobs, which will require a separate PR to test-infra.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-testing-pr-reviews @smarterclayton @wojtek-t @gmarek @derekwaynecarr @timothysc
Automatic merge from submit-queue (batch tested with PRs 45534, 37212, 46613, 46350)
[e2e]Fix define redundant parameter
When timeout to reach HTTP service, redundant parameter make the
error is nil.
Automatic merge from submit-queue
no-snat test
This test checks that Pods can communicate with each other in the same cluster without SNAT.
I intend to create a job that runs this in small clusters (\~3 nodes) at a low frequency (\~once per day) so that we have a signal as we work on allowing multiple non-masquerade CIDRs to be configured (see [kubernetes-incubator/ip-masq-agent](https://github.com/kubernetes-incubator/ip-masq-agent), for example).
/cc @dnardo
Automatic merge from submit-queue
Added k82cn as one of scheduler approver.
According to the requirement of Approver at [community-membership.md](https://github.com/kubernetes/community/blob/master/community-membership.md), I meet the requirements as follow; so I'd like to add myself as an approver of scheduler.
* Reviewer of the codebase for at least 3 months
[k82cn]: [~3 months](6cc40678b6 )
* Primary reviewer for at least 10 substantial PRs to the codebase
[k82cn] Reviewed [40 PRs](https://github.com/issues?q=assignee%3Ak82cn+is%3Aclosed)
* Reviewed or merged at least 30 PRs to the codebase
[k82cn]: 71 merged PRs in kubernetes/kubernetes, and ~100 PRs in kuberentes at https://goo.gl/j2D1fR
As an approver,
* I agree to only approve familiar PRs
* I agree to be responsive to review/approve requests as per community expectations
* I agree to continue my reviewer work as per community expectations
* I agree to continue my contribution, e.g. PRs, mentor contributors
Automatic merge from submit-queue (batch tested with PRs 46407, 46457)
GCE - Refactor API for firewall and backend service creation
**What this PR does / why we need it**:
- Currently, firewall creation function actually instantiates the firewall object; this is inconsistent with the rest of GCE api calls. The API normally gets passed in an existing object.
- Necessary information for firewall creation, (`computeHostTags`,`nodeTags`,`networkURL`,`subnetworkURL`,`region`) were private to within the package. These now have public getters.
- Consumers might need to know whether the cluster is running on a cross-project network. A new `OnXPN` func will make that information available.
- Backend services for regions have been added. Global ones have been renamed to specify global.
- NamedPort management of instance groups has been changed from an `AddPortsToInstanceGroup` func (and missing complementary `Remove...`) to a single, simple `SetNamedPortsOfInstanceGroup`
- Addressed nitpick review comments of #45524
ILB needs the regional backend services and firewall refactor. The ingress controller needs the new `OnXPN` func to decide whether to create a firewall.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Move NetworkPolicy to v1
Move NetworkPolicy to v1
@kubernetes/sig-network-misc
**Release note**:
```release-note
NetworkPolicy has been moved from `extensions/v1beta1` to the new
`networking.k8s.io/v1` API group. The structure remains unchanged from
the beta1 API.
The `net.beta.kubernetes.io/network-policy` annotation on Namespaces
to opt in to isolation has been removed. Instead, isolation is now
determined at a per-pod level, with pods being isolated if there is
any NetworkPolicy whose spec.podSelector targets them. Pods that are
targeted by NetworkPolicies accept traffic that is accepted by any of
the NetworkPolicies (and nothing else), and pods that are not targeted
by any NetworkPolicy accept all traffic by default.
Action Required:
When upgrading to Kubernetes 1.7 (and a network plugin that supports
the new NetworkPolicy v1 semantics), to ensure full behavioral
compatibility with v1beta1:
1. In Namespaces that previously had the "DefaultDeny" annotation,
you can create equivalent v1 semantics by creating a
NetworkPolicy that matches all pods but does not allow any
traffic:
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
name: default-deny
spec:
podSelector:
This will ensure that pods that aren't match by any other
NetworkPolicy will continue to be fully-isolated, as they were
before.
2. In Namespaces that previously did not have the "DefaultDeny"
annotation, you should delete any existing NetworkPolicy
objects. These would have had no effect before, but with v1
semantics they might cause some traffic to be blocked that you
didn't intend to be blocked.
```
Automatic merge from submit-queue
Apply KubeProxyEndpointLagTimeout to ESIPP tests
Fixes#46533.
The previous construction of ESIPP tests is weird, so I redo it a bit.
A 30 seconds `KubeProxyEndpointLagTimeout` is introduced, as these tests ain't verifying performance, may be better to not make it too tight.
/assign @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46252, 45524, 46236, 46277, 46522)
Make GCE load-balancers create health checks for nodes
From #14661. Proposal on kubernetes/community#552. Fixes#46313.
Bullet points:
- Create nodes health check and firewall (for health checking) for non-OnlyLocal service.
- Create local traffic health check and firewall (for health checking) for OnlyLocal service.
- Version skew:
- Don't create nodes health check if any nodes has version < 1.7.0.
- Don't backfill nodes health check on existing LBs unless users explicitly trigger it.
**Release note**:
```release-note
GCE Cloud Provider: New created LoadBalancer type Service now have health checks for nodes by default.
An existing LoadBalancer will have health check attached to it when:
- Change Service.Spec.Type from LoadBalancer to others and flip it back.
- Any effective change on Service.Spec.ExternalTrafficPolicy.
```
Automatic merge from submit-queue (batch tested with PRs 45809, 46515, 46484, 46516, 45614)
Remove the reduplicated case judement
This patch remove the reduplicated case judgement
Automatic merge from submit-queue (batch tested with PRs 45809, 46515, 46484, 46516, 45614)
Fix incorrect printf format
**What this PR does / why we need it**: changes `%s` to `%d` for something that is actually an `int` (found via `make vet`).
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Update the "interesting" controller-manager metrics to match the
current names for the garbage collector, and add namespace controller
metrics to the list.
Automatic merge from submit-queue (batch tested with PRs 46429, 46308, 46395, 45867, 45492)
Controller history
**What this PR does / why we need it**:
Implements the ControllerRevision API object and clientset to allow for the implementation of StatefulSet update and DaemonSet history
```release-note
ControllerRevision type added for StatefulSet and DaemonSet history.
```
Automatic merge from submit-queue (batch tested with PRs 46429, 46308, 46395, 45867, 45492)
Summary Test looks at pods that have containers that restart.
Occasionally, the node can report extra containers that had been restarted through the summary API.
This test change tests a pod that restarts, and hopefully should allow us to reproduce and debug this behavior.
/assign @dchen1107
/release-note-none
Automatic merge from submit-queue (batch tested with PRs 46429, 46308, 46395, 45867, 45492)
Bump Go version to 1.8.3
This PR also removed this patched version of Go 1.8.1 which we used to use to workaround performance problem of Go 1.8.1.
Fix https://github.com/kubernetes/kubernetes/issues/45216
Ref #46391
@timothysc @bradfitz
Automatic merge from submit-queue (batch tested with PRs 46124, 46434, 46089, 45589, 46045)
Copy kubeconfig to kubemark master
This should save the effort of digging through jenkins agent and its container to get the kubeconfig.
Ideally we should have kubectl directly working on the kubemark master, but I'm facing some issues due to older version of kubectl present by default on the node.
cc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 45949, 46009, 46320, 46423, 46437)
e2e tests for storage policy support in Kubernetes
This PR covers e2e test cases for vSphere storage policy support in Kubernetes - #46176.
The following test scenario have been implemented.
- Specify only SPBM storage policy name.
- Verify if the disk is provisioned on a compatible datastore with max free space.
- Specify a storage policy name which is not defined on VC.
- Verify if PVC create errors out that no pbm profile with this policy is found.
- Specify both SPBM storage policy name and VSAN capabilities together.
- Verify if PVC create errors out that you can't use both SPBM policy name with VSAN capabilities. You can only specify one.
- Specify SPBM storage policy name with user specified datastore which is non-compatible.
- Verify if PVC create errors out that it can't provision a disk on a non-compatible datastore.
@jeffvance @divyenpatel
**Release note**:
```release-note
None
```
Automatic merge from submit-queue (batch tested with PRs 45949, 46009, 46320, 46423, 46437)
Unregister some metrics
delete some registered metrics since they are not observed
**Release note**:
```release-note
NONE
```