Automatic merge from submit-queue
azure: load balancer: support UDP, fix multiple loadBalancerSourceRanges support, respect sessionAffinity
**What this PR does / why we need it**:
1. Adds support for UDP ports
2. Fixes support for multiple `loadBalancerSourceRanges`
3. Adds support the Service spec's `sessionAffinity`
4. Removes dead code from the Instances file
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#43683
**Special notes for your reviewer**: n/a
**Release note**:
```release-note
azure: add support for UDP ports
azure: fix support for multiple `loadBalancerSourceRanges`
azure: support the Service spec's `sessionAffinity`
```
Automatic merge from submit-queue
Add support for PodPreset in `kubectl get` command
**What this PR does / why we need it**:
PR title
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#44736
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45453, 45307, 44987)
Migrate the docker client code from dockertools to dockershim
Move docker client code from dockertools to dockershim/libdocker. This includes
DockerInterface (renamed to Interface), FakeDockerClient, etc.
This is part of #43234
Automatic merge from submit-queue (batch tested with PRs 45453, 45307, 44987)
Init cache with assigned non-terminated pods before scheduling
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#45220
**Release note**:
```release-note
The fix makes scheduling go routine waiting for cache (e.g. Pod) to be synced.
```
Automatic merge from submit-queue
Filter out IPV6 addresses from NodeAddresses() returned by vSphere
The vSphere CP returns both IPV6 and IPV4 addresses for a Node as part of NodeAddresses() implementation. However, Kubelet fails due to duplicate api.NodeAddress value when the node has an IPV6 address associated with it. This issue is tracked in #42690. The following are observed:
- when we enabled the logs and checked the addresses sent by vSphere CP to Kubelet, we don't see any duplicate addresses at all.
- Also, kubelet_node_status doesn’t receive any duplicate address from cloud provider.
However, when we filter out the IPV6 addresses and only return IPV4 addresses to the Kubelet, it works perfectly fine.
Even though the Kubelet receives the non-duplicate node-addresses, it still errors out with duplicate node addresses. It might be an issue when kubelet propagates these addresses to API server (or) API server is enable to handle IPV6 addresses.
@divyenpatel @abrarshivani @pdhamdhere @tusharnt
**Release note**:
```release-note
None
```
Automatic merge from submit-queue
rkt: Generate a new Network Namespace for each Pod
**What this PR does / why we need it**:
This PR concerns the Kubelet with the Container runtime rkt.
Currently, when a Pod stops and the kubelet restart it, the Pod will use the **same network namespace** based on its PodID.
When the Garbage Collection is triggered, it delete all the old resources and the current network namespace.
The Pods and all containers inside it loose the _eth0_ interface.
I explained more in details in #45149 how to reproduce this behavior.
This PR generates a new unique network namespace name for each new/restarting Pod.
The Garbage collection retrieve the correct network namespace and remove it safely.
**Which issue this PR fixes** :
fix#45149
**Special notes for your reviewer**:
Following @yifan-gu guidelines, so maybe expecting him for the final review.
**Release note**:
`NONE`
Automatic merge from submit-queue (batch tested with PRs 45304, 45006, 45527)
increase the QPS for namespace controller
The namespace controller is really chatty. Especially to discovery since that involves two requests for every API version available. This bumps the QPS and burst on the namespace controller to avoid being stuck waiting.
Automatic merge from submit-queue (batch tested with PRs 44798, 45537, 45448, 45432)
nfs.go: cleancode err
**What this PR does / why we need it**:
The modification makes code clean, simple, and easy to inspect.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Statefulsets for cinder: allow multi-AZ deployments, spread pods across zones
**What this PR does / why we need it**: Currently if we do not specify availability zone in cinder storageclass, the cinder is provisioned to zone called nova. However, like mentioned in issue, we have situation that we want spread statefulset across 3 different zones. Currently this is not possible with statefulsets and cinder storageclass. In this new solution, if we leave it empty the algorithm will choose the zone for the cinder drive similar style like in aws and gce storageclass solutions.
**Which issue this PR fixes** fixes#44735
**Special notes for your reviewer**:
example:
```
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
name: all
provisioner: kubernetes.io/cinder
---
apiVersion: v1
kind: Service
metadata:
annotations:
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
name: galera
labels:
app: mysql
spec:
ports:
- port: 3306
name: mysql
clusterIP: None
selector:
app: mysql
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: mysql
spec:
serviceName: "galera"
replicas: 3
template:
metadata:
labels:
app: mysql
annotations:
pod.alpha.kubernetes.io/initialized: "true"
spec:
containers:
- name: mysql
image: adfinissygroup/k8s-mariadb-galera-centos:v002
imagePullPolicy: Always
ports:
- containerPort: 3306
name: mysql
- containerPort: 4444
name: sst
- containerPort: 4567
name: replication
- containerPort: 4568
name: ist
volumeMounts:
- name: storage
mountPath: /data
readinessProbe:
exec:
command:
- /usr/share/container-scripts/mysql/readiness-probe.sh
initialDelaySeconds: 15
timeoutSeconds: 5
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
volumeClaimTemplates:
- metadata:
name: storage
annotations:
volume.beta.kubernetes.io/storage-class: all
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 12Gi
```
If this example is deployed it will automatically create one replica per AZ. This helps us a lot making HA databases.
Current storageclass for cinder is not perfect in case of statefulsets. Lets assume that cinder storageclass is defined to be in zone called nova, but because labels are not added to pv - pods can be started in any zone. The problem is that at least in our openstack it is not possible to use cinder drive located in zone x from zone y. However, should we have possibility to choose between cross-zone cinder mounts or not? Imo it is not good way of doing things that they mount volume from another zone where the pod is located(means more network traffic between zones)? What you think? Current new solution does not allow that anymore (should we have possibility to allow it? it means removing the labels from pv).
There might be some things that needs to be fixed still in this release and I need help for that. Some parts of the code is not perfect.
Issues what i am thinking about (I need some help for these):
1) Can everybody see in openstack what AZ their servers are? Can there be like access policy that do not show that? If AZ is not found from server specs, I have no idea how the code behaves.
2) In GetAllZones() function, is it really needed to make new serviceclient using openstack.NewComputeV2 or could I somehow use existing one
3) This fetches all servers from some openstack tenant(project). However, in some cases kubernetes is maybe deployed only to specific zone. If kube servers are located for instance in zone 1, and then there are another servers in same tenant in zone 2. There might be usecase that cinder drive is provisioned to zone-2 but it cannot start pod, because kubernetes does not have any nodes in zone-2. Could we have better way to fetch kubernetes nodes zones? Currently that information is not added to kubernetes node labels automatically in openstack (which should I think). I have added those labels manually to nodes. If that zone information is not added to nodes, the new solution does not start stateful pods at all, because it cannot target pods.
cc @rootfs @anguslees @jsafrane
```release-note
Default behaviour in cinder storageclass is changed. If availability is not specified, the zone is chosen by algorithm. It makes possible to spread stateful pods across many zones.
```
Automatic merge from submit-queue
apiserver: injectable default watch cache size
This makes it possible to override the default watch capacity in the REST options getter. Before this PR the default is written into the storage struct explicitly, and if it is the default, the REST options getter didn't know. With this the PR the default is applied late and can be injected from the outside.
Automatic merge from submit-queue
Remove leaked tmp file in unit tests
Some unit tests leave a temp file in work space:
pkg/util/iptables/xtables.lock
This patch remove that file
@dcbw
**Release note**:
```NONE
```
Automatic merge from submit-queue
add rootfs gnufied and childsb to volume approver
**What this PR does / why we need it**:
add me and @gnufied @childsb to volume approver
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Fixes support for multiple instances of loadBalancerSourceRanges.
Previously, the names of the rules for each address range conflicted
causing only one to be applied. Now each gets a unique name.
Automatic merge from submit-queue (batch tested with PRs 45018, 45330)
Add exponential backoff to openstack loadbalancer functions
Using exponential backoff to lower openstack load and reduce API call throttling
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45018, 45330)
Clean up for qos.go
**What this PR does / why we need it**:
Seems we are not using any of those functions.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#39148
**Release note**:
```release-note
A small clean up to remove unnecessary functions.
```
Automatic merge from submit-queue (batch tested with PRs 45200, 45203)
Allow certificate manager to be initialized with no certs.
Adds support to the certificate manager so it can be initialized with no
certs and only a connection to the certificate request signing API. This
specifically covers the scenario for the kubelet server certificate,
where there is a request signing client but on first boot there is no
bootstrapping or local certs.
Automatic merge from submit-queue (batch tested with PRs 45508, 44258, 44126, 45441, 45320)
Print a newline after ginkgo tests so the test infra doesn't think th…
Fixes#45279
Print a newline after ginkgo tests so the test infra doesn't think that they fail
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45508, 44258, 44126, 45441, 45320)
Use existing global var criSupportedLogDrivers
**What this PR does / why we need it**:
Use existing global var `criSupportedLogDrivers` defined in docker_service.go. If CRI supports other log drivers in the future, we will only need to modify that global var.
cc @Random-Liu
Automatic merge from submit-queue (batch tested with PRs 45508, 44258, 44126, 45441, 45320)
cloud initialize node in external cloud controller
@thockin This PR adds support in the `cloud-controller-manager` to initialize nodes (instead of kubelet, which did it previously)
This also adds support in the kubelet to skip node cloud initialization when `--cloud-provider=external`
Specifically,
Kubelet
1. The kubelet has a new flag called `--provider-id` which uniquely identifies a node in an external DB
2. The kubelet sets a node taint - called "ExternalCloudProvider=true:NoSchedule" if cloudprovider == "external"
Cloud-Controller-Manager
1. The cloud-controller-manager listens on "AddNode" events, and then processes nodes that starts with that above taint. It performs the cloud node initialization steps that were previously being done by the kubelet.
2. On addition of node, it figures out the zone, region, instance-type, removes the above taint and updates the node.
3. Then periodically queries the cloudprovider for node addresses (which was previously done by the kubelet) and updates the node if there are new addresses
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41903, 45311, 45474, 45472, 45501)
Adds a helper to convert componentconfig into a configmap
**What this PR does / why we need it**:
Adds a utility function that will be used by self-hosted components such as `kubeadm` but is also a step towards https://github.com/kubernetes/kubernetes/issues/44857
**Special notes for your reviewer**:
**Release note**:
```
NONE
```
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews @bsalamat
Automatic merge from submit-queue (batch tested with PRs 41903, 45311, 45474, 45472, 45501)
Display <none> when port is empty.
**What this PR does / why we need it**:
If container ports are not specified, `kubectl describe` displays `<none>` instead of empty.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 41903, 45311, 45474, 45472, 45501)
Fetch VM UUID from - /sys/class/dmi/id/product_serial
**What this PR does / why we need it**:
Current code fetch VM uuid using uuid reported at `'/sys/devices/virtual/dmi/id/product_uuid'.` This doesn't work with all the distros like Ubuntu 16.04 and Fedora.
updating code to fetch VM uuid from `/sys/class/dmi/id/product_serial`
**Which issue this PR fixes**
fixes #
**Special notes for your reviewer**:
Verified UUID is matching with VM UUID on ubuntu 16.04, Cent OS 7.3 , and Photon OS
@BaluDontu @tusharnt
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44727, 45409, 44968, 45122, 45493)
Separate healthz server from metrics server in kube-proxy
From #14661, proposal is on kubernetes/community#552.
Couple bullet points as in commit:
- /healthz will be served on 0.0.0.0:10256 by default.
- /metrics and /proxyMode will be served on port 10249 as before.
- Healthz handler will verify timestamp in iptables mode.
/assign @nicksardo @bowei @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
adds log when gpuManager.start() failed
If gpuManager.start() returns error, there is no log.
We confused with scheduler do not schedule any pod(with gpu) to one node.
kubectl describe node xxx shows there is no gpu on that node, because the gpu driver do not work on that node, gpuManager.start() failed, but we can not see anything in log.
Automatic merge from submit-queue
Remove unnecessary constants and add type to secret
**What this PR does / why we need it**:
Adds the type field to the secret for the `persistent-volume-provisioning` example of Quobyte. Also remove unnecessary constants in Quobyte Code base.
FYI
@rootfs @saad-ali @quolix
Automatic merge from submit-queue
refactor names for the apiserver handling chain
The names and structure around the handling chain got a bit confused. This simplifies it back out into a single struct with three parts: overall handler, gorestful handler, pathrecording mux and makes the delegate wiring simpler
Automatic merge from submit-queue
Clean up petset
**What this PR does / why we need it**:
Rename legacy petset to statefulset.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
util/iptables: grab iptables locks if iptables-restore doesn't support --wait
When iptables-restore doesn't support --wait (which < 1.6.2 don't), it may
conflict with other iptables users on the system, like docker, because it
doesn't acquire the iptables lock before changing iptables rules. This causes
sporadic docker failures when starting containers.
To ensure those don't happen, essentially duplicate the iptables locking
logic inside util/iptables when we know iptables-restore doesn't support
the --wait option.
Unfortunately iptables uses two different locking mechanisms, one until
1.4.x (abstract socket based) and another from 1.6.x (/run/xtables.lock
flock() based). We have to grab both locks, because we don't know what
version of iptables-restore exists since iptables-restore doesn't have
a --version option before 1.6.2. Plus, distros (like RHEL) backport the
/run/xtables.lock patch to 1.4.x versions.
Related: https://github.com/kubernetes/kubernetes/pull/43575
See also: https://github.com/openshift/origin/pull/13845
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1417234
@kubernetes/rh-networking @kubernetes/sig-network-misc @eparis @knobunc @danwinship @thockin @freehan