Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Add required family flag for conntrack IPv6 operation
This change causes kube-proxy to supply the required "-f ipv6"
family flag whenever the conntrack utility is executed and the
associated service is using IPv6.
This change is required for IPv6-only operation.
Note that unit test coverage for the 2-line changes in
pkg/proxy/iptables/proxier.go and /pkg/proxy/ipvs/proxier.go will need
to be added after support for IPv6 service addresses is added to these
files. For pkg/proxy/iptables/proxier.go, this coverage will be added
either with PR #48551.
fixes#52027
**What this PR does / why we need it**:
Kube-proxy is currently not supplying the required "-f ipv6" family flag whenever it
calls the conntrack utility and the associated service is using an IPv6 service IP address.
This means that for IPv6-only operation, conntrack is not properly cleaning up
stale UDP connections, and this may be effecting ip6tables operation.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # 52027
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
The following changes are proposed for the iptables proxier:
* There are three places where a string specifying IP:port is parsed
using something like this:
if index := strings.Index(e.endpoint, ":"); index != -1 {
This will fail for IPv6 since V6 addresses contain colons. Also,
the V6 address is expected to be surrounded by square brackets
(i.e. []:). Fix this by replacing call to Index with
call to LastIndex() and stripping out square brackets.
* The String() method for the localPort struct should put square brackets
around IPv6 addresses.
* The logging in the merge() method for proxyServiceMap should put brackets
around IPv6 addresses.
* There are several places where filterRules destination is hardcoded to
/32. This should be a /128 for IPv6 case.
* Add IPv6 unit test cases
fixes#48550
This change causes kube-proxy to supply the required "-f ipv6"
family flag whenever the conntrack utility is executed and the
associated service is using IPv6.
This change is required for IPv6-only operation.
Note that unit test coverage for the 2-line changes in
pkg/proxy/iptables/proxier.go and /pkg/proxy/ipvs/proxier.go will need
to be added after support for IPv6 service addresses is added to these
files. For pkg/proxy/iptables/proxier.go, this coverage will be added
either with PR #48551.
fixes#52027
Automatic merge from submit-queue (batch tested with PRs 49850, 47782, 50595, 50730, 51341)
Paramaterize `stickyMaxAgeMinutes` for service in API
**What this PR does / why we need it**:
Currently I find `stickyMaxAgeMinutes` for a session affinity type service is hard code to 180min. There is a TODO comment, see
https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L205
I think the seesion sticky max time varies from service to service and users may not aware of it since it's hard coded in all proxier.go - iptables, userspace and winuserspace.
Once we parameterize it in API, users can set/get the values for their different services.
Perhaps, we can introduce a new field `api.ClientIPAffinityConfig` in `api.ServiceSpec`.
There is an initial discussion about it in sig-network group. See,
https://groups.google.com/forum/#!topic/kubernetes-sig-network/i-LkeHrjs80
**Which issue this PR fixes**:
fixes#49831
**Special notes for your reviewer**:
**Release note**:
```release-note
Paramaterize session affinity timeout seconds in service API for Client IP based session affinity.
```
Automatic merge from submit-queue
Log abridged set of rules at v2 in kube-proxy on error
**What this PR does / why we need it**:
this is a follow-on to https://github.com/kubernetes/kubernetes/pull/48085
**Special notes for your reviewer**:
we hit this in operations where we typically run in v2, and would like to log abridged set of output rather than full output.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Raise a warning instead of info if br-netfilter is missing or unset
Took quite a while to figure out why service VIP is unreachable on my cluster. It turns out br-nf-call-iptables is unset. I wish this message could be a warning to attract considerable attention.
Automatic merge from submit-queue (batch tested with PRs 45953, 45889)
Add /metrics and profiling handlers to kube-proxy
Also expose "syncProxyRules latency" as a prometheus metrics.
Fix https://github.com/kubernetes/kubernetes/issues/45876
Automatic merge from submit-queue (batch tested with PRs 45623, 45241, 45460, 41162)
Promotes Source IP preservation for Virtual IPs from Beta to GA
Fixes#33625. Feature issue: kubernetes/features#27.
Bullet points:
- Declare 2 fields (ExternalTraffic and HealthCheckNodePort) that mirror the ESIPP annotations.
- ESIPP alpha annotations will be ignored.
- Existing ESIPP beta annotations will still be fully supported.
- Allow promoting beta annotations to first class fields or reversely.
- Disallow setting invalid ExternalTraffic and HealthCheckNodePort on services. Default ExternalTraffic field for nodePort or loadBalancer type service to "Global" if not set.
**Release note**:
```release-note
Promotes Source IP preservation for Virtual IPs to GA.
Two api fields are defined correspondingly:
- Service.Spec.ExternalTrafficPolicy <- 'service.beta.kubernetes.io/external-traffic' annotation.
- Service.Spec.HealthCheckNodePort <- 'service.beta.kubernetes.io/healthcheck-nodeport' annotation.
```
Automatic merge from submit-queue (batch tested with PRs 45685, 45572, 45624, 45723, 45733)
Remove reasons from iptables syncProxyRules
The reasons are no longer useful, since we know if something changed anyway, I think.
Automatic merge from submit-queue (batch tested with PRs 44727, 45409, 44968, 45122, 45493)
Separate healthz server from metrics server in kube-proxy
From #14661, proposal is on kubernetes/community#552.
Couple bullet points as in commit:
- /healthz will be served on 0.0.0.0:10256 by default.
- /metrics and /proxyMode will be served on port 10249 as before.
- Healthz handler will verify timestamp in iptables mode.
/assign @nicksardo @bowei @thockin
**Release note**:
```release-note
NONE
```
- /healthz will be served on 0.0.0.0:10256 by default.
- /metrics and /proxyMode will be served on port 10249
as before.
- Healthz handler will verify timestamp in iptables mode.
Automatic merge from submit-queue
Edge-based userspace LB in kube-proxy
@thockin @bowei - if one of you could take a look if that PR doesn't break some basic kube-proxy assumptions. The similar change for winuserproxy should be pretty trivial.
And we should also do that for iptables, but that requires splitting the iptables code to syncProxyRules (which from what I know @thockin already started working on so we should probably wait for it to be done).
Automatic merge from submit-queue
Restore "Setting endpoints" log message
**What this PR does / why we need it**:
The "Setting endpoints" message from kube-proxy at high verbosity was
lost as part of a larger simplification in kubernetes/kubernetes#42747.
This change brings it back, simply outputting the just-constructed
addresses list.
I need this message to monitor delays in propagating endpoints changes across nodes.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43871, 44053)
Proxy healthchecks overhaul
The first commit is #44051
These three commits are tightly coupled, but should be reviewed one-by-one. The first adds tests for healthchecks, and found a bug. The second basically rewrites the healthcheck pkg to be much simpler and less flexible (since we weren't using the flexibility). The third tweaks how healthchecks are handled in endpoints-path to be more like they are in services-path.
@MrHohn because I know you were in here for source-IP GA work.
@wojtek-t
The "Setting endpoints" message from kube-proxy at high verbosity was
lost as part of a larger simplification in kubernetes/kubernetes#42747.
This change brings it back, simply outputting the just-constructed
addresses list.
Automatic merge from submit-queue
kube-proxy: filter INPUT as well as OUTPUT
We need to apply filter rules on the way in (nodeports) and out (cluster
IPs). Testing here is insufficient to have caught this - will come back
for that.
Fixes#43969
@justinsb since you have the best repro, can you test? It passes what I think is repro.
@ethernetdan we will want this in 1.6.x
```release-note
Fix bug with service nodeports that have no backends not being rejected, when they should be. This is not a regression vs v1.5 - it's a fix that didn't quite fix hard enough.
```
The existing healthcheck lib was pretty complicated and was hiding some
bugs (like the count always being 1), This is a reboot of the interface
and implementation to be significantly simpler and better tested.
Adding test cases for HC updates found a bug with an update that
simultaneously removes one port and adds another. Map iteration is
randomized, so sometimes no HC would be created.
We need to apply filter rules on the way in (nodeports) and out (cluster
IPs). Testing here is insufficient to have caught this - will come back
for that.
We don't need the svcPortToInfoMap. Its only purpose was to
send "valid" local endpoints (those with valid IP and >0 port) to the
health checker. But we shouldn't be sending invalid endpoints to
the health checker anyway, because it can't do anything with them.
If we exclude invalid endpoints earlier, then we don't need
flattenValidEndpoints().
And if we don't need flattenValidEndpoints() it makes no sense to have
svcPortToInfoMap store hostPortInfo, since endpointsInfo is the same
thing as hostPortInfo except with a combined host:port.
And if svcPortToInfoMap now only stores valid endpointsInfos, it is
exactly the same thing as newEndpoints.
This changes the userspace proxy so that it cleans up its conntrack
settings when a service is removed (as the iptables proxy already
does). This could theoretically cause problems when a UDP service
as deleted and recreated quickly (with the same IP address). As
long as packets from the same UDP source IP and port were going to
the same destination IP and port, the the conntrack would apply and
the packets would be sent to the old destination.
This is astronomically unlikely if you did not specify the IP address
to use in the service, and even then, only happens with an "established"
UDP connection. However, in cases where a service could be "switched"
between using the iptables proxy and the userspace proxy, this case
becomes much more frequent.
This makes it more obvious that they run together and makes the upcoming
rate-limited syncs easier.
Also make test use ints for ports, so it's easier to see when a port is
a literal value vs a name.