Automatic merge from submit-queue
Fix the issue in Windows kube-proxy when processing unqualified name. This is for DNS client such as ping or iwr that validate name in response and original question.
**What this PR does / why we need it**:
This PR is an additional fix to #41618 and [the corresponding commit](b9dfb69dd7). The DNS client such as nslookup does not validate name matching in response and original question. That works fine when we append DNS suffix to unqualified name in DNS query in Windows kube-proxy. However, for DNS client such as ping or Invoke-WebRequest that validates name in response and original question, the issue arises and the DNS query fails although the received DNS response has no error.
This PR fixes the additional issue by restoring the original question name in DNS response. Further, this PR refactors DNS message routines by using miekg's DNS library.
This PR affects the Windows kube-proxy only.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#42605
**Special notes for your reviewer**:
**Release note**:
```release-note
Fix DNS suffix search list support in Windows kube-proxy.
```
Automatic merge from submit-queue (batch tested with PRs 45953, 45889)
Add /metrics and profiling handlers to kube-proxy
Also expose "syncProxyRules latency" as a prometheus metrics.
Fix https://github.com/kubernetes/kubernetes/issues/45876
Automatic merge from submit-queue (batch tested with PRs 45623, 45241, 45460, 41162)
Promotes Source IP preservation for Virtual IPs from Beta to GA
Fixes#33625. Feature issue: kubernetes/features#27.
Bullet points:
- Declare 2 fields (ExternalTraffic and HealthCheckNodePort) that mirror the ESIPP annotations.
- ESIPP alpha annotations will be ignored.
- Existing ESIPP beta annotations will still be fully supported.
- Allow promoting beta annotations to first class fields or reversely.
- Disallow setting invalid ExternalTraffic and HealthCheckNodePort on services. Default ExternalTraffic field for nodePort or loadBalancer type service to "Global" if not set.
**Release note**:
```release-note
Promotes Source IP preservation for Virtual IPs to GA.
Two api fields are defined correspondingly:
- Service.Spec.ExternalTrafficPolicy <- 'service.beta.kubernetes.io/external-traffic' annotation.
- Service.Spec.HealthCheckNodePort <- 'service.beta.kubernetes.io/healthcheck-nodeport' annotation.
```
Automatic merge from submit-queue (batch tested with PRs 45685, 45572, 45624, 45723, 45733)
Remove reasons from iptables syncProxyRules
The reasons are no longer useful, since we know if something changed anyway, I think.
Automatic merge from submit-queue (batch tested with PRs 45571, 45657, 45638, 45663, 45622)
Use real proxier inside hollow-proxy but with mocked syscalls
Fixes https://github.com/kubernetes/kubernetes/issues/43701
This should make hollow-proxy better mimic the real kube-proxy in performance.
Maybe next we should have a more realistic implementation even for fake iptables (adding/updating/deleting rules/chains in an table, just not on the real one)? Though I'm not sure how important it is.
cc @kubernetes/sig-scalability-misc @kubernetes/sig-network-misc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 44727, 45409, 44968, 45122, 45493)
Separate healthz server from metrics server in kube-proxy
From #14661, proposal is on kubernetes/community#552.
Couple bullet points as in commit:
- /healthz will be served on 0.0.0.0:10256 by default.
- /metrics and /proxyMode will be served on port 10249 as before.
- Healthz handler will verify timestamp in iptables mode.
/assign @nicksardo @bowei @thockin
**Release note**:
```release-note
NONE
```
- /healthz will be served on 0.0.0.0:10256 by default.
- /metrics and /proxyMode will be served on port 10249
as before.
- Healthz handler will verify timestamp in iptables mode.
Automatic merge from submit-queue (batch tested with PRs 44590, 44969, 45325, 45208, 44714)
Fix onlylocal endpoint's healthcheck nodeport logic
I was in the middle of rebasing #41162, surprisingly found the healthcheck nodeport logic in kube-proxy is still buggy. Separate this fix out as it isn't GA related.
/assign @freehan @thockin
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Edge-based userspace LB in kube-proxy
@thockin @bowei - if one of you could take a look if that PR doesn't break some basic kube-proxy assumptions. The similar change for winuserproxy should be pretty trivial.
And we should also do that for iptables, but that requires splitting the iptables code to syncProxyRules (which from what I know @thockin already started working on so we should probably wait for it to be done).
Automatic merge from submit-queue
Restore "Setting endpoints" log message
**What this PR does / why we need it**:
The "Setting endpoints" message from kube-proxy at high verbosity was
lost as part of a larger simplification in kubernetes/kubernetes#42747.
This change brings it back, simply outputting the just-constructed
addresses list.
I need this message to monitor delays in propagating endpoints changes across nodes.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 43871, 44053)
Proxy healthchecks overhaul
The first commit is #44051
These three commits are tightly coupled, but should be reviewed one-by-one. The first adds tests for healthchecks, and found a bug. The second basically rewrites the healthcheck pkg to be much simpler and less flexible (since we weren't using the flexibility). The third tweaks how healthchecks are handled in endpoints-path to be more like they are in services-path.
@MrHohn because I know you were in here for source-IP GA work.
@wojtek-t
The "Setting endpoints" message from kube-proxy at high verbosity was
lost as part of a larger simplification in kubernetes/kubernetes#42747.
This change brings it back, simply outputting the just-constructed
addresses list.
Automatic merge from submit-queue
kube-proxy: filter INPUT as well as OUTPUT
We need to apply filter rules on the way in (nodeports) and out (cluster
IPs). Testing here is insufficient to have caught this - will come back
for that.
Fixes#43969
@justinsb since you have the best repro, can you test? It passes what I think is repro.
@ethernetdan we will want this in 1.6.x
```release-note
Fix bug with service nodeports that have no backends not being rejected, when they should be. This is not a regression vs v1.5 - it's a fix that didn't quite fix hard enough.
```
The existing healthcheck lib was pretty complicated and was hiding some
bugs (like the count always being 1), This is a reboot of the interface
and implementation to be significantly simpler and better tested.
Automatic merge from submit-queue
Use shared informers for proxy endpoints and service configs
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
Follow-up to #43295 cc @wojtek-t
Will race with #43937 for conflicting changes 😄 cc @thockin
cc @smarterclayton @sttts @liggitt @deads2k @derekwaynecarr @eparis @kubernetes/rh-cluster-infra
Adding test cases for HC updates found a bug with an update that
simultaneously removes one port and adds another. Map iteration is
randomized, so sometimes no HC would be created.
Use shared informers instead of creating local controllers/reflectors
for the proxy's endpoints and service configs. This allows downstream
integrators to pass in preexisting shared informers to save on memory &
cpu usage.
This also enables the cache mutation detector for kube-proxy for those
presubmit jobs that already turn it on.
We need to apply filter rules on the way in (nodeports) and out (cluster
IPs). Testing here is insufficient to have caught this - will come back
for that.
We don't need the svcPortToInfoMap. Its only purpose was to
send "valid" local endpoints (those with valid IP and >0 port) to the
health checker. But we shouldn't be sending invalid endpoints to
the health checker anyway, because it can't do anything with them.
If we exclude invalid endpoints earlier, then we don't need
flattenValidEndpoints().
And if we don't need flattenValidEndpoints() it makes no sense to have
svcPortToInfoMap store hostPortInfo, since endpointsInfo is the same
thing as hostPortInfo except with a combined host:port.
And if svcPortToInfoMap now only stores valid endpointsInfos, it is
exactly the same thing as newEndpoints.
Automatic merge from submit-queue (batch tested with PRs 41672, 42084, 42233, 42165, 42273)
Don't sync IPtables before underlying store/reflector is fully synced
Ref #42108
Build on top of #42108 - only the second commit is unique.
Automatic merge from submit-queue
Extensible Userspace Proxy
This PR refactors the userspace proxy to allow for custom proxy socket implementations.
It changes the the ProxySocket interface to ensure that other packages can properly implement it (making sure all arguments are publicly exposed types, etc), and adds in a mechanism for an implementation to create an instance of the userspace proxy with a non-standard ProxySocket.
Custom ProxySockets are useful to inject additional logic into the actual proxying. For example, our idling proxier uses a custom proxy socket to hold connections and notify the cluster that idled scalable resources need to be woken up.
Also-Authored-By: Ben Bennett bbennett@redhat.com
Automatic merge from submit-queue (batch tested with PRs 42316, 41618, 42201, 42113, 42191)
Support unqualified and partially qualified domain name in DNS query in Windows kube-proxy
**What this PR does / why we need it**:
In Windows container networking, --dns-search is not currently supported on Windows Docker. Besides, even with --dns-suffix, inside Windows container DNS suffix is not appended to DNS query names. That makes unqualified domain name or partially qualified domain name in DNS query not able to resolve.
This PR provides a solution to resolve unqualified domain name or partially qualified domain name in DNS query for Windows container in Windows kube-proxy. It uses well-known Kubernetes DNS suffix as well host DNS suffix search list to append to the name in DNS query. DNS packet in kube-proxy UDP stream is modified as appropriate.
This PR affects the Windows kube-proxy only.
**Special notes for your reviewer**:
This PR is based on top of Anthony Howe's commit 48647fb, 0e37f0a and 7e2c71f which is already included in the PR 41487. Please only review commit b9dfb69.
**Release note**:
```release-note
Add DNS suffix search list support in Windows kube-proxy.
```
Automatic merge from submit-queue (batch tested with PRs 42200, 39535, 41708, 41487, 41335)
Update kube-proxy support for Windows
**What this PR does / why we need it**:
The kube-proxy is built upon the sophisticated iptables NAT rules. Windows does not have an equivalent capability. This introduces a change to the architecture of the user space mode of the Windows version of kube-proxy to match the capabilities of Windows.
The proxy is organized around service ports and portals. For each service a service port is created and then a portal, or iptables NAT rule, is opened for each service ip, external ip, node port, and ingress ip. This PR merges the service port and portal into a single concept of a "ServicePortPortal" where there is one connection opened for each of service IP, external ip, node port, and ingress IP.
This PR only affects the Windows kube-proxy. It is important for the Windows kube-proxy because it removes the limited portproxy rule and RRAS service and enables full tcp/udp capability to services.
**Special notes for your reviewer**:
**Release note**:
```
Add tcp/udp userspace proxy support for Windows.
```
This changes the userspace proxy so that it cleans up its conntrack
settings when a service is removed (as the iptables proxy already
does). This could theoretically cause problems when a UDP service
as deleted and recreated quickly (with the same IP address). As
long as packets from the same UDP source IP and port were going to
the same destination IP and port, the the conntrack would apply and
the packets would be sent to the old destination.
This is astronomically unlikely if you did not specify the IP address
to use in the service, and even then, only happens with an "established"
UDP connection. However, in cases where a service could be "switched"
between using the iptables proxy and the userspace proxy, this case
becomes much more frequent.
This commit makes the userspace proxy keep an ObjectReference to the
service being proxied. This allows the consumers of the `ServiceInfo`
struct, like `ProxySockets` to emit events about or otherwise refer to
the service.
This commit adds a new method for constructing userspace proxiers,
`NewCustomProxier`. `NewCustomProxier` functions identically to
`NewProxier`, except that it allows a custom constructor method to
be passed in to construct instances of ProxySocket.
This commit makes it possible for the `ProxySocket` interface to be
implemented by types outside of the `userspace` package. It mainly just
exposes relevant types and fields as public.
This commit adds a method to the `LoadBalancer` interface in the
userspace proxy which allows consumers of the `LoadBalancer` to check if
it thinks a given service has endpoints available.