If health checks are failing for all servers, make a second pass through the server list with health checks ignored before returning failure
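A minimal sketch of that two-pass dial loop; the `server` type and `dial` helper here are illustrative assumptions, not the actual k3s loadbalancer types:

```go
package loadbalancer

import (
	"context"
	"errors"
	"net"
)

// server is a hypothetical stand-in for a load balancer backend.
type server struct {
	address string
	healthy bool
}

func (s *server) dial(ctx context.Context) (net.Conn, error) {
	var d net.Dialer
	return d.DialContext(ctx, "tcp", s.address)
}

// dialAny makes a first pass over only the servers whose health checks are
// passing; if that yields no connection, a second pass ignores health-check
// state entirely, in case the checks themselves are failing while a server
// is in fact still reachable. Only after both passes fail is an error returned.
func dialAny(ctx context.Context, servers []*server) (net.Conn, error) {
	for _, s := range servers {
		if !s.healthy {
			continue
		}
		if conn, err := s.dial(ctx); err == nil {
			return conn, nil
		}
	}
	for _, s := range servers {
		if conn, err := s.dial(ctx); err == nil {
			return conn, nil
		}
	}
	return nil, errors.New("all servers failed")
}
```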
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit ca39614d4e)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Adds support for health-checking load balancer servers. If a
health check fails when dialing, all existing connections to that
server will be closed (a sketch follows this list).
* Wires up a remotedialer tunnel connectivity check as the health check
for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
check for etcd connections.
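A rough sketch of the dial-time check and the supervisor ping; the type,
field names, `/ping` path, and TLS settings are assumptions for illustration:

```go
package loadbalancer

import (
	"crypto/tls"
	"net"
	"net/http"
	"time"
)

// server is a hypothetical backend with a pluggable health check and the
// connections that were dialed through the load balancer.
type server struct {
	healthCheck func() bool
	conns       []net.Conn
}

// checkHealth runs the health check before a dial; on failure it closes all
// existing connections so clients reconnect through a healthy server.
func (s *server) checkHealth() bool {
	if s.healthCheck() {
		return true
	}
	for _, conn := range s.conns {
		conn.Close()
	}
	s.conns = nil
	return false
}

// supervisorPing is the style of check used for etcd connections: a simple
// HTTP GET against the supervisor port, passing if it answers at all.
func supervisorPing(address string) bool {
	client := &http.Client{
		Timeout: 3 * time.Second,
		Transport: &http.Transport{
			// Skipping verification is for illustration only.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get("https://" + address + "/ping")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}
```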
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If the port wanted by the client load balancer is in TIME_WAIT, startup
will fail. Set SO_REUSEPORT so that the port can be listened on again
immediately.
The configurable Listen call wants a context, so plumb that through as
well.
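A sketch of the standard Go pattern for this, using net.ListenConfig with a
Control hook to set SO_REUSEPORT (Linux) and threading the context through
to Listen; the function name is an assumption:

```go
package loadbalancer

import (
	"context"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// reusePortListen opens a TCP listener with SO_REUSEPORT set, so the load
// balancer can re-bind its port immediately even if a previous listener
// left sockets in TIME_WAIT. The context is passed through to Listen.
func reusePortListen(ctx context.Context, addr string) (net.Listener, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var ctlErr error
			err := c.Control(func(fd uintptr) {
				ctlErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			})
			if err != nil {
				return err
			}
			return ctlErr
		},
	}
	return lc.Listen(ctx, "tcp", addr)
}
```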
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Always use static ports for the load-balancers
This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver after RKE2 was restarted, because the
load-balancer used a different port each time it started up.
This also changes the apiserver load-balancer port to be one below the
supervisor port instead of one above it, which keeps the apiserver port
consistent at 6443 across servers and agents on RKE2.
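Illustratively, assuming the local supervisor load-balancer listens on 6444
(constant names here are assumptions):

```go
package loadbalancer

// Deriving the apiserver load-balancer port as one below the supervisor
// load-balancer port lands on 6443, the same port the apiserver uses
// everywhere else.
const (
	supervisorLBPort = 6444
	apiserverLBPort  = supervisorLBPort - 1 // 6443
)
```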
Additional fixes below were required to successfully test and use this change
on etcd-only nodes.
* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes