Commit Graph

37 Commits (c8278053cbff2488067845eba1269fa9ecbb9b18)

Author SHA1 Message Date
Brad Davidson 3d2fabb013 Add loadbalancer metrics
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-06 11:45:34 -08:00
Brad Davidson 911ee19a93 Refactor load balancer server list and health checking
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-06 11:45:34 -08:00
Brad Davidson 67fd5fa9e5 Separate persistent config struct from LoadBalancer and make fields private
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-06 11:45:34 -08:00
Brad Davidson 13e9113787 Move http/socks proxy stuff to separate file
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-06 11:45:34 -08:00
Brad Davidson cd4ddedbc9 Fix issue with loadbalancer failover to default server
The loadbalancer should only fail over to the default server if all other server have failed, and it should force fail-back to a preferred server as soon as one passes health checks.

The loadbalancer tests have been improved to ensure that this occurs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-11-13 19:41:45 -08:00
Brad Davidson 1ae9ca73f5 Update tcpproxy for import path change
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-10-09 11:46:08 -07:00
Brad Davidson cb6bf74bc4 Add dial duration to debug error message
This should give us more detail on how long dials take before failing, so that we can perhaps better tune the retry loop in the future.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 09:46:52 -07:00
Brad Davidson 9d0c2e0000 Fix reentrant rlock in loadbalancer.dialContext
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 09:46:52 -07:00
Brad Davidson c0450a2cb4 Fix race condition panic in loadbalancer.nextServer
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-07 07:39:48 -07:00
Brad Davidson 1661f1024a Fix bug that caused agents to bypass local loadbalancer
If proxy.SetAPIServerPort was called multiple times, all calls after the
first one would cause the apiserver address to be set to the default
server address, bypassing the local load-balancer. This was most likely
to occur on RKE2, where the supervisor may be up for a period of time
before it is ready to manage node password secrets, causing the agent
to retry.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-04 11:18:45 -07:00
Brad Davidson 307f07bd61 Fix issue caused by sole server marked as failed under load
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-30 11:47:23 -07:00
Brad Davidson c51d7bfbd1 Add health-check support to loadbalancer
* Adds support for health-checking loadbalancer servers. If a
  health-check fails when dialing, all existing connections to the
  server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
  for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
  check for etcd connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:50:27 -07:00
Brad Davidson c87e6e5f7e Move proxy dialer out of init() and fix crash
* Fixes issue where proxy support only honored server address via K3S_URL, not CLI or config.
* Fixes crash when agent proxy is enabled, but proxy env vars do not return a proxy URL for the server address (server URL is in NO_PROXY list).
* Adds tests

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-11 16:12:15 -08:00
Pierre bbd68f3a50
Rebase & Squash (#9070)
Signed-off-by: Yodo <pierre@azmed.co>
2024-01-02 12:05:36 -08:00
Brad Davidson ece4d8e45c Fix tests to not hide failure location in dummp assert functions
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-04 12:02:22 -07:00
Brad Davidson e54ceaa497 Fix issue with stale connections to removed LB server
Track LB connections through each server so that they can be closed when it is removed.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-04 12:02:22 -07:00
Derek Nola 06d81cb936
Replace deprecated ioutil package (#6230)
* Replace ioutil package
* check integration test null pointer
* Remove rotate retries

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-10-07 17:36:57 -07:00
Brad Davidson c8447dca56 Bump golang to 1.18.1
Also update all use of 'go get' => 'go install', update CI tooling for
1.18 compatibility, and gofmt everything so lint passes.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-11 14:39:07 -07:00
Brad Davidson b12cd62935 Move IPv4/v6 selection into helpers
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-15 01:02:42 -07:00
Dirk Müller fa0fa8b1d0 Update golangci-lint to 1.45.2
This requires a further set of gofmt -s improvements to the
code, but nothing major. golangci-lint 1.45.2 brings golang 1.18
support which might be needed in the future.

Signed-off-by: Dirk Müller <dirk@dmllr.de>
2022-04-13 14:48:42 -07:00
Roberto Bonafiglia 06c779c57d Fixed loadbalancer in case of IPv6 addresses
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-31 11:49:30 +02:00
Luther Monson 9a849b1bb7
[master] changing package to k3s-io (#4846)
* changing package to k3s-io

Signed-off-by: Luther Monson <luther.monson@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2022-03-02 15:47:27 -08:00
Brad Davidson 5014c9e0e8 Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 19:56:08 -08:00
Brad Davidson e95b75409a Fix lint failures
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-08-20 18:47:16 -07:00
Hussein Galal e322924781
Reset load balancer state during restoraion (#3877)
* Reset load balancer state during restoraion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Reset load balancer state during restoraion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-08-18 01:02:30 +02:00
Derek Nola 21c8a33647
Introduction of Integration Tests (#3695)
* Commit of new etcd snapshot integration tests.
* Updated integration github action to not run on doc changes.
* Update Drone runner to only run unit tests

Signed-off-by: dereknola <derek.nola@suse.com>
2021-07-26 09:59:33 -07:00
William Zhang a4c992ce52 🐳 burp to inetaf/tcpproxy
Problem:
    tcpproxy repository has been moved out of the github.com/google org to github.com/inetaf.

    Solution:
    Switch to the new repo.
    FYI: https://godoc.org/inet.af/tcpproxy/

Signed-off-by: William Zhang <warmchang@outlook.com>
2021-07-08 16:58:09 -07:00
Jamie Phillips 7345ac35ae
Initial windows support for agent (#3375)
Signed-off-by: Jamie Phillips <jamie.phillips@suse.com>
2021-06-01 12:29:46 -07:00
Brad Davidson c0d129003b Handle loadbalancer port in TIME_WAIT
If the port wanted by the client load balancer is in TIME_WAIT, startup
will fail. Set SO_REUSEPORT so that it can be listened on again
immediately.

The configurable Listen call wants a context, so plumb that through as
well.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-03-08 17:05:25 -08:00
Brad Davidson 7cdfaad6ce
Always use static ports for client load-balancers (#3026)
* Always use static ports for the load-balancers

This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.

This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.

Additional fixes below were required to successfully test and use this change
on etcd-only nodes.

* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
2021-03-06 02:29:57 -08:00
Hussein Galal 5749f66aa3
Add disable flags for control components (#2900)
* Add disable flags to control components

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes to disable flags

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add comments to functions

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix joining problem

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix ticker

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix role labels

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-02-12 17:35:57 +02:00
Erik Wilson 992ca52c31
Enable go test in ci 2020-11-05 09:48:53 -07:00
Erik Wilson 720197b9b1
Fix linting issues 2020-08-28 17:18:29 -07:00
Darren Shepherd 7e59c0801e Make program name a variable to be changed at compile time 2020-06-06 16:39:41 -07:00
Darren Shepherd 2f5ee914f9 Add supervisor port
In k3s today the kubernetes API and the /v1-k3s API are combined into
one http server.  In rke2 we are running unmodified, non-embedded Kubernetes
and as such it is preferred to run k8s and the /v1-k3s API on different
ports.  The /v1-k3s API port is called the SupervisorPort in the code.

To support this separation of ports a new shim was added on the client in
then pkg/agent/proxy package that will launch two load balancers instead
of just one load balancer.  One load balancer for 6443 and the other
for 9345 (which is the supervisor port).
2020-05-05 15:54:51 -07:00
Erik Wilson 98254a3412 Change load balancer logging to debug 2019-08-08 10:48:11 -07:00
Erik Wilson a17e336993 Use go tcpproxy 2019-07-30 09:53:15 -07:00