Commit Graph

439 Commits (942a51109ad2c36ddba7877e7a008af108791902)

Author SHA1 Message Date
Brad Davidson 942a51109a Separate persistent config struct from LoadBalancer and make fields private
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 67fd5fa9e5)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-10 17:02:07 -08:00
Brad Davidson c4bee6fa8e Move http/socks proxy stuff to separate file
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 13e9113787)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-10 17:02:07 -08:00
Brad Davidson 51404d0e83 Fix issue with loadbalancer failover to default server
The loadbalancer should only fail over to the default server if all other server have failed, and it should force fail-back to a preferred server as soon as one passes health checks.

The loadbalancer tests have been improved to ensure that this occurs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-11-14 08:19:39 -08:00
Brad Davidson e08085f1e9 Add nonroot-devices flag to agent CLI
Add new flag that is passed through to the device_ownership_from_security_context parameter in the containerd CRI config. This is not possible to change without providing a complete custom containerd.toml template so we should add a flag for it.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 56fb3b0991)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-11-06 12:41:11 -08:00
manuelbuil 29fd916cc9 Add the nvidia runtime cdi
Signed-off-by: manuelbuil <mbuil@suse.com>
2024-10-12 07:37:48 +02:00
Brad Davidson 72b0eb5f5a Update tcpproxy for import path change
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 1ae9ca73f5)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-10-10 11:40:39 -07:00
Vitor Savian 0a2b383a32 Add user path to runtimes search
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-10-08 13:19:18 -03:00
Brad Davidson ca84f13846 Fix hosts.toml header var
Resolves issue from 270f85e468 that prevented old hosts.toml files from being cleaned up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-09-10 15:00:04 -07:00
Brad Davidson 9e06189a7c Only clean up containerd hosts dirs managed by k3s
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 270f85e468)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-09-06 11:30:23 -07:00
Derek Nola 6965b5d1a9 Allow Pprof and Superisor metrics in standalone mode (#10576)
* Allow pprof to run on server with `--disable-agent`
* Allow supervisor metrics to run on server with `--disable-agent`

Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-08-06 08:51:16 -07:00
Brad Davidson 7190c74acc
[release-1.30] Backports for 2024-08 release cycle (#10664)
* Use pagination when retrieving etcd snapshot list

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit c2216a62ad)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Update secretsencrypt pagination

Make secretsencrypt page size and iteration consistent with other paginators

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 891e72f90f)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Cap length of generated name used for servicelb daemonset

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 21611c5665)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Fix ipv6 sysctl required by non-ipv6 LoadBalancer service

This is a partial revert of 095ecdb034,
with the workaround moved into klipper-lb.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit d4c3422a85)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* remove deprecated use of wait functions

Signed-off-by: Will <will7989@hotmail.com>
(cherry picked from commit e4f3cc7b54)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Update pkg/secretsencrypt/config.go

Co-authored-by: Brad Davidson <brad@oatmail.org>
Signed-off-by: Will Andrews <will7989@hotmail.com>
(cherry picked from commit 3ec086f6f7)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Update pkg/cluster/managed.go

Co-authored-by: Derek Nola <derek.nola@suse.com>
Signed-off-by: Will Andrews <will7989@hotmail.com>
(cherry picked from commit e2179aa957)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Wire lasso metrics up to common gatherer

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit e168438d44)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Fix cloudprovider controller name

Looking at metrics revealed the cloudprovider controller name was anempty string.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit bffdf463e1)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

---------

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Signed-off-by: Will <will7989@hotmail.com>
Signed-off-by: Will Andrews <will7989@hotmail.com>
Co-authored-by: Will <will7989@hotmail.com>
Co-authored-by: Derek Nola <derek.nola@suse.com>
2024-08-05 09:35:00 -07:00
Brad Davidson 6cb63dd766 Add dial duration to debug error message
This should give us more detail on how long dials take before failing, so that we can perhaps better tune the retry loop in the future.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Brad Davidson 280aaa3a79 Fix IPv6 primary node-ip handling
I should have caught `[]string{cfg.NodeIP}[0]` and `[]string{envInfo.NodeIP.String()}[0]` in code review...

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Brad Davidson 97250ad656 Fix agents removing configured supervisor address
We shouldn't be replacing the configured server address on agents. Doing
so breaks the agent's ability to fall back to the fixed registration
endpoint when all servers are down, since we replaced it with the first
discovered apiserver address. The fixed registration endpoint will be
restored as default when the service is restarted, but this is not the
correct behavior. This should have only been done on etcd-only nodes
that start up using their local supervisor, but need to switch to a
control-plane node as soon as one is available.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Brad Davidson 4bf6fffe45 Fix reentrant rlock in loadbalancer.dialContext
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Brad Davidson b6ea5dd54f Ensure remotedialer kubelet connections use kubelet bind address
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit eb8bd15889)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Roberto Bonafiglia faeaf1b01b Update flannel to v0.25.4 and fixed issue with IPv6 mask
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2024-07-01 18:57:34 +02:00
Brad Davidson b4d4ed8f01 Fix agent supervisor port using apiserver port instead
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-13 15:13:21 -07:00
Harrison Affel f10cb29534 fix typo, use rancher/permissions
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-06-07 08:00:44 -07:00
Brad Davidson c0450a2cb4 Fix race condition panic in loadbalancer.nextServer
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-07 07:39:48 -07:00
fmoral2 043b1eac5d
Add test for `isValidResolvConf` (#10302)
Signed-off-by: Francisco <francisco.moral@suse.com>
2024-06-06 17:02:31 -03:00
Brad Davidson 1661f1024a Fix bug that caused agents to bypass local loadbalancer
If proxy.SetAPIServerPort was called multiple times, all calls after the
first one would cause the apiserver address to be set to the default
server address, bypassing the local load-balancer. This was most likely
to occur on RKE2, where the supervisor may be up for a period of time
before it is ready to manage node password secrets, causing the agent
to retry.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-04 11:18:45 -07:00
Brad Davidson f9130d537d Fix embedded mirror blocked by SAR RBAC and re-enable test
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 08:33:18 -07:00
Brad Davidson 307f07bd61 Fix issue caused by sole server marked as failed under load
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-30 11:47:23 -07:00
Brad Davidson ed23a2bb48 Fix netpol crash when node remains tained unintialized
It is concievable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 23:34:44 -07:00
Brad Davidson ff679fb3ab Refactor supervisor listener startup and add metrics
* Refactor agent supervisor listener startup and authn/authz to use upstream
  auth delegators to perform for SubjectAccessReview for access to
  metrics.
* Convert spegel and pprof handlers over to new structure.
* Promote bind-address to agent flag to allow setting supervisor bind
  address for both agent and server.
* Promote enable-pprof to agent flag to allow profiling agents. Access
  to the pprof endpoint now requires client cert auth, similar to the
  spegel registry api endpoint.
* Add prometheus metrics handler.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Thomas Ferrandiz 6dcd52eb8e Use TrafficManager interface when calling flannel
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-27 13:05:18 +00:00
Thomas Ferrandiz af7bcc3900 Bump flannel version to v0.25.2
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-27 13:05:18 +00:00
linxin f24ba9d3a9 Validate resolv.conf for presence of nameserver entries
Co-authored-by: Brad Davidson <brad@oatmail.org>
Signed-off-by: linxin <linxin@geedgenetworks.com>
2024-05-24 12:39:34 -07:00
Brad Davidson 30999f9a07 Switch stargz over to cri registry config_path
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:35:15 -07:00
Brad Davidson 7374010c0c Use fixed stream server bind address for cri-dockerd
Will now use 127.0.0.1:10010, same as containerd's CRI

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:33:27 -07:00
Brad Davidson 5f6b813cc8 Add WithSkipMissing to not fail import on missing blobs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:32:22 -07:00
Harrison Affel 1d22b6971f windows changes
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-05-16 14:40:27 -07:00
Hussein Galal 144f5ad333
Kubernetes V1.30.0-k3s1 (#10063)
* kubernetes 1.30.0-k3s1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Update go version to v1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener and helm-controller

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in go.mod to 1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in Dockerfiles

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update cri-dockerd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add proctitle package with linux and windows constraints

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing setproctitle function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener to v0.6.0-rc1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

---------

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2024-05-06 19:42:27 +03:00
Brad Davidson f2961fb5d2 Add workaround for containerd hosts.toml bug
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-03 20:47:54 -07:00
Brad Davidson 7f659759dd Add certificate expiry check and warnings
* Add ADR
* Add `k3s certificate check` command.
* Add periodic check and events when certs are about to expire.
* Add metrics for certificate validity remaining, labeled by cert subject

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-28 12:05:21 -07:00
Derek Nola 6a42c6fcfe
Remove old pinned dependencies (#9806)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:09:48 -07:00
Derek Nola 14f54d0b26
Transition from deprecated pointer library to ptr (#9801)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:07:02 -07:00
Brad Davidson c51d7bfbd1 Add health-check support to loadbalancer
* Adds support for health-checking loadbalancer servers. If a
  health-check fails when dialing, all existing connections to the
  server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
  for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
  check for etcd connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:50:27 -07:00
Brad Davidson f099bfa508 Fix error when image has already been pulled
CRI and containerd APIs disagree about the registry names - CRI supports
index.docker.io as an alias for docker.io, while containerd does not.
Use the actual stored RepoTag to determine what image to ask containerd for.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-26 16:19:40 -07:00
Brad Davidson bba3e3c66b Fix wildcard entry upstream fallback
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-12 23:31:16 -07:00
Brad Davidson fe2ca9ecf1 Warn and suppress duplicate registry mirror endpoints
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-07 16:30:06 -08:00
Roberto Bonafiglia 88c431aea5 Adjust first node-ip based on configured clusterCIDR
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2024-03-06 11:10:41 +01:00
Vitor Savian 59c724f7a6 Fix wildcard with embbeded registry test
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-05 14:38:36 -08:00
Flavio Castelli 64e4f0e6e7 fix: use correct wasm shims names
Fix the wasm shim detection and the containerd configuration generation.

Prior to this commit, the binary and the `RuntimeType` values were not
correct.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
2024-03-05 13:12:08 -08:00
Brad Davidson b164d7a270 Fix additional corner cases in registries handling
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-04 11:59:33 -08:00
Brad Davidson 513c3416e7 Tweak netpol node wait logs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-01 12:01:34 -08:00
Brad Davidson be569f65a9 Fix NodeHosts on dual-stack clusters
* Add both dual-stack addresses to the node hosts file
* Add hostname to hosts file as alias for node name to ensure consistent resolution

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-01 11:59:59 -08:00
Brad Davidson 86f102134e Fix netpol startup when flannel is disabled
Don't break out of the poll loop if we can't get the node, RBAC might not be ready yet.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-26 14:58:48 -08:00
Derek Nola fae41a8b2a Rename AgentReady to ContainerRuntimeReady for better clarity
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-02-21 12:21:19 -08:00