Commit Graph

450 Commits (3347947dd2701a666f100ea0f31bb98122fc18c6)

Author SHA1 Message Date
Hussein Galal 9515436e80
Load kernel modules for nft in agent setup (#11597)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2025-01-14 01:14:20 +02:00
Vitor Savian 4331f452bb
Add auto import images for containerd image store
* Add auto import images for containerd image store

* Add auto import images

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Fix EOF error log when importing tarball files

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Delaying queue

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Add parse for images

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

---------

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2025-01-11 01:58:20 -03:00
Brad Davidson fb7b765383 Add client-side certificate generation support
Clients now generate keys client-side and send CSRs. If the server is down-level and sends a cert+key instead of just responding with a cert signed with the client's public key, we use the key from the server instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit caeebc52b7)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-10 16:11:04 -08:00
Brad Davidson fcc5f32cfe Remove unused Certificate field from Node struct
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 5b1d57f7b9)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-10 16:11:04 -08:00
galal-hussein 481758e8ac RBAC changes for compat with AuthorizeNodeWithSelectors
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit b4747703b0)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2025-01-10 16:11:04 -08:00
Arne Winter 18ed589122 add node-internal-dns/node-external-dns address pass-through support (#10852)
* add --node-internal-dns and --node-external-dns

Signed-off-by: Arne Winter <github@arnewinter.dev>
Co-authored-by: Brad Davidson <brad@oatmail.org>
(cherry picked from commit c4c11e51f1)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-11 17:09:48 -08:00
Brad Davidson 4bc0ffdf9d Fix agent tunnel address on rke2
Fix issue where rke2 tunnel was trying to connect to apiserver port instead of supervisor

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 5a5b136151)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-10 17:02:07 -08:00
Brad Davidson 42d36cf5a2 Fall back to polling the supervisor for apiserver addresses when the watch fails
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit c7ff957cae)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-10 17:02:07 -08:00
Brad Davidson 89593847d7 Use helper to set consistent rest.Config rate limits and timeouts
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 71918e0d69)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-10 17:02:07 -08:00
Brad Davidson 46e1c57fc9 Add loadbalancer metrics
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 3d2fabb013)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-10 17:02:07 -08:00
Brad Davidson ea31a93f6f Refactor load balancer server list and health checking
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 911ee19a93)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-10 17:02:07 -08:00
Brad Davidson 942a51109a Separate persistent config struct from LoadBalancer and make fields private
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 67fd5fa9e5)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-10 17:02:07 -08:00
Brad Davidson c4bee6fa8e Move http/socks proxy stuff to separate file
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 13e9113787)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-12-10 17:02:07 -08:00
Brad Davidson 51404d0e83 Fix issue with loadbalancer failover to default server
The loadbalancer should only fail over to the default server if all other server have failed, and it should force fail-back to a preferred server as soon as one passes health checks.

The loadbalancer tests have been improved to ensure that this occurs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-11-14 08:19:39 -08:00
Brad Davidson e08085f1e9 Add nonroot-devices flag to agent CLI
Add new flag that is passed through to the device_ownership_from_security_context parameter in the containerd CRI config. This is not possible to change without providing a complete custom containerd.toml template so we should add a flag for it.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 56fb3b0991)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-11-06 12:41:11 -08:00
manuelbuil 29fd916cc9 Add the nvidia runtime cdi
Signed-off-by: manuelbuil <mbuil@suse.com>
2024-10-12 07:37:48 +02:00
Brad Davidson 72b0eb5f5a Update tcpproxy for import path change
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 1ae9ca73f5)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-10-10 11:40:39 -07:00
Vitor Savian 0a2b383a32 Add user path to runtimes search
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-10-08 13:19:18 -03:00
Brad Davidson ca84f13846 Fix hosts.toml header var
Resolves issue from 270f85e468 that prevented old hosts.toml files from being cleaned up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-09-10 15:00:04 -07:00
Brad Davidson 9e06189a7c Only clean up containerd hosts dirs managed by k3s
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 270f85e468)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-09-06 11:30:23 -07:00
Derek Nola 6965b5d1a9 Allow Pprof and Superisor metrics in standalone mode (#10576)
* Allow pprof to run on server with `--disable-agent`
* Allow supervisor metrics to run on server with `--disable-agent`

Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-08-06 08:51:16 -07:00
Brad Davidson 7190c74acc
[release-1.30] Backports for 2024-08 release cycle (#10664)
* Use pagination when retrieving etcd snapshot list

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit c2216a62ad)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Update secretsencrypt pagination

Make secretsencrypt page size and iteration consistent with other paginators

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 891e72f90f)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Cap length of generated name used for servicelb daemonset

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 21611c5665)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Fix ipv6 sysctl required by non-ipv6 LoadBalancer service

This is a partial revert of 095ecdb034,
with the workaround moved into klipper-lb.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit d4c3422a85)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* remove deprecated use of wait functions

Signed-off-by: Will <will7989@hotmail.com>
(cherry picked from commit e4f3cc7b54)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Update pkg/secretsencrypt/config.go

Co-authored-by: Brad Davidson <brad@oatmail.org>
Signed-off-by: Will Andrews <will7989@hotmail.com>
(cherry picked from commit 3ec086f6f7)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Update pkg/cluster/managed.go

Co-authored-by: Derek Nola <derek.nola@suse.com>
Signed-off-by: Will Andrews <will7989@hotmail.com>
(cherry picked from commit e2179aa957)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Wire lasso metrics up to common gatherer

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit e168438d44)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

* Fix cloudprovider controller name

Looking at metrics revealed the cloudprovider controller name was anempty string.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit bffdf463e1)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>

---------

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Signed-off-by: Will <will7989@hotmail.com>
Signed-off-by: Will Andrews <will7989@hotmail.com>
Co-authored-by: Will <will7989@hotmail.com>
Co-authored-by: Derek Nola <derek.nola@suse.com>
2024-08-05 09:35:00 -07:00
Brad Davidson 6cb63dd766 Add dial duration to debug error message
This should give us more detail on how long dials take before failing, so that we can perhaps better tune the retry loop in the future.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Brad Davidson 280aaa3a79 Fix IPv6 primary node-ip handling
I should have caught `[]string{cfg.NodeIP}[0]` and `[]string{envInfo.NodeIP.String()}[0]` in code review...

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Brad Davidson 97250ad656 Fix agents removing configured supervisor address
We shouldn't be replacing the configured server address on agents. Doing
so breaks the agent's ability to fall back to the fixed registration
endpoint when all servers are down, since we replaced it with the first
discovered apiserver address. The fixed registration endpoint will be
restored as default when the service is restarted, but this is not the
correct behavior. This should have only been done on etcd-only nodes
that start up using their local supervisor, but need to switch to a
control-plane node as soon as one is available.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Brad Davidson 4bf6fffe45 Fix reentrant rlock in loadbalancer.dialContext
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Brad Davidson b6ea5dd54f Ensure remotedialer kubelet connections use kubelet bind address
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit eb8bd15889)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:20 -07:00
Roberto Bonafiglia faeaf1b01b Update flannel to v0.25.4 and fixed issue with IPv6 mask
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2024-07-01 18:57:34 +02:00
Brad Davidson b4d4ed8f01 Fix agent supervisor port using apiserver port instead
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-13 15:13:21 -07:00
Harrison Affel f10cb29534 fix typo, use rancher/permissions
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-06-07 08:00:44 -07:00
Brad Davidson c0450a2cb4 Fix race condition panic in loadbalancer.nextServer
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-07 07:39:48 -07:00
fmoral2 043b1eac5d
Add test for `isValidResolvConf` (#10302)
Signed-off-by: Francisco <francisco.moral@suse.com>
2024-06-06 17:02:31 -03:00
Brad Davidson 1661f1024a Fix bug that caused agents to bypass local loadbalancer
If proxy.SetAPIServerPort was called multiple times, all calls after the
first one would cause the apiserver address to be set to the default
server address, bypassing the local load-balancer. This was most likely
to occur on RKE2, where the supervisor may be up for a period of time
before it is ready to manage node password secrets, causing the agent
to retry.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-04 11:18:45 -07:00
Brad Davidson f9130d537d Fix embedded mirror blocked by SAR RBAC and re-enable test
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 08:33:18 -07:00
Brad Davidson 307f07bd61 Fix issue caused by sole server marked as failed under load
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-30 11:47:23 -07:00
Brad Davidson ed23a2bb48 Fix netpol crash when node remains tained unintialized
It is concievable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 23:34:44 -07:00
Brad Davidson ff679fb3ab Refactor supervisor listener startup and add metrics
* Refactor agent supervisor listener startup and authn/authz to use upstream
  auth delegators to perform for SubjectAccessReview for access to
  metrics.
* Convert spegel and pprof handlers over to new structure.
* Promote bind-address to agent flag to allow setting supervisor bind
  address for both agent and server.
* Promote enable-pprof to agent flag to allow profiling agents. Access
  to the pprof endpoint now requires client cert auth, similar to the
  spegel registry api endpoint.
* Add prometheus metrics handler.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Thomas Ferrandiz 6dcd52eb8e Use TrafficManager interface when calling flannel
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-27 13:05:18 +00:00
Thomas Ferrandiz af7bcc3900 Bump flannel version to v0.25.2
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-27 13:05:18 +00:00
linxin f24ba9d3a9 Validate resolv.conf for presence of nameserver entries
Co-authored-by: Brad Davidson <brad@oatmail.org>
Signed-off-by: linxin <linxin@geedgenetworks.com>
2024-05-24 12:39:34 -07:00
Brad Davidson 30999f9a07 Switch stargz over to cri registry config_path
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:35:15 -07:00
Brad Davidson 7374010c0c Use fixed stream server bind address for cri-dockerd
Will now use 127.0.0.1:10010, same as containerd's CRI

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:33:27 -07:00
Brad Davidson 5f6b813cc8 Add WithSkipMissing to not fail import on missing blobs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:32:22 -07:00
Harrison Affel 1d22b6971f windows changes
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-05-16 14:40:27 -07:00
Hussein Galal 144f5ad333
Kubernetes V1.30.0-k3s1 (#10063)
* kubernetes 1.30.0-k3s1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Update go version to v1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener and helm-controller

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in go.mod to 1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in Dockerfiles

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update cri-dockerd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add proctitle package with linux and windows constraints

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing setproctitle function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener to v0.6.0-rc1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

---------

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2024-05-06 19:42:27 +03:00
Brad Davidson f2961fb5d2 Add workaround for containerd hosts.toml bug
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-03 20:47:54 -07:00
Brad Davidson 7f659759dd Add certificate expiry check and warnings
* Add ADR
* Add `k3s certificate check` command.
* Add periodic check and events when certs are about to expire.
* Add metrics for certificate validity remaining, labeled by cert subject

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-28 12:05:21 -07:00
Derek Nola 6a42c6fcfe
Remove old pinned dependencies (#9806)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:09:48 -07:00
Derek Nola 14f54d0b26
Transition from deprecated pointer library to ptr (#9801)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:07:02 -07:00
Brad Davidson c51d7bfbd1 Add health-check support to loadbalancer
* Adds support for health-checking loadbalancer servers. If a
  health-check fails when dialing, all existing connections to the
  server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
  for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
  check for etcd connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:50:27 -07:00