Commit Graph

423 Commits (12612a5f46bf1d1a97169c4b32d66488e8011f76)

Author SHA1 Message Date
Brad Davidson 92f8dc0a15 Ensure remotedialer kubelet connections use kubelet bind address
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit eb8bd15889)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 10:14:33 -07:00
Roberto Bonafiglia b4b156d9d1 Update flannel to v0.25.4 and fixed issue with IPv6 mask
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2024-07-01 18:58:20 +02:00
Brad Davidson 4a5f69fae1 Fix agent supervisor port using apiserver port instead
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-13 15:13:34 -07:00
Harrison Affel 125f5bf501 fix typo, use rancher/permissions
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-06-07 08:31:26 -07:00
Brad Davidson 3ef137a8d2 Fix race condition panic in loadbalancer.nextServer
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-07 07:40:10 -07:00
fmoral2 12864fb665
Add test for `isValidResolvConf` (#10302)
Signed-off-by: Francisco <francisco.moral@suse.com>
2024-06-07 11:07:27 -03:00
Brad Davidson 485eaf31b4 Fix bug that caused agents to bypass local loadbalancer
If proxy.SetAPIServerPort was called multiple times, all calls after the
first one would cause the apiserver address to be set to the default
server address, bypassing the local load-balancer. This was most likely
to occur on RKE2, where the supervisor may be up for a period of time
before it is ready to manage node password secrets, causing the agent
to retry.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 1661f1024a)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-04 12:48:16 -07:00
Brad Davidson 2c50f4aa5b Fix embedded mirror blocked by SAR RBAC and re-enable test
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson 8262c02cdd Fix issue caused by sole server marked as failed under load
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit ca39614d4e)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson 2e7b394713 Fix netpol crash when node remains tained unintialized
It is concievable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit ed23a2bb48)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson 7ef30a2c60 Refactor supervisor listener startup and add metrics
* Refactor agent supervisor listener startup and authn/authz to use upstream
  auth delegators to perform for SubjectAccessReview for access to
  metrics.
* Convert spegel and pprof handlers over to new structure.
* Promote bind-address to agent flag to allow setting supervisor bind
  address for both agent and server.
* Promote enable-pprof to agent flag to allow profiling agents. Access
  to the pprof endpoint now requires client cert auth, similar to the
  spegel registry api endpoint.
* Add prometheus metrics handler.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit ff679fb3ab)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
linxin d386eaf904 Validate resolv.conf for presence of nameserver entries
Co-authored-by: Brad Davidson <brad@oatmail.org>
Signed-off-by: linxin <linxin@geedgenetworks.com>
(cherry picked from commit f24ba9d3a9)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson c7d8e98b37 Switch stargz over to cri registry config_path
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 30999f9a07)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson bfc17af8bb Use fixed stream server bind address for cri-dockerd
Will now use 127.0.0.1:10010, same as containerd's CRI

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 7374010c0c)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson c4226adc8f Add WithSkipMissing to not fail import on missing blobs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 5f6b813cc8)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Thomas Ferrandiz 7ebc6903fa Use TrafficManager interface when calling flannel
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-28 07:58:19 +00:00
Thomas Ferrandiz 1ec25d8f64 Bump flannel version to v0.25.2
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-28 07:58:19 +00:00
Harrison Affel 1689846299 windows changes
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-05-16 14:55:16 -07:00
Brad Davidson f2961fb5d2 Add workaround for containerd hosts.toml bug
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-03 20:47:54 -07:00
Brad Davidson 7f659759dd Add certificate expiry check and warnings
* Add ADR
* Add `k3s certificate check` command.
* Add periodic check and events when certs are about to expire.
* Add metrics for certificate validity remaining, labeled by cert subject

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-28 12:05:21 -07:00
Derek Nola 6a42c6fcfe
Remove old pinned dependencies (#9806)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:09:48 -07:00
Derek Nola 14f54d0b26
Transition from deprecated pointer library to ptr (#9801)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:07:02 -07:00
Brad Davidson c51d7bfbd1 Add health-check support to loadbalancer
* Adds support for health-checking loadbalancer servers. If a
  health-check fails when dialing, all existing connections to the
  server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
  for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
  check for etcd connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:50:27 -07:00
Brad Davidson f099bfa508 Fix error when image has already been pulled
CRI and containerd APIs disagree about the registry names - CRI supports
index.docker.io as an alias for docker.io, while containerd does not.
Use the actual stored RepoTag to determine what image to ask containerd for.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-26 16:19:40 -07:00
Brad Davidson bba3e3c66b Fix wildcard entry upstream fallback
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-12 23:31:16 -07:00
Brad Davidson fe2ca9ecf1 Warn and suppress duplicate registry mirror endpoints
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-07 16:30:06 -08:00
Roberto Bonafiglia 88c431aea5 Adjust first node-ip based on configured clusterCIDR
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2024-03-06 11:10:41 +01:00
Vitor Savian 59c724f7a6 Fix wildcard with embbeded registry test
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-05 14:38:36 -08:00
Flavio Castelli 64e4f0e6e7 fix: use correct wasm shims names
Fix the wasm shim detection and the containerd configuration generation.

Prior to this commit, the binary and the `RuntimeType` values were not
correct.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
2024-03-05 13:12:08 -08:00
Brad Davidson b164d7a270 Fix additional corner cases in registries handling
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-04 11:59:33 -08:00
Brad Davidson 513c3416e7 Tweak netpol node wait logs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-01 12:01:34 -08:00
Brad Davidson be569f65a9 Fix NodeHosts on dual-stack clusters
* Add both dual-stack addresses to the node hosts file
* Add hostname to hosts file as alias for node name to ensure consistent resolution

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-01 11:59:59 -08:00
Brad Davidson 86f102134e Fix netpol startup when flannel is disabled
Don't break out of the poll loop if we can't get the node, RBAC might not be ready yet.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-26 14:58:48 -08:00
Derek Nola fae41a8b2a Rename AgentReady to ContainerRuntimeReady for better clarity
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-02-21 12:21:19 -08:00
Derek Nola 91cc2feed2 Restore original order of agent startup functions
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-02-21 12:21:19 -08:00
Oliver Larsson cfc3a124ee
[Testing]: Test_UnitApplyContainerdQoSClassConfigFileIfPresent (Created) (#8945)
Problem:
Function not tested.

Solution:
Unit test added.

Signed-off-by: Oliver Larsson <larsson.e.oliver@gmail.com>
2024-02-09 11:28:06 -08:00
Harrison Affel a36cc736bc allow executors to define containerd and docker behavior
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-02-09 15:51:35 -03:00
Brad Davidson 753c00f30c Consistently handle component exit on shutdown
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-07 10:23:54 -08:00
Vitor Savian e9cec46a23 Runtimes refactor using exec.LookPath
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-02-07 15:06:16 -03:00
Brad Davidson 97a22632b9 gofmt config_test.go
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-01 18:51:51 -08:00
Brad Davidson 29848dea3d Fix issues with certs.d template generation
* Fix issue with bare host or IP as endpoint
* Fix issue with localhost registries not defaulting to http.
* Move the registry template prep to a separate function,
  and adds tests of that function so that we can ensure we're
  generating the correct content.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-01 12:09:13 -08:00
Brad Davidson c87e6e5f7e Move proxy dialer out of init() and fix crash
* Fixes issue where proxy support only honored server address via K3S_URL, not CLI or config.
* Fixes crash when agent proxy is enabled, but proxy env vars do not return a proxy URL for the server address (server URL is in NO_PROXY list).
* Adds tests

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-11 16:12:15 -08:00
Brad Davidson 76fa022045 Enable network policy controller metrics
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-11 10:19:39 -08:00
Brad Davidson 37e9b87f62 Add embedded registry implementation
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-09 15:23:05 -08:00
Brad Davidson ef90da5c6e Add server CLI flag and config fields for embedded registry
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-09 15:23:05 -08:00
Brad Davidson 77846d63c1 Propagate errors up from config.Get
Fixes crash when killing agent while waiting for config from server

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-09 15:23:05 -08:00
Brad Davidson 16d29398ad Move registries.yaml load into agent config
Moving it into config.Agent so that we can use or modify it outside the context of containerd setup

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-09 15:23:05 -08:00
Brad Davidson 5c99bdd9bd Pin images instead of locking layers with lease
Layer leases never did what we wanted anyways, and this is the new approved interface for ensuring that images do not get GCd

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-09 15:23:05 -08:00
Manuel Buil 6330e26bb3 Wait for taint to be gone in the node before starting the netpol controller
Signed-off-by: Manuel Buil <mbuil@suse.com>
2024-01-08 12:04:18 +01:00
Lex Rivera 5fe074b540
Add more paths to crun runtime detection (#9086)
* add usr/local paths for crun detection

Signed-off-by: Lex Rivera <me@lex.io>
2024-01-04 16:51:13 -08:00