Commit Graph

1366 Commits (8262c02cdda5d394e5967c885a830c29b5f96490)

Author SHA1 Message Date
Brad Davidson 8262c02cdd Fix issue caused by sole server marked as failed under load
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit ca39614d4e)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson 2e7b394713 Fix netpol crash when node remains tainted uninitialized
It is conceivable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit ed23a2bb48)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson 0a728b8ff9 Convert remaining http handlers over to use util.SendError
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit f8e0648304)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson 7ef30a2c60 Refactor supervisor listener startup and add metrics
* Refactor agent supervisor listener startup and authn/authz to use upstream
  auth delegators to perform SubjectAccessReview for access to
  metrics.
* Convert spegel and pprof handlers over to new structure.
* Promote bind-address to agent flag to allow setting supervisor bind
  address for both agent and server.
* Promote enable-pprof to agent flag to allow profiling agents. Access
  to the pprof endpoint now requires client cert auth, similar to the
  spegel registry api endpoint.
* Add prometheus metrics handler.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit ff679fb3ab)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
galal-hussein c9f3efbe11 Add proctitle package with linux and windows constraints
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
(cherry picked from commit 48ff3bcddb)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson 2b63eb4a27 Fix issue with k3s-etcd informers not starting
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 3d14092f76)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
huangzy d40fc0878f Allow helm controller to set owner reference
Signed-off-by: huangzy <huangzynn@outlook.com>
(cherry picked from commit 6fcaad553d)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Robert Rose 8a7b0b75fe Follow directory symlinks in auto deploying manifests (#9288)
Signed-off-by: Robert Rose <robert.rose@mailbox.org>
(cherry picked from commit 6886c0977f)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
linxin d386eaf904 Validate resolv.conf for presence of nameserver entries
Co-authored-by: Brad Davidson <brad@oatmail.org>
Signed-off-by: linxin <linxin@geedgenetworks.com>
(cherry picked from commit f24ba9d3a9)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
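A resolv.conf check like the one named in the commit above can be sketched in Go. The sketch assumes that a file is usable only if it contains at least one `nameserver` line with a parseable IP address, and it ignores commented-out entries. `hasNameserver` is a hypothetical helper. The actual k3s validation may apply additional rules that are not shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// hasNameserver reports whether resolv.conf content contains at least one
// "nameserver <ip>" line. Lines starting with '#' or ';' never match because
// their first field is not "nameserver".
func hasNameserver(content string) bool {
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 2 && fields[0] == "nameserver" && net.ParseIP(fields[1]) != nil {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasNameserver("search example.com\nnameserver 10.0.0.2\n"))
	fmt.Println(hasNameserver("# nameserver 1.1.1.1\nsearch example.com\n"))
}
```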
Brad Davidson d1b3a02af2 Add support for svclb pod PriorityClassName
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 37f97b33c9)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson 6452a5ea1b Update local-path-provisioner helper script
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit b453630478)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson 2f3d3aa05b Fix issue with local traffic policy for single-stack services on dual-stack nodes.
Just enable IP forwarding for all address families regardless of service address families.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 095ecdb034)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson ef8bd94480 Bump spegel version
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 5cf4d75749)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson c7d8e98b37 Switch stargz over to cri registry config_path
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 30999f9a07)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson bfc17af8bb Use fixed stream server bind address for cri-dockerd
Will now use 127.0.0.1:10010, same as containerd's CRI

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 7374010c0c)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Brad Davidson c4226adc8f Add WithSkipMissing to not fail import on missing blobs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 5f6b813cc8)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 09:16:55 -07:00
Thomas Ferrandiz 7ebc6903fa Use TrafficManager interface when calling flannel
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-28 07:58:19 +00:00
Thomas Ferrandiz 1ec25d8f64 Bump flannel version to v0.25.2
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-28 07:58:19 +00:00
Manuel Buil 86ad488227 Fix bug when using tailscale config by file
Signed-off-by: Manuel Buil <mbuil@suse.com>
2024-05-24 07:56:20 +02:00
Harrison Affel 1689846299 windows changes
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-05-16 14:55:16 -07:00
Brad Davidson 94e29e2ef5 Make /db/info available anonymously from localhost
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-22 19:34:43 -07:00
Brad Davidson d3b60543e7 Fix 10 second etcd-snapshot request timeout
The default clientaccess request timeout is too short. Wait longer by default, and add the s3 timeout if s3 is enabled.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-19 23:26:51 -07:00
Brad Davidson 5b431ca531 Fix on-demand snapshots not honoring folder
Also fix etcd s3 tests to actually check that the files are saved to s3 🙃

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-19 23:26:51 -07:00
Thomas Anderson c59820a52a Allow LPP to read helper logs (#9834)
Signed-off-by: Thomas Anderson <127358482+zc-devs@users.noreply.github.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-11 12:31:54 -07:00
Brad Davidson 3f906bee79 Update packaged manifests
* Update traefik chart to bump image tag and fix quoting
* Fix image quoting in flat manifests
* Update local-path-provisioner config to stop using deprecated hostpath volume type

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-11 09:22:51 -07:00
Brad Davidson 4cc73b1fee Actually fix agent certificate rotation
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-10 09:21:01 -07:00
Brad Davidson 08f1022663 Don't log 'apiserver disabled' error sent by etcd-only nodes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:36:33 -07:00
Brad Davidson 7d9abc9f07 Improve etcd load-balancer startup behavior
Prefer the address of the etcd member being joined, and seed the full address list immediately on startup.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:36:33 -07:00
Brad Davidson fe465cc832 Move etcd snapshot management CLI to request/response
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:21:26 -07:00
Brad Davidson 60248c42de Add supervisor cert/key to rotate list
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-05 10:59:17 -07:00
Derek Nola 9846a72e92 Bump spegel to v0.0.20-k3s1 (#9863)
* Bump spegel to v0.0.20-k3s1

* Remove deprecated libp2p Pretty function

* Remove quic-go pin
  Pinned version is now out of date; indirect dependencies are now newer, with the CVE fixed
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-04-05 08:43:19 -07:00
Brad Davidson f2961fb5d2 Add workaround for containerd hosts.toml bug
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-03 20:47:54 -07:00
Brad Davidson 7f659759dd Add certificate expiry check and warnings
* Add ADR
* Add `k3s certificate check` command.
* Add periodic check and events when certs are about to expire.
* Add metrics for certificate validity remaining, labeled by cert subject

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-28 12:05:21 -07:00
Derek Nola 6a42c6fcfe Remove old pinned dependencies (#9806)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:09:48 -07:00
Derek Nola 14f54d0b26 Transition from deprecated pointer library to ptr (#9801)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:07:02 -07:00
Vitor Savian 5d69d6e782 Add tls for kine
Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Bump kine

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Add integration tests for kine with tls

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-03-28 11:12:07 -03:00
Brad Davidson c51d7bfbd1 Add health-check support to loadbalancer
* Adds support for health-checking loadbalancer servers. If a
  health-check fails when dialing, all existing connections to the
  server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
  for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
  check for etcd connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:50:27 -07:00
Brad Davidson edb0440017 Fix etcd snapshot reconcile for agentless nodes
Disable cleanup of orphaned snapshots and patching of node annotations if running agentless

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:44:36 -07:00
Vitor Savian 3f649e3bcb Add a new error when kine is used with disable-apiserver or disable-etcd
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-03-27 10:59:34 -03:00
Brad Davidson f099bfa508 Fix error when image has already been pulled
CRI and containerd APIs disagree about the registry names - CRI supports
index.docker.io as an alias for docker.io, while containerd does not.
Use the actual stored RepoTag to determine what image to ask containerd for.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-26 16:19:40 -07:00
Brad Davidson 65cd606832 Respect cloud-provider fields set by kubelet
Don't clobber the providerID field and instance-type/region/zone labels if provided by the kubelet. This allows the user to set these to the correct values when using the embedded CCM in a real cloud environment.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-26 16:18:34 -07:00
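"Don't clobber fields provided by the kubelet" means filling in defaults only where a value is missing. The Go sketch below shows this pattern on a simplified, hypothetical `nodeCloudFields` struct rather than the real Kubernetes `Node` type. The label keys are the standard well-known Kubernetes labels. The default values (`k3s://node-1`, `k3s`) and the AWS-style provider ID are illustrative assumptions.

```go
package main

import "fmt"

// nodeCloudFields is a hypothetical, simplified view of the Node fields the
// embedded cloud controller manages.
type nodeCloudFields struct {
	ProviderID string
	Labels     map[string]string
}

// applyDefaults sets providerID and topology labels only where the kubelet
// has not already provided a value.
func applyDefaults(n *nodeCloudFields, providerID string, defaults map[string]string) {
	if n.ProviderID == "" {
		n.ProviderID = providerID
	}
	if n.Labels == nil {
		n.Labels = make(map[string]string)
	}
	for k, v := range defaults {
		if _, ok := n.Labels[k]; !ok {
			n.Labels[k] = v
		}
	}
}

func main() {
	// The kubelet already set a real provider ID and zone (values are illustrative).
	n := &nodeCloudFields{
		ProviderID: "aws:///us-east-1a/i-0123456789abcdef0",
		Labels:     map[string]string{"topology.kubernetes.io/zone": "us-east-1a"},
	}
	applyDefaults(n, "k3s://node-1", map[string]string{
		"node.kubernetes.io/instance-type": "k3s",
		"topology.kubernetes.io/zone":      "default-zone",
	})
	fmt.Println(n.ProviderID, n.Labels["topology.kubernetes.io/zone"], n.Labels["node.kubernetes.io/instance-type"])
}
```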
Brad Davidson d7cdbb7d4d Send error response if member list cannot be retrieved
Prevents joining nodes from being stuck with a bad initial member list if there is a transient failure, or if they try to join themselves.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-26 15:17:15 -07:00
Brad Davidson 7a2a2d075c Move error response generation code into util
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-26 15:17:15 -07:00
Brad Davidson bba3e3c66b Fix wildcard entry upstream fallback
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-12 23:31:16 -07:00
Brad Davidson fe2ca9ecf1 Warn and suppress duplicate registry mirror endpoints
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-07 16:30:06 -08:00
Brad Davidson 2a091a693a Bump metrics-server to v0.7.0
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-07 12:45:29 -08:00
Roberto Bonafiglia 88c431aea5 Adjust first node-ip based on configured clusterCIDR
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2024-03-06 11:10:41 +01:00
Vitor Savian 59c724f7a6 Fix wildcard with embedded registry test
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-05 14:38:36 -08:00
Flavio Castelli 64e4f0e6e7 fix: use correct wasm shim names
Fix the wasm shim detection and the containerd configuration generation.

Prior to this commit, the binary and the `RuntimeType` values were not
correct.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
2024-03-05 13:12:08 -08:00
Brad Davidson 091a5c8965 Don't register embedded registry address as an upstream registry
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-04 15:11:26 -08:00