Commit Graph

1376 Commits (c0450a2cb465b22c6f8aa0c98267ab1d2c5fe98a)

Author SHA1 Message Date
Brad Davidson c0450a2cb4 Fix race condition panic in loadbalancer.nextServer
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-07 07:39:48 -07:00
Vitor Savian d9b8ba8d71
Add snapshot retention etcd-s3-folder fix
* Add snapshot retention folder fix

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Add snapshot retention E2E test

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

---------

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-06-06 17:31:01 -03:00
fmoral2 043b1eac5d
Add test for `isValidResolvConf` (#10302)
Signed-off-by: Francisco <francisco.moral@suse.com>
2024-06-06 17:02:31 -03:00
Brad Davidson 1661f1024a Fix bug that caused agents to bypass local loadbalancer
If proxy.SetAPIServerPort was called multiple times, all calls after the
first one would cause the apiserver address to be set to the default
server address, bypassing the local load-balancer. This was most likely
to occur on RKE2, where the supervisor may be up for a period of time
before it is ready to manage node password secrets, causing the agent
to retry.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-04 11:18:45 -07:00
Koen de Laat 79ba10f5ec fix: Use actual warningPeriod in certmonitor
Signed-off-by: Koen de Laat <koen.de.laat@philips.com>
2024-06-03 11:20:15 -07:00
github-actions[bot] 1268779ea0
Bump Local Path Provisioner version (#10268)
* chore: Bump Local Path Provisioner version

Made with ❤️️ by updatecli
2024-06-03 11:19:23 -07:00
Brad Davidson f9130d537d Fix embedded mirror blocked by SAR RBAC and re-enable test
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 08:33:18 -07:00
Katherine Door 7a0ea3c953
Add write-kubeconfig-group flag to server (#9233)
* Add write-kubeconfig-group flag to server
* update kubectl unable to read config message for kubeconfig mode/group

Signed-off-by: Katherine Pata <me@kitty.sh>
2024-05-30 23:45:34 -07:00
Brad Davidson 307f07bd61 Fix issue caused by sole server marked as failed under load
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-30 11:47:23 -07:00
Brad Davidson ed23a2bb48 Fix netpol crash when node remains tained unintialized
It is concievable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 23:34:44 -07:00
Brad Davidson f8e0648304 Convert remaining http handlers over to use util.SendError
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Brad Davidson ff679fb3ab Refactor supervisor listener startup and add metrics
* Refactor agent supervisor listener startup and authn/authz to use upstream
  auth delegators to perform for SubjectAccessReview for access to
  metrics.
* Convert spegel and pprof handlers over to new structure.
* Promote bind-address to agent flag to allow setting supervisor bind
  address for both agent and server.
* Promote enable-pprof to agent flag to allow profiling agents. Access
  to the pprof endpoint now requires client cert auth, similar to the
  spegel registry api endpoint.
* Add prometheus metrics handler.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Brad Davidson 3d14092f76 Fix issue with k3s-etcd informers not starting
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 15:48:15 -07:00
Thomas Ferrandiz 6dcd52eb8e Use TrafficManager interface when calling flannel
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-27 13:05:18 +00:00
Thomas Ferrandiz af7bcc3900 Bump flannel version to v0.25.2
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-27 13:05:18 +00:00
huangzy 6fcaad553d allow helm controller set owner reference
Signed-off-by: huangzy <huangzynn@outlook.com>
2024-05-24 12:44:10 -07:00
Robert Rose 6886c0977f Follow directory symlinks in auto deploying manifests (#9288)
Signed-off-by: Robert Rose <robert.rose@mailbox.org>
2024-05-24 12:42:25 -07:00
linxin f24ba9d3a9 Validate resolv.conf for presence of nameserver entries
Co-authored-by: Brad Davidson <brad@oatmail.org>
Signed-off-by: linxin <linxin@geedgenetworks.com>
2024-05-24 12:39:34 -07:00
Brad Davidson 5a0162d8ee Drop check for legacy traefik v1 chart
We have been bundling traefik v2 for three years, its time to drop the legacy chart check

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 14:13:13 -07:00
Brad Davidson 37f97b33c9 Add support for svclb pod PriorityClassName
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 14:11:15 -07:00
Brad Davidson b453630478 Update local-path-provisioner helper script
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 14:00:00 -07:00
Brad Davidson 095ecdb034 Fix issue with local traffic policy for single-stack services on dual-stack nodes.
Just enable IP forwarding for all address families regardless of service address families.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:54:30 -07:00
Brad Davidson 5cf4d75749 Bump spegel version
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:48:38 -07:00
Brad Davidson 30999f9a07 Switch stargz over to cri registry config_path
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:35:15 -07:00
Brad Davidson 7374010c0c Use fixed stream server bind address for cri-dockerd
Will now use 127.0.0.1:10010, same as containerd's CRI

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:33:27 -07:00
Brad Davidson 5f6b813cc8 Add WithSkipMissing to not fail import on missing blobs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:32:22 -07:00
Manuel Buil 811de8b819 Fix bug when using tailscale config by file
Signed-off-by: Manuel Buil <mbuil@suse.com>
2024-05-23 11:55:20 +02:00
Harrison Affel 1d22b6971f windows changes
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-05-16 14:40:27 -07:00
Derek Nola 6531fb79b0
Deprecate pod-infra-container-image kubelet flag (#7409)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-05-06 10:39:10 -07:00
Hussein Galal 144f5ad333
Kubernetes V1.30.0-k3s1 (#10063)
* kubernetes 1.30.0-k3s1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Update go version to v1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener and helm-controller

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in go.mod to 1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in Dockerfiles

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update cri-dockerd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add proctitle package with linux and windows constraints

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing setproctitle function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener to v0.6.0-rc1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

---------

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2024-05-06 19:42:27 +03:00
Brad Davidson 94e29e2ef5 Make /db/info available anonymously from localhost
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-22 19:34:43 -07:00
Brad Davidson d3b60543e7 Fix 10 second etcd-snapshot request timeout
The default clientaccess request timeout is too short. Wait longer by default, and add the s3 timeout if s3 is enabled.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-19 23:26:51 -07:00
Brad Davidson 5b431ca531 Fix on-demand snapshots not honoring folder
Also fix etcd s3 tests to actually check that the files are saved to s3 🙃

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-19 23:26:51 -07:00
Thomas Anderson c59820a52a Allow LPP to read helper logs (#9834)
Signed-off-by: Thomas Anderson <127358482+zc-devs@users.noreply.github.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-11 12:31:54 -07:00
Brad Davidson 3f906bee79 Update packaged manifests
* Update traefik chart to bump image tag and fix quoting
* Fix image quoting in flat manifests
* Update local-path-provisioner config to stop using deprecated hostpath volume type

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-11 09:22:51 -07:00
Brad Davidson 4cc73b1fee Actually fix agent certificate rotation
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-10 09:21:01 -07:00
Brad Davidson 08f1022663 Don't log 'apiserver disabled' error sent by etcd-only nodes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:36:33 -07:00
Brad Davidson 7d9abc9f07 Improve etcd load-balancer startup behavior
Prefer the address of the etcd member being joined, and seed the full address list immediately on startup.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:36:33 -07:00
Brad Davidson fe465cc832 Move etcd snapshot management CLI to request/response
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:21:26 -07:00
Brad Davidson 60248c42de Add supervisor cert/key to rotate list
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-05 10:59:17 -07:00
Derek Nola 9846a72e92
Bump spegel to v0.0.20-k3s1 (#9863)
* Bump spegel to v0.0.20-k3s1

* Remove deprecated libp2p Pretty function

* Remove quic-go pin
   Pinned version is now out of date,  indirect dependencies are now newer, with CVE issue fixed
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-04-05 08:43:19 -07:00
Brad Davidson f2961fb5d2 Add workaround for containerd hosts.toml bug
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-03 20:47:54 -07:00
Brad Davidson 7f659759dd Add certificate expiry check and warnings
* Add ADR
* Add `k3s certificate check` command.
* Add periodic check and events when certs are about to expire.
* Add metrics for certificate validity remaining, labeled by cert subject

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-28 12:05:21 -07:00
Derek Nola 6a42c6fcfe
Remove old pinned dependencies (#9806)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:09:48 -07:00
Derek Nola 14f54d0b26
Transition from deprecated pointer library to ptr (#9801)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-03-28 10:07:02 -07:00
Vitor Savian 5d69d6e782 Add tls for kine
Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Bump kine

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Add integration tests for kine with tls

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-03-28 11:12:07 -03:00
Brad Davidson c51d7bfbd1 Add health-check support to loadbalancer
* Adds support for health-checking loadbalancer servers. If a
  health-check fails when dialing, all existing connections to the
  server will be closed.
* Wires up a remotedialer tunnel connectivity check as the health check
  for supervisor/apiserver connections.
* Wires up a simple ping request to the supervisor port as the health
  check for etcd connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:50:27 -07:00
Brad Davidson edb0440017 Fix etcd snapshot reconcile for agentless nodes
Disable cleanup of orphaned snapshots and patching of node annotations if running agentless

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-27 16:44:36 -07:00
Vitor Savian 3f649e3bcb Add a new error when kine is with disable apiserver or disable etcd
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-03-27 10:59:34 -03:00
Brad Davidson f099bfa508 Fix error when image has already been pulled
CRI and containerd APIs disagree about the registry names - CRI supports
index.docker.io as an alias for docker.io, while containerd does not.
Use the actual stored RepoTag to determine what image to ask containerd for.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-26 16:19:40 -07:00