Commit Graph

1386 Commits (118acabec28c3f290b324f56ac8170dd2de60997)

Author SHA1 Message Date
Brad Davidson 118acabec2 Fix IPv6 primary node-ip handling
I should have caught `[]string{cfg.NodeIP}[0]` and `[]string{envInfo.NodeIP.String()}[0]` in code review...

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 09:46:52 -07:00
Brad Davidson 9841517457 Fix agents removing configured supervisor address
We shouldn't be replacing the configured server address on agents. Doing
so breaks the agent's ability to fall back to the fixed registration
endpoint when all servers are down, since we replaced it with the first
discovered apiserver address. The fixed registration endpoint will be
restored as default when the service is restarted, but this is not the
correct behavior. This should have only been done on etcd-only nodes
that start up using their local supervisor, but need to switch to a
control-plane node as soon as one is available.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 09:46:52 -07:00
Brad Davidson 9d0c2e0000 Fix reentrant rlock in loadbalancer.dialContext
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-15 09:46:52 -07:00
Brad Davidson c36db53e54 Add etcd s3 config secret implementation
* Move snapshot structs and functions into pkg/etcd/snapshot
* Move s3 client code and functions into pkg/etcd/s3
* Refactor pkg/etcd to track snapshot and s3 moves
* Add support for reading s3 client config from secret
* Add minio client cache, since S3 client configuration can now be
  changed at runtime by modifying the secret, and don't want to have to
  create a new minio client every time we read config.
* Add tests for pkg/etcd/s3

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-10 13:13:55 -07:00
Brad Davidson eb8bd15889 Ensure remotedialer kubelet connections use kubelet bind address
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-07-10 13:00:25 -07:00
github-actions[bot] a0b374508e
Bump Local Path Provisioner version (#10394)
* chore: Bump Local Path Provisioner version

Made with ❤️️ by updatecli

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-07-10 12:53:46 -07:00
Roberto Bonafiglia faeaf1b01b Update flannel to v0.25.4 and fixed issue with IPv6 mask
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2024-07-01 18:57:34 +02:00
Brad Davidson aa4794b372 Replace 1-weight semaphore on snapshots with simple mutex
Fixes an issue where the semaphore wasn't permanently initialized
until a scheduled snapshot was taken, allowing multiple on-demand
snapshots to be taken until the first scheduled snapshot was triggered.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-19 09:47:58 -07:00
Brad Davidson b4d4ed8f01 Fix agent supervisor port using apiserver port instead
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-13 15:13:21 -07:00
Harrison Affel f10cb29534 fix typo, use rancher/permissions
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-06-07 08:00:44 -07:00
Brad Davidson c0450a2cb4 Fix race condition panic in loadbalancer.nextServer
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-07 07:39:48 -07:00
Vitor Savian d9b8ba8d71
Add snapshot retention etcd-s3-folder fix
* Add snapshot retention folder fix

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

* Add snapshot retention E2E test

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

---------

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-06-06 17:31:01 -03:00
fmoral2 043b1eac5d
Add test for `isValidResolvConf` (#10302)
Signed-off-by: Francisco <francisco.moral@suse.com>
2024-06-06 17:02:31 -03:00
Brad Davidson 1661f1024a Fix bug that caused agents to bypass local loadbalancer
If proxy.SetAPIServerPort was called multiple times, all calls after the
first one would cause the apiserver address to be set to the default
server address, bypassing the local load-balancer. This was most likely
to occur on RKE2, where the supervisor may be up for a period of time
before it is ready to manage node password secrets, causing the agent
to retry.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-06-04 11:18:45 -07:00
Koen de Laat 79ba10f5ec fix: Use actual warningPeriod in certmonitor
Signed-off-by: Koen de Laat <koen.de.laat@philips.com>
2024-06-03 11:20:15 -07:00
github-actions[bot] 1268779ea0
Bump Local Path Provisioner version (#10268)
* chore: Bump Local Path Provisioner version

Made with ❤️️ by updatecli
2024-06-03 11:19:23 -07:00
Brad Davidson f9130d537d Fix embedded mirror blocked by SAR RBAC and re-enable test
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-31 08:33:18 -07:00
Katherine Door 7a0ea3c953
Add write-kubeconfig-group flag to server (#9233)
* Add write-kubeconfig-group flag to server
* update kubectl unable to read config message for kubeconfig mode/group

Signed-off-by: Katherine Pata <me@kitty.sh>
2024-05-30 23:45:34 -07:00
Brad Davidson 307f07bd61 Fix issue caused by sole server marked as failed under load
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-30 11:47:23 -07:00
Brad Davidson ed23a2bb48 Fix netpol crash when node remains tained unintialized
It is concievable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 23:34:44 -07:00
Brad Davidson f8e0648304 Convert remaining http handlers over to use util.SendError
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Brad Davidson ff679fb3ab Refactor supervisor listener startup and add metrics
* Refactor agent supervisor listener startup and authn/authz to use upstream
  auth delegators to perform for SubjectAccessReview for access to
  metrics.
* Convert spegel and pprof handlers over to new structure.
* Promote bind-address to agent flag to allow setting supervisor bind
  address for both agent and server.
* Promote enable-pprof to agent flag to allow profiling agents. Access
  to the pprof endpoint now requires client cert auth, similar to the
  spegel registry api endpoint.
* Add prometheus metrics handler.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 16:24:57 -07:00
Brad Davidson 3d14092f76 Fix issue with k3s-etcd informers not starting
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-28 15:48:15 -07:00
Thomas Ferrandiz 6dcd52eb8e Use TrafficManager interface when calling flannel
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-27 13:05:18 +00:00
Thomas Ferrandiz af7bcc3900 Bump flannel version to v0.25.2
Signed-off-by: Thomas Ferrandiz <thomas.ferrandiz@suse.com>
2024-05-27 13:05:18 +00:00
huangzy 6fcaad553d allow helm controller set owner reference
Signed-off-by: huangzy <huangzynn@outlook.com>
2024-05-24 12:44:10 -07:00
Robert Rose 6886c0977f Follow directory symlinks in auto deploying manifests (#9288)
Signed-off-by: Robert Rose <robert.rose@mailbox.org>
2024-05-24 12:42:25 -07:00
linxin f24ba9d3a9 Validate resolv.conf for presence of nameserver entries
Co-authored-by: Brad Davidson <brad@oatmail.org>
Signed-off-by: linxin <linxin@geedgenetworks.com>
2024-05-24 12:39:34 -07:00
Brad Davidson 5a0162d8ee Drop check for legacy traefik v1 chart
We have been bundling traefik v2 for three years, its time to drop the legacy chart check

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 14:13:13 -07:00
Brad Davidson 37f97b33c9 Add support for svclb pod PriorityClassName
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 14:11:15 -07:00
Brad Davidson b453630478 Update local-path-provisioner helper script
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 14:00:00 -07:00
Brad Davidson 095ecdb034 Fix issue with local traffic policy for single-stack services on dual-stack nodes.
Just enable IP forwarding for all address families regardless of service address families.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:54:30 -07:00
Brad Davidson 5cf4d75749 Bump spegel version
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:48:38 -07:00
Brad Davidson 30999f9a07 Switch stargz over to cri registry config_path
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:35:15 -07:00
Brad Davidson 7374010c0c Use fixed stream server bind address for cri-dockerd
Will now use 127.0.0.1:10010, same as containerd's CRI

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:33:27 -07:00
Brad Davidson 5f6b813cc8 Add WithSkipMissing to not fail import on missing blobs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-05-23 13:32:22 -07:00
Manuel Buil 811de8b819 Fix bug when using tailscale config by file
Signed-off-by: Manuel Buil <mbuil@suse.com>
2024-05-23 11:55:20 +02:00
Harrison Affel 1d22b6971f windows changes
Signed-off-by: Harrison Affel <harrisonaffel@gmail.com>
2024-05-16 14:40:27 -07:00
Derek Nola 6531fb79b0
Deprecate pod-infra-container-image kubelet flag (#7409)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-05-06 10:39:10 -07:00
Hussein Galal 144f5ad333
Kubernetes V1.30.0-k3s1 (#10063)
* kubernetes 1.30.0-k3s1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Update go version to v1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener and helm-controller

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in go.mod to 1.22.2

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update go in Dockerfiles

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update cri-dockerd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add proctitle package with linux and windows constraints

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing setproctitle function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* update dynamiclistener to v0.6.0-rc1

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

---------

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2024-05-06 19:42:27 +03:00
Brad Davidson 94e29e2ef5 Make /db/info available anonymously from localhost
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-22 19:34:43 -07:00
Brad Davidson d3b60543e7 Fix 10 second etcd-snapshot request timeout
The default clientaccess request timeout is too short. Wait longer by default, and add the s3 timeout if s3 is enabled.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-19 23:26:51 -07:00
Brad Davidson 5b431ca531 Fix on-demand snapshots not honoring folder
Also fix etcd s3 tests to actually check that the files are saved to s3 🙃

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-19 23:26:51 -07:00
Thomas Anderson c59820a52a Allow LPP to read helper logs (#9834)
Signed-off-by: Thomas Anderson <127358482+zc-devs@users.noreply.github.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-11 12:31:54 -07:00
Brad Davidson 3f906bee79 Update packaged manifests
* Update traefik chart to bump image tag and fix quoting
* Fix image quoting in flat manifests
* Update local-path-provisioner config to stop using deprecated hostpath volume type

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-11 09:22:51 -07:00
Brad Davidson 4cc73b1fee Actually fix agent certificate rotation
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-10 09:21:01 -07:00
Brad Davidson 08f1022663 Don't log 'apiserver disabled' error sent by etcd-only nodes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:36:33 -07:00
Brad Davidson 7d9abc9f07 Improve etcd load-balancer startup behavior
Prefer the address of the etcd member being joined, and seed the full address list immediately on startup.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:36:33 -07:00
Brad Davidson fe465cc832 Move etcd snapshot management CLI to request/response
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-09 15:21:26 -07:00
Brad Davidson 60248c42de Add supervisor cert/key to rotate list
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-04-05 10:59:17 -07:00