I should have caught `[]string{cfg.NodeIP}[0]` and `[]string{envInfo.NodeIP.String()}[0]` in code review...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
We shouldn't be replacing the configured server address on agents. Doing
so breaks the agent's ability to fall back to the fixed registration
endpoint when all servers are down, since we replaced it with the first
discovered apiserver address. The fixed registration endpoint will be
restored as default when the service is restarted, but this is not the
correct behavior. This should have only been done on etcd-only nodes
that start up using their local supervisor, but need to switch to a
control-plane node as soon as one is available.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Move snapshot structs and functions into pkg/etcd/snapshot
* Move s3 client code and functions into pkg/etcd/s3
* Refactor pkg/etcd to track snapshot and s3 moves
* Add support for reading s3 client config from secret
* Add minio client cache, since S3 client configuration can now be
changed at runtime by modifying the secret, and don't want to have to
create a new minio client every time we read config.
* Add tests for pkg/etcd/s3
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* chore: Bump Local Path Provisioner version
Made with ❤️️ by updatecli
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Fixes an issue where the semaphore wasn't permanently initialized
until a scheduled snapshot was taken, allowing multiple on-demand
snapshots to be taken until the first scheduled snapshot was triggered.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If proxy.SetAPIServerPort was called multiple times, all calls after the
first one would cause the apiserver address to be set to the default
server address, bypassing the local load-balancer. This was most likely
to occur on RKE2, where the supervisor may be up for a period of time
before it is ready to manage node password secrets, causing the agent
to retry.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add write-kubeconfig-group flag to server
* update kubectl unable to read config message for kubeconfig mode/group
Signed-off-by: Katherine Pata <me@kitty.sh>
If health checks are failing for all servers, make a second pass through the server list with health-checks ignored before returning failure
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
It is concievable that users might take more than 60 seconds to deploy their own cloud-provider. Instead of exiting, we should wait forever, but with more logging to indicate what's being waited on.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Refactor agent supervisor listener startup and authn/authz to use upstream
auth delegators to perform for SubjectAccessReview for access to
metrics.
* Convert spegel and pprof handlers over to new structure.
* Promote bind-address to agent flag to allow setting supervisor bind
address for both agent and server.
* Promote enable-pprof to agent flag to allow profiling agents. Access
to the pprof endpoint now requires client cert auth, similar to the
spegel registry api endpoint.
* Add prometheus metrics handler.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* kubernetes 1.30.0-k3s1
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Update go version to v1.22.2
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update dynamiclistener and helm-controller
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update go in go.mod to 1.22.2
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update go in Dockerfiles
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update cri-dockerd
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Add proctitle package with linux and windows constraints
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* go mod tidy
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Fixing setproctitle function
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update dynamiclistener to v0.6.0-rc1
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
---------
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
The default clientaccess request timeout is too short. Wait longer by default, and add the s3 timeout if s3 is enabled.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Update traefik chart to bump image tag and fix quoting
* Fix image quoting in flat manifests
* Update local-path-provisioner config to stop using deprecated hostpath volume type
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Prefer the address of the etcd member being joined, and seed the full address list immediately on startup.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>