* Use clientv3.NewCtxClient instead of New to avoid automatic retry of all RPCs
* Only timeout status requests; allow defrag and alarm clear requests to run to completion.
* Only clear alarms on the local cluster member, not ALL cluster members
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Move snapshot structs and functions into pkg/etcd/snapshot
* Move s3 client code and functions into pkg/etcd/s3
* Refactor pkg/etcd to track snapshot and s3 moves
* Add support for reading s3 client config from secret
* Add minio client cache, since S3 client configuration can now be
changed at runtime by modifying the secret, and don't want to have to
create a new minio client every time we read config.
* Add tests for pkg/etcd/s3
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Fixes an issue where the semaphore wasn't permanently initialized
until a scheduled snapshot was taken, allowing multiple on-demand
snapshots to be taken until the first scheduled snapshot was triggered.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Start shared informer caches when k3s-etcd controller wins leader election. Previously, these were only started when the main k3s apiserver controller won an election. If the leaders ended up going to different nodes, some informers wouldn't be started
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* kubernetes 1.30.0-k3s1
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Update go version to v1.22.2
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update dynamiclistener and helm-controller
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update go in go.mod to 1.22.2
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update go in Dockerfiles
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update cri-dockerd
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Add proctitle package with linux and windows constraints
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* go mod tidy
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Fixing setproctitle function
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* update dynamiclistener to v0.6.0-rc1
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
---------
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Prefer the address of the etcd member being joined, and seed the full address list immediately on startup.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Prevents joining nodes from being stuck with bad initial member list if there is a transient failure, or if they try to join themselves
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Set ServerNodeName in snapshot CLI setup
* Raise errer if ServerNodeName ends up empty some other way
* Fix status controller to use etcd node name annotation instead of prefix checking
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
These fields are only necessary when saving snapshots to S3, and will block restoration if attempted
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Removing this in 002e6c43ee regressed
control-plane-only nodes, as we rely on the etcd client to update its
endpoint list internally so that we can use it to sync the load-balancer
address list.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
k3s etcd-snapshot save --etcd-s3 ... is creating a local snapshot and uploading it to s3 while k3s etcd-snapshot delete --etcd-s3 ... was deleting the snapshot only on s3 buckets, this commit change the behavior of delete to do it locally and on s3
Signed-off-by: Ian Cardoso <osodracnai@gmail.com>
Wire up a node watch to collect addresses of server nodes, to prevent adding unauthorized SANs to the dynamiclistener cert.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Turns out etcd-only nodes were never running **any** of the controllers,
so allowing multiple controllers didn't really fix things.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Also ensure that the snapshot job does not attempt to trigger multiple concurrent runs, as this is not supported.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Increase the default snapshot timeout. The timeout is not currently
configurable from Rancher, and larger clusters are frequently seeing
uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
etcd snapshot cron schedules from stacking up.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If the user points S3 backups at a bucket containing other files, those
file names may not be valid configmap keys.
For example, RKE1 generates backup files with names like
`s3-c-zrjnb-rs-6hxpk_2022-05-05T12:05:15Z.zip`; the semicolons in the
timestamp portion of the name are not allowed for use in configmap keys.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset
Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>