Write the extra metadata both locally and to S3. These files are placed such that they will not be used by older versions of K3s that do not make use of them.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
k3s etcd-snapshot save --etcd-s3 ... is creating a local snapshot and uploading it to s3 while k3s etcd-snapshot delete --etcd-s3 ... was deleting the snapshot only on s3 buckets, this commit change the behavior of delete to do it locally and on s3
Signed-off-by: Ian Cardoso <osodracnai@gmail.com>
Wire up a node watch to collect addresses of server nodes, to prevent adding unauthorized SANs to the dynamiclistener cert.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Turns out etcd-only nodes were never running **any** of the controllers,
so allowing multiple controllers didn't really fix things.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Also ensure that the snapshot job does not attempt to trigger multiple concurrent runs, as this is not supported.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Increase the default snapshot timeout. The timeout is not currently
configurable from Rancher, and larger clusters are frequently seeing
uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
etcd snapshot cron schedules from stacking up.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If the user points S3 backups at a bucket containing other files, those
file names may not be valid configmap keys.
For example, RKE1 generates backup files with names like
`s3-c-zrjnb-rs-6hxpk_2022-05-05T12:05:15Z.zip`; the semicolons in the
timestamp portion of the name are not allowed for use in configmap keys.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset
Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This controller only needs to run when using managed etcd, so move it in
with the rest of the etcd stuff. This change also modifies the
controller to only watch the Kubernetes service endpoint, instead of
watching all endpoints in the entire cluster.
Fixes an error message revealed by use of a newer grpc client in
Kubernetes 1.24, which logs an error when the Put to etcd failed because
kine doesn't support the etcd Put operation. The controller shouldn't
have been running without etcd in the first place.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Don't attempt to retrieve snapshot metadata configmap if the apiserver
isn't available. This could be triggered if the cron expression caused a
snapshot to be triggered before the apiserver is up.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Removed vagrant folder
* Fix comments around E2E ENVs
* Eliminate testutil folder
* Convert flock integration test to unit test
* Point to other READMEs
Signed-off-by: Derek Nola <derek.nola@suse.com>
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Upgrade and convert ginkgo from v1 to v2
* Move all integration tests into integration folder
* Update TESTING.md
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Remove sudo commands from integration tests
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Added cleanup fucntion
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Implement better int cleanup
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Rename test utils
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Enable K3sCmd to be a single string
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Removed parsePod function
Signed-off-by: Derek Nola <derek.nola@suse.com>
* codespell
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Revert startup timeout
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Reorder sonobuoy tests, drop concurrent tests to 3
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Disable etcd
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Skip parallel testing for etcd
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Add etcd extra args support for K3s
Signed-off-by: Chris Kim <oats87g@gmail.com>
* Add etcd custom argument integration test
Signed-off-by: Chris Kim <oats87g@gmail.com>
* go generate
Signed-off-by: Chris Kim <oats87g@gmail.com>
* Make sure there are no duplicates in etcd member list
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix node names with hyphens
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* use full server name for etcd node name
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Added test runner and build files
* Changes to int test to output junit results.
* Updated documentation, removed comments
Signed-off-by: dereknola <derek.nola@suse.com>
Required to support apiextensions.v1 as v1beta1 has been deleted. Also
update helm-controller and dynamiclistener to track wrangler versions.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Fix URL pruning when joining an etcd member
Problem:
Existing member clientURLs were checked if they contain the joining
node's IP. In some edge cases this would prune valid URLs when the
joining IP is a substring match of the only existing member's IP.
Because of this, it was impossible to e.g. join 10.0.0.2 to an existing
node that has an IP of 10.0.0.2X or 10.0.0.2XX:
level=fatal msg="starting kubernetes: preparing server: start managed database:
joining etcd cluster: etcdclient: no available endpoints"
Solution:
Fixed by properly parsing the URLs and comparing the IPs for equality
instead of substring match.
Signed-off-by: Malte Starostik <info@stellaware.de>
* Commit of new etcd snapshot integration tests.
* Updated integration github action to not run on doc changes.
* Update Drone runner to only run unit tests
Signed-off-by: dereknola <derek.nola@suse.com>