Commit Graph

127 Commits (b010c941cf9fa2f0459bbfb99f22e1f1f6440941)

Author SHA1 Message Date
Brad Davidson 8705a88bf4 Clear remove annotations on cluster reset; refuse to delete last member from cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson 002e6c43ee Reorganize Driver interface and etcd driver to avoid passing context and config into most calls
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson 890645924f Don't export functions not needed outside the etcd package
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson 8c73fd670b Disable HTTP on main etcd client port
Fixes performance issue under load, ref: https://github.com/etcd-io/etcd/issues/15402 and https://github.com/kubernetes/kubernetes/pull/118460

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 08:29:57 -07:00
Vitor Savian e83b1ba4aa
Fixed the etcd retention to delete orphaned snapshots based on the date (#8177)
* Fix retention using name instead of date

Signed-off-by: Vitor <vitor.savian@suse.com>
2023-08-14 18:48:59 -03:00
Ian Cardoso e551308db8
fix for etcd-snapshot delete with --etcd-s3 flag (#8110)
k3s etcd-snapshot save --etcd-s3 ... is creating a local snapshot and uploading it to s3 while k3s etcd-snapshot delete --etcd-s3 ... was deleting the snapshot only on s3 buckets, this commit change the behavior of delete to do it locally and on s3

Signed-off-by: Ian Cardoso <osodracnai@gmail.com>
2023-08-04 14:26:32 -03:00
Vitor Savian ca7aeed090
Etcd snapshots retention when node name changes (#8099)
Fixed the etcd retention to delete orphaned snapshots

Signed-off-by: Vitor <vitor.savian@suse.com>
2023-08-03 10:54:40 -03:00
Brad Davidson aa76942d0f Add FilterCN function to prevent SAN Stuffing
Wire up a node watch to collect addresses of server nodes, to prevent adding unauthorized SANs to the dynamiclistener cert.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-08-02 11:15:39 -07:00
Brad Davidson e61fde93c1 Fix MemberList error handling and incorrect etcd-arg passthrough
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-28 22:04:30 -07:00
Brad Davidson 91afb38799 Retry cluster join on "too many learners" error
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-28 11:28:33 -07:00
Brad Davidson d95980bba3 Lock bootstrap data with empty key to prevent conflicts
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-05 10:56:57 -07:00
Brad Davidson b010db0cff Ensure that loopback is used for the advertised address when resetting
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-03 17:01:43 -07:00
Brad Davidson 0c302f4341 Fix etcd member deletion
Turns out etcd-only nodes were never running **any** of the controllers,
so allowing multiple controllers didn't really fix things.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-14 09:39:41 -08:00
Brad Davidson 3d146d2f1b Allow for multiple sets of leader-elected controllers
Addresses an issue where etcd controllers did not run on etcd-only nodes

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-10 10:46:48 -08:00
Brad Davidson a298bfdb18 Add jitter to scheduled snapshots and retry harder on conflicts
Also ensure that the snapshot job does not attempt to trigger multiple concurrent runs, as this is not supported.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-01-11 14:32:03 -08:00
Derek Nola 06d81cb936
Replace deprecated ioutil package (#6230)
* Replace ioutil package
* check integration test null pointer
* Remove rotate retries

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-10-07 17:36:57 -07:00
Derek Nola 4c0bc8c046
Update etcd error to match correct url (#5909)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-07-29 09:40:53 -07:00
Brad Davidson 5eaa0a9422 Replace getLocalhostIP with Loopback helper method
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 16:51:57 -07:00
Brad Davidson 1674b9d640 Raise etcd connection test timeout to 30 seconds
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 13:23:19 -07:00
Brad Davidson ffe72eecc4 Address issues with etcd snapshots
* Increase the default snapshot timeout. The timeout is not currently
  configurable from Rancher, and larger clusters are frequently seeing
  uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
  command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
  etcd snapshot cron schedules from stacking up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 14:41:38 -07:00
Brad Davidson 6fad63583b Only listen on loopback when resetting
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 11:25:54 -07:00
Brad Davidson fb0a342a20 Sanitize filenames for use in configmap keys
If the user points S3 backups at a bucket containing other files, those
file names may not be valid configmap keys.

For example, RKE1 generates backup files with names like
`s3-c-zrjnb-rs-6hxpk_2022-05-05T12:05:15Z.zip`; the semicolons in the
timestamp portion of the name are not allowed for use in configmap keys.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 10:54:26 -07:00
Brad Davidson ce5b9347c9 Replace DefaultProxyDialerFn dialer injection with EgressSelector support
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-29 17:54:36 -07:00
Brad Davidson 418c3fa858
Fix issue with datastore corruption on cluster-reset (#5515)
* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset

Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-27 13:44:15 -07:00
Brad Davidson f2ceeb01d9
Fix issue with long-running apiserver endpoints watch (#5478)
Use ListWatch helpers to retry when the watch channel is closed.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-21 09:24:34 -07:00
Brad Davidson 7760e2177a Bump etcd to 3.5.3-k3s1
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-15 01:53:18 -07:00
Brad Davidson b12cd62935 Move IPv4/v6 selection into helpers
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-15 01:02:42 -07:00
Roberto Bonafiglia 9c9adda61b Added default endpoint for IPv6
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-04-14 09:58:40 +02:00
Brad Davidson f37e7565b8 Move the apiserver addresses controller into the etcd package
This controller only needs to run when using managed etcd, so move it in
with the rest of the etcd stuff. This change also modifies the
controller to only watch the Kubernetes service endpoint, instead of
watching all endpoints in the entire cluster.

Fixes an error message revealed by use of a newer grpc client in
Kubernetes 1.24, which logs an error when the Put to etcd failed because
kine doesn't support the etcd Put operation. The controller shouldn't
have been running without etcd in the first place.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-07 11:28:15 -07:00
Brad Davidson 2a429aac65 Fix crash on early snapshot
Don't attempt to retrieve snapshot metadata configmap if the apiserver
isn't available. This could be triggered if the cron expression caused a
snapshot to be triggered before the apiserver is up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-07 09:23:34 -07:00
Roberto Bonafiglia 4afeb9c5c7
Merge pull request #5325 from rbrtbnfgl/fix-etcd-ipv6-url
Fixed etcd URL in case of IPv6 address
2022-04-05 09:55:42 +02:00
Roberto Bonafiglia 0746dde758 Fixed http URL on etcd
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-31 14:24:59 +02:00
Roberto Bonafiglia 06c779c57d Fixed loadbalancer in case of IPv6 addresses
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-31 11:49:30 +02:00
Brad Davidson 62cc1ed24f Skip setting up client tls when etcd server does not have tls enabled
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-30 01:03:41 -07:00
Roberto Bonafiglia dda409b041 Updated localhost address on IPv6 only setup
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-29 09:35:54 +02:00
Brad Davidson 1339626a5b Defragment etcd datastore before clearing alarms
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-28 09:27:59 -07:00
Roberto Bonafiglia 2285aa699b Fixed etcd URL in case of IPv6 address
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-23 15:35:51 +01:00
Brad Davidson 078da46532 Close additional leaked GPRC clients
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-15 18:07:55 -07:00
Derek Nola 1f7abe5dbb
Testing directory and documentation rework. (#5256)
* Removed vagrant folder
* Fix comments around E2E ENVs
* Eliminate testutil folder
* Convert flock integration test to unit test
* Point to other READMEs

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-03-15 10:29:56 -07:00
Luther Monson 9a849b1bb7
[master] changing package to k3s-io (#4846)
* changing package to k3s-io

Signed-off-by: Luther Monson <luther.monson@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2022-03-02 15:47:27 -08:00
Brad Davidson 9a48086524 Ignore cluster membership errors when reconciling from temp etcd
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson e4846c92b4 Move temporary etcd startup into etcd module
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson 555087b9b8 Add function to clear local alarms on etcd startup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 11:56:52 -08:00
Brad Davidson 5014c9e0e8 Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 19:56:08 -08:00
Brad Davidson 2989b8b2c5 Remove unnecessary copies of runtime struct
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 12:05:16 -08:00
Derek Nola e28be2912c
Migrate Ginkgo testing framework to V2, consolidate integration tests (#5097)
* Upgrade and convert ginkgo from v1 to v2
* Move all integration tests into integration folder
* Update TESTING.md

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-02-09 08:22:53 -08:00
Hussein Galal 13728058a4
Add k3s etcd restoration integration test (#5014)
* Add k3s etcd restoration test

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix tests and rebase

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Reorganizing the tests

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing comments

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix etcd restore

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* dont check for errors when restoring

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use eventually to test for restoration

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix tests

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2022-02-08 21:24:34 +02:00
Brian Downs effcb15adb
Adds the ability to compress etcd snapshots (#4866) 2022-01-14 10:31:22 -07:00
Derek Nola 2ac8df3602
Integration tests utilities improvements (#4832)
* Remove sudo commands from integration tests

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Added cleanup fucntion

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Implement better int cleanup

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Rename test utils

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Enable K3sCmd to be a single string

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Removed parsePod function

Signed-off-by: Derek Nola <derek.nola@suse.com>

* codespell

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Revert startup timeout

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Reorder sonobuoy tests, drop concurrent tests to 3

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Disable etcd

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Skip parallel testing for etcd

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-01-06 08:05:56 -08:00
Brad Davidson a5c6e6a68a Fix panic checking name of uninitialized etcd member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-12-21 23:38:20 -08:00