Commit Graph

113 Commits (d78e490716c1d11fc7e58348a0e7235ca32c2be7)

Author SHA1 Message Date
Brad Davidson a298bfdb18 Add jitter to scheduled snapshots and retry harder on conflicts
Also ensure that the snapshot job does not attempt to trigger multiple concurrent runs, as this is not supported.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-01-11 14:32:03 -08:00
Derek Nola 06d81cb936
Replace deprecated ioutil package (#6230)
* Replace ioutil package
* check integration test null pointer
* Remove rotate retries

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-10-07 17:36:57 -07:00
Derek Nola 4c0bc8c046
Update etcd error to match correct url (#5909)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-07-29 09:40:53 -07:00
Brad Davidson 5eaa0a9422 Replace getLocalhostIP with Loopback helper method
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 16:51:57 -07:00
Brad Davidson 1674b9d640 Raise etcd connection test timeout to 30 seconds
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 13:23:19 -07:00
Brad Davidson ffe72eecc4 Address issues with etcd snapshots
* Increase the default snapshot timeout. The timeout is not currently
  configurable from Rancher, and larger clusters are frequently seeing
  uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
  command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
  etcd snapshot cron schedules from stacking up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 14:41:38 -07:00
Brad Davidson 6fad63583b Only listen on loopback when resetting
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 11:25:54 -07:00
Brad Davidson fb0a342a20 Sanitize filenames for use in configmap keys
If the user points S3 backups at a bucket containing other files, those
file names may not be valid configmap keys.

For example, RKE1 generates backup files with names like
`s3-c-zrjnb-rs-6hxpk_2022-05-05T12:05:15Z.zip`; the semicolons in the
timestamp portion of the name are not allowed for use in configmap keys.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 10:54:26 -07:00
Brad Davidson ce5b9347c9 Replace DefaultProxyDialerFn dialer injection with EgressSelector support
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-29 17:54:36 -07:00
Brad Davidson 418c3fa858
Fix issue with datastore corruption on cluster-reset (#5515)
* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset

Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-27 13:44:15 -07:00
Brad Davidson f2ceeb01d9
Fix issue with long-running apiserver endpoints watch (#5478)
Use ListWatch helpers to retry when the watch channel is closed.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-21 09:24:34 -07:00
Brad Davidson 7760e2177a Bump etcd to 3.5.3-k3s1
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-15 01:53:18 -07:00
Brad Davidson b12cd62935 Move IPv4/v6 selection into helpers
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-15 01:02:42 -07:00
Roberto Bonafiglia 9c9adda61b Added default endpoint for IPv6
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-04-14 09:58:40 +02:00
Brad Davidson f37e7565b8 Move the apiserver addresses controller into the etcd package
This controller only needs to run when using managed etcd, so move it in
with the rest of the etcd stuff. This change also modifies the
controller to only watch the Kubernetes service endpoint, instead of
watching all endpoints in the entire cluster.

Fixes an error message revealed by use of a newer grpc client in
Kubernetes 1.24, which logs an error when the Put to etcd failed because
kine doesn't support the etcd Put operation. The controller shouldn't
have been running without etcd in the first place.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-07 11:28:15 -07:00
Brad Davidson 2a429aac65 Fix crash on early snapshot
Don't attempt to retrieve snapshot metadata configmap if the apiserver
isn't available. This could be triggered if the cron expression caused a
snapshot to be triggered before the apiserver is up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-07 09:23:34 -07:00
Roberto Bonafiglia 4afeb9c5c7
Merge pull request #5325 from rbrtbnfgl/fix-etcd-ipv6-url
Fixed etcd URL in case of IPv6 address
2022-04-05 09:55:42 +02:00
Roberto Bonafiglia 0746dde758 Fixed http URL on etcd
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-31 14:24:59 +02:00
Roberto Bonafiglia 06c779c57d Fixed loadbalancer in case of IPv6 addresses
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-31 11:49:30 +02:00
Brad Davidson 62cc1ed24f Skip setting up client tls when etcd server does not have tls enabled
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-30 01:03:41 -07:00
Roberto Bonafiglia dda409b041 Updated localhost address on IPv6 only setup
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-29 09:35:54 +02:00
Brad Davidson 1339626a5b Defragment etcd datastore before clearing alarms
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-28 09:27:59 -07:00
Roberto Bonafiglia 2285aa699b Fixed etcd URL in case of IPv6 address
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-23 15:35:51 +01:00
Brad Davidson 078da46532 Close additional leaked GPRC clients
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-15 18:07:55 -07:00
Derek Nola 1f7abe5dbb
Testing directory and documentation rework. (#5256)
* Removed vagrant folder
* Fix comments around E2E ENVs
* Eliminate testutil folder
* Convert flock integration test to unit test
* Point to other READMEs

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-03-15 10:29:56 -07:00
Luther Monson 9a849b1bb7
[master] changing package to k3s-io (#4846)
* changing package to k3s-io

Signed-off-by: Luther Monson <luther.monson@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2022-03-02 15:47:27 -08:00
Brad Davidson 9a48086524 Ignore cluster membership errors when reconciling from temp etcd
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson e4846c92b4 Move temporary etcd startup into etcd module
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson 555087b9b8 Add function to clear local alarms on etcd startup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 11:56:52 -08:00
Brad Davidson 5014c9e0e8 Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 19:56:08 -08:00
Brad Davidson 2989b8b2c5 Remove unnecessary copies of runtime struct
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 12:05:16 -08:00
Derek Nola e28be2912c
Migrate Ginkgo testing framework to V2, consolidate integration tests (#5097)
* Upgrade and convert ginkgo from v1 to v2
* Move all integration tests into integration folder
* Update TESTING.md

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-02-09 08:22:53 -08:00
Hussein Galal 13728058a4
Add k3s etcd restoration integration test (#5014)
* Add k3s etcd restoration test

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix tests and rebase

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Reorganizing the tests

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing comments

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix etcd restore

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* dont check for errors when restoring

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use eventually to test for restoration

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix tests

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2022-02-08 21:24:34 +02:00
Brian Downs effcb15adb
Adds the ability to compress etcd snapshots (#4866) 2022-01-14 10:31:22 -07:00
Derek Nola 2ac8df3602
Integration tests utilities improvements (#4832)
* Remove sudo commands from integration tests

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Added cleanup fucntion

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Implement better int cleanup

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Rename test utils

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Enable K3sCmd to be a single string

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Removed parsePod function

Signed-off-by: Derek Nola <derek.nola@suse.com>

* codespell

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Revert startup timeout

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Reorder sonobuoy tests, drop concurrent tests to 3

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Disable etcd

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Skip parallel testing for etcd

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-01-06 08:05:56 -08:00
Brad Davidson a5c6e6a68a Fix panic checking name of uninitialized etcd member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-12-21 23:38:20 -08:00
Hussein Galal d71b335871
Fix snapshot restoration on fresh nodes (#4737)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-12-14 02:04:39 +02:00
Brian Downs a6fe2c0bc5
Resolve restore bootstrap (#4704) 2021-12-09 14:54:27 -07:00
Manuel Buil 1b3187ea07 Check HA network parameters
Signed-off-by: Manuel Buil <mbuil@suse.com>
2021-12-07 23:09:05 +01:00
Derek Nola d05c334a78
Improved cleanup for etcd unit test (#4537)
* Improved cleanup for etcd unit test

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-11-29 14:46:58 -08:00
Chris Kim ae4a1a144a
etcd snapshot functionality enhancements (#4453)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-29 10:30:04 -08:00
Chris Kim f18b3252c0
[master] Add etcd extra args support for K3s (#4463)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* go generate

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-11 21:03:15 -08:00
Brian Downs adaeae351c
update bootstrap logic (#4438)
* update bootstrap logic resolving a startup bug and account for etcd
2021-11-10 05:33:42 -07:00
Derek Nola 7c3f21e581
K3s Integration test fixes (#4341)
* Move tests into sub folders
* Updated documentation
* Prevent infinite loop is user has not made k3s

Signed-off-by: dereknola <derek.nola@suse.com>
2021-10-28 12:35:28 -07:00
galal-hussein ab3d25a2c5 Update peer address when running cluster-reset
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-10-25 15:43:27 -07:00
Brian Downs 0452f017c1
Add etcd s3 timeout (#4207) 2021-10-15 10:24:14 -07:00
Brad Davidson 5a923ab8dc Add containerd ready channel to delay etcd node join
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-10-14 14:03:52 -07:00
Brian Downs ac7a8d89c6
Add ability to reconcile bootstrap data between datastore and disk (#3398) 2021-10-07 12:47:00 -07:00
Brian Downs f4cea90cb9
set transport to skip verify if se skip flag passed (#4102) 2021-09-28 10:13:50 -07:00
Hussein Galal 7826407a2e
Make sure there are no duplicates in etcd member list (#4025)
* Make sure there are no duplicates in etcd member list

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix node names with hyphens

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use full server name for etcd node name

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-09-18 00:51:18 +02:00