Commit Graph

81 Commits (18098ca0d8d628d88886294b6c0d1c17bf949817)

Author SHA1 Message Date
Brad Davidson 18098ca0d8
Fix issue with long-running apiserver endpoints watch (#5480)
Use ListWatch helpers to retry when the watch channel is closed.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-21 09:27:45 -07:00
Brad Davidson 3b36c7e88b Move the apiserver addresses controller into the etcd package
This controller only needs to run when using managed etcd, so move it in
with the rest of the etcd stuff. This change also modifies the
controller to only watch the Kubernetes service endpoint, instead of
watching all endpoints in the entire cluster.

Fixes an error message revealed by use of a newer grpc client in
Kubernetes 1.24, which logs an error when the Put to etcd failed because
kine doesn't support the etcd Put operation. The controller shouldn't
have been running without etcd in the first place.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit f37e7565b8)
2022-04-15 11:15:28 -07:00
Brad Davidson 4225c93cd6 Fix crash on early snapshot
Don't attempt to retrieve snapshot metadata configmap if the apiserver
isn't available. This could be triggered if the cron expression caused a
snapshot to be triggered before the apiserver is up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 2a429aac65)
2022-04-15 11:15:28 -07:00
Brad Davidson f1c323c268 Skip setting up client tls when etcd server does not have tls enabled
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-30 01:06:28 -07:00
Brad Davidson 90ce62ceaa Defragment etcd datastore before clearing alarms
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-28 09:28:24 -07:00
Brad Davidson 5ba59d98c8 Close additional leaked GPRC clients
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-15 18:09:00 -07:00
Brad Davidson cbf8cadb92 Ignore cluster membership errors when reconciling from temp etcd
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 9a48086524)
2022-03-15 18:09:00 -07:00
Brad Davidson 7da7a00f8f Move temporary etcd startup into etcd module
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit e4846c92b4)
2022-03-15 18:09:00 -07:00
Brad Davidson 8d13e68cc5 Add function to clear local alarms on etcd startup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 555087b9b8)
2022-03-15 18:09:00 -07:00
Brad Davidson f55f09672e Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 5014c9e0e8)
2022-03-15 18:09:00 -07:00
Brad Davidson a18c38d63d Remove unnecessary copies of runtime struct
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 2989b8b2c5)
2022-03-15 18:09:00 -07:00
Brian Downs 8755fd45f6
[Engine-1.21] Adds the ability to compress etcd snapshots (#4866) (#4958) 2022-01-18 11:08:54 -07:00
Brad Davidson cf3e02acea Fix panic checking name of uninitialized etcd member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-12-21 23:39:03 -08:00
Hussein Galal 9b67692414 Fix snapshot restoration on fresh nodes (#4737)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-12-13 18:14:38 -07:00
Brian Downs 50b358048b
Resolve restore bootstrap (#4704) (#4717) 2021-12-09 17:54:43 -07:00
Manuel Buil b6e176f6a0 Check HA network parameters
Signed-off-by: Manuel Buil <mbuil@suse.com>
2021-12-08 14:42:20 +01:00
Derek Nola bd9fca62d1
Improved cleanup for etcd unit test (#4537) (#4609)
* Improved cleanup for etcd unit test

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-11-30 11:05:09 -08:00
Chris Kim 4e3a074c11
[engine-1.21] etcd snapshot functionality enhancements (#4607)
* etcd snapshot functionality enhancements (#4453)

Signed-off-by: Chris Kim <oats87g@gmail.com>

* feat: add option to disable s3 over https

Signed-off-by: Chris Kim <oats87g@gmail.com>

Co-authored-by: Devin Buhl <devin.kray@gmail.com>
2021-11-29 13:30:12 -08:00
Chris Kim 381d086cf0
[engine-1.21] Add etcd extra args support for K3s (#4470)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Redux: Enable K3s integration test to run on existing cluster (#3905)

* Made it possible to run int tests on existing cluster

Signed-off-by: dereknola <derek.nola@suse.com>

Signed-off-by: Chris Kim <oats87g@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2021-11-11 19:53:20 -08:00
Brian Downs 30c7723c03
[Engine-1.21] All bootstrap backport (#4451)
Add ability to reconcile bootstrap data between datastore and disk (#3398)
2021-11-10 16:20:33 -07:00
Luther Monson 14cf963225
Update wrangler to v0.8.5 (#4428)
Required to support apiextensions.v1 as v1beta1 has been deleted. Also
update helm-controller and dynamiclistener to track wrangler versions.

Signed-off-by: Luther Monson <luther.monson@gmail.com>

Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
2021-11-08 19:59:46 -07:00
Brad Davidson 7d0ecf3ab2 Revert "Backport bootstrap engine 1.21 (#4314)"
This reverts commits
c5a9154538
45c5d78cd7

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-10-27 13:46:00 -07:00
galal-hussein 174b3881a2 Update peer address when running cluster-reset
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-10-25 15:44:18 -07:00
Brian Downs 45c5d78cd7
Backport bootstrap engine 1.21 (#4314) 2021-10-25 13:03:30 -07:00
Brad Davidson 1a8bd3156f Add containerd ready channel to delay etcd node join
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 88178ae65e)
2021-10-20 12:35:16 -07:00
Brad Davidson edde820e89 Fix premature etcd shutdown when joining an existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 086ca8ba6a)
2021-10-20 12:35:16 -07:00
Brad Davidson 36332c8cfe Pass context in to embedded etcd so that it can be stopped
Partial cherry-pick from 29c8b238e5

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-10-20 12:35:16 -07:00
Brian Downs 697f7e471a
[Engine-1.21] - Add etcd s3 timeout (#4207) (#4229) 2021-10-18 10:45:47 -07:00
Hussein Galal 0c109a58b0
Make sure there are no duplicates in etcd member list (#4025) (#4050)
* Make sure there are no duplicates in etcd member list

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix node names with hyphens

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use full server name for etcd node name

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-10-07 22:28:42 +02:00
Brian Downs 1eeffbb330
set transport to skip verify if se skip flag passed (#4102) (#4106) 2021-10-07 12:47:50 -07:00
Chris Kim 661b5aeb94
No-op when etcd member was already removed and use existing name for etcd controller (#4016)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-09-15 07:36:21 -07:00
Chris Kim 34de120875
Initial leader elected etcd member management controller (#4011)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-09-14 10:19:38 -07:00
Brad Davidson f2c7882750 Add exposed metrics listener instead of replacing loopback listener
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-10 10:59:08 -07:00
Malte Starostik b23955e835
Fix URL pruning when joining an etcd member (#3832)
* Fix URL pruning when joining an etcd member

Problem:
Existing member clientURLs were checked if they contain the joining
node's IP. In some edge cases this would prune valid URLs when the
joining IP is a substring match of the only existing member's IP.
Because of this, it was impossible to e.g. join 10.0.0.2 to an existing
node that has an IP of 10.0.0.2X or 10.0.0.2XX:

level=fatal msg="starting kubernetes: preparing server: start managed database:
joining etcd cluster: etcdclient: no available endpoints"

Solution:
Fixed by properly parsing the URLs and comparing the IPs for equality
instead of substring match.

Signed-off-by: Malte Starostik <info@stellaware.de>
2021-08-12 15:59:04 -07:00
Derek Nola a1e36153f9
Added locking system for integration tests (#3820)
* Added locking system for integration tests
Signed-off-by: dereknola <derek.nola@suse.com>
2021-08-10 16:22:12 -07:00
Derek Nola 4cc781b5e3
Moved testing utils into tests directory. Improved gotests template. (#3805)
* Moved testing utils into tests directory. Improved gotests template.
* Updated cgroups2 with util folder rename

Signed-off-by: dereknola <derek.nola@suse.com>
2021-08-10 11:13:26 -07:00
Brian Downs dcf0657b20
account for an s3 folder when listing objects (#3807)
* account for an s3 folder when listing objects
2021-08-09 16:14:41 -07:00
Derek Nola b4eca61aeb
Prevent snapshot commands from creating empty snapshot directory (#3783)
Signed-off-by: dereknola <derek.nola@suse.com>
2021-08-09 09:04:18 -07:00
Hussein Galal bc96ffb5f3
Fix Node stuck at deletion (#3771)
* fix Node stuck at deletion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix Node stuck at deletion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-08-05 22:32:01 +02:00
Derek Nola 21c8a33647
Introduction of Integration Tests (#3695)
* Commit of new etcd snapshot integration tests.
* Updated integration github action to not run on doc changes.
* Update Drone runner to only run unit tests

Signed-off-by: dereknola <derek.nola@suse.com>
2021-07-26 09:59:33 -07:00
Derek Nola bba49ea447
Fix to allow prune to correctly cleanup custom named snapshots (#3649)
Signed-off-by: dereknola <derek.nola@suse.com>
2021-07-19 14:30:57 -07:00
Derek Nola c833183517
Add unit tests for pkg/etcd (#3549)
* Created new etcd unit tests and testing support file

Signed-off-by: dereknola <derek.nola@suse.com>
2021-07-01 16:08:35 -07:00
Hussein Galal f5fbb9a9a8
Export cli server flags and etcd restoration functions (#3527)
* Export cli server flags and etfd restoration functions

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* export S3

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-06-30 22:29:03 +02:00
Brian Downs ecbf17e2ed move object channel defer close to goroutine
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-18 19:58:30 -07:00
Brian Downs 254b52077e add retention default and wire in s3 prune
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-18 13:57:40 -07:00
Brian Downs 6ee28214fa
Add the ability to prune etcd snapshots (#3310)
* add prune subcommand to force rentention policy enforcement
2021-05-13 13:36:33 -07:00
Brian Downs bcd8b67db4
Add the ability to list etcd snapshots (#3303)
* add ability to list local and s3 etcd snapshots
2021-05-11 16:59:33 -07:00
Brian Downs e998cd110d
Add the ability to delete an etcd snapshot locally or from S3 (#3277)
* Add the ability to delete a given set of etcd snapshots from the CLI for locally stored and S3 store snapshots.
2021-05-07 16:10:04 -07:00
Brian Downs beb0d8397a reference node name when needed
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-04 10:03:28 -07:00
Brian Downs c5ad71ce0b
Collect and Store etcd Snapshots and Metadata (#3239)
* Add the ability to store local etcd snapshots and etcd snapshots stored in an S3 compatible object store in a ConfigMap.
2021-04-30 18:26:39 -07:00