Commit Graph

63 Commits (8d13e68cc539b96ac23cd665bdb9bf652d51a2ea)

Author SHA1 Message Date
Brad Davidson 8d13e68cc5 Add function to clear local alarms on etcd startup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 555087b9b8)
2022-03-15 18:09:00 -07:00
Brad Davidson f55f09672e Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 5014c9e0e8)
2022-03-15 18:09:00 -07:00
Brad Davidson a18c38d63d Remove unnecessary copies of runtime struct
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 2989b8b2c5)
2022-03-15 18:09:00 -07:00
Brian Downs 8755fd45f6
[Engine-1.21] Adds the ability to compress etcd snapshots (#4866) (#4958) 2022-01-18 11:08:54 -07:00
Brad Davidson cf3e02acea Fix panic checking name of uninitialized etcd member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-12-21 23:39:03 -08:00
Hussein Galal 9b67692414 Fix snapshot restoration on fresh nodes (#4737)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-12-13 18:14:38 -07:00
Brian Downs 50b358048b
Resolve restore bootstrap (#4704) (#4717) 2021-12-09 17:54:43 -07:00
Chris Kim 4e3a074c11
[engine-1.21] etcd snapshot functionality enhancements (#4607)
* etcd snapshot functionality enhancements (#4453)

Signed-off-by: Chris Kim <oats87g@gmail.com>

* feat: add option to disable s3 over https

Signed-off-by: Chris Kim <oats87g@gmail.com>

Co-authored-by: Devin Buhl <devin.kray@gmail.com>
2021-11-29 13:30:12 -08:00
Chris Kim 381d086cf0
[engine-1.21] Add etcd extra args support for K3s (#4470)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Redux: Enable K3s integration test to run on existing cluster (#3905)

* Made it possible to run int tests on existing cluster

Signed-off-by: dereknola <derek.nola@suse.com>

Signed-off-by: Chris Kim <oats87g@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2021-11-11 19:53:20 -08:00
Brian Downs 30c7723c03
[Engine-1.21] All bootstrap backport (#4451)
Add ability to reconcile bootstrap data between datastore and disk (#3398)
2021-11-10 16:20:33 -07:00
Luther Monson 14cf963225
Update wrangler to v0.8.5 (#4428)
Required to support apiextensions.v1 as v1beta1 has been deleted. Also
update helm-controller and dynamiclistener to track wrangler versions.

Signed-off-by: Luther Monson <luther.monson@gmail.com>

Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
2021-11-08 19:59:46 -07:00
Brad Davidson 7d0ecf3ab2 Revert "Backport bootstrap engine 1.21 (#4314)"
This reverts commits
c5a9154538
45c5d78cd7

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-10-27 13:46:00 -07:00
galal-hussein 174b3881a2 Update peer address when running cluster-reset
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-10-25 15:44:18 -07:00
Brian Downs 45c5d78cd7
Backport bootstrap engine 1.21 (#4314) 2021-10-25 13:03:30 -07:00
Brad Davidson 1a8bd3156f Add containerd ready channel to delay etcd node join
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 88178ae65e)
2021-10-20 12:35:16 -07:00
Brad Davidson edde820e89 Fix premature etcd shutdown when joining an existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 086ca8ba6a)
2021-10-20 12:35:16 -07:00
Brad Davidson 36332c8cfe Pass context in to embedded etcd so that it can be stopped
Partial cherry-pick from 29c8b238e5

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-10-20 12:35:16 -07:00
Brian Downs 697f7e471a
[Engine-1.21] - Add etcd s3 timeout (#4207) (#4229) 2021-10-18 10:45:47 -07:00
Hussein Galal 0c109a58b0
Make sure there are no duplicates in etcd member list (#4025) (#4050)
* Make sure there are no duplicates in etcd member list

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix node names with hyphens

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use full server name for etcd node name

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-10-07 22:28:42 +02:00
Chris Kim 34de120875
Initial leader elected etcd member management controller (#4011)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-09-14 10:19:38 -07:00
Brad Davidson f2c7882750 Add exposed metrics listener instead of replacing loopback listener
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-10 10:59:08 -07:00
Malte Starostik b23955e835
Fix URL pruning when joining an etcd member (#3832)
* Fix URL pruning when joining an etcd member

Problem:
Existing member clientURLs were checked if they contain the joining
node's IP. In some edge cases this would prune valid URLs when the
joining IP is a substring match of the only existing member's IP.
Because of this, it was impossible to e.g. join 10.0.0.2 to an existing
node that has an IP of 10.0.0.2X or 10.0.0.2XX:

level=fatal msg="starting kubernetes: preparing server: start managed database:
joining etcd cluster: etcdclient: no available endpoints"

Solution:
Fixed by properly parsing the URLs and comparing the IPs for equality
instead of substring match.

Signed-off-by: Malte Starostik <info@stellaware.de>
2021-08-12 15:59:04 -07:00
Brian Downs dcf0657b20
account for an s3 folder when listing objects (#3807)
* account for an s3 folder when listing objects
2021-08-09 16:14:41 -07:00
Derek Nola b4eca61aeb
Prevent snapshot commands from creating empty snapshot directory (#3783)
Signed-off-by: dereknola <derek.nola@suse.com>
2021-08-09 09:04:18 -07:00
Hussein Galal bc96ffb5f3
Fix Node stuck at deletion (#3771)
* fix Node stuck at deletion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix Node stuck at deletion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-08-05 22:32:01 +02:00
Derek Nola 21c8a33647
Introduction of Integration Tests (#3695)
* Commit of new etcd snapshot integration tests.
* Updated integration github action to not run on doc changes.
* Update Drone runner to only run unit tests

Signed-off-by: dereknola <derek.nola@suse.com>
2021-07-26 09:59:33 -07:00
Derek Nola bba49ea447
Fix to allow prune to correctly cleanup custom named snapshots (#3649)
Signed-off-by: dereknola <derek.nola@suse.com>
2021-07-19 14:30:57 -07:00
Hussein Galal f5fbb9a9a8
Export cli server flags and etcd restoration functions (#3527)
* Export cli server flags and etfd restoration functions

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* export S3

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-06-30 22:29:03 +02:00
Brian Downs ecbf17e2ed move object channel defer close to goroutine
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-18 19:58:30 -07:00
Brian Downs 254b52077e add retention default and wire in s3 prune
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-18 13:57:40 -07:00
Brian Downs 6ee28214fa
Add the ability to prune etcd snapshots (#3310)
* add prune subcommand to force rentention policy enforcement
2021-05-13 13:36:33 -07:00
Brian Downs bcd8b67db4
Add the ability to list etcd snapshots (#3303)
* add ability to list local and s3 etcd snapshots
2021-05-11 16:59:33 -07:00
Brian Downs e998cd110d
Add the ability to delete an etcd snapshot locally or from S3 (#3277)
* Add the ability to delete a given set of etcd snapshots from the CLI for locally stored and S3 store snapshots.
2021-05-07 16:10:04 -07:00
Brian Downs beb0d8397a reference node name when needed
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-04 10:03:28 -07:00
Brian Downs c5ad71ce0b
Collect and Store etcd Snapshots and Metadata (#3239)
* Add the ability to store local etcd snapshots and etcd snapshots stored in an S3 compatible object store in a ConfigMap.
2021-04-30 18:26:39 -07:00
Brian Downs 66ed6efd57 Resolve local retention issue when S3 in use.
Remove early return preventing local retention policy to be enforced
resulting in N number of snapshots being stored.

Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-04-14 10:40:08 -07:00
Hussein Galal 73df65d93a
remove etcd data dir when etcd is disabled (#3059)
* remove etcd data dir when etcd is disabled

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix comment

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use debug instead of info logs

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-16 18:14:43 +02:00
Brian Downs 7c99f8645d
Have Bootstrap Data Stored in etcd at Completed Start (#3038)
* have state stored in etcd at completed start and remove unneeded code
2021-03-11 13:07:40 -07:00
Brad Davidson 7cdfaad6ce
Always use static ports for client load-balancers (#3026)
* Always use static ports for the load-balancers

This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.

This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.

Additional fixes below were required to successfully test and use this change
on etcd-only nodes.

* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
2021-03-06 02:29:57 -08:00
Brian Downs 4d1f9eda9d
Etcd Snapshot/Restore to/from S3 Compatible Backends (#2902)
* Add functionality for etcd snapshot/restore to and from S3 compatible backends.
* Update etcd restore functionality to extract and write certificates and configs from snapshot.
2021-03-03 11:14:12 -07:00
galal-hussein d6124981d5 remove etcd member if disable etcd is passed
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-01 23:50:50 +02:00
Hussein Galal 5749f66aa3
Add disable flags for control components (#2900)
* Add disable flags to control components

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes to disable flags

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add comments to functions

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix joining problem

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix ticker

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix role labels

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-02-12 17:35:57 +02:00
Brad Davidson 071de833ae Fix typo in field tag
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-01-22 19:38:37 -08:00
Yuriy 06fda7accf
Add functionality to bind custom IP address for Etcd metrics endpoint (#2750)
* Add functionality to bind custom IP address for Etcd metrics endpoint

Signed-off-by: yuriydzobak <yurii.dzobak@lotusflare.com>
2021-01-22 17:40:48 -08:00
Brian Downs 13229019f8
Add ability to perform an etcd on-demand snapshot via cli (#2819)
* add ability to perform an etcd on-demand snapshot via cli
2021-01-21 14:09:15 -07:00
MonzElmasry 86f68d5d62
change etcd dir permission if it exists
Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2021-01-08 23:47:36 +02:00
Hussein Galal fadc5a8057
Add tombstone file to etcd and catch errc etcd channel (#2592)
* Add tombstone file to embedded etcd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more changes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* gofmt and goimports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go lint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go lint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-12-07 22:30:44 +02:00
Menna Elmasry 523ccaf3f2
Merge pull request #2448 from MonzElmasry/new_b
Make etcd use node private ip
2020-10-29 00:23:56 +02:00
MonzElmasry e8436cc76b
Make etcd use node private ip
Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2020-10-28 23:45:24 +02:00
Hussein Galal fcd18d1b6e
skip node delete from removed member (#2413)
* skip node delete from removed member

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use grpc errors

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go imports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* exit if node is the etcd that being removed

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-10-28 18:32:51 +02:00