Commit Graph

60 Commits (bf4e037fcfc4cfc5c2a3bca4846bff9550d93887)

Author SHA1 Message Date
Brian Downs a6fe2c0bc5
Resolve restore bootstrap (#4704) 2021-12-09 14:54:27 -07:00
Chris Kim ae4a1a144a
etcd snapshot functionality enhancements (#4453)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-29 10:30:04 -08:00
Chris Kim f18b3252c0
[master] Add etcd extra args support for K3s (#4463)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* go generate

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-11 21:03:15 -08:00
Brian Downs adaeae351c
update bootstrap logic (#4438)
* update bootstrap logic resolving a startup bug and account for etcd
2021-11-10 05:33:42 -07:00
galal-hussein ab3d25a2c5 Update peer address when running cluster-reset
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-10-25 15:43:27 -07:00
Brian Downs 0452f017c1
Add etcd s3 timeout (#4207) 2021-10-15 10:24:14 -07:00
Brad Davidson 5a923ab8dc Add containerd ready channel to delay etcd node join
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-10-14 14:03:52 -07:00
Brian Downs ac7a8d89c6
Add ability to reconcile bootstrap data between datastore and disk (#3398) 2021-10-07 12:47:00 -07:00
Hussein Galal 7826407a2e
Make sure there are no duplicates in etcd member list (#4025)
* Make sure there are no duplicates in etcd member list

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix node names with hyphens

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use full server name for etcd node name

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-09-18 00:51:18 +02:00
Brad Davidson 086ca8ba6a Fix premature etcd shutdown when joining an existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-15 10:35:07 -07:00
Chris Kim 928b8531c3
[master] Add `etcd-member-management` controller to K3s (#4001)
* Initial leader elected etcd member management controller
* Bump etcd to v3.5.0-k3s2

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-09-14 08:20:38 -07:00
Brad Davidson b4d8c641c6 Add exposed metrics listener instead of replacing loopback listener
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-10 09:39:39 -07:00
Brad Davidson 29c8b238e5 Replace klog with non-exiting fork
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-10 09:36:16 -07:00
Darren Shepherd 741ba95b04 Migrate sqlite data to etcd when initializing the cluster
Signed-off-by: Darren Shepherd <darren@rancher.com>
2021-09-09 10:24:02 -07:00
Devin Buhl a1ec43e0b7
feat: add option to disable s3 over https
Signed-off-by: Devin Buhl <devin.kray@gmail.com>
2021-09-05 12:03:49 -04:00
Brad Davidson b8add39b07 Bump kine for metrics/tls changes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-01 01:51:30 -07:00
Brad Davidson e95b75409a Fix lint failures
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-08-20 18:47:16 -07:00
Brad Davidson 872855015c Update etcd to v3.5.0
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-08-20 18:47:16 -07:00
Malte Starostik b23955e835
Fix URL pruning when joining an etcd member (#3832)
* Fix URL pruning when joining an etcd member

Problem:
Existing member clientURLs were checked if they contain the joining
node's IP. In some edge cases this would prune valid URLs when the
joining IP is a substring match of the only existing member's IP.
Because of this, it was impossible to e.g. join 10.0.0.2 to an existing
node that has an IP of 10.0.0.2X or 10.0.0.2XX:

level=fatal msg="starting kubernetes: preparing server: start managed database:
joining etcd cluster: etcdclient: no available endpoints"

Solution:
Fixed by properly parsing the URLs and comparing the IPs for equality
instead of substring match.

Signed-off-by: Malte Starostik <info@stellaware.de>
2021-08-12 15:59:04 -07:00
Brian Downs dcf0657b20
account for an s3 folder when listing objects (#3807)
* account for an s3 folder when listing objects
2021-08-09 16:14:41 -07:00
Derek Nola b4eca61aeb
Prevent snapshot commands from creating empty snapshot directory (#3783)
Signed-off-by: dereknola <derek.nola@suse.com>
2021-08-09 09:04:18 -07:00
Hussein Galal bc96ffb5f3
Fix Node stuck at deletion (#3771)
* fix Node stuck at deletion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix Node stuck at deletion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-08-05 22:32:01 +02:00
Derek Nola 21c8a33647
Introduction of Integration Tests (#3695)
* Commit of new etcd snapshot integration tests.
* Updated integration github action to not run on doc changes.
* Update Drone runner to only run unit tests

Signed-off-by: dereknola <derek.nola@suse.com>
2021-07-26 09:59:33 -07:00
Derek Nola bba49ea447
Fix to allow prune to correctly cleanup custom named snapshots (#3649)
Signed-off-by: dereknola <derek.nola@suse.com>
2021-07-19 14:30:57 -07:00
Hussein Galal f5fbb9a9a8
Export cli server flags and etcd restoration functions (#3527)
* Export cli server flags and etfd restoration functions

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* export S3

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-06-30 22:29:03 +02:00
Brian Downs ecbf17e2ed move object channel defer close to goroutine
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-18 19:58:30 -07:00
Brian Downs 254b52077e add retention default and wire in s3 prune
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-18 13:57:40 -07:00
Brian Downs 6ee28214fa
Add the ability to prune etcd snapshots (#3310)
* add prune subcommand to force rentention policy enforcement
2021-05-13 13:36:33 -07:00
Brian Downs bcd8b67db4
Add the ability to list etcd snapshots (#3303)
* add ability to list local and s3 etcd snapshots
2021-05-11 16:59:33 -07:00
Brian Downs e998cd110d
Add the ability to delete an etcd snapshot locally or from S3 (#3277)
* Add the ability to delete a given set of etcd snapshots from the CLI for locally stored and S3 store snapshots.
2021-05-07 16:10:04 -07:00
Brian Downs beb0d8397a reference node name when needed
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-04 10:03:28 -07:00
Brian Downs c5ad71ce0b
Collect and Store etcd Snapshots and Metadata (#3239)
* Add the ability to store local etcd snapshots and etcd snapshots stored in an S3 compatible object store in a ConfigMap.
2021-04-30 18:26:39 -07:00
Brian Downs 66ed6efd57 Resolve local retention issue when S3 in use.
Remove early return preventing local retention policy to be enforced
resulting in N number of snapshots being stored.

Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-04-14 10:40:08 -07:00
Hussein Galal 73df65d93a
remove etcd data dir when etcd is disabled (#3059)
* remove etcd data dir when etcd is disabled

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix comment

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use debug instead of info logs

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-16 18:14:43 +02:00
Brian Downs 7c99f8645d
Have Bootstrap Data Stored in etcd at Completed Start (#3038)
* have state stored in etcd at completed start and remove unneeded code
2021-03-11 13:07:40 -07:00
Brad Davidson 7cdfaad6ce
Always use static ports for client load-balancers (#3026)
* Always use static ports for the load-balancers

This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.

This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.

Additional fixes below were required to successfully test and use this change
on etcd-only nodes.

* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
2021-03-06 02:29:57 -08:00
Brian Downs 4d1f9eda9d
Etcd Snapshot/Restore to/from S3 Compatible Backends (#2902)
* Add functionality for etcd snapshot/restore to and from S3 compatible backends.
* Update etcd restore functionality to extract and write certificates and configs from snapshot.
2021-03-03 11:14:12 -07:00
galal-hussein d6124981d5 remove etcd member if disable etcd is passed
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-01 23:50:50 +02:00
Hussein Galal 5749f66aa3
Add disable flags for control components (#2900)
* Add disable flags to control components

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes to disable flags

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add comments to functions

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix joining problem

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix ticker

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix role labels

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-02-12 17:35:57 +02:00
Brad Davidson 071de833ae Fix typo in field tag
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-01-22 19:38:37 -08:00
Yuriy 06fda7accf
Add functionality to bind custom IP address for Etcd metrics endpoint (#2750)
* Add functionality to bind custom IP address for Etcd metrics endpoint

Signed-off-by: yuriydzobak <yurii.dzobak@lotusflare.com>
2021-01-22 17:40:48 -08:00
Brian Downs 13229019f8
Add ability to perform an etcd on-demand snapshot via cli (#2819)
* add ability to perform an etcd on-demand snapshot via cli
2021-01-21 14:09:15 -07:00
MonzElmasry 86f68d5d62
change etcd dir permission if it exists
Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2021-01-08 23:47:36 +02:00
Hussein Galal fadc5a8057
Add tombstone file to etcd and catch errc etcd channel (#2592)
* Add tombstone file to embedded etcd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more changes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* gofmt and goimports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go lint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go lint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-12-07 22:30:44 +02:00
Menna Elmasry 523ccaf3f2
Merge pull request #2448 from MonzElmasry/new_b
Make etcd use node private ip
2020-10-29 00:23:56 +02:00
MonzElmasry e8436cc76b
Make etcd use node private ip
Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2020-10-28 23:45:24 +02:00
Hussein Galal fcd18d1b6e
skip node delete from removed member (#2413)
* skip node delete from removed member

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use grpc errors

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go imports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* exit if node is the etcd that being removed

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-10-28 18:32:51 +02:00
Brad Davidson de18528412
Make etcd voting members responsible for managing learners (#2399)
* Set etcd timeouts using values from k8s instead of etcdctl
  Fix for one of the warnings from #2303
* Use etcd zap logger instead of deprecated capsnlog
  Fix for one of the warnings from #2303
* Remove member self-promotion code paths
* Add learner promotion tracking code
* Fix RaftAppliedIndex progress check
* Remove ErrGRPCKeyNotFound check
  This is not used by v3 API - it just returns a response with 0 KVs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-10-27 11:06:26 -07:00
Hussein Galal 373449ec0a
Allow for multiple etcd snapshot restoration (#2307)
* add reset tmp file

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go imports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix multiple lines string

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix typo

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use resetFile function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-09-30 02:53:31 +02:00
Brad Davidson 703ba5cde7 Add a bunch of doc comments
Also change identical error messages to clarify where problems are
occurring.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00