Commit Graph

30 Commits (7da7a00f8f4c9307742c99bdbdd15693f709d25b)

Author SHA1 Message Date
Brad Davidson 7da7a00f8f Move temporary etcd startup into etcd module
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit e4846c92b4)
2022-03-15 18:09:00 -07:00
Brad Davidson f55f09672e Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 5014c9e0e8)
2022-03-15 18:09:00 -07:00
Brad Davidson ee4c209df9 Remove unnecessary copies of etcdconfig struct
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit a1b800f0bf)
2022-03-15 18:09:00 -07:00
Brad Davidson a18c38d63d Remove unnecessary copies of runtime struct
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 2989b8b2c5)
2022-03-15 18:09:00 -07:00
Brad Davidson 0a1f013f68 Fix handling of agent-token fallback to token
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-01-07 10:15:17 -08:00
Brad Davidson 4decce56a3 Fix use of agent creds for secrets-encrypt and config validate
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-01-06 14:17:03 -08:00
Brad Davidson b38ef3a6aa Close etcd clients to avoid leaking GRPC connections
If you don't explicitly close the etcd client when you're done with it,
the GRPC connection hangs around in the background. Normally this is
harmelss, but in the case of the temporary etcd we start up on 2399 to
reconcile bootstrap data, the client will start logging errors
afterwards when the server goes away.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 8ad7d141e8)
2021-12-22 12:41:36 -08:00
Brian Downs 5b0c1661f1
Update bootstrap logic to output all changed files on disk (#4800) (#4810)
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-12-21 16:22:23 -07:00
Derek Nola 962113d4a0
[Engine-1.21] Fix cold boot and reconcilation on secondary servers (#4754)
* Fix cold boot restarts on secondary servers

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-12-15 16:12:07 -08:00
Hussein Galal 9b67692414 Fix snapshot restoration on fresh nodes (#4737)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-12-13 18:14:38 -07:00
Brian Downs bc84ded38c Resolve Bootstrap Migration Edge Case (#4730)
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-12-13 13:09:28 -07:00
Brian Downs 50b358048b
Resolve restore bootstrap (#4704) (#4717) 2021-12-09 17:54:43 -07:00
Manuel Buil b6e176f6a0 Check HA network parameters
Signed-off-by: Manuel Buil <mbuil@suse.com>
2021-12-08 14:42:20 +01:00
Derek Nola e7fe71ea1e
[Engine-1.21] Secrets-encryption rotation (#4656)
* Backport secrets encrypt rotation
* Backport integration fixes for custom etcd args

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-12-07 21:55:00 -08:00
Chris Kim 381d086cf0
[engine-1.21] Add etcd extra args support for K3s (#4470)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Redux: Enable K3s integration test to run on existing cluster (#3905)

* Made it possible to run int tests on existing cluster

Signed-off-by: dereknola <derek.nola@suse.com>

Signed-off-by: Chris Kim <oats87g@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2021-11-11 19:53:20 -08:00
Brian Downs 30c7723c03
[Engine-1.21] All bootstrap backport (#4451)
Add ability to reconcile bootstrap data between datastore and disk (#3398)
2021-11-10 16:20:33 -07:00
Brad Davidson 7d0ecf3ab2 Revert "Backport bootstrap engine 1.21 (#4314)"
This reverts commits
c5a9154538
45c5d78cd7

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-10-27 13:46:00 -07:00
Brian Downs c5a9154538
reset buffer after use (#4279) (#4330) 2021-10-26 17:58:19 -07:00
Brian Downs 45c5d78cd7
Backport bootstrap engine 1.21 (#4314) 2021-10-25 13:03:30 -07:00
Hussein Galal 136dddca11
Fix storing bootstrap data with empty token string (#3422)
* Fix storing bootstrap data with empty token string

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* delete node password secret after restoration

fixes to bootstrap key

vendor update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix comment

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix typo

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* typos

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Removing dynamic listener file after restoration

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-06-22 22:42:34 +02:00
Brian Downs 7c99f8645d
Have Bootstrap Data Stored in etcd at Completed Start (#3038)
* have state stored in etcd at completed start and remove unneeded code
2021-03-11 13:07:40 -07:00
Brad Davidson 7cdfaad6ce
Always use static ports for client load-balancers (#3026)
* Always use static ports for the load-balancers

This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.

This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.

Additional fixes below were required to successfully test and use this change
on etcd-only nodes.

* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
2021-03-06 02:29:57 -08:00
Brian Downs 13229019f8
Add ability to perform an etcd on-demand snapshot via cli (#2819)
* add ability to perform an etcd on-demand snapshot via cli
2021-01-21 14:09:15 -07:00
JenTing Hsiao 57041f0239
Add codespell CI test and fix codespell error (#2740)
* Add codespell CI test
* Fix codespell error
2020-12-22 12:35:58 -08:00
Brad Davidson c3c983198f Add temporary fix for issue with interrupted etcd promote
This is a minimal fix for https://github.com/rancher/rke2/issues/392

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-30 11:45:58 -07:00
Brad Davidson 45dd4afe50 Simplify token parsing
Improves readability, reduces round-trips to the join server to validate certs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:26:24 -07:00
Brad Davidson 9074da7405 Fix misc nits and missing/unused imports
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson 703ba5cde7 Add a bunch of doc comments
Also change identical error messages to clarify where problems are
occurring.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson a3bbd58f37 Fix managed etcd cold startup deadlock issue #2249
We should ignore --token and --server if the managed database is initialized,
just like we ignore --cluster-init. If the user wants to join a new
cluster, or rejoin a cluster after --cluster-reset, they need to delete
the database. This a cleaner way to prevent deadlocking on quorum loss,
and removes the requirement that the target of the --server argument
must be online before already joined nodes can start.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 02:44:49 -07:00
Darren Shepherd a18d387390 Refactor clustered DB framework 2020-06-06 16:39:41 -07:00