Commit Graph

47 Commits (16898462993f78b2f33c0c0a527a0fd3ca42ebd2)

Author SHA1 Message Date
Brad Davidson 7ecd5874d2 Skip initial datastore reconcile during cluster-reset
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-11-15 14:31:44 -08:00
Brad Davidson d885162967 Add server token hash to CR and S3
This required pulling the token hash stuff out of the cluster package, into util.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson 7464007037 Store extra metadata and cluster ID for snapshots
Write the extra metadata both locally and to S3. These files are placed such that they will not be used by older versions of K3s that do not make use of them.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson 002e6c43ee Reorganize Driver interface and etcd driver to avoid passing context and config into most calls
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson d95980bba3 Lock bootstrap data with empty key to prevent conflicts
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-05 10:56:57 -07:00
Brad Davidson 992e64993d Add support for kubeadm token and client certificate auth
Allow bootstrapping with kubeadm bootstrap token strings or existing
Kubelet certs. This allows agents to join the cluster using kubeadm
bootstrap tokens, as created with the `k3s token create` command.

When the token expires or is deleted, agents can successfully restart by
authenticating with their kubelet certificate via node authentication.
If the token is gone and the node is deleted from the cluster, node auth
will fail and they will be prevented from rejoining the cluster until
provided with a valid token.

Servers still must be bootstrapped with the static cluster token, as
they will need to know it to decrypt the bootstrap data.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-07 14:55:04 -08:00
Derek Nola 13c633da12
Add Secrets Encryption to CriticalArgs (#6409)
* Add EncryptSecrets to Critical Control Args
* use deep comparison to extract differences

Signed-off-by: Derek Nola <derek.nola@suse.com>

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-11-04 10:35:29 -07:00
iyear 3aae7b8783 Fix incorrect defer usage
Problem:
Using defer inside a loop can lead to resource leaks

Solution:
Judge newer file in the separate function

Signed-off-by: iyear <ljyngup@gmail.com>
2022-11-01 16:23:25 -07:00
Derek Nola 06d81cb936
Replace deprecated ioutil package (#6230)
* Replace ioutil package
* check integration test null pointer
* Remove rotate retries

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-10-07 17:36:57 -07:00
Brad Davidson fc1c100ffd Remove legacy bidirectional datastore sync code
Since #4438 removed 2-way sync and treats any changed+newer files on disk as an error, we no longer need to determine if files are newer on disk/db or if there is a conflicting mix of both. Any changed+newer file is an error, unless we're doing a cluster reset in which case everything is unconditionally replaced.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:10:30 -07:00
Brad Davidson 83420ef78e Fix fatal error when reconciling bootstrap data
Properly skip restoring bootstrap data for files that don't have a path
set because the feature that would set it isn't enabled.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:10:30 -07:00
Brad Davidson 96162c07c5 Handle egress-selector-mode change during upgrade
Properly handle unset egress-selector-mode from existing servers during cluster upgrade.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-30 11:57:41 -07:00
Brad Davidson 1339626a5b Defragment etcd datastore before clearing alarms
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-28 09:27:59 -07:00
Brad Davidson 3cebde924b Handle empty entries in bootstrap path map
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-17 13:42:27 -07:00
Luther Monson 9a849b1bb7
[master] changing package to k3s-io (#4846)
* changing package to k3s-io

Signed-off-by: Luther Monson <luther.monson@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2022-03-02 15:47:27 -08:00
Brad Davidson 9a48086524 Ignore cluster membership errors when reconciling from temp etcd
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson e4846c92b4 Move temporary etcd startup into etcd module
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson 5014c9e0e8 Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 19:56:08 -08:00
Brad Davidson a1b800f0bf Remove unnecessary copies of etcdconfig struct
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 12:05:16 -08:00
Brad Davidson 2989b8b2c5 Remove unnecessary copies of runtime struct
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 12:05:16 -08:00
Brad Davidson 5ca206ad3b Fix handling of agent-token fallback to token
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-01-07 09:56:37 -08:00
Brad Davidson e7464a17f7 Fix use of agent creds for secrets-encrypt and config validate
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-01-06 12:55:18 -08:00
Brian Downs 3ae550ae51
Update bootstrap logic to output all changed files on disk (#4800) 2021-12-21 14:28:32 -07:00
Brad Davidson 8ad7d141e8 Close etcd clients to avoid leaking GRPC connections
If you don't explicitly close the etcd client when you're done with it,
the GRPC connection hangs around in the background. Normally this is
harmelss, but in the case of the temporary etcd we start up on 2399 to
reconcile bootstrap data, the client will start logging errors
afterwards when the server goes away.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-12-17 23:55:17 -08:00
Derek Nola 17eebe0563
Fix cold boot and reconcilation on secondary servers (#4747)
* Enable reconcilation on secondary servers

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Remove unused code

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Attempt to reconcile with datastore first

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Added warning on failure

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Update warning

Signed-off-by: Derek Nola <derek.nola@suse.com>

* golangci-lint fix

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-12-15 15:38:50 -08:00
Hussein Galal d71b335871
Fix snapshot restoration on fresh nodes (#4737)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-12-14 02:04:39 +02:00
Brian Downs bf4e037fcf
Resolve Bootstrap Migration Edge Case (#4730) 2021-12-13 13:02:30 -07:00
Brian Downs a6fe2c0bc5
Resolve restore bootstrap (#4704) 2021-12-09 14:54:27 -07:00
Manuel Buil 1e0696628e
Merge pull request #4581 from manuelbuil/checking-HA-parameters
Verify new control plane nodes joining the cluster share the same config as cluster members
2021-12-08 10:49:28 +01:00
Derek Nola bcb662926d
Secrets-encryption rotation (#4372)
* Regular CLI framework for encrypt commands
* New secrets-encryption feature
* New integration test
* fixes for flaky integration test CI
* Fix to bootstrap on restart of existing nodes
* Consolidate event recorder

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-12-07 14:31:32 -08:00
Manuel Buil 1b3187ea07 Check HA network parameters
Signed-off-by: Manuel Buil <mbuil@suse.com>
2021-12-07 23:09:05 +01:00
Chris Kim f18b3252c0
[master] Add etcd extra args support for K3s (#4463)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* go generate

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-11 21:03:15 -08:00
Brian Downs adaeae351c
update bootstrap logic (#4438)
* update bootstrap logic resolving a startup bug and account for etcd
2021-11-10 05:33:42 -07:00
Brian Downs 0a0b915921
reset buffer after use (#4279) 2021-10-22 15:56:01 -07:00
Brian Downs 34080b23b1
Copy old bootstrap buffer data for use during migration (#4215) 2021-10-15 10:17:29 -07:00
Brian Downs ac7a8d89c6
Add ability to reconcile bootstrap data between datastore and disk (#3398) 2021-10-07 12:47:00 -07:00
Hussein Galal 136dddca11
Fix storing bootstrap data with empty token string (#3422)
* Fix storing bootstrap data with empty token string

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* delete node password secret after restoration

fixes to bootstrap key

vendor update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix comment

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix typo

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* typos

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Removing dynamic listener file after restoration

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-06-22 22:42:34 +02:00
Brian Downs 7c99f8645d
Have Bootstrap Data Stored in etcd at Completed Start (#3038)
* have state stored in etcd at completed start and remove unneeded code
2021-03-11 13:07:40 -07:00
Brad Davidson 7cdfaad6ce
Always use static ports for client load-balancers (#3026)
* Always use static ports for the load-balancers

This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.

This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.

Additional fixes below were required to successfully test and use this change
on etcd-only nodes.

* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
2021-03-06 02:29:57 -08:00
Brian Downs 13229019f8
Add ability to perform an etcd on-demand snapshot via cli (#2819)
* add ability to perform an etcd on-demand snapshot via cli
2021-01-21 14:09:15 -07:00
JenTing Hsiao 57041f0239
Add codespell CI test and fix codespell error (#2740)
* Add codespell CI test
* Fix codespell error
2020-12-22 12:35:58 -08:00
Brad Davidson c3c983198f Add temporary fix for issue with interrupted etcd promote
This is a minimal fix for https://github.com/rancher/rke2/issues/392

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-30 11:45:58 -07:00
Brad Davidson 45dd4afe50 Simplify token parsing
Improves readability, reduces round-trips to the join server to validate certs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:26:24 -07:00
Brad Davidson 9074da7405 Fix misc nits and missing/unused imports
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson 703ba5cde7 Add a bunch of doc comments
Also change identical error messages to clarify where problems are
occurring.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson a3bbd58f37 Fix managed etcd cold startup deadlock issue #2249
We should ignore --token and --server if the managed database is initialized,
just like we ignore --cluster-init. If the user wants to join a new
cluster, or rejoin a cluster after --cluster-reset, they need to delete
the database. This a cleaner way to prevent deadlocking on quorum loss,
and removes the requirement that the target of the --server argument
must be online before already joined nodes can start.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 02:44:49 -07:00
Darren Shepherd a18d387390 Refactor clustered DB framework 2020-06-06 16:39:41 -07:00