Commit Graph

113 Commits (e5e1a674ce4e8772eebbb99f119a78b38dfc01c4)

Author SHA1 Message Date
Brad Davidson cf12a13175 Add missing node name entry to apiserver SAN list
Also honor node-ip when adding the node address to the SAN list, instead
of hardcoding the autodetected IP address.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-01 13:22:32 -07:00
Brad Davidson b8add39b07 Bump kine for metrics/tls changes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-01 01:51:30 -07:00
Brad Davidson dc14f370c4 Update wrangler to v0.8.5
Required to support apiextensions.v1 as v1beta1 has been deleted. Also
update helm-controller and dynamiclistener to track wrangler versions.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-08-20 18:47:16 -07:00
galal-hussein 20a48734c2 more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 22:42:05 +02:00
galal-hussein 7ebcc4b134 more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 22:39:44 +02:00
galal-hussein b4401296ec replace error with warn in delete
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 22:18:56 +02:00
galal-hussein 2f82bfcf67 fix warning msg
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 22:05:43 +02:00
galal-hussein b377839148 migrate old token key format
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 20:59:57 +02:00
galal-hussein 997ed7b9b4 simplifying the code
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 19:56:19 +02:00
galal-hussein ad17292fa8 migrate empty string key properly
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 19:21:38 +02:00
galal-hussein a65e5b6466 Fix multiple bootstrap keys found
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 02:50:42 +02:00
Hussein Galal a939decf01
fix a runtime core panic (#3627)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-13 23:33:07 +02:00
Brian Downs 238dc2086e
prevent snapshot save when snapshots are disabled (#3475)
* prevent snapshot save when snapshots are disabled
2021-07-09 10:22:49 -07:00
Brad Davidson cbfe673c43 Fix spelling to satisfy codespell check
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-07-01 13:29:03 -07:00
Brad Davidson 246b378a27 Bump kine to resolve race condition and unrevisioned delete
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-06-30 09:54:46 -07:00
Hussein Galal 136dddca11
Fix storing bootstrap data with empty token string (#3422)
* Fix storing bootstrap data with empty token string

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* delete node password secret after restoration

fixes to bootstrap key

vendor update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix comment

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix typo

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* typos

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Removing dynamic listener file after restoration

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-06-22 22:42:34 +02:00
Brad Davidson f6cec4e75d Add kubernetes.default.svc to serving certs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-06-08 12:55:20 -07:00
Brian Downs afd506a595 fix possible race where bootstrap data might not save
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-06-04 15:05:47 -07:00
Hussein Galal 948295e8e8
Fix cluster restoration in rke2 (#3295)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-05-11 00:06:33 +02:00
Hussein Galal f410fc7d1e
Invoke cluster reset function when only reset flag is passed (#3276)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-05-05 17:40:04 +02:00
Brian Downs c5ad71ce0b
Collect and Store etcd Snapshots and Metadata (#3239)
* Add the ability to store local etcd snapshots and etcd snapshots stored in an S3 compatible object store in a ConfigMap.
2021-04-30 18:26:39 -07:00
Brian Downs 4a49b9e40b delete nocluster file and remove build tag
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-04-07 12:16:28 -07:00
Brian Downs 400a632666 put etcd bootstrap save call in goroutine and update comment
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-03-17 14:33:00 -07:00
Hussein Galal 73df65d93a
remove etcd data dir when etcd is disabled (#3059)
* remove etcd data dir when etcd is disabled

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix comment

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use debug instead of info logs

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-16 18:14:43 +02:00
Brian Downs 7c99f8645d
Have Bootstrap Data Stored in etcd at Completed Start (#3038)
* have state stored in etcd at completed start and remove unneeded code
2021-03-11 13:07:40 -07:00
Brad Davidson c0d129003b Handle loadbalancer port in TIME_WAIT
If the port wanted by the client load balancer is in TIME_WAIT, startup
will fail. Set SO_REUSEPORT so that it can be listened on again
immediately.

The configurable Listen call wants a context, so plumb that through as
well.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-03-08 17:05:25 -08:00
Brad Davidson 7cdfaad6ce
Always use static ports for client load-balancers (#3026)
* Always use static ports for the load-balancers

This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.

This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.

Additional fixes below were required to successfully test and use this change
on etcd-only nodes.

* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
2021-03-06 02:29:57 -08:00
Brian Downs 4d1f9eda9d
Etcd Snapshot/Restore to/from S3 Compatible Backends (#2902)
* Add functionality for etcd snapshot/restore to and from S3 compatible backends.
* Update etcd restore functionality to extract and write certificates and configs from snapshot.
2021-03-03 11:14:12 -07:00
galal-hussein ef999f0b4f change error to warn when removing self from etcd members
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-02 00:19:57 +02:00
galal-hussein d6124981d5 remove etcd member if disable etcd is passed
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-01 23:50:50 +02:00
Hussein Galal 5749f66aa3
Add disable flags for control components (#2900)
* Add disable flags to control components

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes to disable flags

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add comments to functions

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix joining problem

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix ticker

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix role labels

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-02-12 17:35:57 +02:00
Brian Downs 13229019f8
Add ability to perform an etcd on-demand snapshot via cli (#2819)
* add ability to perform an etcd on-demand snapshot via cli
2021-01-21 14:09:15 -07:00
JenTing Hsiao 57041f0239
Add codespell CI test and fix codespell error (#2740)
* Add codespell CI test
* Fix codespell error
2020-12-22 12:35:58 -08:00
Jacob Blain Christen 36230daa86
[migration k3s-io] update kine dependency (#2568)
rancher/kine ➡️ k3s-io/kine

Part of https://github.com/rancher/k3s/issues/2189

Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
2020-11-30 16:45:22 -07:00
Brad Davidson de18528412
Make etcd voting members responsible for managing learners (#2399)
* Set etcd timeouts using values from k8s instead of etcdctl
  Fix for one of the warnings from #2303
* Use etcd zap logger instead of deprecated capsnlog
  Fix for one of the warnings from #2303
* Remove member self-promotion code paths
* Add learner promotion tracking code
* Fix RaftAppliedIndex progress check
* Remove ErrGRPCKeyNotFound check
  This is not used by v3 API - it just returns a response with 0 KVs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-10-27 11:06:26 -07:00
Brad Davidson c3c983198f Add temporary fix for issue with interrupted etcd promote
This is a minimal fix for https://github.com/rancher/rke2/issues/392

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-30 11:45:58 -07:00
Hussein Galal 373449ec0a
Allow for multiple etcd snapshot restoration (#2307)
* add reset tmp file

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go imports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix multiple lines string

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix typo

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use resetFile function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-09-30 02:53:31 +02:00
Brad Davidson 8262e23169
Revert removal of EndpointName hooks (#2319)
* Revert "Remove dead EndpointName code"
    This reverts commit 8025da5a8d.
* Fix docstrings based on proper understanding of use
2020-09-28 18:13:55 -07:00
Brad Davidson 45dd4afe50 Simplify token parsing
Improves readability, reduces round-trips to the join server to validate certs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:26:24 -07:00
Brad Davidson 9074da7405 Fix misc nits and missing/unused imports
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson 703ba5cde7 Add a bunch of doc comments
Also change identical error messages to clarify where problems are
occurring.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson ae916c2dec Use const for kube-system namespace
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson 8025da5a8d Remove dead EndpointName code
According to @galal-hussein this is dead code that was probably brought
over from Kine. I certainly couldn't figure out what it is supposed to
be doing.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson 97eb28a01a Remove unnecessary listener arg from managed DB setup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:09:45 -07:00
Brad Davidson a3bbd58f37 Fix managed etcd cold startup deadlock issue #2249
We should ignore --token and --server if the managed database is initialized,
just like we ignore --cluster-init. If the user wants to join a new
cluster, or rejoin a cluster after --cluster-reset, they need to delete
the database. This a cleaner way to prevent deadlocking on quorum loss,
and removes the requirement that the target of the --server argument
must be online before already joined nodes can start.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 02:44:49 -07:00
Brian Downs ba70c41cce
Initial Logging Output Update (#2246)
This attempts to update logging statements to make them consistent
through out the code base. It also adds additional context to messages
where possible, simplifies messages, and updates level where necessary.
2020-09-21 09:56:03 -07:00
Menna Elmasry edb3e5b7a7
Add error logger to http server (#2242)
* add error logger to http server

Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2020-09-14 23:14:30 +02:00
Brian Downs 866dc94cea
Galal hussein etcd backup restore (#2154)
* Add etcd snapshot and restore

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix error logs

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* goimports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix flag describtion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add disable snapshot and retention

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use creation time for snapshot retention

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* unexport method, update var name

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* adjust snapshot flags

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update var name, string concat

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* revert previous change, create constants

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* updates

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* type assertion error checking

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* updates

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* updates

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* simplify logic, remove unneeded function

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update flags

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update flags

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* add comment

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* exit on restore completion, update flag names, move retention check

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* exit on restore completion, update flag names, move retention check

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* exit on restore completion, update flag names, move retention check

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update disable snapshots flag and field names

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* move function

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update field names

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update var and field names

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update var and field names

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update defaultSnapshotIntervalMinutes to 12 like rke

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update directory perms

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update etc-snapshot-dir usage

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update interval to 12 hours

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* fix usage typo

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* add cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* add cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* add cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update deps target to work, add build/data target for creation, and generate

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* remove dead make targets

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* error handling, cluster reset functionality

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* error handling, cluster reset functionality

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* remove intermediate dapper file

Signed-off-by: Brian Downs <brian.downs@gmail.com>

Co-authored-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-08-28 16:57:40 -07:00
Brad Davidson b1d017f892 Update dynamiclistener
Second round of fixes for #1621

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-08-18 10:38:47 -07:00
Brad Davidson 3e8141dc65 Update dynamiclistener
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-08-04 13:05:37 -07:00
Hussein Galal 169ee63907
Add etcd members as learners (#2066)
* Add etcd members as learners

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Ignore errors in promote member

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-07-29 22:52:49 +02:00
Darren Shepherd 6b5b69378f Add embedded etcd support
This is replaces dqlite with etcd.  The each same UX of dqlite is
followed so there is no change to the CLI args for this.
2020-06-06 16:39:41 -07:00
Darren Shepherd a18d387390 Refactor clustered DB framework 2020-06-06 16:39:41 -07:00
Darren Shepherd 4317a91b96 Delete dqlite 2020-06-06 16:39:41 -07:00
Darren Shepherd 7e59c0801e Make program name a variable to be changed at compile time 2020-06-06 16:39:41 -07:00
Chuck Schweizer ca9c9c2e1e Adding support for TLS MinVersion and CipherSuites
This will watch for the following kube-apiserver-arg variables and apply
them to the k3s kube-apiserver https listener.

  --kube-apiserver-arg=tls-cipher-suites=XXXXXXX
  --kube-apiserver-arg=tls-min-version=XXXXXXX
2020-05-07 09:27:09 -05:00
Darren Shepherd 2f5ee914f9 Add supervisor port
In k3s today the kubernetes API and the /v1-k3s API are combined into
one http server.  In rke2 we are running unmodified, non-embedded Kubernetes
and as such it is preferred to run k8s and the /v1-k3s API on different
ports.  The /v1-k3s API port is called the SupervisorPort in the code.

To support this separation of ports a new shim was added on the client in
then pkg/agent/proxy package that will launch two load balancers instead
of just one load balancer.  One load balancer for 6443 and the other
for 9345 (which is the supervisor port).
2020-05-05 15:54:51 -07:00
Darren Shepherd 4acaa0740d Small dqlite fixes 2019-12-16 11:45:01 -07:00
galal-hussein 99b8222e8d Change storage to datastore 2019-11-15 21:52:07 -07:00
Darren Shepherd 6063317144 Add a couple more known SANs 2019-11-13 06:05:31 +00:00
Darren Shepherd 0ae20eb7a3 Support both http and db based bootstrap 2019-11-12 01:12:24 +00:00
Darren Shepherd e2431bdf9d Add dqlite support 2019-11-10 03:49:56 +00:00
Darren Shepherd ba240d0611 Refactor tokens, bootstrap, and cli args 2019-10-30 19:06:49 -07:00