Hussein Galal
a939decf01
fix a runtime core panic ( #3627 )
...
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-13 23:33:07 +02:00
Brian Downs
238dc2086e
prevent snapshot save when snapshots are disabled ( #3475 )
...
* prevent snapshot save when snapshots are disabled
2021-07-09 10:22:49 -07:00
Brad Davidson
cbfe673c43
Fix spelling to satisfy codespell check
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-07-01 13:29:03 -07:00
Brad Davidson
246b378a27
Bump kine to resolve race condition and unrevisioned delete
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-06-30 09:54:46 -07:00
Hussein Galal
136dddca11
Fix storing bootstrap data with empty token string ( #3422 )
...
* Fix storing bootstrap data with empty token string
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* delete node password secret after restoration
fixes to bootstrap key
vendor update
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix comment
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix typo
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* typos
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Removing dynamic listener file after restoration
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* go mod tidy
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-06-22 22:42:34 +02:00
Brad Davidson
f6cec4e75d
Add kubernetes.default.svc to serving certs
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-06-08 12:55:20 -07:00
Brian Downs
afd506a595
fix possible race where bootstrap data might not save
...
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-06-04 15:05:47 -07:00
Hussein Galal
948295e8e8
Fix cluster restoration in rke2 ( #3295 )
...
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-05-11 00:06:33 +02:00
Hussein Galal
f410fc7d1e
Invoke cluster reset function when only reset flag is passed ( #3276 )
...
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-05-05 17:40:04 +02:00
Brian Downs
c5ad71ce0b
Collect and Store etcd Snapshots and Metadata ( #3239 )
...
* Add the ability to store local etcd snapshots and etcd snapshots stored in an S3 compatible object store in a ConfigMap.
2021-04-30 18:26:39 -07:00
Brian Downs
4a49b9e40b
delete nocluster file and remove build tag
...
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-04-07 12:16:28 -07:00
Brian Downs
400a632666
put etcd bootstrap save call in goroutine and update comment
...
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-03-17 14:33:00 -07:00
Hussein Galal
73df65d93a
remove etcd data dir when etcd is disabled ( #3059 )
...
* remove etcd data dir when etcd is disabled
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix comment
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* use debug instead of info logs
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-16 18:14:43 +02:00
Brian Downs
7c99f8645d
Have Bootstrap Data Stored in etcd at Completed Start ( #3038 )
...
* have state stored in etcd at completed start and remove unneeded code
2021-03-11 13:07:40 -07:00
Brad Davidson
c0d129003b
Handle loadbalancer port in TIME_WAIT
...
If the port wanted by the client load balancer is in TIME_WAIT, startup
will fail. Set SO_REUSEPORT so that it can be listened on again
immediately.
The configurable Listen call wants a context, so plumb that through as
well.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-03-08 17:05:25 -08:00
Brad Davidson
7cdfaad6ce
Always use static ports for client load-balancers ( #3026 )
...
* Always use static ports for the load-balancers
This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.
This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.
Additional fixes below were required to successfully test and use this change
on etcd-only nodes.
* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
2021-03-06 02:29:57 -08:00
Brian Downs
4d1f9eda9d
Etcd Snapshot/Restore to/from S3 Compatible Backends ( #2902 )
...
* Add functionality for etcd snapshot/restore to and from S3 compatible backends.
* Update etcd restore functionality to extract and write certificates and configs from snapshot.
2021-03-03 11:14:12 -07:00
galal-hussein
ef999f0b4f
change error to warn when removing self from etcd members
...
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-02 00:19:57 +02:00
galal-hussein
d6124981d5
remove etcd member if disable etcd is passed
...
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-01 23:50:50 +02:00
Hussein Galal
5749f66aa3
Add disable flags for control components ( #2900 )
...
* Add disable flags to control components
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* golint
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fixes to disable flags
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Add comments to functions
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Fix joining problem
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* golint
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix ticker
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix role labels
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-02-12 17:35:57 +02:00
Brian Downs
13229019f8
Add ability to perform an etcd on-demand snapshot via cli ( #2819 )
...
* add ability to perform an etcd on-demand snapshot via cli
2021-01-21 14:09:15 -07:00
JenTing Hsiao
57041f0239
Add codespell CI test and fix codespell error ( #2740 )
...
* Add codespell CI test
* Fix codespell error
2020-12-22 12:35:58 -08:00
Jacob Blain Christen
36230daa86
[migration k3s-io] update kine dependency ( #2568 )
...
rancher/kine ➡️ k3s-io/kine
Part of https://github.com/rancher/k3s/issues/2189
Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
2020-11-30 16:45:22 -07:00
Brad Davidson
de18528412
Make etcd voting members responsible for managing learners ( #2399 )
...
* Set etcd timeouts using values from k8s instead of etcdctl
Fix for one of the warnings from #2303
* Use etcd zap logger instead of deprecated capsnlog
Fix for one of the warnings from #2303
* Remove member self-promotion code paths
* Add learner promotion tracking code
* Fix RaftAppliedIndex progress check
* Remove ErrGRPCKeyNotFound check
This is not used by v3 API - it just returns a response with 0 KVs.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-10-27 11:06:26 -07:00
Brad Davidson
c3c983198f
Add temporary fix for issue with interrupted etcd promote
...
This is a minimal fix for https://github.com/rancher/rke2/issues/392
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-30 11:45:58 -07:00
Hussein Galal
373449ec0a
Allow for multiple etcd snapshot restoration ( #2307 )
...
* add reset tmp file
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* go imports
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix multiple lines string
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix typo
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* use resetFile function
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-09-30 02:53:31 +02:00
Brad Davidson
8262e23169
Revert removal of EndpointName hooks ( #2319 )
...
* Revert "Remove dead EndpointName code"
This reverts commit 8025da5a8d
.
* Fix docstrings based on proper understanding of use
2020-09-28 18:13:55 -07:00
Brad Davidson
45dd4afe50
Simplify token parsing
...
Improves readability, reduces round-trips to the join server to validate certs.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:26:24 -07:00
Brad Davidson
9074da7405
Fix misc nits and missing/unused imports
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson
703ba5cde7
Add a bunch of doc comments
...
Also change identical error messages to clarify where problems are
occurring.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson
ae916c2dec
Use const for kube-system namespace
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson
8025da5a8d
Remove dead EndpointName code
...
According to @galal-hussein this is dead code that was probably brought
over from Kine. I certainly couldn't figure out what it is supposed to
be doing.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson
97eb28a01a
Remove unnecessary listener arg from managed DB setup
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:09:45 -07:00
Brad Davidson
a3bbd58f37
Fix managed etcd cold startup deadlock issue #2249
...
We should ignore --token and --server if the managed database is initialized,
just like we ignore --cluster-init. If the user wants to join a new
cluster, or rejoin a cluster after --cluster-reset, they need to delete
the database. This a cleaner way to prevent deadlocking on quorum loss,
and removes the requirement that the target of the --server argument
must be online before already joined nodes can start.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 02:44:49 -07:00
Brian Downs
ba70c41cce
Initial Logging Output Update ( #2246 )
...
This attempts to update logging statements to make them consistent
through out the code base. It also adds additional context to messages
where possible, simplifies messages, and updates level where necessary.
2020-09-21 09:56:03 -07:00
Menna Elmasry
edb3e5b7a7
Add error logger to http server ( #2242 )
...
* add error logger to http server
Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2020-09-14 23:14:30 +02:00
Brian Downs
866dc94cea
Galal hussein etcd backup restore ( #2154 )
...
* Add etcd snapshot and restore
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix error logs
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* goimports
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix flag describtion
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Add disable snapshot and retention
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* use creation time for snapshot retention
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* unexport method, update var name
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* adjust snapshot flags
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update var name, string concat
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* revert previous change, create constants
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* updates
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* type assertion error checking
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* pr remediation
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* pr remediation
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* pr remediation
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* pr remediation
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* pr remediation
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* updates
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* updates
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* simplify logic, remove unneeded function
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update flags
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update flags
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* add comment
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* exit on restore completion, update flag names, move retention check
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* exit on restore completion, update flag names, move retention check
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* exit on restore completion, update flag names, move retention check
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update disable snapshots flag and field names
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* move function
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update field names
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update var and field names
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update var and field names
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update defaultSnapshotIntervalMinutes to 12 like rke
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update directory perms
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update etc-snapshot-dir usage
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update interval to 12 hours
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* fix usage typo
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* add cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* add cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* add cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* wire in cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* wire in cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* wire in cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* wire in cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* wire in cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* wire in cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* wire in cron
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update deps target to work, add build/data target for creation, and generate
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* remove dead make targets
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* error handling, cluster reset functionality
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* error handling, cluster reset functionality
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* update
Signed-off-by: Brian Downs <brian.downs@gmail.com>
* remove intermediate dapper file
Signed-off-by: Brian Downs <brian.downs@gmail.com>
Co-authored-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-08-28 16:57:40 -07:00
Brad Davidson
b1d017f892
Update dynamiclistener
...
Second round of fixes for #1621
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-08-18 10:38:47 -07:00
Brad Davidson
3e8141dc65
Update dynamiclistener
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-08-04 13:05:37 -07:00
Hussein Galal
169ee63907
Add etcd members as learners ( #2066 )
...
* Add etcd members as learners
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Ignore errors in promote member
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-07-29 22:52:49 +02:00
Darren Shepherd
6b5b69378f
Add embedded etcd support
...
This is replaces dqlite with etcd. The each same UX of dqlite is
followed so there is no change to the CLI args for this.
2020-06-06 16:39:41 -07:00
Darren Shepherd
a18d387390
Refactor clustered DB framework
2020-06-06 16:39:41 -07:00
Darren Shepherd
4317a91b96
Delete dqlite
2020-06-06 16:39:41 -07:00
Darren Shepherd
7e59c0801e
Make program name a variable to be changed at compile time
2020-06-06 16:39:41 -07:00
Chuck Schweizer
ca9c9c2e1e
Adding support for TLS MinVersion and CipherSuites
...
This will watch for the following kube-apiserver-arg variables and apply
them to the k3s kube-apiserver https listener.
--kube-apiserver-arg=tls-cipher-suites=XXXXXXX
--kube-apiserver-arg=tls-min-version=XXXXXXX
2020-05-07 09:27:09 -05:00
Darren Shepherd
2f5ee914f9
Add supervisor port
...
In k3s today the kubernetes API and the /v1-k3s API are combined into
one http server. In rke2 we are running unmodified, non-embedded Kubernetes
and as such it is preferred to run k8s and the /v1-k3s API on different
ports. The /v1-k3s API port is called the SupervisorPort in the code.
To support this separation of ports a new shim was added on the client in
then pkg/agent/proxy package that will launch two load balancers instead
of just one load balancer. One load balancer for 6443 and the other
for 9345 (which is the supervisor port).
2020-05-05 15:54:51 -07:00
Darren Shepherd
4acaa0740d
Small dqlite fixes
2019-12-16 11:45:01 -07:00
galal-hussein
99b8222e8d
Change storage to datastore
2019-11-15 21:52:07 -07:00
Darren Shepherd
6063317144
Add a couple more known SANs
2019-11-13 06:05:31 +00:00
Darren Shepherd
0ae20eb7a3
Support both http and db based bootstrap
2019-11-12 01:12:24 +00:00