Brad Davidson
3576ed4327
Clean up snapshotDir create/exists logic
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-04 12:09:29 -08:00
Brad Davidson
82432a2df7
Fix issue with etcd node name missing hostname
...
* Set ServerNodeName in snapshot CLI setup
* Raise errer if ServerNodeName ends up empty some other way
* Fix status controller to use etcd node name annotation instead of prefix checking
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-03-01 13:52:53 -08:00
Derek Nola
fae41a8b2a
Rename AgentReady to ContainerRuntimeReady for better clarity
...
Signed-off-by: Derek Nola <derek.nola@suse.com>
2024-02-21 12:21:19 -08:00
Brad Davidson
de825845b2
Bump kine and set NotifyInterval to what the apiserver expects
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-09 14:22:38 -08:00
Vitor Savian
f9ee66f4d8
Changed how lastHeartBeatTime works in the etcd condition
...
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-02-07 15:05:33 -03:00
Brad Davidson
8224a3a7f6
Fix ipv6 endpoint address selection for on-demand snapshots
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-06 18:02:36 -08:00
Brad Davidson
6ec1926f88
Add check for etcd-snapshot-dir and fix panic in Walk
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-06 17:47:33 -08:00
Brad Davidson
4005600d4e
Fix excessive retry on snapshot reconcile
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-02-06 17:46:24 -08:00
Vitor Savian
9a70021a9e
Error getting node in setEtcdStatusCondition
...
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
Added retry and changed nodes for
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-01-11 22:06:36 -03:00
Vitor Savian
4a92ced8ee
Handle etcd status condition when cluster reset and disable etcd
...
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
Set condition if node is unhealthy
Signed-off-by: Vitor Savian <vitor.savian@suse.com>
2024-01-09 11:20:41 -03:00
Brad Davidson
319dca3e82
Fix nil map in full snapshot configmap reconcile
...
If a full reconcile wins the race against sync of an individual snapshot resource, or someone intentionally deletes the configmap, the data map could be nil and cause a crash.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-04 16:49:58 -08:00
Brad Davidson
1e663622d2
Fix the OTHER log message that prints the wrong variable
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2024-01-04 15:23:39 -08:00
Brad Davidson
6d3a92a658
Print key instead of file path in snapshot metadata log message
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-11-21 14:03:27 -08:00
Brad Davidson
b23e70d519
Don't apply s3 retention if S3 client failed to initialize
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-11-21 14:03:27 -08:00
Brad Davidson
a92c4a0f17
Don't request metadata when listing objects
...
While some implementations may support it, it appears that most don't,
and some may in fact return an error if it is requested.
We already stat the object to get the metadata anyway, so this was
unnecessary if harmless on most implementations.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-11-21 14:03:27 -08:00
Brad Davidson
1e0a7044cf
Reorder snapshot configmap reconcile to reduce log spew during initial startup
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-11-17 10:09:01 -08:00
Vitor Savian
e53c189587
Handle nil pointer when runtime core is not ready in etcd
...
Signed-off-by: Vitor <vitor.savian@suse.com>
2023-11-16 15:58:42 -08:00
Brad Davidson
2088218c5f
Fix issue with snapshot metadata configmap
...
Omit snapshot list configmap entries for snapshots without extra metadata; reduce log level of warnings about missing s3 metadata files.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-11-15 14:25:28 -08:00
Vitor Savian
c5cd7b3d65
Added etcd status condition
...
Signed-off-by: Vitor <vitor.savian@suse.com>
2023-11-13 06:39:24 -08:00
Brad Davidson
49411e7084
Don't try to read token hash and cluster id during cluster-reset
...
These fields are only necessary when saving snapshots to S3, and will block restoration if attempted
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-27 15:06:29 -07:00
Brad Davidson
5b6b9685e9
Manually requeue configmap reconcile when no nodes have reconciled snapshots
...
Silences error message from lasso - this is a normal startup condition
when no snapshots exist so we shouldn't log nasty looking errors.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-18 15:09:25 -07:00
Brad Davidson
3db1d33282
Re-enable etcd endpoint auto-sync
...
Removing this in 002e6c43ee
regressed
control-plane-only nodes, as we rely on the etcd client to update its
endpoint list internally so that we can use it to sync the load-balancer
address list.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-18 08:33:03 -07:00
Brad Davidson
9597ea1183
Start etcd client before ensuring self removal
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-13 23:24:16 -07:00
Brad Davidson
d885162967
Add server token hash to CR and S3
...
This required pulling the token hash stuff out of the cluster package, into util.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
550ab36ab7
Switch to managing ETCDSnapshotFile resources
...
Reconcile snapshot CRs instead of ConfigMap; manage ConfigMap downstream from CR list
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
5cd4f69bfa
Move snapshot delete into local/s3 functions
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
7464007037
Store extra metadata and cluster ID for snapshots
...
Write the extra metadata both locally and to S3. These files are placed such that they will not be used by older versions of K3s that do not make use of them.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
80f909d0ca
Move s3 snapshot list functionality to s3.go
...
Also, don't list ONLY s3 snapshots if S3 is enabled.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
8d47645312
Consistently set snapshotFile timestamp
...
Attempt to use timestamp from creation or filename instead of file/object modification times
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
f1afe153a3
Tidy s3 upload functions
...
Consistently refer to object keys as such, simplify error handling.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
2b0e2e8ada
Elide old snapshot data when apiserver rejects configmap with ErrRequestEntityTooLarge
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
676b00aa0e
Move etcd snapshot code into separate file
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-10-12 15:04:45 -07:00
Brad Davidson
8705a88bf4
Clear remove annotations on cluster reset; refuse to delete last member from cluster
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson
002e6c43ee
Reorganize Driver interface and etcd driver to avoid passing context and config into most calls
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson
890645924f
Don't export functions not needed outside the etcd package
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 11:54:23 -07:00
Brad Davidson
8c73fd670b
Disable HTTP on main etcd client port
...
Fixes performance issue under load, ref: https://github.com/etcd-io/etcd/issues/15402 and https://github.com/kubernetes/kubernetes/pull/118460
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-25 08:29:57 -07:00
Vitor Savian
e83b1ba4aa
Fixed the etcd retention to delete orphaned snapshots based on the date ( #8177 )
...
* Fix retention using name instead of date
Signed-off-by: Vitor <vitor.savian@suse.com>
2023-08-14 18:48:59 -03:00
Ian Cardoso
e551308db8
fix for etcd-snapshot delete with --etcd-s3 flag ( #8110 )
...
k3s etcd-snapshot save --etcd-s3 ... is creating a local snapshot and uploading it to s3 while k3s etcd-snapshot delete --etcd-s3 ... was deleting the snapshot only on s3 buckets, this commit change the behavior of delete to do it locally and on s3
Signed-off-by: Ian Cardoso <osodracnai@gmail.com>
2023-08-04 14:26:32 -03:00
Vitor Savian
ca7aeed090
Etcd snapshots retention when node name changes ( #8099 )
...
Fixed the etcd retention to delete orphaned snapshots
Signed-off-by: Vitor <vitor.savian@suse.com>
2023-08-03 10:54:40 -03:00
Brad Davidson
aa76942d0f
Add FilterCN function to prevent SAN Stuffing
...
Wire up a node watch to collect addresses of server nodes, to prevent adding unauthorized SANs to the dynamiclistener cert.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-08-02 11:15:39 -07:00
Brad Davidson
e61fde93c1
Fix MemberList error handling and incorrect etcd-arg passthrough
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-28 22:04:30 -07:00
Brad Davidson
91afb38799
Retry cluster join on "too many learners" error
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-28 11:28:33 -07:00
Brad Davidson
d95980bba3
Lock bootstrap data with empty key to prevent conflicts
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-05 10:56:57 -07:00
Brad Davidson
b010db0cff
Ensure that loopback is used for the advertised address when resetting
...
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-03 17:01:43 -07:00
Brad Davidson
0c302f4341
Fix etcd member deletion
...
Turns out etcd-only nodes were never running **any** of the controllers,
so allowing multiple controllers didn't really fix things.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-14 09:39:41 -08:00
Brad Davidson
3d146d2f1b
Allow for multiple sets of leader-elected controllers
...
Addresses an issue where etcd controllers did not run on etcd-only nodes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-10 10:46:48 -08:00
Brad Davidson
a298bfdb18
Add jitter to scheduled snapshots and retry harder on conflicts
...
Also ensure that the snapshot job does not attempt to trigger multiple concurrent runs, as this is not supported.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-01-11 14:32:03 -08:00
Derek Nola
06d81cb936
Replace deprecated ioutil package ( #6230 )
...
* Replace ioutil package
* check integration test null pointer
* Remove rotate retries
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-10-07 17:36:57 -07:00
Derek Nola
4c0bc8c046
Update etcd error to match correct url ( #5909 )
...
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-07-29 09:40:53 -07:00
Brad Davidson
5eaa0a9422
Replace getLocalhostIP with Loopback helper method
...
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 16:51:57 -07:00