* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset
Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Since we now start the server's agent sooner and in the background, we
may need to wait longer than 30 seconds for the apiserver to become
ready on downstream projects such as RKE2.
Since this essentially just serves as an analogue for the server's
apiReady channel, there's little danger in setting it to something
relatively high.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Updated the logic to handle if extra args are passed with existing hyphens in the arg. The test was updated to add the additional case of having pre-existing hyphens. The method name was also refactored based on previous feedback.
If the port wanted by the client load balancer is in TIME_WAIT, startup
will fail. Set SO_REUSEPORT so that it can be listened on again
immediately.
The configurable Listen call wants a context, so plumb that through as
well.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Always use static ports for the load-balancers
This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.
This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.
Additional fixes below were required to successfully test and use this change
on etcd-only nodes.
* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
It is possible that the apiserver may serve read requests but not allow
writes yet, in which case flannel will crash on startup when trying to
configure the subnet manager.
Fix this by waiting for the apiserver to become fully ready before
starting flannel and the network policy controller.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Related to rancher/rke2#474
Note that anyone who customizes the data-dir path will have to set
CRI_CONFIG_FILE to the correct path when using the wrapped binaries
(crictl, etc). This is better than dropping files in the incorrect
location.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This attempts to update logging statements to make them consistent
through out the code base. It also adds additional context to messages
where possible, simplifies messages, and updates level where necessary.
In k3s today the kubernetes API and the /v1-k3s API are combined into
one http server. In rke2 we are running unmodified, non-embedded Kubernetes
and as such it is preferred to run k8s and the /v1-k3s API on different
ports. The /v1-k3s API port is called the SupervisorPort in the code.
To support this separation of ports a new shim was added on the client in
then pkg/agent/proxy package that will launch two load balancers instead
of just one load balancer. One load balancer for 6443 and the other
for 9345 (which is the supervisor port).
Values passed in via the server/agent `--node-label` flag are treated as mutable. They are passed through to the kubelet just as before but after the kubelet comes up they are applied again. This allows for passing labels a k3s start-time that may be necessary for scheduling but may change from boot to boot, e.g. `k3os.io/version` after an upgrade.
Tested locallon on my amd64 workstation with the docker container.
Addresses #1119.