Now rootless mode can be used with cgroup v2 resource limitations.
A pod is executed in a cgroup like "/user.slice/user-1001.slice/user@1001.service/k3s-rootless.service/kubepods/podd0eb6921-c81a-4214-b36c-d3b9bb212fac/63b5a253a1fd4627da16bfce9bec58d72144cf30fe833e0ca9a6d60ebf837475".
This is accomplished by running `kubelet` in a cgroup namespace, and enabling `cgroupfs` driver for the cgroup hierarchy delegated by systemd.
To enable cgroup v2 resource limitation, `k3s server --rootless` needs to be launched as `systemctl --user` service.
Please see the comment lines in `k3s-rootless.service` for the usage.
Running `k3s server --rootless` via a terminal is not supported.
When it really needs to be launched via a terminal, `systemd-run --user -p Delegate --tty` needs to be prepended to create a systemd scope.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Support repository regex rewrite rules when fetching image content.
Example configuration:
```yaml
# /etc/rancher/k3s/registries.yaml
mirrors:
"docker.io":
endpoint:
- "https://registry-1.docker.io/v2"
rewrite:
"^library/alpine$": "my-org/alpine"
```
This will instruct k3s containerd to fetch content for `alpine` images
from `docker.io/my-org/alpine` instead of the default
`docker.io/library/alpine` locations.
Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
get() is called in a loop until client configuration is successfully
retrieved. Each iteration will try to configure the apiserver proxy,
which will in turn create a new load balancer. Skip creating a new
load balancer if we already have one.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If the port wanted by the client load balancer is in TIME_WAIT, startup
will fail. Set SO_REUSEPORT so that it can be listened on again
immediately.
The configurable Listen call wants a context, so plumb that through as
well.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Always use static ports for the load-balancers
This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.
This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.
Additional fixes below were required to successfully test and use this change
on etcd-only nodes.
* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
* Add functionality for etcd snapshot/restore to and from S3 compatible backends.
* Update etcd restore functionality to extract and write certificates and configs from snapshot.
Servers should always be upgraded before agents, but generally this
isn't required because things are compatible between versions. In this
case we're OK with failing closed if the user upgrades out of order, but
we should give a clearer message about what steps are required to fix
the issue.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
It is possible that the apiserver may serve read requests but not allow
writes yet, in which case flannel will crash on startup when trying to
configure the subnet manager.
Fix this by waiting for the apiserver to become fully ready before
starting flannel and the network policy controller.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Adds support for retagging images to appear to have been sourced from
one or more additional registries as they are imported from the tarball.
This is intended to support RKE2 use cases with system-default-registry
where the images need to appear to have been pulled from a registry
other than docker.io.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Problem:
While using ZFS on debian and K3s with docker, I am unable to get k3s working as the snapshotter value is being validated and the validation fails.
Solution:
We should not validate snapshotter value if we are using docker as it's a no-op in that case.
Signed-off-by: Waqar Ahmed <waqarahmedjoyia@live.com>
Removing the cfg.DataDir mutation in 3e4fd7b did not break anything, but
did change some paths in unwanted ways. Rather than mutating the
user-supplied command-line flags, explicitly specify the agent
subdirectory as needed.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Related to rancher/rke2#474
Note that anyone who customizes the data-dir path will have to set
CRI_CONFIG_FILE to the correct path when using the wrapped binaries
(crictl, etc). This is better than dropping files in the incorrect
location.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Related to #2455 and containerd/containerd#4684
These were not meant to be enabled by default, break images with many
layers, and will be disabled by default on the next containerd release.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This attempts to update logging statements to make them consistent
through out the code base. It also adds additional context to messages
where possible, simplifies messages, and updates level where necessary.
Since we're replacing the k3s rolebindings.yaml in rke2, we should allow
renaming this so that we can use the white-labeled name downstream.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add support for compressed images when pre-loading images
Signed-off-by: Frederick F. Kautz IV <fkautz@alumni.cmu.edu>
* attempting to fix vendor source being dirty
Signed-off-by: Frederick F. Kautz IV <fkautz@alumni.cmu.edu>
* fixing file extension for .tar.lz4
Signed-off-by: Frederick F. Kautz IV <fkautz@alumni.cmu.edu>
* cli: add --selinux flag to agent/server sub-cmds
Introduces --selinux flag to affirmatively enable SELinux in containerd.
Deprecates --disable-selinux flag which now defaults to true which
auto-detection of SELinux configuration for containerd is no longer
supported. Specifying both --selinux and --disable-selinux will result
in an error message encouraging you to pick a side.
* Update pkg/agent/containerd/containerd.go
update log warning message about enabled selinux host but disabled runtime
Co-authored-by: Brad Davidson <brad@oatmail.org>
Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
Remove $NOTIFY_SOCKET, if present, from env when invoking containerd to
prevent gratuitous notifications sent to systemd.
Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
In k3s today the kubernetes API and the /v1-k3s API are combined into
one http server. In rke2 we are running unmodified, non-embedded Kubernetes
and as such it is preferred to run k8s and the /v1-k3s API on different
ports. The /v1-k3s API port is called the SupervisorPort in the code.
To support this separation of ports a new shim was added on the client in
then pkg/agent/proxy package that will launch two load balancers instead
of just one load balancer. One load balancer for 6443 and the other
for 9345 (which is the supervisor port).
If the ip_set kernel module is not available we should warn
that the network policy controller can not start rather than
cause a fatal error.
Also adds module probing and config checks for ip_set.
This reverts commit e712cdf7e8, reversing
changes made to d5929bc8c8.
Wireguard docs fail to describe that persistent-keepalive is only valid
when peer is set.
The Linux kernel is inconsistent about how devconf is configured for new
network namespaces between ipv4 and ipv6. The behavior can also be
controlled via net.core.devconf_inherit_init_net in Linux 5.1+ so make
sure to enable forwarding on all and default for both ipv6 and ipv4.
This issue first came up testing on a yocto kernel that had this patch:
ipv4: net namespace does not inherit network configurations
[0] https://www.kernel.org/doc/html/latest/admin-guide/sysctl/net.html#devconf-inherit-init-net
[1] https://lkml.org/lkml/2014/7/29/119
Signed-off-by: Brennan Ashton <brennana@jfrog.com>
Values passed in via the server/agent `--node-label` flag are treated as mutable. They are passed through to the kubelet just as before but after the kubelet comes up they are applied again. This allows for passing labels a k3s start-time that may be necessary for scheduling but may change from boot to boot, e.g. `k3os.io/version` after an upgrade.
Tested locallon on my amd64 workstation with the docker container.
Addresses #1119.
Since generated cert/keys are stored locally, each server has a different
copy. In a HA setup we need to ensure we download the cert and key from
the same server so we combined HTTP requests to do that.
Allow the kubelet resolv-conf flag to be set, or automatically
discovered from /etc/resolv.conf & /run/systemd/resolve/resolv.conf if
no loopback devices are present, or create our own which points to
nameserver 8.8.8.8