Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Use INVOCATION_ID to detect execution under systemd, since as of a9b5a1933f NOTIFY_SOCKET is now cleared by the server code.
* Set the unit type to notify by default for both server and agent, which is what Rancher-managed installs have done for a while.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Increase the default snapshot timeout. The timeout is not currently
configurable from Rancher, and larger clusters are frequently seeing
uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
etcd snapshot cron schedules from stacking up.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
87e1806697 removed the OwnerReferences
field from the DaemonSet, which makes sense since the Service may now be
in a different namespace than the DaemonSet and cross-namespace owner
references are not supported. Unfortunately, we were relying on
garbage collection to delete the DameonSet, so this started leaving
orphaned DaemonSets when Services were deleted.
We don't want to add an a Service OnRemove handler, since this will add
finalizers to all Services, not just LoadBalancers services, causing
conformance tests to fail. Instead, manage our own finalizers, and
restore the DaemonSet removal Event that was removed by the same commit.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Since #4438 removed 2-way sync and treats any changed+newer files on disk as an error, we no longer need to determine if files are newer on disk/db or if there is a conflicting mix of both. Any changed+newer file is an error, unless we're doing a cluster reset in which case everything is unconditionally replaced.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Properly skip restoring bootstrap data for files that don't have a path
set because the feature that would set it isn't enabled.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
cadvisor still doesn't pull stats via CRI yet, so we have to continue to use the deprecated arg.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Use same kubelet-preferred-address-types setting as RKE2 to improve reliability of the egress selector when using a HTTP proxy. Also, use BindAddressOrLoopback to ensure that the correct supervisor address is used when --bind-address is set.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If the user points S3 backups at a bucket containing other files, those
file names may not be valid configmap keys.
For example, RKE1 generates backup files with names like
`s3-c-zrjnb-rs-6hxpk_2022-05-05T12:05:15Z.zip`; the semicolons in the
timestamp portion of the name are not allowed for use in configmap keys.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Move startup hooks wg into a runtime pointer, check before notifying systemd
* Switch default systemd notification to server
* Add 1 sec delay to allow etcd to write to disk
Signed-off-by: Derek Nola <derek.nola@suse.com>
This parameter controls which namespace the klipper-lb pods will be create.
It defaults to kube-system so that k3s does not by default create a new
namespace. It can be changed if users wish to isolate the pods and apply
some policy to them.
Signed-off-by: Darren Shepherd <darren@acorn.io>
The baseline PodSecurity profile will reject klipper-lb pods from running.
Since klipper-lb pods are put in the same namespace as the Service this
means users can not use PodSecurity baseline profile in combination with
the k3s servicelb.
The solution is to move all klipper-lb pods to a klipper-lb-system where
the security policy of the klipper-lb pods can be different an uniformly
managed.
Signed-off-by: Darren Shepherd <darren@acorn.io>
The control-plane context handles requests outside the cluster and
should not be sent to the proxy.
In agent mode, we don't watch pods and just direct-dial any request for
a non-node address, which is the original behavior.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Watching pods appears to be the most reliable way to ensure that the
proxy routes and authorizes connections.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Allow the flannel backend to be specified as
backend=option=val,option2=val2 to select a given backend with extra options.
In particular this adds the following options to wireguard-native
backend:
* Mode - flannel wireguard tunnel mode
* PersistentKeepaliveInterval- wireguard persistent keepalive interval
Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
This reverts commit aa9065749c.
Setting dual-stack node-ip does not work when --cloud-provider is set
to anything, including 'external'. Just set node-ip to the first IP, and
let the cloud provider add the other address.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
The cloud-provider arg is deprecated and cannot be set to anything other than external, but must still be used or node addresses are not set properly.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Remove objects when removed from manifests
If a user puts a file in /var/lib/rancher/k3s/server/manifests/ then the
objects contained therein are deployed to the cluster. If the objects
are removed from that file, they are not removed from the cluster.
This change tracks the GVKs in the files and will remove objects when
there are removed from the cluster.
Signed-off-by: Donnie Adams <donnie.adams@suse.com>
Also update all use of 'go get' => 'go install', update CI tooling for
1.18 compatibility, and gofmt everything so lint passes.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Reduces code complexity a bit and ensures we don't have to handle closed watch channels on our own
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Before this change, kube-router was always assuming that IPv4 is
enabled, which is not the case in IPv6-only clusters. To enable network
policies in IPv6-only, we need to explicitly let kube-router know when
to disable IPv4.
Signed-off-by: Michal Rostecki <vadorovsky@gmail.com>
This gives nicer errors from Kubernetes components during startup, and
reduces LOC a bit by using the upstream responsewriters module instead
of writing the headers and body by hand.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset
Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Update docs to include s390x arch
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add s390x drone pipeline
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Install trivy linux arch only for amd64
This is done so that trivy is not installed for s390x arch
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add s390x arch if condition for Dockerfile.test
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add s390x arch in install script
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add s390x GOARCH in build script
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add SUFFIX s390x in scripts
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Skip image scan for s390x arch
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Update klipper-lb to version v0.3.5
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Update traefik version to v2.6.2
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Update registry to v2.8.1 in tests which supports s390x
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Skip compact tests for s390x arch
This is done because compact test require a previous k3s version which supports s390x and it is not available
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
Problem:
Specifying extra arguments for the API server for example is not supported as
the arguments get stored in a map before being passed to the API server.
Solution:
Updated the GetArgs function to store the arguments in a map that can have
multiple values. Some more logic is added so that repeated extra arguments
retain their order when sorted whilst overall the arguments can still be
sorted for improved readability when logged.
Support has been added for prefixing and suffixing default argument values
by using -= and += when specifying extra arguments.
Signed-off-by: Terry Cain <terry@terrys-home.co.uk>
This requires a further set of gofmt -s improvements to the
code, but nothing major. golangci-lint 1.45.2 brings golang 1.18
support which might be needed in the future.
Signed-off-by: Dirk Müller <dirk@dmllr.de>
This controller only needs to run when using managed etcd, so move it in
with the rest of the etcd stuff. This change also modifies the
controller to only watch the Kubernetes service endpoint, instead of
watching all endpoints in the entire cluster.
Fixes an error message revealed by use of a newer grpc client in
Kubernetes 1.24, which logs an error when the Put to etcd failed because
kine doesn't support the etcd Put operation. The controller shouldn't
have been running without etcd in the first place.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Don't attempt to retrieve snapshot metadata configmap if the apiserver
isn't available. This could be triggered if the cron expression caused a
snapshot to be triggered before the apiserver is up.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This is required to make the websocket tunnel server functional on
etcd-only nodes, and will save some code on the RKE2 side once pulled
through.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This change allows to define two cluster CIDRs for compatibility with
Kubernetes dual-stuck, with an assumption that two CIDRs are usually
IPv4 and IPv6.
It does that by levearaging changes in out kube-router fork, with the
following downstream release:
https://github.com/k3s-io/kube-router/releases/tag/v1.3.2%2Bk3s
Signed-off-by: Michal Rostecki <vadorovsky@gmail.com>
Ideally we'd have fully fleshed out support for it (i.e. #5011), but
that's a potentially breaking change and taking a little while to merge.
This is a much simpler change which won't break anything, but will allow
a "Type": "wireguard" reference in the "--flannel-conf" custom config
file to work.
Signed-off-by: Euan Kemp <euank@euank.com>
Improve feedback when running secrets-encrypt commands on etcd-only nodes, and
allow etcd-only nodes to properly restart when effecting rotation.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Don't hardcode the event namespace when creating event recorders; some controllers want to create events in other namespaces.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Closing idle connections isn't guaranteed to close out a pooled connection to a
loadbalancer endpoint that has been removed. Instead, ensure that requests used
to wait for the apiserver to become ready aren't reused.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This allows secondary etcd nodes to bootstrap the kubelet before an
apiserver joins the cluster. Rancher waits for all the etcd nodes to
come up before adding the control-plane nodes, so this needs to be
handled properly.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Removed vagrant folder
* Fix comments around E2E ENVs
* Eliminate testutil folder
* Convert flock integration test to unit test
* Point to other READMEs
Signed-off-by: Derek Nola <derek.nola@suse.com>
Fixes issue with secrets-encrypt rotate not having any etcd endpoints
available on nodes without a local etcd server.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
adds a new optional node label
"svccontroller.k3s.cattle.io/lbpool=<pool>" that can be set on nodes.
ServiceType: LoadBalancer services can then specify a matching label,
which will schedule the DaemonSet only on specified nodes. This allows
operators to specify different pools of nodes that can serve different
LoadBalancer services on the same ports.
Signed-off-by: robertlestak <robert.lestak@umusic.com>
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Upgrade and convert ginkgo from v1 to v2
* Move all integration tests into integration folder
* Update TESTING.md
Signed-off-by: Derek Nola <derek.nola@suse.com>
Before this change, we were copying a part of kube-router code to
pkg/agent/netpol directory with modifications, from which the biggest
one was consumption of k3s node config instead of kube-router config.
However, that approach made it hard to follow new upstream versions.
It's possible to use kube-router as a library, so it seems like a better
way to do that.
Instead of modifying kube-router network policy controller to comsume
k3s configuration, this change just converts k3s node config into
kube-router config. All the functionality of kube-router except netpol
is still disabled.
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Manuel Buil <mbuil@suse.com>
* Remove sudo commands from integration tests
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Added cleanup fucntion
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Implement better int cleanup
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Rename test utils
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Enable K3sCmd to be a single string
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Removed parsePod function
Signed-off-by: Derek Nola <derek.nola@suse.com>
* codespell
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Revert startup timeout
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Reorder sonobuoy tests, drop concurrent tests to 3
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Disable etcd
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Skip parallel testing for etcd
Signed-off-by: Derek Nola <derek.nola@suse.com>
If you don't explicitly close the etcd client when you're done with it,
the GRPC connection hangs around in the background. Normally this is
harmelss, but in the case of the temporary etcd we start up on 2399 to
reconcile bootstrap data, the client will start logging errors
afterwards when the server goes away.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Enable reconcilation on secondary servers
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Remove unused code
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Attempt to reconcile with datastore first
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Added warning on failure
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Update warning
Signed-off-by: Derek Nola <derek.nola@suse.com>
* golangci-lint fix
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Regular CLI framework for encrypt commands
* New secrets-encryption feature
* New integration test
* fixes for flaky integration test CI
* Fix to bootstrap on restart of existing nodes
* Consolidate event recorder
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Enabled skipping of unkown flags from config in parser
* Added new unit test, expanded existing
* Add warning back in, without value
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Add etcd extra args support for K3s
Signed-off-by: Chris Kim <oats87g@gmail.com>
* Add etcd custom argument integration test
Signed-off-by: Chris Kim <oats87g@gmail.com>
* go generate
Signed-off-by: Chris Kim <oats87g@gmail.com>
Problem:
Before, to customize CoreDNS, one had to edit the default configmap,
which gets re-written on every K3s server restart.
Solution:
Mount an additional coredns-custom configmap into the CoreDNS container
and import overrides and additional server blocks from the included
files.
Signed-off-by: Thorsten Klein <iwilltry42@gmail.com>
Since we now start the server's agent sooner and in the background, we
may need to wait longer than 30 seconds for the apiserver to become
ready on downstream projects such as RKE2.
Since this essentially just serves as an analogue for the server's
apiReady channel, there's little danger in setting it to something
relatively high.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Using MAINPID breaks systemd's exit detection, as it stops watching the
original pid, but is unable to watch the new pid as it is not a child
of systemd itself. The best we can do is just notify when execing the child
process.
We also need to consolidate forking into a sigle place so that we don't
end up with multiple levels of child processes if both redirecting log
output and reaping child processes.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Make sure there are no duplicates in etcd member list
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix node names with hyphens
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* use full server name for etcd node name
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Works around issue with Job controller not tracking job pods that
are in CrashloopBackoff during upgrade from 1.21 to 1.22.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Update the default containerd config template with support for adding extra container runtimes. Add logic to discover nvidia container runtimes installed via the the gpu operator or package manager.
Signed-off-by: Joe Kralicky <joe.kralicky@suse.com>
Also honor node-ip when adding the node address to the SAN list, instead
of hardcoding the autodetected IP address.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Added test runner and build files
* Changes to int test to output junit results.
* Updated documentation, removed comments
Signed-off-by: dereknola <derek.nola@suse.com>
Required to support apiextensions.v1 as v1beta1 has been deleted. Also
update helm-controller and dynamiclistener to track wrangler versions.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Reset load balancer state during restoraion
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Reset load balancer state during restoraion
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Fix URL pruning when joining an etcd member
Problem:
Existing member clientURLs were checked if they contain the joining
node's IP. In some edge cases this would prune valid URLs when the
joining IP is a substring match of the only existing member's IP.
Because of this, it was impossible to e.g. join 10.0.0.2 to an existing
node that has an IP of 10.0.0.2X or 10.0.0.2XX:
level=fatal msg="starting kubernetes: preparing server: start managed database:
joining etcd cluster: etcdclient: no available endpoints"
Solution:
Fixed by properly parsing the URLs and comparing the IPs for equality
instead of substring match.
Signed-off-by: Malte Starostik <info@stellaware.de>
* switch image names to the ones with the prefix mirrored
* bump rancher/mirrored-coredns-coredns to 1.8.4
Signed-off-by: Jiaqi Luo <6218999+jiaqiluo@users.noreply.github.com>
Sync DisableKubeProxy from cfg into control before sending control to clients,
as it may have been modified by a startup hook.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Updated the logic to handle if extra args are passed with existing hyphens in the arg. The test was updated to add the additional case of having pre-existing hyphens. The method name was also refactored based on previous feedback.
* Commit of new etcd snapshot integration tests.
* Updated integration github action to not run on doc changes.
* Update Drone runner to only run unit tests
Signed-off-by: dereknola <derek.nola@suse.com>
Problem:
tcpproxy repository has been moved out of the github.com/google org to github.com/inetaf.
Solution:
Switch to the new repo.
FYI: https://godoc.org/inet.af/tcpproxy/
Signed-off-by: William Zhang <warmchang@outlook.com>
This changes the crictl template for issues with the socket information. It also addresses a typo in the socket address. Last it makes tweaks to configuration that aren't required or had incorrect logic.
Signed-off-by: Jamie Phillips <jamie.phillips@suse.com>
spelling
* Move cloud-controller-manager into an embedded executor
* Import K3s cloud provider and clean up imports
Signed-off-by: Chris Kim <oats87g@gmail.com>
the server's built-in helm controller.
Problem:
Testing installation and uninstallation of the Helm Controller on k3s is
not possible if the Helm Controller is baked into the k3s server.
Solution:
The Helm Controller can optionally be disabled, which will allow users
to manage its installation manually.
Signed-off-by: Joe Kralicky <joe.kralicky@suse.com>
* Changed containerd image licenses from context lifespan to permanent. Delete any existing licenses owned by k3s on server startup
Signed-off-by: dereknola <derek.nola@suse.com>
* Changed containerd image licenses from 24h to permanent. Delete any existing licenses on server startup
Signed-off-by: dereknola <derek.nola@suse.com>
strings has a specific function to replace all matches. We should use that one instead of strings.Replace(string, old, new string, -1)
Signed-off-by: Manuel Buil <mbuil@suse.com>
* Changed cloud-controller-manager user name in ccm.yaml
Signed-off-by: dereknola <derek.nola@suse.com>
* Changed RBAC name in server.go
Signed-off-by: dereknola <derek.nola@suse.com>
* Changed "k3s" string prefix to version.Program to prevent static hardcoding
Signed-off-by: dereknola <derek.nola@suse.com>
* Changed user in ccm.yaml to k3s-cloud-controller-manager
Signed-off-by: dereknola <derek.nola@suse.com>
* Move registries.yaml handling out to rancher/wharfie
* Add system-default-registry support
* Add CLI support for kubelet image credential providers
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Problem:
Only the client CA is passed to the kube-controller-manager and
therefore CSRs with the signer name "kubernetes.io/kubelet-serving" are
signed with the client CA. Serving certificates must be signed with the
server CA otherwise e.g. "kubectl logs" fails with the error message
"x509: certificate signed by unknown authority".
Solution:
Instead of providing only one CA via the kube-controller-manager
parameter "--cluster-signing-cert-file", the corresponding CA for every
signer is set with the parameters
"--cluster-signing-kube-apiserver-client-cert-file",
"--cluster-signing-kubelet-client-cert-file",
"--cluster-signing-kubelet-serving-cert-file", and
"--cluster-signing-legacy-unknown-cert-file".
Signed-off-by: Siegfried Weber <mail@siegfriedweber.net>
The kube-apiserver cert should have the same SANs in the same order,
excluding the extra user-configured SANs since this will only be used
in-cluster.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If key ends in "+" the value of the key is appended to previous
values found. If values are string instead of a slice they are
automatically converted to a slice of one string.
Signed-off-by: Darren Shepherd <darren@rancher.com>
Configuration will be loaded from config.yaml and then config.yaml.d/*.(yaml|yml) in
alphanumeric order. The merging is done by just taking the last value of
a key found, so LIFO for keys. Slices are not merged but replaced.
Signed-off-by: Darren Shepherd <darren@rancher.com>
* Update Kubernetes to v1.21.0
* Update to golang v1.16.2
* Update dependent modules to track with upstream
* Switch to upstream flannel
* Track changes to upstream cloud-controller-manager and FeatureGates
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Remove early return preventing local retention policy to be enforced
resulting in N number of snapshots being stored.
Signed-off-by: Brian Downs <brian.downs@gmail.com>
When `/dev/kmsg` is unreadable due to sysctl value `kernel.dmesg_restrict=1`,
bind-mount `/dev/null` into `/dev/kmsg`
Fix issue 3011
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Now rootless mode can be used with cgroup v2 resource limitations.
A pod is executed in a cgroup like "/user.slice/user-1001.slice/user@1001.service/k3s-rootless.service/kubepods/podd0eb6921-c81a-4214-b36c-d3b9bb212fac/63b5a253a1fd4627da16bfce9bec58d72144cf30fe833e0ca9a6d60ebf837475".
This is accomplished by running `kubelet` in a cgroup namespace, and enabling `cgroupfs` driver for the cgroup hierarchy delegated by systemd.
To enable cgroup v2 resource limitation, `k3s server --rootless` needs to be launched as `systemctl --user` service.
Please see the comment lines in `k3s-rootless.service` for the usage.
Running `k3s server --rootless` via a terminal is not supported.
When it really needs to be launched via a terminal, `systemd-run --user -p Delegate --tty` needs to be prepended to create a systemd scope.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
* remove etcd data dir when etcd is disabled
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix comment
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* use debug instead of info logs
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Support repository regex rewrite rules when fetching image content.
Example configuration:
```yaml
# /etc/rancher/k3s/registries.yaml
mirrors:
"docker.io":
endpoint:
- "https://registry-1.docker.io/v2"
rewrite:
"^library/alpine$": "my-org/alpine"
```
This will instruct k3s containerd to fetch content for `alpine` images
from `docker.io/my-org/alpine` instead of the default
`docker.io/library/alpine` locations.
Signed-off-by: Jacob Blain Christen <jacob@rancher.com>
get() is called in a loop until client configuration is successfully
retrieved. Each iteration will try to configure the apiserver proxy,
which will in turn create a new load balancer. Skip creating a new
load balancer if we already have one.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If the port wanted by the client load balancer is in TIME_WAIT, startup
will fail. Set SO_REUSEPORT so that it can be listened on again
immediately.
The configurable Listen call wants a context, so plumb that through as
well.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Always use static ports for the load-balancers
This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.
This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.
Additional fixes below were required to successfully test and use this change
on etcd-only nodes.
* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
* Add functionality for etcd snapshot/restore to and from S3 compatible backends.
* Update etcd restore functionality to extract and write certificates and configs from snapshot.
Servers should always be upgraded before agents, but generally this
isn't required because things are compatible between versions. In this
case we're OK with failing closed if the user upgrades out of order, but
we should give a clearer message about what steps are required to fix
the issue.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
K3s upgrade via watch over file change of static file and manifest
and triggers helm-controller for change. It seems reasonable to
only allow upgrade traefik v1->v2 when there is no existing custom
traefik HelmChartConfig in the cluster to avoid any
incompatibility.
Here also separate the CRDs and put them into a different chart
to support CRD upgrade.
Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
It is possible that the apiserver may serve read requests but not allow
writes yet, in which case flannel will crash on startup when trying to
configure the subnet manager.
Fix this by waiting for the apiserver to become fully ready before
starting flannel and the network policy controller.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Adds support for retagging images to appear to have been sourced from
one or more additional registries as they are imported from the tarball.
This is intended to support RKE2 use cases with system-default-registry
where the images need to appear to have been pulled from a registry
other than docker.io.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Collect IPs from all pods before deciding to use internal or external addresses
@Taloth correctly noted that the code that iterates over ServiceLB pods
to collect IP addresses was failing to add additional internal IPs once
the map contained ANY entry from a previous node. This may date back to
when ServiceLB used a Deployment instead of a DaemonSet, so there was
only ever a single pod.
The new behavior is to collect all internal and external IPs, and then
construct the address list of a single type - external if there are any,
otherwise internal.
https://github.com/k3s-io/k3s/issues/1652#issuecomment-774497788
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brian Downs <brian.downs@gmail.com>
* Add tests to clientaccess/token
* Fix issues in clientaccess/token identified by tests
* Update tests to close coverage gaps
* Remove redundant check turned up by code coverage reports
* Add warnings if CA hash will not be validated
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Fix issue 900
cgroup2 support was introduced in PR 2584, but got broken in f3de60ff31
It was failing with "F1210 19:13:37.305388 4955 server.go:181] cannot set feature gate SupportPodPidsLimit to false, feature is locked to true"
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Since a recent commit, rootless mode was failing with the following errors:
```
E0122 22:59:47.615567 21 kuberuntime_manager.go:755] createPodSandbox for pod "helm-install-traefik-wf8lc_kube-system(9de0a1b2-e2a2-4ea5-8fb6-22c9272a182f)" failed: rpc error: code = Unknown desc = failed to create network namespace for sandbox "285ab835609387f82d304bac1fefa5fb2a6c49a542a9921995d0c35d33c683d5": failed to setup netns: open /var/run/netns/cni-c628a228-651e-e03e-d27d-bb5e87281846: permission denied
...
E0122 23:31:34.027814 21 pod_workers.go:191] Error syncing pod 1a77d21f-ff3d-4475-9749-224229ddc31a ("coredns-854c77959c-w4d7g_kube-system(1a77d21f-ff3d-4475-9749-224229ddc31a)"), skipping: failed to "CreatePodSandbox" for "coredns-854c77959c-w4d7g_kube-system(1a77d21f-ff3d-4475-9749-224229ddc31a)" with CreatePodSandboxError: "CreatePodSandbox for pod \"coredns-854c77959c-w4d7g_kube-system(1a77d21f-ff3d-4475-9749-224229ddc31a)\" failed: rpc error: code = Unknown desc = failed to create containerd task: io.containerd.runc.v2: create new shim socket: listen unix /run/containerd/s/8f0e40e11a69738407f1ebaf31ced3f08c29bb62022058813314fb004f93c422: bind: permission denied\n: exit status 1: unknown"
```
Remove symlinks to /run/{netns,containerd} so that rootless mode can create their own /run/{netns,containerd}.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
* Replace k3s cloud provider wrangler controller with core node informer
Upstream k8s has exposed an interface for cloud providers to access the
cloud controller manager's node cache and shared informer since
Kubernetes 1.9. This is used by all the other in-tree cloud providers;
we should use it too instead of running a dedicated wrangler controller.
Doing so also appears to fix an intermittent issue with the uninitialized
taint not getting cleared on nodes in CI.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Problem:
While using ZFS on debian and K3s with docker, I am unable to get k3s working as the snapshotter value is being validated and the validation fails.
Solution:
We should not validate snapshotter value if we are using docker as it's a no-op in that case.
Signed-off-by: Waqar Ahmed <waqarahmedjoyia@live.com>
We're not setting ``--service-account-issuer` to a https URL, which causes an
error message at startup when the feature gate is enabled. From the
docs on that flag:
> If this option is not a valid URI per the OpenID Discovery 1.0 spec, the
> ServiceAccountIssuerDiscovery feature will remain disabled, even if the
> feature gate is set to true. It is highly recommended that this value
> comply with the OpenID spec:
> https://openid.net/specs/openid-connect-discovery-1_0.html. In practice,
> this means that service-account-issuer must be an https URL. It is also
> highly recommended that this URL be capable of serving OpenID discovery
> documents at {service-account-issuer}/.well-known/openid-configuration.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Solution: Set priorityClassName to system-node-critical of traefik, metrics-server, local storage and coredns deployment
Signed-off-by: transhapHigsn <fet.prashantsingh@gmail.com>
Ubuntu and Debian kernels support mounting real overlayfs inside userns,
but the vanilla kernel still does not allow it.
OTOH fuse-overlayfs can be mounted inside userns with the vanilla kernel (>= 4.18).
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Removing the cfg.DataDir mutation in 3e4fd7b did not break anything, but
did change some paths in unwanted ways. Rather than mutating the
user-supplied command-line flags, explicitly specify the agent
subdirectory as needed.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
As per documentation, the cloud-provider flag should not be passed to
controller-manager when using cloud-controller. However, the legacy
cloud-related controllers still need to be explicitly disabled to
prevent errors from being logged.
Fixing this also prevents controller-manager from creating the
cloud-controller-manager service account that needed extra RBAC.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Related to rancher/rke2#474
Note that anyone who customizes the data-dir path will have to set
CRI_CONFIG_FILE to the correct path when using the wrapped binaries
(crictl, etc). This is better than dropping files in the incorrect
location.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Resolves warning 2 from #2471.
As per https://github.com/kubernetes/cloud-provider/issues/12 the
ClusterID requirement was never really followed through on, so the
flag is probably going to be removed in the future.
One side-effect of this is that the core k8s cloud-controller-manager
also wants to watch nodes, and needs RBAC to do so.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Related to #2455 and containerd/containerd#4684
These were not meant to be enabled by default, break images with many
layers, and will be disabled by default on the next containerd release.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
The --disable/--no-deploy flags actually turn off some built-in
controllers, in addition to preventing manifests from getting loaded.
Make it clear which controllers can still be disabled even when the
packaged components are ommited by the no_stage build tag.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* skip node delete from removed member
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* use grpc errors
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* go imports
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* exit if node is the etcd that being removed
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Set etcd timeouts using values from k8s instead of etcdctl
Fix for one of the warnings from #2303
* Use etcd zap logger instead of deprecated capsnlog
Fix for one of the warnings from #2303
* Remove member self-promotion code paths
* Add learner promotion tracking code
* Fix RaftAppliedIndex progress check
* Remove ErrGRPCKeyNotFound check
This is not used by v3 API - it just returns a response with 0 KVs.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
The default http client does not have an overall request timeout, so
connections to misbehaving or unavailable servers can stall for an
excessive amount of time. At the moment, just attempting to join
an unavailable cluster takes 2 minutes and 40 seconds to timeout.
Resolve that by setting a reasonable request timeout.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
According to @galal-hussein this is dead code that was probably brought
over from Kine. I certainly couldn't figure out what it is supposed to
be doing.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
We should ignore --token and --server if the managed database is initialized,
just like we ignore --cluster-init. If the user wants to join a new
cluster, or rejoin a cluster after --cluster-reset, they need to delete
the database. This a cleaner way to prevent deadlocking on quorum loss,
and removes the requirement that the target of the --server argument
must be online before already joined nodes can start.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This attempts to update logging statements to make them consistent
through out the code base. It also adds additional context to messages
where possible, simplifies messages, and updates level where necessary.
Since we're replacing the k3s rolebindings.yaml in rke2, we should allow
renaming this so that we can use the white-labeled name downstream.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>