* Disable helm CRD installation for disable-helm-controller
The NewContext package requires config as input, which would
require all third-party callers to update when the new Go module
is published.
This change only affects the installation behaviour of the Helm
CRDs. Existing Helm CRDs installed in a cluster will not be removed
when the disable-helm-controller flag is set on the server.
Addresses #8701
* address review comments
* remove redundant check
Signed-off-by: Harsimran Singh Maan <maan.harry@gmail.com>
* Tweaked order of ingress IPs in ServiceLB
Previously, ingress IPs were only string-sorted when returned.
They are now sorted by IP family, and string-sorted within each family,
as part of the filterByIPFamily method (see the sketch after this list).
* Update pkg/cloudprovider/servicelb.go
* Formatting
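A minimal sketch of the resulting ordering, not the actual servicelb code: addresses are grouped by IP family (IPv4 first here for illustration), then string-sorted within each family.

```go
package main

import (
	"fmt"
	"net"
	"sort"
)

// sortByIPFamily groups addresses into IPv4 and IPv6 and string-sorts each group.
func sortByIPFamily(addrs []string) []string {
	var v4, v6 []string
	for _, a := range addrs {
		if ip := net.ParseIP(a); ip != nil && ip.To4() == nil {
			v6 = append(v6, a)
		} else {
			v4 = append(v4, a)
		}
	}
	sort.Strings(v4)
	sort.Strings(v6)
	return append(v4, v6...)
}

func main() {
	fmt.Println(sortByIPFamily([]string{"2001:db8::2", "10.0.0.2", "10.0.0.1"}))
	// Output: [10.0.0.1 10.0.0.2 2001:db8::2]
}
```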
Signed-off-by: Jason Costello <jason@hazy.com>
Co-authored-by: Brad Davidson <brad@oatmail.org>
Omit snapshot list configmap entries for snapshots without extra metadata; reduce log level of warnings about missing s3 metadata files.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Problem:
Configuring qos-class features in containerd requires a custom containerd configuration template.
Solution:
Look for configuration files in default locations and configure containerd to use them if they exist.
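A minimal sketch of the discovery step, with hypothetical file paths rather than the actual K3s probe list: check a known location and only wire the feature into the rendered containerd config when the file exists.

```go
package main

import (
	"fmt"
	"os"
)

// firstExisting returns the first path that exists on disk, or "" if none do.
func firstExisting(paths ...string) string {
	for _, p := range paths {
		if _, err := os.Stat(p); err == nil {
			return p
		}
	}
	return ""
}

func main() {
	// Hypothetical default locations; each qos-class feature has its own file.
	if p := firstExisting("/etc/containerd/blockio_config.yaml"); p != "" {
		fmt.Println("enabling blockio support using", p)
	}
	if p := firstExisting("/etc/containerd/rdt_config.yaml"); p != "" {
		fmt.Println("enabling rdt support using", p)
	}
}
```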
Signed-off-by: Oliver Larsson <larsson.e.oliver@gmail.com>
Create a generic helper function that finds extra containerd runtimes.
The code was originally inside the nvidia container discovery file.
Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Discover the runwasi-based containerd shims that are already
available on the node.
The runtimes could have been installed either by a package manager or by
the kwasm operator.
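A minimal sketch of the discovery, assuming the shims are located by looking up well-known binary names on the host PATH; the shim names below are illustrative.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findContainerdShims returns the shims from the candidate list that are
// present on the PATH, mapped to their resolved locations.
func findContainerdShims(candidates []string) map[string]string {
	found := map[string]string{}
	for _, name := range candidates {
		if path, err := exec.LookPath(name); err == nil {
			found[name] = path
		}
	}
	return found
}

func main() {
	shims := findContainerdShims([]string{
		"containerd-shim-wasmtime-v1",
		"containerd-shim-spin-v1",
		"containerd-shim-slight-v1",
	})
	for name, path := range shims {
		fmt.Printf("discovered runtime %s at %s\n", name, path)
	}
}
```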
Signed-off-by: Flavio Castelli <fcastelli@suse.com>
The containerd configuration on a Linux system now handles the nvidia
and the WebAssembly runtimes.
Signed-off-by: Flavio Castelli <fcastelli@suse.com>
---------
Signed-off-by: Flavio Castelli <fcastelli@suse.com>
These fields are only necessary when saving snapshots to S3, and will block restoration if attempted
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Silences an error message from lasso; this is a normal startup condition
when no snapshots exist, so we shouldn't log nasty-looking errors.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Removing this in 002e6c43ee regressed
control-plane-only nodes, as we rely on the etcd client to update its
endpoint list internally so that we can use it to sync the load-balancer
address list.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Enable the feature-gate for both kubelet and cloud-controller-manager. Enabling it on only one side breaks RKE2, where feature-gates are not shared due to running in different processes.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Initial Windows port
Signed-off-by: Sean Yen <seanyen@microsoft.com>
Signed-off-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Wei Ran <weiran@microsoft.com>
Write the extra metadata both locally and to S3. These files are placed such that they will not be used by older versions of K3s that do not make use of them.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Consolidate NewCertCommands
* Add support for user defined new token
* Add E2E testlets
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Ensure agent token also changes
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Add --image-service-endpoint flag
Problem:
An external container runtime can be set, but the image service endpoint is left
unchanged and is not exposed as a flag. Exposing it is useful for containerd
snapshotters beyond those with built-in support, such as stargz-snapshotter.
Solution:
Add an --image-service-endpoint flag, and default the image service endpoint to
the container runtime endpoint if set.
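A minimal sketch of the defaulting rule, with hypothetical field names rather than the real agent config struct:

```go
package main

import "fmt"

// AgentConfig mirrors only the two endpoints relevant to this change.
type AgentConfig struct {
	ContainerRuntimeEndpoint string
	ImageServiceEndpoint     string
}

// defaultImageServiceEndpoint falls back to the container runtime endpoint
// when no image service endpoint was given explicitly.
func defaultImageServiceEndpoint(cfg *AgentConfig) {
	if cfg.ImageServiceEndpoint == "" && cfg.ContainerRuntimeEndpoint != "" {
		cfg.ImageServiceEndpoint = cfg.ContainerRuntimeEndpoint
	}
}

func main() {
	cfg := &AgentConfig{ContainerRuntimeEndpoint: "unix:///run/containerd/containerd.sock"}
	defaultImageServiceEndpoint(cfg)
	fmt.Println(cfg.ImageServiceEndpoint) // unix:///run/containerd/containerd.sock
}
```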
Signed-off-by: Edgar Lee <edgarhinshunlee@gmail.com>
* Update to v1.28.2
Signed-off-by: Johnatas <johnatasr@hotmail.com>
* Bump containerd and stargz versions
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Print message on upgrade fail
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Send Bad Gateway instead of Service Unavailable when tunnel dial fails
Works around new handling for Service Unavailable by apiserver aggregation added in kubernetes/kubernetes#119870
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add 60 seconds to server upgrade wait to account for delays in apiserver readiness
Also change cleanup helper to ensure upgrade test doesn't pollute the
images for the rest of the tests.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
---------
Signed-off-by: Johnatas <johnatasr@hotmail.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
k3s etcd-snapshot save --etcd-s3 ... creates a local snapshot and uploads it to S3, while k3s etcd-snapshot delete --etcd-s3 ... was deleting the snapshot only from the S3 bucket. This commit changes the behavior of delete to remove the snapshot both locally and from S3.
Signed-off-by: Ian Cardoso <osodracnai@gmail.com>
Wire up a node watch to collect addresses of server nodes, to prevent adding unauthorized SANs to the dynamiclistener cert.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Consolidate CopyFile function
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Copy to File, not destination folder
Signed-off-by: Derek Nola <derek.nola@suse.com>
---------
Signed-off-by: Derek Nola <derek.nola@suse.com>
Only configure enable-aggregator-routing and egress-selector-config-file
if required by egress-selector-mode.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Problem:
When support for etcd was added in 3957142, generation of certificates and keys for etcd was not gated behind use of managed etcd.
Keys are generated and distributed across servers even if managed etcd is not enabled.
Solution:
Allow generation of certificates and keys only if managed etcd is enabled. Check the config.DisableETCD flag.
Signed-off-by: Bartossh <lenartconsulting@gmail.com>
We still need to add a CLI flag for this. The certificate commands should probably also support loading a config file.
Signed-off-by: leilei.zhai <leilei.zhai@qingteng.cn>
* Short-circuit search with help and version flags
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Keep functions separate
Signed-off-by: Derek Nola <derek.nola@suse.com>
---------
Signed-off-by: Derek Nola <derek.nola@suse.com>
Allows nodes to join the cluster during a webhook outage. This also
enhances auditability by creating Kubernetes events for the deferred
verification.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Move coverage writer into agent and server
* Add coverage report to E2E PR tests
* Add codecov upload to drone
Signed-off-by: Derek Nola <derek.nola@suse.com>
There is no way to configure the lb image because it is a const value.
It would be better to make it a variable so that we can override
the value, like the `helm-controller` job image, when compiling k3s/rke2.
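A minimal sketch of the idea, with hypothetical names: a package-level string variable, unlike a const, can be overridden at link time with -ldflags when building k3s/rke2.

```go
package main

import "fmt"

// DefaultLBImage is an example default; it can be replaced at build time with
//   go build -ldflags "-X main.DefaultLBImage=myregistry/klipper-lb:custom"
var DefaultLBImage = "rancher/klipper-lb:v0.4.4" // example tag

func main() {
	fmt.Println("ServiceLB image:", DefaultLBImage)
}
```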
Signed-off-by: Yuxing Deng <jxfa0043379@hotmail.com>
Only actual admin actions should use the admin kubeconfig; everything done by the supervisor/deploy/helm controllers will now use a distinct account for audit purposes.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
As per https://github.com/golang/go/issues/47001 even subtle.ConstantTimeCompare should never be used with variable-length inputs, as it will return 0 if the lengths do not match. Switch to consistently using constant-time comparisons of hashes for password checks to avoid any possible side-channel leaks that could be combined with other vectors to discover password lengths.
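A minimal sketch of the pattern, not the K3s authenticator itself: hash both values to a fixed length before handing them to subtle.ConstantTimeCompare, so the comparison never sees variable-length inputs.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// verify compares a stored password against a provided one using fixed-length
// SHA-256 digests, so the comparison time does not depend on either length.
func verify(stored, provided string) bool {
	a := sha256.Sum256([]byte(stored))
	b := sha256.Sum256([]byte(provided))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	fmt.Println(verify("secret-token", "secret-token")) // true
	fmt.Println(verify("secret-token", "guess"))        // false
}
```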
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Also add bandwidth and firewall plugins. The bandwidth plugin is
automatically registered with the appropriate capability, but the
firewall plugin must be configured by the user if they want to use it.
Ref: https://www.cni.dev/plugins/current/meta/firewall/
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* local-storage: Fix permissions
/var/lib/rancher/k3s/storage/ should be 700
/var/lib/rancher/k3s/storage/* should be 777
Fixes #2348
Signed-off-by: Boleyn Su <boleyn.su@gmail.com>
* Fix pod command field type
* Fix integration test
Signed-off-by: Derek Nola <derek.nola@suse.com>
---------
Signed-off-by: Boleyn Su <boleyn.su@gmail.com>
Signed-off-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Brad Davidson <brad@oatmail.org>
Co-authored-by: Derek Nola <derek.nola@suse.com>
* Add helper function for multiple arguments in stringslice
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Cleanup server setup with util function
Signed-off-by: Derek Nola <derek.nola@suse.com>
Several places in the code used a 5-second retry loop to wait on
Runtime.Core to be set. This caused a race condition where OnChange
handlers could be added after the Wrangler shared informers were already
started. When this happened, the handlers were never called because the
shared informers they relied upon were not started.
Fix that by requiring anything that waits on Runtime.Core to run from a
cluster controller startup hook that is guaranteed to be called before
the shared informers are started, instead of just firing it off in a
goroutine that retries until it is set.
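A minimal sketch of the pattern with made-up types, not the actual Wrangler/K3s APIs: work that needs Runtime.Core registers itself through a startup hook that is guaranteed to run before the shared informers start, instead of polling from a goroutine.

```go
package main

import "fmt"

type core struct{}

// OnChange stands in for registering a handler on a shared informer.
func (c *core) OnChange(name string, handler func(key string)) {
	fmt.Println("registered handler:", name)
}

type cluster struct {
	hooks []func(c *core)
}

// RegisterStartupHook queues work that needs the core clients; hooks always
// run before the informers start, so their handlers are wired into the caches.
func (cl *cluster) RegisterStartupHook(h func(c *core)) {
	cl.hooks = append(cl.hooks, h)
}

func (cl *cluster) Start() {
	c := &core{}
	for _, h := range cl.hooks {
		h(c) // runs before informers are started
	}
	fmt.Println("starting shared informers")
}

func main() {
	cl := &cluster{}
	cl.RegisterStartupHook(func(c *core) {
		c.OnChange("node-watch", func(key string) {})
	})
	cl.Start()
}
```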
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Don't set up the agent tunnel authorizer on agentless servers, and warn when agentless servers won't have a way to reach in-cluster endpoints.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Fixes an issue where CRDs were being created without schema, allowing
resources with invalid content to be created, later stalling the
controller ListWatch event channel when the invalid resources could not
be deserialized.
This also requires moving Addon GVK tracking from a status field to
an annotation, as the GroupVersionKind type has special handling
internal to Kubernetes that prevents it from being serialized to the CRD
when schema validation is enabled.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Bump go version to 1.20.3 to match upstream
* Bump cri-dockerd
* Bump golangci-lint
* go generate
* Bump selinux in cgroup test
* Bump to v1.27.1 tags
* Release documentation improvements
* Only run upgrade e2e test on PR
Signed-off-by: Derek Nola <derek.nola@suse.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
* Remove deprecated nodeSelector label beta.kubernetes.io/os
Problem:
The nodeSelector label beta.kubernetes.io/os in the CoreDNS deployment was deprecated in 1.14 and will likely be removed soon
Solution:
Change the nodeSelector to remove the beta prefix from the label
Signed-off-by: Dan Mills <evilhamsterman@gmail.com>
We need to send the full chain in order for cross-signing to work
properly during switchover to a new root.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Wait for kubelet port to be ready before setting
* Wait for kubelet to update the Ready status before reading port
Signed-off-by: Daishan Peng <daishan@acorn.io>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
Turns out etcd-only nodes were never running **any** of the controllers,
so allowing multiple controllers didn't really fix things.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Prevents errors when starting with fail-closed webhooks
Also, use panic instead of Fatalf so that the CloudControllerManager rescue can handle the error
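A minimal illustration with made-up function names of why panic is preferred over Fatalf here: a deferred recover (the rescue) can catch the panic and allow a retry, while Fatalf exits the process immediately.

```go
package main

import "log"

func startCloudControllerManager() {
	// The rescue: recover from a failed start so the caller can retry later.
	defer func() {
		if r := recover(); r != nil {
			log.Printf("cloud-controller-manager failed, will retry: %v", r)
		}
	}()
	run()
}

func run() {
	// panic, not log.Fatalf, so the deferred rescue above gets a chance to run.
	panic("webhook not ready")
}

func main() {
	startCloudControllerManager()
	log.Println("process keeps running and can retry later")
}
```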
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Allow bootstrapping with kubeadm bootstrap token strings or existing
Kubelet certs. This allows agents to join the cluster using kubeadm
bootstrap tokens, as created with the `k3s token create` command.
When the token expires or is deleted, agents can successfully restart by
authenticating with their kubelet certificate via node authentication.
If the token is gone and the node is deleted from the cluster, node auth
will fail and they will be prevented from rejoining the cluster until
provided with a valid token.
Servers still must be bootstrapped with the static cluster token, as
they will need to know it to decrypt the bootstrap data.
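A minimal sketch of the agent-side flow, with hypothetical helpers and an example certificate path, not the actual K3s bootstrap code: try the token first, and fall back to the existing kubelet certificate when the token has expired or been deleted.

```go
package main

import (
	"errors"
	"log"
)

var errTokenInvalid = errors.New("bootstrap token expired or deleted")

// joinWithToken stands in for bootstrapping against the supervisor with a token.
func joinWithToken(token string) error {
	if token == "" {
		return errTokenInvalid
	}
	return nil
}

// joinWithNodeCert stands in for node authentication with an existing kubelet cert.
func joinWithNodeCert(certPath string) error {
	log.Println("authenticating with existing kubelet certificate:", certPath)
	return nil
}

func joinCluster(token, certPath string) error {
	err := joinWithToken(token)
	if errors.Is(err, errTokenInvalid) {
		// Node auth only succeeds if the node still exists in the cluster;
		// otherwise a valid token is required to rejoin.
		return joinWithNodeCert(certPath)
	}
	return err
}

func main() {
	// Example path only; the real location depends on the agent's data dir.
	_ = joinCluster("", "/var/lib/rancher/k3s/agent/client-kubelet.crt")
}
```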
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This command must be run on a server while the service is running. After this command completes, all the servers in the cluster should be restarted to load the new CA files.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* General cleanup of test-helpers functions to address CI failures
* Install awscli in test image
* Log containerd output to file even when running with --debug
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Also ensure that the snapshot job does not attempt to trigger multiple concurrent runs, as this is not supported.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
ServiceLB now requires this module, but it will not get autoloaded by the kubelet if the host is using nftables.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Add environment variables for the port-driver, cidr, mtu, and disable-host-loopback settings. Since rootless is still experimental, I don't think they deserve full CLI flag status.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add EncryptSecrets to Critical Control Args
* use deep comparison to extract differences
Signed-off-by: Derek Nola <derek.nola@suse.com>
Signed-off-by: Derek Nola <derek.nola@suse.com>
Problem:
Using defer inside a loop can lead to resource leaks
Solution:
Move the newer-file check into a separate function so the deferred cleanup runs at the end of each call
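A minimal sketch of the fix with illustrative names: the per-file work moves into its own function, so each deferred Close runs when that call returns instead of accumulating until the caller's loop finishes.

```go
package main

import (
	"fmt"
	"os"
)

// modTime opens a single file and closes it as soon as this call returns.
func modTime(path string) (int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close() // runs at the end of this call, not at the end of the caller's loop
	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.ModTime().UnixNano(), nil
}

// newestFile returns the path that was modified most recently.
func newestFile(paths []string) (string, error) {
	var newest string
	var newestMod int64
	for _, p := range paths {
		mod, err := modTime(p)
		if err != nil {
			return "", err
		}
		if newest == "" || mod > newestMod {
			newest, newestMod = p, mod
		}
	}
	return newest, nil
}

func main() {
	if p, err := newestFile([]string{"/etc/hostname", "/etc/hosts"}); err == nil {
		fmt.Println("newest:", p)
	}
}
```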
Signed-off-by: iyear <ljyngup@gmail.com>
Using the node external IP address for all CNI traffic is a breaking change from previous versions; we should make it an opt-in for distributed clusters instead of default behavior.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
The InstancesV1 interface handled this for us by combining the ProviderName and InstanceID values; the new interface requires us to do it manually
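A minimal sketch of what that manual step looks like, using the conventional <provider>://<instance-id> form; the provider name and instance ID shown are just examples.

```go
package main

import "fmt"

// providerID joins the cloud provider name and instance ID the way the old
// InstancesV1 interface did on our behalf.
func providerID(providerName, instanceID string) string {
	return fmt.Sprintf("%s://%s", providerName, instanceID)
}

func main() {
	fmt.Println(providerID("k3s", "my-node")) // k3s://my-node
}
```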
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
For 1.24 and earlier, the svclb pods need a ServiceAccount so that we can allow their sysctls in PSPs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
We should be reading from the hijacked bufio.ReaderWriter instead of
directly from the net.Conn. There is a race condition where the
underlying http handler may consume bytes from the hijacked request
stream, if it comes in the same packet as the CONNECT header. These
bytes are left in the buffered reader, which we were not using. This was
causing us to occasionally drop a few bytes from the start of the
tunneled connection's client data stream.
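A minimal sketch of the pattern, not the K3s tunnel server: after Hijack, read from the returned bufio.ReadWriter rather than the raw net.Conn, since the buffered reader may already hold client bytes that arrived in the same packet as the request headers.

```go
package main

import (
	"io"
	"net/http"
)

// connectHandler hijacks the connection and echoes the client's data stream.
func connectHandler(w http.ResponseWriter, r *http.Request) {
	hijacker, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijacking not supported", http.StatusInternalServerError)
		return
	}
	conn, bufrw, err := hijacker.Hijack()
	if err != nil {
		return
	}
	defer conn.Close()

	bufrw.WriteString("HTTP/1.1 200 Connection Established\r\n\r\n")
	bufrw.Flush()

	// Read from bufrw, not conn: bufrw may already contain bytes buffered
	// alongside the headers, which reading conn directly would drop.
	io.Copy(bufrw, bufrw)
	bufrw.Flush()
}

func main() {
	http.ListenAndServe("127.0.0.1:8080", http.HandlerFunc(connectHandler))
}
```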
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If CCM and ServiceLB are both disabled, don't run the cloud-controller-manager at all;
this should provide the same CLI flag behavior as previous releases, and not create
problems when users disable the CCM but still want ServiceLB.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Consolidate data dir flag
* Group cluster flags together
* Reorder and group agent flags
* Add additional info around vmodule flag
* Hide deprecated flags, and add warning about their removal
Signed-off-by: Derek Nola <derek.nola@suse.com>