As per https://github.com/golang/go/issues/47001 even subtle.ConstantTimeCompare should never be used with variable-length inputs, as it will return 0 if the lengths do not match. Switch to consistently using constant-time comparisons of hashes for password checks to avoid any possible side-channel leaks that could be combined with other vectors to discover password lengths.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Also add bandwidth and firewall plugins. The bandwidth plugin is
automatically registered with the appropriate capability, but the
firewall plugin must be configured by the user if they want to use it.
Ref: https://www.cni.dev/plugins/current/meta/firewall/
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* local-storage: Fix permission
/var/lib/rancher/k3s/storage/ should be 700
/var/lib/rancher/k3s/storage/* should be 777
Fixes#2348
Signed-off-by: Boleyn Su <boleyn.su@gmail.com>
* Fix pod command field type
* Fix to int test
Signed-off-by: Derek Nola <derek.nola@suse.com>
---------
Signed-off-by: Boleyn Su <boleyn.su@gmail.com>
Signed-off-by: Derek Nola <derek.nola@suse.com>
Co-authored-by: Brad Davidson <brad@oatmail.org>
Co-authored-by: Derek Nola <derek.nola@suse.com>
* Add helper function for multiple arguments in stringslice
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Cleanup server setup with util function
Signed-off-by: Derek Nola <derek.nola@suse.com>
Several places in the code used a 5-second retry loop to wait on
Runtime.Core to be set. This caused a race condition where OnChange
handlers could be added after the Wrangler shared informers were already
started. When this happened, the handlers were never called because the
shared informers they relied upon were not started.
Fix that by requiring anything that waits on Runtime.Core to run from a
cluster controller startup hook that is guaranteed to be called before
the shared informers are started, instead of just firing it off in a
goroutine that retries until it is set.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Don't set up the agent tunnel authorizer on agentless servers, and warn when agentless servers won't have a way to reach in-cluster endpoints.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Fixes an issue where CRDs were being created without schema, allowing
resources with invalid content to be created, later stalling the
controller ListWatch event channel when the invalid resources could not
be deserialized.
This also requires moving Addon GVK tracking from a status field to
an annotation, as the GroupVersionKind type has special handling
internal to Kubernetes that prevents it from being serialized to the CRD
when schema validation is enabled.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Bump go version to 1.20.3 to match upstream
* Bump cri-dockerd
* Bump golanci-lint
* go generate
* Bump selinux in cgroup test
* Bump to v1.27.1 tags
* Release documentation improvements
* Only run upgrade e2e test on PR
Signed-off-by: Derek Nola <derek.nola@suse.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
* Remove deprecated nodeSelector label beta.kubernetes.io/os
Problem:
The nodeSelector label beta.kubernetes.io/os in the CoreDNS deployment was deprecated in 1.14 and will likely be removed soon
Solution:
Change the nodeSelector to remove the beta
Signed-off-by: Dan Mills <evilhamsterman@gmail.com>
We need to send the full chain in order for cross-signing to work
properly during switchover to a new root.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Wait for kubelet port to be ready before setting
* Wait for kubelet to update the Ready status before reading port
Signed-off-by: Daishan Peng <daishan@acorn.io>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
Turns out etcd-only nodes were never running **any** of the controllers,
so allowing multiple controllers didn't really fix things.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Prevents errors when starting with fail-closed webhooks
Also, use panic instead of Fatalf so that the CloudControllerManager rescue can handle the error
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Allow bootstrapping with kubeadm bootstrap token strings or existing
Kubelet certs. This allows agents to join the cluster using kubeadm
bootstrap tokens, as created with the `k3s token create` command.
When the token expires or is deleted, agents can successfully restart by
authenticating with their kubelet certificate via node authentication.
If the token is gone and the node is deleted from the cluster, node auth
will fail and they will be prevented from rejoining the cluster until
provided with a valid token.
Servers still must be bootstrapped with the static cluster token, as
they will need to know it to decrypt the bootstrap data.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This command must be run on a server while the service is running. After this command completes, all the servers in the cluster should be restarted to load the new CA files.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* General cleanup of test-helpers functions to address CI failures
* Install awscli in test image
* Log containerd output to file even when running with --debug
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Also ensure that the snapshot job does not attempt to trigger multiple concurrent runs, as this is not supported.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
ServiceLB now requires this module, but it will not get autoloaded by the kubelet if the host is using nftables.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Add enivironment variables for port-driver, cidr, mtu, and disable-host-loopback settings. Since rootless is still experimental, I don't think they deserve full CLI flag status.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Add EncryptSecrets to Critical Control Args
* use deep comparison to extract differences
Signed-off-by: Derek Nola <derek.nola@suse.com>
Signed-off-by: Derek Nola <derek.nola@suse.com>
Problem:
Using defer inside a loop can lead to resource leaks
Solution:
Judge newer file in the separate function
Signed-off-by: iyear <ljyngup@gmail.com>
Using the node external IP address for all CNI traffic is a breaking change from previous versions; we should make it an opt-in for distributed clusters instead of default behavior.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
The InstancesV1 interface handled this for us by combining the ProviderName and InstanceID values; the new interface requires us to do it manually
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
For 1.24 and earlier, the svclb pods need a ServiceAccount so that we can allow their sysctls in PSPs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
We should be reading from the hijacked bufio.ReaderWriter instead of
directly from the net.Conn. There is a race condition where the
underlying http handler may consume bytes from the hijacked request
stream, if it comes in the same packet as the CONNECT header. These
bytes are left in the buffered reader, which we were not using. This was
causing us to occasionally drop a few bytes from the start of the
tunneled connection's client data stream.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
If CCM and ServiceLB are both disabled, don't run the cloud-controller-manager at all;
this should provide the same CLI flag behavior as previous releases, and not create
problems when users disable the CCM but still want ServiceLB.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Consolidate data dir flag
* Group cluster flags together
* Reorder and group agent flags
* Add additional info around vmodule flag
* Hide deprecated flags, and add warning about their removal
Signed-off-by: Derek Nola <derek.nola@suse.com>
Having separate tokens for server and agent nodes is a nice feature.
However, passing server's plain `K3S_AGENT_TOKEN` value
to `k3s agent --token` without CA hash is insecure when CA is
self-signed, and k3s warns about it in the logs:
```
Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash.
Use the full token from the server's node-token file to enable Cluster CA validation.
```
Okay so I need CA hash but where should I get it?
This commit attempts to fix this issue by saving agent token value to
`agent-token` file with CA hash appended.
Signed-off-by: Vladimir Kochnev <hashtable@yandex.ru>
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Use INVOCATION_ID to detect execution under systemd, since as of a9b5a1933f NOTIFY_SOCKET is now cleared by the server code.
* Set the unit type to notify by default for both server and agent, which is what Rancher-managed installs have done for a while.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Increase the default snapshot timeout. The timeout is not currently
configurable from Rancher, and larger clusters are frequently seeing
uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
etcd snapshot cron schedules from stacking up.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
87e1806697 removed the OwnerReferences
field from the DaemonSet, which makes sense since the Service may now be
in a different namespace than the DaemonSet and cross-namespace owner
references are not supported. Unfortunately, we were relying on
garbage collection to delete the DameonSet, so this started leaving
orphaned DaemonSets when Services were deleted.
We don't want to add an a Service OnRemove handler, since this will add
finalizers to all Services, not just LoadBalancers services, causing
conformance tests to fail. Instead, manage our own finalizers, and
restore the DaemonSet removal Event that was removed by the same commit.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Since #4438 removed 2-way sync and treats any changed+newer files on disk as an error, we no longer need to determine if files are newer on disk/db or if there is a conflicting mix of both. Any changed+newer file is an error, unless we're doing a cluster reset in which case everything is unconditionally replaced.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Properly skip restoring bootstrap data for files that don't have a path
set because the feature that would set it isn't enabled.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
cadvisor still doesn't pull stats via CRI yet, so we have to continue to use the deprecated arg.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>