Commit Graph

1005 Commits (206006cc333a4ff1fe0d0c32b4385099a84807f0)

Author SHA1 Message Date
Brad Davidson 206006cc33 Return ProviderID in URI format
The InstancesV1 interface handled this for us by combining the ProviderName and InstanceID values; the new interface requires us to do it manually

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-17 10:26:22 -07:00
Brad Davidson 128b882b69 Add ServiceAccount for svclb pods
For 1.24 and earlier, the svclb pods need a ServiceAccount so that we can allow their sysctls in PSPs

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit f25419ca2c)
2022-10-14 16:35:31 -07:00
Derek Nola 39361a8053
[Release-1.23] Replace deprecated ioutil package (#6236)
* check integration test null pointer

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Remove rotate retries

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Replace ioutil package

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Rebase fix

Signed-off-by: Derek Nola <derek.nola@suse.com>

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-10-07 19:02:58 -07:00
Brad Davidson 72177d42ca Bump traefik to 2.9.1 / chart 12.0.0
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-07 16:58:35 -07:00
Brad Davidson d814456e90 Handle custom kubelet port in agent tunnel
The kubelet port can be overridden by users; we shouldn't assume its always 10250

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-07 16:58:35 -07:00
Brad Davidson dddd06da5c Fix occasional "TLS handshake error" in apiserver network proxy.
We should be reading from the hijacked bufio.ReaderWriter instead of
directly from the net.Conn. There is a race condition where the
underlying http handler may consume bytes from the hijacked request
stream, if it comes in the same packet as the CONNECT header. These
bytes are left in the buffered reader, which we were not using. This was
causing us to occasionally drop a few bytes from the start of the
tunneled connection's client data stream.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-07 16:58:35 -07:00
Brad Davidson 95f216a5b0 Use structured logging instead of logrus for event recorders
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-07 16:58:35 -07:00
Brad Davidson 75303fe579 Disable cloud-node and cloud-node-lifecycle if CCM is disabled
If CCM and ServiceLB are both disabled, don't run the cloud-controller-manager at all;
this should provide the same CLI flag behavior as previous releases, and not create
problems when users disable the CCM but still want ServiceLB.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-07 16:58:35 -07:00
Brad Davidson d59b8407a7 Move servicelb into cloudprovider LoadBalancer interface
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-07 16:58:35 -07:00
Brad Davidson 53fede31e2 Move DisableServiceLB/Rootless/ServiceLBNamespace into config.Control
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-07 16:58:35 -07:00
Brad Davidson 806a8989e5 Implement InstancesV2 instead of Instances
... and drop legacy ClusterID support.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-07 16:58:35 -07:00
Brad Davidson 5d9c826133 Bump metrics-server to v0.6.1
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-10-07 16:58:35 -07:00
Manuel Buil be9c26a086 Add flannel-external-ip when there is a k3s node-external-ip
Signed-off-by: Manuel Buil <mbuil@suse.com>
2022-09-29 10:05:59 +02:00
Derek Nola 5c5201f724 Remove codespell from Drone, add to GH Actions (#6004)
Signed-off-by: Derek Nola <derek.nola@suse.com>
(cherry picked from commit 035c03cfaa)
2022-08-18 13:28:21 -07:00
Roberto Bonafiglia d04af60aad Updated flannel to v0.19.1
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-08-08 09:58:17 +02:00
Roberto Bonafiglia c366ee06de Fix comments and add check in case of IPv6 only node
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-08-04 18:47:13 +02:00
Roberto Bonafiglia b5f2344283 Added NodeIP autodect in case of dualstack connection
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-08-04 18:47:13 +02:00
Vladimir Kochnev c94dd33012 Save agent token to /var/lib/rancher/k3s/server/agent-token
Having separate tokens for server and agent nodes is a nice feature.

However, passing server's plain `K3S_AGENT_TOKEN` value
to `k3s agent --token` without CA hash is insecure when CA is
self-signed, and k3s warns about it in the logs:

```
Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash.
Use the full token from the server's node-token file to enable Cluster CA validation.
```

Okay so I need CA hash but where should I get it?

This commit attempts to fix this issue by saving agent token value to
`agent-token` file with CA hash appended.

Signed-off-by: Vladimir Kochnev <hashtable@yandex.ru>
(cherry picked from commit 13af0b1d88)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-08-04 09:39:34 -07:00
Brad Davidson 4afe65bff5 Replace getLocalhostIP with Loopback helper method
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 5eaa0a9422)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-08-04 09:39:34 -07:00
Brad Davidson 15100b5081 Add service-cluster-ip-range to controller-manager args
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 84fb8787f2)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-08-04 09:39:34 -07:00
Brad Davidson 89f4bce6d7 Fix server systemd detection
* Use INVOCATION_ID to detect execution under systemd, since as of a9b5a1933f NOTIFY_SOCKET is now cleared by the server code.
* Set the unit type to notify by default for both server and agent, which is what Rancher-managed installs have done for a while.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit bd5fdfce33)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-08-04 09:39:34 -07:00
Brad Davidson 74d017cda2 Raise etcd connection test timeout to 30 seconds
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 1674b9d640)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-08-04 09:39:34 -07:00
Derek Nola 3bcd7cee81
[Release-1.23] Update etcd error to match correct url (#5949)
* Update etcd error to match correct url 
* Bump macos version used in GH actions.

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-08-03 20:10:07 -07:00
Derek Nola 313dd6e597
Fix secrets reencryption for 8K+ secrets (#5939)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-08-03 09:10:04 -07:00
Brad Davidson ea29d624ad Address issues with etcd snapshots
* Increase the default snapshot timeout. The timeout is not currently
  configurable from Rancher, and larger clusters are frequently seeing
  uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
  command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
  etcd snapshot cron schedules from stacking up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 14:42:10 -07:00
Brad Davidson 118f9a9f53 Bump remotedialer
Includes fix for recently identified memory leak.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:29:04 -07:00
Brad Davidson beb6e4c8da Fix deletion of svclb DaemonSet when Service is deleted
87e1806697 removed the OwnerReferences
field from the DaemonSet, which makes sense since the Service may now be
in a different namespace than the DaemonSet and cross-namespace owner
references are not supported.  Unfortunately, we were relying on
garbage collection to delete the DameonSet, so this started leaving
orphaned DaemonSets when Services were deleted.

We don't want to add an a Service OnRemove handler, since this will add
finalizers to all Services, not just LoadBalancers services, causing
conformance tests to fail. Instead, manage our own finalizers, and
restore the DaemonSet removal Event that was removed by the same commit.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:28:20 -07:00
Brad Davidson 81c8f89c4b Remove legacy bidirectional datastore sync code
Since #4438 removed 2-way sync and treats any changed+newer files on disk as an error, we no longer need to determine if files are newer on disk/db or if there is a conflicting mix of both. Any changed+newer file is an error, unless we're doing a cluster reset in which case everything is unconditionally replaced.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:10:42 -07:00
Brad Davidson b146f27bc9 Fix fatal error when reconciling bootstrap data
Properly skip restoring bootstrap data for files that don't have a path
set because the feature that would set it isn't enabled.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:10:42 -07:00
Brad Davidson 9cee7bd5dd Don't crash when service IPFamiliyPolicy is not set
Service.Spec.IPFamilyPolicy may be a nil pointer on freshly upgraded clusters.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-01 11:16:59 -07:00
Brad Davidson 15ad163208 Fix egress selector proxy/bind-address support
Use same kubelet-preferred-address-types setting as RKE2 to improve reliability of the egress selector when using a HTTP proxy. Also, use BindAddressOrLoopback to ensure that the correct supervisor address is used when --bind-address is set.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-01 11:16:59 -07:00
Brad Davidson 7287947726 Handle egress-selector-mode change during upgrade
Properly handle unset egress-selector-mode from existing servers during cluster upgrade.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-30 15:11:04 -07:00
Darren Shepherd c9ae15c85d Introduce servicelb-namespace parameter
This parameter controls which namespace the klipper-lb pods will be create.
It defaults to kube-system so that k3s does not by default create a new
namespace. It can be changed if users wish to isolate the pods and apply
some policy to them.

Signed-off-by: Darren Shepherd <darren@acorn.io>
(cherry picked from commit e6009b1edf)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-16 11:43:17 -07:00
Darren Shepherd 927ff06328 Move all klipper-lb daemonset to common namespace for PodSecurity
The baseline PodSecurity profile will reject klipper-lb pods from running.
Since klipper-lb pods are put in the same namespace as the Service this
means users can not use PodSecurity baseline profile in combination with
the k3s servicelb.

The solution is to move all klipper-lb pods to a klipper-lb-system where
the security policy of the klipper-lb pods can be different an uniformly
managed.

Signed-off-by: Darren Shepherd <darren@acorn.io>
(cherry picked from commit f4cc1b8788)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-16 11:43:17 -07:00
Sjoerd Simons d29b621107 Add ability to pass configuration options to flannel backend
Allow the flannel backend to be specified as
backend=option=val,option2=val2 to select a given backend with extra options.

In particular this adds the following options to wireguard-native
backend:
* Mode - flannel wireguard tunnel mode
* PersistentKeepaliveInterval- wireguard persistent keepalive interval

Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
2022-06-16 09:56:47 +02:00
Derek Nola 330993e1eb
Delay service readiness until after startuphooks have finished (#5649) (#5723)
* Move startup hooks wg into a runtime pointer, check before notifying systemd
* Switch default systemd notification to server
* Add 1 sec delay to allow etcd to write to disk
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-06-15 13:11:22 -07:00
Brad Davidson ebd43f5311 Only listen on loopback when resetting
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 11:26:27 -07:00
Brad Davidson b00bc5cfd3 Ensure that CONTAINERD_ variables are not shadowed by later entries
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 10:58:21 -07:00
Brad Davidson c9e6b056a3 Sanitize filenames for use in configmap keys
If the user points S3 backups at a bucket containing other files, those
file names may not be valid configmap keys.

For example, RKE1 generates backup files with names like
`s3-c-zrjnb-rs-6hxpk_2022-05-05T12:05:15Z.zip`; the semicolons in the
timestamp portion of the name are not allowed for use in configmap keys.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 10:54:36 -07:00
Manuel Buil 2507fd8c93
Merge pull request #5682 from manuelbuil/flannelcniconf123
[Release 1.23] Add FlannelConfCNI flag
2022-06-15 10:18:51 +02:00
Derek Nola 9f7eec9525
add support for pprof server (#5690)
Signed-off-by: igor <igor@igor.io>
Signed-off-by: Derek Nola <derek.nola@suse.com>

Co-authored-by: Igor <igorwwwwwwwwwwwwwwwwwwww@users.noreply.github.com>
2022-06-14 17:51:07 -07:00
Manuel Buil 3e747021a8 Add FlannelConfCNI flag
Signed-off-by: Manuel Buil <mbuil@suse.com>
2022-06-14 10:24:58 +02:00
Brad Davidson ec61c6673a Set default egress-selector-mode to agent
... until QA flakes can be addressed.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-10 10:14:34 -07:00
Brad Davidson 7a323400e9 Remove control-plane egress context and fix agent mode.
The control-plane context handles requests outside the cluster and
should not be sent to the proxy.

In agent mode, we don't watch pods and just direct-dial any request for
a non-node address, which is the original behavior.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-10 10:14:34 -07:00
Brad Davidson 15b8fb962a Refactor egress-selector pods mode to watch pods
Watching pods appears to be the most reliable way to ensure that the
proxy routes and authorizes connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-08 09:35:25 -07:00
Brad Davidson 615020eed9 Add support for configuring the EgressSelector mode
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 9d7230496d)
2022-05-23 13:54:08 -07:00
Donnie Adams ebc397b61a Remove objects when removed from manifests (#5560)
* Remove objects when removed from manifests

If a user puts a file in /var/lib/rancher/k3s/server/manifests/ then the
objects contained therein are deployed to the cluster. If the objects
are removed from that file, they are not removed from the cluster.

This change tracks the GVKs in the files and will remove objects when
there are removed from the cluster.

Signed-off-by: Donnie Adams <donnie.adams@suse.com>
(cherry picked from commit c38a8c3b43)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-20 17:03:14 -07:00
Brad Davidson e6385b2341 Update CNI version in config file
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-11 14:39:07 -07:00
Manuel Buil a3b35d21e9 Add "ipFamilyPolicy: PreferDualStack" to have dual-stack ingress support
Signed-off-by: Manuel Buil <mbuil@suse.com>
2022-05-04 17:32:34 +02:00
Brad Davidson 1d4f995edd Move auto-generated resolv.conf out of /tmp to prevent accidental cleanup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-03 20:33:32 -07:00