Commit Graph

1700 Commits (11ef43011a3c1ac77912c94334bc65c40d18a51f)

Author SHA1 Message Date
Erik Wilson 95b895038c
Add locking and verification for data directory extraction 2020-10-06 10:29:27 -07:00
Erik Wilson ce0da0a0f4
Add file verification for data directory 2020-10-06 10:29:27 -07:00
Erik Wilson 66d29148f7
Add Release function for flock 2020-10-06 10:29:27 -07:00
Erik Wilson 360d82d20e
Add flock from k8s.io/kubernetes/pkg/util/flock 2020-10-06 10:29:26 -07:00
Brad Davidson c3c983198f Add temporary fix for issue with interrupted etcd promote
This is a minimal fix for https://github.com/rancher/rke2/issues/392

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-30 11:45:58 -07:00
Hussein Galal 373449ec0a
Allow for multiple etcd snapshot restoration (#2307)
* add reset tmp file

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go imports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix multiple lines string

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix typo

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use resetFile function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-09-30 02:53:31 +02:00
Brad Davidson 8262e23169
Revert removal of EndpointName hooks (#2319)
* Revert "Remove dead EndpointName code"
    This reverts commit 8025da5a8d.
* Fix docstrings based on proper understanding of use
2020-09-28 18:13:55 -07:00
Brad Davidson 714227bdc7
Merge pull request #2300 from brandond/fix_2249
Fix managed etcd cold startup deadlock issue #2249
2020-09-28 10:56:51 -07:00
Brad Davidson 360b0f1ee5 Add timeout to clientaccess http client
The default http client does not have an overall request timeout, so
connections to misbehaving or unavailable servers can stall for an
excessive amount of time. At the moment, just attempting to join
an unavailable cluster takes 2 minutes and 40 seconds to timeout.

Resolve that by setting a reasonable request timeout.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:26:27 -07:00
Brad Davidson cdfc6cfa1a Split clientaccess token/kubeconfig code
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:26:27 -07:00
Brad Davidson 45dd4afe50 Simplify token parsing
Improves readability, reduces round-trips to the join server to validate certs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:26:24 -07:00
Brad Davidson 9074da7405 Fix misc nits and missing/unused imports
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson 703ba5cde7 Add a bunch of doc comments
Also change identical error messages to clarify where problems are
occurring.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson ae916c2dec Use const for kube-system namespace
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson f59e8fc21b Fix etcd directory permissions
Silences warning on startup about insecure directory permissions

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson ee99660a96 Rename etcd directory helpers to reduce confusion about which datadir we're talking about
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson 8025da5a8d Remove dead EndpointName code
According to @galal-hussein this is dead code that was probably brought
over from Kine. I certainly couldn't figure out what it is supposed to
be doing.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson 97eb28a01a Remove unnecessary listener arg from managed DB setup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:09:45 -07:00
Brad Davidson a3bbd58f37 Fix managed etcd cold startup deadlock issue #2249
We should ignore --token and --server if the managed database is initialized,
just like we ignore --cluster-init. If the user wants to join a new
cluster, or rejoin a cluster after --cluster-reset, they need to delete
the database. This a cleaner way to prevent deadlocking on quorum loss,
and removes the requirement that the target of the --server argument
must be online before already joined nodes can start.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 02:44:49 -07:00
Kevin Messer 6c9f3d528a
feat(install): replace rpm by yum for setup_selinux (#1829)
It's a bad practice to install packages via rpm directly. It's better to install all packages with Yum/Dnf. It's also possible to install packages directly via an URL, which is the purpose of this PR.
2020-09-26 01:45:33 -07:00
Adam Farden 86d2e2a5f8
[systemd] really wait for network to come online (#1665)
Wants= is required to actually set the dependency on network-online.service
After= is required or k3s.service will be started at the same time as network-online.service

In network environments with slow DHCP, both are required to ensure valid network configuration for k3s

Signed-off-by: Adam Farden <adam@farden.cz>
2020-09-26 01:44:06 -07:00
Matthew Clive fc55904d82
Add network dependency to installed service file (#2210)
Adds the line `After=network-online.target` to the k3s systemd service
file. This applies the fix mentioned in
[this GH comment](https://github.com/rancher/k3s/issues/1626#issuecomment-642253812)
which I can confirm makes k3s networking survive reboot in Raspbian
Buster.

[It appears, in some docs I found](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files)
that this is a recommended and usual way of specifying that we need the
target to be _completed_ before starting k3s. Using just the `Wants=`
directive doesn't work for this task, you have to add both directives
at once to do this. Quote:

> `Wants=`: This directive is similar to `Requires=`, but less strict.
> `Systemd` will attempt to start any units listed here when this unit
> is activated. If these units are not found or fail to start, the
> current unit will continue to function. This is the recommended way to
> configure most dependency relationships. **Again, this implies a
> parallel activation unless modified by other directives**

> [...]

> `After=`: The units listed in this directive will be started before
> starting the current unit. This does not imply a dependency
> relationship and **one must be established through the above
> directives if this is required.**

- _(Emphasis mine)_

Signed-off-by: Matthew Clive <arcticlight@arcticlight.me>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-26 01:42:17 -07:00
David Nuzik b56e52ee96
Merge pull request #2291 from MonzElmasry/k3s_1.18.9
Mark k3s v1.18.9+k3s1 as stable
2020-09-24 09:21:06 -07:00
Brad Davidson 42bba04651
Skip etcd snapshots if the local endpoint is still a learner (#2295)
* Don't take snapshots if the local endpoint is still a learner
* Configure timeouts for etcd client dialer
2020-09-21 20:23:18 -07:00
Brad Davidson f5b506ccaf Add trivy cache volume to build
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-21 17:16:11 -07:00
Menna Elmasry bb148d0abc
Merge pull request #2292 from MonzElmasry/bump_k8s_version_master
bump k8s version to v1.19.2 on master
2020-09-21 22:52:22 +02:00
MonzElmasry 302fd26f50
bump k8s version to v1.19.2 on master
Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2020-09-21 22:21:25 +02:00
MonzElmasry ef1637cedf
Mark k3s 1.18.9+k3s1 as stable
Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2020-09-21 21:15:59 +02:00
Brian Downs ba70c41cce
Initial Logging Output Update (#2246)
This attempts to update logging statements to make them consistent
through out the code base. It also adds additional context to messages
where possible, simplifies messages, and updates level where necessary.
2020-09-21 09:56:03 -07:00
Hussein Galal 46fe57d7e9
reset etcd name on cluster reset (#2284)
* reset etcd name on cluster reset

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* gofmt

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-09-19 03:09:36 +02:00
Craig Jellick 0aa73c4765
Update ROADMAP.md 2020-09-18 11:40:03 -07:00
Brad Davidson 4db415c1db Only create k3s-images.txt on amd64
The list is the same across architectures, and is validated against the
list in git as part of CI... so there's no reason to be pushing it from
every pipeline. It's also causing conflicts when multiple pipelines try
to upload it at the same time.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-18 09:49:58 -07:00
Brian Downs e0a9060d59
Merge pull request #2253 from briandowns/issue-2106
Add Trivy Scans for Built Images
2020-09-17 08:46:31 -07:00
Craig Jellick b66760fccd Add 1.19 to channel.yaml
This will cause 1.19.* releases to be aggregated into a v1.19 channel on
https://update.k3s.io/v1-release/channels

Signed-off-by: Craig Jellick <craig@rancher.com>
2020-09-16 16:32:11 -07:00
Brian Downs 20a8327214 use latest trivy version
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2020-09-16 13:49:51 -07:00
Brian Downs 74ce99f5ff remove use of docker image for arch purposes
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2020-09-16 13:37:42 -07:00
Brad Davidson 8c6d3567fe Rename k3s-controller based on the build-time program name
Since we're replacing the k3s rolebindings.yaml in rke2, we should allow
renaming this so that we can use the white-labeled name downstream.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-16 10:53:07 -07:00
Brad Davidson ae5519c047
Use rancher-mirrored busybox for local-path-provisioner (#2257)
Related to #1908

Will be fixed upstream by
https://github.com/rancher/local-path-provisioner/pull/135/ but we're
not going to update the LPP image right now since it's undergoing some
changes that we don't want to pick up at the moment.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-15 18:02:51 -07:00
Brian Downs 3a2aff67da update shell if syntax
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2020-09-15 12:31:57 -07:00
Brian Downs 75209a7ec7 add support for arm
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2020-09-15 12:28:46 -07:00
Brian Downs c53f7e99e2 update error message
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2020-09-15 11:54:34 -07:00
Brian Downs f4c12a44ee add trivy scans for built images
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2020-09-15 11:43:27 -07:00
Erik Wilson a08e998bc5 Import containerd images with all platforms
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-14 20:44:58 -07:00
Brad Davidson fcaeebaa18 Add support for disabling all staged content
This reduces the binary footprint for downstream users that won't use
these files anyway.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-14 14:21:37 -07:00
Menna Elmasry edb3e5b7a7
Add error logger to http server (#2242)
* add error logger to http server

Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2020-09-14 23:14:30 +02:00
Hussein Galal beab211685
update etcd to use rancher fork (#2238)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-09-12 01:13:27 +02:00
Craig Jellick f26785c0e2
Add guidance on PRs and git commits (#2236)
We need to make the expectations around git commits and the process for
pull requests more clear to contributors.

Signed-off-by: Craig Jellick <craig@rancher.com>
2020-09-11 12:46:21 -07:00
Brad Davidson 617b34c588 Update golang to 1.15.2
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-11 11:52:03 -07:00
Brad Davidson 5ad76043ac Replace unmount read loop with awk
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-11 10:33:37 -07:00
Hussein Galal 041f18f6da
pin down grpc and related library in go.mod (#2222)
* pin down grpc and related library in go.mod

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-09-10 00:08:21 +02:00