diff --git a/docs/adrs/ca-cert-rotation.md b/docs/adrs/ca-cert-rotation.md
new file mode 100644
index 0000000000..876a94c09e
--- /dev/null
+++ b/docs/adrs/ca-cert-rotation.md
@@ -0,0 +1,83 @@
+# Support CA Certificate Renewal / Rotation, Signing by External Root
+
+Date: 2022-12-19
+
+## Status
+
+Accepted
+
+## Context
+
+On the first startup of a new cluster, K3s currently autogenerates a number of self-signed cluster CAs and keys:
+* Cluster Server CA + Key (used to sign server certificates)
+* Cluster Client CA + Key (used to sign client certificates)
+* Request Header CA + Key (used to sign certificates for apiserver aggregation)
+* etcd Peer CA + Key (used to sign certificates for authentication between etcd peer servers)
+* etcd Client CA + Key (used to sign certificates for etcd clients, i.e., the apiserver)
+* ServiceAccount Token Signing Key (used to sign ServiceAccount JSON Web Tokens)
+
+These CAs are all self-signed, without any cross-signing or common root or intermediates, and are valid for 10
+years. When any of these CA certificates expires, every certificate it issued becomes invalid, causing a
+significant outage for the cluster.
+
+### Server CA Pinning
+
+The Cluster Server CA is used in node bootstrapping. The full `K10` format token includes a SHA256 hash of the
+Cluster Server CA file's on-disk PEM representation. Nodes that join the cluster using a full token perform the
+following checks when starting up:
+1. Download the cluster server CA bundle from `/v1-k3s/cacert` on the server they are joining.
+2. Validate that the hash of the bytes in the CA bundle matches the hash string following the `K10` prefix in
+   the token.
+3. Validate that the certificate presented by the server they are joining can be validated using the roots and
+   intermediates present in the CA bundle.
+
+In hindsight, this hash should instead have been derived from the DER encoding of the root certificate in
+that bundle, as PEM format allows for variable padding, line lengths, and so on.
+Only DER format is guaranteed to be stable, and hashing only the root of the chain would have allowed for
+rotating or renewing intermediate CAs without breaking trust between cluster nodes.
+
+### Bootstrap Data Immutability
+
+There is currently no way to write new certificates to the datastore. The certificates and keys are
+written to disk once on initial startup, and from there written to the cluster datastore. From that point on,
+the files in the datastore are considered authoritative; replacing the files on disk results in either the files
+being overwritten from the datastore, or an error, depending on whether the files on disk are newer than the datastore's.
+
+The `secrets-encrypt` subcommand does currently mutate the bootstrap data, but it only touches the secrets
+encryption configuration, not the CA certs or keys.
+
+### Summary
+
+For both of the above reasons (hash pinning, and the inability to rewrite bootstrap data), it is not currently
+possible to renew or replace the cluster CA certs or keys.
+
+### Additional Considerations
+
+#### Aggressive Certificate Rotation
+
+Some users (particularly government or financial customers) attempt to implement the guidance from [NIST SP 800-57
+Part 1 Rev. 5](https://csrc.nist.gov/publications/detail/sp/800-57-part-1/rev-5/final). In practice, this means
+signing cluster CAs with a set of organizational root and intermediate certificates, and rotating both the
+intermediate and cluster CA certificates and keys at least once a year.
+
+#### ServiceAccount Signing Key Rolling Replacement
+
+Although the ServiceAccount signing key is not part of any CA chain, it must still be rotated carefully to
+avoid causing an outage. The apiserver and controller-manager must be updated to use a new key, while
+still trusting the old key for a period of time. The old key can then be removed at a later date, once all
+clients using tokens signed by the old key have received new tokens.
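+
+The two-phase rollover described above can be sketched as a verifier that accepts a signature from any key in a
+trusted set. This is a conceptual Go sketch, not K3s or Kubernetes code: it signs a raw claims payload with ECDSA
+P-256 instead of building a real JWT, and `keySet`, `mustSign`, and `verifyWithAny` are hypothetical names. In
+upstream Kubernetes, the corresponding settings are the kube-apiserver `--service-account-key-file` flag (which can
+be repeated to trust several public keys) and the signing key set by kube-apiserver
+`--service-account-signing-key-file` and kube-controller-manager `--service-account-private-key-file`.
+
+```go
+package main
+
+import (
+	"crypto/ecdsa"
+	"crypto/elliptic"
+	"crypto/rand"
+	"crypto/sha256"
+	"fmt"
+)
+
+// keySet is a hypothetical stand-in for the public keys an apiserver trusts
+// when verifying ServiceAccount tokens.
+type keySet []*ecdsa.PublicKey
+
+func mustKey() *ecdsa.PrivateKey {
+	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
+	if err != nil {
+		panic(err)
+	}
+	return key
+}
+
+// mustSign signs the SHA-256 digest of payload (a stand-in for token claims).
+func mustSign(key *ecdsa.PrivateKey, payload []byte) []byte {
+	digest := sha256.Sum256(payload)
+	sig, err := ecdsa.SignASN1(rand.Reader, key, digest[:])
+	if err != nil {
+		panic(err)
+	}
+	return sig
+}
+
+// verifyWithAny accepts the signature if any trusted key validates it.
+func (ks keySet) verifyWithAny(payload, sig []byte) bool {
+	digest := sha256.Sum256(payload)
+	for _, pub := range ks {
+		if ecdsa.VerifyASN1(pub, digest[:], sig) {
+			return true
+		}
+	}
+	return false
+}
+
+func main() {
+	oldKey, newKey := mustKey(), mustKey()
+	claims := []byte(`{"sub":"system:serviceaccount:default:example"}`)
+	oldSig := mustSign(oldKey, claims) // issued before the rotation
+
+	// Phase 1: sign with the new key, but keep trusting both public keys.
+	overlap := keySet{&newKey.PublicKey, &oldKey.PublicKey}
+	newSig := mustSign(newKey, claims)
+	fmt.Println("overlap, old signature accepted:", overlap.verifyWithAny(claims, oldSig))
+	fmt.Println("overlap, new signature accepted:", overlap.verifyWithAny(claims, newSig))
+
+	// Phase 2: once every client holds a new token, stop trusting the old key.
+	final := keySet{&newKey.PublicKey}
+	fmt.Println("final, old signature accepted:", final.verifyWithAny(claims, oldSig))
+	fmt.Println("final, new signature accepted:", final.verifyWithAny(claims, newSig))
+}
+```
+
+Run as-is, it prints `true`, `true`, `false`, `true`: a signature made with the old key is accepted only while
+the old public key remains in the trusted set.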
+
+## Decision
+
+* K3s will allow the use of CA certificates signed by an arbitrary set of external root/intermediate CAs.
+* K3s will allow for nondisruptive renewal or replacement of the CA certificates and keys, if the cluster was
+  originally started using user-provided certificates signed by an external CA.
+* K3s will allow for disruptive renewal or replacement of cluster CA certificates and keys, if the cluster was
+  originally started with autogenerated self-signed CAs.
+* K3s will provide example tooling to allow users to generate cluster CA certificates and keys prior to initial
+  cluster startup, and provide tooling and process documentation to update the bootstrap data and prepare agents
+  to trust the new certificates (if necessary).
+
+## Consequences
+
+This will require additional documentation, CLI subcommands, and QA work to validate the process steps.
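+
+As a rough illustration of the first two decision items, and of the root-pinning alternative discussed under
+Server CA Pinning, the Go sketch below uses only the standard library (`crypto/x509`, `encoding/pem`) to build a
+chain in which a hypothetical external root signs an intermediate, which in turn signs the cluster server CA. It
+then rotates the intermediate and server CA under the same root. This is not the tooling the ADR commits to: the
+helper names (`newCA`, `pemBundle`, `rootDERHash`), the ECDSA P-256 keys, and the one-year lifetimes are
+assumptions, and hashing the root's DER encoding is the alternative proposed in the Context section, not how the
+current `K10` token hash is computed.
+
+```go
+package main
+
+import (
+	"bytes"
+	"crypto/ecdsa"
+	"crypto/elliptic"
+	"crypto/rand"
+	"crypto/sha256"
+	"crypto/x509"
+	"crypto/x509/pkix"
+	"encoding/pem"
+	"fmt"
+	"math/big"
+	"time"
+)
+
+// ca pairs a CA certificate with the private key used to sign under it.
+type ca struct {
+	cert *x509.Certificate
+	key  *ecdsa.PrivateKey
+}
+
+var serial int64 // sequential serials keep the sketch short; real tooling should randomize
+
+func check(err error) {
+	if err != nil {
+		panic(err)
+	}
+}
+
+// newCA issues a one-year CA certificate for cn, signed by parent (self-signed if parent is nil).
+func newCA(cn string, parent *ca) *ca {
+	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
+	check(err)
+	serial++
+	tmpl := &x509.Certificate{
+		SerialNumber:          big.NewInt(serial),
+		Subject:               pkix.Name{CommonName: cn},
+		NotBefore:             time.Now().Add(-time.Hour),
+		NotAfter:              time.Now().AddDate(1, 0, 0),
+		BasicConstraintsValid: true,
+		IsCA:                  true,
+		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
+	}
+	signer := &ca{cert: tmpl, key: key}
+	if parent != nil {
+		signer = parent
+	}
+	der, err := x509.CreateCertificate(rand.Reader, tmpl, signer.cert, &key.PublicKey, signer.key)
+	check(err)
+	cert, err := x509.ParseCertificate(der)
+	check(err)
+	return &ca{cert: cert, key: key}
+}
+
+// pemBundle concatenates certificates as PEM, like an on-disk CA bundle file.
+func pemBundle(certs ...*x509.Certificate) []byte {
+	var buf bytes.Buffer
+	for _, c := range certs {
+		check(pem.Encode(&buf, &pem.Block{Type: "CERTIFICATE", Bytes: c.Raw}))
+	}
+	return buf.Bytes()
+}
+
+// rootDERHash returns the SHA-256 of the DER encoding of the first
+// self-signed certificate in a PEM bundle (the zero value if there is none).
+func rootDERHash(bundle []byte) [32]byte {
+	for block, rest := pem.Decode(bundle); block != nil; block, rest = pem.Decode(rest) {
+		c, err := x509.ParseCertificate(block.Bytes)
+		check(err)
+		if bytes.Equal(c.RawIssuer, c.RawSubject) {
+			return sha256.Sum256(c.Raw)
+		}
+	}
+	return [32]byte{}
+}
+
+func main() {
+	root := newCA("example-org-root", nil) // hypothetical external (organizational) root
+	inter1 := newCA("example-org-intermediate-1", root)
+	server1 := newCA("k3s-server-ca-1", inter1)
+	bundle1 := pemBundle(server1.cert, inter1.cert, root.cert)
+
+	// Rotate both the intermediate and the cluster server CA under the same root.
+	inter2 := newCA("example-org-intermediate-2", root)
+	server2 := newCA("k3s-server-ca-2", inter2)
+	bundle2 := pemBundle(server2.cert, inter2.cert, root.cert)
+
+	// The new cluster server CA still chains to the unchanged external root.
+	roots, inters := x509.NewCertPool(), x509.NewCertPool()
+	roots.AddCert(root.cert)
+	inters.AddCert(inter2.cert)
+	_, err := server2.cert.Verify(x509.VerifyOptions{Roots: roots, Intermediates: inters})
+	fmt.Println("new server CA chains to root:", err == nil)
+	fmt.Println("PEM bundle hash changed:", sha256.Sum256(bundle1) != sha256.Sum256(bundle2))
+	fmt.Println("root DER hash unchanged:", rootDERHash(bundle1) == rootDERHash(bundle2))
+}
+```
+
+Running it should report that the new server CA chains to the root, that the hash of the PEM bundle changed, and
+that the root DER hash did not. Under the current PEM-based pinning, the same rotation would change the bundle
+hash and therefore stop matching every existing full `K10` token.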