Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
track/close kubelet->API connections on heartbeat failure
xref #48638
xref https://github.com/kubernetes-incubator/kube-aws/issues/598
we're already typically tracking kubelet -> API connections and have the ability to force close them as part of client cert rotation. if we do that tracking unconditionally, we gain the ability to also force close connections on heartbeat failure as well. it's a big hammer (means reestablishing pod watches, etc), but so is having all your pods evicted because you didn't heartbeat.
this intentionally does minimal refactoring/extraction of the cert connection tracking transport in case we want to backport this
* first commit unconditionally sets up the connection-tracking dialer, and moves all the cert management logic inside an if-block that gets skipped if no certificate manager is provided (view with whitespace ignored to see what actually changed)
* second commit plumbs the connection-closing function to the heartbeat loop and calls it on repeated failures
follow-ups:
* consider backporting this to 1.10, 1.9, 1.8
* refactor the connection managing dialer to not be so tightly bound to the client certificate management
/sig node
/sig api-machinery
```release-note
kubelet: fix hangs in updating Node status after network interruptions/changes between the kubelet and API server
```
The kubelet uses two different locations to store certificates on
initial bootstrap and then on subsequent rotation:
* bootstrap: certDir/kubelet-client.(crt|key)
* rotation: certDir/kubelet-client-(DATE|current).pem
Bootstrap also creates an initial node.kubeconfig that points to the
certs. Unfortunately, with short rotation the node.kubeconfig then
becomes out of date because it points to the initial cert/key, not the
rotated cert key.
Alter the bootstrap code to store client certs exactly as if they would
be rotated (using the same cert Store code), and reference the PEM file
containing cert/key from node.kubeconfig, which is supported by kubectl
and other Go tooling. This ensures that the node.kubeconfig continues to
be valid past the first expiration.
Automatic merge from submit-queue (batch tested with PRs 58716, 59977, 59316, 59884, 60117). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.
Cap how long the kubelet waits when it has no client cert
If we go a certain amount of time without being able to create a client
cert and we have no current client cert from the store, exit. This
prevents a corrupted local copy of the cert from leaving the Kubelet in a
zombie state forever. Exiting allows a config loop outside the Kubelet
to clean up the file or the bootstrap client cert to get another client
cert.
Five minutes is a totally arbitary timeout, judged to give enough time for really slow static pods to boot.
@mikedanese
```release-note
Set an upper bound (5 minutes) on how long the Kubelet will wait before exiting when the client cert from disk is missing or invalid. This prevents the Kubelet from waiting forever without attempting to bootstrap a new client credentials.
```
If we go a certain amount of time without being able to create a client
cert and we have no current client cert from the store, exit. This
prevents a corrupted local copy of the cert from leaving the Kubelet in a
zombie state forever. Exiting allows a config loop outside the Kubelet
to clean up the file or the bootstrap client cert to get another client
cert.
Prevent a Kubelet from shutting down when the server isn't responding to
us but we cannot get a new certificate. This allows a cluster to coast
if the master is unresponsive or a node is partitioned and their client
cert expires.
The client cert manager uses the most recent cert to request new
certificates. If that certificate is expired, it will be unable to
complete new CSR requests. This commit alters the manager to force
process exit if no further client cert rotation is possible, which
is expected to trigger a restart of the kubelet and either a
re-bootstrap from the bootstrap kubeconfig or a re-read of the
current disk state (assuming that some other agent is managing the
bootstrap configuration).
This prevents the Kubelet from wedging in a state where it cannot make
API calls.
Ensures that in a crash loop state we can make forward progress by
generating a new key and hence new CSR. If we do not delete the key, an
expired CSR may block startup.
Also more aggressively delete a bad cert path
Before the bootstrap client is used, check a number of conditions that
ensure it can be safely loaded by the server. If any of those conditions
are invalid, re-bootstrap the node. This is primarily to force
bootstrapping without human intervention when a certificate is expired,
but also handles partial file corruption.
Automatic merge from submit-queue
delete ineffectual assginment
**What this PR does / why we need it**:
delete ineffectual assignment
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
Automatic merge from submit-queue (batch tested with PRs 49237, 49656, 49980, 49841, 49899)
certificate manager: close existing client conns once cert rotates
After the kubelet rotates its client cert, it will keep connections to the API server open indefinitely, causing it to use its old credentials instead of the new certs. Because the API server authenticates client certs at the time of the request, and not the handshake, this could cause the kubelet to start hitting auth failures even if it rotated its certificate to a new, valid one.
When the kubelet rotates its cert, close down existing connections to force a new TLS handshake.
Ref https://github.com/kubernetes/features/issues/266
Updates https://github.com/kubernetes-incubator/bootkube/pull/663
```release-note
After a kubelet rotates its client cert, it now closes its connections to the API server to force a handshake using the new cert. Previously, the kubelet could keep its existing connection open, even if the cert used for that connection was expired and rejected by the API server.
```
/cc @kubernetes/sig-auth-bugs
/assign @jcbsmpsn @mikedanese
After the kubelet rotates its client cert, it will keep connections
to the API server open indefinitely, causing it to use its old
credentials instead of the new certs
When the kubelet rotates its cert, close down existing connections
to force a new TLS handshake.
Automatic merge from submit-queue
Don't rerun certificate manager tests 1000 times.
**What this PR does / why we need it**:
Running every testcase 1000 times needlessly bloats the logs.
**Release note**:
```release-note
NONE
```
Changes the kubelet so it bootstraps off the cert/key specified in the
config file and uses those to request new cert/key pairs from the
Certificate Signing Request API, as well as rotating client certificates
when they approach expiration.
Automatic merge from submit-queue (batch tested with PRs 46635, 45619, 46637, 45059, 46415)
Certificate rotation for kubelet server certs.
Replaces the current kubelet server side self signed certs with certs signed by
the Certificate Request Signing API on the API server. Also renews expiring
kubelet server certs as expiration approaches.
Two Points:
1. With `--feature-gates=RotateKubeletServerCertificate=true` set, the kubelet will
request a certificate during the boot cycle and pause waiting for the request to
be satisfied.
2. In order to have the kubelet's certificate signing request auto approved,
`--insecure-experimental-approve-all-kubelet-csrs-for-group=` must be set on
the cluster controller manager. There is an improved mechanism for auto
approval [proposed](https://github.com/kubernetes/kubernetes/issues/45030).
**Release note**:
```release-note
With `--feature-gates=RotateKubeletServerCertificate=true` set, the kubelet will
request a server certificate from the API server during the boot cycle and pause
waiting for the request to be satisfied. It will continually refresh the certificate as
the certificates expiration approaches.
```
Automatic merge from submit-queue
add myself and liggitt to pkg/kubelet/certificats OWNERs
For as long a kubelet is using the internal client, this certificate
manager is bound to the kubelet. Once kubelet has moved to client-go we
plan to extract this library to be general purpose. In the meantime,
liggitt and I should handle reviews of this code.
@liggitt @timstclair
For as long a kubelet is using the internal client, this certificate
manager is bound to the kubelet. Once kubelet has moved to client-go we
plan to extract this library to be general purpose. In the meantime,
liggitt and I should handle reviews of this code.
Replaces the current kubelet server side self signed certs with certs
signed by the Certificate Request Signing API on the API server. Also
renews expiring kubelet server certs as expiration approaches.