mirror of https://github.com/k3s-io/k3s
Automatic merge from submit-queue (batch tested with PRs 61284, 61119, 61201). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Prevent garbage collector from attempting to sync with 0 resources

**What this PR does / why we need it**:

As of #55259, we enabled `garbagecollector.GetDeletableResources` to return partial discovery results, including an empty set of discovery results. This had the unintended consequence of allowing the garbage collector to enter a blocked state that can only be fixed by restarting.

From [this comment](https://github.com/kubernetes/kubernetes/issues/60037#issuecomment-372801088):

> 1. The `Sync` function periodically calls `GetDeletableResources`.
>
> 2. According to the comment above `GetDeletableResources`, "All discovery errors are considered temporary. Upon encountering any error, GetDeletableResources will log and return any discovered resources it was able to process (which may be none)." So an error in discovery stops the discovery client from discovering resources in the cluster, but instead of failing and returning an error, it simply logs the error as `garbagecollector.go:601] failed to discover preferred resources: %vthe server was unable to return a response in the time allotted, but may still be processing the request` and returns an empty list of resources.
>
> 3. The `Sync` function, upon receiving an empty resource list from discovery, detects that the resources have changed and calls `resyncMonitors`, which calls `dependencyGraphBuilder.syncMonitors` with `map[]` as the argument, as shown in the log line `garbagecollector.go:189] syncing garbage collector with updated resources from discovery: map[]`. This sets the list of monitors to an empty list, because it thinks there are no resources to monitor.
>
> 4. Lastly, the `Sync` function calls `controller.WaitForCacheSync`, which calls `cache.WaitForCacheSync`, which continually retries the `garbagecollector.IsSynced` function until it returns true. It will always return false, however, because `len(gb.monitors)` is 0.

This PR prevents that specific race condition from arising.

**Which issue(s) this PR fixes**: Fixes #60037

**Release note**:

```release-note
Fix bug allowing garbage collector to enter a broken state that could only be fixed by restarting the controller-manager.
```
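The failure sequence in steps 1 to 4, and a guard that avoids it, can be sketched as a small self-contained Go program. This is a hypothetical model, not the actual `garbagecollector.go` code: `resourceSet`, `graphBuilder`, `syncOnce`, and `sameResources` are illustrative names, every new monitor is treated as immediately synced, and the guard shown (skip the sync pass when discovery reports no resources) is an assumption about how such a fix can look.

```go
package main

import "fmt"

// resourceSet is a hypothetical stand-in for the set of deletable
// resources reported by discovery, keyed by "group/version/resource".
type resourceSet map[string]struct{}

// graphBuilder is a hypothetical stand-in for the dependency graph builder.
// It only records, per monitored resource, whether that monitor has synced.
type graphBuilder struct {
	monitors map[string]bool
}

// syncMonitors replaces the monitor set with one monitor per resource.
// For simplicity, every new monitor is treated as already synced.
func (gb *graphBuilder) syncMonitors(resources resourceSet) {
	gb.monitors = make(map[string]bool, len(resources))
	for r := range resources {
		gb.monitors[r] = true
	}
}

// isSynced models step 4: with zero monitors it always reports false,
// so a caller waiting for it to become true would wait forever.
func (gb *graphBuilder) isSynced() bool {
	if len(gb.monitors) == 0 {
		return false
	}
	for _, synced := range gb.monitors {
		if !synced {
			return false
		}
	}
	return true
}

// sameResources reports whether two resource sets contain the same keys.
func sameResources(a, b resourceSet) bool {
	if len(a) != len(b) {
		return false
	}
	for r := range a {
		if _, ok := b[r]; !ok {
			return false
		}
	}
	return true
}

// syncOnce models one pass of a periodic Sync loop. discover may return a
// partial or empty set when discovery fails. It returns the resource set
// now in effect and whether the monitors were resynced.
func syncOnce(gb *graphBuilder, old resourceSet, discover func() resourceSet) (resourceSet, bool) {
	newResources := discover()
	// Guard: treat an empty discovery result as a transient failure and keep
	// the existing monitors instead of replacing them with an empty set.
	if len(newResources) == 0 {
		return old, false
	}
	if sameResources(old, newResources) {
		return old, false
	}
	gb.syncMonitors(newResources)
	return newResources, true
}

func main() {
	gb := &graphBuilder{}
	healthy := func() resourceSet {
		return resourceSet{"apps/v1/deployments": {}, "v1/pods": {}}
	}
	failing := func() resourceSet { return resourceSet{} } // e.g. discovery timed out

	current, resynced := syncOnce(gb, resourceSet{}, healthy)
	fmt.Println(resynced, len(current), len(gb.monitors), gb.isSynced()) // true 2 2 true

	current, resynced = syncOnce(gb, current, failing)
	fmt.Println(resynced, len(current), len(gb.monitors), gb.isSynced()) // false 2 2 true
}
```

Removing the `len(newResources) == 0` guard reproduces the behaviour described in the quoted comment: the second pass sees a changed (empty) set, replaces the monitors with none, and `isSynced` then returns false on every call.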
- bootstrap
- certificates
- cloud
- clusterroleaggregation
- cronjob
- daemon
- deployment
- disruption
- endpoint
- garbagecollector
- history
- job
- namespace
- nodeipam
- nodelifecycle
- podautoscaler
- podgc
- replicaset
- replication
- resourcequota
- route
- service
- serviceaccount
- statefulset
- testutil
- ttl
- util/node
- volume
- .import-restrictions
- BUILD
- OWNERS
- client_builder.go
- controller_ref_manager.go
- controller_ref_manager_test.go
- controller_utils.go
- controller_utils_test.go
- doc.go
- lookup_cache.go