mirror of https://github.com/hashicorp/consul
0ac8ae6c3b
* Fix xDS deadlock due to syncLoop termination.

This fixes an issue where agentless xDS streams can deadlock permanently until a server is restarted. When this issue occurs, no new proxies are able to successfully connect to the server.

Effectively, the trigger for this deadlock stems from the following return statement:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L199-L202

When this happens, the entire `syncLoop()` terminates and stops consuming from the following channel:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L182-L192

Which results in the `ConfigSource.cleanup()` function never receiving a response and holding a mutex indefinitely:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L241-L247

Because this mutex is shared, it effectively deadlocks the server's ability to process new xDS streams.

----

The fix to this issue involves removing the `chan chan struct{}` used like an RPC-over-channels pattern and replacing it with two distinct channels:

+ `stopSyncLoopCh` - indicates that the `syncLoop()` should terminate soon.
+ `syncLoopDoneCh` - indicates that the `syncLoop()` has terminated.

Splitting these two concepts out and deferring a `close(syncLoopDoneCh)` in the `syncLoop()` function ensures that the deadlock above should no longer occur.

We also now evict xDS connections of all proxies for the corresponding `syncLoop()` whenever it encounters an irrecoverable error. This is done by hoisting the new `syncLoopDoneCh` upwards so that it's visible to the xDS delta processing. Prior to this fix, the behavior was to simply orphan them so they would never receive catalog-registration or service-defaults updates.

* Add changelog.

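
The two-channel pattern described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual Consul code: only the channel names `stopSyncLoopCh` and `syncLoopDoneCh` come from the commit message, while the `syncLoop` struct, its `run` and `stop` methods, the `work` channel, and the `stopReturnsPromptly` helper are hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errIrrecoverable is a hypothetical error standing in for whatever
// failure makes the real sync loop return early.
var errIrrecoverable = errors.New("irrecoverable sync error")

// syncLoop is a simplified, hypothetical stand-in for the per-proxy loop.
type syncLoop struct {
	stopSyncLoopCh chan struct{} // closed by the owner: the loop should terminate soon
	syncLoopDoneCh chan struct{} // closed by the loop: the loop has terminated
	stopOnce       sync.Once     // makes stop safe to call more than once
}

func newSyncLoop() *syncLoop {
	return &syncLoop{
		stopSyncLoopCh: make(chan struct{}),
		syncLoopDoneCh: make(chan struct{}),
	}
}

// run consumes work until it is asked to stop or hits an error.
func (s *syncLoop) run(work <-chan error) {
	// The deferred close runs on every exit path, including the early
	// return on error, so anyone waiting on syncLoopDoneCh is released.
	defer close(s.syncLoopDoneCh)
	for {
		select {
		case <-s.stopSyncLoopCh:
			return
		case err := <-work:
			if err != nil {
				return
			}
		}
	}
}

// stop requests termination and waits until the loop has exited.
// It does not block forever even if the loop already exited on its own.
func (s *syncLoop) stop() {
	s.stopOnce.Do(func() { close(s.stopSyncLoopCh) })
	<-s.syncLoopDoneCh
}

// stopReturnsPromptly reports whether stop returns within one second.
func (s *syncLoop) stopReturnsPromptly() bool {
	finished := make(chan struct{})
	go func() {
		s.stop()
		close(finished)
	}()
	select {
	case <-finished:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	s := newSyncLoop()
	work := make(chan error)
	go s.run(work)

	// Simulate the failure case: the loop receives an error and exits early.
	work <- errIrrecoverable

	fmt.Println("stop returned promptly:", s.stopReturnsPromptly())
}
```

Because the loop signals completion by closing a channel rather than by replying on a per-request channel, a caller that arrives after the loop has died still gets an immediate answer. Any other goroutine can also select on `syncLoopDoneCh` to learn that the loop is gone, which is the kind of visibility the commit message describes for evicting xDS connections.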
||
---|---|---|
accesslogs | ||
config | ||
configfetcher | ||
extensionruntime | ||
naming | ||
platform | ||
proxystateconverter | ||
response | ||
testcommon | ||
testdata | ||
validateupstream-test | ||
clusters.go | ||
clusters_test.go | ||
delta.go | ||
delta_envoy_extender_ce_test.go | ||
delta_envoy_extender_test.go | ||
delta_test.go | ||
endpoints.go | ||
endpoints_test.go | ||
failover_policy.go | ||
failover_policy_ce.go | ||
golden_test.go | ||
gw_per_route_filters_ce.go | ||
jwt_authn.go | ||
jwt_authn_ce.go | ||
jwt_authn_test.go | ||
listeners.go | ||
listeners_apigateway.go | ||
listeners_ingress.go | ||
listeners_test.go | ||
locality_policy.go | ||
locality_policy_ce.go | ||
protocol_trace.go | ||
rbac.go | ||
rbac_test.go | ||
resources.go | ||
resources_ce_test.go | ||
resources_test.go | ||
routes.go | ||
routes_test.go | ||
secrets.go | ||
server.go | ||
server_ce.go | ||
testing.go | ||
xds.go | ||
xds_protocol_helpers_test.go | ||
z_xds_packages.go | ||
z_xds_packages_test.go |