mirror of https://github.com/hashicorp/consul
0ac8ae6c3b
* Fix xDS deadlock due to syncLoop termination.

This fixes an issue where agentless xDS streams can deadlock permanently until a server is restarted. When this issue occurs, no new proxies are able to successfully connect to the server.

Effectively, the trigger for this deadlock stems from the following return statement:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L199-L202

When this happens, the entire `syncLoop()` terminates and stops consuming from the following channel:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L182-L192

Which results in the `ConfigSource.cleanup()` function never receiving a response and holding a mutex indefinitely:
https://github.com/hashicorp/consul/blob/v1.18.0/agent/proxycfg-sources/catalog/config_source.go#L241-L247

Because this mutex is shared, it effectively deadlocks the server's ability to process new xDS streams.

----

The fix to this issue involves removing the `chan chan struct{}` used like an RPC-over-channels pattern and replacing it with two distinct channels:

+ `stopSyncLoopCh` - indicates that the `syncLoop()` should terminate soon.
+ `syncLoopDoneCh` - indicates that the `syncLoop()` has terminated.

Splitting these two concepts out and deferring a `close(syncLoopDoneCh)` in the `syncLoop()` function ensures that the deadlock above should no longer occur.

We also now evict xDS connections of all proxies for the corresponding `syncLoop()` whenever it encounters an irrecoverable error. This is done by hoisting the new `syncLoopDoneCh` upwards so that it's visible to the xDS delta processing. Prior to this fix, the behavior was to simply orphan them so they would never receive catalog-registration or service-defaults updates.

* Add changelog.

||
---|---|---|
internal/watch | ||
api_gateway.go | ||
api_gateway_ce.go | ||
config_snapshot_glue.go | ||
config_snapshot_glue_test.go | ||
connect_proxy.go | ||
data_sources.go | ||
data_sources_ce.go | ||
deep-copy.sh | ||
ingress_gateway.go | ||
manager.go | ||
manager_test.go | ||
mesh_gateway.go | ||
mesh_gateway_ce.go | ||
naming.go | ||
naming_ce.go | ||
naming_test.go | ||
proxycfg.deepcopy.go | ||
proxycfg.go | ||
snapshot.go | ||
snapshot_test.go | ||
state.go | ||
state_ce_test.go | ||
state_test.go | ||
terminating_gateway.go | ||
testing.go | ||
testing_api_gateway.go | ||
testing_ce.go | ||
testing_connect_proxy.go | ||
testing_ingress_gateway.go | ||
testing_mesh_gateway.go | ||
testing_peering.go | ||
testing_terminating_gateway.go | ||
testing_tproxy.go | ||
testing_upstreams.go | ||
testing_upstreams_ce.go | ||
upstreams.go |
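
The two-channel shutdown pattern described in the commit message above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Consul code: the `syncer` type, `newSyncer`, the `events` channel, and the `cleanupReturns` helper are hypothetical names introduced here. Only the channel names `stopSyncLoopCh` and `syncLoopDoneCh` and the deferred `close` come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// syncer is a hypothetical stand-in for the state that owns one sync loop.
type syncer struct {
	stopSyncLoopCh chan struct{} // closed to signal that syncLoop should terminate soon
	syncLoopDoneCh chan struct{} // closed by syncLoop once it has terminated
	stopOnce       sync.Once     // guards against closing stopSyncLoopCh twice
}

func newSyncer() *syncer {
	return &syncer{
		stopSyncLoopCh: make(chan struct{}),
		syncLoopDoneCh: make(chan struct{}),
	}
}

// syncLoop consumes events until it is asked to stop or receives an error.
func (s *syncer) syncLoop(events <-chan error) {
	// The deferred close runs on every return path, including the error
	// path, so anything waiting on syncLoopDoneCh is always released.
	defer close(s.syncLoopDoneCh)

	for {
		select {
		case <-s.stopSyncLoopCh:
			return
		case err := <-events:
			if err != nil {
				return // irrecoverable error: the loop exits early
			}
		}
	}
}

// cleanup requests termination and waits for the loop to finish. If the loop
// has already exited, the receive on the closed channel returns immediately.
func (s *syncer) cleanup() {
	s.stopOnce.Do(func() { close(s.stopSyncLoopCh) })
	<-s.syncLoopDoneCh
}

// cleanupReturns runs one scenario and reports whether cleanup() completed
// within one second. With failFirst set, the loop exits on an error before
// cleanup() is called, which mirrors the situation described in the commit.
func cleanupReturns(failFirst bool) bool {
	s := newSyncer()
	events := make(chan error)
	go s.syncLoop(events)

	if failFirst {
		events <- errors.New("irrecoverable")
		<-s.syncLoopDoneCh // other observers can also see that the loop exited
	}

	returned := make(chan struct{})
	go func() {
		s.cleanup()
		close(returned)
	}()

	select {
	case <-returned:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("cleanup after loop error:", cleanupReturns(true))
	fmt.Println("cleanup of healthy loop:", cleanupReturns(false))
}
```

Because `syncLoopDoneCh` is a plain closed-on-exit channel rather than a reply sent by a live loop, any number of goroutines can wait on it. That property is what would let a stream handler notice the loop's exit and evict its connection, as the commit message describes.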