consul/test
R.B. Boyer 20caa4f744
test: for envoy integration tests, wait until 's2' is healthy in consul before interrogating envoy (#6108)
When the envoy healthy panic threshold was explicitly disabled as part
of L7 traffic management it changed how envoy decided to load balance to
endpoints in a cluster. This only matters when envoy is in "panic mode"
aka "when you have a bunch of unhealthy endpoints". Panic mode sends
traffic to unhealthy instances in certain circumstances.

Note: Prior to explicitly disabling the healthy panic threshold, the
default value was 50%.

What was happening is that the test harness was bringing up consul, the
sidecars, and the service instances all at once, and sometimes the
proxies wouldn't have time to be checked by consul and labeled as
'passing' in the catalog before a round of EDS happened.

The xDS server in consul effectively queries /v1/health/connect/s2 and
gets 1 result, but that one result has a 'critical' check so the xDS
server sends back that endpoint labeled as UNHEALTHY.

Envoy sees that 100% of the endpoints in the cluster are unhealthy and
would enter panic mode and still send traffic to s2. This is why the
test suites PRIOR to disabling the healthy panic threshold worked. They
were _incorrectly_ passing.

When the healthy panic threshold is disabled, envoy never enters panic
mode in this situation and thus the cluster has zero healthy endpoints,
so load balancing goes nowhere and the tests fail.
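
The difference between the two threshold settings can be illustrated
with a small sketch. This is a simplified model and not envoy's actual
implementation; the function name in_panic_mode and its arguments are
hypothetical. It assumes panic mode applies when the percentage of
healthy endpoints falls below the threshold, so a threshold of 0 can
never trigger it.

```shell
#!/usr/bin/env bash

# Simplified model, NOT envoy's implementation (hypothetical helper):
# panic mode applies when the percentage of healthy endpoints is below
# the panic threshold.
# Arguments: healthy endpoint count, total endpoint count, threshold in percent.
in_panic_mode() {
  local healthy="$1" total="$2" threshold="$3"
  # Integer comparison equivalent to: healthy / total * 100 < threshold
  (( total > 0 && healthy * 100 < threshold * total ))
}

# One endpoint (s2), marked UNHEALTHY because its check is 'critical'.
if in_panic_mode 0 1 50; then
  echo "threshold 50: panic mode, traffic still reaches s2"
fi
if ! in_panic_mode 0 1 0; then
  echo "threshold 0: no panic mode, no healthy endpoint to route to"
fi
```

With the old default of 50 the single unhealthy endpoint still receives
traffic, which is why the racy tests used to pass.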

Why does this only affect the test suites for envoy 1.8.0? My guess is
that https://github.com/envoyproxy/envoy/pull/4442 was merged into the
1.9.x series and somehow that plays a role.

This PR modifies the bats scripts to explicitly wait until the upstream
sidecar is healthy as measured by /v1/health/connect/s2?passing BEFORE
trying to interrogate envoy which should make the tests less racy.
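
A wait of this kind might look roughly like the sketch below. It is not
the helper added by this PR: the function names fetch_passing and
wait_for_passing, the retry count, the delay, and the default agent
address 127.0.0.1:8500 are all assumptions made for illustration. The
only detail taken from the description above is the endpoint,
/v1/health/connect/s2?passing, which returns an empty JSON array until
at least one instance has all checks passing.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fetch only the instances of a Connect service whose
# checks are all passing. The default agent address is an assumption.
fetch_passing() {
  local service="$1"
  curl -s -f "http://${CONSUL_HTTP_ADDR:-127.0.0.1:8500}/v1/health/connect/${service}?passing"
}

# Hypothetical helper: poll until at least one passing instance is returned,
# or give up after max_tries attempts (defaults are assumptions).
wait_for_passing() {
  local service="$1" max_tries="${2:-30}" delay="${3:-1}"
  local body i
  for ((i = 1; i <= max_tries; i++)); do
    # Strip whitespace so an empty result compares equal to "[]".
    body="$(fetch_passing "$service" | tr -d '[:space:]')" || body=""
    # An empty body (request failed) or an empty JSON array (nothing
    # passing yet) both mean we should retry.
    if [[ -n "$body" && "$body" != "[]" ]]; then
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for ${service} to be passing" >&2
  return 1
}
```

A test would call something like `wait_for_passing s2` before its first
query against envoy, so the EDS update it observes already carries a
healthy endpoint.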
2019-07-10 15:58:25 -05:00
..
bin test: log exit code in cluster.bash 2017-06-08 14:06:10 +02:00
ca Update test certificates that expire this year to be way in the future 2018-05-12 10:15:45 +01:00
ca_path Add tls client options to api/cli 2017-04-14 13:37:29 -07:00
client_certs Adds enable_agent_tls_for_checks configuration option which allows (#3661) 2017-11-07 18:22:09 -08:00
command/merge Add utility types to enable checking for unset flags 2017-02-07 20:14:41 -05:00
hostname Update test certificates that expire this year to be way in the future 2018-05-12 10:15:45 +01:00
integration/connect/envoy test: for envoy integration tests, wait until 's2' is healthy in consul before interrogating envoy (#6108) 2019-07-10 15:58:25 -05:00
key tls: auto_encrypt enables automatic RPC cert provisioning for consul clients (#5597) 2019-06-27 22:22:07 +02:00
snapshot snapshot: read meta.json correctly. (#5193) 2019-01-08 17:06:28 +01:00
notes.txt Adding testing certificates 2014-04-07 15:07:00 -07:00