Tests for scaling down based on an external metric are flaky. I think this
is because they:
- Start with 2 replicas,
- Export metric value == 1/2 target,
- Expect scale down to 1.
Since the expected recommendation is exactly 1, the test can flake: any
noise in the exported metric pushes the recommendation above 1, and with
scale-down stabilization any recommendation higher than 1 will persist.
Change the exported value of the metric so the raw recommendation is lower
than 1. This should make those tests less flaky.
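
For reference, a minimal sketch of the scaling arithmetic behind this (the
target value of 100 is hypothetical; the real controller also applies
tolerance and a stabilization window, both omitted here):

    package main

    import (
        "fmt"
        "math"
    )

    // desiredReplicas mirrors the core HPA scaling rule,
    // desired = ceil(currentReplicas * currentValue / targetValue),
    // with tolerance and scale-down stabilization deliberately left out.
    func desiredReplicas(currentReplicas int, currentValue, targetValue float64) int {
        return int(math.Ceil(float64(currentReplicas) * currentValue / targetValue))
    }

    func main() {
        // Old setup: metric == 1/2 target with 2 replicas sits exactly on
        // the boundary, so any sample above the target ratio yields 2.
        fmt.Println(desiredReplicas(2, 50, 100)) // 1, but only if the value is exact
        fmt.Println(desiredReplicas(2, 51, 100)) // 2 -- a slightly high sample flakes

        // New setup: export a value whose raw recommendation is well below
        // 1, so the ceiling still lands on 1 despite measurement noise.
        fmt.Println(desiredReplicas(2, 30, 100)) // 1, with a comfortable margin
    }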
- Scale down based on a custom metric was flaking. Increase the target
value of the metric.
- Scale down based on CPU was flaking during stabilization. Increase the
tolerance used during stabilization (the flakes were caused by the resource
consumer using more CPU than requested).
Rename the file to note that these are Stackdriver tests, in anticipation
of tests that use a fake custom metrics apiserver. Refactor the tests to be
more structured.