k3s/plugin
Kubernetes Submit Queue bc7ccfe93b Merge pull request #50106 from julia-stripe/improve-scheduler-error-handling
Automatic merge from submit-queue

Retry scheduling pods after errors more consistently in scheduler

**What this PR does / why we need it**:

This fixes 2 places in the scheduler where pods can get stuck in Pending forever.  In both these places, errors happen and `sched.config.Error` is not called afterwards. This is a problem because `sched.config.Error` is responsible for requeuing pods to retry scheduling when there are issues (see [here](2540b333b2/plugin/pkg/scheduler/factory/factory.go (L958))), so if we don't call `sched.config.Error` then the pod will never get scheduled (unless the scheduler is restarted).

One of these (where it returns when `ForgetPod` fails instead of continuing and reporting an error) is a regression from [this refactor](https://github.com/kubernetes/kubernetes/commit/ecb962e6585#diff-67f2b61521299ca8d8687b0933bbfb19L234), and with the [old behavior](80f26fa8a8/plugin/pkg/scheduler/scheduler.go (L233-L237)) the error was reported correctly. As far as I can tell changing the error handling in that refactor wasn't intentional.

When AssumePod fails there's never been an error reported but I think adding this will help the scheduler recover when something goes wrong instead of letting pods possibly never get scheduled.

This will help prevent issues like https://github.com/kubernetes/kubernetes/issues/49314 in the future.

**Release note**:

```release-note
Fix incorrect retry logic in scheduler
```
2017-08-07 01:35:17 -07:00
..
cmd/kube-scheduler Merge pull request #47408 from shiywang/follow-go-code-style 2017-08-05 03:22:54 -07:00
pkg Merge pull request #50106 from julia-stripe/improve-scheduler-error-handling 2017-08-07 01:35:17 -07:00
BUILD Simply changed the names of packages of some admission plugins. 2017-06-05 22:23:42 +02:00
OWNERS Updated top level owners file to match new format 2017-01-19 11:29:16 -08:00