* Add --orderseed, shuffle order every time, report order for repeatability
* Add --times, acts like a multi-deck shoe
* Remove fixed numbering in TAP output (this is actually not needed;
TAP output is just done by outputting what assertion count you're on.)
This is essentially just a port of f3a992aa and 369064c6 (minus
reporting, which can be handled later when we make TAP, etc, better).
This syntax is akin to what Python unittest uses for running a subset of the tests.
If a test gets skipped, log it. If an invalid test test is passed to --test, warn about it.