Support multi outputs chunk search. Previously we only support single output chunk search. It is more flexible and improve performance by a large margin. For transformer, we reduce memory by 40% than previous search strategy.
1. rewrite search strategy to support multi outputs chunk search
2. fix many, many bugs
3. update tests