GPU vertex dispatch for MultiDrawIndirect and/or Instanced draw calls

For others who hit this MDI “small draws” inefficiency and end up researching it, I thought I’d add some other references I’ve found on it, along with some past solutions.

It’s been a known issue for quite a few years; I just hadn’t tuned into it before.

2017-11:

2016-03:

https://frostbite-wp-prd.s3.amazonaws.com/wp-content/uploads/2016/03/29204330/GDC_2016_Compute.pdf

2016-02:

2015-08: