Let us assume that NVIDIA finally implements a working out-of-order command queue. The next step could then be a way to specify data transfer bandwidth.
Why would you need this? Say you have an iterative process that largely consists of first uploading data to the GPU (transfer only) and then processing it (99% computation, 1% transfer). With an out-of-order command queue, you could start the upload for the next iteration at some point during the computationally intensive part. However, if small chunks of data still need to be transferred during that time, they end up competing with the bulk upload for the same link, so you need either reserved bandwidth or a priority-based data transfer system.
That’s pretty much my situation now. Still waiting on NVIDIA to do out-of-order right, though…
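
To put rough numbers on why the small transfers are the problem, here is a toy model in Python. It is not OpenCL and does not reflect how any driver actually schedules copies. It just assumes a single link with fixed bandwidth, a bulk upload that starts at t = 0, and one small transfer requested while that upload is in flight. All function names and figures are made up for illustration.

```python
import math

def fifo_latency(bulk_bytes, small_bytes, t_request, bandwidth):
    """Small transfer is queued behind the whole bulk upload (started at t=0)."""
    bulk_done = bulk_bytes / bandwidth
    start = max(t_request, bulk_done)
    return start + small_bytes / bandwidth - t_request

def priority_latency(bulk_bytes, small_bytes, t_request, bandwidth, chunk_bytes):
    """Bulk upload is split into chunks; the small transfer goes first
    at the next chunk boundary (hypothetical priority scheme)."""
    bulk_done = bulk_bytes / bandwidth
    chunk_time = chunk_bytes / bandwidth
    if t_request >= bulk_done:
        start = t_request
    else:
        start = min(math.ceil(t_request / chunk_time) * chunk_time, bulk_done)
    return start + small_bytes / bandwidth - t_request

def reserved_latency(small_bytes, bandwidth, reserved_percent):
    """A fixed share of the link is always kept free for small transfers."""
    return small_bytes / (bandwidth * reserved_percent / 100)

# Assumed figures: bytes and microseconds, 8000 bytes/us is roughly 8 GB/s.
BANDWIDTH = 8000
BULK = 800_000_000      # next iteration's input
SMALL = 80_000          # small transfer needed by the running computation
T_REQUEST = 2500        # small transfer requested 2.5 ms into the upload
CHUNK = 8_000_000       # bulk upload chunk size for the priority scheme

fifo_us = fifo_latency(BULK, SMALL, T_REQUEST, BANDWIDTH)
prio_us = priority_latency(BULK, SMALL, T_REQUEST, BANDWIDTH, CHUNK)
resv_us = reserved_latency(SMALL, BANDWIDTH, 10)

print(f"FIFO:     {fifo_us:.0f} us")
print(f"priority: {prio_us:.0f} us")
print(f"reserved: {resv_us:.0f} us")
```

With these made-up numbers the small transfer waits about 97.5 ms if it simply queues behind the upload, about 0.5 ms if it can jump in at a chunk boundary, and 0.1 ms if 10% of the link is reserved for it. The reserved case costs the bulk upload that 10% all the time, while the priority case only costs it something when a small transfer actually shows up.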