The specification says that layout transitions always apply to a particular image subresource range. Furthermore, it says:
Layout transitions that are performed via image memory barriers execute in their entirety in submission order, relative to other image layout transitions submitted to the same queue, including those performed by render passes. In effect there is an implicit execution dependency from each such layout transition to all layout transitions previously submitted to the same queue.
Does this mean that the dependency on other image layout transitions submitted to the same queue is established per image subresource range?
In other words, does this also mean that the driver/GPU can schedule other work in parallel as long as that work does not depend on a particular image subresource range? A dependency chain between layout transitions is only established for a particular image subresource range, right?!
I’m not sure I understand the question. A layout transition operates on a subresource range of an image. There is implicit synchronization between layout transitions, based on the submission order of the barriers.
I don’t know what “other work” would be involved here. Since a layout transition is a modifying operation that happens as part of some kind of dependency, subsequent commands that want to access those parts of the image have to be included in the destination scope of the barrier, by stage and by access type.
Yes, commands that aren’t affected by the layout transition because they don’t access the parts of the image that were transitioned don’t need such synchronization. But you knew that from the first sentence you quoted. Stuff that isn’t affected by a layout transition doesn’t have to care about the effects of a layout transition.
I guess I was just looking for some reassurance that all dependency chains that are established through layout transitions are always established on a per-subresource basis.
I first read the specification’s paragraph about “submission order, relative to other image layout transitions” and was afraid that there might be some queue-global dependency chain between all layout transitions. But that wouldn’t make a lot of sense, I guess.
From a practical point of view, my actual question is whether a GPU can parallelize work as efficiently within one queue (given that all barriers/transitions are specified as narrowly as possible w.r.t. subresource ranges) as it could when using multiple queues for parallel workloads (under the assumption that different work packages are independent of each other in both scenarios).
As for a queue-global dependency chain between all layout transitions: that is essentially what it’s saying. But there has to be one.
Layout transitions modify images. But this access doesn’t happen within a “stage”; it happens outside of the stage system. As such, if these operations were not implicitly ordered, there would be no way for two layout transitions to ever safely modify the same subresource.
Also, layout transitions (usually) don’t happen without some kind of execution barrier, so you’ve already put some synchronization into the renderer just to transition the subresource layouts in the first place. If you’re doing a layout transition, that means some subsequent command/stage/access mode is going to use that image in its new layout. So the barrier has to provide execution and visibility to that subsequent command/stage/access mode for those subresources.
Since layout transitions modify data, that barrier must also include any other layout transitions that previously happened to those subresources. So such subsequent operations inherently depend on them.
This is the wrong question to be asking, especially in as general a way as you state it here.
Barriers are something you explicitly ask for, so presumably they happen because you need them to happen. So if you have some barrier between two sets of commands, and you’re considering putting the two sets of commands in two different queues, you still need synchronization between them. Only instead of a lightweight event or pipeline barrier, it now must be a heavyweight semaphore. That also means you need to split your submission into different batches, since you can’t wait for semaphores in the middle of a command buffer.