Because layout transitions wouldn’t make any sense otherwise.
Think of layouts like encryption schemes for data. A layout transition requires decrypting and re-encrypting the data. If you are handed a blob of bits, you don’t know which encryption scheme was used to make it, so you cannot decrypt it.
If you don’t want to deal with layout transitions, that means you don’t want to deal with layouts. So just use the general layout (VK_IMAGE_LAYOUT_GENERAL).
Welcome to Vulkan; enjoy your stay!
No, really, the entire point of an explicit, low-level API is to take a bunch of things that were previously declared to be “the driver’s job” and redefine them as “your job”. Command buffers, direct memory allocation, etc., all used to be “the driver’s job”. They’re not anymore.
Layouts are just one more thing.
The fundamental problem with layout tracking in the driver is this: the driver simply cannot have the information in all cases. This is inherent to a command-buffer-style API.
If I build a command buffer (CB) that uses an image, the code building that CB has no idea what layout the image was in before it. That information would be in a different command buffer, the one that put the image into that layout beforehand. But “beforehand” is defined at submission time, not at CB creation time.
It is 100% OK to create a CB that uses an image in one layout, then create a CB that transitions that image to that layout, so long as you submit them in the proper order.
That means any layout-related logic a particular CB command might need would have to be deferred until submission time, which would make submission even more costly than it already is.
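The submission-time dependency can be modeled in a few lines. This is a minimal sketch, not Vulkan code: `ImageLayout`, `RecordedCB`, and `submit` are hypothetical names, and the enum members only mirror a couple of `VkImageLayout` values. Each recorded CB knows only the layout it expects on entry and the layout it leaves behind; nothing can be checked until the submission order is known.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ImageLayout(Enum):
    # Plain stand-ins for a few VkImageLayout values; this is NOT the Vulkan enum.
    UNDEFINED = auto()
    SHADER_READ_ONLY_OPTIMAL = auto()


@dataclass
class RecordedCB:
    # Hypothetical summary of what a command buffer "knows" once recording ends.
    name: str
    image: str
    expects: ImageLayout  # layout the image must be in when this CB starts executing
    leaves: ImageLayout   # layout the image is in after this CB finishes


def submit(cbs, initial_layouts):
    """Walk CBs in submission order; only here can layouts be checked."""
    current = dict(initial_layouts)
    for cb in cbs:
        have = current.get(cb.image, ImageLayout.UNDEFINED)
        # Expecting UNDEFINED means "old contents may be discarded", so any prior layout is fine.
        if cb.expects is not ImageLayout.UNDEFINED and cb.expects is not have:
            raise ValueError(f"{cb.name}: expects {cb.expects.name}, image is {have.name}")
        current[cb.image] = cb.leaves
    return current


# Recording order: the draw CB is built before the upload CB even exists.
draw = RecordedCB("draw", "tex", ImageLayout.SHADER_READ_ONLY_OPTIMAL,
                  ImageLayout.SHADER_READ_ONLY_OPTIMAL)
upload = RecordedCB("upload", "tex", ImageLayout.UNDEFINED,
                    ImageLayout.SHADER_READ_ONLY_OPTIMAL)

# Submitting upload before draw is valid; submit([draw, upload], {}) would raise ValueError.
final = submit([upload, draw], {})
```

Note that recording `draw` first is perfectly legal; only the order passed to `submit` decides whether the layouts line up.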
And that assumes the transition happened on the same queue, which it doesn’t have to. With timeline semaphores, it’s even possible that the command that changes the layout has not been submitted to any queue yet at the time the CB that needs to access that image gets submitted.
What you’re asking for can only reasonably be done on hardware that effectively stores the layout as part of the image’s data, so that every operation that acts on the image uses that internal layout to figure out what’s going on. This would have to be done by the GPU itself, with no intervention by user-land code, CB generation, or anything of that sort. NVIDIA does this on most of its hardware, but other vendors don’t.
And so long as this is the case, user-land code must track layouts itself.
This is also why Direct3D 11’s deferred contexts were never natively implemented by drivers for non-NVIDIA hardware. Image layouts were not part of the API, yet user-land code would have had to provide that information for deferred contexts to work. So drivers for such hardware didn’t implement them.
Any library that manages a Vulkan image and hands it to you with the expectation that you will use it ought to tell you what layout it left that image in (and you ought to tell it what layout the image starts in when you ask it to do something with the image). If the API doesn’t do that (and I doubt OpenXR’s Vulkan interface lacks a way to get this information), then that’s a problem with the API.
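The handoff contract described above amounts to passing the layout along with the image in both directions. Here is a minimal sketch of that idea; `ImageHandoff` and `library_composite` are hypothetical, the layout enum is only a stand-in for `VkImageLayout`, and none of this is OpenXR’s actual interface.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ImageLayout(Enum):
    # Plain stand-ins for a few VkImageLayout values; this is NOT the Vulkan enum.
    UNDEFINED = auto()
    COLOR_ATTACHMENT_OPTIMAL = auto()
    TRANSFER_SRC_OPTIMAL = auto()


@dataclass
class ImageHandoff:
    # Hypothetical contract: an image plus the layout it is in at the handoff point.
    image: str
    layout: ImageLayout


def library_composite(handoff, command_log):
    """Hypothetical library entry point that honors the handoff contract."""
    # Transition from the layout the caller reported, rather than guessing it.
    command_log.append(("transition", handoff.image, handoff.layout,
                        ImageLayout.TRANSFER_SRC_OPTIMAL))
    command_log.append(("read", handoff.image))  # the library's actual work
    # Report the layout the image is left in so the caller can keep tracking it.
    return ImageHandoff(handoff.image, ImageLayout.TRANSFER_SRC_OPTIMAL)


log = []
after = library_composite(ImageHandoff("swap0", ImageLayout.COLOR_ATTACHMENT_OPTIMAL), log)
# The caller now knows the image is in after.layout and can transition from there.
```

The returned `ImageHandoff` is what lets user-land code keep its own layout tracking accurate across the library boundary.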