Analogies are dangerous things. Allow me to demonstrate where yours fails with another CPU-to-GPU analogy (only this time, one that actually works).
A single core of a CPU is not a single, monolithic processor. It is a pipelined, multi-processor computational device. Each opcode executes in multiple stages within the pipeline, and different opcodes can take different paths through it.
What this ultimately means is that multiple opcodes can be in the middle of being processed at the same time. Since there are different pathways through the opcode processing pipeline, you could have quite a few different opcodes going at once. So even on a device that is “single threaded” at the assembly level, it is hardly “single threaded” at the level of the processing system.
However, most programs (as well as the C and C++ standards before 2011) were single-threaded. Therefore, the CPU could not expose this internal multi-processing to programs.
Therefore, CPUs developed a number of systems to ensure that dependent opcodes don't encounter problems; in effect, they turn a multi-processing system back into a single-threaded one. If one opcode depends on the results of another, the hardware holds that opcode back in the pipeline until the opcode it depends on is finished and its results are available.
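To put that in concrete terms, here is a trivial C sketch of such a dependency (the function is purely illustrative): the add reads the multiply's result, so the hardware cannot let it complete until that result exists.

```c
/* A read-after-write dependency: opcode 2 cannot finish until
   opcode 1's result is available, so the pipeline holds it back
   (or forwards the result between stages). */
int dependent_pair(int x, int y, int z)
{
    int a = x * y; /* opcode 1: multiply                */
    int b = a + z; /* opcode 2: reads opcode 1's result */
    return b;
}
```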
Many CPUs also have systems in place to execute instructions out-of-order; that way, if one opcode is stalled due to dependencies, it doesn’t stall the pipeline. Now, the out-of-order thing sounds a bit silly; why not just generate the assembly in the right order to begin with? Well, one reason would be that people are throwing pre-compiled binaries around that were compiled 25 years ago for ancient 286 machines. If you want to ensure that these programs don’t perform horribly on your modern CPU, out-of-order execution is a good idea. There are other reasons too, but that one is critically important for this analogy. Why?
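Extending the sketch above with one independent statement shows what out-of-order execution buys you (again, purely illustrative):

```c
/* An out-of-order core can execute the independent line while the
   dependent one waits; an in-order core would stall behind it. */
int mixed_work(const int *p, int z)
{
    int a = p[0] * p[1]; /* may stall, e.g. on a cache miss        */
    int b = a + z;       /* depends on a: must wait for the result */
    int c = z << 1;      /* independent: can run while b waits     */
    return b + c;
}
```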
Because GPUs have a very similar architecture. They have long processing pipelines. “Opcodes” (aka: rendering calls) sometimes depend on others, and they must not execute until the “opcode” they depend on is out of the pipeline. And so forth.
But there’s one big difference between a CPU and a GPU. Remember all those systems I talked about for out-of-order execution? Or the system that stops one opcode from proceeding while an opcode it depends on is still in the pipe?
None of that exists for GPUs!
For reasons that are ultimately irrelevant, GPUs can’t really do that. They are far more parallel internally than CPUs, but GPUs have none of the automatic means to prevent rendering commands from stepping on each other. Whose responsibility is that?
It’s the graphics driver, the equivalent of the “compiler” for CPU assembly. The driver is the one that has to issue synchronization to stop dependent rendering commands from being in flight at the same time. The driver has the responsibility to flush caches, to make data properly visible to incoming rendering commands. And so forth.
It is very possible for two rendering commands to be in the graphics pipeline at the same time. This is why new draw calls do not induce a pipeline stall. Depending on the hardware, it’s theoretically possible for different commands to be rendering to different framebuffers at the same time. It’s all about how scheduling happens within the GPU.
Scheduling that, I remind you, is completely blind to dependencies.
Therefore, your premise that single command buffer operations somehow don’t need synchronization primitives is complete bunk. They don’t in OpenGL, but only because the OpenGL implementation bends over backwards to ensure that. Vulkan does no such thing. Vulkan is as thin a wrapper around the graphics hardware as possible. That’s why it exists.
In fact, Sellers was quite adamant that what you claim is exactly what would not happen. And I quote: “We’re not going to track the state of a resource. It’s up to you that, when you’re rendering to a texture, and you want to go read from it, you have to tell the driver, ‘I’m done rendering to this; now make it readable.’ And then the driver will do the work right there to make the texture readable. If you get it wrong, we will render garbage, or crash.”
That doesn’t sound like the system checking “which buffers are being used in the current queue and the submitted commands by checking the active commandbuffer’s read/write needs and insert barriers as needed to ensure coherency”.
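For reference, the “now make it readable” step Sellers describes corresponds to an explicit image memory barrier that you record into the command buffer yourself. Here is a minimal sketch, assuming `cmdBuf` and `colorImage` are valid handles created elsewhere:

```c
#include <vulkan/vulkan.h>

/* Transition a render target so later fragment shaders may sample it.
   Nothing tracks this for you; omit it and you get garbage or a crash. */
void make_render_target_readable(VkCommandBuffer cmdBuf, VkImage colorImage)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = colorImage,
        .subresourceRange = {
            .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
            .baseMipLevel = 0, .levelCount = 1,
            .baseArrayLayer = 0, .layerCount = 1,
        },
    };
    vkCmdPipelineBarrier(cmdBuf,
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, /* after the writes */
        VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,         /* before the reads */
        0, 0, NULL, 0, NULL, 1, &barrier);
}
```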
So no, the graphics queue has no idea what command buffer A or B or C does. All it knows is that they execute some commands. If there are synchronization or coherency issues, it’s up to you to detect them and compensate for them.
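The same goes between submissions: if command buffer B depends on A’s results, you say so yourself, e.g. with a semaphore. A sketch, with all handles assumed to be created elsewhere and error handling omitted:

```c
#include <vulkan/vulkan.h>

/* The queue receives opaque command buffers; any ordering between
   cmdA and cmdB must be expressed explicitly by the application. */
void submit_dependent_work(VkQueue queue,
                           VkCommandBuffer cmdA, VkCommandBuffer cmdB,
                           VkSemaphore aDone, VkFence fence)
{
    VkSubmitInfo submitA = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmdA,
        .signalSemaphoreCount = 1,
        .pSignalSemaphores = &aDone,   /* signaled when A's work is done */
    };
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
    VkSubmitInfo submitB = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &aDone,     /* B must not start before A signals */
        .pWaitDstStageMask = &waitStage,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmdB,
    };
    VkSubmitInfo submits[2] = { submitA, submitB };
    vkQueueSubmit(queue, 2, submits, fence);
}
```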
Even ignoring the fact that a lot of GPUs today have multiple rasterizing units (and Vulkan would therefore not be designed to exclude their capabilities), that’s still untrue. There’s no reason why you couldn’t have two sets of triangles that have both been rasterized and both have fragments in the fragment processing pipeline. Or that both have ROPs active.
It would depend on the hardware, but there’s nothing conceptually preventing it on at least some GPUs. And thus, there’s no reason for Vulkan to forbid it.
Especially when OpenGL does not.