Help with understanding command buffers.

First question.
If I create three command buffers, will all of them live in GPU or CPU? Or when I call vkQueueSubmit, is the first command buffer copied from CPU to GPU and then run, then the second copied and run? And how fast does vkQueueSubmit work?

How do I render fast? Should I draw all objects in one command buffer or in a few command buffers (I mean, for example, trees in cmdbuf1 and rocks in cmdbuf2)?

will all of them live in GPU or CPU?

What do you mean by “live in GPU or CPU”? Are you talking about memory storage? Because that’s up to the Vulkan driver.

Or when I call vkQueueSubmit, is the first command buffer copied from CPU to GPU and then run, then the second copied and run?

Command buffers don’t do anything until you submit them to a queue. Whether submission does a CPU-to-GPU copy is, again, implementation-defined and essentially irrelevant.

And how fast does vkQueueSubmit work?

From what we’ve been told, it’s not a particularly fast function, so you should minimize the number of calls to it per frame.

How do I render fast? Should I draw all objects in one command buffer or in a few command buffers (I mean, for example, trees in cmdbuf1 and rocks in cmdbuf2)?

At present, there are no real guidelines. Having a command buffer per object is not likely to be a good idea; way too much overhead per submission. But at the same time, having a single command buffer for everything makes threading unworkable (since you can’t add commands to a single command buffer from multiple threads). There needs to be some kind of reasonable balance, based on your threading scenarios and object hierarchy.
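One common middle ground is one command pool per recording thread (pools are not thread-safe), with each thread recording a secondary command buffer for its slice of the scene and a single primary stitching them together. A rough sketch, where `spawn_record_thread`, `join_all`, and `objectSlice` are hypothetical helpers, not Vulkan functions:

```c
/* Hypothetical helpers: each worker records one secondary command buffer
 * from its own pool, for its own slice of the scene's objects. */
for (uint32_t t = 0; t < threadCount; ++t)
    workers[t] = spawn_record_thread(threadPools[t], objectSlice(t));
join_all(workers, threadCount);

/* Main thread: one primary references all the recorded secondaries. */
vkBeginCommandBuffer(primary, &beginInfo);
vkCmdBeginRenderPass(primary, &renderPassBegin,
                     VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
vkCmdExecuteCommands(primary, threadCount, secondaries);
vkCmdEndRenderPass(primary);
vkEndCommandBuffer(primary);
```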

I thought storing command buffers in the GPU was the philosophy of the Vulkan API. For fast execution and low CPU overhead.
If the Vulkan API copies a static command buffer allays from the CPU to the GPU for execution, the question comes up: why do we need Vulkan if there are other APIs, like DirectX 11 or OpenGL?
If I create a static command buffer one time and never change it, it should be in the GPU.

I thought storing command buffers in the GPU was the philosophy of the Vulkan API. For fast execution and low CPU overhead.

What does the location of the command buffer’s data have to do with low CPU overhead? If the most efficient way to deal with command buffer data for a particular piece of hardware is to allocate it in host memory, then that’s what the driver will do. If it’s more efficient to allocate it in GPU memory, then that’s what the driver will do.

Stop trying to micro-optimize. Or at least, stop trying to micro-optimize the abstraction. Let the driver do what little work we still allow it.

If the Vulkan API copies a static command buffer allays from the CPU to the GPU for execution, the question comes up: why do we need Vulkan if there are other APIs, like DirectX 11 or OpenGL?

… If you honestly believe that the only point of having command buffers is to “allay from CPU to GPU for execution”, you have a dramatically incorrect understanding of what Vulkan is for. And it would take longer than I care to spend to explain all of the reasons for Vulkan’s existence.

Do some Googling on the subject.

If I create a static command buffer one time and never change it, it should be in the GPU.

If you created a “one time static command buffer”, then you told Vulkan it was going to be a “one time static command buffer”, yes? You didn’t use the transient bit, and you didn’t use the resetting bit. So you did your job: you told Vulkan what you were going to do.
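In code, “telling Vulkan what you were going to do” is just a matter of which flags you leave out. A fragment of the relevant setup (the `graphicsQueueFamily` index is a hypothetical value from elsewhere in your setup, and this is not a complete initialization):

```c
/* Pool for long-lived command buffers: neither
 * VK_COMMAND_POOL_CREATE_TRANSIENT_BIT nor
 * VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT is set. */
VkCommandPoolCreateInfo poolInfo = {
    .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
    .flags = 0,
    .queueFamilyIndex = graphicsQueueFamily, /* hypothetical */
};

/* Record once, replay every frame: no
 * VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT at begin time. */
VkCommandBufferBeginInfo beginInfo = {
    .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
    .flags = 0,
};
```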

Again, let the driver do its job.

Sorry.
I mean always from CPU to GPU for execution.
Not allays :doh:

Same difference. The copying of commands from CPU memory to GPU memory has never been one of the problems that Vulkan was trying to solve. With regard to commands, the problems Vulkan resolves are:

1: Command validation (as in, Vulkan doesn’t bother).

2: Using multiple CPU cores to generate commands.

3: GPU/CPU synchronization (as in, Vulkan doesn’t do any unless you explicitly tell it to).

So whether a command buffer stores its commands in GPU memory or CPU memory is just not important. This is a detail that is left to the implementation. It will decide where the most efficient place is for such commands.
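Point 3 is worth a concrete illustration: nothing waits unless you make it wait. A minimal fragment, assuming `queue`, `device`, `fence`, and `submitInfo` were created elsewhere:

```c
/* Submit with an explicit fence; the CPU blocks only because we ask it to. */
vkQueueSubmit(queue, 1, &submitInfo, fence);
vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &fence);
```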

There is a difference between primary and secondary command-buffers.

In this sample I have various command-buffer usage scenarios:

  • vk re-use cmd: The entire scene is encoded in a single big command-buffer and re-used every frame.
  • vk re-use obj-level cmd: Every object in the scene has its own small secondary command-buffer. This means less optimal state transitions, as each command-buffer must be self-contained; there is no state inheritance (other than the rendertarget being used). At render time, all the secondaries are referenced by a primary command-buffer that is built per frame. Given the very few per-object commands, this serves rather as an experiment and is not a recommended rendering method.
  • vk MT cmd worker process: Each thread has FRAMES-many command-buffer pools, which are cycled through. At the beginning of a frame the pool is reset and command-buffers are generated from it in chunks. Using another pool every frame avoids the use of fences. The USE_THREADED_SECONDARIES define controls whether the threaded command-buffers are secondaries (the default).

Basically the answer is “it will depend” :wink:
However, because there is no state inheritance across command-buffers, the state (viewport, bindings…) must be re-specified at the beginning of every command-buffer. That leaves less optimization potential and means sending more redundant commands to the GPU, which is why many tiny command-buffers doing little work will cause bottlenecks.
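Concretely, “re-specified at the beginning of every command-buffer” means each self-contained secondary starts with its own copy of commands like these (`pipeline`, `viewport`, and `scissor` are assumed to exist in the surrounding code):

```c
/* Every self-contained secondary repeats its own state setup. */
vkCmdBindPipeline(secondary, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
vkCmdSetViewport(secondary, 0, 1, &viewport);
vkCmdSetScissor(secondary, 0, 1, &scissor);
/* ...then the actual draw calls for this buffer's objects... */
```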

Ideally you can re-use your command-buffers (at least the secondary ones) a good deal, and ideally each one does a good deal of work (number of triangles, compute threads…).