In another thread, the idea of a layer sitting between the explicit, low-level Vulkan API and the user was discussed. I came to realize that some things would be useful enough to form the basis of a utility library.
The goal here is not to avoid the explicit nature of the API. It’s merely to deal with common patterns that most users of Vulkan encounter.
Here are some ideas I had:
Memory managers. These would sub-allocate from a single larger allocation made from a Vulkan heap. Among them would be:
** A general allocator: handles allocations/deallocations of any size.
** A pool allocator: all allocations are of a fixed size defined at runtime (great for texture pools).
** An ever-increasing allocator: this would be for “streamed” kinds of data, like vertex positions for GUIs, skinning matrices, etc. You never deallocate memory directly; you just flush the whole thing at the end of frame.
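To make the ever-increasing allocator concrete, here is a minimal sketch. It only tracks offsets inside one big device allocation; the class name `LinearAllocator` and the `kNoSpace` sentinel are made up for illustration and are not part of Vulkan. The caller would use the returned offset when binding or writing into the memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sentinel meaning "the request did not fit".
constexpr std::uint64_t kNoSpace = ~0ull;

// Hypothetical helper: hands out offsets inside one large allocation.
// There is no per-allocation free; reset() releases everything at once.
class LinearAllocator {
public:
    explicit LinearAllocator(std::uint64_t capacity) : capacity_(capacity) {}

    // Returns the offset of a block of `size` bytes aligned to `alignment`
    // (which must be a power of two), or kNoSpace if it does not fit.
    std::uint64_t allocate(std::uint64_t size, std::uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uint64_t start = (head_ + alignment - 1) & ~(alignment - 1);
        if (start > capacity_ || size > capacity_ - start) return kNoSpace;
        head_ = start + size;
        return start;
    }

    // Call at the end of the frame, once the GPU is done with the data.
    void reset() { head_ = 0; }

    std::uint64_t used() const { return head_; }

private:
    std::uint64_t capacity_;
    std::uint64_t head_ = 0;
};
```

In practice you would probably keep one of these per frame in flight, so that resetting one does not stomp on data the GPU is still reading.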
Memory type selector. Vulkan has lots of memory heap/type flags, and different implementations support different sets of flags. It would be useful to have a way to select which memory type to allocate from based on certain common usage patterns for that memory.
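A rough sketch of what such a selector could look like follows. The `MemoryUsage` enum and the required/preferred mapping are my own assumptions, not anything Vulkan defines. The flag constants mirror the `VkMemoryPropertyFlagBits` values so the sketch compiles without vulkan.h; `typeFlags[i]` stands in for `memoryTypes[i].propertyFlags` and `typeBits` for `VkMemoryRequirements::memoryTypeBits`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins with the same values as the VkMemoryPropertyFlagBits entries.
constexpr std::uint32_t kDeviceLocal  = 0x1;
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;
constexpr std::uint32_t kHostCached   = 0x8;

// Hypothetical usage patterns the helper understands.
enum class MemoryUsage { GpuOnly, CpuToGpu, GpuToCpu };

struct UsageFlags { std::uint32_t required; std::uint32_t preferred; };

// Assumed mapping from usage pattern to flags; a real library might differ.
inline UsageFlags flagsFor(MemoryUsage usage) {
    switch (usage) {
        case MemoryUsage::GpuOnly:  return {kDeviceLocal, 0};
        case MemoryUsage::CpuToGpu: return {kHostVisible | kHostCoherent, kDeviceLocal};
        case MemoryUsage::GpuToCpu: return {kHostVisible, kHostCached};
    }
    return {0, 0};
}

// Returns the index of a suitable memory type, or -1 if none qualifies.
// First tries required + preferred flags, then falls back to required only.
inline int selectMemoryType(const std::vector<std::uint32_t>& typeFlags,
                            std::uint32_t typeBits, MemoryUsage usage) {
    const UsageFlags f = flagsFor(usage);
    const std::uint32_t wanted[2] = {f.required | f.preferred, f.required};
    for (std::uint32_t want : wanted) {
        for (std::size_t i = 0; i < typeFlags.size() && i < 32; ++i) {
            const bool allowed = (typeBits & (1u << i)) != 0;
            if (allowed && (typeFlags[i] & want) == want) return static_cast<int>(i);
        }
    }
    return -1;
}
```

The two-pass search is the important part: an implementation that lacks the preferred flags still gets a usable type instead of a failure.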
Build a descriptor pool by passing a descriptor set layout and the number of those sets you want to allocate. The helper transforms the set layout into the pool creation info.
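The transformation itself is simple bookkeeping, sketched below. The structs are stand-ins I made up so the sketch compiles without vulkan.h: `LayoutBinding` mirrors the `descriptorType` and `descriptorCount` fields of `VkDescriptorSetLayoutBinding`, `PoolSize` mirrors `VkDescriptorPoolSize`, and `PoolInfo` holds what would go into `maxSets` and `pPoolSizes` of the pool create info.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Stand-in for the relevant fields of VkDescriptorSetLayoutBinding.
struct LayoutBinding { int descriptorType; std::uint32_t descriptorCount; };
// Stand-in for VkDescriptorPoolSize.
struct PoolSize { int type; std::uint32_t descriptorCount; };
// Hypothetical result: values for maxSets and the pool size array.
struct PoolInfo { std::uint32_t maxSets; std::vector<PoolSize> poolSizes; };

// Sums the descriptors of each type used by one set, then scales the
// totals by the number of sets the pool must be able to hold.
inline PoolInfo poolInfoForLayout(const std::vector<LayoutBinding>& bindings,
                                  std::uint32_t setCount) {
    std::map<int, std::uint32_t> perType;
    for (const LayoutBinding& b : bindings) {
        perType[b.descriptorType] += b.descriptorCount;
    }
    PoolInfo info{setCount, {}};
    for (const auto& entry : perType) {
        // A pool size entry with a zero count is not valid, so skip those.
        if (entry.second != 0) {
            info.poolSizes.push_back({entry.first, entry.second * setCount});
        }
    }
    return info;
}
```

Merging bindings of the same type into one entry keeps the pool size array short even for layouts with many bindings.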
Keyword “Arena”: objects created in the Arena must all die together (like a gladiator arena…get it? O.o). Clean up the entire arena in O(1) time by just deallocating the whole thing at once.
I’ve done a lot of research at Uni on memory allocator performance, parallel processing, data centres and such.
For small object creation, grouping the objects using a memory allocator can be very helpful because it keeps like objects together for cache locality. Also, google “O(1) memory allocator.” It’s O(log n) when you #define DEBUG, but it really is O(1) at runtime. That reduces the need for a memory pool that only lets you allocate one size of object. It might make your code easier to understand if you group allocations into one allocator per code module. For example, one allocator (all objects allocated out of a shared space) for textures, one for models, and one for skeletal matrices. Those functions live in separate code modules, so you can separate their allocators very naturally.
For large blocks of memory, textures being a common example, pre-allocation is the fastest. Compute the dimensions of texture memory only once, allocate the texture memory only once, then have a custom “rent texture space that I need” function. id Software’s Rage streams textures from disk using this method. The GPU doesn’t know which pixels in the texture memory are “allocated” and which are “free”; that bookkeeping lives entirely on your side, which is why renting space is very fast. But if your texture space gets too fragmented, you can fail to allocate space for a large texture even though enough total space is free.
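As a sketch of such a “rent texture space” function, here is a first-fit free list over one pre-allocated block. The names `TextureSpace`, `rent`, `giveBack` and `kNoSpace` are invented for this example, and it ignores alignment entirely. It also shows the fragmentation failure described above.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sentinel meaning "no free range is large enough".
constexpr std::uint64_t kNoSpace = ~0ull;

// Hypothetical first-fit allocator over one pre-allocated texture block.
// Only offsets are tracked; the GPU never sees this bookkeeping.
class TextureSpace {
public:
    explicit TextureSpace(std::uint64_t capacity) { free_[0] = capacity; }

    // Returns the offset of a free range of `size` bytes, or kNoSpace.
    std::uint64_t rent(std::uint64_t size) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < size) continue;
            const std::uint64_t offset = it->first;
            const std::uint64_t rest = it->second - size;
            free_.erase(it);
            if (rest != 0) free_[offset + size] = rest;
            return offset;
        }
        return kNoSpace;
    }

    // Returns a range to the free list, merging with adjacent free ranges.
    void giveBack(std::uint64_t offset, std::uint64_t size) {
        auto next = free_.lower_bound(offset);
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                size += prev->second;
                free_.erase(prev);
            }
        }
        if (next != free_.end() && offset + size == next->first) {
            size += next->second;
            free_.erase(next);
        }
        free_[offset] = size;
    }

    std::uint64_t totalFree() const {
        std::uint64_t total = 0;
        for (const auto& range : free_) total += range.second;
        return total;
    }

private:
    std::map<std::uint64_t, std::uint64_t> free_;  // offset -> size
};
```

With a 100-byte space, renting 40, 20 and 40 bytes and then giving back the two 40-byte ranges leaves 80 bytes free in total, yet a 60-byte request still fails because the free space is split in two.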
Another example of how large blocks should be pre-allocated: allocate the memory blocks for each CPU only once and bind the CPU to those blocks, so the scheduler doesn’t shuffle it around to other work. This reduces cache misses, even if that memory is only used to prepare buffers which are then copied to the GPU.
The GPU has caches as well, but binding blocks is less easy to do on the GPU.
When the space gets too fragmented you can also start relocating textures using copies to coalesce the free space.
We can do that with GPU memory because the allocator can know which parts of memory will be used when, and make copies as needed. (If the image is VK_SHARING_MODE_CONCURRENT then you can even do the copy while the source image is being used by a render.)
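The planning half of that relocation can be sketched without any Vulkan calls. Given the live sub-allocations, the hypothetical `planCompaction` below works out which ones need to slide toward offset 0 so the free space ends up in one piece. It ignores alignment, and it assumes the caller performs the copies in the returned order and deals with any overlap between a block’s old and new range (for example through a temporary resource).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A live sub-allocation inside the big block (hypothetical type).
struct Block { std::uint64_t offset; std::uint64_t size; };
// One copy the caller would have to record (hypothetical type).
struct Move { std::uint64_t from; std::uint64_t to; std::uint64_t size; };

// Packs all live blocks toward offset 0, preserving their order, and
// returns the moves needed. Blocks already in place produce no move.
inline std::vector<Move> planCompaction(std::vector<Block> live) {
    std::sort(live.begin(), live.end(),
              [](const Block& a, const Block& b) { return a.offset < b.offset; });
    std::vector<Move> moves;
    std::uint64_t cursor = 0;  // next packed offset
    for (const Block& b : live) {
        if (b.offset != cursor) moves.push_back({b.offset, cursor, b.size});
        cursor += b.size;
    }
    return moves;
}
```

Turning each `Move` into an actual copy, and recreating whatever objects refer to the old location, is the expensive part and is not shown here.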
[QUOTE=ratchet freak;40882]When the space gets too fragmented you can also start relocating textures using copies to coalesce the free space.
We can do that with GPU memory because the allocator can know which parts of memory will be used when, and make copies as needed. (If the image is VK_SHARING_MODE_CONCURRENT then you can even do the copy while the source image is being used by a render.)[/QUOTE]
That’s a very unpleasant prospect, since now you have to rebuild your VkImages, which means you have to update some descriptors that you might not otherwise have had to update. It’s certainly not something that can happen without user intervention.
Unless it’s done with sparse binding or sparse residency.
You issue a copy from the resource to the new location (using a temp resource) and then use semaphores to sync the vkQueueBindSparse with the copy and the other accesses. Though indeed, having the user code initiate the relocation would be preferable.