Tile Based Rendering Best Practices

Are you optimizing Vulkan applications for mobile or tile-based GPUs? The Vulkan Guide has a dedicated section on Tile Based Rendering (TBR) best practices worth bookmarking.

Unlike traditional immediate-mode GPUs, tile-based architectures process the framebuffer in small screen regions, keeping work on fast on-chip memory before writing results to main system memory. For Vulkan developers, this means memory bandwidth is often the dominant performance factor.

Key takeaways from the guide:

Use load/store ops intentionally. Render pass attachment configuration is your primary tool for controlling bandwidth. The loadOp tells the driver whether to load an attachment's previous contents, clear it, or treat it as undefined at the start of the pass; the storeOp tells it whether to write the results back or discard them at the end. Both choices directly determine whether data has to travel off-chip.

Keep depth and stencil transient. If depth and stencil buffers are not needed after a render pass, set their loadOp to VK_ATTACHMENT_LOAD_OP_CLEAR and their storeOp to VK_ATTACHMENT_STORE_OP_DONT_CARE. This lets the driver keep them entirely on-chip and avoid the cost of writing them back to external memory.

Favor compact pixel formats. On-chip tile memory is fixed in size. Smaller bit-depth formats allow the hardware to fit more data per tile, reducing spills to external memory and improving efficiency.
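To make the format trade-off concrete, here is a back-of-the-envelope sketch. The 64 KiB tile memory budget used in the note below is an invented number purely for illustration; real budgets vary by GPU and Vulkan does not expose them directly. The helper name is hypothetical as well.

```cpp
#include <cstdint>

// Bytes per pixel for three common color formats; the sizes follow
// directly from the bit widths in the format names.
constexpr uint32_t kBytesRGBA8   = 4;   // e.g. VK_FORMAT_R8G8B8A8_UNORM
constexpr uint32_t kBytesRGBA16F = 8;   // e.g. VK_FORMAT_R16G16B16A16_SFLOAT
constexpr uint32_t kBytesRGBA32F = 16;  // e.g. VK_FORMAT_R32G32B32A32_SFLOAT

// Hypothetical model: how many pixels of a single color attachment fit in
// a given on-chip budget. Real hardware also has to fit depth/stencil and
// any other attachments, so treat this as an upper bound, not a prediction.
constexpr uint32_t pixelsPerTile(uint32_t tileBytes,
                                 uint32_t bytesPerPixel,
                                 uint32_t samplesPerPixel) {
    return tileBytes / (bytesPerPixel * samplesPerPixel);
}
```

With the assumed 64 KiB budget, pixelsPerTile(64 * 1024, kBytesRGBA8, 1) gives 16384 pixels, while kBytesRGBA16F halves that to 8192 and kBytesRGBA32F quarters it to 4096. Adding 4x MSAA to the RGBA8 case also lands at 4096, which is why format and sample count are worth considering together.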

Optimize for the binning pass. Tilers typically process geometry twice: once to bin triangles into tiles, then again to shade pixels. Separating vertex positions from other attributes like UVs and normals lets the GPU read only what it needs during binning, cutting unnecessary bandwidth.
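Here is a small sketch of what splitting the vertex data can look like on the CPU side. The struct names and the specific attribute set (position, normal, UV) are assumptions for illustration, and the byte count is a simplified model that assumes the GPU fetches whole vertex strides from whichever buffer holds the position.

```cpp
#include <cstddef>

// Everything in one buffer: binning has to step over normals and UVs
// just to reach the next position.
struct InterleavedVertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Split layout: one tightly packed position stream for binning...
struct PositionOnly {
    float position[3];
};

// ...and a second stream that is only read when pixels are shaded.
struct ShadingAttributes {
    float normal[3];
    float uv[2];
};

static_assert(sizeof(InterleavedVertex) == 32, "8 floats, no padding");
static_assert(sizeof(PositionOnly) == 12, "3 floats, no padding");
static_assert(sizeof(ShadingAttributes) == 20, "5 floats, no padding");

// Simplified model: bytes touched when reading one position per vertex
// from a buffer with the given stride.
constexpr std::size_t bytesFetchedForBinning(std::size_t vertexCount,
                                             std::size_t strideBytes) {
    return vertexCount * strideBytes;
}
```

Under this model, binning 1000 vertices touches 32000 bytes with the interleaved layout and 12000 bytes with the split layout. In Vulkan the two streams would be bound as separate vertex buffer bindings, each with its own stride.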

Provide clear intent to the driver. Because Vulkan abstracts hardware details like tile size, the best way to optimize is to use correct render pass configurations and memory flags so the driver can make informed decisions on your behalf.

The full guide covers these topics in depth, including guidance on MSAA, transient attachments, and dynamic rendering: