Drawing outside of clip space vs. updating VBO every frame

Let’s say I have a simple 2D scene with a number of sprites drawn from the same texture atlas (for example, a tile map). The game world itself can be big (possibly with a large number of such sprites), but at any given moment only a tiny portion of it is visible.

I can see two reasonably efficient ways to render such a scene:

1. Use a VBO for the whole scene

The easiest way would be to create one VBO for all sprites and render them in a single draw call (since they use the same texture/shaders/blending etc.). With this approach, the VBO only needs to be updated when sprites are added, removed, or modified. The disadvantage is that the vast majority of sprites will end up outside of clip space after the vertex shader, so part of the GPU’s work is wasted.
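
In rough pseudocode (SpriteVertex, allVertices and buildWholeWorld are just placeholder names, and shader/attribute setup is left out), I mean something like:

```cpp
// One sprite quad = two triangles = 6 vertices; position + atlas UV per vertex.
struct SpriteVertex { float x, y, u, v; };

std::vector<SpriteVertex> allVertices = buildWholeWorld();  // hypothetical helper

// Once, at load time: upload every sprite in the world.
GLuint vbo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER,
             allVertices.size() * sizeof(SpriteVertex),
             allVertices.data(), GL_STATIC_DRAW);

// Every frame: a single draw call; clipping discards whatever is off screen.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glDrawArrays(GL_TRIANGLES, 0, (GLsizei)allVertices.size());
```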

2. Update the VBO each frame so it only contains visible sprites

Alternatively, each frame I could check which sprites are visible and then update the VBO to contain only those sprites, using something like glMapBufferRange(…). That way only visible sprites are rendered (no wasted GPU work), but it obviously requires some culling work on the CPU and a VBO update every frame.
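
Roughly what I have in mind (again just a sketch: the buffer is assumed to have been allocated up front with a worst-case size, and intersects()/appendQuad() are placeholder helpers):

```cpp
// Assumes dynamicVbo was created once at startup, e.g.
//   glBufferData(GL_ARRAY_BUFFER, maxVisibleSprites * 6 * sizeof(SpriteVertex),
//                nullptr, GL_STREAM_DRAW);

std::vector<SpriteVertex> visible;
for (const Sprite& s : sprites)
    if (intersects(s.bounds, cameraRect))   // hypothetical AABB-vs-camera test
        appendQuad(visible, s);             // hypothetical: pushes 6 vertices

if (!visible.empty()) {
    glBindBuffer(GL_ARRAY_BUFFER, dynamicVbo);
    // Invalidate the old contents so the driver doesn't have to wait for the
    // GPU to finish reading the previous frame's data.
    void* dst = glMapBufferRange(GL_ARRAY_BUFFER, 0,
                                 visible.size() * sizeof(SpriteVertex),
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    if (dst) {
        memcpy(dst, visible.data(), visible.size() * sizeof(SpriteVertex));
        glUnmapBuffer(GL_ARRAY_BUFFER);
        glDrawArrays(GL_TRIANGLES, 0, (GLsizei)visible.size());
    }
}
```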

I am trying to figure out, in general, when approach #1 is more efficient and when #2 is. For example, under what conditions does the extra work in approach #2 pay off compared to #1? Does drawing primitives outside of clip space thrash the GPU cache? Would the answers be different for mobile (OpenGL ES) platforms?

Sorry for the vague question. I understand that a definitive answer can only be given for a specific platform after profiling, but I am just trying to understand some general performance considerations and make an educated guess.

For a tile map that’s much larger than the screen, the first approach I would try is to split it into rectangular chunks and only render the chunks that intersect the viewport. Larger chunks require fewer draw calls; smaller chunks result in fewer out-of-viewport triangles.
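
A minimal sketch of what I mean, assuming each chunk carries a prebuilt VBO and a world-space bounding rectangle (the names here are made up):

```cpp
struct Rect { float x0, y0, x1, y1; };

struct Chunk {
    Rect    bounds;       // world-space AABB of this chunk
    GLuint  vbo;          // vertices for this chunk, built once at load time
    GLsizei vertexCount;
};

static bool overlaps(const Rect& a, const Rect& b) {
    return a.x0 < b.x1 && b.x0 < a.x1 &&
           a.y0 < b.y1 && b.y0 < a.y1;
}

// Per frame: draw only the chunks whose bounds intersect the camera rectangle.
void drawVisibleChunks(const std::vector<Chunk>& chunks, const Rect& cameraRect) {
    for (const Chunk& c : chunks) {
        if (!overlaps(c.bounds, cameraRect))
            continue;
        glBindBuffer(GL_ARRAY_BUFFER, c.vbo);
        // attribute / VAO setup omitted for brevity
        glDrawArrays(GL_TRIANGLES, 0, c.vertexCount);
    }
}
```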

I wouldn’t try anything more involved unless you have actual measurements indicating that out-of-viewport triangles have a non-negligible cost. Tight coupling between the CPU and GPU (i.e. using data to render one frame, modifying it, using it to render the next frame, and so on) can have a far greater effect on performance than sending too much data to the GPU, particularly if the excess data is discarded at an early stage.

Thanks for the response.

Do you have any idea of when the cost of drawing outside of clip space becomes measurable? Are we talking about 10k triangles, 100k, 1M, etc.?

It depends upon the hardware, the resolution, how much work is being done in the vertex shader compared to the fragment shader, and probably some other things.

If you want a better answer than that, you’ll need to measure it. Tweak the numbers so that you end up sending more vertices each frame, and measure the ratio of the change in render time to the change in vertex count.
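
If you want GPU-side timings rather than wall-clock time, desktop GL has timer queries (GL 3.3 / ARB_timer_query; on ES you’d need EXT_disjoint_timer_query). A rough sketch, with drawScene() standing in for whatever you’re measuring:

```cpp
GLuint query = 0;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
drawScene();                          // hypothetical: the draw calls under test
glEndQuery(GL_TIME_ELAPSED);

// Reading the result right away stalls the pipeline; in a real app you'd read
// it a frame or two later, but for a rough measurement this is fine.
GLuint64 elapsedNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
printf("GPU time: %.3f ms\n", elapsedNs / 1e6);
```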

I was more concerned with wasting GPU cache by having a giant VBO with most of its contents “invisible”. But you’re probably right, it’s hard to say without measuring.
I guess I’ll keep it simple until optimization is needed.