I am trying to plot scientific data using OpenGL. Typical data vector sizes are in the range of 5000-16000.
For the first draft, I used the VTF (vertex texture fetch) technique described in this tutorial (https://en.wikibooks.org/wiki/OpenGL_Programming/Scientific_OpenGL_Tutorial_02). It works really well and lets me do analysis in the vertex shader itself, like keeping extreme values when the graph is displayed (which is required: if you display the graph in a 200px window, you cannot let OpenGL decide whether to discard a data point or not – more details below). What I really like here is that this processing is done on the GPU instead of the CPU.
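For reference, the core of that technique is a vertex shader that fetches each sample's Y value from a texture. This is a sketch following the tutorial's idea; the attribute/uniform names are mine, not the tutorial's:

```glsl
// VTF sketch: the data vector is encoded as a texture row, and the vertex
// shader looks up each sample's Y value by its X coordinate.
attribute float x;           // sample position in [0, 1], one vertex per sample
uniform sampler2D data;      // data vector encoded into a texture
uniform mat4 transform;      // graph-to-clip-space transform

void main(void) {
    float y = texture2D(data, vec2(x, 0.0)).r;  // vertex texture fetch
    gl_Position = transform * vec4(x, y, 0.0, 1.0);
}
```

This is exactly the part that breaks on hardware where GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS is 0: the texture2D call is not available in the vertex stage there.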
Now I am porting this test program to OpenGL ES. Unfortunately, this technique won’t work at all because GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS is 0 on the embedded platform (Tegra). Currently, I see two alternatives:
1. Do pre-processing on the CPU in order to keep only the significant data before uploading the XY coordinates to the GPU (like tutorial 01: https://en.wikibooks.org/wiki/OpenGL_Programming/Scientific_OpenGL_Tutorial_01)
2. Upload ALL XY coordinates (like tutorial 01 above) to the GPU – but then, how can I make sure that the rasterizer won’t skip my data point?
Typical use case: let’s say you have 5000 data points. All of them have the value 0, except one in the middle which has 1. When displayed in a 200px window, a vertical line in the middle of the graph must always be visible.
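For what it's worth, the CPU pre-processing in alternative 1 boils down to a min/max reduction per output pixel column, which is what guarantees the spike survives. A minimal sketch in plain C (the function name and interleaved min/max layout are mine, not from the tutorial):

```c
#include <assert.h>
#include <stddef.h>

/* Reduce n samples to 2*buckets values by keeping the min and max of each
   bucket. Because both extremes of every bucket are kept, an isolated spike
   can never be dropped, no matter how narrow the window is.
   out must have room for 2*buckets floats (min, max interleaved per bucket). */
void decimate_minmax(const float *in, size_t n, float *out, size_t buckets)
{
    for (size_t b = 0; b < buckets; b++) {
        size_t lo = b * n / buckets;
        size_t hi = (b + 1) * n / buckets;   /* exclusive upper bound */
        float mn = in[lo], mx = in[lo];
        for (size_t i = lo + 1; i < hi; i++) {
            if (in[i] < mn) mn = in[i];
            if (in[i] > mx) mx = in[i];
        }
        out[2 * b]     = mn;
        out[2 * b + 1] = mx;
    }
}
```

With 5000 samples and a 200px window, buckets = 200 gives 400 vertices to upload, and the bucket containing the lone 1 keeps it as its max.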
Thanks for your input!
Find about a technique, called R2VB (render to vertex buffer).
Basically, draw a fullscreen quad to an FBO texture (usually RGBA32F). If you have 10000 vertices, a 100x100 texture will suffice. Use the pixel shader to do the sampling you needed for VTF, and output the result as colour.
Then, use a PBO (pixel buffer object) to copy the data from the RGBA32F texture into a VBO. Bind that VBO as an attribute and do your draw call (the one that would originally have been doing VTF).
Btw, instead of a fullscreen quad, sometimes you’ll have to render 10000 points, with their positions depending on gl_VertexID, sending the VTF texcoord as a varying to the fragment shader.
Just a quick thing. There is no gl_VertexID in GLES, so OP may have to work a little harder there.
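The usual ES2 workaround is to upload a small static attribute that carries each vertex's index in place of gl_VertexID. A sketch of the R2VB first pass with that substitution (all names are mine, and the per-sample analysis is left as a placeholder):

```glsl
// --- Vertex shader: position each of the N points over its own texel ---
attribute float index;       // 0, 1, 2, ... N-1, uploaded once as a VBO
uniform vec2 texSize;        // e.g. vec2(100.0, 100.0) for 10000 vertices
varying vec2 texcoord;

void main(void) {
    float col = mod(index, texSize.x);            // texel this vertex writes to
    float row = floor(index / texSize.x);
    texcoord = (vec2(col, row) + 0.5) / texSize;  // sample at texel centers
    // map to clip space so the point lands exactly on that texel in the FBO
    gl_Position = vec4(texcoord * 2.0 - 1.0, 0.0, 1.0);
}

// --- Fragment shader: do the sampling/analysis, output as colour ---
uniform sampler2D data;      // the source data texture
varying vec2 texcoord;

void main(void) {
    gl_FragColor = texture2D(data, texcoord);  // analysis would go here
}
```

The index attribute only needs to be filled once at startup, so its cost is negligible.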
Yeah, fortunately that feature didn’t exist yet back when R2VB was current, so he’ll find tutorials that can be ported directly to ES2.
Also… there are no pixel buffer objects in core OpenGL ES2… though if one is willing to target PowerVR SGX, that GPU does have vertex texture fetch…
Damn. No way to copy pixels to VBs :S ? (vram-vram copy, that is)
… for many embedded gizmos, “VRAM” is “RAM”… i.e. fully unified memory model, which makes lots of parts of GLES2 quite embarrassing
Thanks Ilian, seems to work!
This is what I am doing right now:
Shader 1: the vertex stage just draws a simple 128x128px square displaying the data encoded in a texture. In the fragment shader, I do the analysis and output the analyzed data as colours. This pass is rendered into an offscreen FBO; then I glReadPixels, keeping only the useful data lines.
Shader 2: I update another buffer (the Y coordinates) with that data using glBufferSubData. The X coordinates are fixed and populated only once (VBO). Then I render the XY data.
This is working great. The downside is that I have to move a lot of data back and forth between the CPU and the GPU. I wonder if I could improve on that (any input appreciated); I also plan to implement the analysis on the CPU alone, to compare which is faster.
As kRogue mentioned, sysram is vram on ES2 devices; so really the only things that can cost you performance are waiting for the render to finish (the implicit flush + wait during glReadPixels) and the copying of data. Fortunately:
glReadPixels(..., glMapBuffer(...), ...); // this will save you one of the copies
The buffer should be created with a “streaming” usage hint instead of dynamic/static, if ES2 has such hints (it does: GL_STREAM_DRAW).
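Put together, the readback-into-buffer idea might look like this. This is only a sketch under the assumption that GL_OES_mapbuffer is available (core ES2 has no glMapBuffer, and core glReadPixels only guarantees the GL_RGBA/GL_UNSIGNED_BYTE format pair, so RGBA32F readback may not work everywhere); the buffer name is mine:

```c
/* Sketch, assuming GL_OES_mapbuffer: read the analyzed FBO contents straight
 * into the VBO that feeds pass 2, skipping one CPU-side copy.
 * Names ending in OES come from the extension, not core ES2. */
GLvoid *ptr;
glBindBuffer(GL_ARRAY_BUFFER, yVbo);                      /* VBO for Y coords */
glBufferData(GL_ARRAY_BUFFER, 128 * 128 * 4, NULL,
             GL_STREAM_DRAW);                             /* orphan + stream hint */
ptr = glMapBufferOES(GL_ARRAY_BUFFER, GL_WRITE_ONLY_OES); /* write-only mapping */
if (ptr) {
    /* glReadPixels writes the 128x128 result directly into the mapped VBO */
    glReadPixels(0, 0, 128, 128, GL_RGBA, GL_UNSIGNED_BYTE, ptr);
    glUnmapBufferOES(GL_ARRAY_BUFFER);
}
```

Whether this actually beats glReadPixels into client memory plus glBufferSubData depends on the driver; it is worth benchmarking both on the target hardware.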
For further performance advice, keep texture sizes power-of-two and a multiple of 32: 128x128, 32x32, 64x64, 256x256, and 128x32 are nice sizes. Also, try to read the whole texture (e.g. 32x32 or 128x32), as drivers have optimized code for those cases.
Though, do try the performance of a non-power-of-two texture height, too (on different GPUs), e.g. 128x30. This can be a hint to some drivers to organize texels linearly instead of in a “twiddled” way; that way, the internal decoding in glReadPixels() becomes a simple memcpy().
Well, ugh … assuming the OP has GL_OES_mapbuffer.
glReadPixels may also be problematic due to the limited formats available.
I guess I should read the ES2 spec, but the more I hear, the less I want to.