Step-by-step displaying

Hello. I need to display an image step by step (line by line). I am using Qt (QGLWidget) and basic GL functions.

For example:

matrix_reset(); // reset scene
glTexImage2D(GL_TEXTURE_2D, 0, 4, realW, realH, 0, GL_RGBA, GL_UNSIGNED_BYTE, rgba); // transparent image
updateGL(); // update

for(i = 0; i < finfo->h; i++) // from first scanline to last
{
	lib->fmt_read_scanline(finfo, scan); // read scanline from image file into "scan"
	glTexSubImage2D(GL_TEXTURE_2D, 0, offsetX, offsetY+i, finfo->w, 1, GL_RGBA, GL_UNSIGNED_BYTE, scan); // copy "scan" into texture
	updateGL(); // update
}

This shows the image line by line, but it is VERY SLOW (about 3000 ms), while just reading the image (without glTexSubImage2D) takes about 20 ms.

Why is that?

Do you know GL_TEXTURE_1D? It may help if you want to keep using textures. Have a look at other ways to render images too.

Any specific reason why you want to do this with OpenGL? I’m just asking because doing this through the standard X11 API (or Qt’s abstraction of it) would probably be faster. Keep in mind that although GL is theoretically a general-purpose graphics library (as its name suggests), it’s not so general for most purposes. All optimizations in the actual implementations (the drivers) have been going in specific directions (games and CAD mostly), so everything else is bound to be unoptimized and therefore slow. In your case, for example, you have to first read the data from the file, then upload it to video RAM, and later (when rendering) copy it to the framebuffer. This last step is the only one you really want (except for the loading part, of course), and even it is probably slower than a simple blit because of all the OpenGL operations (useless in your case) performed before the pixels end up in the framebuffer.
Having said all this, I must also say that I don’t know that much about OpenGL internals, so I might be wrong. Still, if you’re not going to do anything fancy with the image (like blend it, deform it or something like that, where the hardware acceleration of GL would be beneficial), I bet a simple X11/Qt blit will be much faster than GL.

OpenGL is very fast at zooming, rotating, etc. Qt can’t do those as fast as OpenGL, and as far as I know neither can X11. :frowning:

Well, as I said, if you’re going to do something fancy with the image besides displaying it, using GL (well, the GPU actually) might well be worth it. In that case I’m not sure I can help you much further. The only advice I can give you is to call glTexSubImage2D as few times as possible (depending on your particular problem), and that you should probably pass NULL as the data pointer to the initial glTexImage2D call. The latter tells the driver to allocate the space but not to load anything into it. Of course, you might be doing that already if rgba is NULL.

PS: By the way, why do you specify 4 as the internal format? AFAIK a bare component count is only accepted for backward compatibility with GL 1.0.

Can you suggest something else besides GL? (I don’t know much about rendering software.)

Well, perhaps you should just read the entire image in at once and be done with it. If you want to, you can then display progressively more of it by altering the quad (and corresponding texture coords.) If you are trying to do incremental display because the reading is slow, perhaps you should gather as much as you can in say 1/10th of a second and send that.
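A minimal sketch of the "gather for about 1/10th of a second, then send" idea, in case it helps. The names `loadWithTimeBudget`, `readRow`, `uploadRows` and `now` are made up for illustration and are not from the poster's code; `uploadRows` stands in for whatever single upload call plus redraw the viewer would use, and the clock is passed in only so the logic can be exercised without real delays.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using Clock = std::chrono::steady_clock;

// Hypothetical callbacks standing in for the real decoder and GL code:
//   readRow(y)            decodes scanline y into a buffer owned by the caller
//   uploadRows(y0, count) uploads rows [y0, y0 + count) in one call and redraws
//   now()                 returns a monotonic timestamp
// Rows are decoded until the time budget is used up, then everything decoded
// so far is sent in a single upload. Returns the number of upload calls made.
int loadWithTimeBudget(int height,
                       std::chrono::milliseconds budget,
                       const std::function<void(int)>& readRow,
                       const std::function<void(int, int)>& uploadRows,
                       const std::function<Clock::time_point()>& now =
                           [] { return Clock::now(); })
{
    assert(height >= 0);
    int uploads = 0;
    int firstPending = 0;                    // first row decoded but not yet uploaded
    Clock::time_point batchStart = now();

    for (int y = 0; y < height; ++y) {
        readRow(y);
        const bool lastRow = (y == height - 1);
        if (lastRow || now() - batchStart >= budget) {
            uploadRows(firstPending, y - firstPending + 1);
            ++uploads;
            firstPending = y + 1;
            batchStart = now();              // start timing the next batch
        }
    }
    return uploads;
}
```

With a fast decoder this degenerates into very few uploads (possibly one), and with a slow one the display still advances roughly ten times per second, which is the trade-off the suggestion is after.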

Now that I think of it, if loading the image from disk takes 20 ms, then 3000 ms is way too much just to upload it, even one scanline at a time. Of course, I bet that your loading doesn’t really take just 20 ms; that is probably the time it takes to load the cached image from RAM (it got cached the first time you loaded it). But 3000 ms is too much anyway. You can try the following:
a) Time the actual OpenGL commands needed to load the image and see how much they actually cost.
b) Show us exactly what part of your code you have timed.
c) Best of all: read the man pages/info pages for an application called gprof (assuming you’re running Linux). You probably have it installed already. It won’t take long to learn the basics, and it’s invaluable in cases like this. It can show you which functions, or even which lines, cost the most in your program. Of course, its results should be interpreted with some care, but it will definitely be useful.
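For point a), one possible way to time each phase separately is a small accumulator like the one below. This is only a sketch: `PhaseTimer` and its methods are hypothetical names, and it uses `std::chrono` rather than anything from Qt. One caveat worth keeping in mind is that GL calls can return before the driver has finished the work, so ending the measured upload lambda with `glFinish()` is one way to make sure the cost is attributed to the right phase.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock time per named phase, so the totals for e.g.
// "read", "upload" and "redraw" can be compared once the loop is done.
class PhaseTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Runs fn() and adds its duration to the running total for `phase`.
    template <typename Fn>
    void measure(const std::string& phase, Fn&& fn) {
        assert(!phase.empty());
        const Clock::time_point start = Clock::now();
        fn();
        totals_[phase] += Clock::now() - start;
        ++calls_[phase];
    }

    // Total time spent in `phase`, in milliseconds (0 if never measured).
    double totalMs(const std::string& phase) const {
        auto it = totals_.find(phase);
        if (it == totals_.end()) return 0.0;
        return std::chrono::duration<double, std::milli>(it->second).count();
    }

    // Number of times `phase` was measured.
    int calls(const std::string& phase) const {
        auto it = calls_.find(phase);
        return it == calls_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, Clock::duration> totals_;
    std::map<std::string, int> calls_;
};
```

Inside the scanline loop this would be used as `timer.measure("upload", [&] { /* the glTexSubImage2D call */ });`, and after the loop `timer.totalMs("upload")` gives the summed cost without printing inside the loop, which itself takes time.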

  1. gprof: thanks, I’ll try it today.
  2. I timed the code I posted here, i.e. the for() loop { readscan(…); … };

Without glTexSubImage2D this loop takes 20 ms.

Which parameters should I pass to gprof?

P.S. My program: (profiling enabled)

The method that reads an image and binds it to GL is in ksquirrel/sq_glviewwidget.cpp (slotShowImage(image-path), the last method in the .cpp).

well you can call it like this:

gprof binary | less

or for line-by-line profiling:

gprof -l binary | less

of course you can pipe it to anything you like instead of less.

Although, if removing the glTexSubImage2D line reduced the loop time from 3000 ms to 20 ms, it’s rather obvious where your time is spent (better run it through gprof anyway, in case there’s something wrong with your timing code). Is there a good reason for wanting to display the image line by line? Unless the data comes over a slow network connection (in which case the glTexSubImage2D overhead will probably not be noticeable), I can’t see why this would be useful.

I am working on an image viewer, so I need to display images line by line. BTW, GQview and ACDSee display images line by line => I can interrupt the decoding of big images (1500x1200 etc.) if it’s too slow.

gprof: I didn’t find anything useful in gmon.out :confused:

Well, I wrote some timing code based on Qt’s QTime:

QTime time, total_begin, total_end;

total_begin = time = QTime::currentTime(Qt::LocalTime);
printf("Preparing from %2d:%2d:%2d ", time.minute(), time.second(), time.msec());

glTexImage2D(GL_TEXTURE_2D, 0, 4, realW, realH, 0, GL_RGBA, GL_UNSIGNED_BYTE, rgba);

time = QTime::currentTime(Qt::LocalTime);
printf(" to %-2d:%-2d:%-2d\n\n", time.minute(), time.second(), time.msec());

for(i = 0; i < finfo->h; i++)
{
	printf("Run #%-3d  ", i);

	// time spent reading one scanline
	time = QTime::currentTime(Qt::LocalTime);
	printf("Reading %2d:%2d:%-3d ", time.minute(), time.second(), time.msec());
	lib->fmt_read_scanline(finfo, scan);
	time = QTime::currentTime(Qt::LocalTime);
	printf("%2d:%2d:%-3d ", time.minute(), time.second(), time.msec());

	// time spent copying the scanline into the texture
	time = QTime::currentTime(Qt::LocalTime);
	printf("  Binding %2d:%2d:%-3d ", time.minute(), time.second(), time.msec());
	glTexSubImage2D(GL_TEXTURE_2D, 0, offsetX, offsetY+i, finfo->w, 1, GL_RGBA, GL_UNSIGNED_BYTE, scan);
	time = QTime::currentTime(Qt::LocalTime);
	printf("%2d:%2d:%-3d ", time.minute(), time.second(), time.msec());

	// time spent redrawing
	time = QTime::currentTime(Qt::LocalTime);
	printf("  Updating %2d:%2d:%-3d ", time.minute(), time.second(), time.msec());
	updateGL();
	total_end = time = QTime::currentTime(Qt::LocalTime);
	printf("%2d:%2d:%-3d\n", time.minute(), time.second(), time.msec());
}

printf("\nCycle takes: %2d:%2d:%-3d %2d:%2d:%-3d\n", total_begin.minute(), total_begin.second(), total_begin.msec(), total_end.minute(), total_end.second(), total_end.msec());

I got (with a 300x301 image):

Preparing from 14:57:489 to 14:57:502

Run #0 Reading 14:57:503 14:57:503 Binding 14:57:503 14:57:503 Updating 14:57:503 14:57:529
Run #1 Reading 14:57:530 14:57:530 Binding 14:57:530 14:57:536 Updating 14:57:537 14:57:537
Run #2 Reading 14:57:537 14:57:537 Binding 14:57:537 14:57:543 Updating 14:57:543 14:57:543
Run #3 Reading 14:57:544 14:57:544 Binding 14:57:544 14:57:550 Updating 14:57:550 14:57:550
Run #4 Reading 14:57:551 14:57:573 Binding 14:57:573 14:57:579 Updating 14:57:580 14:57:580
Run #5 Reading 14:57:580 14:57:581 Binding 14:57:581 14:57:586 Updating 14:57:586 14:57:586
Run #6 Reading 14:57:587 14:57:587 Binding 14:57:587 14:57:594 Updating 14:57:595 14:57:595
Run #7 Reading 14:57:595 14:57:596 Binding 14:57:596 14:57:601 Updating 14:57:601 14:57:601
Run #8 Reading 14:57:602 14:57:602 Binding 14:57:602 14:57:607 Updating 14:57:607 14:57:608
Run #9 Reading 14:57:608 14:57:609 Binding 14:57:609 14:57:616 Updating 14:57:616 14:57:616
Run #10 Reading 14:57:617 14:57:617 Binding 14:57:617 14:57:622 Updating 14:57:622 14:57:623
Run #11 Reading 14:57:633 14:57:633 Binding 14:57:633 14:57:640 Updating 14:57:640 14:57:640
Run #12 Reading 14:57:641 14:57:641 Binding 14:57:641 14:57:646 Updating 14:57:646 14:57:646
Run #13 Reading 14:57:647 14:57:647 Binding 14:57:647 14:57:653 Updating 14:57:653 14:57:653
Run #14 Reading 14:57:654 14:57:654 Binding 14:57:654 14:57:660 Updating 14:57:660 14:57:660
Run #15 Reading 14:57:661 14:57:661 Binding 14:57:661 14:57:667 Updating 14:57:667 14:57:667
Run #16 Reading 14:57:667 14:57:668 Binding 14:57:668 14:57:675 Updating 14:57:675 14:57:676
Run #17 Reading 14:57:676 14:57:677 Binding 14:57:677 14:57:682 Updating 14:57:682 14:57:682
Run #18 Reading 14:57:691 14:57:693 Binding 14:57:693 14:57:698 Updating 14:57:698 14:57:698

Run #286 Reading 14:59:933 14:59:933 Binding 14:59:933 14:59:939 Updating 14:59:939 14:59:939
Run #287 Reading 14:59:947 14:59:947 Binding 14:59:947 14:59:955 Updating 14:59:955 14:59:955
Run #288 Reading 14:59:956 14:59:956 Binding 14:59:956 14:59:962 Updating 14:59:962 14:59:962
Run #289 Reading 14:59:962 14:59:963 Binding 14:59:963 14:59:968 Updating 14:59:968 14:59:968
Run #290 Reading 14:59:969 14:59:969 Binding 14:59:969 14:59:974 Updating 14:59:974 14:59:975
Run #291 Reading 14:59:975 14:59:975 Binding 14:59:975 14:59:981 Updating 14:59:981 14:59:981
Run #292 Reading 14:59:982 14:59:982 Binding 14:59:982 14:59:987 Updating 14:59:987 14:59:987
Run #293 Reading 14:59:988 14:59:988 Binding 14:59:988 14:59:996 Updating 14:59:996 14:59:996
Run #294 Reading 15: 0:4 15: 0:4 Binding 15: 0:4 15: 0:10 Updating 15: 0:10 15: 0:10
Run #295 Reading 15: 0:11 15: 0:11 Binding 15: 0:11 15: 0:16 Updating 15: 0:16 15: 0:16
Run #296 Reading 15: 0:17 15: 0:17 Binding 15: 0:17 15: 0:23 Updating 15: 0:23 15: 0:23
Run #297 Reading 15: 0:23 15: 0:24 Binding 15: 0:24 15: 0:29 Updating 15: 0:29 15: 0:29
Run #298 Reading 15: 0:31 15: 0:31 Binding 15: 0:31 15: 0:37 Updating 15: 0:37 15: 0:38
Run #299 Reading 15: 0:38 15: 0:39 Binding 15: 0:39 15: 0:44 Updating 15: 0:44 15: 0:44
Run #300 Reading 15: 0:45 15: 0:46 Binding 15: 0:46 15: 0:52 Updating 15: 0:52 15: 0:52

Cycle takes: 22:41:868 22:44:417

As you can see, in every iteration reading takes 0-1 ms, glTexSubImage2D() takes 5-6 ms, and updating takes 0-1 ms. In total:

1*300 + 6*300 + 1*300 == 2400 msec, and the measured total (total_end - total_begin) == 2549 msec.
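For anyone who wants to check those two numbers, here is a small sketch of the arithmetic. The helper names are made up for illustration; the stamps are the minute:second:msec values from the "Cycle takes" line, and the per-iteration costs are the rounded 1 ms, 6 ms and 1 ms read off the log.

```cpp
#include <cassert>

// Converts a minute:second:millisecond stamp (the format printed in the log)
// into milliseconds since the start of the hour.
constexpr long toMs(int minute, int second, int msec) {
    return (minute * 60L + second) * 1000L + msec;
}

// Estimated loop time if every row costs the same read, upload and redraw time.
constexpr long estimateMs(int rows, int readMs, int uploadMs, int redrawMs) {
    return static_cast<long>(rows) * (readMs + uploadMs + redrawMs);
}

// Measured total: 22:41:868 -> 22:44:417.
static_assert(toMs(22, 44, 417) - toMs(22, 41, 868) == 2549, "measured total");

// Estimate: 300 rows at about 1 ms + 6 ms + 1 ms each.
static_assert(estimateMs(300, 1, 6, 1) == 2400, "per-row estimate");
```

The estimate and the measurement agree to within about 150 ms, and the upload term alone accounts for 1800 ms of the estimate.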

P.S. my video driver is, maybe it is driver’s fault ? Theoretically, glTexSubImage2D with height==1 behaves like memcpy, which is very fast ?

Not really. A lot has to be done for the texture to get subloaded (most of which I do not know about). For example:

  1. All the glTexSubImage2D parameters have to be checked for validity.
  2. Unpack operations must be performed on the data you specify.
  3. All this will probably be cached in system RAM but also uploaded to video RAM.

There’s probably a lot more, and some of this will be well optimized, but as you can see, each glTexSubImage2D call introduces a lot of overhead. That’s why I suggested you call it as rarely as possible (that is, uploading a 512x512 texture in one call will be much faster than uploading 512 scanlines of the texture in 512 calls, although the amount of data transferred is the same). So again, if your viewer doesn’t support loading from network locations, no one will ever notice whether the image is loaded line by line, and you could just as well load it in one go. Otherwise you could try to optimise that loop, but it’s going to be much slower than read/memcpy no matter how much optimisation you do.

That’s OK. Thanks. :slight_smile:
As I already asked, can you suggest something else besides GL? Can anything be very fast at line-by-line display?

And what about doing 16 or 8 lines at each update?
That way you still can abort at any time, and total time will be much shorter.
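A rough sketch of that batching idea follows. The function and callback names are hypothetical: `readRow` stands in for the decoder call, `uploadRows` for a single `glTexSubImage2D(GL_TEXTURE_2D, 0, offsetX, offsetY + y0, width, count, GL_RGBA, GL_UNSIGNED_BYTE, data)` followed by a redraw, and `shouldAbort` for whatever the viewer uses to cancel a load. RGBA with 4 bytes per pixel is assumed, as in the original snippet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Decodes `height` scanlines and hands them over in blocks of `batchRows`.
// Hypothetical callbacks:
//   readRow(dst)                 decodes the next scanline into dst (width * 4 bytes)
//   uploadRows(y0, count, data)  uploads rows [y0, y0 + count) in one call
//   shouldAbort()                checked between batches so the load can be cancelled
// Returns the number of rows that were uploaded.
int uploadInBatches(int width, int height, int batchRows,
                    const std::function<void(std::uint8_t*)>& readRow,
                    const std::function<void(int, int, const std::uint8_t*)>& uploadRows,
                    const std::function<bool()>& shouldAbort)
{
    assert(width > 0 && height >= 0 && batchRows > 0);
    const std::size_t rowBytes = static_cast<std::size_t>(width) * 4;
    std::vector<std::uint8_t> block(rowBytes * static_cast<std::size_t>(batchRows));

    int uploaded = 0;
    for (int y0 = 0; y0 < height && !shouldAbort(); y0 += batchRows) {
        // The last batch may be shorter than batchRows.
        const int count = std::min(batchRows, height - y0);
        for (int r = 0; r < count; ++r)
            readRow(block.data() + rowBytes * static_cast<std::size_t>(r));
        uploadRows(y0, count, block.data());
        uploaded += count;
    }
    return uploaded;
}
```

For the 300x301 image from the log, a batch size of 16 means 19 upload calls instead of 301. Whether the total time drops in the same proportion depends on how much of the roughly 6 ms per call is fixed overhead rather than data transfer, which the measurements in this thread do not settle.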

Just a wild guess, when I first read your post, I thought about vsync. Are you sure vsync is disabled ?

What you need is not fast displaying but fast manipulation of images. There’s no crossplatform hardware accelerated graphics library besides opengl that I know of. If incremental loading is of such importance (I still can’t understand why) the only thing I can think of is this:
You don’t need to manipulate images while loading them. So load the image using X/Qt; then, when the user selects some sort of real-time manipulation, load the whole image as a texture and use GL to render it for as long as the user manipulates it. After the user has decided how much to rotate/scale etc. the picture, perform the actual manipulation (or simply read back the GL framebuffer) and switch once more to X/Qt for viewing the (now static) image. This way you get the best of both worlds at the cost of extra complexity.

I think zen is right. Preload the image first, then manipulate it. What’s more, you’ll be able to do what you want with an image buffer in memory instead of doing everything in real time.

But do tell us the result, as it’s interesting.

ZBuffer: how can I check whether vsync is on or off?

Hmm, as I said, preloading the image is NOT good for the user, or for me: for example, I MUST wait while my program loads a 1600x1200 image (or bigger), and I can’t interrupt it.

Originally posted by CKulT:

Hmm, as I said, preloading the image is NOT good for the user, or for me: for example, I MUST wait while my program loads a 1600x1200 image (or bigger), and I can’t interrupt it.

But there are other ways, such as loading only part of the image to keep the loading time acceptable, or maybe using single buffering.

And indeed, I think a library dedicated to image manipulation would be best for you. Gtk+, and more specifically glib, may be what you’re looking for.

Hope this helps.