Linux fine, windows **** ?

jide · January 7, 2002, 2:50am

My program works fine under Linux (around 40 frames per second) and it’s fluid.
But under Windows, even though I get a better framerate, the rendering hangs many times per second.
As a result, you can see more than one car when accelerating or braking.

Studying this, I think it’s a timing problem.
Linux time has a resolution of 1/1000000 second, whereas Windows time has a resolution of 1/1000 second.
Could that cause incorrect calculations, so that the scene doesn’t look right?
Another possibility is the keyboard handling.
What’s the problem?

GPSnoopy · January 7, 2002, 5:37am

I bet you’re using GetTickCount() in your timer. Don’t use it, it’s not precise enough.

Use QueryPerformanceCounter() and QueryPerformanceFrequency() instead.

They are a lot more precise. The only problem is that they work with 64-bit integers, and if you don’t use them correctly you could lose the precision you gained.
Try to stay in int64 as long as possible for the math before putting the results into a double.
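
A minimal sketch of that advice in portable C, assuming the two counter readings and the frequency have already been fetched (on Windows they would come from the QuadPart member filled in by QueryPerformanceCounter() and QueryPerformanceFrequency()). The names elapsed_ms and elapsed_ms_lossy are hypothetical. The point is that the subtraction happens on 64-bit integers and only the small difference is converted to floating point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: elapsed milliseconds between two raw counter readings.
 * 'start' and 'end' are tick counts, 'freq' is ticks per second. */
double elapsed_ms(int64_t start, int64_t end, int64_t freq)
{
    assert(freq > 0);
    /* Subtract while still in 64-bit integers: the difference is exact. */
    int64_t ticks = end - start;
    /* Only the (small) difference is converted to floating point. */
    return (double)ticks * 1000.0 / (double)freq;
}

/* For comparison only: converting each reading to float first discards the
 * low bits of a large counter, so a short interval can collapse to 0. */
float elapsed_ms_lossy(int64_t start, int64_t end, int64_t freq)
{
    assert(freq > 0);
    return ((float)end - (float)start) * 1000.0f / (float)freq;
}
```

With a counter that has been running for a while, the two functions can disagree noticeably on a one-millisecond interval, which is the precision loss described above.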

imported_tfpsly · January 7, 2002, 11:55am

Do not forget that Linux uses an interpolated timer that is precise to 1 ms, whereas Windows uses the hardware clock.

dorbie · January 7, 2002, 5:17pm

Once again I see an excuse to post my old win32 timing code.

// Option to low-pass filter the high-res timer
//#define LOW_PASS_HIGH_RES

// Compute elapsed time in ms since the last call (call once per frame)
float DeltaTime()
{
    static int first = 1;
    static int count;
    static BOOL HighRes;
    static DWORD this_time, old_time, oelapsed, elapsed[20];
    static float oelapsedf, elapsedf[3], resolutionf;
    static LARGE_INTEGER old_count;
    LARGE_INTEGER pcount;

    if(first)
    {
        first = 0;
        // test for high-res timer and get its resolution in ticks per millisecond
        HighRes = QueryPerformanceFrequency(&pcount);
        if(HighRes)
        {
            resolutionf = (float) pcount.QuadPart / 1000.0f;
            // init low pass array to 60 Hz assumption
            for(count = 0; count < 3; count++)
            {
                oelapsedf = elapsedf[count] = 16.667f;
            }

            // init time to current value
            QueryPerformanceCounter(&old_count);
        }
        else
        {
            // init low pass array to 60 Hz assumption
            for(count = 0; count < 20; count++)
                oelapsed = elapsed[count] = 16;
            // init time to current value
            old_time = GetTickCount();
        }

        count = 0;
        return 16.667f;
    }
    else
    {
        if(HighRes)
        {
            // Use high-res timer to compute elapsed time in ms;
            // low-pass filtering to eliminate any jitter is optional
            QueryPerformanceCounter(&pcount);
            // Subtract the full 64-bit counts before converting to float,
            // so the float only has to hold a small difference.
            // Stick with old elapsed if the counter went backwards.
            if(!(pcount.QuadPart < old_count.QuadPart))
            {
                oelapsedf = elapsedf[count] =
                    (float) (pcount.QuadPart - old_count.QuadPart) / resolutionf;
            }
            else
            {
                elapsedf[count] = oelapsedf;
            }
            old_count = pcount;
#ifdef LOW_PASS_HIGH_RES // option to low-pass filter the high-res timer
            count++;
            if(count == 3)
                count = 0;
            return (elapsedf[0] + elapsedf[1] + elapsedf[2]) * .3333333f;
#else
            return (elapsedf[0]);
#endif
        }
        else
        {
            // Use low-res timer to compute elapsed time in ms.
            // Must low-pass over several frames since timer
            // resolution may be much coarser than 1 ms
            this_time = GetTickCount();
            // stick with old elapsed if wraparound detected
            if(!(this_time < old_time))
            {
                oelapsed = elapsed[count] = this_time - old_time;
            }
            else
            {
                elapsed[count] = oelapsed;
            }
            count++;
            if(count == 20)
                count = 0;
            old_time = this_time;
            return (elapsed[0] + elapsed[1] + elapsed[2] + elapsed[3] + elapsed[4] +
                    elapsed[5] + elapsed[6] + elapsed[7] + elapsed[8] + elapsed[9] +
                    elapsed[10] + elapsed[11] + elapsed[12] + elapsed[13] + elapsed[14] +
                    elapsed[15] + elapsed[16] + elapsed[17] + elapsed[18] + elapsed[19]) * .05f;
        }
    }
}

jide · January 7, 2002, 9:54pm

Sorry dorbie, I don’t understand anything of what you’re doing.

I don’t use clock per sec (around 19 ticks per sec). So I use ftime() under Windows with double variables and gettimeofday() under Linux with double variables.

The first is precise to 1/1000 sec (Windows);
the Linux one is precise to 1/1000000 sec.

Is that a good solution, or is your QueryPerformanceCounter() and QueryPerformanceFrequency() approach better?

And maybe it’s not a timer problem at all!

JD

jide · January 8, 2002, 12:56am

… and after switching from double to float for all variables (time, vertices…), everything is faster, but the hangs remain under Windows!!

Hmm… I use display lists with small or huge models, and it’s the same thing (only the framerate differs).

Would multi-threading get rid of this problem?
Could it be the keyboard function (under GLUT), or its implementation?
I remember that, before, the keyboard callback was in the main file and was implemented there. Now I have moved the implementation into a class method. I remember that, under Linux, this slowed down the rotation speed.

I don’t see any other explanation.

Please help, it’s a real nuisance to have this under Windows.

thanks a lot

JD

OldMan · January 8, 2002, 2:05am

I’m not sure, but GLUT has a limit on how frequently it calls your functions (that would explain the hangs). Did you try GLUT’s game mode? If I remember correctly, GLUT doesn’t call the display function at more than 30 FPS… (I really don’t remember the number, but I’m sure I read something about this).

jide · January 8, 2002, 3:38am

OldMan, I already have a framerate above 150 frames/sec, so I don’t think the GLUT display callback is limited enough to cause this behaviour under Windows.

It’s very strange because I remember it worked properly at the beginning (around 1 year ago now).

No, I haven’t tried GLUT’s game mode; I don’t know how to use it either. Help would be appreciated here.

cordially,

JD

GPSnoopy · January 8, 2002, 5:54am

jide, under Windows, use the PerformanceCounter.

Both are explained at the bottom of this page: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/time_4po3.asp

IMO dorbie’s code is way too complex for such a simple matter.
Also, including another code path for when the performance counter doesn’t work, as in dorbie’s code, is a bit useless IMO. All modern systems (since the Pentium) have a performance counter included.

PS: LARGE_INTEGER is a union whose QuadPart member is an __int64 under VC++.

[This message has been edited by GPSnoopy (edited 01-08-2002).]

imported_jwatte · January 8, 2002, 7:07am

QueryPerformanceCounter() is available on all PCI and better systems. However, it has a tendency to skip forward about 4 seconds every so often (Microsoft blames the chip sets, but all chip sets cause the same bug…)

Every CPU from the Pentium on has the RDTSC instruction, which returns a 64-bit integer which increments once per CPU cycle. Thus, if you know how fast your CPU is, you can get nanosecond resolution in timing (modulo instruction pipelining/scheduling). And calling RDTSC is much, much less overhead than calling timeGetTime() or QueryPerformanceCounter().

GPSnoopy · January 8, 2002, 8:05am

Isn’t QueryPerformanceCounter actually using RDTSC?

PS: the integer returned by RDTSC increments at a constant rate… but not necessarily on every CPU cycle. (Current CPUs increment it every cycle, but that might change, especially with increasing clock speeds.)

Elixer · January 8, 2002, 9:25am

Originally posted by jwatte:
QueryPerformanceCounter() is available on all PCI and better systems. However, it has a tendency to skip forward about 4 seconds every so often (Microsoft blames the chip sets, but all chip sets cause the same bug…)

Every CPU from the Pentium on has the RDTSC instruction, which returns a 64-bit integer which increments once per CPU cycle. Thus, if you know how fast your CPU is, you can get nanosecond resolution in timing (modulo instruction pipelining/scheduling). And calling RDTSC is much, much less overhead than calling timeGetTime() or QueryPerformanceCounter().

jwatte, where did you find that it skips ~4 secs every so often? That would explain a situation I had last year, and I thought it was because I had an error someplace… I would think they would mention this in the MSDN docs?

imported_jwatte · January 8, 2002, 10:41am

Elixer,

The bug is documented in the knowledge base on the web MSDN site.

GPSnoopy,

No, QPC does not use RDTSC (I was also under that mis-impression for a long time).

The ia32 architecture definition (instruction reference) explicitly says that the processor increments the time stamp counter every clock cycle, and resets it to 0 whenever the processor is reset. I take this as an iron-clad guarantee that there is a 1:1 relation between clock cycles and RDTSC ticks. Of course, how many instructions/u-ops can actually get executed in a clock cycle may change between CPUs.

The trick is figuring out what your CPU speed actually is; especially when you’re on SpeedStep and it might change on the fly. I use one of the other timers as a reference now and then to measure CPU speed, and re-sync my CPU speed estimate. Works well.
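
A rough sketch of that re-sync idea in portable C. It assumes the cycle-counter readings (for example from RDTSC) and the elapsed time reported by a slower reference timer are obtained elsewhere; the type and function names here are hypothetical and only the arithmetic is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical calibration state: current estimate of CPU cycles per second. */
typedef struct {
    double cycles_per_sec;
} TscCalibration;

/* Re-sync the estimate from a pair of cycle-counter readings and the time
 * that a slower but trusted reference timer says elapsed between them. */
void tsc_resync(TscCalibration *cal, uint64_t tsc_start, uint64_t tsc_end,
                double ref_seconds)
{
    assert(cal != 0 && ref_seconds > 0.0 && tsc_end >= tsc_start);
    /* Integer subtraction first, then one conversion to double. */
    cal->cycles_per_sec = (double)(tsc_end - tsc_start) / ref_seconds;
}

/* Convert a cycle-counter difference to seconds using the current estimate. */
double tsc_to_seconds(const TscCalibration *cal, uint64_t tsc_start,
                      uint64_t tsc_end)
{
    assert(cal != 0 && cal->cycles_per_sec > 0.0 && tsc_end >= tsc_start);
    return (double)(tsc_end - tsc_start) / cal->cycles_per_sec;
}
```

Calling tsc_resync() every so often with a longer reference interval keeps the estimate usable when the clock speed changes, at the cost of a short period of wrong readings right after each change.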

dorbie · January 8, 2002, 1:56pm

It tries to compute the delta time in milliseconds between the last frame and this frame. It doesn’t round to 1000ths of a second; it gives you a fractional result in milliseconds. Feel free to alter the scale of the return value.

It doesn’t need a double, my computer isn’t that fast and neither is yours. A deltatime float in milliseconds is enough for anyone with a PC.

There are two methods used: one uses a high performance counter, the other doesn’t. It will only fall back on the slower counter if it can’t find the better one.

There is also the option to average the result over several frames to avoid jitter. This also helps with a slower counter, because the result is a reasonably accurate average time based on several frames, so it won’t get rounded as much by the slower timer option.

Beyond this you don’t really need to understand it; if you throw it in your code, it will do what you need.

jide · January 8, 2002, 10:25pm

dorbie, I would like to understand before putting anything in my code.
In any case, my current time function (it’s a method) is smaller than yours. However, if it’s better to use the performance counter, I will use it, especially if I get nanosecond precision as under Linux.
The problem is that it will change almost all of my clock class: I use double (or float) values, and now I have to use __int64 alongside them.

In any case, I tried many times to measure how long my system takes to count
to 1 000 000 000 in an empty loop (for( i=0; i<1000000000; i++) for example).
On my old system (AMD K6-2 300MHz) the measured rate was about 275MHz, and now, on an Athlon XP 1600+, it’s about 690MHz. I was astonished!!! Can anyone explain it?
I hope your code will help (under Windows), but now I must find something better under Linux (it seems).

thank you all

JD

dorbie · January 8, 2002, 10:44pm

I admire and agree with your position, but would have expected you to examine the code. I was a bit surprised it attracted so many comments for something so simple.

I’m not sure I understand the rest of your post, but here goes. I think it’s somewhat naive to use a loop like the one you have to measure performance. Different compilers could optimize it differently, including unrolling it. It might even be possible to optimize it to the equivalent of:

i=1000000000;

In addition, the ability to pipeline these instructions and the dependency of the loop on the previous iteration’s result would affect the performance severely. You also have a branch which will easily be predicted, but it’ll still block on the increment.

Basically this is a VERY bad way to try and measure performance. Clock is not the whole picture and the ability to promote instructions is heavily dependent on the suitability of the code for pipelining and the availability of instructions and data being used, and the suitability of the instructions to be run on the multiple instruction units on the processor.
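
To illustrate the optimization point, here is a small sketch in portable C, with hypothetical function names. Declaring the counter volatile forces the compiler to perform every increment instead of collapsing the loop into a single assignment. Even then, the figure it produces is the rate of one dependent chain of increments, not the clock speed of the CPU.

```c
#include <assert.h>
#include <time.h>

/* Count to n with a volatile counter so the compiler cannot remove the loop. */
unsigned long count_to(unsigned long n)
{
    volatile unsigned long i = 0;
    assert(n > 0);
    while (i < n)
        i++;
    return i;
}

/* Hypothetical helper: increments per second of CPU time for the loop above. */
double increments_per_second(unsigned long n)
{
    clock_t start = clock();
    unsigned long done = count_to(n);
    clock_t stop = clock();
    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    /* A very short run can measure as 0 ticks; report 0 rather than divide by it. */
    return seconds > 0.0 ? (double)done / seconds : 0.0;
}
```

Compiling the same loop with and without volatile, and at different optimization levels, typically gives very different numbers on the same machine, which is why the result says little about the processor itself.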

jide · January 9, 2002, 12:17am

Yes, but that’s a simple command to execute!

OK for your code. It seems to need the CPU frequency to work properly. But in general we don’t have the exact CPU speed, so maybe the time would lose accuracy (no?).

// init low pass array to 60 Hz assumption
for(count = 0; count < 3; count++)
{
    oelapsedf = elapsedf[count] = 16.667f;
}

This (16.667f) is not correct: using 16.6666666666666… would be better, wouldn’t it?

Do you get real time here?
Anyway, I will test it under Windows. Under Linux, on the other hand, I don’t have such a method, and must stick with the basic gettimeofday() call.
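
For the Linux side, a minimal sketch of a per-frame delta built on gettimeofday(), which is a POSIX call. The helper names are hypothetical. As with the Windows counter, the seconds and microseconds are subtracted as integers before anything becomes a double. Note that gettimeofday() reports wall-clock time, so an adjustment of the system clock would show up as a jump.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: milliseconds between two gettimeofday() readings.
 * Seconds and microseconds are subtracted as integers before any
 * conversion to double. */
double timeval_delta_ms(const struct timeval *start, const struct timeval *end)
{
    long sec, usec;
    assert(start != NULL && end != NULL);
    sec  = (long)(end->tv_sec - start->tv_sec);
    usec = (long)(end->tv_usec - start->tv_usec);   /* may be negative */
    return (double)sec * 1000.0 + (double)usec / 1000.0;
}

/* Hypothetical per-frame wrapper: returns ms since the previous call,
 * or 0 on the first call. */
double frame_delta_ms(void)
{
    static struct timeval last;
    static int have_last = 0;
    struct timeval now;
    double delta = 0.0;

    gettimeofday(&now, NULL);
    if (have_last)
        delta = timeval_delta_ms(&last, &now);
    last = now;
    have_last = 1;
    return delta;
}
```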

You may find your code simple, but not everyone could have done that so easily (I didn’t know how). MSDN didn’t show me this approach when I searched for time, chrono, clock. So I had to use ftime().

Thanks a lot, I will tell you (soon, I hope) how Windows behaves with it.

Ah, sorry, I forgot.
Yesterday, I tried to run my demo on a friend’s computer (under Windows of course).
He has a Celeron 433. The demo seemed to hang less than on my computer.
Maybe I don’t have the correct drivers? (That’s another possible explanation for my problem.)

JD

dorbie · January 9, 2002, 1:15am

jide,

It isn’t simple to execute when the processor is designed to simultaneously work on several instructions but must wait on the result of this instruction before proceeding to the next. You can assume it should be simple and continue to be shocked and surprised at the result or you can accept the explanation you asked for.

As for the 16.667, it is a gross assumption for the first time through the loop. What you are complaining about is an error of about 3 ten-millionths of a second in a piece of code which is GUESSING what the frame rate is likely to be. At this stage the frame rate could be anything; it’s just a filler which is better than zero. As for the rounding, it’s just as likely that the 60Hz video clock has more of an error in it than that number, which is in milliseconds. It is also more likely that the graphics is running at 30Hz or 100Hz and the guess is out by large amounts for the first two frames.
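
The size of that rounding error can be checked with a few lines of C. This is only a sketch of the arithmetic, with hypothetical function names: it compares the 16.667 ms guess with an exact 60 Hz period, and gives the frame period for any other refresh rate for comparison.

```c
#include <assert.h>

/* Difference, in seconds, between the 16.667 ms guess and an exact 60 Hz period. */
double guess_error_seconds(void)
{
    const double guess_ms = 16.667;          /* value used as the initial guess */
    const double exact_ms = 1000.0 / 60.0;   /* 16.6666... ms */
    return (guess_ms - exact_ms) / 1000.0;   /* ms -> s */
}

/* Frame period in milliseconds for a given refresh rate in Hz. */
double frame_period_ms(double hz)
{
    assert(hz > 0.0);
    return 1000.0 / hz;
}
```

The rounding error comes out at roughly a third of a microsecond, whereas a display running at 100 Hz has a 10 ms period, so the initial guess would be off by several milliseconds there.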

jide · January 9, 2002, 2:45am

OK,

I think I accept your explanations.

If I understood your second part correctly, 16.667 is just a starting value for the first calls, before higher or lower rates are measured, I think.
Now, in your code, I didn’t understand the low-res part (maybe it averages the time values?).

So, your code gives the elapsed time between two calls in milliseconds and is much better than ftime(). All right!

JD

dorbie · January 9, 2002, 3:17am

The averaging is there to smooth out any noise or jitter over several frames. In the case of the high res timer it’s optional.

In graphics a timer like this typically measures the time taken for the last frame and uses this to animate for the next frame. That can have undesirable effects, especially with load balancing, so averaging can help a bit.

With the low res timer you don’t have enough resolution for the kind of measurements I was making, so averaging helps you extract a reasonably high resolution result from a low resolution timer if you’re in a loop.
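
The averaging can be sketched on its own as a small ring buffer in portable C. The type and function names are hypothetical; the window of 20 frames and the idea of pre-filling it with a guess follow the low res path of the posted code.

```c
#include <assert.h>

#define AVG_FRAMES 20  /* window length; 20 matches the low res path */

/* Hypothetical ring buffer of the most recent frame times, in milliseconds. */
typedef struct {
    float samples[AVG_FRAMES];
    int next;   /* slot that the next sample overwrites */
} FrameAverager;

/* Fill the window with an initial guess so early averages are not zero. */
void averager_init(FrameAverager *a, float guess_ms)
{
    int i;
    assert(a != 0);
    for (i = 0; i < AVG_FRAMES; i++)
        a->samples[i] = guess_ms;
    a->next = 0;
}

/* Store one measured frame time and return the mean over the window. */
float averager_push(FrameAverager *a, float elapsed_ms)
{
    float sum = 0.0f;
    int i;
    assert(a != 0);
    a->samples[a->next] = elapsed_ms;
    a->next = (a->next + 1) % AVG_FRAMES;
    for (i = 0; i < AVG_FRAMES; i++)
        sum += a->samples[i];
    return sum / AVG_FRAMES;
}
```

A coarse timer might report 10, 20, 20, 10, 20, 20, … for frames that really take about 16.7 ms; the mean over the window lands close to the true figure even though no single reading does.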

You can ignore the low res stuff, I think all PCs have the higher resolution timer now.

You also don’t want to average the high resolution timer because of the nature of your measurement. It was a reasonable option for me because I was in a rendering loop with consistent frame times, you are not.

You may even want to just look at how I use the QueryPerformanceCounter call and take your timings directly from that call.