running an 8X agp card on a 4X port... how much improvement would an 8X port see? -nt-

opengl_enquirer · September 30, 2003, 9:33am

no text

Ostsol · September 30, 2003, 11:48am

Not a programming question, but. . . The only performance difference will come when the AGP bus is significantly stressed for bandwidth. Using functionality that constantly sends large amounts of data across the AGP bus will benefit from an 8x AGP bus. An optimized render path that stores as much as possible in video memory and only sends small amounts of data to the video card during runtime will not see any significant performance boost.

opengl_enquirer · September 30, 2003, 7:09pm

i appreciate it, i will think about this asap.

michael

imported_jwatte · September 30, 2003, 8:42pm

Note that AGP 4x is the same speed as 133 MHz RAM; AGP 8x is the same speed as “266 MHz” DDR RAM. Depending on what memory you have in your machine, you may or may not get much improvement.

Also, if you run AGP close to its peak, then you’re likely starving your CPU for memory bandwidth, too.

This is why dual-DDR-400 makes a lot of sense (Until 16x PCI-Express comes around, of course)
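The comparison above can be checked with nominal peak figures. The sketch below assumes the usual textbook numbers (a 32-bit AGP bus on a 66.67 MHz base clock, a 64-bit memory channel); the function names are made up for illustration, and real sustained throughput will be lower than these peaks.

```c
/* Nominal peak bandwidth in MB/s. All figures are theoretical peaks,
 * not measurements. Function names are hypothetical. */

/* AGP: 32-bit (4-byte) bus, 66.67 MHz base clock, `multiplier` transfers per clock. */
static double agp_peak_mb_s(int multiplier)
{
    const double base_clock_mhz = 200.0 / 3.0;   /* 66.67 MHz */
    return base_clock_mhz * multiplier * 4.0;
}

/* SDRAM: 64-bit (8-byte) channel. transfers_per_clock is 1 for SDR, 2 for DDR. */
static double sdram_peak_mb_s(double clock_mhz, int transfers_per_clock, int channels)
{
    return clock_mhz * transfers_per_clock * 8.0 * channels;
}
```

With these assumptions, `agp_peak_mb_s(4)` and `sdram_peak_mb_s(400.0 / 3.0, 1, 1)` both come out near 1066 MB/s, AGP 8x and single-channel DDR-266 both come out near 2133 MB/s, and dual-channel DDR-400 reaches 6400 MB/s, which leaves headroom for the CPU while the AGP port is busy.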

MZ1 · October 1, 2003, 12:13am

In the case of the CPU writing data to video memory (like vertex data to VAR (NV_vertex_array_range) allocated memory), I noticed AGP speed mattered only when Fast Writes were also enabled. The peak bandwidth increase was pretty near the nominal x1, x2, x4, according to the mode set. I don’t know exactly how the HW works; this experience comes from a simple “synthetic” test. I guess this would still apply to VBO (ARB_vertex_buffer_object) memory, should the driver generously allocate it in the video memory pool (not the AGP mem).

Note that some card generations or some motherboards (may depend on bios version) may not support the Fast Writes. For example GF2 did support it, but GF3/4 didn’t. Funny, now in GFFX it is back :eek: . I think new Radeons support FW too.

“Also, if you run AGP close to its peak, then you’re likely starving your CPU for memory bandwidth, too.”
Unless, of course, you don’t simply copy memory in 1-1 fashion, but decompress vertices somehow, or generate them procedurally (curved surfaces, for example)

[This message has been edited by MZ (edited 10-01-2003).]

imported_jwatte · October 1, 2003, 10:15am

I’m assuming the CPU has some other duties than just shuffling vertices around. Even if it’s shuffling vertices, that’s bad, because it effectively halves your possible throughput; DRAM set-up latency to switch between reads and writes will eat your time even if you de-compress vertices.

The only case where a CPU won’t compete with the AGP for memory bandwidth is when you’re procedurally generating vertices AND your CPU loop fits in L2 cache, or if your DRAM has higher bandwidth than the AGP port itself.

opengl_enquirer · October 1, 2003, 11:07am

so what is the difference between video memory and agp memory? and is texture memory shared between the two, or either or, or is texture memory wholly independent? for instance, if your system doesn’t use too much texture memory, could that memory be used for storing vertex data in the hardware vertex memory? i think i realize that with the “vertex buffer object” extension these distinctions for non texture memory are moot, as i understand it.

i mostly do simulation, or that is at least what i pride myself on. but naturally i use opengl for visualization, and in this last week and for the present i’m looking into optimizing rendering.

in my work i typically utilize a mixture of static and completely dynamic geometry. would extra agp memory bandwidth help in the case of dynamic geometry stored on the card which needs to be frequently updated?

and also are display lists a solution for static geometry, or can that memory be accessed in another fashion? i realize display lists work better with smaller chunks of geometry. the static geometry i use is well partitioned for collision simulation. would i see significant improvement if i broke the static geometry into pieces, each an ideal sized triangle strip, and rendered them with display lists? or would this all be better handled through the “vertex buffer object” method?

this is off topic but rather than opening a new subject i will tack it on to here first. this morning i’m adding frustum culling. i’ve built the frustum and i’m considering how best to cull. my bounding boxes are major axis aligned in local space. right now i’m considering transforming the frustum into that space for each bounding box for simple culling. but i’m wondering if there might be a simpler method. eventually i will also have to clip arbitrary hulls to the frustum somehow, in the case of the dynamic geometry. anyone with experience in this matter, i’m interested in optimal suggestions.
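The approach described here (bring the frustum into the box's local space, then test the axis-aligned box) can be sketched roughly as below. This is only an illustration under stated assumptions: planes are stored as (a, b, c, d) with a*x + b*y + c*z + d >= 0 meaning "inside", the matrix is a column-major local-to-world matrix as OpenGL lays it out, and the type and function names (`Plane`, `Aabb`, `plane_to_local`, `aabb_maybe_visible`) are hypothetical.

```c
#include <stddef.h>

typedef struct { float a, b, c, d; } Plane;       /* a*x + b*y + c*z + d >= 0 is inside */
typedef struct { float min[3], max[3]; } Aabb;    /* axis-aligned box in local space */

/* Move a world-space plane into local space. For points mapped by
 * x_world = M * x_local, the plane transforms as p_local = M^T * p_world.
 * m is the column-major local-to-world matrix, so m[i*4 + j] is row j, column i. */
static Plane plane_to_local(const float m[16], Plane w)
{
    const float p[4] = { w.a, w.b, w.c, w.d };
    float r[4];
    for (int i = 0; i < 4; ++i)
        r[i] = m[i*4 + 0]*p[0] + m[i*4 + 1]*p[1] + m[i*4 + 2]*p[2] + m[i*4 + 3]*p[3];
    Plane out = { r[0], r[1], r[2], r[3] };
    return out;
}

/* Returns 0 when the box lies entirely behind at least one plane,
 * 1 otherwise (inside or intersecting). Only the box corner farthest
 * along each plane normal needs to be tested. */
static int aabb_maybe_visible(const Aabb *box, const Plane *planes, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        const float n[3] = { planes[i].a, planes[i].b, planes[i].c };
        float dist = planes[i].d;
        for (int k = 0; k < 3; ++k)
            dist += n[k] * (n[k] >= 0.0f ? box->max[k] : box->min[k]);
        if (dist < 0.0f)
            return 0;
    }
    return 1;
}
```

The test is conservative: a box near a frustum corner can be reported as visible even though it is outside, which is usually acceptable for culling. Transforming six planes per box costs six small matrix products, which tends to be cheaper than transforming the eight box corners into world space.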

sincerely,

michael

Ostsol · October 1, 2003, 11:17am

Video memory is on the video card itself. AGP memory is a portion of system memory that is set aside as extra memory that the video card may use if there is not enough on the card itself. Textures and vertices are generally stored in video memory unless there is not enough room, in which case they are stored in AGP memory. Display lists may also be stored in video memory, depending on the implementation.

opengl_enquirer · October 1, 2003, 12:25pm

that makes sense… i presume everything is handled transparently. then video memory is the 64/128/etc mb associated with the graphics card. i presume there is no distinction then for texture memory. i only say this because in the past i recall instances of the term “texture memory” which implies distinction.

i’ve noticed that cg distinguishes between half precision floats and other types. is it possible to load a float array into video memory with half precision?

i’m reading this example.

        // Define arrays
        BindBufferARB(ARRAY_BUFFER_ARB, 1);
        VertexPointer(4, FLOAT, 0, BUFFER_OFFSET(0));
        ColorPointer(4, UNSIGNED_BYTE, 0, BUFFER_OFFSET(256));

        // Enable arrays
        EnableClientState(VERTEX_ARRAY);
        EnableClientState(COLOR_ARRAY);

        // Draw arrays
        DrawArrays(TRIANGLE_STRIP, 0, 16);

is it possible or problematic to mix arrays? for instance, if in this example the color pointer was specified when the buffer object was not bound, could the two be effectively mixed, or must all elements be associated with the buffer object?
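As a side note on the quoted example, the color offset of 256 is consistent with 16 vertices of 4 floats each being stored first in the buffer. A small sketch of that arithmetic, assuming 4-byte GL floats and 1-byte unsigned bytes (the helper names are made up for illustration):

```c
#include <stddef.h>

/* Bytes occupied by one tightly packed array (stride 0 in the GL calls). */
static size_t array_bytes(size_t vertex_count, size_t components, size_t component_size)
{
    return vertex_count * components * component_size;
}

/* Offset of a color array stored directly after a 4-float position array. */
static size_t color_offset(size_t vertex_count)
{
    return array_bytes(vertex_count, 4, 4);   /* 4 components, 4 bytes per float */
}

/* Total buffer size: positions (4 floats) followed by colors (4 unsigned bytes). */
static size_t buffer_bytes(size_t vertex_count)
{
    return color_offset(vertex_count) + array_bytes(vertex_count, 4, 1);
}
```

For the 16 vertices drawn in the example, `color_offset(16)` gives 256 and `buffer_bytes(16)` gives 320, so both arrays would live back to back in the single buffer object.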

michael

Ostsol · October 1, 2003, 12:48pm

Well, ideally it should be transparent. Ideally the driver should toss data into AGP memory if there is insufficient video memory. Whether this gets done or not depends on who’s writing the video drivers.

Texture memory is just whatever memory the video card has available to it. Perhaps in the past only textures were stored in video memory, but now that memory can store a variety of data.

You cannot mix VBO arrays with standard vertex arrays in the same draw call. The only way to use standard arrays after using VBO is to bind vertex buffer number zero, but that cancels any further drawing with VBO until you bind a valid buffer. You can, however, bind multiple buffers and use them all in a single draw call. You could, for example, put vertex coordinates in one buffer and colour in another.

opengl_enquirer · October 1, 2003, 1:44pm

surely there must be some benefit from using contiguous arrays as the functionality exists and appears to be advocated by the VBO specifications.

/* //remark this block
multiple buffers would mean binding each buffer and assigning each associated pointer. then when draw elements is called does it not work on the currently bound buffer.
*/

/edit: i still don’t follow MapBuffer() but don’t express the same sentiment as below. i think MapBuffer is used when memory is not to be uploaded to the hardware but i’m probably wrong. i’m currently implementing VBO with vertices and normals and triangle strip indices for the static geometry. i calculate i should have plenty of graphics memory to spare so no worries there for now/

i imagine MapBuffer() must be used but i admit i do not really follow its functionality. in all cases of its use in the specifications, such as the one quoted below, it simply returns a pointer which i imagine maps to the hardware’s memory, but the examples never show what is done with the pointer and rather “say” ambiguously “// Fill buffers…” each time.

…

Mapping multiple buffers simultaneously:

    // Map buffers
    BindBufferARB(ARRAY_BUFFER_ARB, 1);
    float *a = MapBufferARB(ARRAY_BUFFER_ARB, WRITE_ONLY_ARB);
    BindBufferARB(ARRAY_BUFFER_ARB, 2);
    float *b = MapBufferARB(ARRAY_BUFFER_ARB, WRITE_ONLY_ARB);

    // Fill buffers
    ...

    // Unmap buffers
    BindBufferARB(ARRAY_BUFFER_ARB, 1);
    if (!UnmapBufferARB(ARRAY_BUFFER_ARB)) {
        // Handle error case
    }
    BindBufferARB(ARRAY_BUFFER_ARB, 2);
    if (!UnmapBufferARB(ARRAY_BUFFER_ARB)) {
        // Handle error case
    }

…
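For what it's worth, the "Fill buffers" step in examples like this is ordinary writing through the returned pointer: between the map and unmap calls the pointer behaves like normal memory, and it should not be used once the buffer is unmapped. The sketch below only illustrates that idea and is not the spec's code. `pack_two_arrays` is a hypothetical helper, and `dst` stands in for a pointer that would come from a map call in a real program; here it can be any writable block, so the snippet does not need a GL context.

```c
#include <stddef.h>
#include <string.h>

/* Copy two separate source arrays back to back into one destination region.
 * dst is assumed to have room for first_bytes + second_bytes.
 * Returns the byte offset at which the second array starts, which is the
 * value a later pointer setup call would use as its buffer offset. */
static size_t pack_two_arrays(void *dst,
                              const void *first, size_t first_bytes,
                              const void *second, size_t second_bytes)
{
    unsigned char *out = (unsigned char *)dst;
    memcpy(out, first, first_bytes);                  /* e.g. positions */
    memcpy(out + first_bytes, second, second_bytes);  /* e.g. colors */
    return first_bytes;
}
```

Since a buffer mapped write-only should only be written, the helper never reads from `dst`. Positions and colors kept in separate arrays in main memory can be placed in one contiguous buffer this way, with the returned offset locating the second array.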

why is it not possible to upload values to contiguous graphics memory blocks from separate primary memory pointers? such functionality, it would seem, would prove invaluable.

michael

[This message has been edited by opengl_enquirer (edited 10-01-2003).]