ATI, glMapBuffer and memcpy incompatible?


I'm reading back from a GL_STATIC_DRAW VBO (bound as element buffer) using glMapBuffer(GL_ELEMENT_ARRAY_BUFFER_ARB, GL_READ_ONLY_ARB) and memcpy. Sometimes this copy fails and some of the elements in the destination buffer are faulty!
Now some interesting things:

  • It's always a range of 16 16-bit elements, starting with element 1008 (buffer offset 2016) and going through 1023 (thus covering a range of 32 bytes).
  • If you do the same memcpy call a second time, the destination buffer will contain the correct data.
  • The destination buffer seems to be simply not written in the affected range. If you have just allocated the buffer, in debug mode it will contain 0xbaad and 0xf00d (I think you recognize them) as indices.
  • doing the copy with a manual loop:
for (int i=0; i<m_num_indices; ++i)

circumvents the problem (??)
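For reference, a minimal sketch of what such an element-by-element copy could look like. The helper name copy_indices_elementwise and its parameters are made up for illustration; `mapped` stands in for the pointer returned by glMapBuffer. Reading through a volatile pointer is my own assumption here, meant to keep the compiler from turning the loop back into a memcpy call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): copies 16-bit indices
// one element at a time instead of calling memcpy.
// 'mapped' stands in for the pointer returned by glMapBuffer.
inline void copy_indices_elementwise(const void* mapped,
                                     std::uint16_t* dst,
                                     std::size_t num_indices)
{
    // volatile reads: assumption, so the optimizer keeps one load per element
    const volatile std::uint16_t* src =
        static_cast<const volatile std::uint16_t*>(mapped);

    for (std::size_t i = 0; i < num_indices; ++i)
        dst[i] = src[i];
}
```

In the real code the call would sit between glMapBuffer and glUnmapBuffer, with m_num_indices passed as num_indices.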

A year or two ago I suffered from a similar problem, but related to reading back vertex data from a VBO. It got fixed by a driver update.

I know that I generally should not read back data from the GPU… but I'm doing it and it is just expected to work.

Can anybody else confirm this problem?

thanks in advance.

PS: my specs:

  • xpertVision Radeon X850XT (256MB)
  • WinXP SP2
  • Abit NF7-S2 (nForce2 chipset) board
  • 1GB RAM, 256MB AGP aperture size, Fastwrites enabled
  • Visual Studio 2005 Express

memcpy is optimized to assume system memory.

Note that 2016 = 2048-32? It’s only the last cache-line that fails for you (assuming 32-byte cache lines), or a lingering “external bus” transaction (IIRC AGP is 32 bytes, I don’t know about PCI[eExX]).
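To spell out the arithmetic, here is a small sketch. The 32-byte line size is the assumption from above, and the constant and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kIndexSize = 2;   // 16-bit indices, as in the question
constexpr std::size_t kLineSize  = 32;  // assumed cache-line / bus-transaction size

// Byte offset of a given index element inside the buffer.
constexpr std::size_t byte_offset(std::size_t element)
{
    return element * kIndexSize;
}

// Which 32-byte line a given byte offset falls into.
constexpr std::size_t line_number(std::size_t byte)
{
    return byte / kLineSize;
}

// Element 1008 starts at byte 2016, element 1023 ends at byte 2048,
// and both sit in the same 32-byte line (line 63).
static_assert(byte_offset(1008) == 2016, "first faulty element");
static_assert(byte_offset(1023) + kIndexSize == 2048, "end of faulty range");
static_assert(line_number(byte_offset(1008)) == line_number(byte_offset(1023)),
              "whole faulty range lies in one line");
```

So the faulty range is exactly one aligned 32-byte line, the last one below the 2048-byte mark.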

This reminds me of a Linux/GLX (?) problem some years ago, where on AMD CPUs some MTRR setup was wrong (IIRC). This case could be similar.

The src memory is presumably cacheable (a reasonable expectation, else performance would suffer a LOT), but for some reason the last cache line (or bus transaction) worth of memory isn’t transferred when you expect it to be. I can only speculate about what the reasons for this could be, but I do have some suggestions for workarounds:

  1. After the memcpy, do a “manual” copy of just the last CPU-register-wide word (i.e. 32 bits on a 32-bit CPU, 64 bits on a 64-bit CPU) to force this last cache-line transaction to complete. This matters especially if you unmap the buffer right after the memcpy.

  2. Page-align your destination buffer too, if possible (I here assume the src buffer already is page aligned).
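A rough sketch of the first workaround, in case it helps. The helper name copy_with_tail_touch is made up, `mapped` stands in for the pointer from glMapBuffer, and whether the extra reads really force the pending transaction is pure speculation on my part. The tail is copied byte by byte through a volatile pointer, which covers the last register-wide word without running into alignment issues.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: memcpy, then manually re-copy the last
// register-wide word (4 bytes on 32-bit, 8 bytes on 64-bit builds).
inline void copy_with_tail_touch(const void* mapped, void* dst, std::size_t bytes)
{
    std::memcpy(dst, mapped, bytes);

    const std::size_t word = sizeof(std::uintptr_t);
    if (bytes < word)
        return;  // nothing sensible to re-copy

    const std::size_t tail = bytes - word;
    // volatile reads: assumption, so the compiler cannot drop the second pass
    const volatile unsigned char* s =
        static_cast<const volatile unsigned char*>(mapped) + tail;
    unsigned char* d = static_cast<unsigned char*>(dst) + tail;

    for (std::size_t i = 0; i < word; ++i)
        d[i] = s[i];
}
```

Call it before unmapping the buffer. For the second suggestion, on MSVC `_aligned_malloc(bytes, 4096)` paired with `_aligned_free` is one way to get a page-aligned destination, assuming 4 KB pages.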

Theoretically the problem could be that the CPU, chipset, ICD or even driver for some reason hasn’t committed the last 32-byte chunk to the card, even if that sounds less likely given your problem description. If so, I think it’s a logic error: there is a check “does this 32-byte write cross a boundary” where the check really should be “does this 32-byte write fully fill up the 32-byte something”.

P.S. If possible, test turning off fast-writes.