update from VC6.0 to vs2003 or vs2005?

std::string is meant to be a common exchange structure for strings, as a direct replacement for char*. Extra functionality like printf should be added by deriving from it or using the algorithms, I’ve always assumed.
Humus, why would there be a compare for every character in the string when there are block copy operations in CPUs to be used for things like this? I remember looking at the assembly of VC6’s strlen function and it uses some fancy tricks to speed it up, like comparing a dword at a time.
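
For anyone wondering what "comparing a dword at a time" can look like, here is a rough sketch. It is not VC6's actual strlen; the names has_zero_byte and bounded_strlen_dword are made up for illustration, and the function takes an explicit capacity (my assumption) so it never reads past the buffer it was given.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if any of the four bytes in v is zero (a classic bit trick).
inline bool has_zero_byte(std::uint32_t v) {
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

// Hypothetical helper: length of the string in buf, scanning at most
// capacity bytes and testing four bytes per iteration where possible.
std::size_t bounded_strlen_dword(const char* buf, std::size_t capacity) {
    std::size_t i = 0;
    while (i + 4 <= capacity) {
        std::uint32_t word;
        std::memcpy(&word, buf + i, 4);   // portable unaligned 4-byte load
        if (has_zero_byte(word)) break;   // the terminator is inside this word
        i += 4;
    }
    // Finish byte by byte: locate the terminator or reach capacity.
    while (i < capacity && buf[i] != '\0') ++i;
    return i;
}
```

The dword loop only answers "is there a zero somewhere in these four bytes"; the byte loop at the end pins down the exact position.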

Originally posted by Zulfiqar Malik:
The biggest problem (among other smaller ones) with std::string is the absence of an sprintf(…) style formatting function. It’s the biggest nuisance and it has forced me to write my own string class.
std::string with std::ostream (or specifically std::ostringstream) should do the trick, as it’s meant to replace the C sprintf function.
The difference is that you need to set up the std::ostream instead of formatting through the function call.
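
A minimal sketch of that idea, assuming you want the equivalent of sprintf(buf, "FPS: %d (%.2f ms)", fps, ms). The function name format_frame_stats is hypothetical; the stream manipulators are standard.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds the same text as
// sprintf(buf, "FPS: %d (%.2f ms)", fps, frame_ms).
std::string format_frame_stats(int fps, double frame_ms) {
    std::ostringstream out;
    // The formatting state (fixed notation, 2 decimals) is set on the
    // stream itself rather than passed in a format string.
    out << "FPS: " << fps
        << " (" << std::fixed << std::setprecision(2) << frame_ms << " ms)";
    return out.str();
}
```

Calling format_frame_stats(60, 16.6667) should give "FPS: 60 (16.67 ms)" with the default locale.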

Well, it looks like this thread is going way off topic from OpenGL, but I’ll add to it anyway since I feel this is pretty useful information. std::ostringstream is horrendously slow. By “horrendously slow”, I mean that using it 100 times per frame will severely impact your framerate. I encountered a performance bug a few projects ago that had to do with using std::ostringstream in UI code. Not fun. This was the case in VC6, but may no longer be the case since the implementation Microsoft ships has changed since then. However, you will definitely want to do some performance tests to see for yourself.
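
If you want to run that kind of test yourself, something along these lines is a starting point. It is only a sketch: the function names are made up, std::snprintf stands in for the sprintf approach, and the numbers will depend entirely on your compiler, library and optimization settings.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Formats "value=<n>" through a fresh std::ostringstream each call.
std::string label_with_stream(int value) {
    std::ostringstream out;
    out << "value=" << value;
    return out.str();
}

// Formats the same text through snprintf into a stack buffer.
std::string label_with_snprintf(int value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "value=%d", value);
    return std::string(buf);
}

// Calls fn(i) for i in [0, iterations) and returns elapsed milliseconds.
template <typename Fn>
double time_calls(Fn fn, int iterations) {
    using clock = std::chrono::steady_clock;
    std::size_t sink = 0;  // accumulate sizes so the calls are not optimized away
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) sink += fn(i).size();
    const auto stop = clock::now();
    volatile std::size_t keep = sink;
    (void)keep;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Compare time_calls(label_with_stream, 100000) against time_calls(label_with_snprintf, 100000) in a release build before drawing any conclusions.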

Kevin B

Originally posted by RigidBody:
humus,

in your code the src string is read twice, too- character by character. for the first time when the while-condition is checked, and for the second time when it is copied to dst.
In source code yes, but obviously the compiler will optimize that and use the value already in the register.

Originally posted by knackered:
Humus, why would there be a compare for every character in the string when there are block copy operations in CPUs to be used for things like this? I remember looking at the assembly of VC6’s strlen function and it uses some fancy tricks to speed it up, like comparing a dword at a time.
Of course you can do DWORD compares and writes at the same time too. But what I’m thinking here is that reading the entire string first, and then copying it means that you’ll read the memory again. For short strings it will pretty much always be in the cache, so the practical difference may be small, but for longer strings it could affect performance quite a bit.
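
To make the two approaches concrete, here is a sketch of both. These are my own reconstructions for illustration (copy_single_pass and copy_two_pass are hypothetical names), not the code posted earlier in the thread.

```cpp
#include <cstddef>
#include <cstring>

// One pass: each source byte is loaded once, stored, then tested.
char* copy_single_pass(char* dst, const char* src) {
    char* out = dst;
    while ((*out++ = *src++) != '\0') {
        // the assignment in the condition does all the work
    }
    return dst;
}

// Two passes: strlen walks src to find the end, then memcpy walks it again.
char* copy_two_pass(char* dst, const char* src) {
    const std::size_t n = std::strlen(src) + 1;  // include the terminator
    std::memcpy(dst, src, n);
    return dst;
}
```

Both produce the same result; the question is only whether the second read of src hits the cache or has to go back to memory.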

The block copy operations (assuming you mean REP MOVSD and the like) are AFAIK no longer recommended for use on modern CPUs, as a normal loop with basic instructions doing the same work is faster.

Originally posted by Humus:
Originally posted by RigidBody:
humus,

in your code the src string is read twice, too- character by character. for the first time when the while-condition is checked, and for the second time when it is copied to dst.
In source code yes, but obviously the compiler will optimize that and use the value already in the register.

i didn’t doubt that ;)

but if strlen + memcpy is used, the src string will be read from the cache. that’s not as fast as a register, but closer to the cpu than main memory.

anyway, we’re offtopic, and furthermore- i don’t think that anyone will ever have a performance problem caused by strcpy.

Originally posted by Humus:
The block copy operations (assuming you mean REP MOVSD and the like) are AFAIK no longer recommended for use on modern CPUs, as a normal loop with basic instructions doing the same work is faster.
Gutted. How can that be? I always use that stuff with the assumption that it is a fair bit quicker.

Humus, you’ve out-geeked me - I’m not that familiar with the comings and goings of modern CPUs (other than that they’ve got big caches and can perform the same instruction on multiple data); the last assembly I did was on the Z80. I’ll take your word for it, and will regurgitate it as well-known fact to anyone else in the future, as I do with most of Korval’s stuff.

Originally posted by knackered:
Gutted. How can that be? I always use that stuff with the assumption that it is a fair bit quicker.
An entirely naive implementation would likely not be faster; however, there are several tricks that can be done when rep movsd is not used. Examples are prefetching and using the SSE registers to copy 16 bytes at once, with special instructions that do not pollute the caches. Examples of such optimizations are this memcpy optimized for AMD Athlon processors, made by AMD (I once saw a paper about a similar optimization with some performance numbers, but I do not know where it was), and this page about optimizations done on old Pentium 1 based CPUs.
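
As a rough illustration of those tricks (prefetching, 16-byte SSE copies, stores that bypass the cache), here is a sketch using SSE2 intrinsics. It is not AMD's code; the name copy_streaming, the 256-byte prefetch distance and the SKETCH_HAVE_SSE2 macro are my own assumptions, and it falls back to plain memcpy when SSE2 is not available. Whether it beats your library's memcpy has to be measured.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// Hypothetical helper: copies n bytes using non-temporal 16-byte stores.
void* copy_streaming(void* dst, const void* src, std::size_t n) {
#if SKETCH_HAVE_SSE2
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);

    // Byte-copy until dst is 16-byte aligned; streaming stores require it.
    while (n > 0 && (reinterpret_cast<std::uintptr_t>(d) & 15u) != 0) {
        *d++ = *s++;
        --n;
    }
    // Main loop: 16 bytes per iteration, written around the cache.
    while (n >= 16) {
        if (n > 256) {
            // Hint the data 256 bytes ahead (an assumed, untuned distance).
            _mm_prefetch(reinterpret_cast<const char*>(s) + 256, _MM_HINT_NTA);
        }
        const __m128i block = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s));
        _mm_stream_si128(reinterpret_cast<__m128i*>(d), block);
        s += 16;
        d += 16;
        n -= 16;
    }
    _mm_sfence();  // order the streaming stores before any later stores
    // Copy the remaining tail bytes.
    while (n > 0) {
        *d++ = *s++;
        --n;
    }
    return dst;
#else
    return std::memcpy(dst, src, n);
#endif
}
```

Streaming stores only tend to pay off for large buffers that will not be read again soon; for short copies they can easily be slower.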

On .NET 2003, strcpy works by copying a DWORD at a time. It is written in assembly.
I think the optimized memcpy using MMX (that AMD code) needs to be tested. I once borrowed someone’s code for memcpy_mmx and the website said it is actually slower than rep movsd and, to my surprise, they were right.

“rep movsd” used to be an ultra special instruction. If it is no longer so …

The runtime version of memcpy has been copying DWORDs since at least VC6, but probably earlier. If you view the assembler source code (crt\src\intel\memcpy.asm) you’ll find it also takes into account the U and V pipes (for instruction pairing).

One thing that can seriously bite is that the intrinsic versions of the mem* functions (memcpy, strlen, strcpy and so on), at least up until VC7.0 (I haven’t tested this with 7.1 or 8), produced code as if “optimizing” for a 386SX. I have particularly fond memories of how a string operation (probably strcpy) ran 3.5 times (!) faster using the runtime library version over the intrinsic version. What’s especially bad about this is that turning on /O2 optimization brought with it /Oi (generate intrinsics).

If you ever see one of those memory-array functions becoming a bottleneck, knowing this can save a lot of frustration.
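
One way to get the library versions back for a given source file is MSVC's #pragma function, which asks the compiler to emit real calls for the listed functions even when /Oi is on. A small sketch, with copy_label being a made-up example function; on other compilers the pragma block is simply skipped.

```cpp
#include <cstddef>
#include <cstring>

#ifdef _MSC_VER
// MSVC only: generate calls to the runtime library versions of these
// functions instead of expanding them inline as intrinsics.
#pragma function(strlen, memcpy)
#endif

// Hypothetical example: copies src (including terminator) into dst.
char* copy_label(char* dst, const char* src) {
    const std::size_t n = std::strlen(src) + 1;
    return static_cast<char*>(std::memcpy(dst, src, n));
}
```

Whether the library version is still the faster one on current compilers is something to profile rather than assume.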