String Processing

I’m using C++ to do some data reading from a standard ASCII text file. Here is an example text file:


3 100,200 300,400 500,600
4 111,222 111,333 111,444 111,555
2 222,111 333,888

Notice how the first number of each row tells you how many pairs of numbers follow it.

Here is my problem: I can’t think of a quick way to extract these pairs of numbers. Normally I use sscanf to read the values, but with sscanf you have to know exactly how many values to read in. Does anyone know of a way to read in the first number and then use some sort of loop to read in the pairs of numbers?

You can probably tell that I’m new to this. In case you are wondering what these numbers mean: (number of polygon vertices) (vertex number),(texture vertex number)…( ),( )
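For reference, the sscanf approach from the question can be made to work without knowing the count in advance. The %n conversion stores how many characters sscanf has consumed so far, so a pointer can be advanced through the line one pair at a time. This is a minimal sketch, assuming each line has already been read into a C string (for example with fgets); the function name parse_line is hypothetical:

```cpp
#include <cstdio>
#include <utility>
#include <vector>

// Parses one line such as "3 100,200 300,400 500,600" with sscanf.
// %n stores the number of characters consumed so far, which lets us
// step a pointer through the line without knowing the count up front.
// Returns false if the line is malformed.
bool parse_line(const char* line, std::vector<std::pair<int, int>>& pairs)
{
    int count = 0, used = 0;
    if (std::sscanf(line, "%d%n", &count, &used) != 1 || count < 0)
        return false;
    const char* p = line + used;          // first character after the count
    pairs.clear();
    for (int i = 0; i < count; i++) {
        int a = 0, b = 0;
        // "%d ,%d": read a, allow optional spaces, require a comma, read b
        if (std::sscanf(p, "%d ,%d%n", &a, &b, &used) != 2)
            return false;                 // fewer pairs than the count promised
        pairs.push_back(std::make_pair(a, b));
        p += used;                        // continue after this pair
    }
    return true;
}
```

Note that %n does not count toward sscanf’s return value, which is why the checks compare against 1 and 2.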

Basically, you read the entire file into a buffer (I use an invisible MemoBox), then go through it line by line: read the first number, run a for loop, find the position of each comma, and take substrings. Example:

// Read the file into the (invisible) memo, one line per entry
// (needs <fstream> and <string>)
std::string word;
std::ifstream fin;
fin.open(file);
while (std::getline(fin, word)) {
    TMemo1->Lines->Add(word.c_str());
}
fin.close();

for (int count = 0; count < TMemo1->Lines->Count; count++) {
    AnsiString line = TMemo1->Lines->Strings[count].Trim();
    int dot = line.Pos(" ");  // position of the first space (AnsiString is 1-based)
    int total = StrToInt(line.SubString(1, dot - 1));  // number of pairs
    AnsiString line_remainder = line.SubString(dot + 1, line.Length() - dot);
    for (int index = 0; index < total; index++) {
        line_remainder = line_remainder.TrimLeft();
        int comma = line_remainder.Pos(",");
        int arg1 = StrToInt(line_remainder.SubString(1, comma - 1));
        int space = line_remainder.Pos(" ");
        if (space == 0)  // last pair on the line
            space = line_remainder.Length() + 1;
        int arg2 = StrToInt(line_remainder.SubString(comma + 1, space - comma - 1));
        // take care of arguments here
        line_remainder = line_remainder.SubString(space, line_remainder.Length() - space + 1);
    }
}

This code was written on the fly, but it is very close to the parser I used (successfully) to import/export POV-Ray files. Unfortunately, I have been having problems with someone maliciously deleting files, so I don’t have that code handy.

[This message has been edited by Cardinal (edited 04-29-2001).]

As far as I can see, the sample above uses an invisible memo to load the file. I think a programmer should avoid creating large objects like that for this task. I recommend using fread to read the bytes into a memory buffer or, if you are on Windows, ReadFile, which is perhaps more complex but really does the same thing.
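A minimal sketch of the fread approach: it reads the whole file in fixed-size chunks and appends them to a growing buffer, so the file size does not have to be known in advance. The name read_all is hypothetical, and the Windows ReadFile variant is not shown:

```cpp
#include <cstdio>
#include <string>

// Reads everything remaining in an already opened file into memory with fread.
// The size is not needed up front: fixed-size chunks are appended to a
// string until fread returns 0 (end of file or a read error).
std::string read_all(std::FILE* f)
{
    std::string data;
    char chunk[4096];
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, f)) > 0)
        data.append(chunk, n);
    return data;
}
```

If the file is opened with "rb" on Windows, each line in the buffer will end in "\r\n" rather than "\n", so the parser should treat '\r' as whitespace.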

OK, the simplest way…
int no_of_verts, pair_i, pair_j, x;
fscanf(stream_in, "%d", &no_of_verts);
for(x=0;x<no_of_verts;x++)
{
fscanf(stream_in, "%d, %d", &pair_i, &pair_j);
/** do whatever… **/
}
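Here is Gavin’s loop extended into a sketch that reads every record in the file. It checks fscanf’s return value (the number of items successfully converted), which both catches malformed pairs and ends the loop at end of file. The Polygon typedef and the read_polygons name are hypothetical:

```cpp
#include <cstdio>
#include <utility>
#include <vector>

typedef std::vector<std::pair<int, int>> Polygon;  // (vertex, texture vertex) pairs

// Reads every record "N a,b a,b ..." from an open file with Gavin's fscanf calls.
// Looping on fscanf's return value stops cleanly at end of file.
// Returns false if the file contains something that is not a valid record.
bool read_polygons(std::FILE* stream_in, std::vector<Polygon>& polys)
{
    int no_of_verts;
    while (std::fscanf(stream_in, "%d", &no_of_verts) == 1) {
        Polygon poly;
        for (int x = 0; x < no_of_verts; x++) {
            int pair_i, pair_j;
            if (std::fscanf(stream_in, "%d, %d", &pair_i, &pair_j) != 2)
                return false;                    // truncated or malformed pair
            poly.push_back(std::make_pair(pair_i, pair_j));
        }
        polys.push_back(poly);
    }
    return std::feof(stream_in) != 0;            // stopped at EOF, not on garbage
}
```

Because fscanf treats newlines as ordinary whitespace, this relies entirely on the leading count to know where each record ends.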

I assume you are reading this into memory… Obviously you can’t (sorry, shouldn’t) keep accessing the file.

I suppose you have no idea how much data there is in the file, so how many bytes should fread read? (Randolph)

You could (I sometimes do) read through the whole file with fscanf("%c") to find its size and then use fread.
Or count the number of lines and malloc an array of structs… or use a linked list of structs, which is even more dynamic…
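The first pass of the "count the lines, then allocate" idea might look like this sketch (count_lines is a hypothetical name). It counts newline characters with fgetc and then rewinds the file so the data can be read again:

```cpp
#include <cstdio>

// Counts the lines in an open file, then rewinds it for the real read.
// A last line without a trailing '\n' is counted as well.
long count_lines(std::FILE* f)
{
    long lines = 0;
    int c, prev = '\n';
    while ((c = std::fgetc(f)) != EOF) {
        if (c == '\n')
            lines++;
        prev = c;
    }
    if (prev != '\n')      // file does not end with a newline
        lines++;
    std::rewind(f);        // back to the start for the second pass
    return lines;
}
```

With the count known, the array of structs can be allocated in one step before the second pass. In C++, a std::vector grows on its own, so there the first pass is only an optimization (for example via reserve).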

gav

Oh sorry but I only do C…

Also put in:

 while( fscanf(stream_in, "%d", &no_of_verts) == 1 ) {
   /* the for loop from Gavin's code above */
 }

I think I like Gavin’s method. It’s quick and easy. Thanks guys.

What’s the advantage of Cardinal’s method?

Using ifstreams is fast and easy thanks to operator overloading:

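Vertex vert[MAX_VERTS];   // MAX_VERTS: hypothetical capacity, assumes num <= MAX_VERTS
Texture tex[MAX_VERTS];
// ...
ifstream inFile(filename);
if (!inFile)
    return;
while (inFile >> num)     // stops cleanly at end of file or on bad input
{
    for (i = 0; i < num; i++)
    {
        inFile >> vert[i] >> tex[i];
    }
    // ...
}
inFile.close();
// ...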

I’m not sure whether the commas would cause problems; it depends on how the operators are defined. Good luck.

b

The advantages of using the MemoBox are:

  1. After reading in the contents, everything is accessed without file I/O.
  2. The length of each line/file and data like that are readily available through the AnsiString class.
  3. I can make the box visible to debug my file I/O.
  4. Functions like LoadFromFile/SaveToFile/IndexOf/Clear/etc. are already implemented.

For Gavin’s benefit, you can find the length of a file like this:

long current_pos=ftell(file);
fseek(file,0L,SEEK_END);
long length=ftell(file);
fseek(file,current_pos,SEEK_SET);
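Putting those four lines to use, here is a sketch that reads from the current position to the end of the file with a single fread (read_file is a hypothetical name). It assumes the file was opened in binary mode ("rb"), because for text-mode streams the C standard does not guarantee that ftell returns a byte count:

```cpp
#include <cstdio>
#include <string>

// Finds the remaining size with fseek/ftell, then reads it with one fread.
// Returns false if positioning fails. Assumes a binary-mode stream.
bool read_file(std::FILE* file, std::string& data)
{
    long current_pos = std::ftell(file);
    if (current_pos < 0 || std::fseek(file, 0L, SEEK_END) != 0)
        return false;
    long length = std::ftell(file);
    if (length < 0 || std::fseek(file, current_pos, SEEK_SET) != 0)
        return false;
    data.resize(static_cast<std::size_t>(length - current_pos));
    std::size_t got = data.empty() ? 0 : std::fread(&data[0], 1, data.size(), file);
    data.resize(got);      // keep only what was actually read
    return true;
}
```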

Of course, using fscanf or an invisible memo may seem simpler, but I still insist that the fastest way is to read the file (or part of it) directly into memory and analyze it there, avoiding formatted input and output. That said, if you don’t need to analyze a big file containing a large number of such records, the loss of speed will be unnoticeable.
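To illustrate analyzing the data in memory without formatted I/O, here is a sketch that walks a buffer (for example one filled by fread) with strtol, which skips leading whitespace, converts one integer, and reports through its end pointer where it stopped. The Polygon typedef and parse_buffer name are hypothetical:

```cpp
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<int, int>> Polygon;

// Parses "N a,b a,b ..." records from an in-memory buffer using strtol.
// Returns false on malformed input.
bool parse_buffer(const std::string& buf, std::vector<Polygon>& polys)
{
    const char* p = buf.c_str();
    char* end = nullptr;
    for (;;) {
        long n = std::strtol(p, &end, 10);   // leading count (skips newlines too)
        if (end == p)                        // no number: end of data or garbage
            break;
        p = end;
        Polygon poly;
        for (long i = 0; i < n; i++) {
            long a = std::strtol(p, &end, 10);
            if (end == p || *end != ',')
                return false;
            p = end + 1;                     // skip the comma
            long b = std::strtol(p, &end, 10);
            if (end == p)
                return false;
            p = end;
            poly.push_back(std::make_pair(static_cast<int>(a), static_cast<int>(b)));
        }
        polys.push_back(poly);
    }
    // Only trailing whitespace may remain after the last record
    while (*p == ' ' || *p == '\n' || *p == '\r' || *p == '\t')
        ++p;
    return *p == '\0';
}
```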

And I see one small disadvantage in Cardinal’s method: the size of the code. For example, if you use Borland C++ Builder, as I think Cardinal does (I could be wrong), using a Memo causes the linker to also include the VCL library code in the exe file. Executables that use the VCL are NO SMALLER than 300 KB (BCB version 5.0). You can tell the linker not to include the VCL library code in the executable, but then your program will not run on systems where C++ Builder (or Delphi) is not installed. I think this is worth mentioning.

[This message has been edited by Randolph (edited 05-02-2001).]

Also, a note about fscanf. I don’t know for sure, but I think that (on the Windows platform) its behavior depends on which character ("," or ".") is set in the system settings as the decimal separator. So this may happen: your program using fscanf works properly with one set of Windows settings ("," as the separator) and behaves differently when the separator is set to ".".
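For what it’s worth, a standard C program starts in the "C" locale, where the decimal separator is always '.', and only switches to the user’s settings if it calls setlocale (for example setlocale(LC_ALL, "")). Integer conversions such as %d, which this file format uses, do not involve a decimal separator at all. If a program does change its locale but still needs to read '.'-separated numbers from a file, one option in C++ is to give the reading stream the classic locale explicitly, as in this sketch (the function name is hypothetical):

```cpp
#include <locale>
#include <sstream>
#include <string>

// Parses a floating-point number with '.' as the decimal separator regardless
// of the program's global locale: the stream is imbued with the classic "C"
// locale before reading. Returns false if no number could be read.
bool parse_double_c_locale(const std::string& text, double& value)
{
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    in >> value;
    return !in.fail();
}
```

The same imbue call works on an std::ifstream, as long as it is made before any reading.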