I have a program that needs to read about 1000000 lines of input like the following from the console:
0 1 23 4 5
1 3 5 2 56
12 2 3 33 5
...
...
I have used scanf, but it is slow. Is there a faster way to read this input?
Use binary I/O if you can. Text conversion can slow down reading by several times. If you're using text I/O because it's easy to debug, reconsider a binary format, and use the od program (assuming you're on Unix) to make it human-readable when needed.
Oh, another thing: there's AT&T's SFIO library, which stands for safer/faster file I/O. You might also have some luck with that, but I doubt you'll get the same kind of speedup as you will with a binary format.
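To make the binary-format suggestion concrete, here's a minimal sketch of round-tripping one five-int record with fwrite()/fread(). The function name and the filename "numbers.bin" are illustrative, not part of the original answer:

```c
#include <stdio.h>
#include <string.h>

/* Write one five-int record in binary, then read it back.
 * Returns 1 on success, 0 on any I/O failure. */
int roundtrip(const int out[5], int in[5])
{
    FILE *f = fopen("numbers.bin", "wb");
    if (f == NULL) return 0;
    /* One fwrite() stores the whole record: no text conversion at all. */
    fwrite(out, 5 * sizeof out[0], 1, f);
    fclose(f);

    f = fopen("numbers.bin", "rb");
    if (f == NULL) return 0;
    /* One fread() gets it back: no parsing on the way in, either. */
    size_t got = fread(in, 5 * sizeof in[0], 1, f);
    fclose(f);
    return got == 1;
}
```

With a million records this avoids a million runs of the text-to-integer conversion entirely; `od -d numbers.bin` shows the contents when you need to inspect them.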
If your example is representative, in that you really do have a fixed format of five decimal numbers per line, I'd probably use a combination of fgets() to read the lines and a loop calling strtol() to convert from string to integer. That should be faster than scanf(), while still clearer and more high-level than doing the string-to-integer conversion on your own.
Something like this:
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int number[5];
} LineOfNumbers;

int getNumbers(FILE *in, LineOfNumbers *line)
{
    char buf[128]; /* Should be large enough for one line. */

    if (fgets(buf, sizeof buf, in) != NULL)
    {
        int i;
        char *ptr, *eptr;

        ptr = buf;
        for (i = 0; i < (int) (sizeof line->number / sizeof *line->number); i++)
        {
            line->number[i] = (int) strtol(ptr, &eptr, 10);
            if (eptr == ptr)  /* No digits converted: malformed line. */
                return 0;
            ptr = eptr;       /* Continue after the number just read. */
        }
        return 1;
    }
    return 0;
}
Note: this is untested (even uncompiled!) browser-written code. But perhaps useful as a concrete example.
fread will still return even if you ask for more bytes than the file contains; it just reports a short count, so check its return value.
I have found one of the fastest ways to read a file is like this:
fseek(file, 0, SEEK_END);    /* seek to end of file */
size = ftell(file);          /* get size of file */
fseek(file, 0, SEEK_SET);    /* seek back to start of file */
buffer = malloc(1048576);    /* make a buffer for the file */
/* fread() in 1 MB at a time until you reach size bytes, etc. */
On modern computers, put your RAM to use: load the whole thing into RAM, and then you can easily work your way through the memory.
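A sketch of the whole-file-into-RAM approach, fleshed out into a complete function (the function name and error handling are illustrative, not the poster's exact code):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into one heap buffer, NUL-terminated.
 * Returns the buffer (caller frees) and stores the size, or NULL on error. */
char *read_whole_file(const char *path, long *out_size)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL) return NULL;

    fseek(file, 0, SEEK_END);         /* seek to end of file */
    long size = ftell(file);          /* get size of file */
    fseek(file, 0, SEEK_SET);         /* seek back to start */

    char *buffer = malloc(size + 1);  /* +1 for a terminating '\0' */
    if (buffer == NULL) { fclose(file); return NULL; }

    if (fread(buffer, 1, size, file) != (size_t) size) {
        free(buffer);
        fclose(file);
        return NULL;
    }
    buffer[size] = '\0';              /* lets you use str* functions on it */
    fclose(file);
    *out_size = size;
    return buffer;
}
```

Once the whole file is in memory, parsing becomes pure pointer arithmetic with no further I/O calls.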
At the very least you should be using fread with block sizes as big as you can, and at least as big as the cache line or HDD sector size (4096 bytes minimum; I would use 1048576 as a minimum personally). You will find that with much bigger read requests, fread is able to sequentially get a big stream in one operation. The suggestion here by some people to use 128 bytes is ridiculous: you will end up with the drive having to seek all the time, as the tiny delay between calls will cause the head to already be past the next sector, which almost certainly holds the sequential data you want.
Read a line at a time (if buffer not big enough for a line, expand and continue with larger buffer).
Then use dedicated functions (e.g. atoi) rather than general-purpose ones for conversion.
But, most of all, set up a repeatable test harness with profiling to ensure changes really do speed things up.
You use multiple read() calls with a fixed-size buffer until you hit end of file.
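That loop can be sketched like this (POSIX read(); the function name is illustrative, and the body just counts bytes where your code would process each chunk):

```c
#include <fcntl.h>
#include <unistd.h>

enum { CHUNK = 1 << 20 };  /* 1 MiB per read() call */

/* Count the bytes in a file using repeated read() calls
 * into a fixed-size buffer. Returns -1 on error. */
long count_bytes(const char *path)
{
    static char buf[CHUNK];
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    long total = 0;
    ssize_t n;
    /* read() returns 0 at end of file and a negative value on error. */
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;  /* process buf[0..n-1] here */

    close(fd);
    return n < 0 ? -1 : total;
}
```

The fixed buffer is reused on every iteration, so memory stays constant no matter how large the file is.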
Use fgets(...) to pull in a line at a time. Note that you should check for the '\n' at the end of the line; if there isn't one, you are either at EOF, or you need to read another buffer's worth and concatenate the two together. Lather, rinse, repeat. Don't get caught with a buffer overflow.
THEN, you can parse each logical line in memory yourself. I like to use strspn(...) and strcspn(...) for this sort of thing, but your mileage may vary.
Parsing: Define a delimiters string. Use strspn() to count "non data" chars that match the delimiters, and skip over them. Use strcspn() to count the "data" chars that DO NOT match the delimiters. If this count is 0, you are done (no more data in the line). Otherwise, copy out those N chars to hand to a parsing function such as atoi(...) or sscanf(...). Then, reset your pointer base to the end of this chunk and repeat the skip-delims, copy-data, convert-to-numeric process.
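The skip-delims, copy-data, convert loop described above can be sketched as follows (the function name, field size, and use of atoi are assumptions for illustration):

```c
#include <stdlib.h>
#include <string.h>

/* Parse up to `max` integers out of `line` into `out`,
 * using strspn()/strcspn() as described above.
 * Returns the number of integers parsed. */
int parse_ints(const char *line, int *out, int max)
{
    const char *delims = " \t\n";  /* the delimiters string */
    const char *p = line;
    int count = 0;

    while (count < max) {
        p += strspn(p, delims);           /* skip "non data" chars */
        size_t len = strcspn(p, delims);  /* count "data" chars */
        if (len == 0)
            break;                        /* no more data in the line */

        char field[32];                   /* copy out those N chars */
        if (len >= sizeof field)
            len = sizeof field - 1;
        memcpy(field, p, len);
        field[len] = '\0';

        out[count++] = atoi(field);       /* hand to a parsing function */
        p += len;                         /* reset base past this chunk */
    }
    return count;
}
```

Because strspn()/strcspn() never look past the terminating NUL, this parser is safe to run over any well-terminated buffer, including one holding the whole file.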