Buffered reading from stdin using fread in C

前端未结

关注

 11  1476

I am trying to efficiently read from the stdin by using setvbuf in `_IOFBF~ mode. I am new to buffering. I am looking for working examples

相关标签:

11条回答

感情败类

2020-12-24 04:20

The reason all this permature optimisation has a negligable effect on the runtime is that in *nix and windows type operating systems the OS handles all I/O to and from the file system and implements 30 years worth of research, trickery and deviousness to do this very efficiently.

The buffering you are trying to control is merely the block of memory used by your program. So any increases in speed will be minimal (the effect of doing 1 large 'mov' verses 6 or 7 smaller 'mov' instructions).

If you really want to speed this up try "mmap" which allows you direct access the data in the file systems buffer.

0 讨论(0)
发布评论:

提交评论
- 加载中...
情歌与酒

2020-12-24 04:21

Mabe also take a look at this getline implementation:

http://www.cpax.org.uk/prg/portable/c/libs/sosman/index.php

(An ISO C routine for getting a line of data, length unknown, from a stream.)

0 讨论(0)
发布评论:

提交评论
- 加载中...

有刺的猬

2020-12-24 04:22

If you are after out-and-out speed and you work on a POSIX-ish platform, consider using memory mapping. I took Sinan's answer using standard I/O and timed it, and also created the program below using memory mapping. Note that memory mapping will not work if the data source is a terminal or a pipe and not a file.

With one million values between 0 and one billion (and a fixed divisor of 17), the average timings for the two programs was:

standard I/O: 0.155s
memory mapped: 0.086s

Roughly, memory mapped I/O is twice as fast as standard I/O.

In each case, the timing was repeated 6 times, after ignoring a warm-up run. The command lines were:

time fbf < data.file    # Standard I/O (full buffering)
time mmf < data.file    # Memory mapped file I/O

#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>

static const char *arg0 = "**unset**";
static void error(const char *fmt, ...)
{
    va_list args;
    fprintf(stderr, "%s: ", arg0);
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    exit(EXIT_FAILURE);
}

static unsigned long read_integer(char *src, char **end)
{
    unsigned long v;
    errno = 0;
    v = strtoul(src, end, 0);
    if (v == ULONG_MAX && errno == ERANGE)
        error("integer too big for unsigned long at %.20s", src);
    if (v == 0 && errno == EINVAL)
        error("failed to convert integer at %.20s", src);
    if (**end != '\0' && !isspace((unsigned char)**end))
        error("dubious conversion at %.20s", src);
    return(v);
}

static void *memory_map(int fd)
{
    void *data;
    struct stat sb;
    if (fstat(fd, &sb) != 0)
        error("failed to fstat file descriptor %d (%d: %s)\n",
              fd, errno, strerror(errno));
    if (!S_ISREG(sb.st_mode))
        error("file descriptor %d is not a regular file (%o)\n", fd, sb.st_mode);
    data = mmap(0, sb.st_size, PROT_READ, MAP_PRIVATE, fileno(stdin), 0);
    if (data == MAP_FAILED)
        error("failed to memory map file descriptor %d (%d: %s)\n",
              fd, errno, strerror(errno));
    return(data);
}

int main(int argc, char **argv)
{
    char *data;
    char *src;
    char *end;
    unsigned long k;
    unsigned long n;
    unsigned long answer = 0;
    size_t i;

    arg0 = argv[0];
    data = memory_map(0);

    src = data;

    /* Read control data */
    n = read_integer(src, &end);
    src = end;
    k = read_integer(src, &end);
    src = end;

    for (i = 0; i < n; i++, src = end)
    {
        unsigned long v = read_integer(src, &end);
        if (v % k == 0)
            answer++;
    }

    printf("%lu\n", answer);
    return(0);
}

0 讨论(0)

情话喂你

2020-12-24 04:23

Version 1 : Using getchar_unlocked as suggested by R Samuel Klatchko (see comments)

#define BUFSIZE 32*1024
int main(){
  int lines, number=0, dividend, ans=0;
  char c;
  setvbuf(stdin, (char*)NULL, _IOFBF, 0);// full buffering mode
  scanf("%d%d\n", &lines, ÷nd);
  while(lines>0){
    c = getchar_unlocked();
    //parse the number using characters
    //each number is on a separate line
    if(c=='\n'){
      if(number % dividend == 0)    ans += 1;
      lines -= 1;
      number = 0;
    }
    else
      number = c - '0' + 10*number;
  }

  printf("%d are divisible by %d \n", ans, dividend);
  return 0;
}

Version 2: Using fread to read a block and parsing number from it.

#define BUFSIZE 32*1024
int main(){
int lines, number=0, dividend, ans=0, i, chars_read;
char buf[BUFSIZE+1] = {0}; //initialise all elements to 0
scanf("%d%d\n",&lines, &dividend);

while((chars_read = fread(buf, 1, BUFSIZE, stdin)) > 0){
  //read the chars from buf
  for(i=0; i < chars_read; i++){
    //parse the number using characters
    //each number is on a separate line
    if(buf[i] != '\n')
      number = buf[i] - '0' + 10*number;
    else{
      if(number%dividend==0)    ans += 1;
      lines -= 1;
      number = 0;
    }       
  }

if(lines==0)  break;
}

printf("%d are divisible by %d \n", ans, dividend);
return 0;
}

Results: (10 million numbers tested for divisibility by 11)

Run 1: ( Version 1 without setvbuf ) 0.782 secs
Run 2: ( Version 1 with setvbuf ) 0.684 secs
Run 3: ( Version 2 ) 0.534

P.S. - Every run compiled with GCC using -O1 flag

0 讨论(0)

悲哀的现实

2020-12-24 04:27
One thing that I find confusing is why you are both enabling full buffering within the stream object via the call to setvbuf and doing your own buffering by reading a full buffer into buf.

I understand the need to do buffering, but that is a bit overkill.

I'm going to recommend you stick with setvbuf and remove your own buffering. The reason why is that implementing your own buffering can be tricky. The problem is what will happen when a token (in your case a number) straddles the buffer boundary. For example, let's say your buffer is 8 bytes (9 bytes total for trailing NULL) and your input stream looks like
```
12345 12345
```
The first time you fill the buffer you get:
```
"12345 12"
```
while the second time you fill the buffer you get:
```
"345"
```
Proper buffering requires you to handle that case so you treat the buffer as the two numbers {12345, 12345} and not three numbers {12345, 12, 234}.

Since stdio handles that already for you, just use that. Continue to call setvbuf, get rid of the fread and use scanf to read individual numbers from the input stream.
0 讨论(0)
发布评论:

提交评论
- 加载中...
南方客

2020-12-24 04:32

Well, right off the top, scanf("%d%d",&n,&k) will shove a value into n only and silently leave k unset - You'd see this if you checked the return value of scanf(), which tells you how many variables it filled. I think you want scanf("%d %d",&n,&k) with the space.

Second, n is the number of iterations to run, but you test for "n>0" yet never decrement it. Ergo, n>0 is always true and the loop won't exit.

As someone else mentioned, feeding stdin over a pipe causes the loop to exit because the end of stdin has an EOF, which causes fread() to return NULL, exiting the loop. You probably want to add an "n=n-1" or "n--" somewhere in there.

Next, in your sscanf, %n is not really a standard thing; I'm not sure what it's meant to do, but it may do nothing: scanf() generally stops parsing at the first unrecognized format identifier, which does nothing here (since you already got your data,) but is bad practice.

Finally, if performance is important, you'd be better off not using fread() etc at all, as they're not really high performance. Look at isdigit(3) and iscntrl(3) and think about how you could parse the numbers from a raw data buffer read with read(2).

0 讨论(0)
发布评论:

提交评论
- 加载中...

1 2 下一页