std::cin really slow

旧巷老猫 submitted on 2019-12-04 15:13:57

Question


I was trying to write a command for a Linux pipeline. Think of it as a replica of GNU 'cat' or 'sed': it takes input from stdin, does some processing, and writes to stdout.

I originally wrote an AWK script but wanted more performance, so I used the following C++ code:

std::string crtLine;
crtLine.reserve(1000);
while (true)
{
    std::getline(std::cin, crtLine);
    if (!std::cin) // failbit (EOF immediately found) or badbit (I/O error)
        break;

    std::cout << crtLine << "\n";
}

This is exactly what cat (without any parameters) does. As it turns out, this program is about as slow as its awk counterpart, and nowhere near as fast as cat.

Testing on a 1GB file:

$ time cat 'file' | cat | wc -l
real    0m0.771s

$ time cat 'file' | filter-range.sh | wc -l
real    0m44.267s

Instead of getline(istream, string) I tried cin.getline(buffer, size), but saw no improvement. This is embarrassing. Is it a buffering issue? I also tried fetching 100KB at a time instead of just one line, and that did not help either. Any ideas?

EDIT: What you folks say makes sense, BUT the culprit is not string building/copying and neither is scanning for newlines. (And neither is the size of the buffer). Take a look at these 2 programs:

char buf[200];
while (fgets(buf, 200, stdin))
    std::cout << buf;

$ time cat 'file' | ./FilterRange > /dev/null
real    0m3.276s




char buf[200];
while (std::cin.getline(buf, 200))
    std::cout << buf << "\n";

$ time cat 'file' | ./FilterRange > /dev/null
real    0m55.031s

Neither of them manipulates strings and both of them scan for newlines, yet one is about 17 times slower than the other. They differ only in the use of cin. I think we can safely conclude that cin is what ruins the timing.


Answer 1:


This is exactly what cat (without any parameters) does.

Not really. This has exactly the same effect as /bin/cat, but it does not use the same method.

/bin/cat looks more like this:

while( (readSize = read(inFd, buffer, sizeof buffer)) > 0)
  write(outFd, buffer, readSize);

Notice that /bin/cat does no processing on its input. It doesn't build a std::string out of it, it doesn't scan it for \n, it just does one system call after another.

Your program, on the other hand, builds strings, makes copies of them, scans for \n, and so on.

This small, complete program runs 2-3 orders of magnitude slower than /bin/cat:

#include <string>
#include <iostream>

int main (int ac, char **av) {
  std::string crtLine;
  crtLine.reserve(1000);
  while(std::getline(std::cin, crtLine)) {
    std::cout << crtLine << "\n";
  }
}

I timed it thus:

$ time ./x < inputFile > /dev/null
$ time /bin/cat < inputFile > /dev/null


EDIT: This program gets within 50% of the performance of /bin/cat:

#include <string>
#include <iostream>
#include <vector>

int main (int ac, char **av) {
  std::vector<char> v(4096);
  do {
    std::cin.read(&v[0], v.size());
    std::cout.write(&v[0], std::cin.gcount());
  } while(std::cin);
}

In short, if your requirement is to perform line-by-line analysis of the input, then you will have to pay some price to use formatted input. If, on the other hand, you need to perform byte-by-byte analysis, then you can use unformatted input and go faster.




Answer 2:


The first thing to do to get good performance from the standard I/O stream objects is to turn off synchronization with the standard C stream objects:

std::ios_base::sync_with_stdio(false);

Once you have done this you should get much better performance. Whether you get good performance is a different question though.

Since some people claimed funny things about what cat would do inside, here is what is supposed to be the fastest approach to copy one stream to another:

std::cout << std::cin.rdbuf();

I would love it if you could properly std::copy() one stream to another, but this won't work too well with most I/O stream implementations:

std::copy(std::istreambuf_iterator<char>(std::cin), std::istreambuf_iterator<char>(),
          std::ostreambuf_iterator<char>(std::cout));

I hope I get to this being the best eventually...




Answer 3:


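If you really want much better performance with stdin, try plain C stdio:

std::vector<char> line(0x1000);
// Test fgets() itself: feof() only becomes true after a read has failed,
// so a while (!feof(stdin)) loop runs once more after the last line.
while (fgets(line.data(), static_cast<int>(line.size()), stdin))
    fputs(line.data(), stdout);  // process the line here; echoing it as an example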



Answer 4:


I think a faster solution would be based on sendfile.



Source: https://stackoverflow.com/questions/9025093/stdcin-really-slow
