Question
So I was trying to write myself a command for a Linux pipeline. Think of it as a replica of GNU 'cat' or 'sed' that takes input from stdin, does some processing, and writes to stdout.
I originally wrote an AWK script but wanted more performance, so I used the following C++ code:
std::string crtLine;
crtLine.reserve(1000);
while (true)
{
std::getline(std::cin, crtLine);
if (!std::cin) // failbit (EOF immediately found) or badbit (I/O error)
break;
std::cout << crtLine << "\n";
}
This is exactly what cat (without any parameters) does. As it turns out, this program is about as slow as its awk counterpart, and nowhere near as fast as cat.
Testing on a 1GB file:
$ time cat 'file' | cat | wc -l
real 0m0.771s
$ time cat 'file' | filter-range.sh | wc -l
real 0m44.267s
Instead of getline(istream, string) I tried cin.getline(buffer, size), but saw no improvement. This is embarrassing; is it a buffering issue? I also tried fetching 100KB at a time instead of just one line, which did not help either. Any ideas?
EDIT: What you folks say makes sense, BUT the culprit is not string building/copying and neither is scanning for newlines. (And neither is the size of the buffer). Take a look at these 2 programs:
char buf[200];
while (fgets(buf, 200, stdin))
std::cout << buf;
$ time cat 'file' | ./FilterRange > /dev/null
real 0m3.276s
char buf[200];
while (std::cin.getline(buf, 200))
std::cout << buf << "\n";
$ time cat 'file' | ./FilterRange > /dev/null
real 0m55.031s
Neither of them manipulates strings and both do newline scanning, yet one is about 17 times slower than the other. They differ only in the use of cin. I think we can safely conclude that cin is what ruins the timing.
Answer 1:
This is exactly what cat (without any parameters) does.
Not really. This has exactly the same effect as /bin/cat, but it does not use the same method.
/bin/cat looks more like this:
while( (readSize = read(inFd, buffer, sizeof buffer)) > 0)
write(outFd, buffer, readSize);
Notice that /bin/cat does no processing on its input. It doesn't build a std::string out of it, it doesn't scan it for \n, it just does one system call after another.
Your program, on the other hand, builds strings, makes copies of them, scans for \n, and so on.
This small, complete program runs 2-3 orders of magnitude slower than /bin/cat:
#include <string>
#include <iostream>
int main (int ac, char **av) {
std::string crtLine;
crtLine.reserve(1000);
while(std::getline(std::cin, crtLine)) {
std::cout << crtLine << "\n";
}
}
I timed it thus:
$ time ./x < inputFile > /dev/null
$ time /bin/cat < inputFile > /dev/null
EDIT: This program gets within 50% of the performance of /bin/cat:
#include <string>
#include <iostream>
#include <vector>
int main (int ac, char **av) {
std::vector<char> v(4096);
do {
std::cin.read(&v[0], v.size());
std::cout.write(&v[0], std::cin.gcount());
} while(std::cin);
}
In short, if your requirement is to perform line-by-line analysis of the input, then you will have to pay some price to use formatted input. If, on the other hand, you need to perform byte-by-byte analysis, then you can use unformatted input and go faster.
Answer 2:
The first thing you want to do to get good performance from the standard I/O stream objects is to turn off synchronization with the standard C stream objects:
std::ios_base::sync_with_stdio(false);
Once you have done this you should get much better performance. Whether you get good performance is a different question though.
Since some people claimed funny things about what cat would do inside, here is what is supposed to be the fastest approach to copy one stream to another:
std::cout << std::cin.rdbuf();
I would love it if you could properly std::copy() one stream to another, but this won't work too well with most I/O stream implementations:
std::copy(std::istreambuf_iterator<char>(std::cin), std::istreambuf_iterator<char>(),
std::ostreambuf_iterator<char>(std::cout));
I hope to eventually get this to be the fastest option...
Answer 3:
If you really would like to have much better performance with stdin you should try to use pure C.
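std::vector<char> line(0x1000);
// Test the result of fgets itself: looping on !feof(stdin) runs one extra
// iteration at EOF and never terminates on a read error.
while (fgets(line.data(), static_cast<int>(line.size()), stdin)) {
    // process the NUL-terminated line in line.data() here
}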
Answer 4:
I think a faster solution would be based on sendfile.
Source: https://stackoverflow.com/questions/9025093/stdcin-really-slow