I'm a beginner C++ programmer working on a small C++ project for which I have to process a number of relatively large XML files and remove the XML tags from them. I've succe
I don't think you're doing anything "wrong" per se; the C++ regex library just isn't as fast as the Python one (for this use case, at this time at least). That isn't too surprising: Python's regex engine is compiled code under the hood as well, and it has been tuned over the years because regex performance is a fairly important feature in Python.
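For reference, a regex-based tag stripper in standard C++ might look roughly like the sketch below. The question's own code isn't shown here, so the function name strip_tags_regex and the pattern <[^>]*> are assumptions, not the asker's actual code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes every "<...>" span using std::regex.
// The pattern is an assumption, not the asker's actual expression.
std::string strip_tags_regex(const std::string& input) {
    // Build the regex once; constructing a std::regex is itself costly.
    static const std::regex tag_pattern("<[^>]*>");
    return std::regex_replace(input, tag_pattern, "");
}
```

Building the regex once, rather than on every call, is usually the first thing worth checking when a std::regex version feels slow.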
But there are other options in C++ if you need more speed. I've used PCRE (http://pcre.org/) in the past with great results, though I'm sure there are other good ones out there these days as well.
For this case in particular, however, you can also get what you're after without regexes, which in my quick tests yielded a 10x performance improvement. For example, the following code scans your input string, copying everything to a new buffer; when it hits a <, it starts skipping over characters until it sees the closing >:
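    // size is assumed to hold the length of your input file in bytes
    std::string buffer(size, ' ');
    std::string outbuffer(size, ' ');
    // ... read in buffer from your file
    size_t outbuffer_len = 0;
    for (size_t i = 0; i < buffer.size(); ++i) {
        if (buffer[i] == '<') {
            // Skip the tag. Check the bound before indexing, so an
            // unterminated tag at the end of the input is handled safely.
            while (i < buffer.size() && buffer[i] != '>') {
                ++i;
            }
            // The for loop's ++i then steps past the closing '>'.
        } else {
            outbuffer[outbuffer_len] = buffer[i];
            ++outbuffer_len;
        }
    }
    // Trim the output down to the characters actually written.
    outbuffer.resize(outbuffer_len);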
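If you'd rather have something you can drop into a project, here is a minimal self-contained sketch of the same idea. The helper names strip_tags and read_file are made up for illustration, and reading the whole file into memory is an assumption about your setup. Like the loop above, it treats any < ... > span as a tag, so comments or CDATA sections that contain a > are not handled specially.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns a copy of `input` with every "<...>" span removed.
std::string strip_tags(const std::string& input) {
    std::string output;
    output.reserve(input.size());  // the result is never longer than the input
    bool in_tag = false;
    for (char c : input) {
        if (c == '<') {
            in_tag = true;             // start skipping
        } else if (c == '>' && in_tag) {
            in_tag = false;            // tag closed, resume copying
        } else if (!in_tag) {
            output.push_back(c);       // ordinary text outside any tag
        }
    }
    return output;
}

// Hypothetical helper: reads an entire file into a string.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}
```

Calling strip_tags(read_file("input.xml")), where input.xml is a placeholder path, would then give you the text with the tags removed. Using reserve plus push_back avoids both the pre-filled output buffer and the final resize.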