Increase C++ regex replace performance

时光取名叫无心 2021-02-08 07:03

I'm a beginner C++ programmer working on a small C++ project for which I have to process a number of relatively large XML files and strip the XML tags from them. I've succe

2 Answers
  •  难免孤独
    2021-02-08 07:38

    I don't think you're doing anything "wrong" per se; the C++ regex library just isn't as fast as the Python one (for this use case, at this time at least). This isn't too surprising: Python's regex engine is itself compiled C code under the hood, and it has been tuned over the years because regex performance is a fairly important feature in Python.

    But there are other options in C++ if you need more speed. I've used PCRE ( http://pcre.org/ ) in the past with great results, though I'm sure there are other good ones out there these days as well.

    For this case in particular, however, you can also achieve what you're after without regexes, which in my quick tests yielded a 10x performance improvement. For example, the following code scans your input string and copies everything to a new buffer; when it hits a <, it skips characters until it sees the closing >:
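
    For reference, a regex-based baseline in standard C++ might look roughly like the sketch below. The pattern <[^>]*> and the name strip_tags_regex are assumptions for illustration, since the question's own code isn't shown here. The regex is constructed once and reused, so the pattern isn't recompiled on every call.

```cpp
#include <regex>
#include <string>

// Hypothetical baseline (not the asker's actual code): removes every
// "<...>" span using std::regex_replace.
std::string strip_tags_regex(const std::string& input) {
    // Built once on first call and reused afterwards.
    static const std::regex tag_re(
        "<[^>]*>", std::regex::ECMAScript | std::regex::optimize);
    // Replace each match with the empty string.
    return std::regex_replace(input, tag_re, "");
}
```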

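    #include <string>

    // `size` is the length of the input file in bytes.
    std::string buffer(size, ' ');
    std::string outbuffer(size, ' ');

    // ... read in buffer from your file ...

    size_t outbuffer_len = 0;
    for (size_t i = 0; i < buffer.size(); ++i) {
        if (buffer[i] == '<') {
            // Skip ahead to the closing '>'; check the bound before indexing.
            while (i < buffer.size() && buffer[i] != '>') {
                ++i;
            }
        } else {
            // Outside a tag: copy the character to the output.
            outbuffer[outbuffer_len] = buffer[i];
            ++outbuffer_len;
        }
    }
    // Trim the output to the number of characters actually copied.
    outbuffer.resize(outbuffer_len);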
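
    The same scan can also be wrapped in a small self-contained function, which is easier to test and reuse. This is only a sketch: the name strip_tags is hypothetical, and like the loop above it treats every <...> span as a tag, so it does not handle comments, CDATA sections, or a > inside an attribute value.

```cpp
#include <string>

// Hypothetical helper (not from the original answer): returns a copy of
// `input` with every "<...>" span removed.
std::string strip_tags(const std::string& input) {
    std::string out;
    out.reserve(input.size());  // the output is never longer than the input

    bool in_tag = false;
    for (char c : input) {
        if (c == '<') {
            in_tag = true;            // start skipping
        } else if (c == '>' && in_tag) {
            in_tag = false;           // tag closed; resume copying after it
        } else if (!in_tag) {
            out.push_back(c);         // ordinary text outside any tag
        }
    }
    return out;
}
```

    Calling strip_tags on the contents of each file gives the stripped text directly. An unterminated tag at the end of the input is dropped, which matches the behavior of the loop above.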
    
