Regex speed: Python 6 times faster than C++11 under VS2013?

独厮守ぢ 2021-01-02 03:53

Could it be that Python's C regex implementation is 6 times faster, or am I missing something?

Python version:

import re
r=re.comp         


        
2 Answers
  • 2021-01-02 04:15

    The first thing to note is that in Python, regex matching (whether with the re module or the regex module) runs 'at the speed of C': the code doing the heavy lifting is plain C, so at least for longer strings the performance depends on the C regex implementation.

    Python can be surprisingly quick: it has no trouble performing in the vicinity of tens of millions of operations per second, and it can create millions of objects per second. That is still far slower than C, but for something that takes microseconds to begin with, the Python overhead may not really matter; it only adds on the order of 0.1 microseconds to each function call.
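
    The per-call overhead figure can be checked roughly on your own machine. The following is a minimal sketch, not part of the original answer: the names noop and per_call_overhead_us are made up for illustration, and the measurement also includes the cost of the for loop, so it is an upper bound rather than a precise number.

```python
import time


def noop():
    """Empty function: calling it costs little more than Python's call overhead."""
    return None


def per_call_overhead_us(calls=1_000_000):
    """Return the average wall-clock time per call of noop(), in microseconds.

    The figure includes the loop overhead, so treat it as a rough upper bound.
    """
    start = time.perf_counter()
    for _ in range(calls):
        noop()
    elapsed = time.perf_counter() - start
    return elapsed / calls * 1e6


if __name__ == "__main__":
    print(f"approx. overhead per call: {per_call_overhead_us():.3f} us")
```

    The result varies with the interpreter version and the hardware, so compare it against the cost of one regex search on the same machine rather than against a fixed number.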

    So in this case the relative slowness of Python doesn't matter. It's fast enough in absolute terms that what matters is how fast the regular expression functions do their thing.

    I rewrote the C++ case so that it is not subject to any criticisms (I hope; feel free to point out any). In fact, it doesn't even need to create a match object, because std::regex_search simply returns a bool (true/false):

    #include <iostream>
    #include <regex>
    #include <string>

    int main(int argc, char * argv[])
    {
        std::string s = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123";
        // Compile the pattern once, outside the loop; icase makes it case-insensitive.
        std::regex my(R"((HELLO).+?(\d+))", std::regex_constants::icase);

        // regex_search returns a bool, so no match object is created.
        int matches = 0;
        for (int i = 0; i < 1000000; ++i)
            matches += std::regex_search(s, my);

        std::cout << matches << std::endl;
        return 0;
    }
    

    I wrote a comparable Python program (although Python did create and return a match object), and my results were much the same as yours:

    c++   : 6.661s
    python: 1.039s
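
    The Python program itself is not shown in the answer. A comparable one might look like the sketch below; it is a reconstruction under assumptions (same string, same pattern, same iteration count as the C++ version), and the names TEXT, PATTERN and count_matches are made up for illustration.

```python
import re
import time

TEXT = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"
# Same pattern and case-insensitive flag as the C++ version, compiled once.
PATTERN = re.compile(r"(HELLO).+?(\d+)", re.IGNORECASE)


def count_matches(iterations=1_000_000):
    """Search TEXT `iterations` times and count the successful searches."""
    matches = 0
    for _ in range(iterations):
        # Unlike std::regex_search above, search() builds a match object.
        if PATTERN.search(TEXT) is not None:
            matches += 1
    return matches


if __name__ == "__main__":
    start = time.perf_counter()
    total = count_matches()
    elapsed = time.perf_counter() - start
    print(total)
    print(f"{elapsed:.3f}s")
```

    Timing inside the script excludes interpreter start-up, so the figure is not directly comparable to a wall-clock measurement of the whole process.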
    

    I think the basic conclusion here is that Python's regex implementation simply thrashes the one in the C++ standard library.

    It thrashes Go too

    A while back, just for fun, I compared Python's regex performance with Go's, and Python was at least twice as fast.

    The conclusion is that Python's regex implementation is very good, and you should certainly not look outside Python to get improved regex performance. The work regular expressions do is fundamentally time-consuming enough that Python's overhead doesn't really matter, and Python has a great implementation (the newer regex module is often even faster than re).

  • 2021-01-02 04:15

    Using timeit to do benchmarks is wrong, since it reports the best of 3 runs rather than a statistical test of the difference.
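
    If you still want to use timeit, timeit.repeat returns the raw timing of every run instead of only the best one, so the spread can be inspected. The sketch below is one way to do that, not something from the original post; sample_timings and summarize are hypothetical helper names, and a mean with a standard deviation is only a first step, not a full statistical difference test.

```python
import re
import statistics
import timeit

TEXT = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"
PATTERN = re.compile(r"(HELLO).+?(\d+)", re.IGNORECASE)


def sample_timings(repeats=10, number=100_000):
    """Return `repeats` samples, each the mean time per search in microseconds."""
    raw = timeit.repeat(lambda: PATTERN.search(TEXT), repeat=repeats, number=number)
    return [t / number * 1e6 for t in raw]


def summarize(samples):
    """Summarize the samples; stdev needs at least two of them."""
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
    }


if __name__ == "__main__":
    stats = summarize(sample_timings())
    print(f"min {stats['min']:.3f} us, mean {stats['mean']:.3f} us, "
          f"stdev {stats['stdev']:.3f} us")
```

    Collecting the same kind of samples for the C++ program would let you compare the two distributions rather than two single numbers.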

    It's your code, not the language.

    1. Passing the function as a std::function makes the C++ code slower;
    2. Calling clock functions in every iteration;
    3. Creating new objects, such as std::smatch match, in each iteration;
    4. The run function itself;
    5. Not precompiling the regex.

    I also wonder what optimization you are running with.

    The run() function is doing too much. Fix that. :)
