Could it be that Python's C regex implementation is 6 times faster, or am I missing something?
Python version:
import re
r=re.comp
The first thing to note is that in Python, regex matching (whether using the re or the regex module) runs 'at the speed of C'. That is, the actual heavy lifting is done by C code, so at least for longer strings the performance is going to depend on the C regexp implementation.
Python itself is fast enough for this. It has no trouble performing in the vicinity of tens of millions of operations per second, and it can create millions of objects per second. That is perhaps a thousand times slower than C, but if we're talking about something that takes microseconds to begin with, the Python overhead may not really matter: it only adds about 0.1 microseconds to each function call.
So in this case the relative slowness of Python doesn't matter. It's fast enough in absolute terms that what matters is how fast the regular expression functions do their thing.
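If you want to check the overhead claim on your own machine, a rough sketch like the following may help. It times an empty function call against a single search with the same pattern as in the question. The names noop and per_call_seconds are made up for this illustration, and the figures will vary with hardware and Python version.

```python
import re
import timeit

S = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"
PATTERN = re.compile(r"(HELLO).+?(\d+)", re.IGNORECASE)


def noop():
    """Empty function: timing it approximates bare call overhead."""


def per_call_seconds(stmt, namespace, number=200_000, repeat=5):
    """Best-of-`repeat` time per execution of `stmt`, in seconds."""
    timings = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(timings) / number


if __name__ == "__main__":
    ns = {"noop": noop, "PATTERN": PATTERN, "S": S}
    call = per_call_seconds("noop()", ns)
    search = per_call_seconds("PATTERN.search(S)", ns)
    print(f"empty call:   {call * 1e6:.3f} us")
    print(f"regex search: {search * 1e6:.3f} us")
```

If the search time comes out well above the empty-call time, the interpreter overhead is a small part of each iteration, which is the point being made above.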
I rewrote the C++ case so that it is not open to any criticisms (I hope; feel free to point out any). In fact it doesn't even need to create a match object, as std::regex_search simply returns a bool (true/false):
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char * argv[])
{
    std::string s = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123";
    // Compile the pattern once, outside the timed loop.
    std::regex my(R"((HELLO).+?(\d+))", std::regex_constants::icase);
    int matches = 0;
    for (int i = 0; i < 1000000; ++i)
        matches += std::regex_search(s, my);  // bool result, no match object
    std::cout << matches << std::endl;
    return 0;
}
I wrote a comparable Python program (although Python did create and return a match object), and my results matched yours:
C++: 6.661s, Python: 1.039s
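The Python program itself isn't shown above. A minimal sketch of what a comparable version might look like, assuming the same string, pattern, and iteration count as the C++ code (the count_matches helper is a made-up name, not the original program):

```python
import re
import time

# Same subject string and pattern as the C++ program above.
S = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"
PATTERN = re.compile(r"(HELLO).+?(\d+)", re.IGNORECASE)


def count_matches(iterations):
    """Run the search `iterations` times and count successful matches."""
    matches = 0
    for _ in range(iterations):
        # search() builds and returns a match object (or None), unlike
        # std::regex_search called without a match_results argument.
        if PATTERN.search(S) is not None:
            matches += 1
    return matches


if __name__ == "__main__":
    start = time.perf_counter()
    total = count_matches(1_000_000)
    elapsed = time.perf_counter() - start
    print(total, f"{elapsed:.3f}s")
```

The pattern is compiled once outside the loop, mirroring the std::regex object constructed before the loop in the C++ version.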
I think the basic conclusion here is that Python's regex implementation simply thrashes the C++ standard library one.
A while back, just for fun, I compared Python's regex performance with Go's, and Python was at least twice as fast.
The conclusion is that Python's regexp implementation is very good, and you should certainly not look outside Python to get improved regexp performance. The work regular expressions do is fundamentally time-consuming enough that Python's overhead doesn't really matter in the slightest, and Python has a great implementation (and the new regex module is often even faster than re).
Using timeit to do benchmarks is wrong since it gives you best of 3 and not a statistical difference test.
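For what it's worth, timeit.repeat() returns every raw timing rather than only the best one, so the samples can be summarised however you like. A rough sketch, assuming the pattern and string from the question (sample_timings is a made-up helper name):

```python
import re
import statistics
import timeit

S = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"
PATTERN = re.compile(r"(HELLO).+?(\d+)", re.IGNORECASE)


def sample_timings(repeat=10, number=100_000):
    """Return per-search times in microseconds, one value per repeat."""
    raw = timeit.repeat(
        "PATTERN.search(S)",
        globals={"PATTERN": PATTERN, "S": S},
        repeat=repeat,
        number=number,
    )
    return [t / number * 1e6 for t in raw]


if __name__ == "__main__":
    samples = sample_timings()
    print(f"min   {min(samples):.3f} us")
    print(f"mean  {statistics.mean(samples):.3f} us")
    print(f"stdev {statistics.stdev(samples):.3f} us")
```

With the individual samples in hand, the same lists from two implementations could be fed into whatever significance test you prefer.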
It's your code, not the language.
std::function will make the C++ code slower, as will std::smatch match; in each iteration. I also wonder what optimization level you are compiling with.
The run() function is doing too much. Fix that. :)