It's not merely the fact that Python code is interpreted that makes it slower, although that definitely sets a limit on how fast you can get.
If the bytecode-centric perspective were right, then to make Python code as fast as C all you'd have to do is replace the interpreter loop with direct calls to the functions, eliminating any bytecode, and compile the resulting code. But it doesn't work like that. You don't have to take my word for it, either: you can test it for yourself. Cython converts Python code to C, but a typical Python function converted and then compiled doesn't show C-level speed. All you have to do is look at some typical C code thus produced to see why.
The real challenge is dynamic dispatch (or whatever the right jargon is -- I can't keep it all straight), by which I mean that whereas `a+b` can compile down to one op in C when `a` and `b` are both known to be integers or floats, in Python you have to do a lot more to compute `a+b` (get the objects that the names are bound to, go via `__add__`, etc.).
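To make that concrete, here is a rough sketch, written in Python itself, of the steps hiding behind a single `a+b`. The function name `dynamic_add` is made up for illustration, and the real rules have more cases (for example, a right operand whose type is a subclass of the left operand's type gets priority), so treat it as an approximation rather than what the interpreter literally does.

```python
def dynamic_add(a, b):
    """Simplified, hypothetical sketch of what evaluating a + b involves."""
    # The operand types are only known at runtime, so look them up first.
    type_a, type_b = type(a), type(b)

    # Special methods are looked up on the type, not on the instance.
    add = getattr(type_a, "__add__", None)
    if add is not None:
        result = add(a, b)
        # A type signals "I can't handle this operand" with NotImplemented.
        if result is not NotImplemented:
            return result

    # Fall back to the reflected method on the right operand's type.
    if type_b is not type_a:
        radd = getattr(type_b, "__radd__", None)
        if radd is not None:
            result = radd(b, a)
            if result is not NotImplemented:
                return result

    raise TypeError(
        f"unsupported operand type(s) for +: "
        f"{type_a.__name__!r} and {type_b.__name__!r}"
    )


print(dynamic_add(1, 2))      # int.__add__ handles it
print(dynamic_add(1, 2.5))    # int.__add__ declines, float.__radd__ handles it
print(dynamic_add("a", "b"))  # same syntax, completely different operation
```

Every one of those lookups and checks has to be possible each time the expression runs, because nothing in the source says what `a` and `b` will be. A C compiler that knows both are `int` gets to skip all of it.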
This is why you have to specify the types in the critical path to make Cython reach C speeds; it's how Shedskin makes Python code fast, using (Cartesian product) type inference to get C++ out of it; and it's how PyPy can be fast -- the JIT can pay attention to how the code is behaving and specialize on things like types. Each approach eliminates dynamism, whether at compile time or at runtime, so that it can generate code which knows what it's doing.
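The "specialize on types" idea can be sketched in a few lines. This is a toy illustration only, not how PyPy or any real JIT is implemented, and `make_specialized_add` is a name I've invented for it: resolve the method once for the type you expect, guard on that assumption with a cheap check, and fall back to the fully generic path when the guard fails.

```python
import operator


def make_specialized_add(expected_type):
    """Toy sketch: build an add function specialized for one operand type."""
    # Resolve the method once, up front, instead of on every call.
    fast_add = expected_type.__add__

    def add(a, b):
        # Guard: the fast path is only valid if the type assumption holds.
        if type(a) is expected_type and type(b) is expected_type:
            return fast_add(a, b)
        # Assumption failed: take the generic, fully dynamic path.
        return operator.add(a, b)

    return add


add_ints = make_specialized_add(int)
print(add_ints(2, 3))      # guard passes, uses the pre-resolved int.__add__
print(add_ints(2, 0.5))    # guard fails, falls back to the generic add
```

In pure Python this buys you nothing, of course, since the guard itself is interpreted. The point is the shape: in generated machine code the guard is a cheap comparison and the fast path can be a single add instruction, which is where the speed comes from.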